2018
Report  Open Access

Revisiting distributional correspondence indexing: a Python reimplementation and new experiments

Moreo Fernandez A. D., Esuli A., Sebastiani F.

Distributional correspondence indexing 

This paper introduces PyDCI, a new implementation of Distributional Correspondence Indexing (DCI) written in Python. DCI is a transfer learning method for cross-domain and cross-lingual text classification for which we had provided an implementation (here called JaDCI) built on top of JaTeCS, a Java framework for text classification. PyDCI is a stand-alone version of DCI that exploits scikit-learn and the SciPy stack. We here report on new experiments that we have carried out in order to test PyDCI, and in which we use as baselines new high-performing methods that have appeared after DCI was originally proposed. These experiments show that, thanks to a few subtle ways in which we have improved DCI, PyDCI outperforms both JaDCI and the above-mentioned high-performing methods, and delivers the best known results on the two popular benchmarks on which we had tested DCI, i.e., MultiDomainSentiment (a.k.a. MDS -- for cross-domain adaptation) and Webis-CLS-10 (for cross-lingual adaptation). PyDCI, together with the code allowing to replicate our experiments, is available at https://github.com/AlexMoreo/pydci

Source: Research report, 2018



Back to previous page
BibTeX entry
@techreport{oai:it.cnr:prodotti:401240,
	title = {Revisiting distributional correspondence indexing: a Python reimplementation and new experiments},
	author = {Moreo Fernandez A. D. and Esuli A. and Sebastiani F.},
	institution = {Research report, 2018},
	year = {2018}
}
CNR ExploRA

Bibliographic record

ISTI Repository

Preprint version Open Access

Also available from

arxiv.orgOpen Access