2019
Journal article  Open Access

Building automated survey coders via interactive machine learning

Esuli A., Moreo Fernandez A. D., Sebastiani F.

Computer Science - Machine Learning; Information Retrieval (cs.IR); FOS: Computer and information sciences; Computer Science - Information Retrieval; Machine learning; Business and International Management; Survey coding; Machine Learning (cs.LG); Economics and Econometrics; Text classification; Marketing

Software systems trained via machine learning to automatically classify open-ended answers (a.k.a. verbatims) are by now a reality. Still, their adoption in the survey coding industry has been less widespread than it might have been. Among the factors that have hindered a wider uptake of this technology are the effort involved in manually coding a sufficient amount of training data, the fact that small studies do not seem to justify this effort, and the fact that the process needs to be repeated anew whenever a brand new coding task arises. In this article, we argue for an approach to building verbatim classifiers that we call "Interactive Learning," and that addresses all of the above problems. We show that, for the same amount of training effort, interactive learning delivers much better coding accuracy than standard "non-interactive" learning. This is especially true when the amount of data we are willing to code manually is small, which makes the approach attractive for small-scale studies as well. Interactive learning also lends itself to reusing previously trained classifiers for new (albeit related) coding tasks. Finally, it integrates better into the daily workflow of the survey specialist and delivers a better overall user experience.
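The abstract does not specify which learning algorithm the authors use. As a rough illustration of the interactive loop it describes (the system suggests a code, the human coder confirms or corrects it, and the classifier learns from that answer immediately), the following minimal sketch assumes a bag-of-words multiclass perceptron as the online learner. The class `OnlinePerceptronCoder`, the helper `interactive_session`, the code labels and the example verbatims are all hypothetical; this is not the authors' implementation.

```python
from collections import defaultdict


def features(text):
    """Bag-of-words features: lower-cased whitespace tokens plus a bias term."""
    feats = defaultdict(float)
    for token in text.lower().split():
        feats[token] += 1.0
    feats["__bias__"] = 1.0
    return feats


class OnlinePerceptronCoder:
    """Hypothetical multiclass perceptron, updated after each coded verbatim."""

    def __init__(self, codes):
        self.codes = list(codes)
        # One sparse weight vector per code.
        self.weights = {c: defaultdict(float) for c in self.codes}

    def score(self, feats, code):
        w = self.weights[code]
        return sum(w[f] * v for f, v in feats.items())

    def suggest(self, text):
        feats = features(text)
        # max() keeps the first code with the highest score, so ties follow self.codes order.
        return max(self.codes, key=lambda c: self.score(feats, c))

    def update(self, text, true_code):
        """Perceptron rule: change the weights only when the suggestion was wrong."""
        feats = features(text)
        predicted = max(self.codes, key=lambda c: self.score(feats, c))
        if predicted != true_code:
            for f, v in feats.items():
                self.weights[true_code][f] += v
                self.weights[predicted][f] -= v
        return predicted


def interactive_session(coder, verbatims, ask_human):
    """Suggest a code for each verbatim, record the human's answer, learn at once."""
    accepted = 0
    for text in verbatims:
        suggestion = coder.suggest(text)
        final_code = ask_human(text, suggestion)  # the coder confirms or corrects
        if final_code == suggestion:
            accepted += 1
        coder.update(text, final_code)
    return accepted


# Hypothetical verbatims; the dict stands in for the human coder's decisions.
gold = {
    "too expensive for what you get": "PRICE",
    "price is far too high": "PRICE",
    "staff were rude": "SERVICE",
    "the staff were helpful and friendly": "SERVICE",
    "high price but good quality": "PRICE",
}
coder = OnlinePerceptronCoder(["PRICE", "SERVICE"])
accepted = interactive_session(coder, list(gold), lambda text, suggestion: gold[text])
print(accepted)                           # 3
print(coder.suggest("rude staff"))        # SERVICE
print(coder.suggest("price too high"))    # PRICE
```

The perceptron is used here only because it is the simplest online learner; any classifier that can be updated one example at a time would fit the same loop, and the count of accepted suggestions gives a rough sense of how much manual effort the model saves as it learns.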

Source: International Journal of Market Research 61 (2019): 1–22. doi:10.1177/1470785318824244

Publisher: NTC Publications, Henley-on-Thames, United Kingdom



BibTeX entry
@article{oai:it.cnr:prodotti:401327,
	title = {Building automated survey coders via interactive machine learning},
	author = {Esuli, A. and Moreo Fernandez, A. D. and Sebastiani, F.},
	publisher = {NTC Publications, Henley-on-Thames, United Kingdom},
	doi = {10.1177/1470785318824244},
	eprint = {1903.12110},
	archivePrefix = {arXiv},
	journal = {International Journal of Market Research},
	volume = {61},
	pages = {1--22},
	year = {2019}
}