A Weakly Supervised and Deep Learning Method for an Additive Topic Analysis of Large Corpora

Authors

  • Yair Fogel-Dror, The Hebrew University of Jerusalem
  • Shaul R. Shenhav, Department of Political Science, The Hebrew University, Jerusalem
  • Tamir Sheafer, Department of Communications and Political Science, The Hebrew University, Jerusalem

DOI:

https://doi.org/10.5117/CCR2021.1.002.FOGE

Keywords:

Weak supervision, deep learning, topic analysis, computational content analysis

Abstract

The collaborative effort of theory-driven content analysis can benefit significantly from the use of topic analysis methods that allow researchers to add more categories while developing or testing a theory. Additivity also enables the reuse of previous efforts or the merging of separate research projects, thereby increasing the accessibility of such methods and the ability of the discipline to create shareable content analysis capabilities. This paper proposes a weakly supervised topic analysis method that combines a low-cost unsupervised method for compiling a training set with supervised deep learning, which serves as an additive and accurate text classification method. We test the validity of the method, specifically its additivity, by comparing the results of the method after adding 200 categories to an initial number of 450. We show that the suggested method is a solid starting point for a low-cost and additive solution for large-scale topic analysis.
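The idea of additivity described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and every detail here is an assumption made for illustration: seed-word matching stands in for the unsupervised step that compiles the training set, a small bag-of-words logistic regression stands in for the deep learning classifier, and the names `weak_label`, `train_binary`, and `predict`, the seed words, and the example documents are all hypothetical. The only point it tries to show is that when each category has its own binary classifier, a new category can be added without retraining the existing ones.

```python
import copy
import math
from collections import Counter


def weak_label(docs, seed_words):
    """Weak labels: 1 if a document contains any seed word, else 0.
    (Assumed stand-in for the unsupervised training-set compilation.)"""
    return [int(any(tok in seed_words for tok in doc.lower().split()))
            for doc in docs]


def train_binary(docs, labels, epochs=50, lr=0.5):
    """Train one bag-of-words logistic regression for a single category.
    (Assumed stand-in for the supervised deep learning classifier.)"""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            tokens = doc.lower().split()
            z = bias + sum(weights[t] for t in tokens)
            p = 1.0 / (1.0 + math.exp(-z))
            grad = y - p  # gradient of the log-likelihood w.r.t. z
            bias += lr * grad
            for t in tokens:
                weights[t] += lr * grad
    return dict(weights), bias


def predict(model, doc):
    """Return True if the category's classifier scores the document above 0."""
    weights, bias = model
    return bias + sum(weights.get(t, 0.0) for t in doc.lower().split()) > 0


# Hypothetical toy corpus and seed words.
docs = [
    "parliament passed the budget vote",
    "the election campaign and the vote",
    "the team won the football match",
    "a late goal decided the match",
    "the central bank raised interest rates",
    "markets fell as inflation rose",
]
seeds = {"politics": {"vote", "election"}, "sports": {"match", "goal"}}

# One independent binary classifier per initial category.
models = {cat: train_binary(docs, weak_label(docs, words))
          for cat, words in seeds.items()}
snapshot = copy.deepcopy(models)

# Additive step: a new category trains only its own classifier.
models["economy"] = train_binary(docs, weak_label(docs, {"bank", "inflation"}))
```

In this sketch the classifiers stored under `politics` and `sports` are identical before and after `economy` is added, which is the property the abstract calls additivity. A shared multi-class output layer would not have this property, since adding a class would change the model that all categories depend on.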

Published

2021-04-13

Section

Articles

How to Cite

Fogel-Dror, Y., Shenhav, S. R., & Sheafer, T. (2021). A Weakly Supervised and Deep Learning Method for an Additive Topic Analysis of Large Corpora. Computational Communication Research, 3(1), 29-59. https://doi.org/10.5117/CCR2021.1.002.FOGE