Four best practices for measuring news sentiment using ‘off-the-shelf’ dictionaries: a large-scale p-hacking experiment

Authors

  • Chung-hong Chan Mannheimer Zentrum für Europäische Sozialforschung https://orcid.org/0000-0002-6232-7530
  • Joseph Bajjalieh Cline Center for Advanced Social Research, University of Illinois Urbana-Champaign, USA
  • Loretta Auvil Cline Center for Advanced Social Research, University of Illinois Urbana-Champaign, USA
  • Hartmut Wessler Institute for Media and Communication Studies, University of Mannheim, Germany
  • Scott Althaus Cline Center for Advanced Social Research, University of Illinois Urbana-Champaign, USA
  • Kasper Welbers Department of Communication Science, Vrije Universiteit Amsterdam, Netherlands
  • Wouter van Atteveldt Department of Communication Science, Vrije Universiteit Amsterdam, Netherlands
  • Marc Jungblut Department of Media and Communication, Ludwig Maximilian University of Munich, Germany

DOI:

https://doi.org/10.5117/CCR2021.1.001.CHAN

Keywords:

sentiment analysis, p-hacking, news sentiment, agenda setting, text-as-data, validity

Abstract

We examined the validity of 37 sentiment indicators derived from dictionary-based methods using a large news corpus and demonstrated the risk of generating a spectrum of results with different levels of statistical significance by presenting an analysis of relationships between news sentiment and U.S. presidential approval. We summarize our findings into four best practices: 1) use a theory-informed sentiment dictionary; 2) do not assume that the validity and reliability of the dictionary are ‘built-in’; 3) check for the influence of content length; and 4) do not use multiple dictionaries to test the same statistical hypothesis.

Published

2021-04-13

Section

Articles

How to Cite

Chan, C.-H., Bajjalieh, J., Auvil, L., Wessler, H., Althaus, S., Welbers, K., van Atteveldt, W., & Jungblut, M. (2021). Four best practices for measuring news sentiment using ‘off-the-shelf’ dictionaries: a large-scale p-hacking experiment. Computational Communication Research, 3(1), 1-27. https://doi.org/10.5117/CCR2021.1.001.CHAN