Semantic Apparatus – Latent Semantic Scaling: A Semisupervised Text Analysis Technique for New Domains and Languages

Cited by Lee Sonogan


Abstract by Kohei Watanabe

Many social scientists recognize that quantitative text analysis is a useful research methodology, but its application is still concentrated in documents written in European languages, especially English, and few sub-fields of political science, such as comparative politics and legislative studies. This seems to be due to the absence of flexible and cost-efficient methods that can be used to analyze documents in different domains and languages. Aiming to solve this problem, this paper proposes a semisupervised document scaling technique, called Latent Semantic Scaling (LSS), which can locate documents on various pre-defined dimensions. LSS achieves this by combining user-provided seed words and latent semantic analysis (word embedding). The article demonstrates its flexibility and efficiency in large-scale sentiment analysis of New York Times articles on the economy and Asahi Shimbun articles on politics. These examples show that LSS can produce results comparable to that of the Lexicoder Sentiment Dictionary (LSD) in both English and Japanese with only small sets of sentiment seed words. A new heuristic method that assists LSS users to choose a near-optimal number of singular values to obtain word vectors that best capture differences between documents on target dimensions is also presented.
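To make the idea in the abstract concrete, here is a minimal sketch of how seed words and latent semantic analysis could be combined to score documents. It is not the author's implementation: the toy corpus, the seed words "good" and "bad", the choice of k = 2 singular values, and the helper names (`cosine`, `polarity`, `score`) are all assumptions made for illustration. Word vectors come from a truncated SVD of a sentence-by-word count matrix, each word's polarity is its average cosine similarity to the seed words weighted by the seed polarity, and a document's score is the mean polarity of its known words.

```python
import numpy as np

# Toy corpus (hypothetical data): each "document" is one tokenized sentence.
corpus = [
    "economy growth good strong",
    "growth strong good jobs",
    "jobs good growth",
    "economy recession bad weak",
    "recession weak bad debt",
    "debt bad recession",
]
docs = [s.split() for s in corpus]
vocab = sorted({w for d in docs for w in d})
index = {w: i for i, w in enumerate(vocab)}

# Sentence-by-word count matrix.
X = np.zeros((len(docs), len(vocab)))
for r, d in enumerate(docs):
    for w in d:
        X[r, index[w]] += 1

# Truncated SVD: keep the top k singular values (k = 2 is an assumption
# that suits this tiny corpus, not a recommended setting).
k = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)
word_vecs = Vt[:k].T * S[:k]  # one k-dimensional vector per word


def cosine(a, b):
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# User-provided seed words with polarity +1 / -1 (illustrative choice).
seeds = {"good": 1.0, "bad": -1.0}


def polarity(word):
    """Average seed-weighted cosine similarity between a word and the seeds."""
    v = word_vecs[index[word]]
    total = sum(p * cosine(v, word_vecs[index[s]]) for s, p in seeds.items())
    return total / len(seeds)


word_polarity = {w: polarity(w) for w in vocab}


def score(text):
    """Mean polarity of the words in `text` that appear in the vocabulary."""
    known = [w for w in text.split() if w in word_polarity]
    if not known:
        return 0.0
    return sum(word_polarity[w] for w in known) / len(known)


pos_score = score("growth strong jobs")
neg_score = score("recession weak debt")
print(pos_score, neg_score)
```

In this toy setup the first document, which contains no seed word at all, still lands on the positive side of the scale because its words co-occur with "good" in the corpus, and the second lands on the negative side. A real application would need a large corpus, a larger k, and carefully chosen seed words.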

Publication: Communication Methods and Measures (Peer-Reviewed Journal)

Pub Date: 1 Nov, 2021 Doi:

Keywords: Latent Semantic Scaling, Text Analysis, Domains and Languages (plenty more sections and references in the research article)
