Cited by Lee Sonogan
Compared with conventional word embeddings, sentiment embeddings can distinguish words with similar contexts but opposite sentiment. They can be used to incorporate sentiment information from labeled corpora or lexicons by either end-to-end training or sentiment refinement. However, these methods present two major limitations. First, traditional approaches provide a fixed representation to each word but ignore the alternation of word meaning in different contexts. As a result, the polarity of a certain emotional word may vary with context, but will be assigned with a same representation. Another problem is the handling of out-of-vocabulary (OOV) or informal-writing sentiment words that would be assigned generic vectors (e.g., <UNK>). In addition, if affective words are not included in affective corpora or lexicons, they would be treated as neutral. Using such low-quality embeddings for building a neural model will reduce performance. This study proposes a training model of contextual sentiment embeddings. A stacked two-layer GRU model was used as the language model, simultaneously trained to incorporate semantic and sentiment information from labeled corpora and lexicons. To deal with OOV or informal-writing sentiment words, the WordPiece tokenizer was used to divide the text into subwords. The resulting model can be transferred to downstream applications by either feature extractor or fine-tuning. The results show that the proposed model can handle unseen or informal writing sentiment words and thus outperforms previously proposed methods.
Publication: Knowlege-Based Systems (Peer-Reviewed Journal)
Pub Date: Nov 1, 2021 Doi: https://doi.org/10.1016/j.knosys.2021.107663
Keywords: Contextual sentiment embeddings, Sentiment analysis, Pre-trained language model Gated recurrent unit
https://www.sciencedirect.com/science/article/pii/S0950705121009254 (Plenty more sections and references in this research article)