#WebSci20 – Paper Session 8: NLP+CSS by Robert Thorburn
Posted on behalf of Robert Thorburn
The eighth and second-to-last paper session at Web Science 2020 dealt with the use of Natural Language Processing (NLP) in the Computational Social Sciences (CSS). Unsurprisingly, NLP techniques were employed by a number of researchers in other paper sessions, but the need for a more focused session was clear, given the utility of such techniques for CSS studies. Papers presented during this session covered predicting symptoms of depression, studying political stance changes, online identity and content propagation, and the reception of education reforms through the Blogosphere.
Although each paper put the combination of publicly available data and powerful NLP techniques to good use, the standout feature was the depth of divergent knowledge which could be gained through the application of similar techniques. So, for instance, the first paper developed a representation of social media users’ mood over time as a predictor for depression, with the time-based approach being key to differentiating between mood and short-term emotion. The final paper, on the other hand, explored the way in which teachers and connected individuals discussed, reacted to, and lobbied against education reforms over a set time period. In addition to the differing areas of inquiry, it is also interesting to note that differing data sources raise different ethical concerns. For example, Twitter may be used anonymously and contains very short messages intended for a global audience, while blogs can be deeply personal and do not necessarily have the same anticipation of a global audience. Although blogs can be anonymous, this anonymity functions differently and carries different expectations.
A further interesting issue raised by presenters was that of silence. When an individual is not posting content, this could simply be seen as a blank space and ignored or averaged out in calculations. However, if the issue at hand is one of mental health then such silence could actually be meaningful and must be considered with greater care and possibly also directly reflected in the data under analysis.
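To make the point about silence concrete, the following is a minimal sketch of one way non-posting days could be kept visible in the data rather than dropped or averaged out. It is not the presenters' method. The function names (`daily_series`, `silence_features`), the mood scores, and the dates are all hypothetical and chosen purely for illustration.

```python
from datetime import date, timedelta


def daily_series(posts, start, end):
    """Return one (day, mean_score) entry per calendar day.

    Days without posts are kept with a value of None instead of being dropped.
    """
    by_day = {}
    for day, score in posts:
        by_day.setdefault(day, []).append(score)

    series = []
    day = start
    while day <= end:
        scores = by_day.get(day)
        mean = sum(scores) / len(scores) if scores else None
        series.append((day, mean))
        day += timedelta(days=1)
    return series


def silence_features(series):
    """Summarise silence as explicit features of the series."""
    silent = [value is None for _, value in series]
    longest = current = 0
    for is_silent in silent:
        current = current + 1 if is_silent else 0
        longest = max(longest, current)
    return {
        "silent_fraction": sum(silent) / len(silent),
        "longest_silent_run": longest,
    }


# Hypothetical per-post mood scores (illustrative values only).
posts = [
    (date(2020, 7, 1), 0.4),
    (date(2020, 7, 2), 0.2),
    (date(2020, 7, 6), -0.6),
]

series = daily_series(posts, date(2020, 7, 1), date(2020, 7, 7))
features = silence_features(series)
print(features)
```

A plain average over the three posts would hide the fact that most of the week contains no posts at all; keeping the gaps as explicit values lets an analysis decide deliberately how to treat them.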
Lastly, during open discussion, the presenters focused on the issues of data persistence and platform limitations. Data persistence is a nuanced and changeable issue inasmuch as the retention and deletion of data might be beyond the control of any one party. Major role players include the data subject, the platform hosting the data, data repositories, the Internet Archive, and legislation, among others. The impact of platforms is especially notable since they play a major role not only in data storage but also in data creation. This is due not only to potential content restrictions but also to format limitations such as Twitter’s character limit.