Can we forecast social trends, wars, election results, etc., based on social media data?

17 October

We cannot forecast social trends based on the processing of social media data. The reasons behind this are ethical.

Data-mining and the analysis of how people express themselves online around current events cannot produce a robust approach to ascertaining attitudes if it is premised on 1) automation, and 2) using open-access data without the full and informed consent of participants.

"Informed consent" is the gold standard for ethical social science research today. It means that participants must be informed in advance that their responses will be used as data for research, and that they must agree to participate in the research project. Informed consent can be given by a participant when they sign a form that explains the aims and objectives of the research, or when they click through the first explanatory page of a survey when providing responses online, for example.

So by treating all online interactions as data, we would not be complying with the basic premises of ethical social research. After all, people engage in public debates online and offline, in various settings, for a range of reasons. They express a range of views and sentiments ad hoc, off the cuff, and without considering that their acts online might be treated in the same way as their responses to a formal questionnaire.

Despite these ethical problems, some social media data is used by companies to draw conclusions about their users and about potential consumer markets. There is ongoing debate about how transparent the actions of these companies are, and how far users are aware of how their data is used. Some social scientists also engage with open-access data, but this engagement is guided by special ethical standards at every step of the research process - from access, to analysis, to eventual citation in the final research outcome. Researchers need to make sure that their engagement with online data does not violate the privacy, anonymity, or rights of research participants, and for this reason it is difficult at present to use social media data on a mass scale to draw conclusions about society. It is even more complicated to use any conclusions we can draw to then formulate predictions.

