From Social Media to Expert Reports: Automatically Validating and Extending Complex Conceptual Models Using Machine Learning Approaches
Abstract
Given the importance of developing accurate models of any complex system, the modeling
process often seeks to be comprehensive by including experts and community members.
While many qualitative modeling processes can produce models in the form of
maps (e.g., cognitive/concept mapping, causal loop diagrams), they are generally conducted
with a facilitator. A facilitator's limited capacity restricts the number
of participants. The need to be either physically present (for face-to-face sessions) or
at least in a compatible time zone (for phone interviews) also limits the geographical
diversity of participants. In addition, participants may not openly express their beliefs
(e.g., on weight discrimination or political views) when they perceive that these beliefs may
not be well received by a facilitator or others in the room. In contrast, the naturally occurring
exchange of perspectives on social media provides an unobtrusive approach to collecting
beliefs on the causes and consequences of such complex systems. Mining social media also
supports a scalable approach and a geographically diverse sample. While obtaining a
conceptual model via social media can inform policymakers about popular support for
possible policies, the model may stand in stark contrast with an expert-based model.
Identifying and reconciling these differences is an important step toward integrating social
computing with policy making.
The pipeline automatically validates large conceptual models, here of obesity and
politics, against large text datasets (academic reports or social media such as Twitter).
Its technical innovation lies in the combination of machine learning approaches it applies:
generating relevant keywords with the WordNet interface from NLTK, topic
modelling with the Gensim LDA model, entity recognition with the Google Cloud Natural Language
API, and categorizing themes with the count vectorizer and TF-IDF transformer
from the scikit-learn library. Once the pipeline validates the model, the model is further considered
for extension by mining literature or Twitter conversations and applying Granger causality
tests to the time series obtained from the respective data sources. Shifts in public
opinion on Twitter, however, can alter the results of validating
and extending conceptual models with our computational methods. We therefore
compare sentiment analysis and sarcasm detection results on these conceptual
models. Analyzing these results, we discuss whether the confirmed and extended associations
in our conceptual model are an artifact of our method or an accurate reflection
of events related to that complex conceptual model. The combination of these machine
learning approaches helps to automatically confirm and extend complex conceptual
models at a lower cost in money, time and resources. It can also be used for automatically
formulating public policies: whereas policies are usually created in response to issues brought
before decision makers, here they are created from the issues discussed every day on social media
platforms.
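
To make the theme categorization step more concrete, the following is a minimal sketch of term counting followed by TF-IDF weighting. It is an illustration only, not the pipeline's actual code: it uses the Python standard library as a stand-in for scikit-learn's count vectorizer and TF-IDF transformer, assuming the smoothed idf formula ln((1 + n) / (1 + df)) + 1 and L2 row normalization that scikit-learn documents as its defaults. The whitespace tokenizer, the theme descriptions, the example posts, and every function name are hypothetical.

```python
import math
from collections import Counter


def count_vectorize(docs):
    """Return the sorted vocabulary and one term-count row per document."""
    # Assumption: plain lowercase whitespace tokenization, for illustration.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_transform(rows):
    """Weight counts by smoothed idf, then L2-normalize each row."""
    n_docs = len(rows)
    n_terms = len(rows[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    weighted = []
    for row in rows:
        vec = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in vec))
        weighted.append([v / norm if norm else 0.0 for v in vec])
    return weighted


def cosine(u, v):
    """Cosine similarity of two rows that are already unit length."""
    return sum(a * b for a, b in zip(u, v))


def categorize(posts, themes):
    """Assign each post to the theme whose description it most resembles."""
    names = list(themes)
    _, rows = count_vectorize(list(themes.values()) + posts)
    vectors = tfidf_transform(rows)
    theme_vecs = vectors[:len(names)]
    post_vecs = vectors[len(names):]
    labels = []
    for vec in post_vecs:
        scores = [cosine(vec, t) for t in theme_vecs]
        labels.append(names[scores.index(max(scores))])
    return labels


# Hypothetical theme descriptions and posts, invented for this sketch.
themes = {
    "diet": "sugar soda fast food calories eating",
    "activity": "exercise walking gym sport sedentary",
}
posts = [
    "too much soda and fast food adds calories",
    "no time for the gym or walking after work",
]

labels = categorize(posts, themes)
print(labels)
```

In this toy setup, each post is assigned to the theme with which it shares weighted vocabulary. A post that shares no terms with any theme scores zero everywhere and falls back to the first theme, so a real pipeline would presumably need a minimum-similarity threshold.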