Date of Award
Fall 2021
Type
Thesis
Major
Computer Science - Applied Computing Track
Degree Type
Master of Science in Applied Computer Science
Department
TSYS School of Computer Science
First Advisor
Dr. Lydia Ray
Second Advisor
Dr. Rania Hodhod
Third Advisor
Dr. Lixin Wang
Abstract
Fake news and conspiracy theories have become abundant in the expanding world of social media. They strongly affect the beliefs and thoughts of the public and can result in chaos. They are not new: they have existed for decades and have been linked to prejudice, revolutions, and genocide throughout history. They have also led people to reject mainstream medicine to the extent that some diseases are recurring in parts of the world. Because they can spread very quickly, their impact is serious. It is therefore important to find suitable ways to detect fake news and conspiracy theories on social media, which requires a thorough analysis of their features. This study presents a survey of feature extraction and classification techniques that can be used to classify and detect fake news and conspiracy theories in Twitter datasets. The results indicate that the TF-IDF method of feature extraction, when combined with the SVM classification algorithm, yields the highest accuracy, 99.6%, compared with the other algorithms, i.e., multinomial naive Bayes, logistic regression, and decision tree. The Bag of Words model yields an accuracy of 52.3% for both the multinomial naive Bayes and logistic regression algorithms and lower accuracies for the other two algorithms, i.e., SVM and decision tree. TF-IDF thus performed better than Bag of Words.
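To illustrate the difference between the two feature extraction methods named in the abstract, the following is a minimal sketch in plain Python. It is not the thesis's implementation: the toy corpus, the function names, and the unsmoothed TF-IDF formula (term frequency times log(N / document frequency)) are assumptions chosen for illustration, and libraries often use smoothed or normalized variants.

```python
import math
from collections import Counter

# Hypothetical toy corpus; not taken from the thesis dataset.
docs = [
    "the moon landing was staged",
    "the vaccine was tested in trials",
    "the moon orbits the earth",
]

# Whitespace tokenization is an assumption made for brevity.
tokenized = [d.split() for d in docs]
vocab = sorted({t for doc in tokenized for t in doc})


def bag_of_words(tokens):
    """Raw term counts over the shared vocabulary."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]


def idf(term):
    """Unsmoothed inverse document frequency: log(N / df)."""
    df = sum(1 for doc in tokenized if term in doc)
    return math.log(len(tokenized) / df)


def tf_idf(tokens):
    """Relative term frequency scaled by inverse document frequency."""
    counts = Counter(tokens)
    return [(counts[t] / len(tokens)) * idf(t) for t in vocab]


bow_vectors = [bag_of_words(doc) for doc in tokenized]
tfidf_vectors = [tf_idf(doc) for doc in tokenized]
```

In this sketch, a word that occurs in every document (here "the") keeps a nonzero Bag of Words count but receives a TF-IDF weight of zero, while a word confined to one document is weighted up. Either set of vectors could then be passed to a classifier such as an SVM.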
Recommended Citation
Shana, Deb, "A Comparative Study on Feature Extraction and Classification/Clustering of Fake News and Conspiracy Theories from Twitter Data" (2021). Theses and Dissertations. 446.
https://csuepress.columbusstate.edu/theses_dissertations/446