Determination of Authetic News Sources based on Machine Learning.
This project uses a "Real and Fake News" dataset.
analysis.R
Runs the data implemented data analysis and produces measures of accuracy for the generated models.report.Rmd
is an R-markdown file that generates a final report.report.pdf
is the final report generated for this project.citations.bibtex
is the Bibliography/Citations database file for this report.citation_style.csl
is a Citation Style Language xml file to appropriately define how to format citations.
The Real and Fake news data used in this analysis was accessed and downloaded from the University of Victoria ISOT Research Lab where they have published several datasets from their own catalogue. This dataset comes with the following citations:
Ahmed H, Traore I, Saad S. “Detecting opinion spams and fake news using text classification”,Journal of Security and Privacy, Volume 1, Issue 1, Wiley, January/February 2018.
Ahmed H, Traore I, Saad S. (2017) “Detection of Online Fake News Using N-Gram Analysis and Machine Learning Techniques. In: Traore I., Woungang I., Awad A. (eds) Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments. ISDDC 2017. Lecture Notes in Computer Science, vol 10618. Springer, Cham(pp. 127-138).
This dataset leverages three English sentiment lexicons, cited below:
For the NRC VAD
Lexicon
Mohammad, Saif M. 2018. “Obtaining Reliable Human Ratings of Valence, Arousal, and Dom- inance for 20,000 Englishwords.” Melbourne, Australia.
For the NRC
Lexicon
Mohammad, Saif M., and Peter D. Turney. 2013. “CROWDSOURCING a Word–Emotion Association Lexicon.” Computational Intelligence 29 (3): 436–65. https://doi.org/10.1111/j.1467-8640.2012.00460.x.
For the Afinn
Lexicon
Nielsen, Finn Årup. 2011. “A New ANEW: Evaluation of a Word List for Sentiment Analysis in Microblogs.” CoRR abs/1103.2903. http://arxiv.org/abs/1103.2903.