The Disinfo Radar pipeline consists of these main steps:
- Scrape articles
- Preprocess articles
- Load sentences from articles
- Extract technology-related spans from sentences
- Predict disinformation potential factor scores for each span
- Analyze overall disinformation potential for tech topics
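The stages above can be sketched as a chain of small functions. This is only an illustrative sketch: every function name, the naive sentence splitter, the keyword list, and the constant score are stand-ins, not the pipeline's actual API.

```python
# Hypothetical sketch of the pipeline stages; names and logic are
# illustrative stand-ins for the real scraping/NLP components.

def preprocess(articles):
    # Normalize whitespace and drop empty articles
    return [a.strip() for a in articles if a.strip()]

def load_sentences(articles):
    # Naive split on '.'; the real pipeline uses NLTK's punkt tokenizer
    return [s.strip() for a in articles for s in a.split(".") if s.strip()]

def extract_tech_spans(sentences, keywords=("deepfake", "LLM")):
    # Keep sentences mentioning a technology keyword
    # (a crude stand-in for span extraction)
    return [s for s in sentences
            if any(k.lower() in s.lower() for k in keywords)]

def score_spans(spans):
    # Dummy disinformation-potential score per span
    return {s: 0.5 for s in spans}

articles = ["Deepfake tools improve. Weather was mild."]
scores = score_spans(extract_tech_spans(load_sentences(preprocess(articles))))
```

The final `scores` dict maps each technology-related span to its (here constant) disinformation-potential score, mirroring the per-span scoring step above.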
Here is an example of how to set up and run the pipeline:
conda create -n disinfo_pipeline python=3.9
conda activate disinfo_pipeline
pip install -r requirements.txt
python DisinfoPipeline.py
Two NLTK data packages are also required; they can be downloaded from a Python shell like this:
import nltk
nltk.download('punkt')
nltk.download('stopwords')
For now, output data is stored locally in the Output directory, and a subset is uploaded to Google Drive to feed into Infogram visualizations.
To see or edit file locations or pipeline settings, use the config.ini file.
For a quick test run of the pipeline with a small data sample, set test_run_small_sample to true in config.ini.
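Reading such a setting can be done with Python's standard configparser. A minimal sketch, assuming the flag lives in a `[pipeline]` section; only the test_run_small_sample key comes from the text above, the section name and `output_dir` key are illustrative assumptions:

```python
# Illustrative read of pipeline settings via configparser;
# the [pipeline] section and output_dir key are assumed, not confirmed.
import configparser

config = configparser.ConfigParser()
# Inline sample standing in for the real config.ini on disk
config.read_string("""
[pipeline]
test_run_small_sample = true
output_dir = Output
""")

# getboolean() accepts true/false, yes/no, on/off, 1/0
small_sample = config.getboolean("pipeline", "test_run_small_sample")
output_dir = config.get("pipeline", "output_dir")
```

In the real pipeline, `config.read("config.ini")` would replace `read_string` to load the file from disk.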