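The sentiment analysis model used in this project comes from Hugging Face: it is the nlptown/bert-base-multilingual-uncased-sentiment
model. More information is available on its model page: https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment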
To use this model, you first have to download its files (the binary weights file and the configuration files).
Create a model folder next to es_data (at the root of this project) and download the pretrained weights and configuration files from Hugging Face into it:
mkdir model
cd model
wget https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment/resolve/main/config.json
wget https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment/resolve/main/pytorch_model.bin
wget https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment/resolve/main/special_tokens_map.json
wget https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment/resolve/main/tokenizer_config.json
wget https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment/resolve/main/vocab.txt
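On Hugging Face, a /blob/main/ URL returns the web page that displays a file, while /resolve/main/ returns the file itself, so it is worth checking what was actually saved. The following is a minimal sketch of the same download as a loop with a sanity check, assuming wget, head and grep are available. The helper names (model_url, looks_like_html, download_model) are illustrative and not part of this project.

```shell
# Base URL for the raw files: "resolve" serves the file, "blob" serves a web page.
MODEL_BASE_URL="https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment/resolve/main"
MODEL_FILES="config.json pytorch_model.bin special_tokens_map.json tokenizer_config.json vocab.txt"

# Print the download URL for one model file.
model_url() {
    printf '%s/%s\n' "$MODEL_BASE_URL" "$1"
}

# Return 0 if the file starts like an HTML page, which suggests that a web
# page was saved in place of the raw file.
looks_like_html() {
    head -c 512 "$1" | grep -qiE '<!doctype html|<html'
}

# Download every model file into the given folder (default: model) and stop
# at the first file that fails to download or turns out to be HTML.
download_model() {
    dest="${1:-model}"
    mkdir -p "$dest" || return 1
    for f in $MODEL_FILES; do
        wget -q -O "$dest/$f" "$(model_url "$f")" || return 1
        if looks_like_html "$dest/$f"; then
            echo "warning: $dest/$f looks like an HTML page" >&2
            return 1
        fi
    done
}
```

Running download_model model from the root of the project fills the model folder. Running looks_like_html model/pytorch_model.bin on files that are already there reports whether they need to be fetched again.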
The model folder will be mounted into the container.
docker-compose starts the following services:
- elasticsearch: starts an Elasticsearch container listening on port 9200, with its data folder mounted to es_data
- kibana: starts a Kibana container listening on port 5601 and linked to the elasticsearch service
- backend: starts a container that uploads the initial data to Elasticsearch
docker-compose up --build  # omit the --build option if you've already built the images
Then visit http://<public-ip-of-aws-instance>:5601 to access Kibana.
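The containers may need some time before they answer requests. The following is a minimal sketch of a readiness check, assuming curl is installed and that it runs on the instance itself (hence the localhost default). The retry and wait_for_stack helpers, the attempt count and the delay are illustrative choices, not part of this project. The check relies on the Elasticsearch _cluster/health endpoint and the Kibana api/status endpoint.

```shell
# Run a command until it succeeds, trying at least once and at most <attempts> times.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while :; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Give up once the allowed number of attempts has been used.
        if [ "$i" -ge "$attempts" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}

# Assumed host; set HOST to the public IP of the instance to check remotely.
HOST="${HOST:-localhost}"

# Wait for Elasticsearch (port 9200), then for Kibana (port 5601).
wait_for_stack() {
    retry 30 2 curl -fsS "http://$HOST:9200/_cluster/health" || return 1
    retry 30 2 curl -fsS "http://$HOST:5601/api/status" || return 1
}
```

Calling wait_for_stack && echo "stack is up" in a second terminal gives a rough signal that Kibana is ready to be opened.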
You won't see the dashboard on your first visit to Kibana, because you first have to:
- configure the index in Kibana:
  - Go to the Management panel
  - Select Stack Management
  - Under the Kibana header, click on Index Patterns
  - You'll see a search bar: type in "tweets", then select it as the index pattern
- upload the objects (visuals and dashboard):
  - Go to the Management panel
  - Select Stack Management
  - Under the Kibana header, open Saved Objects and click on Import
  - Pick the export.ndjson file from the kibana folder and upload it
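The upload step can also be scripted. The following is a minimal sketch, assuming a Kibana version that exposes the saved objects import API (api/saved_objects/_import), no authentication in front of Kibana, and curl installed. The import_objects helper and the KIBANA_URL variable are illustrative and not part of this project.

```shell
# Assumed Kibana address; point it at the public IP of the instance if needed.
KIBANA_URL="${KIBANA_URL:-http://localhost:5601}"

# Upload an exported .ndjson file through Kibana's saved objects import API.
# Usage: import_objects [path-to-ndjson]   (default: kibana/export.ndjson)
import_objects() {
    file="${1:-kibana/export.ndjson}"
    if [ ! -f "$file" ]; then
        echo "error: $file not found" >&2
        return 1
    fi
    # The kbn-xsrf header is required by Kibana for API write requests;
    # overwrite=true replaces objects that already exist with the same id.
    curl -fsS -X POST "$KIBANA_URL/api/saved_objects/_import?overwrite=true" \
        -H "kbn-xsrf: true" \
        --form "file=@$file"
}
```

Running import_objects from the root of the project uploads kibana/export.ndjson. The index pattern step is not covered by this sketch unless the export already contains it.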
To run the scraper and update the database:
docker-compose -f docker-compose.crawler.yml up --build