This project is a simple data pipeline implemented in Python that uses Kafka and Elasticsearch for data streaming and storage. The pipeline consumes records from a Kafka topic and indexes them in Elasticsearch, where they can be easily queried and analyzed with Kibana.
Before running the program, make sure you have the following services installed and running:
- Kafka: `./bin/kafka-server-start.sh ./config/kraft/server.properties`
- Elasticsearch: `./bin/elasticsearch`
- Kibana: `./bin/kibana`
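If the topic is empty and you want some sample data to work with, a small producer can feed it. The sketch below is an assumption, not part of this project: it assumes the `kafka-python` package, a broker on `localhost:9092`, and the `firsttopic` topic name used in the queries below; the sample records are made up for illustration.

```python
import json


def encode_record(record: dict) -> bytes:
    """Serialize one record as UTF-8 JSON bytes for Kafka."""
    return json.dumps(record).encode("utf-8")


def send_samples() -> None:
    # Third-party import kept local so encode_record stays importable
    # without Kafka installed.
    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    for record in [{"state": "North Carolina"}, {"state": "Oregon"}]:
        producer.send("firsttopic", encode_record(record))
    producer.flush()  # block until all queued messages are delivered


if __name__ == "__main__":
    send_samples()
```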
Once the services are up and running, you can start the data pipeline:

```shell
python main.py
```
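The core consume-and-index loop of such a pipeline can be sketched roughly as follows. This is a minimal sketch, not the actual contents of `main.py`: it assumes the `kafka-python` and `elasticsearch` client packages, a broker on `localhost:9092`, Elasticsearch on `localhost:9200`, and JSON-encoded message values.

```python
import json


def to_document(raw: bytes) -> dict:
    """Decode one Kafka message value (UTF-8 JSON bytes) into an
    Elasticsearch document."""
    return json.loads(raw.decode("utf-8"))


def run_pipeline() -> None:
    # Third-party imports kept local so to_document stays importable
    # without the services installed.
    from kafka import KafkaConsumer          # pip install kafka-python
    from elasticsearch import Elasticsearch  # pip install elasticsearch

    es = Elasticsearch("http://localhost:9200")
    consumer = KafkaConsumer("firsttopic", bootstrap_servers="localhost:9092")
    for message in consumer:  # blocks, yielding messages as they arrive
        es.index(index="firsttopic", document=to_document(message.value))


if __name__ == "__main__":
    run_pipeline()
```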
To query and visualize the data, open the Kibana Dev Tools console:
http://localhost:5601/app/dev_tools#/console
Here are some sample queries that you can use for analyzing the data:
GET /firsttopic/_search
{
  "query": {
    "match_all": {}
  }
}

POST /firsttopic/_search
{
  "query": {
    "match": {
      "state": "North Carolina"
    }
  }
}

POST /firsttopic/_search
{
  "query": {
    "term": {
      "state.keyword": "North Carolina"
    }
  }
}
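Note the difference between the last two queries: `match` runs full-text search against the analyzed `state` field, while `term` requires an exact match against the un-analyzed `state.keyword` sub-field. The same queries can also be issued from Python; the sketch below assumes the `elasticsearch` client package, Elasticsearch on `localhost:9200`, and the `firsttopic` index.

```python
def match_query(field: str, value: str) -> dict:
    """Full-text query against an analyzed field."""
    return {"query": {"match": {field: value}}}


def term_query(field: str, value: str) -> dict:
    """Exact-value query against the un-analyzed .keyword sub-field."""
    return {"query": {"term": {f"{field}.keyword": value}}}


def search(index: str, body: dict) -> list:
    # Third-party import kept local so the query builders stay importable
    # without the client installed.
    from elasticsearch import Elasticsearch  # pip install elasticsearch

    es = Elasticsearch("http://localhost:9200")
    return es.search(index=index, query=body["query"])["hits"]["hits"]


if __name__ == "__main__":
    for hit in search("firsttopic", term_query("state", "North Carolina")):
        print(hit["_source"])
```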
Feel free to customize and expand these queries based on your data and analysis requirements.