Streaming Data to HBase with Apache Flume

Big issue: ZooKeeper port 50700 permission denied.

This repository provides guidance and commands for setting up Apache Flume to stream data to HDFS and HBase in real time. The focus is on configuring a Flume agent to monitor a specified directory and stream its data to a corresponding destination in HDFS; the example uses the SpoolDir source.

Prerequisites

Before you begin, ensure that you have the following components installed:

  • Java 8
  • Apache HBase 1.4.9
  • Apache Hadoop 2.7.2
  • Apache Flume 1.9.0
  • InfluxDB
  • Apache Phoenix
  • Grafana

Installation and Configuration

1. Install the Docker daemon, for example as shown below.
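A minimal sketch, assuming a Linux host with curl and systemd (use the installation method appropriate for your OS):

  # install Docker via the official convenience script, then start the daemon
  curl -fsSL https://get.docker.com | sudo sh
  sudo systemctl enable --now docker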

2. Pull the Hadoop Docker image and create a Docker network so the containers can communicate:

  docker pull liliasfaxi/spark-hadoop:hv-2.7.2
  docker network create --driver=bridge hadoop
3. Create three containers: one master and two slaves.
  docker run -itd --net=hadoop -p 50070:50070 -p 8088:8088 -p 7077:7077 -p 16010:16010 \
            --name hadoop-master --hostname hadoop-master \
            liliasfaxi/spark-hadoop:hv-2.7.2

  docker run -itd -p 8040:8042 --net=hadoop \
        --name hadoop-slave1 --hostname hadoop-slave1 \
              liliasfaxi/spark-hadoop:hv-2.7.2

  docker run -itd -p 8041:8042 --net=hadoop \
        --name hadoop-slave2 --hostname hadoop-slave2 \
              liliasfaxi/spark-hadoop:hv-2.7.2
4. Attach to the hadoop-master container with the VS Code Docker extension, or use this alternative command:
    docker exec -it hadoop-master bash
5. Install Java 8 in the container and set JAVA_HOME in ~/.bashrc (see the sketch below).
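A minimal sketch, assuming the image is Debian/Ubuntu-based; the JDK path is an assumption, verify it with `readlink -f $(which java)`:

  # install OpenJDK 8 and point JAVA_HOME at it
  apt-get update && apt-get install -y openjdk-8-jdk
  # path below is the usual Debian/Ubuntu location (assumption)
  echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> ~/.bashrc
  echo 'export PATH=$PATH:$JAVA_HOME/bin' >> ~/.bashrc
  source ~/.bashrc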

6. Note that Hadoop and HBase are already installed in the image.

  • If you want to check that Hadoop is working, see the workshop linked below:
7. Install Flume:
mkdir -p ~/usr/local && cd ~/usr/local

# Download Apache Flume from the Apache archive
wget https://archive.apache.org/dist/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz

# Extract the downloaded tar.gz file
tar -zxvf apache-flume-1.9.0-bin.tar.gz

# Rename the extracted directory to "flume"
mv apache-flume-1.9.0-bin flume
8. Configure Flume and set FLUME_HOME.

Add the following exports to ~/.bashrc; if the configuration is not refreshed (even after running `source ~/.bashrc`), export the variables directly in the terminal:

export FLUME_HOME=$HOME/usr/local/flume
export PATH=$PATH:$FLUME_HOME/bin
export CLASSPATH=$CLASSPATH:$FLUME_HOME/lib/*

If the command is still not found, see the available YouTube videos on Flume installation; once the setup is correct, the following command works from any path:

flume-ng version
9. Start the Hadoop daemons and the HBase Thrift server:
cd
./start-hadoop.sh
start-hbase.sh

You can check the exposed ports (e.g., the NameNode web UI on 50070) to see whether the DataNodes are working.
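Alternatively, a quick check from inside the hadoop-master container:

# list the running Hadoop/HBase Java daemons
jps
# report live DataNodes as seen by the NameNode
hdfs dfsadmin -report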

Start the HBase Thrift server:

hbase thrift start
10. Create a table in HBase:
# from the HBase shell (run `hbase shell` first)
create 'logs', 'cf'
11. Configure the Flume agent (flumelogs.conf; a sketch is shown after the command below), add log files to the spool directory, and start the agent:
flume-ng agent --conf conf --conf-file ./conf/flumelogs.conf --name a1 -Dflume.root.logger=INFO,console
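The repository ships its own flumelogs.conf; as a reference, a minimal sketch for this pipeline might look like the following. The agent name a1 matches the command above, the table and column family match step 10, and the spool directory path is an assumption (point it at the repository's spool folder):

# agent a1: spooldir source -> memory channel -> HBase sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# watch the spool directory for new log files (path is an assumption)
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /root/data_log_analysis/spool
a1.sources.r1.channels = c1

# buffer events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# write events into the 'logs' table, column family 'cf'
a1.sinks.k1.type = hbase
a1.sinks.k1.table = logs
a1.sinks.k1.columnFamily = cf
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
a1.sinks.k1.channel = c1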
12. Check that the data has been added to the HBase table (from the HBase shell):
scan 'logs'
13. Install Python 3.9:
wget https://www.python.org/ftp/python/3.9.7/Python-3.9.7.tgz

tar -zxvf Python-3.9.7.tgz

cd Python-3.9.7

./configure

make
make install
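If ./configure or make fails, the usual cause is missing build dependencies; on a Debian/Ubuntu base image (an assumption) they can be installed first with:

# compiler toolchain plus headers needed for ssl, zlib, and ctypes support
apt-get install -y build-essential zlib1g-dev libssl-dev libffi-dev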
14. Create and activate a Python virtual environment:
python3 -m venv venv_1

source venv_1/bin/activate
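With the virtual environment active, install the Python client libraries the ETL script relies on. The exact dependencies come from the project; the package names below (happybase for HBase over Thrift, influxdb for InfluxDB) are assumptions:

# assumed dependencies: HBase Thrift client and InfluxDB client
pip install happybase influxdb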

Project Directory Structure

  • log.txt: Sample log file to be streamed.
  • flumelogs.conf: Configuration file for the Flume Agent.
  • spool: Directory to monitor for incoming data.
  • etl.py: extracts log data from HBase and loads it into InfluxDB for Grafana.
  • main.py: log data generator.
  • api: a folder which contains a web application.

Contributors

  • Ibrahim Lahlou
  • Zakaria Aouass
  • Mohammed Red Benkacem

Video

Watch the video

Presentation

Presentation link: 6-Log-Data-Analysis

Project Architecture

[Project architecture diagram]
