
spark_project's Introduction

Spark configuration on Windows 10

  1. Download all required files from the URL below:
https://drive.google.com/drive/folders/1rBauyUVCRTbnKXgkMGh4l9MdIOVj8CQc?usp=sharing
  2. Run the Java .exe installer.

Note: choose an installation path on the C: drive (for example C:\Java, which matches the JAVA_HOME value below).

  3. Extract the Spark archive to the C: drive.

  4. Extract the Kafka archive to the C: drive.

  5. Add the following environment variables:

ENVIRONMENT VARIABLE NAME    VALUE
HADOOP_HOME                  C:\winutils
JAVA_HOME                    C:\Java\jdk1.8.0_202
SPARK_HOME                   C:\spark-3.0.3-bin-hadoop2.7

  6. Select the Path variable in the environment variables dialog and add the values below:
%SPARK_HOME%\bin
%HADOOP_HOME%\bin
%JAVA_HOME%\bin
C:\Java\jre1.8.0_281\bin
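
To confirm that the variables are visible to newly started programs, a small Python check can help. This is a minimal sketch and not part of the project: the function name `check_spark_env` is hypothetical, and it only verifies that each of the three variables is set and points to a folder containing a `bin` directory.

```python
import os
from pathlib import Path

# The three variables from the environment variable list above.
REQUIRED_VARS = ("HADOOP_HOME", "JAVA_HOME", "SPARK_HOME")


def check_spark_env(env=None):
    """Return a list of problems found; an empty list means the check passed."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not (Path(value) / "bin").is_dir():
            # Each of these folders is expected to contain a bin directory,
            # since <variable>\bin is what gets added to Path.
            problems.append(f"{name} points to {value}, which has no bin folder")
    return problems
```

Calling `check_spark_env()` from a freshly opened terminal and printing the result shows which variable, if any, still needs attention. A terminal opened before the variables were added will not see them.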

Create conda environment

  1. Open a conda terminal and run the command below:
conda create -n <env_name> python=3.8 -y
  2. In PyCharm, select the <env_name> environment created in the previous step as the project interpreter.

  3. Install the Python libraries listed in requirements.txt using the command below:

pip install -r requirements.txt
  4. To upload your code to a GitHub repository, run:
git init
git add .
git commit -m "first commit"
git branch -M main
git remote add origin <github_repo_link>
git push -u origin main

Train a random forest model on the insurance dataset

python training\stage_00_data_loader.py
python training\stage_01_data_validator.py
python training\stage_02_data_transformer.py
python training\stage_03_data_exporter.py
spark-submit training\stage_04_model_trainer.py

Predict with the random forest model on the insurance dataset

python prediction\stage_00_data_loader.py
python prediction\stage_01_data_validator.py
python prediction\stage_02_data_transformer.py
python prediction\stage_03_data_exporter.py
spark-submit prediction\stage_04_model_predictor.py
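
The stages in both lists run in order, and a later stage is only useful if the earlier ones succeeded. The sketch below shows one way to drive either pipeline from a single Python script and stop at the first failing stage. It is an illustration under assumptions, not part of the project: `build_commands` and `run_pipeline` are hypothetical names, and the script is assumed to be started from the project root on Windows.

```python
import subprocess

# Stage scripts exactly as listed above, in execution order.
PIPELINES = {
    "training": [
        "stage_00_data_loader.py",
        "stage_01_data_validator.py",
        "stage_02_data_transformer.py",
        "stage_03_data_exporter.py",
        "stage_04_model_trainer.py",
    ],
    "prediction": [
        "stage_00_data_loader.py",
        "stage_01_data_validator.py",
        "stage_02_data_transformer.py",
        "stage_03_data_exporter.py",
        "stage_04_model_predictor.py",
    ],
}


def build_commands(pipeline):
    """Return the command for every stage of the named pipeline, in order."""
    stages = PIPELINES[pipeline]
    commands = []
    for index, script in enumerate(stages):
        # As in the lists above, only the final stage goes through spark-submit.
        is_last = index == len(stages) - 1
        launcher = "spark-submit" if is_last else "python"
        commands.append([launcher, f"{pipeline}\\{script}"])
    return commands


def run_pipeline(pipeline):
    """Run each stage in order; a non-zero exit code raises CalledProcessError."""
    for command in build_commands(pipeline):
        line = " ".join(command)
        print("Running:", line)
        # shell=True lets Windows resolve spark-submit.cmd through Path.
        subprocess.run(line, shell=True, check=True)
```

Calling `run_pipeline("training")` or `run_pipeline("prediction")` runs the five commands in the same order as the lists above.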

Start the ZooKeeper and Kafka servers
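
Before starting the producer, it can be useful to confirm that both servers accept connections. The following is a minimal sketch with hypothetical helper names. It assumes both servers run on localhost with their usual default ports (2181 for ZooKeeper, 9092 for Kafka); if the properties files in your Kafka folder use other ports, pass those instead.

```python
import socket

# Assumed defaults; change these if your server properties files differ.
DEFAULT_PORTS = {"zookeeper": 2181, "kafka": 9092}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host="localhost", ports=None):
    """Map each service name to True (reachable) or False (not reachable)."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {name: port_open(host, port) for name, port in ports.items()}
```

A result of `True` for both names from `check_services()` only shows that something is listening on those ports, not that the servers are fully healthy.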

Start the Kafka producer using the command below

spark-submit csv_to_kafka.py
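
The contents of csv_to_kafka.py are not reproduced in this README. As a rough illustration of the general pattern only, and not the project's code, a CSV producer usually turns each row into a byte payload before sending it to a topic. The sketch below covers just that serialisation step using the standard library. The function name and the sample columns are hypothetical, and the actual send to Kafka is left out.

```python
import csv
import io
import json


def rows_to_messages(csv_text):
    """Turn CSV text with a header row into UTF-8 JSON payloads, one per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row becomes a JSON object keyed by the header names; values stay strings.
    return [json.dumps(row).encode("utf-8") for row in reader]


# Hypothetical two-column sample, not taken from the insurance dataset.
sample = "age,bmi\n19,27.9\n33,22.7\n"
messages = rows_to_messages(sample)
```

Each element of `messages` is a `bytes` object, which is the form Kafka message values are sent in.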

Start the PySpark consumer using the command below

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 spark_consumer_from_kafka.py
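
The `--packages` value in the command above is a Maven coordinate: a group, an artifact name that ends in the Scala version, and the connector version. The sketch below shows how the pieces fit together. The helper names are hypothetical, and the default values are simply the ones used in the command above.

```python
def kafka_package(scala_version="2.12", connector_version="3.0.1"):
    """Build the Maven coordinate passed to spark-submit --packages."""
    return (
        "org.apache.spark:"
        f"spark-sql-kafka-0-10_{scala_version}:"
        f"{connector_version}"
    )


def consumer_command(script="spark_consumer_from_kafka.py"):
    """Return the spark-submit argument list for the consumer script."""
    return ["spark-submit", "--packages", kafka_package(), script]
```

The connector version is generally kept in line with the installed Spark release, so if the consumer fails to load the Kafka source, comparing this value with the version in SPARK_HOME is a reasonable first check.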
