
mahmoudparsian / pyspark-tutorial


PySpark-Tutorial provides basic algorithms using PySpark

Home Page: http://mapreduce4hackers.com

License: Other

Languages: Jupyter Notebook 78.96%, Python 18.52%, Shell 2.52%
Topics: big-data, big-data-analytics, data-algorithms, pyspark, spark, spark-dataframes, spark-rdd

pyspark-tutorial's Introduction

PySpark Tutorial

  • PySpark is the Python API for Spark.

  • The purpose of this PySpark tutorial is to provide basic distributed algorithms using PySpark.

  • PySpark supports two types of data abstractions:

    • RDDs
    • DataFrames

  • PySpark interactive mode: the interactive shell ($SPARK_HOME/bin/pyspark) is intended for basic testing and debugging and is not meant for production environments.

  • PySpark batch mode: use the $SPARK_HOME/bin/spark-submit command to run PySpark programs (suitable for both testing and production environments).
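
As a rough illustration of the two data abstractions and of batch mode, here is a minimal word-count sketch. It is not taken from the tutorial itself: the script name `word_count_sketch.py`, the helper `to_pairs`, the application name, and the input path argument are all assumptions made for this example.

```python
import sys
from operator import add


def to_pairs(line):
    """Split one line of text into lowercased (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def main(input_path):
    # Imported here so that to_pairs() can be tried without a Spark installation.
    from pyspark.sql import SparkSession

    # "word-count-sketch" is a hypothetical application name.
    spark = SparkSession.builder.appName("word-count-sketch").getOrCreate()

    # RDD abstraction: read lines, emit (word, 1) pairs, sum the counts per word.
    counts = spark.sparkContext.textFile(input_path).flatMap(to_pairs).reduceByKey(add)

    # DataFrame abstraction: the same result with named columns.
    df = counts.toDF(["word", "count"])
    df.orderBy("count", ascending=False).show(10)

    spark.stop()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("Usage: spark-submit word_count_sketch.py <input_path>")
```

In batch mode this could be launched with `$SPARK_HOME/bin/spark-submit word_count_sketch.py <input_path>`. In the interactive shell a session is already available as `spark`, so the body of `main` can be typed in line by line without building one.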




PySpark Examples and Tutorials


Books


Miscellaneous


PySpark Tutorial and References...


Questions/Comments

Thank you!

best regards,
Mahmoud Parsian


pyspark-tutorial's People

Contributors

dennisqi, mahmoudparsian, pyspark-in-action, randommm


pyspark-tutorial's Issues

Why doesn't the stop() function work correctly?

```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (SparkConf()
        # .setMaster("local")
        .setMaster("spark://master:7077")
        .setAppName("My_APP")
        .set("spark.executor.cores", "1")
        .set("spark.scheduler.mode", "FAIR")
        )

sess = SparkSession.builder.config(conf=conf).getOrCreate()
sess.stop()
# Use the session after stop() to check whether it was closed.
DataFrame = sess.sql("show tables")
print(DataFrame)
print(sess)
```

This is the program output:

DataFrame[database: string, tableName: string, isTemporary: boolean]
<pyspark.sql.session.SparkSession object at 0x7fe3b8e709b0>

Why is the SparkSession not closed, and what can I do to close it? Thank you.
