pnavaro / big-data

Python tools for big data

Home Page: https://pnavaro.github.io/big-data

Repository languages: Jupyter Notebook 99.82%, Dockerfile 0.18%, CSS 0.01%
Topics: dask, data-science, hadoop, jupyter-book, notebooks, python, spark

big-data's Introduction

Hello 👋

Pierre Navaro, scientific computing engineer

I work at IRMAR in Rennes, France.

Languages

Python Fortran Julia

Python courses

Python for scientific computing

Python for Big Data

Python with Fortran

Projects

GeometricClusterAnalysis.jl

GEMPIC.jl

VlasovSolvers.jl

AnalogDataAssimilation.jl

HOODESolver.jl

SemiLagrangian.jl

OTRecod

big-data's People

Contributors

pnavaro

big-data's Issues

Problem opening a CSV file

Hello Mr Navaro,

I am having trouble opening my CSV file.
I did manage to install PySpark in PyCharm:

from pyspark.sql import SparkSession

# Build a local Spark session; the parentheses let the chained calls span several lines
spark = (
    SparkSession.builder
    .appName('Convert CSV to parquet')
    .master('local')
    .config('spark.hadoop.parquet.enable.summary-metadata', 'true')
    .getOrCreate()
)

spark.read.format('csv').options(header='true').load('data_small.csv')
# Raw string so the backslashes in the Windows path are kept literally
df = spark.read.csv(r'D:\AILIS\bigdata\yellow_tripdata_2012_01.csv', header="true", inferSchema="true")

I get the following error message: pyspark.sql.utils.AnalysisException: Path does not exist: file:/D:/AILIS/bigdata/yellow_tripdata_2012_01.csv (even though the file is definitely there!)

This is apparently a Hadoop problem, but I cannot work out how to solve it despite a lot of searching online. Could it be related to the master ('local')? You also say the file must be converted to parquet in my HDFS home directory; what does that refer to?

May I download only 2 months of data? Unfortunately I do not have enough disk space for all 12 months.

Thanks in advance.

Wishing you happy holidays.

Best regards,

Aïlis THOMAS
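
Since the exception is about the path, one thing worth ruling out first is how the Windows path is written in Python. The following is only a sketch of that check, not a confirmed diagnosis of this report: it uses the standard-library `pathlib` module, and the variable names `plain`, `raw` and `uri` are illustrative. It shows that `\b` inside an ordinary string literal is read as a backspace character, while a raw string keeps the backslash, and it builds an explicit `file:` URI from the raw path.

```python
from pathlib import PureWindowsPath

# In an ordinary string literal, "\b" is the backspace character,
# so a path containing "\bigdata" is silently altered.
plain = "D:\\AILIS\bigdata\\yellow_tripdata_2012_01.csv"

# A raw string keeps every backslash exactly as typed.
raw = r"D:\AILIS\bigdata\yellow_tripdata_2012_01.csv"

print("\b" in plain)  # True: the literal contains a backspace
print("\b" in raw)    # False: the raw string does not

# PureWindowsPath parses Windows paths on any operating system
# and can render them as an unambiguous file URI.
uri = PureWindowsPath(raw).as_uri()
print(uri)  # file:///D:/AILIS/bigdata/yellow_tripdata_2012_01.csv
```

On the machine that actually holds the file, `pathlib.Path(raw).exists()` is a quick way to confirm that Python itself can see the file before the path is handed to Spark.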

Solutions

Good evening,
Sorry to bother you.
Would it be possible to put the solutions on GitHub?

Best regards

wrong use of pyarrow for import

In notebook 14-FileFormats, the code imports pyarrow as pa but uses it as pq.

import pyarrow as pa

pq.write_to_dataset(table, root_path="test", filesystem=hdfs)
