donnemartin / data-science-ipython-notebooks

26.4K 1.6K 7.7K 47.88 MB

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

License: Other

Python 99.99% CSS 0.01% Makefile 0.01% Dockerfile 0.01%
python machine-learning deep-learning data-science big-data aws tensorflow theano caffe scikit-learn

data-science-ipython-notebooks's People

Contributors

38elements, amaanc, amarrella, anishshah, besirkurtulmus, donnemartin, greninja, incessantmeraki, kaihuahuang, poornas, readmecritic, tuanavu

data-science-ipython-notebooks's Issues

Python 3 compatibility issues

flake8 testing of https://github.com/donnemartin/data-science-ipython-notebooks on Python 3.7.0

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./scipy/thinkplot.py:475:14: F821 undefined name 'xp'
        xs = xp.delete(xs, 0)
             ^
./data/titanic/myfirstforest.py:84:19: E999 SyntaxError: invalid syntax
print 'Training...'
                  ^
./data/titanic/gendermodel.py:47:46: E999 SyntaxError: invalid syntax
print 'Proportion of women who survived is %s' % proportion_women_survived
                                             ^
./data/titanic/genderclassmodel.py:42:10: F821 undefined name 'xrange'
for i in xrange(number_of_classes):
         ^
./data/titanic/genderclassmodel.py:43:14: F821 undefined name 'xrange'
    for j in xrange(number_of_price_brackets):
             ^
./data/titanic/genderclassmodel.py:84:14: F821 undefined name 'xrange'
    for j in xrange(number_of_price_brackets):
             ^
./deep-learning/keras-tutorial/deep_learning_models/vgg19.py:163:9: F821 undefined name 'preprocess_input'
    x = preprocess_input(x)
        ^
./deep-learning/keras-tutorial/deep_learning_models/vgg19.py:167:25: F821 undefined name 'decode_predictions'
    print('Predicted:', decode_predictions(preds))
                        ^
./deep-learning/keras-tutorial/deep_learning_models/resnet50.py:242:9: F821 undefined name 'preprocess_input'
    x = preprocess_input(x)
        ^
./deep-learning/keras-tutorial/deep_learning_models/resnet50.py:246:25: F821 undefined name 'decode_predictions'
    print('Predicted:', decode_predictions(preds))
                        ^
./deep-learning/keras-tutorial/deep_learning_models/vgg16.py:161:9: F821 undefined name 'preprocess_input'
    x = preprocess_input(x)
        ^
./deep-learning/keras-tutorial/deep_learning_models/vgg16.py:165:25: F821 undefined name 'decode_predictions'
    print('Predicted:', decode_predictions(preds))
                        ^
./deep-learning/keras-tutorial/solutions/sol_112.py:2:1: E999 SyntaxError: invalid syntax
%timeit -n 1 -r 1 ann.train(zip(X,y), iterations=100)
^
./deep-learning/keras-tutorial/solutions/sol_111.py:2:1: E999 SyntaxError: invalid syntax
%timeit -n 1 -r 1 ann.train(zip(X,y), iterations=2)
^
./deep-learning/theano-tutorial/rnn_tutorial/lstm_text.py:255:22: E999 SyntaxError: invalid syntax
        print 'epoch:', epoch
                     ^
./deep-learning/tensor-flow-examples/multigpu_basics.py:84:37: E999 SyntaxError: invalid syntax
print "Single GPU computation time: " + str(t2_1-t1_1)
                                    ^
./deep-learning/tensor-flow-examples/input_data.py:95:34: F821 undefined name 'xrange'
      fake_image = [1.0 for _ in xrange(784)]
                                 ^
./deep-learning/tensor-flow-examples/input_data.py:97:35: F821 undefined name 'xrange'
      return [fake_image for _ in xrange(batch_size)], [
                                  ^
./deep-learning/tensor-flow-examples/input_data.py:98:31: F821 undefined name 'xrange'
          fake_label for _ in xrange(batch_size)]
                              ^
6     E999 SyntaxError: invalid syntax
13    F821 undefined name 'xrange'
19
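The report boils down to a few Python 2 idioms (`print` statements, `xrange`), two IPython-only `%timeit` lines saved in plain `.py` files, and some undefined names. A minimal sketch of the Python 3 equivalents follows. The function names and values are hypothetical, and reading `xp.delete` in `thinkplot.py` as a typo for NumPy's `np.delete` is an assumption the report does not confirm. The `preprocess_input`/`decode_predictions` errors in the Keras model files are missing names that need an import and are not covered here.

```python
import timeit

import numpy as np


def cells_by_class_and_bracket(table):
    """Collect every cell of a 2-D table in row-major order (hypothetical helper)."""
    number_of_classes, number_of_price_brackets = table.shape
    cells = []
    # Python 3 has no xrange(); range() is already lazy, so it is a drop-in replacement
    for i in range(number_of_classes):
        for j in range(number_of_price_brackets):
            cells.append(float(table[i, j]))
    return cells


def drop_first_point(xs, ps):
    # thinkplot.py calls the undefined name `xp`; this assumes NumPy's delete was intended
    return np.delete(xs, 0), np.delete(ps, 0)


if __name__ == '__main__':
    proportion_women_survived = 0.5  # hypothetical value for illustration

    # Python 3: print is a function, so its arguments need parentheses
    print('Training...')
    print('Proportion of women who survived is %s' % proportion_women_survived)

    print(cells_by_class_and_bracket(np.arange(6).reshape(2, 3)))
    print(drop_first_point(np.array([1, 2, 3]), np.array([0.1, 0.5, 1.0])))

    # %timeit is IPython-only syntax; timeit.repeat with number=1, repeat=1
    # is the plain-Python counterpart of `%timeit -n 1 -r 1`
    print(timeit.repeat(lambda: sum(range(1000)), number=1, repeat=1))
```

Inside a notebook the `%timeit` magic can stay as it is; the plain-Python form is only needed once the cell is exported to a `.py` file that flake8 or the interpreter parses.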

Add notebook for Bokeh

"Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, but also deliver this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications."

Bokeh seems like a good candidate for visualizing data fed from Spark Streaming and for sharing results with stakeholders who don't use visualization tools like Tableau.

Bokeh at PyCon

PyTorch tutorials

Hey, I see that there are no tutorial notebooks for implementing machine learning algorithms and neural networks in PyTorch in this repo yet. PyTorch has been gaining a lot of traction lately and is likely to become one of the most popular frameworks thanks to its dynamic computational graph and eager execution.

I would like to add such tutorial notebooks in PyTorch.

Trax Tutorials

Hey,

I see that there are no tutorial notebooks for Trax implementations in this repository yet. Trax is an end-to-end library for deep learning that focuses on clear code and speed. It is actively used and maintained by the Google Brain team.

I would like to add such tutorial notebooks in Trax.

Data preprocessing

I'm trying to clean my data and do some preprocessing, but I don't understand the columns well enough to tell whether the zeros in them are normal values or missing values. I'm using the cic-collection dataset on Kaggle. If any expert could help, I'd be very thankful.
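Whether a zero is a real measurement or a placeholder for a missing value depends on what each column means, so no code can decide it for you. A minimal pandas sketch can at least show where zeros and explicit missing values occur, and convert zeros to NaN only in the columns you have decided cannot legitimately be zero. The column names, values, and helper functions below are hypothetical and are not the actual cic-collection schema.

```python
import numpy as np
import pandas as pd


def missing_value_report(df):
    """Count explicit missing values (NaN/None) and literal zeros per column."""
    numeric = df.select_dtypes(include='number')
    report = pd.DataFrame({
        'n_missing': df.isna().sum(),
        'n_zero': (numeric == 0).sum(),  # only numeric columns can hold 0
    })
    # Non-numeric columns get NaN for n_zero after index alignment; report them as 0
    return report.fillna(0).astype(int)


def zeros_to_nan(df, columns):
    """Return a copy in which 0 is treated as missing in the given columns only."""
    out = df.copy()
    out[columns] = out[columns].replace(0, np.nan)
    return out


# Hypothetical example data, not taken from the real dataset
df = pd.DataFrame({
    'flow_duration': [0, 120, 85, 0],
    'fwd_packets': [3, 0, 7, 2],
    'label': ['BENIGN', 'DDoS', 'BENIGN', None],
})

print(missing_value_report(df))
cleaned = zeros_to_nan(df, ['flow_duration'])
print(cleaned)
```

Because `zeros_to_nan` works on a copy, the original DataFrame stays available for comparing model results with and without the conversion.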

Update Spark notebook to use DataFrames

Spark 1.3 introduced DataFrames:

A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs.

https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#dataframes

Spark 1.6 introduces Datasets:

A new Dataset API. This API is an extension of Spark’s DataFrame API that supports static typing and user functions that run directly on existing JVM types (such as user classes, Avro schemas, etc). Dataset’s use the same runtime optimizer as DataFrames, but support compile-time type checking and offer better performance. More details on Datasets can be found on the SPARK-9999 JIRA.

https://databricks.com/blog/2015/11/20/announcing-spark-1-6-preview-in-databricks.html

Note: PySpark will include Datasets in the upcoming Spark 2.0.

Command to run mrjob s3 log parser is incorrect

Current:

python mr-mr_s3_log_parser.py -r emr s3://bucket-source/ --output-dir=s3://bucket-dest/"

Should be:

python mr_s3_log_parser.py -r emr s3://bucket-source/ --output-dir=s3://bucket-dest/

"Error 503 No healthy backends"

Hello,

When I try to open the hyperlinks that should take me to the corresponding IPython notebook, I get "Error 503 No healthy backends":
"No healthy backends
Guru Mediation:
Details: cache-fra1236-FRA 1462794681 3780339426
Varnish cache server"

Thanks
Jiahong Wang

code

Sir, please send me the code along with the churn dataset.

pandas - Working with Strings

In code cell 3:
data = ['peter', 'Paul', None, 'MARY', 'gUIDO']
data.map(lambda s: s.capitalize())
fails, because a plain Python list has no map() method. It was changed to
[s.capitalize() for s in data if s is not None]
and now the code works fine.
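The list comprehension avoids the error, but it skips `None`, so the result is one element shorter than the input and no longer lines up with it. A minimal sketch of the vectorized alternative that pandas string methods provide: wrapping the list in a `Series` lets `.str.capitalize()` skip the missing entry while keeping it in place.

```python
import pandas as pd

data = ['peter', 'Paul', None, 'MARY', 'gUIDO']

# Plain Python: filtering out None avoids the error but drops the missing entry
capitalized_list = [s.capitalize() for s in data if s is not None]

# pandas: .str methods skip missing values but keep them at their original
# position, so the result stays aligned with the input index
names = pd.Series(data)
capitalized = names.str.capitalize()

print(capitalized_list)
print(capitalized)
```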

Add instructions to configure IPython/PySpark for Python 3, now supported with Spark 1.4

Reported by core_dumpd on Reddit /r/DataScience.

A solution seems to be discussed on Stack Overflow.

core_dumpd reports that the following works (needs to be confirmed before the repo is updated):

I ended up running this:
PYSPARK_DRIVER_PYTHON_OPTS="notebook --profile=pyspark" /usr/local/spark/bin/pyspark

With:
PYSPARK_PYTHON=/opt/anaconda/bin/ipython PYSPARK_DRIVER_PYTHON=/opt/anaconda/bin/ipython

I'm running on Docker, based on sequenceiq/hadoop-docker:latest with Spark/MiniConda added on top. The only real config options in the profile are ip = '*' and open_browser = False.

Notice: Maintainer Temporarily Unavailable

Hi Everyone,

I'll be temporarily unavailable starting April 17 and won't be able to respond to issues or pull requests. I'm hoping I'll be available in a couple of weeks, but I might not be responsive for several weeks.

Sorry for the inconvenience.

-Donne
