
datatableton's Introduction

DatatableTon

💯 datatable exercises


Mission 🚀

To provide 100 Python datatable exercises, grouped into sections and structured as a course or tutorials, for teaching and learning at beginner, intermediate and expert levels.

Datatable

The datatable package in Python is a library for efficient data processing, feature engineering and simple modelling of tabular data. It is closely related to R's data.table package and heavily inspired by it.

It closely resembles pandas but focuses more on speed and multi-threaded data operations, which makes it particularly useful on large datasets.

Exercises 📖

There are 100 datatable exercises in total, divided into 10 sets of Jupyter Notebooks with 10 exercises each. Going through the exercises in order is recommended, but you may start with any set depending on your expertise.

✅ Structured as exercises & tutorials - Choose your style
✅ Suitable for beginners, intermediates & experts - Choose your level
✅ Available on Colab, Kaggle, Binder & GitHub - Choose your platform

The exercises are best experienced with datatable v1.0.0 (released on 1st July, 2021) or above, though using the latest available version is recommended.
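The version requirement can be checked before opening the notebooks. Below is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); the helper names are hypothetical, and the parser only looks at the numeric release part, so pre-release suffixes such as "a2" or "rc1" are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (1, 0, 0)  # v1.0.0, the version the exercises are written for


def parse_version(text):
    """Turn '1.0.0' or '1.1.0a2' into a comparable (major, minor, patch) tuple."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    # Missing minor/patch parts count as 0, e.g. '1.0' -> (1, 0, 0).
    return tuple(int(part or 0) for part in match.groups())


def is_supported(text, minimum=MINIMUM):
    """True if the given version string is at least the minimum release."""
    return parse_version(text) >= minimum


def installed_datatable_version():
    """Return the installed datatable version string, or None if it is absent."""
    try:
        return version("datatable")
    except PackageNotFoundError:
        return None


found = installed_datatable_version()
if found is None:
    print("datatable is not installed")
elif is_supported(found):
    print(f"datatable {found} meets the v1.0.0 requirement")
else:
    print(f"datatable {found} is older than v1.0.0; consider upgrading")
```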

Set 01 • Datatable Introduction • Beginner • Exercises 1-10

Style | Colab | Kaggle | Binder | GitHub
Exercises | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub
Solutions | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub

Set 02 • Files and Formats • Beginner • Exercises 11-20

Style | Colab | Kaggle | Binder | GitHub
Exercises | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub
Solutions | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub

Set 03 • Data Selection • Beginner • Exercises 21-30

Style | Colab | Kaggle | Binder | GitHub
Exercises | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub
Solutions | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub

Set 04 • Frame Operations • Beginner • Exercises 31-40

Style | Colab | Kaggle | Binder | GitHub
Exercises | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub
Solutions | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub

Set 05 • Column Aggregations • Beginner • Exercises 41-50

Style | Colab | Kaggle | Binder | GitHub
Exercises | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub
Solutions | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub

Set 06 • Grouping Methods • Intermediate • Exercises 51-60

Style | Colab | Kaggle | Binder | GitHub
Exercises | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub
Solutions | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub

Set 07 • Multiple Frames • Intermediate • Exercises 61-70

Style | Colab | Kaggle | Binder | GitHub
Exercises | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub
Solutions | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub

Set 08 • Time Series • Intermediate • Exercises 71-80

Style | Colab | Kaggle | Binder | GitHub
Exercises | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub
Solutions | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub

Set 09 • Native FTRL • Expert • Exercises 81-90

Style | Colab | Kaggle | Binder | GitHub
Exercises | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub
Solutions | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub

Set 10 • Capstone Projects • Expert • Exercises 91-100

Style | Colab | Kaggle | Binder | GitHub
Exercises | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub
Solutions | Open in Colab | Open in Kaggle | Open in Binder | Open in GitHub
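Because every set holds exactly 10 consecutive exercises, the set containing exercise n is (n - 1) // 10 + 1. A small sketch of that lookup, with levels taken from the listing (Sets 1-5 Beginner, 6-8 Intermediate, 9-10 Expert); locate is a hypothetical helper, not part of the repo.

```python
# Levels as listed: Sets 1-5 Beginner, 6-8 Intermediate, 9-10 Expert.
LEVELS = {}
for set_number in range(1, 11):
    if set_number <= 5:
        LEVELS[set_number] = "Beginner"
    elif set_number <= 8:
        LEVELS[set_number] = "Intermediate"
    else:
        LEVELS[set_number] = "Expert"


def locate(exercise):
    """Return (set number, level) for an exercise numbered 1-100."""
    if not 1 <= exercise <= 100:
        raise ValueError("exercises are numbered 1-100")
    set_number = (exercise - 1) // 10 + 1  # 1-10 -> 1, 11-20 -> 2, ...
    return set_number, LEVELS[set_number]


print(locate(25))  # (3, 'Beginner')
print(locate(81))  # (9, 'Expert')
```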

The Jupyter Notebooks can also be run locally by cloning the repo and starting a local Jupyter server:

git clone https://github.com/vopani/datatableton.git
cd datatableton
python3 -m pip install notebook datatable
jupyter notebook

P.S. The notebooks will be periodically updated to improve the exercises and support the latest version.

Contribution 🛠️

Please create an Issue for any improvements, suggestions or errors in the content.

You can also tag @vopani on Twitter for any other queries or feedback.

Credits 🙏

Collaborators

Datatable

License 📋

This project is licensed under the Apache License 2.0.

datatableton's People

Contributors

parulnith, shrinidhin, vopani


datatableton's Issues

Include an exercise for qcut()

Describe Topic
Include an exercise on the qcut() function in datatable, which bins a column into groups of approximately equal size.

Suggested Set
It can possibly be included in Set 5: Column Aggregations

Sample Exercise
Exercise : Create a new column B in frame data which contains deciles of column A
Solution : data['B'] = dt.qcut([dt.f.A])
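The sample solution expects dt.qcut to split column A into deciles, i.e. ten groups of roughly equal size. To illustrate the idea without depending on datatable, here is a rank-based quantile-binning sketch in pure Python; quantile_bins is a hypothetical helper, it assumes non-missing values, and its handling of ties and bin edges may not match datatable's own qcut().

```python
def quantile_bins(values, nquantiles=10):
    """Assign each value a bin index in [0, nquantiles) with roughly equal counts.

    Bins are derived from each value's rank; equal values share the bin of
    the lowest rank among them, so ties never straddle two bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # positions by ascending value
    bins = [0] * n
    first_rank = {}  # value -> rank of its first occurrence in sorted order
    for rank, i in enumerate(order):
        r = first_rank.setdefault(values[i], rank)
        bins[i] = r * nquantiles // n
    return bins


column_a = [5, 1, 4, 2, 3]
print(quantile_bins(column_a, nquantiles=5))  # [4, 0, 3, 1, 2]
```

With the default nquantiles=10, twenty distinct values fall two per bin, which is the decile behaviour the issue asks an exercise to demonstrate.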

Typo in Set 6: `Set 6` mentioned instead of `Set 7` at the end of the exercise and solution notebook

Describe Error
There is a typo at the end of both the exercise and solution notebooks, where Set 7 is inadvertently mentioned as Set 6:

✅ This completes Set 6: Grouping Methods (Exercises 51-60) of DatatableTon: 💯 datatable exercises
Set 6 is coming soon!

Exercise / Set
Set 6

Suggested Fix

✅ This completes Set 6: Grouping Methods (Exercises 51-60) of DatatableTon: 💯 datatable exercises
Set 7 is coming soon!

Additional Comments
Any other information about the error.

Exercise 25 error

Describe Error
"Select the element in the 4th row and 2nd column in data and assign it to value_1..."
Provided solution:
value_1 = data[4, 3]

Exercise / Set
Exercise 25 / Set 3

Suggested Fix
Shouldn't it be value_1 = data[3, 1], or am I missing something?
4th row, counting from 0 -> index 3
2nd column, counting from 0 -> index 1
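datatable, like Python lists, counts rows and columns from 0, so in data[i, j] the 4th row is i = 3 and the 2nd column is j = 1. The conversion from a 1-based description to 0-based indices can be checked with plain nested lists; the 5x3 sample values and the element helper are made up for illustration.

```python
# Hypothetical 5x3 table stored row by row, standing in for `data`.
rows = [
    [10, 11, 12],
    [20, 21, 22],
    [30, 31, 32],
    [40, 41, 42],
    [50, 51, 52],
]


def element(table, row_number, column_number):
    """Fetch the cell at a 1-based (human) position using 0-based indices."""
    return table[row_number - 1][column_number - 1]


value_1 = element(rows, 4, 2)  # 4th row, 2nd column -> rows[3][1]
print(value_1)  # 41
```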

Exercise 45 error

Describe Error
The task asks for the mode of several columns.
The correct answer is:
alcohol | proline
12.37 | 520.0
13.05 | 680.0
The mode of a discrete distribution is not necessarily unique: the probability mass function may take the same maximum value at several points.
In this case, each of the two alcohol values appears 6 times and each of the two proline values appears 5 times.

Exercise / Set
Exercise 45 / Set 5

Suggested Fix
I'm not sure. Perhaps the task could specify that only the first value should be returned, or the mode function could be improved to return all tied values.
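Returning every tied value, as the second suggestion proposes, is simple once frequencies are counted. A pure-Python sketch follows; all_modes and the sample values are hypothetical and this is not datatable's mode() implementation. Missing values (None) are skipped.

```python
from collections import Counter


def all_modes(values):
    """Return every value that reaches the highest frequency, sorted ascending."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


# Made-up sample in which 12.37 and 13.05 are tied for most frequent.
alcohol = [12.37, 13.05, 14.1, 12.37, 13.05, 11.9]
print(all_modes(alcohol))  # [12.37, 13.05]
```

Python 3.8+ also ships statistics.multimode for the same purpose; it returns the tied values in order of first appearance rather than sorted.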

datatable version 1.0.0 fails in Google Colab

The latest version of datatable, i.e., 1.0.0, fails to install in the Google Colaboratory environment. It throws the following error:

ERROR: Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/cli/base_command.py", line 153, in _main
    status = self.run(options, args)
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/commands/install.py", line 382, in run
    resolver.resolve(requirement_set)
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/legacy_resolve.py", line 201, in resolve
    self._resolve_one(requirement_set, req)
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/legacy_resolve.py", line 365, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/legacy_resolve.py", line 313, in _get_abstract_dist_for
    req, self.session, self.finder, self.require_hashes
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/operations/prepare.py", line 224, in prepare_linked_requirement
    req, self.req_tracker, finder, self.build_isolation,
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/operations/prepare.py", line 49, in _get_prepared_distribution
    abstract_dist.prepare_distribution_metadata(finder, build_isolation)
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/distributions/source/legacy.py", line 37, in prepare_distribution_metadata
    self._setup_isolation(finder)
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/distributions/source/legacy.py", line 90, in _setup_isolation
    reqs = backend.get_requires_for_build_wheel()
  File "/usr/local/lib/python3.7/dist-packages/pip/_vendor/pep517/wrappers.py", line 152, in get_requires_for_build_wheel
    'config_settings': config_settings
  File "/usr/local/lib/python3.7/dist-packages/pip/_vendor/pep517/wrappers.py", line 255, in _call_hook
    raise BackendUnavailable(data.get('traceback', ''))
pip._vendor.pep517.wrappers.BackendUnavailable: Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/pip/_vendor/pep517/_in_process.py", line 63, in _build_backend
    obj = import_module(mod_path)
  File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'ext'
