aimskigalimentorshipcourse's Introduction

Big Data Analytics with Python

The material in this repository was presented at a training workshop in Kigali, Rwanda, from July 15 to July 19, 2019. The training was organized by the African Institute for Mathematical Sciences (AIMS) as part of the mentorship for the winners of the Big Data Challenge.

Course Outline and Goals

The goal of the course is to introduce participants to using Python for data science tasks such as data ingestion, data analysis and machine learning, with a focus on processing large-scale datasets. Unlike regular online courses, which tend to solve toy problems, this course uses real-life datasets and case studies to challenge participants with real-world data science problems. The course has five main components:

  • Introduction to Python: this component gives participants the Python programming skills they will use in the rest of the course.
  • Python for Data Science: here, the course provides a tour of the essential Python tools for data science so that participants are familiar with them.
  • Big Data Processing with PySpark: this component introduces tools for handling large-scale data. The focus is on Apache Spark as a distributed data processing engine.
  • Machine Learning in Python: the course introduces participants to essential Python libraries for ML: scikit-learn and TensorFlow.
  • Case Studies: to go beyond hello world and toy problems, the case studies challenge participants with real-life data science problems.
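
As a rough illustration of the ingestion and analysis steps named above, here is a minimal sketch that uses only the Python standard library. It is not taken from the course materials: the inline CSV data, the column names and the helper functions `load_rows` and `mean_household_size` are all hypothetical and made up for this example.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical inline data standing in for a real CSV file (values are made up).
RAW_CSV = """region,households,population
A,10,42
A,20,88
B,15,60
"""


def load_rows(text):
    """Ingestion: parse CSV text into a list of dicts with numeric fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "region": row["region"],
            "households": int(row["households"]),
            "population": int(row["population"]),
        }
        for row in reader
    ]


def mean_household_size(rows):
    """Analysis: average people per household, grouped by region."""
    sizes = defaultdict(list)
    for row in rows:
        sizes[row["region"]].append(row["population"] / row["households"])
    return {region: mean(values) for region, values in sizes.items()}


rows = load_rows(RAW_CSV)
summary = mean_household_size(rows)
print(summary)
```

With real datasets the same two steps would typically be done with the data science and PySpark tools covered in the course rather than the standard library.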

Repository Setup

The materials are organised into folders by day. All the code lives in the src folder. Because the PowerPoint files are large, they are not included in the repository; instead, you can find up-to-date PowerPoint slides here. Some datasets are also not included in the repository. All the code uses Python 3.

Pre-course Training Materials

In the Big Data Analytics with Python course, we will use the Python programming language to interact with data. To ensure that participants get the most out of the course, we require that you have basic skills in Python. To this end, I have suggested course materials that you should complete in preparation for the course.

Introduction to Python

Below are two links to free Python courses. You only need to do one of the courses, but you can do both if you wish. Both are free and will take less than 5 hours of your time. Once you finish the course(s), you will have the prerequisite Python knowledge to gain the most from the 5-day course.

  1. Free Udemy Python Course

  2. Another Free Udemy Python Course

GitHub

We will use GitHub for tracking our code and submitting exercises. As such, it's important that you familiarize yourself with GitHub. Refer to the links below for GitHub training materials.

  1. GitHub tutorial on YouTube
  2. GitHub tutorial

Contributors

dmatekenya