Lecture notes and materials for Python Data Science course

License: MIT License

Programming for Data Science @ FBK Academy

This programming tutorial is aimed at researchers and practitioners, both those with no prior programming experience and those who already have some programming skills.

We will walk through the principal programming concepts, such as conditionals, functions and iteration, as well as more specialised topics such as classes, objects and what is sometimes called defensive programming.

If all these terms sound like gibberish to you, don't worry!

I'll try to show everything with simple code examples: no long and complicated explanations with fancy words. By the end of this tutorial, I am sure you will master all these concepts like a pro 🙌
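To give a flavour of what these terms mean in practice, here is a minimal sketch. It is not taken from the lecture notebooks: the `mean` function and the sample data are made up for illustration. It combines a function, an iteration, a conditional and a small defensive check.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Defensive programming: reject invalid input early, with a clear message.
    if len(values) == 0:
        raise ValueError("mean() requires at least one value")
    total = 0
    # Iteration: visit each value in turn and accumulate the sum.
    for value in values:
        total += value
    return total / len(values)


temperatures = [18.5, 21.0, 19.5]  # made-up sample data
average = mean(temperatures)

# Conditional: choose what to print depending on the computed value.
if average > 20:
    print("warm")
else:
    print("mild")
```

Calling `mean([])` raises a `ValueError` instead of failing later with a less helpful division error, which is the kind of habit "defensive programming" refers to.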

Why Programming for Data Science?

In this tutorial we will be using Python 3. Python is nowadays considered "the" language of choice for Data Science. There are many reasons for that, and many articles have been written on the subject. This article looks like a good and clear example on the topic.

A Few notes before we start

  • Q: Yes, ok.. but.. is this a tutorial on Data Science?

  • A: No. This is a tutorial on programming with Python. The perspective, though, is that of a wanna-be data scientist.

  • Q: Cool... but.. is this a tutorial on the Python Language?

  • A: Ehm, No again. Sorry. We will focus on programming concepts using Python as a language. Most of the concepts you will learn are shared by most other languages (only the syntax will be different, ed.). That said, there is a section in the Lecture materials named Python Extras that focuses specifically on features of the Python language. You could read it, if interested :)

Here is what I have in mind for this course (HTH)

lecture sketch

I do hope that this (very simple) mind-map lookalike clarifies a bit the perspective I chose when designing this course.

tl;dr: We will dive into programming focusing on two main aspects: the Algorithmic perspective, that is "what are the steps we need to implement to solve a specific problem", and the Data Structure perspective, that is "what is the data structure that would simplify our algorithm implementation as much as possible". Over the past decades, these two perspectives have led to two completely different approaches to programming: Procedural and Object-Oriented, respectively.
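As a rough illustration of the two perspectives, here is a minimal sketch that solves the same small problem (counting words) both ways. The names `count_words` and `WordBag` are hypothetical and do not come from the course notebooks.

```python
from collections import Counter


# Algorithmic perspective: spell out the steps needed to solve the problem.
def count_words(text):
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


# Data structure perspective: pick a structure that does most of the work,
# and wrap it in an object exposing the operations we care about.
class WordBag:
    def __init__(self, text):
        self.counts = Counter(text.lower().split())

    def most_common(self, n=1):
        return self.counts.most_common(n)


sentence = "the cat and the hat"
print(count_words(sentence))
print(WordBag(sentence).most_common(1))
```

Both versions compute the same counts; what changes is where the effort goes, either into the steps of the procedure or into the choice of the underlying data structure.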

Python allows for a lot of flexibility, and this flexibility will be our Swiss Army knife. In fact, Python supports multiple programming paradigms at once (i.e. imperative, OOP, functional [1]), and we will be (seemingly) shifting our focus between them as we go through the lecture materials.


[1]: functional programming is only for the intrepid programmers among you :) See this video
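For the curious, here is a minimal sketch of how the same computation (the sum of the squares of the even numbers in a list) might look in an imperative and in a functional style. The example is illustrative only and is not part of the lecture materials.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Imperative style: describe step by step how the result is built.
total_imperative = 0
for n in numbers:
    if n % 2 == 0:
        total_imperative += n * n

# Functional style: compose filter, map and reduce, with no mutable state.
evens = filter(lambda n: n % 2 == 0, numbers)
squares = map(lambda n: n * n, evens)
total_functional = reduce(lambda acc, n: acc + n, squares, 0)

print(total_imperative, total_functional)
```

Both variables end up holding the same value; Python lets you choose whichever style reads best for the problem at hand.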

Outline of the Course (at a glance)

The course is organised into six lectures, with the following learning path in mind:

  1. Python Programming (part 1): Introduction to Python main data structures, and functions

  2. Python Programming (part 2): Advanced Data Structures and Object-Oriented Programming

  3. Scientific Python Programming and Data Processing: Numerical Processing with NumPy & Data Processing with Pandas

  4. Advanced Data Objects and Data Plotting: Introduction to dataclasses and matplotlib / bokeh for interactive plotting

  5. Introduction to Scikit-Learn (sklearn) and Machine Learning Modules

  6. Project-Team work on real-cases Data Science scenarios

Lecture Materials

Note: The following section is currently incomplete, and will be updated throughout the rest of the course.

Introductory Readings (intro folder)

This part introduces the concept of computer programming and the very basics of the Python programming language:

  1. The Way of the Program
  2. Variables, Statements and Expressions
  3. Introduction to Functions
  4. Setting up an editor
  5. Conditional Statements

Regardless of whether you have programmed before, in Python or not, I would suggest taking a look at this introductory section anyway. You can always skip ahead, depending on your learning pace.

Alternatively, a good starting point would be this online course: Intro to Python by Microsoft

Programming with Python (programming_with_python folder)

This section contains the materials for the main topics that will be covered in our first two lectures. These are (in no specific order):

  1. Pythonic Functions
  2. Collections and Sequences
  3. Dictionaries
  4. Iterators, Generators, Comprehensions
  5. Classes and OOP
  6. Errors and Exceptions

Python Extras (pyhton_extras folder)

This section contains some extra notebooks you could go through to read more about some specific aspects of the Python programming language.

Note: This is the only part of the course specifically focused on how Python does things.

  1. Modules
  2. Python Data Model
  3. Function as Objects
  4. Magic Methods
  5. Pythonic Coding Style

Instructions

1. Get the material

Option A: Clone (or fork) the Repository using git (Recommended)

⚠️ Note: Git must be installed in order to proceed. If you don't have git installed on your system, you need to install it first. Instructions to Install Git

💡 Please also consider looking at Git CheatSheet

To acquire the lecture material, using git to clone the current repository is highly recommended. Since the repository will be constantly updated after each lesson, using git will allow for easier synchronisation of the material.

To clone the repository, type the following command in the terminal prompt:

git clone https://github.com/leriomaggio/python-data-science.git

⚠️ Note for Windows users: Once git is installed, please make sure to run the Git Terminal (or Git Prompt)

Once completed, this will create a new folder named python-data-science in the directory where you ran the command (typically your Home folder).

Well done! Now please bear with me for another few minutes and follow the instructions below 🙏

Please now proceed to 2. Setting up your Environment

Option B: Downloading the material in a ZIP archive from GitHub (Not Recommended)

It is indeed possible to download the whole material from GitHub as a ZIP archive. Link here

However, this method is not recommended, as you would need to download the archive again every time there is an update (which means at the end of each lesson)!

2. Setting up your Environment

We will be using Jupyter lab as our interactive programming environment for this course.

This has the great advantage of lowering the barriers to setting up the environment and installing specialised tools. If you're not familiar with Jupyter notebooks, no worries: getting familiar with the environment will be the first thing we do!

Meanwhile, it is necessary to set up the Python Virtual Environment to run the code contained in this repository smoothly and with no headaches.

If you don't know what a Python virtual environment is, think of it as a sandbox Python installation you can have on your machine that is fully controllable and fully independent from any other Python environment you may have on your local machine.

To execute the notebooks in this repository, a few packages are required, but installing them in your Conda environment is super easy.

Step 1: Download Anaconda Python Distribution.

Note for Windows Users: More information here on the official documentation

Step 2: Set up the virtual environment:

Open a Terminal (or Anaconda Prompt on Windows) and move to the python-data-science folder, i.e. the main folder of this repository.

cd python-data-science

Now create the conda environment by typing the following command:

conda env create -f pyds.yml

This will install a new Conda environment named pyds.

Step 2.1: If you'd like to double check that the creation of the environment completed successfully, you can type:

conda info --envs

This will list all the virtual environments conda can find within your installation. pyds should appear in the list as well.

Step 3: Activate the environment:

Once the environment is set, we need to activate it in order to use it.

conda activate pyds
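If you want to confirm from within Python that the environment is active, the following minimal sketch may help. It relies on the `CONDA_DEFAULT_ENV` environment variable that conda sets on activation; the helper name `active_conda_env` is made up for this example.

```python
import os
import sys


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if unset."""
    return environ.get("CONDA_DEFAULT_ENV")


# The interpreter path should point inside the environment folder.
print("Interpreter:", sys.executable)
print("Active conda environment:", active_conda_env())
```

After `conda activate pyds`, the second line should report `pyds`; if it reports something else (or `None`), the environment is probably not active in that terminal.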

🎉 You should now be ready to go!

The last bit is to run your jupyter lab server, and open the notebooks:

jupyter lab

(Alternative) Setup Environment via pip

The repository also includes a requirements.txt file that can be used to install all the required packages using pip:

pip install -r requirements.txt

However, this is recommended only if (A) it is not possible to install Anaconda on your machine, or (B) the setup of the Anaconda environment is unsuccessful.
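Whichever route you take, a quick way to see whether the main libraries are available is sketched below. Note the assumption: the import names are guessed from the libraries mentioned in the course outline (NumPy, Pandas, matplotlib, bokeh, scikit-learn), not read from requirements.txt, so treat the list as illustrative.

```python
from importlib.util import find_spec

# Assumption: import names inferred from the course outline,
# not taken from requirements.txt or pyds.yml.
COURSE_PACKAGES = ["numpy", "pandas", "matplotlib", "bokeh", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(COURSE_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```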

⚠️ In either case, it is important that the version of Python used is Python >=3.9
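A minimal sketch for checking this requirement from Python itself is shown below; the helper name `is_supported` is hypothetical.

```python
import sys

MINIMUM_VERSION = (3, 9)  # minimum Python version stated above


def is_supported(version_info=sys.version_info):
    """Return True if the given version is at least MINIMUM_VERSION."""
    return tuple(version_info[:2]) >= MINIMUM_VERSION


if is_supported():
    print("Python version OK:", sys.version.split()[0])
else:
    print("Python >= 3.9 is required, found:", sys.version.split()[0])
```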

Colophon

Author: Valerio Maggio (@leriomaggio), Senior Research Associate, University of Bristol.

All the Code material is distributed under the terms of the GNU GPLv3 License. See LICENSE file for additional details.

All the instructional material in this repository is free to use, and made available under the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/). The following is a human-readable summary of (and not a substitute for) the full legal text of the CC BY 4.0 license.

You are free:

  • to Share: copy and redistribute the material in any medium or format
  • to Adapt: remix, transform, and build upon the material

for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

  • Attribution: You must give appropriate credit (mentioning that your work is derived from work that is Copyright © Software Carpentry and, where practical, linking to http://software-carpentry.org/), provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

Contacts

For any questions or doubts, feel free to open an issue in the repository, or drop me an email @ valerio.maggio_at_bristol.ac.uk
