
scalableml's Introduction

COM6012 Scalable Machine Learning - University of Sheffield

Spring 2022 by Haiping Lu and Mauricio A Álvarez

In this module, we will learn how to do machine learning at large scale using Apache Spark. We will use the University's High Performance Computing (HPC) cluster systems. If you are NOT on the University's network, you must use a VPN (Virtual Private Network) to connect to the HPC.

This edition uses PySpark 3.2.1, the latest stable release of Spark as of 26 January 2022, and consists of the 10 sessions listed below. See the overview slides for more information, e.g. the timetable and assessment details.

  • Session 1: Introduction to Spark and HPC
  • Session 2: RDD, DataFrame, ML pipeline, & parallelization
  • Session 3: Scalable decision trees and ensemble models
  • Session 4: Scalable logistic regression
  • Session 5: Scalable generalized linear models
  • Session 6: Scalable neural networks
  • Session 7: Scalable matrix factorisation for collaborative filtering in recommender systems
  • Session 8: Scalable k-means clustering and Spark configuration
  • Session 9: Scalable PCA for dimensionality reduction and Spark data types
  • Session 10: Apache Spark in the Cloud (guest lecture by Dr Michael Smith)

You can also download the Spring 2021 version for preview or reference.

Acknowledgement

The materials are built with reference to the following sources:

Many thanks to

  • Mike Croucher, Neil Lawrence, Will Furnass, Twin Karmakharm, and Vamsi Sai Turlapati for their input and inspiration since 2016.
  • Our teaching assistants and students who have contributed in many ways since 2017.
