Raazesh Sainudiin's Projects
Project MEP: Meme Evolution programme. A terraformed multi-language library for running statistical experiments on Twitter.
Showcase of examples from Project MEP
A GitHub repository for mobility research
Scala and Spark code for analysis of co-trajectories, in particular privacy analysis of SwapMob
Module 1 – Introduction to Data Science: Introduction to fault-tolerant distributed file systems and computing. The whole data science process illustrated with industrial case studies. A practical introduction to scalable data processing to ingest, extract, load, transform, and explore (un)structured datasets. Scalable machine learning pipelines to model, train/fit, validate, select, tune, test, and predict or estimate in unsupervised and supervised settings using nonparametric and partitioning methods such as random forests. Introduction to distributed vertex-programming.
Module 2 – Distributed Deep Learning: Introduction to the theory and implementation of distributed deep learning: Classification and regression using generalized linear models, including different learning, regularization, and hyperparameter tuning techniques. The feedforward deep network as a fundamental network, and the advanced techniques to overcome its main challenges, such as overfitting, vanishing/exploding gradients, and training speed. Various deep neural networks for various kinds of data, for example CNNs for scaling up neural networks to process large images, RNNs for scaling up deep neural models to long temporal sequences, and autoencoders and GANs. In this course module, we aim to ensure that all students understand the basic concepts and tools in distributed deep learning.
Information for setting up the Spark course
Spark MOOC setup and labs for DBC users
A C++ class library for statistical set processing and computer-aided proofs in statistics.
Track live sentiment for stocks from Reddit and Twitter and identify growing stocks
This is a repository for instructions on how to do Operating System Agnostic Data Engineering Science Operations
Snippets and programs from the Parallel Programming lectures.
The rooster crows immediately before sunrise; therefore the rooster causes the sun to rise
Examples demonstrating trend-calculus and pathogen
Python Lambda Chrome Automation (naming pending)
The HTML Presentation Framework
Scalable Data Science and Distributed Machine Learning Course Book written by Raazesh Sainudiin and his WASP AI-Track PhD Students
Zeppelin version of ScaDaMaLe via docker-compose
Scalable Data Science: course sets in big data using Apache Spark over Databricks, and their mathematical, statistical and computational foundations using SageMath.
Public datasets
Apache Spark
Binding the GDELT universe in a Spark environment
Example applications of GDELT mass media intelligence data
This project was created in 4 hours for the Texata Big Data World Championship.
Detects trends in time series using Andrew Morgan's trend calculus algorithms in Apache Spark and Scala, starting from Antoine Amend's initial implementation
Example applications of spark-trend-calculus
Notebooks showcasing results using sparkDensityTree
A framework for online learning using Spark Structured Streaming