Name: MICHAEL NYAGA
Type: User
Company: @traveladventureyacht
Bio: Aspiring data scientist with a passion for data, reading, sports, and a whole plethora of other things! I believe anyone can study and learn anything they wish!
Twitter: michaelnyaga1
Location: https://www.pinterest.com/traveladventureyacht/
Blog: https://www.linkedin.com/in/michaelnyaga-kenya/
MICHAEL NYAGA's Projects
A first look at a neural network (Deep Learning With Python, Francois Chollet Coursework). Let’s look at a concrete example of a neural network that uses the Python library Keras to learn to classify handwritten digits. The problem we’re trying to solve here is to classify grayscale images of handwritten digits (28 × 28 pixels) into their 10 categories (0 through 9). We’ll use the MNIST dataset, a classic in the machine-learning community, which has been around almost as long as the field itself and has been intensively studied. It’s a set of 60,000 training images, plus 10,000 test images, assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s.
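The loading-and-preprocessing step this project starts from can be sketched with synthetic NumPy arrays standing in for the real MNIST tensors (the actual data comes from `keras.datasets.mnist`; the shapes and dtypes below mirror it):

```python
import numpy as np

# Synthetic stand-ins shaped like MNIST: 60,000 grayscale 28x28 images
# with integer labels 0-9 (the real arrays come from keras.datasets.mnist).
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(60000, 28, 28), dtype=np.uint8)
train_labels = rng.integers(0, 10, size=60000)

# Flatten each image to a 784-dimensional vector and scale pixels to
# [0, 1] -- the preprocessing applied before feeding a dense network.
x_train = train_images.reshape((60000, 28 * 28)).astype("float32") / 255

# One-hot encode the labels (equivalent to keras.utils.to_categorical).
y_train = np.eye(10)[train_labels]
```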
A convnet takes as input tensors of shape (image_height, image_width, image_channels) (not including the batch dimension). In this case, we’ll configure the convnet to process inputs of size (28, 28, 1), which is the format of MNIST images. We’ll do this by passing the argument input_shape=(28, 28, 1) to the first layer.
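The `(28, 28, 1)` format means each grayscale image carries an explicit single-channel axis; a minimal sketch of reshaping a batch into that layout (using a zero-filled stand-in batch rather than real MNIST data):

```python
import numpy as np

# Synthetic MNIST-like batch: 64 grayscale images of 28x28 pixels.
images = np.zeros((64, 28, 28), dtype="float32")

# A convnet expects (height, width, channels) per image, so add an
# explicit single-channel axis to get shape (64, 28, 28, 1) -- the
# batch of (28, 28, 1) inputs declared via input_shape=(28, 28, 1).
images = images.reshape((64, 28, 28, 1))
# equivalently: images = np.expand_dims(images, axis=-1)
```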
Data on births in the United States, provided by the Centers for Disease Control (CDC). This data can be found at https://raw.githubusercontent.com/jakevdp/data-CDCbirths/master/births.csv (this dataset has been analyzed rather extensively by Andrew Gelman and his group). We will again focus on the pivot table object, apply a sigma-clipping method to remove outliers, and incorporate a datetime index to explore the data further.
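The sigma-clipping step can be sketched on a synthetic births-like column (the real column comes from the births.csv file linked above); the robust estimate sigma ≈ 0.74 × IQR comes from the interquartile range of a Gaussian:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CDC births column: values near 10,000
# with two deliberately corrupted rows acting as outliers.
rng = np.random.default_rng(42)
births = pd.DataFrame({"births": rng.normal(10000, 500, size=365)})
births.loc[[10, 200], "births"] = [150000, 50]

# Robust sigma clipping: for a Gaussian, sigma ~= 0.74 * (Q3 - Q1),
# so drop rows far from the median without letting the outliers
# themselves inflate the estimate.
q1, mu, q3 = np.percentile(births["births"], [25, 50, 75])
sig = 0.74 * (q3 - q1)
clean = births.query("(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)")
```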
Practice dataset (Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2E, Aurélien Géron). Build a model of housing prices in California using California census data. This data (the California Housing Prices dataset from the StatLib repository, based on the 1990 California census) has metrics such as the population, median income, and median housing price (plus an added categorical attribute) for each block group in California. Block groups are the smallest geographical unit for which the US Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people). The model should learn from this data and be able to predict the median housing price in any district, given all the other metrics.
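The train/evaluate workflow behind this project can be sketched with synthetic census-like data (a single made-up income feature standing in for the real CSV columns; the estimator choice here is illustrative, not the book's final model):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the census metrics: one feature (median
# income) driving a noisy target (median house value).
rng = np.random.default_rng(0)
median_income = rng.uniform(1, 10, size=(500, 1))
median_house_value = 40000 * median_income[:, 0] + rng.normal(0, 5000, 500)

# Hold out a test set, fit on the rest, and predict -- the same
# split/train/evaluate loop used on the full dataset.
X_train, X_test, y_train, y_test = train_test_split(
    median_income, median_house_value, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)
```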
The best way to think about data within Scikit-Learn is in terms of tables of data. For example, consider the Iris dataset, famously analyzed by Ronald Fisher in 1936. Supervised learning example: Iris classification. Our question: given a model trained on a portion of the Iris data, how well can we predict the remaining labels?
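That question can be sketched end to end with the Iris table bundled in scikit-learn (Gaussian naive Bayes is one simple estimator choice; the description above leaves the model open):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# The Iris table: 150 flowers, 4 measured features, 3 species labels.
X, y = load_iris(return_X_y=True)

# Hold out a portion of the labels, train on the rest, then ask how
# well the model predicts the held-out species.
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=1)
model = GaussianNB().fit(Xtrain, ytrain)
accuracy = accuracy_score(ytest, model.predict(Xtest))
```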
Using the power of the groupby object on the planets dataset from Seaborn.
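The split-apply-combine pattern behind groupby can be sketched on a miniature planets-like table (a few made-up rows standing in for `seaborn.load_dataset('planets')`, which fetches the full table over the network):

```python
import pandas as pd

# Small stand-in for Seaborn's planets table: discovery method, year,
# and number of planets found per row.
planets = pd.DataFrame({
    "method": ["Radial Velocity", "Transit", "Transit",
               "Radial Velocity", "Imaging"],
    "year":   [2006, 2010, 2011, 2009, 2008],
    "number": [1, 1, 2, 1, 1],
})

# groupby splits the rows by method, applies an aggregate to each
# group, and combines the results into one table.
discoveries = planets.groupby("method")["number"].sum()
```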
Machine learning has a phenomenal range of applications, including in health and diagnostics. This tutorial will explain the complete pipeline from loading data to predicting results, and it will explain how to build an X-ray image classification model from scratch to predict whether an X-ray scan shows the presence of pneumonia. This is especially relevant at present, as COVID-19 is known to cause pneumonia. This tutorial will explain how to utilize TPUs efficiently, load in image data, build and train a convolutional neural network, fine-tune and regularize the model, and predict results. Data augmentation is not included in the model because X-ray scans are only taken in a specific orientation, and variations such as flips and rotations will not exist in real X-ray images.
Coursework from Deep Learning with Python by Francois Chollet. We will work with the IMDB dataset: a set of 50,000 highly polarized reviews from the Internet Movie Database, split into 25,000 reviews for training and 25,000 for testing, each set consisting of 50% negative and 50% positive reviews. The sequences of words have been turned into sequences of integers, where each integer stands for a specific word in a dictionary.
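The integer-sequence format can be made concrete with a small sketch of the multi-hot encoding used before feeding the reviews to a dense classifier (two toy sequences stand in for the real ones from `keras.datasets.imdb`, and the tiny vocabulary size is illustrative):

```python
import numpy as np

# Toy stand-ins for two reviews already encoded as integer word
# indices (the real sequences come from keras.datasets.imdb).
sequences = [[1, 5, 9], [2, 5]]

def vectorize_sequences(sequences, dimension=10):
    """Multi-hot encode: results[i, j] = 1.0 if word index j occurs
    anywhere in review i, else 0.0."""
    results = np.zeros((len(sequences), dimension))
    for i, seq in enumerate(sequences):
        results[i, seq] = 1.0
    return results

x = vectorize_sequences(sequences)
```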
As a more involved example of working with some time series data, let’s take a look at bicycle counts on Seattle’s Fremont Bridge. This data comes from an automated bicycle counter, installed in late 2012, which has inductive sensors on the east and west sidewalks of the bridge. The hourly bicycle counts can be downloaded from https://data.seattle.gov/Transportation/Fremont-Bridge-Bicycle-Counter/65db-xm6k.
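The payoff of a datetime index can be sketched with synthetic hourly counts standing in for the downloaded CSV (the real file would be read with `pd.read_csv(..., index_col='Date', parse_dates=True)`):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Fremont Bridge counts: 48 hourly rows
# (two full days) with one crossing recorded per hour.
index = pd.date_range("2012-10-01", periods=48, freq="60min")
data = pd.DataFrame({"Total": np.ones(48, dtype=int)}, index=index)

# With a DatetimeIndex, resampling aggregates hourly counts into
# daily totals in one line.
daily = data.resample("D").sum()
```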
Merge and join operations come up most often when one is combining data from different sources. Here we will consider an example of some data about US states and their populations. The data files can be found at http://github.com/jakevdp/data-USstates/. Given this information, say we want to compute a relatively straightforward result: rank US states and territories by their 2012 population density. We clearly have the data here to find this result, but we’ll have to combine the datasets to get it.
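The combine-then-rank step can be sketched with miniature two-row stand-ins for the three data-USstates files (column names mirror the real CSVs; the numbers below are 2012 population and land-area figures for just two states):

```python
import pandas as pd

# Miniature stand-ins for the three files: abbreviations keyed to
# full names, populations keyed to abbreviation, areas keyed to name.
abbrevs = pd.DataFrame({"state": ["California", "Wyoming"],
                        "abbreviation": ["CA", "WY"]})
pop = pd.DataFrame({"state/region": ["CA", "WY"],
                    "population": [38041430, 576412]})
areas = pd.DataFrame({"state": ["California", "Wyoming"],
                      "area (sq. mi)": [423967, 97702]})

# Chain merges on the shared keys, then compute and rank density.
merged = pop.merge(abbrevs, left_on="state/region", right_on="abbreviation")
final = merged.merge(areas, on="state")
final["density"] = final["population"] / final["area (sq. mi)"]
ranked = final.sort_values("density", ascending=False)
```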
We’ll use the database of passengers on the Titanic, available through the Seaborn library, to motivate the pivot table object. It contains a wealth of information on each passenger of that ill-fated voyage, including gender, age, class, fare paid, and much more.
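The pivot table idea can be sketched on a tiny made-up subset of that data (the full table comes from `seaborn.load_dataset('titanic')`, which fetches it over the network):

```python
import pandas as pd

# Tiny stand-in for Seaborn's titanic table.
titanic = pd.DataFrame({
    "sex":      ["female", "female", "male", "male", "female", "male"],
    "class":    ["First", "Third", "First", "Third", "First", "Third"],
    "survived": [1, 1, 0, 0, 1, 1],
})

# pivot_table computes the mean survival rate for every combination
# of sex (rows) and class (columns) in one call.
table = titanic.pivot_table("survived", index="sex", columns="class")
```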