analytics_projects's Introduction

Data Science Projects

Repository containing a portfolio of data science projects that I completed for academic, self-learning, and hobby purposes. The projects are presented as Jupyter Notebooks.

For more content, visit https://www.christianhaller.me/

Contents

Machine Learning

  • Equinor Volve LogML: Predicting missing geophysical logs from an open real-world dataset.
    Modules: Scikit-Learn.

  • Detecting Parkinson’s Disease: Classifying speech recordings of PD patients and healthy candidates.
    Modules: XGBoost.

  • SEG Facies Classification: Training a model to predict sedimentary facies in a Kansas gas field for the classic SEG competition.
    Modules: XGBoost, SciKit-Optimize.

  • House Sales and Price Prediction in King County (Seattle): The project explores different house sales features and regression modelling techniques for optimizing price prediction.
    Modules: SciKit-Learn.

  • Permeability Prediction from Thin Sections: Evaluation, using cross-validation, of various machine learning and deep learning algorithms trained on reservoir-rock thin sections. Deployment in the cloud for inference.
    Modules: SciKit-Learn, TensorFlow, Keras, NumPy, Pandas, Matplotlib, Seaborn.

  • Sonar (chirp) Data Classification of Underwater Mines and Rocks: Train neural networks on sonar data. The prediction distinguishes two classes: rock and mine (i.e., metal surface). Various neural network designs are tried, with a grid search on each design to find an optimal model.
    Modules: SciKit-Learn, TensorFlow, Keras, NumPy, Pandas, Matplotlib.

  • Time Series Modeling - Sunspot Activity: The Sunspot Activity project examines making time series predictions using LSTM and other deep learning networks. Sunspots are dark spots on the sun associated with lower temperature; they have been recorded scientifically since the 1700s.
    Modules: TensorFlow, Keras, NumPy, Pandas, Matplotlib.

  • Brent Crude Oil price prediction with LSTM: Price time-series modeling with LSTM and a comparison of performance using Mean Absolute Error and Mean Squared Error.
    Modules: TensorFlow, Keras.
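
As a rough illustration of the two error metrics named in the Brent Crude Oil project above, the following minimal sketch computes both in plain Python. The function names and the sample prices are hypothetical and are not taken from the notebook, which may compute these metrics through Keras instead.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference; large misses are penalized more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted prices, for illustration only.
actual = [60.0, 62.0, 61.0, 65.0]
predicted = [61.0, 62.0, 59.0, 61.0]

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
print(f"MAE: {mae}, MSE: {mse}")
```

In this made-up sample, the single miss of 4.0 contributes far more to the squared error than to the absolute error, which is why the two metrics can rank models differently.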

Natural Language Processing

  • Three-Way Sentiment Analysis for Twitter Tweets: A classification model for tweet sentiment (positive, negative, neutral), built without using NLTK's sentiment analysis engine.
    Modules: NLTK, SciKit-Learn.

  • Two-Way IMDB Film Database Sentiment Analysis: Classify 25,000 IMDB movie reviews as positive (1) or negative (0) sentiment with a relatively simple LSTM (a recurrent neural network).
    Modules: TensorFlow, Keras.

  • Medical Chatbot with NLTK: Ingest communication and responses to train an NLP model. Then implement a GUI to make inferences, interact with the chatbot, and get responses.
    Modules: NLTK, TensorFlow, Keras, NumPy, tkinter.

Data Analysis and Visualisation

  • Smart AirBnB booking in Berlin (dataset 2020-08-30): Analysis of the price variability in a large data set of Berlin's AirBnB listings scraped in August 2020. Which district, which amenities, and what time of the year are the best value?
    Modules: Pandas, Matplotlib, Seaborn, Scikit-Learn.

  • Exploring US Economic Data with a Dashboard: This project intends to visualize simple time-series data in a dashboard and make it permanently available in an S3 bucket.
    Modules: Bokeh.

  • Toronto Neighborhoods Analysis: The project explores the Wikipedia data on Toronto (Canada) neighbourhoods with the post code M and will create labelled, interactive maps.
    Modules: Pandas, BeautifulSoup, Folium.

  • Shopping Mall development in Charlotte, North Carolina, U.S.A.: City and Foursquare data on shopping malls are compared and knn-clustered by shopping-mall density per neighborhood to offer insight into where new shopping malls may be a good fit.
    Modules: Pandas, BeautifulSoup, SciKit-Learn, Foursquare-API, ESRI geocoding, Folium.

  • Market Analysis for Tech Stocks: Ingest, visualize, evaluate risk, and Monte-Carlo simulate prices for some technology stocks: Apple, Google, Microsoft, Amazon.
    Modules: Pandas, NumPy, Matplotlib, Seaborn.

  • 911 Calls Exploration (dataset 2020-07-29): This exploration will analyze the emergency call (911) dataset from Kaggle containing Fire, Traffic, Emergency Medical Services (EMS) incidents for Montgomery County, Pennsylvania.
    Modules: Pandas, Matplotlib, Seaborn.
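
To give a sense of what the Monte Carlo price simulation in the Market Analysis for Tech Stocks project might involve, here is a minimal sketch using only the Python standard library. It assumes a geometric Brownian motion model, which may differ from the approach in the notebook. The function names, the drift `mu`, the volatility `sigma`, and the starting price are all hypothetical.

```python
import math
import random


def simulate_price_path(start_price, mu, sigma, n_days, rng):
    """Simulate one daily price path under geometric Brownian motion (an assumed model)."""
    dt = 1.0 / 252  # one trading day as a fraction of a year (assumption)
    prices = [start_price]
    for _ in range(n_days):
        shock = rng.gauss(0.0, 1.0)  # standard normal random shock
        growth = math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * shock)
        prices.append(prices[-1] * growth)
    return prices


def simulate_final_prices(start_price, mu, sigma, n_days, n_runs, seed=0):
    """Run many independent simulations and collect the final price of each."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        simulate_price_path(start_price, mu, sigma, n_days, rng)[-1]
        for _ in range(n_runs)
    ]


# Hypothetical inputs: price 100, 5% annual drift, 20% annual volatility.
finals = sorted(simulate_final_prices(100.0, 0.05, 0.2, n_days=252, n_runs=1000))
worst_5_percent = finals[int(0.05 * len(finals))]  # empirical 5% quantile
print(f"5% quantile of simulated final prices: {worst_5_percent:.2f}")
```

The gap between the starting price and the 5% quantile of the simulated final prices is one simple way to express risk from such a simulation.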

Computer Vision
