
Raj Mehrotra's Projects

housing-prices-eda-and-regression-models

The famous House Prices: Advanced Regression Techniques competition on Kaggle. The dataset consists of training and test sets of about 1.46K rows each, with 81 features describing each house. I first performed an exhaustive EDA to identify the underlying trends in the data, and removed outliers to make the regression models more robust. Missing values were handled properly, with imputation wherever needed. Lastly, I trained various regression models from scikit-learn, such as Lasso and Ridge, and tuned their hyperparameters with GridSearchCV. The final models achieved an RMSE of just over 0.12, which is a decent score.
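The competition scores submissions by the RMSE between the logarithm of the predicted and the logarithm of the observed sale price, so errors count relative to a house's price rather than in dollars. A minimal stdlib sketch of that metric follows; the function name `log_rmse` and the sample prices are hypothetical, and the natural logarithm is assumed.

```python
import math
from typing import Sequence


def log_rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """RMSE between the natural logs of predicted and observed prices."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared error in log space, so a 10% miss costs the same for any price.
    squared = [
        (math.log(p) - math.log(o)) ** 2 for p, o in zip(predicted, observed)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical prices: every prediction is 10% too high, so the score is log(1.1).
observed = [200_000.0, 150_000.0, 320_000.0]
predicted = [p * 1.1 for p in observed]
print(round(log_rmse(predicted, observed), 4))  # 0.0953
```

Because the score lives in log space, uniformly overpredicting by 10% costs log(1.1) ≈ 0.095 whether a house is cheap or expensive.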

ibm-hr-analytics-employee-attrition-performance

The IBM HR Analytics Employee Attrition & Performance dataset from Kaggle. I first performed exploratory data analysis using libraries such as pandas, seaborn, and matplotlib. I then used feature selection techniques such as recursive feature elimination (RFE) to select the features. To deal with the imbalanced classes, the data is oversampled using the SMOTE technique, and it is then scaled for better performance. Lastly, I trained many ML models from the scikit-learn library for predictive modelling and compared their performance using precision, recall, and other metrics.
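SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified, pure-Python illustration of that idea, not the notebook's code; the function `smote` and its parameters are hypothetical, and in practice one would use the `SMOTE` class from imbalanced-learn (`imblearn.over_sampling`).

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Its k nearest minority neighbours by Euclidean distance, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # New point at a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points))  # 6
```

Oversampling is usually applied only to the training split, so that synthetic points never leak into the evaluation data.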

mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python

mnist-digit-recognizer-using-convnet-keras-accuracy-0.9943-

The MNIST Digit Recognizer competition on Kaggle. The training dataset consists of 42,000 rows of 784 pixel values each, representing 42,000 images of size 28 × 28 showing the digits 0 to 9. I trained a convolutional neural network written in Keras and used it to predict the 28,000 images of the test dataset, achieving 99.43% accuracy on Kaggle with 20 epochs. I also used ImageDataGenerator to augment the training set and avoid overfitting.
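Each row of the Kaggle training CSV holds a label followed by 784 pixel values, and recovering the 28 × 28 image is a reshape. The stdlib sketch below assumes row-major pixel order and uses a hypothetical helper `row_to_image`; with NumPy the same step is usually written `X.reshape(-1, 28, 28, 1)` before feeding a Keras ConvNet, where the trailing 1 is the single grayscale channel.

```python
from typing import List, Sequence


def row_to_image(pixels: Sequence[int], side: int = 28) -> List[List[int]]:
    """Reshape a flat, row-major list of pixel values into a side x side grid."""
    if len(pixels) != side * side:
        raise ValueError(f"expected {side * side} pixels, got {len(pixels)}")
    # Row r of the image is the slice [r * side, (r + 1) * side) of the flat row.
    return [list(pixels[r * side:(r + 1) * side]) for r in range(side)]


# Hypothetical flat row: pixel k simply holds the value k.
flat = list(range(784))
image = row_to_image(flat)
print(len(image), len(image[0]))  # 28 28
print(image[1][0])  # 28
```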

movie-reviews-nltk-sentiment-analysis-

The Movie Reviews dataset, imported from the NLTK library. It has 1,000 positive and 1,000 negative reviews. I first loaded the dataset into a pandas DataFrame, which makes the processing easier, and then analyzed the positive and negative reviews. I also preprocessed the dataset using lemmatization and other standard NLP techniques. To extract features from the text, I used scikit-learn's TfidfVectorizer. Lastly, I trained various modelling algorithms from scikit-learn on this data.
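TF-IDF weights a term by how often it appears in a review, discounted by how many reviews contain it. The pure-Python sketch below mirrors what scikit-learn's `TfidfVectorizer` computes with its default settings (raw term counts, smoothed idf = ln((1 + n) / (1 + df)) + 1, and L2-normalised rows); the helper name `tfidf` and the whitespace tokenisation are simplifications for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF weights per document, using smoothed idf and L2 normalisation."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        weights = {t: c * idf[t] for t, c in counts.items()}
        # Scale each document vector to unit Euclidean length.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


reviews = ["a great moving film".split(), "a dull boring film".split()]
vectors = tfidf(reviews)
# "great" appears in one review, "film" in both, so "great" gets more weight.
print(vectors[0]["great"] > vectors[0]["film"])  # True
```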

neurodsp

Digital signal processing for neural time series.

object-recognition-cifar-10-cnn-keras

The famous CIFAR-10 dataset, which contains images of objects such as airplanes, horses, and ships that need to be classified. The training set contains 50,000 images of 32 × 32 pixels each, and the validation set contains 10,000 images of the same size. I built a custom ConvNet to classify the images into 10 classes, each corresponding to one object category. I also used data augmentation with the ImageDataGenerator class provided by Keras to further increase the size of the training set and reduce the chance of overfitting. Finally, I used the ConvNet to make predictions on the validation set and achieved an accuracy of about 86%.
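Keras's `ImageDataGenerator` can apply random transformations such as horizontal flips (`horizontal_flip=True`) and shifts (`width_shift_range`) to the training images. The stdlib sketch below only illustrates what those two transformations do to an image; the helper names are hypothetical, images are simplified to a single channel of nested lists (CIFAR-10 images actually have three colour channels), and vacated pixels are filled with a constant rather than with Keras's own fill modes.

```python
import random
from typing import List

Image = List[List[int]]  # rows of pixel values, single channel for simplicity


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in img]


def shift_right(img: Image, dx: int, fill: int = 0) -> Image:
    """Shift columns right by dx (left if negative), padding with `fill`."""
    width = len(img[0])
    dx = max(-width, min(width, dx))  # shifting past the edge blanks the image
    out = []
    for row in img:
        if dx >= 0:
            out.append([fill] * dx + row[:width - dx])
        else:
            out.append(row[-dx:] + [fill] * (-dx))
    return out


def augment(img: Image, rng: random.Random, max_shift: int = 2) -> Image:
    """Randomly flip (50% chance) and shift an image by up to max_shift pixels."""
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    return shift_right(img, rng.randint(-max_shift, max_shift))


img = [[1, 2, 3], [4, 5, 6]]
print(horizontal_flip(img))  # [[3, 2, 1], [6, 5, 4]]
print(shift_right(img, 1))  # [[0, 1, 2], [0, 4, 5]]
```

Each epoch then sees slightly different versions of the same images, which is why augmentation tends to reduce overfitting.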

pokemon-data-exploration-visualization

The Pokemon with Stats dataset. I performed data analysis and exploration on the dataset and visualized it with the seaborn and matplotlib libraries, using bar plots, box plots, swarm plots, scatter plots, violin plots, heat maps, and more.

project

The project is an Android application that displays the levels of various gases in the atmosphere. The gas volumes are stored in an Excel file, which is updated periodically with data fetched from the sensors. The application reads the contents of the file and displays the results.

red-wine-quality-accuracy-0.9175-

The Red Wine Quality dataset from Kaggle, which describes the chemical composition of each wine. I used pandas to manipulate the data and seaborn to visualize it. Finally, I predicted wine quality using various models from scikit-learn.

sad_project

A blood bank mobile application in which users can register and log in. Blood donors can register with the application and earn points. Recipients can search for donors and either call a donor or locate them on Google Maps. The application is built with Java and XML, uses Firebase as the backend, and uses the Google Maps API to locate donors on the map.

spooky-author-identification

A notebook for the famous Kaggle competition Spooky Author Identification, where the task is to identify the author of each text excerpt. I first cleaned and preprocessed the text using standard NLP techniques such as tokenization, stemming or lemmatization, and stop-word removal. I also created some hand-crafted meta features based on each author's writing patterns. I then used the traditional bag-of-words (BOW) approach with TfidfVectorizer and CountVectorizer, and trained ML algorithms such as logistic regression and Naive Bayes, which are well suited to text data. So far, TF-IDF on the count vectorizer has given me the best results; my submission scored a multi-class log loss of 0.46 on the Kaggle private leaderboard, which is quite decent.
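Multi-class log loss is the average negative log of the probability a model assigns to each excerpt's true author, so lower is better. A minimal sketch, assuming each probability row already sums to 1 and the column order EAP, HPL, MWS; the function name and the sample predictions are hypothetical, and clipping with `eps` avoids log(0) for confident wrong answers.

```python
import math
from typing import Sequence


def multiclass_log_loss(
    true_labels: Sequence[int],
    probs: Sequence[Sequence[float]],
    eps: float = 1e-15,
) -> float:
    """Mean negative log probability assigned to each sample's true class."""
    if len(true_labels) != len(probs) or not probs:
        raise ValueError("need one probability row per label")
    total = 0.0
    for label, row in zip(true_labels, probs):
        # Clip so a confident wrong prediction gives a large but finite penalty.
        p = min(max(row[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(probs)


# Hypothetical predictions over (EAP, HPL, MWS); true authors are EAP and MWS.
labels = [0, 2]
probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
print(round(multiclass_log_loss(labels, probs), 4))  # 0.2899
```

Predicting 1/3 for every author scores ln 3 ≈ 1.099, which gives a baseline for judging a score such as 0.46.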

the-iris-species-dataset

The famous Iris Species dataset from Kaggle. I normalized the features and examined their distributions. I also applied many algorithms from scikit-learn to make predictions on the dataset.
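The description does not say which normalisation was used; one common choice is z-score standardisation, which rescales each feature to zero mean and unit variance (what scikit-learn's `StandardScaler` does). A stdlib sketch under that assumption, with a hypothetical `standardize` helper; the sample rows follow the dataset's four-feature layout (sepal length, sepal width, petal length, petal width, in cm).

```python
import statistics
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each feature column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
    return [
        [(x - m) / s if s else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


rows = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
]
scaled = standardize(rows)
```

In a real pipeline the means and standard deviations would be computed on the training split only and reused for the test split.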

tictactoe

Tic Tac Toe is a simple tic-tac-toe game developed for the Android platform. The application was developed in just 2 hours for the International Organisation of Software Developers (IOSD) Hackathon.

titanic-survivor-prediction

The Titanic: Machine Learning from Disaster competition. Using the data provided on the various passengers travelling on the ship, I used libraries such as numpy and pandas to manipulate, explore, and analyze the data, and matplotlib and seaborn to visualise it. I then trained various machine learning models on the cleaned and preprocessed data to make predictions, and used GridSearchCV to optimise the models' hyperparameters.

topic-modelling-using-lda-and-lsa-in-sklearn

I performed topic modelling on the Kaggle dataset "A Million News Headlines". I first preprocessed and cleaned the data, and then applied the LDA and LSA implementations in scikit-learn. The distribution of words in each topic is also shown.

word-embeddings-in-gensim-and-keras

A simple implementation of word embeddings with the Gensim and Keras libraries. I implemented the well-known Word2Vec model in Gensim and, as an alternative, used a Keras Embedding layer to generate word embeddings.
