ml-projects's Introduction

Hi! I'm a data scientist and this is my project repository.

About me:

I am a formally trained, certified data scientist from INSOFE, Hyderabad, with a year of experience as a Software Developer at Standard Chartered GBS, Bengaluru. I have loved coding since college, where I discovered data science and AI, and I have been actively learning and working in the field ever since. In my two years of data science experience (as of writing this), I have acquired skills and knowledge in R, Python, SQL, Java, Machine Learning, Deep Learning, RDBMS, and big data tools such as Hadoop, Spark, Hive and Impala.

Projects:

Built a script that visits a company's URL, scrapes data, and applies Natural Language Processing techniques and clustering to differentiate between Venture Capital firms and private companies, then extracts the keywords associated with each to improve query detection via Google search. Steps taken:

  • Determined the company type from its name by identifying keywords such as LLC, LLP, Inc etc. and added it as a feature.
  • Built a web scraper that visits the company's website and scrapes data from HTML tags.
  • Collected the scraped text into a bag-of-words corpus stored as a dictionary, then applied lemmatization, tokenization, stop-word removal and a TF-IDF vectorizer to process the text data.
  • Iteratively reviewed the top-ranking words, added any further noise terms to the stop-word list, and re-vectorized with TF-IDF.
  • Conducted Principal Component Analysis to reduce the number of features from roughly 8,100 to 175 while minimising loss of information.
  • Split the data via stratified splitting to maintain class proportions.
  • Applied Logistic Regression and tuned its hyperparameters to achieve optimal AUC scores on both the train and test sets and to reduce overfitting. Also revised the stop-word list iteratively toward the same goal.
  • Extracted the top 20 keywords associated with each class.
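
The first few steps above can be illustrated with a minimal sketch. The project's actual code is not shown here, so everything below is an assumption made for illustration: the suffix list, the stop-word list, the function names and the plain `tf * log(N / df)` weighting are all hypothetical, and the standard library stands in for whatever NLP and vectorizer libraries were really used. Lemmatization, PCA and the classifier are left out.

```python
import math
import re
from collections import Counter

# Hypothetical suffix list; the text only names LLC, LLP and Inc as examples.
COMPANY_SUFFIXES = ("LLC", "LLP", "Inc")

# Hypothetical starter stop-word list, meant to be extended iteratively
# after inspecting the top-ranking words.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "we", "our", "is", "for"}


def company_type(name):
    """Return the first legal suffix found as a whole word in the name."""
    for token in re.findall(r"[A-Za-z]+", name):
        for suffix in COMPANY_SUFFIXES:
            if token.lower() == suffix.lower():
                return suffix
    return "unknown"


def tokenize(text):
    """Lowercase, split on non-letters and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights


def top_keywords(doc_weights, k=20):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

With this weighting, a word that appears on every scraped page gets a weight of zero, which is the same signal the iterative stop-word review relies on: terms shared by both classes carry no information for telling them apart.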

Predicted the appropriate demand for different products by analysing sources and seller channels, in order to minimise losses incurred due to buyback or wastage. This was my qualifying project for graduating from INSOFE.

Collected S&P 500 stock data by scraping ticker labels from Wikipedia, analysed stock trends and applied machine learning to automate the decision on whether to buy, sell or hold a given company's stock.
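
One common way to frame a buy/sell/hold decision as a supervised learning problem is to label each day by the price change over the following few days. The sketch below shows that idea only; it is not taken from the project, and the function name, the `horizon` of 3 days and the 2% `threshold` are hypothetical choices.

```python
def label_actions(prices, horizon=3, threshold=0.02):
    """Label each day 'buy', 'sell' or 'hold' from the return over `horizon` days.

    Assumed rule (not from the project): a rise above `threshold` is 'buy',
    a fall below -`threshold` is 'sell', anything in between is 'hold'.
    """
    labels = []
    # The last `horizon` days have no future price, so they get no label.
    for i in range(len(prices) - horizon):
        future_return = (prices[i + horizon] - prices[i]) / prices[i]
        if future_return > threshold:
            labels.append("buy")
        elif future_return < -threshold:
            labels.append("sell")
        else:
            labels.append("hold")
    return labels


closing_prices = [100, 101, 102, 105, 100, 97, 101]  # made-up values
print(label_actions(closing_prices))  # ['buy', 'hold', 'sell', 'sell']
```

Labels like these could then serve as the target for a classifier trained on features derived from past prices.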

Contributors:

sn9691
