rlew631 / nlp_stock_analysis

Used NLP techniques to measure the sentiment of, and similarity between, the 10-Ks that publicly traded companies file annually. These similarity scores were used as metrics for predicting subsequent price movements.

ciks tickers stock nlp sentiment-analysis similarity-score

NLP Stock Analysis

Project Overview

This project builds on the Lazy Prices paper by Lauren Cohen of HBS (with co-authors Christopher Malloy and Quoc Nguyen), which argued that large changes in the content of a company's periodic filings indicate that its stock is likely to drop.

This is based on the theory that companies report the bare minimum required by the SEC, so any change to a filing is more likely than not a disclosure of risk to stakeholders. If that holds, one could build a profitable trading strategy by shorting companies whose filings change substantially compared to their typical reporting pattern.
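The trading rule described above can be sketched as follows. This is a hypothetical illustration, not the project's code: the function name short_candidates, the choice of a z-score against each company's own similarity history, and the default thresholds are all assumptions made for the example.

```python
from statistics import mean, stdev


def short_candidates(similarity_history, z_threshold=-1.5, min_history=3):
    """Flag companies whose newest filing changed far more than usual (hypothetical rule).

    similarity_history maps a CIK to a chronological list of similarity
    scores, each comparing one filing with the company's previous filing
    (1.0 = unchanged). A company is flagged when its newest score lies at
    least |z_threshold| sample standard deviations below the mean of its
    earlier scores. Both thresholds are illustrative, not tuned values.
    """
    needed = max(min_history, 2)  # stdev() needs at least two earlier scores
    flagged = {}
    for cik, scores in similarity_history.items():
        if len(scores) < needed + 1:
            continue  # not enough history to define a "typical" pattern
        history, latest = scores[:-1], scores[-1]
        sigma = stdev(history)
        if sigma == 0:
            continue  # a perfectly constant history gives no scale for a z-score
        z = (latest - mean(history)) / sigma
        if z <= z_threshold:
            flagged[cik] = z  # candidate for a short position under this rule
    return flagged
```

For example, a company whose scores run 0.95, 0.96, 0.94, 0.95 and then fall to 0.70 is flagged, while one hovering around 0.91 to 0.93 is not. Ranking companies against each other instead of against their own history would be an equally reasonable way to define "substantial change".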

The following steps were taken to generate the data required for testing Cohen's hypothesis:

scrape_and_score.ipynb was used for:

  • downloading the lists of tickers from NASDAQ, NYSE, and AMEX, along with the corresponding CIKs (Central Index Keys, the SEC's identifiers for filers)
  • scraping the filings from the SEC (only the 10-Ks were downloaded for this project)
  • preprocessing: converting the HTML filings to plain text
  • computing cosine similarity and Jaccard similarity scores
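The preprocessing and scoring steps above can be sketched with the standard library alone. The notebook's exact cleaning and vectorization are not described here, so this is a minimal sketch under stated assumptions: html.parser strips tags (skipping script and style contents), tokens are lowercase alphabetic runs, cosine similarity uses raw term counts, and Jaccard similarity uses sets of distinct words. Each filing would be compared with the same company's previous 10-K.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Strip tags and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def tokenize(text):
    # Assumed tokenization: lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between raw term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the intersection over size of the union of distinct words."""
    a, b = set(tokens_a), set(tokens_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With a previous filing reading "Risk factors: competition." and a new one reading "Risk factors: competition and litigation.", the Jaccard score is 3/5 = 0.6 and the cosine score is 3/√15 ≈ 0.775. Cosine weights repeated words, while Jaccard only notices whether a word appears at all, which is one reason to compute both.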

process_scores.ipynb was used for:

  • searching the 10-K folders to see which CIKs from our stock list were processed
  • adding the filing dates to the scores
  • generating a list of CIKs, tickers, scores, and date ranges
  • creating final_ciks.txt, which contains the CIKs worth examining in further analysis
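The joining steps above might look roughly like this. It is a hypothetical sketch: the input layout (filing dates and scores kept in separate per-CIK lists), the rule that each score covers the range from the previous filing to the current one, and the min_scores criterion for "worth examining" are all assumptions, since the README does not state how final_ciks.txt was filtered.

```python
from collections import Counter
from pathlib import Path


def build_score_table(filing_dates, scores, cik_to_ticker):
    """Join CIKs, tickers, scores, and the date range each score covers.

    filing_dates:  CIK -> chronologically sorted filing dates (ISO strings here).
    scores:        CIK -> similarity scores, where scores[i] compares the filing
                   on filing_dates[i] with the one on filing_dates[i + 1].
    cik_to_ticker: CIK -> ticker for the companies in our stock list.
    """
    rows = []
    for cik, dates in filing_dates.items():
        ticker = cik_to_ticker.get(cik)
        cik_scores = scores.get(cik, [])
        # Skip CIKs outside the stock list, or whose scores don't line up with filings.
        if ticker is None or len(cik_scores) != len(dates) - 1:
            continue
        for start, end, score in zip(dates, dates[1:], cik_scores):
            rows.append((cik, ticker, score, start, end))
    return rows


def write_final_ciks(rows, path, min_scores=2):
    """Write one CIK per line for companies with at least min_scores scores (assumed criterion)."""
    counts = Counter(row[0] for row in rows)
    keep = sorted(cik for cik, n in counts.items() if n >= min_scores)
    Path(path).write_text("".join(f"{cik}\n" for cik in keep))
    return keep
```

Requiring several scores per company keeps only CIKs with enough filing history to say what their "typical reporting pattern" is.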

Additional exploratory data analysis:

topic_modeling.ipynb was used for:

  • aggregating all the 10-Ks
  • cleaning and lemmatizing the documents
  • topic modeling using latent Dirichlet allocation (LDA)
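The README does not say which LDA implementation the notebook uses. To show what LDA estimates, here is a minimal collapsed Gibbs sampler in pure Python; it is a teaching sketch, and a real run over thousands of 10-Ks would use a library implementation such as gensim's LdaModel or scikit-learn's LatentDirichletAllocation. The stopword list is a hypothetical placeholder, and lemmatization is omitted.

```python
import random
import re

# Hypothetical, deliberately tiny stopword list; a real run would use a full list.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "our", "we", "is", "for", "on", "or", "be", "by"}


def clean(document):
    """Lowercase, keep alphabetic tokens of 3+ letters, drop stopwords (no lemmatization)."""
    return [t for t in re.findall(r"[a-z]{3,}", document.lower()) if t not in STOPWORDS]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs is a list of token lists. Returns (vocab, phi), where phi[k][v] is the
    smoothed probability of word vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]        # n_dk: tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]   # n_kw: times word w is assigned to topic k
    topic_total = [0] * K                      # n_k: tokens assigned to topic k
    assignments = []
    for d, doc in enumerate(docs):             # random initial topic for every token
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)] for k in range(K)]
    return vocab, phi


def top_words(vocab, phi, n=10):
    """The n most probable words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]] for row in phi]
```

Calling top_words(vocab, phi) on the result lists the highest-probability words per topic, which is usually how LDA topics are inspected and labeled by hand.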
