lorransutter / predictstock-svm

ML model for stock trend prediction using Python

License: GNU General Public License v3.0

Python 100.00%
stock-trend-prediction support-vector-machine python3 sklearn machine-learning random-forest kfold-cross-validation numpy pandas matplotlib

Predict Stock

Classification model for predicting stock market trends, based on machine learning techniques such as Extremely Randomized Trees, K-Means, Support Vector Machines and K-Fold Cross-Validation.

Code presented as part of the Capstone Project in Computational Engineering at the Universidade Federal de Juiz de Fora.

Problem presentation  |   Dependencies  |   How to run  |   Technologies  |   References  |   Credits  

📈 Problem presentation

This project demonstrates an application of machine learning methods to predicting stock market oscillations. Several techniques are combined to build a more robust model and improve prediction accuracy.

The pipeline and a short description of the employed methods are as follows:

  1. Data acquisition: Acquire historical stock prices using pandas-datareader.

  2. Data preparation: Remove missing and unnecessary data using pandas.

  3. Apply indicators: Apply financial indicators to the collected data using pandas.

  4. Feature selection: Extremely Randomized Trees

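    Supervised method used to solve classification and regression problems. It is a variation of the classic Random Forests that adds more randomization to node splitting and changes how training sets are chosen: split thresholds are drawn at random, and each tree is typically grown on the whole training set rather than on a bootstrap sample. The random splits aim to reduce the variance of the model and the use of the whole training set aims to reduce its bias, alleviating overfitting and underfitting, respectively.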

    In the present problem, this method was used as a feature selector, measuring the importance of each financial indicator in the prediction.

  5. Clustering: K-Means

    Unsupervised method used for partitioning or clustering, which organizes the elements of a set into groups (clusters) so that the elements within each group resemble each other. The number of clusters must be defined in advance and is the starting point of the method.

    This method was employed to cluster the data and reduce the number of support vectors in the next step.

  6. Classification: Support Vector Machines

    Supervised method used to solve classification and regression problems with linear or nonlinear data. It aims to find the hyperplane that separates the training samples into their respective classes with the largest margin.

    This is the main step of the pipeline, where each predicted class stands for an upward or downward stock oscillation.

  7. Parameter tuning: K-Fold Cross-validation

    Finally, we need a method to evaluate the parameters of the chosen model and identify their best combination.

    This method randomly splits the data set into K subsets. In each iteration, one subset is used for testing and the remaining K-1 subsets are used for training, which makes it possible to measure the accuracy and tune the parameters.
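Steps 2 and 3 above can be sketched with pandas on synthetic data. This is a minimal sketch, not the repository's code: the column names `Close` and `Volume`, the helper names `prepare` and `add_indicators`, the two indicators (a simple moving average and momentum) and the 5-day window are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def prepare(prices: pd.DataFrame) -> pd.DataFrame:
    """Step 2 (assumed form): keep the closing price and drop missing rows."""
    return prices[["Close"]].dropna()


def add_indicators(data: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Step 3 (assumed form): add two illustrative indicators and a label."""
    out = data.copy()
    # Simple moving average of the closing price over `window` rows.
    out["sma"] = out["Close"].rolling(window).mean()
    # Momentum: change in closing price over the last `window` rows.
    out["momentum"] = out["Close"].diff(window)
    # Label: 1 if the next close is higher than the current one, else 0.
    out["trend"] = (out["Close"].shift(-1) > out["Close"]).astype(int)
    # Drop the warm-up rows and the last row, which has no next close.
    return out.iloc[window:-1]


# Synthetic random-walk prices stand in for downloaded stock history.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=60, freq="D")
prices = pd.DataFrame(
    {
        "Close": 100 + rng.normal(0, 1, 60).cumsum(),
        "Volume": rng.integers(1_000, 5_000, 60),
    },
    index=dates,
)
prices.iloc[10, 0] = np.nan  # simulate one missing closing price

features = add_indicators(prepare(prices))
print(features.head())
```

The resulting `trend` column plays the role of the upward or downward class that the later steps try to predict.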
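Steps 4 to 7 can be sketched with scikit-learn. This is a minimal sketch under stated assumptions rather than the project's implementation: the data comes from `make_classification`, the four most important features are kept, step 5 is read as replacing each class by 20 K-Means centroids before training, and the parameter grid is arbitrary. The helper `reduce_by_kmeans` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the indicator matrix X and the up/down labels y.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 4: rank features by Extra-Trees importance and keep the top four.
forest = ExtraTreesClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
top = np.argsort(forest.feature_importances_)[::-1][:4]
X_train_sel, X_test_sel = X_train[:, top], X_test[:, top]


def reduce_by_kmeans(X, y, n_clusters=20):
    """Step 5 (assumed reading): represent each class by its centroids."""
    centers, labels = [], []
    for cls in np.unique(y):
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        km.fit(X[y == cls])
        centers.append(km.cluster_centers_)
        labels.append(np.full(n_clusters, cls))
    return np.vstack(centers), np.concatenate(labels)


X_red, y_red = reduce_by_kmeans(X_train_sel, y_train)

# Steps 6 and 7: tune an RBF SVM on the reduced set with 5-fold CV.
search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_red, y_red)

# Evaluate the refitted best model on the held-out samples.
test_accuracy = search.score(X_test_sel, y_test)
print(search.best_params_, search.best_score_, test_accuracy)
```

Training on centroids shrinks the SVM training set from 300 samples to 40 points, which bounds the number of support vectors. Whether accuracy holds up depends on the data, so the held-out score is worth checking.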

📝 Dependencies

Besides Python itself, you will need the NumPy library for numerical operations, the Matplotlib library for plotting, pandas and pandas-datareader to handle datasets, and scikit-learn to run the machine learning algorithms.

You may install all dependencies with the following command:

pip3 install numpy matplotlib pandas pandas-datareader scikit-learn

🏃 How to run

After installing the dependencies, open a terminal in the folder where you want to clone the project:

git clone https://github.com/LorranSutter/PredictStock-SVM.git

First, you need to acquire the stock data. The following command uses the file db/NASDAQ.csv as the reference list of stocks to download. If you do not want data for all the available stocks, edit the file and remove the unwanted ones.

python3 initGetData.py
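Trimming the reference file can also be scripted before running the command above. The following is a minimal sketch that assumes the listing has a ticker column named `Symbol`. That name is a guess, so check the actual header of db/NASDAQ.csv first. The helper `keep_tickers` is hypothetical, and the demo runs on a made-up three-row listing.

```python
import tempfile
from pathlib import Path

import pandas as pd


def keep_tickers(csv_in, csv_out, wanted, column="Symbol"):
    """Write a copy of the listing that contains only the wanted tickers.

    `column` is an assumed header name; adjust it to the real file.
    """
    listing = pd.read_csv(csv_in)
    kept = listing[listing[column].isin(wanted)]
    kept.to_csv(csv_out, index=False)
    return kept


# Demo on a tiny made-up listing; the real input would be db/NASDAQ.csv.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "listing.csv"
    source.write_text("Symbol,Name\nAAPL,Apple\nMSFT,Microsoft\nTSLA,Tesla\n")
    trimmed = keep_tickers(source, Path(tmp) / "trimmed.csv", ["AAPL", "TSLA"])
    print(trimmed["Symbol"].tolist())
```

Writing to a separate output file keeps the original listing intact, so it is easy to restore the full list later.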

After the stock data has been acquired, the results are stored in the db/stocks folder. You may then run the main code, setting the variable ticker inside the code to the desired ticker.

python3 main.py

💻 Technologies

  • Python - interpreted, high-level, general-purpose programming language
  • Pandas - data analysis and manipulation tool
  • Pandas datareader - data access for pandas
  • Sklearn - machine learning library
  • NumPy - general-purpose array-processing package
  • Matplotlib - plotting library for Python

📖 Main references

  • VO, V.; LUO, J.; VO, B. Time series trend analysis based on k-means and support vector machine. v. 35, p. 111–127, 1 2016
  • LEE, M.-C. Using support vector machine with a hybrid feature selection method to the stock trend prediction. Expert Systems with Applications, v. 36, n. 8, p. 10896–10904, 2009. ISSN 0957-4174
  • XU, Y.; LI, Z.; LUO, L. A study on feature selection for trend prediction of stock trading price. Jun 2013
  • LIMA, M. L. Um modelo para predição de bolsa de valores baseado em mineração de opinião, 2016. Dissertação de Mestrado (Programa de Pós-Graduação em Engenharia de Eletricidade), UFMA (Universidade Federal do Maranhão), São Luís, Brasil

🍪 Credits

Thanks to Bruno Franca for the indicators implementation in pandasImpl.py.
