
stock_market_prediction's Introduction

################################################################################

Description

################################################################################

This repository contains our code and process notebook for analyzing and predicting directional stock movements. We have launched a website to showcase our work at https://sites.google.com/site/predictingstockmovement/

################################################################################

Objective

################################################################################

The objective was to predict the directional movement of a stock on day 10, given the opening, closing, minimum, and maximum prices and the volume of the stock over the previous 9 days, plus the opening price of the stock on day 10.

################################################################################

Team

################################################################################

Team Name: Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo

Team Members: William Chen '14, Sebastian Chiu '14, Salena Cui '15, Carl Gao '15

################################################################################

Result

################################################################################

We submitted our Ridge-Random Forest model to the Boston Data Week hackathon hosted at Hack/Reduce. Information about the competition is available at https://inclass.kaggle.com/c/boston-data-festival-hackathon

We placed 1st out of 21 teams and achieved a 94.119% AUC on the private leaderboard.

################################################################################

Files

################################################################################

process.ipynb: notebook describing our work and our main contributions.

model_tuner.py or model_tuner.ipynb: finds the parameters for the ridge regression and random forest regression that we used.

model_stacker.py or model_stacker.ipynb: stacks our two final models.

test.csv and training.csv: the official data.

predictions/: contains both final submissions. The winning submission is in the stacker directory.

################################################################################

Official Data Description

################################################################################

training.csv - time series for 94 stocks (94 rows). The first number in each row is the stock ID, followed by data for 500 days. The data for each day contain the day's opening price, maximum price, minimum price, closing price, and trading volume. Price data are normalised to the first-day opening price.
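As a rough illustration of the row layout described above, the sketch below splits one row into a stock ID and per-day records. The comma delimiter, the absence of a header row, and the names `DayBar` and `parse_training_row` are assumptions made for illustration; they are not confirmed by the official description.

```python
import csv
import io
from typing import List, NamedTuple, Tuple


class DayBar(NamedTuple):
    """One trading day, in the field order given in the data description."""
    open: float
    high: float    # day maximum price
    low: float     # day minimum price
    close: float
    volume: float


FIELDS_PER_DAY = 5


def parse_training_row(row: List[str]) -> Tuple[int, List[DayBar]]:
    """Split one row into (stock_id, [DayBar, ...])."""
    stock_id = int(float(row[0]))
    values = [float(v) for v in row[1:]]
    if len(values) % FIELDS_PER_DAY != 0:
        raise ValueError("row length is not a whole number of days")
    days = [DayBar(*values[i:i + FIELDS_PER_DAY])
            for i in range(0, len(values), FIELDS_PER_DAY)]
    return stock_id, days


# Tiny made-up row: stock 7 with two days of data (not real data).
sample = "7,1.0,1.2,0.9,1.1,5000,1.1,1.3,1.0,1.25,6200\n"
stock_id, days = parse_training_row(next(csv.reader(io.StringIO(sample))))
print(stock_id, len(days), days[1].close)
```

A real row would yield 500 `DayBar` records; the length check is there to catch a wrong delimiter or a stray column early.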

test.csv - data to create predictions. Data are provided for 25 time segments, and each segment contains data for the same 94 stocks. Each segment has opening, max, min, closing, and volume data for 9 days, plus the opening price for day 10. Each line of the file starts with the segment number, followed by the stock ID and then the price and volume data, organized by day in the same way as the training set. Price data are normalised to the first-day opening price.
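One way to make the 500-day training series resemble a test segment is to slide a 10-day window over it and re-normalise prices to the window's first-day open. The sketch below assumes each day is an `(open, max, min, close, volume)` tuple and that volume is left unscaled. `make_window` is a hypothetical helper, not a function from this repository.

```python
from typing import List, Sequence, Tuple

Day = Tuple[float, float, float, float, float]  # open, max, min, close, volume


def make_window(days: Sequence[Day], start: int) -> Tuple[List[float], float]:
    """Return (features, day10_close) for the 10 days beginning at `start`.

    Features are 9 full days plus the day-10 open, with prices divided by
    the window's first-day open. Volume is left as-is (an assumption).
    """
    window = days[start:start + 10]
    if len(window) < 10:
        raise ValueError("need 10 consecutive trading days")
    base = window[0][0]
    features: List[float] = []
    for o, hi, lo, c, vol in window[:9]:
        features.extend([o / base, hi / base, lo / base, c / base, vol])
    features.append(window[9][0] / base)    # day-10 open
    return features, window[9][3] / base    # day-10 close is held out


# Made-up series: prices rise by 1 each day, flat volume.
series = [(10.0 + d, 10.5 + d, 9.5 + d, 10.2 + d, 1000.0) for d in range(12)]
feats, close10 = make_window(series, start=0)
print(len(feats), feats[0], feats[-1])
```

Each window gives 46 features (9 days of 5 values plus one open), which matches the shape of a test line once the segment number and stock ID are set aside.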

Each line in training.csv and test.csv contains consecutive trading days. Days when the market was closed are excluded, so day N may be a Friday and day N+1 a Monday, or even a Tuesday if Monday was a holiday.

Value to predict - the probability of the stock moving up from the opening of day 10 to the closing of day 10. Predictions should be in the 0-1 range, where 1 means "stock surely will go up" and 0 means "stock surely will go down".
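For concreteness, the up/down label and an AUC score for probabilistic predictions can be sketched as follows. The label rule treats a close equal to the open as "not up", which is an assumption, since the description does not say how ties are handled. The `auc` function is a plain pairwise implementation written for illustration, not the competition's scoring code.

```python
from typing import Sequence


def moved_up(day10_open: float, day10_close: float) -> int:
    """1 if the stock closed above its day-10 open, else 0 (ties count as 0)."""
    return int(day10_close > day10_open)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random 'up' case is scored above a random 'down' case.

    Tied scores count as half a win. O(n_pos * n_neg), fine for small inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [moved_up(1.00, 1.02), moved_up(1.00, 0.97), moved_up(1.00, 1.01)]
scores = [0.9, 0.2, 0.4]   # made-up predicted probabilities of an up move
print(labels, auc(labels, scores))
```

Because AUC depends only on how the predictions are ordered, any monotonic rescaling of the scores leaves it unchanged.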

The test set is randomly sampled, without overlap, from the year following the training data time period.


stock_market_prediction's Issues

test.csv 25 time segments question

@wzchen how are these calculated? I thought the model used the past 9 days to predict the 10th, but I have just been told that the test set has 25 time segments. How are you calculating the predictions, and what sort of data are you using?

thanks

Algorithm works as expected when dataframe is reversed by date

Hi, I tried your scripts and got 50% AUC. If I run them on the reversed dataframe, they work as expected and match your results. Did I do something wrong in my code?

import pandas as pd
import numpy as np
from Score import auc  # auc comes from a module named Score that is not shown in this post
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, VotingClassifier
from sklearn.metrics import accuracy_score, classification_report


def MakeDataframe(csv, reverse):
    """Load a CSV and return its open/close columns as a 2-D array."""
    df = pd.read_csv(csv)
    if reverse:
        df = df[::-1]  # reverse the row order
    df = df[['open', 'close']]
    # print(df.head(12))
    return df.values


def Window(dataset):
    """Build 10-row sliding windows: 19 features plus an up/down target."""
    dataX, dataY = [], []
    for i in range(len(dataset) - 9):
        # Flatten 10 rows into [open1, close1, ..., open10, close10].
        item = dataset[i:(i + 10), 0:2].ravel()
        # Target: 1 if the 10th row closes at or above its open.
        target = 1 if item[19] >= item[18] else 0
        dataY.append(target)
        item = item[:19]       # drop the 10th close from the features
        item = item / item[0]  # normalise to the first open
        dataX.append(item)

    return np.array(dataX), np.array(dataY)


reverse = True
train = MakeDataframe('./CSV/EURUSD-2016-10-1W_1M.csv', reverse)
test = MakeDataframe('./CSV/EURUSD-2016-10-2W_1M.csv', reverse)

X, Y = Window(train)
X_test, Y_test = Window(test)

Y = Y.astype('int')
Y_test = Y_test.astype('int')
model = RandomForestClassifier(n_jobs=-1, n_estimators=100, oob_score=True, verbose=False)
# model2 = AdaBoostClassifier()

# model = VotingClassifier([('RF', model1), ('Ada', model2)], n_jobs=-1, voting='soft')
model.fit(X, Y)

Y_pred = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1 (up)
predicted = model.predict(X_test)
score = accuracy_score(Y_test, predicted)
report = classification_report(Y_test, predicted)
print(score)
print(auc(Y_test, Y_pred))
print(report)
print(model.oob_score_)

I have also attached the two CSV files I used in the script.

Thank you so much and best regards

CSV.zip

Hi, I found a problem with your data.

Hi, we are graduate students from Dalian University of Technology. We are doing a research project related to stock prediction and have incorporated the method described in this GitHub project. However, we ran into some issues, described below.

  1. We repeated the experiment following your method but could not reproduce the expected result of about 90% AUC; we only get an accuracy of about 50%.
  2. We analysed our results and found that if we reverse the data set by date, that is, use the stock data of the next 9 days to predict the previous day, we get the expected result, matching yours.

So, could you give us any suggestions about these issues? We look forward to your response.
My email is: [email protected]
Wechat account is: gkdlll

Thanks and best regards.
