wzchen / stock_market_prediction

Team Buffalox8 predicts directional movement of stock prices.

Home Page: https://sites.google.com/site/predictingstockmovement

Python 100.00%

stock_market_prediction's Introduction

################################################################################

Description

################################################################################

This repository contains our code and process notebook for our analysis and predictive modeling approaches to understanding directional stock movements. We have launched a website to showcase our work at https://sites.google.com/site/predictingstockmovement/

################################################################################

Objective

################################################################################

The objective was to predict the directional movement of a stock on day 10, given the opening, closing, minimum, and maximum prices and the trading volume of the stock over the previous 9 days, plus the stock's opening price on day 10.
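As an illustration of the objective above, here is a minimal sketch of how one stock's daily series could be sliced into 10-day samples. The function name `make_windows`, the column constants, and the synthetic prices are assumptions made for this example and are not taken from the repository; the column order follows the official data description (open, max, min, close, volume).

```python
import numpy as np

# Assumed column order within one day: open, max, min, close, volume.
OPEN, HIGH, LOW, CLOSE, VOLUME = range(5)


def make_windows(series, history=9):
    """Slice one stock's (n_days, 5) array into overlapping samples.

    Features: all 5 values for `history` days plus the opening price of the
    following day. Label: 1 if that day closes above its open, else 0.
    """
    series = np.asarray(series, dtype=float)
    features, labels = [], []
    for start in range(len(series) - history):
        past = series[start:start + history]    # days 1..9
        target_day = series[start + history]    # day 10
        features.append(np.append(past.ravel(), target_day[OPEN]))
        labels.append(int(target_day[CLOSE] > target_day[OPEN]))
    return np.array(features), np.array(labels)


# Tiny synthetic example: 12 days of made-up, normalised values.
rng = np.random.default_rng(0)
demo = rng.uniform(0.9, 1.1, size=(12, 5))
X, y = make_windows(demo)
print(X.shape, y.shape)  # (3, 46) (3,)
```

Each sample has 9 x 5 + 1 = 46 features, which matches the layout of one test.csv line after the segment number and stock ID.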

################################################################################

Team

################################################################################

Team Name: Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo

Team Members: William Chen '14, Sebastian Chiu '14, Salena Cui '15, Carl Gao '15

################################################################################

Result

################################################################################

We submitted our Ridge-Random Forest model to the Boston Data Week hackathon hosted at Hack/Reduce. Information about the competition is available at https://inclass.kaggle.com/c/boston-data-festival-hackathon

We placed 1st out of 21 teams and achieved an AUC of 94.119% on the private leaderboard.
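For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen "up" example receives a higher predicted score than a randomly chosen "down" example. The sketch below computes it by direct pairwise comparison; the function name `auc_score` is an assumption for this example and is not the scoring code used by the competition or by the repository.

```python
import numpy as np


def auc_score(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))


print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise form uses memory proportional to the number of positive-negative pairs, so it is meant for illustration on small inputs rather than for large evaluation sets.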

################################################################################

Files

################################################################################

process.ipynb: notebook describing our work and our main contributions

model_tuner.py or model_tuner.ipynb: finds the parameters for the ridge regression and random forest regression that we used

model_stacker.py or model_stacker.ipynb: stacks our two final models

test.csv and training.csv: official data

predictions/: contains both final submissions; the winning submission is in the stacker directory
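To give a rough idea of what combining a ridge regression with a random forest regression can look like, here is a minimal sketch using scikit-learn. It uses a plain weighted average of the two models' predictions, which is only one possible way to combine them and is not necessarily what model_stacker.py does. The function names, hyperparameter values, blend weight, and synthetic data are all assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge


def fit_models(X, y, alpha=1.0, n_estimators=50, seed=0):
    """Fit the two base regressors on 0/1 targets (hypothetical settings)."""
    ridge = Ridge(alpha=alpha).fit(X, y)
    forest = RandomForestRegressor(
        n_estimators=n_estimators, random_state=seed
    ).fit(X, y)
    return ridge, forest


def blend_predict(models, X, weight=0.5):
    """Weighted average of both predictions, clipped to the 0-1 range."""
    ridge, forest = models
    blended = weight * ridge.predict(X) + (1.0 - weight) * forest.predict(X)
    return np.clip(blended, 0.0, 1.0)


# Synthetic demonstration data, not the competition data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = (X_demo[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)

models = fit_models(X_demo[:150], y_demo[:150])
preds = blend_predict(models, X_demo[150:])
print(preds.shape)
```

Clipping is needed because ridge regression on 0/1 targets can produce values outside the 0-1 range that the competition expects.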

################################################################################

Official Data Description

################################################################################

training.csv - time series for 94 stocks (94 rows). The first number in each row is the stock ID, followed by data for 500 days. The data for each day contain, in order: the day's opening price, maximum price, minimum price, closing price, and trading volume. Price data are normalised to the first day's opening price.

test.csv - data for creating predictions. Data are provided for 25 time segments. Each segment contains data for the same 94 stocks. Each segment has opening, maximum, minimum, closing, and volume data for 9 days, plus the opening price for day 10. Each line of the file starts with the segment number, followed by the stock ID and then the price and volume data, organized by day in the same way as the training set. Price data are normalised to the first day's opening price.

Each line in training.csv and test.csv contains consecutive trading days. Days when the market was closed are excluded, so day N may be a Friday and day N+1 a Monday, or even a Tuesday if Monday was a holiday.

Value to predict - the probability of the stock moving up from the opening of day 10 to the closing of day 10. Predictions should be in the 0-1 range, where 1 means "the stock will surely go up" and 0 means "the stock will surely go down".

The test set is randomly sampled, without overlap, from the year following the training data time period.
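The row layouts described above can be unpacked roughly as follows. This is a sketch that works on rows already split into values; the function names are assumptions for this example, and it makes no claim about the files' delimiter or whether they have a header line, since the description does not say.

```python
import numpy as np

VALUES_PER_DAY = 5  # open, max, min, close, volume


def parse_training_row(row):
    """Return (stock_id, days), where days has shape (n_days, 5)."""
    stock_id = int(float(row[0]))
    days = np.asarray(row[1:], dtype=float).reshape(-1, VALUES_PER_DAY)
    return stock_id, days


def parse_test_row(row):
    """Return (segment, stock_id, history, day10_open).

    history has shape (9, 5); day10_open is the last value on the line.
    """
    segment, stock_id = int(float(row[0])), int(float(row[1]))
    values = np.asarray(row[2:], dtype=float)
    history = values[:-1].reshape(-1, VALUES_PER_DAY)
    return segment, stock_id, history, float(values[-1])


# Made-up rows that only mimic the documented layout.
train_row = [7] + [1.0, 1.2, 0.9, 1.1, 5000.0] * 3       # stock 7, 3 days
test_row = [2, 7] + [1.0, 1.2, 0.9, 1.1, 5000.0] * 9 + [1.05]

stock_id, days = parse_training_row(train_row)
segment, test_stock, history, day10_open = parse_test_row(test_row)
print(days.shape, history.shape, day10_open)  # (3, 5) (9, 5) 1.05
```

With the real training.csv, `days` would have shape (500, 5) for each of the 94 rows.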

stock_market_prediction's Issues

test.csv 25 time segments question

@wzchen how are these calculated? I thought it was the past 9 days, predicting the 10th, but I have just been informed that there are 25 time segments. How are you calculating this, and with what sort of data?

thanks

IndexError: too many indices for array

Algorithm works as expected when dataframe is reversed by date

Hi, I tried your scripts and got 50% AUC. If I try with a reversed dataframe, it works as expected and matches your results. Did I do something wrong in my code?

import pandas as pd
import numpy as np
from Score import auc
from sklearn.ensemble import RandomForestClassifier,AdaBoostClassifier,VotingClassifier
from sklearn.metrics import accuracy_score,classification_report
def MakeDataframe(csv,reverse):
    df = pd.read_csv(csv)
    if reverse:
        df = df[::-1]
    df = df[['open','close']]
    #print df.head(12)
    return df.values

def Window(dataset):
    dataX,dataY = [],[]
    for i in range(len(dataset) - 9):
        item = dataset[i:(i+10),0:2]
        item = item.ravel()
        if item[19] >= item[18]:
            target = 1
        else:
            target = 0
        dataY.append(target)
        item = item[:19]
        item = item / item[0]
        dataX.append(item)

    return np.array(dataX), np.array(dataY)

reverse = True
train = MakeDataframe('./CSV/EURUSD-2016-10-1W_1M.csv',reverse)
test = MakeDataframe('./CSV/EURUSD-2016-10-2W_1M.csv',reverse)

X,Y = Window(train)
X_test,Y_test = Window(test)

Y = Y.astype('int')
Y_test = Y_test.astype('int')
model = RandomForestClassifier(n_jobs=-1,n_estimators=100,oob_score=True,verbose=False)
#model2 = AdaBoostClassifier()

#model = VotingClassifier([('RF',model1),('Ada',model2)],n_jobs=-1,voting='soft')
model.fit(X,Y)

Y_pred = model.predict_proba(X_test)[:,1]
predicted = model.predict(X_test)
score = accuracy_score(Y_test,predicted)
report = classification_report(Y_test,predicted)
print(score)
print(auc(Y_test,Y_pred))
print(report)
print(model.oob_score_)

I also attach two CSV files I used in the script.

Thank you so much and best regards

CSV.zip

Hi, I found a problem with your data.

Hi, we're graduate students from Dalian University of Technology. We are doing a research project related to stock prediction and have incorporated the method described in this GitHub project, but we ran into some issues, described below.

We repeated the experiment following your method but could not get the expected result of about 90% AUC; we only get an accuracy of about 50%.
We analysed our results and found that if we reverse the data set by date, that is, use the stock data of the next 9 days to predict the previous day, we get the expected result, matching yours.

So, could you give us any suggestions about this issue? We are waiting for your response.
My email is: [email protected]
Wechat account is: gkdlll

Thanks and best regards.
