Because the datasets are extremely large, we do not upload them to GitHub. The structure of the whole project folder in our working environment (remote Ubuntu) is as follows:
code/someCode
data/mr_1_loc
data/raw
Here, data/raw contains the raw datasets downloaded from Kaggle, and data/mr_1_loc contains the smaller dataset components split by code/splitByLoc.sh. To run code/splitByLoc.sh correctly, first set the working directory to 605project.
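The folder setup described above can be sketched in shell as follows. This is only a minimal sketch under stated assumptions: the repository is assumed to be already cloned into ./605project, the variable name PROJECT_DIR is hypothetical, and code/splitByLoc.sh is assumed to take no arguments and to accept a pre-existing output folder. The raw Kaggle files still have to be placed in data/raw by hand.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: the repository has already been cloned into ./605project.
PROJECT_DIR="605project"   # hypothetical variable name, for illustration only

# The data folders are not part of the repository, so create them locally.
mkdir -p "$PROJECT_DIR/data/raw" "$PROJECT_DIR/data/mr_1_loc"

# Copy the raw Kaggle files into data/raw before running the split.

# Run the split script with 605project as the working directory.
# The subshell keeps the caller's working directory unchanged.
(
    cd "$PROJECT_DIR"
    if [ -f code/splitByLoc.sh ]; then
        bash code/splitByLoc.sh   # assumed to take no arguments
    else
        echo "code/splitByLoc.sh not found; clone the repository first" >&2
    fi
)
```

Running the script inside a subshell means the surrounding session stays in its original directory, so a later change to 605project/code is still an explicit step.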
To submit the parallel computation correctly, run the command after setting the working directory to 605project/code.
Download this repository with: git clone https://github.com/YezhouLi/605project.git
Yezhou Li: [email protected], github link: https://github.com/YezhouLi
Xiaoxiang Hua: [email protected], github link: https://github.com/tomtomhua
Qiaoyu Wang: [email protected], github link: https://github.com/silencekk1
Lu Chen: [email protected], github link: https://github.com/LuChen525
Zheng Ni: [email protected], github link: https://github.com/TroubleZN