Use NLP to extract features from EXRX.net exercise information and classify the exercises into 8 muscle groups. Models are trained with XGBoost, Word2Vec, and TensorFlow.
- Bodybuilding.com web crawler: 67k data entries, 440 MB
- EXRX.net web crawler: 308 data entries, 360 kB
- Chest
- Biceps
- Triceps
- Glutes
- Back
- Leg
- Abs
- Shoulders
- Remove numbers, punctuation, and `https://` links
- `TfidfVectorizer` (sklearn)
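
The cleaning step above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function name `clean_text` is hypothetical, and lowercasing and dropping the whole URL (rather than only the `https://` prefix) are assumptions.

```python
import re

# Matches http:// or https:// followed by the rest of the URL (assumption:
# the whole link is dropped, not only the scheme prefix).
URL_RE = re.compile(r"https?://\S*")
# Matches anything that is not an ASCII lowercase letter or whitespace,
# which covers digits and punctuation (and also non-ASCII characters).
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> str:
    """Lowercase, drop URLs, digits, and punctuation, then collapse whitespace."""
    text = text.lower()  # assumption: case is not informative here
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())
```

The cleaned strings would then be passed to the vectorizer to produce TF-IDF features.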
- Benchmark: random forest
- XGBoost: gradient-boosted trees
- TensorFlow NN: single-layer neural network
- TensorFlow RNN: recurrent neural network
- Word2Vec: gensim Word2Vec model
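
As an illustration of the benchmark, a TF-IDF plus random forest classifier can be assembled with scikit-learn as sketched below. The example sentences and labels are made up for this sketch and are not taken from the crawled data; the hyperparameters (`n_estimators=100`, `random_state=0`) are likewise assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy, invented examples for illustration only (not the EXRX.net data).
texts = [
    "lie on bench and press barbell upward from chest",
    "push body up from floor until arms are straight",
    "curl dumbbell toward shoulder by bending elbow",
    "curl barbell upward with underhand grip",
    "squat down until thighs are parallel to floor",
    "step forward and lower body into lunge",
]
labels = ["Chest", "Chest", "Biceps", "Biceps", "Leg", "Leg"]

# TF-IDF features feed directly into the random forest benchmark.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(texts, labels)

# Predicting on the training texts only checks that the pipeline runs;
# a held-out split is needed for a meaningful accuracy figure.
predictions = model.predict(texts)
```

The same TF-IDF features can be fed to the other models in the list so that their scores are comparable with the benchmark.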
- sklearn
- numpy
- pandas
- xgboost
- tensorflow
- json
- gensim
The Bodybuilding.com data proved too generic for training, so we opted to use the EXRX.net data. The TensorFlow neural network reaches 93% accuracy on its own data set. The next step is to extend the model so that it can take potential user questions and classify them into the 8 categories. For this we use a Word2Vec model to compute the similarity between the question sentence and the muscle names. The accuracy of this approach is still unclear.
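
One way to picture the similarity step is to average the word vectors of a question and pick the muscle name with the highest cosine similarity. The sketch below is an assumption about how this could work, not the project's implementation: the 3-dimensional vectors are invented, only three of the eight groups are shown, and the helper names are hypothetical. In practice the vectors would come from the trained gensim model.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration.
# Real vectors would be looked up from the trained Word2Vec model.
EMBEDDINGS = {
    "chest": [0.9, 0.1, 0.0],
    "bench": [0.8, 0.2, 0.1],
    "press": [0.7, 0.3, 0.1],
    "biceps": [0.1, 0.9, 0.0],
    "curl": [0.2, 0.8, 0.1],
    "leg": [0.0, 0.1, 0.9],
    "squat": [0.1, 0.2, 0.8],
}
MUSCLE_GROUPS = ["chest", "biceps", "leg"]  # subset of the 8 groups


def sentence_vector(words):
    """Average the vectors of known words; return None if no word is known."""
    known = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_question(question):
    """Return the muscle group whose name vector is closest to the question."""
    vec = sentence_vector(question.lower().split())
    if vec is None:
        return None  # no known words, so no prediction
    return max(MUSCLE_GROUPS, key=lambda m: cosine(vec, EMBEDDINGS[m]))
```

Measuring the accuracy of this step would require a set of questions labeled with the correct muscle group.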