Comments (3)
Hi Jared, I'd be very surprised if the multithreading caused information leakage, because the different threads are completely independent of each other. Whether or not you use joblib only affects how fast the computations run.
Unfortunately we didn't set a global random seed, so it might be impossible to obtain completely identical results. We did fix the random seed in some key functions that use random operations though.
I can see a few small differences between our notebook and your code. We use a StratifiedShuffleSplit strategy to split the data into training and test sets, whereas you just shuffle the data with NumPy and hence don't necessarily preserve the proportion of samples for each class.
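To illustrate the difference, here is a minimal sketch on made-up, imbalanced labels. The toy data, the 80/20 ratio, and the seed values are assumptions for illustration and are not taken from our notebook. It compares the fraction of positive samples in the test set under a plain NumPy shuffle with the fraction under StratifiedShuffleSplit.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy, imbalanced labels (assumption): 90 negatives and 10 positives.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

# Plain shuffle: the positive fraction in the test set depends on the seed.
rng = np.random.default_rng(0)
perm = rng.permutation(len(y))
plain_test = perm[:20]
plain_pos_fraction = y[plain_test].mean()

# Stratified split: every test set keeps the overall 10% positive fraction.
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
strat_pos_fractions = [y[test_index].mean() for _, test_index in sss.split(X, y)]

print("plain shuffle:", plain_pos_fraction)
print("stratified:   ", strat_pos_fractions)
```

With a plain shuffle the class balance of each split is left to chance, which matters most when one class is rare.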
If you want to reproduce the results but don't want to use multithreading, rather than running 100 different jobs you can just set the n_jobs parameter for the joblib.Parallel call to only use a single thread. Or you can remove the joblib wrapper and run the model like this:
result = []
for train_index, test_index in sss.split(X, y):
    result.append(run_predictor(predictor, X, y, train_index, test_index))
This requires minimal code changes, and I would expect you to get similar performance as we did.
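As a rough, self-contained sketch of both options: the run_predictor below is a hypothetical stand-in (the real function lives in the notebook and may differ), and the toy data, DummyClassifier, and split settings are assumptions chosen only so the example runs on its own.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedShuffleSplit


def run_predictor(predictor, X, y, train_index, test_index):
    """Hypothetical stand-in: fit on the training rows, return test accuracy."""
    predictor.fit(X[train_index], y[train_index])
    return predictor.score(X[test_index], y[test_index])


# Toy data and model (assumptions for illustration only).
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 60 + [1] * 40)
predictor = DummyClassifier(strategy="most_frequent")
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

# Option 1: keep the joblib wrapper, but restrict it to a single worker.
parallel_result = Parallel(n_jobs=1)(
    delayed(run_predictor)(predictor, X, y, train_index, test_index)
    for train_index, test_index in sss.split(X, y)
)

# Option 2: drop joblib and use a plain loop.
loop_result = []
for train_index, test_index in sss.split(X, y):
    loop_result.append(run_predictor(predictor, X, y, train_index, test_index))

print(parallel_result == loop_result)
```

Because the splitter has a fixed integer random_state, both options iterate over the same splits, so the two result lists should match.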
from tcr-classifier.
I am getting the accuracy you report in your paper. I think I see what I did wrong! I had the training and test data flipped in my code. So basically I fit the model with 20% of the data and tested the model with 80% of the data. Now that I am using the data correctly, my accuracy is approximately 75%.
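A small guard along these lines would have caught the swap. It is only a sketch: check_split is a hypothetical helper, not part of tcr-classifier, and it assumes the training set is always meant to be the larger part.

```python
def check_split(train_index, test_index):
    """Hypothetical guard: reject overlapping or apparently swapped splits."""
    train, test = set(train_index), set(test_index)
    if train & test:
        raise ValueError("training and test indices overlap")
    if len(train) <= len(test):
        raise ValueError(
            f"training set ({len(train)}) is not larger than test set "
            f"({len(test)}); are the two swapped?"
        )
    # Fraction of all samples that ends up in the training set.
    return len(train) / (len(train) + len(test))


indices = list(range(100))
train_fraction = check_split(indices[:80], indices[80:])
print(train_fraction)
```

Calling it with the two arguments swapped raises a ValueError instead of silently training on the smaller part.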
Aha, that would indeed influence the results significantly. 🙂 Thank you for tracking that down.