mitchellspryn / kagglequoraquestionsimilarity
GitHub repo for the Kaggle Quora question similarity problem
Get the following information:
* Label distribution
* Number of unique words
* Average sentence length
* Word distributions
The notebook should be generic so it can be used after cleaning is done.
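The statistics listed above could be gathered along these lines. This is a minimal sketch, not the notebook's actual code: it assumes each row is a (question1, question2, is_duplicate) tuple, and `tokenize` and `summarize` are hypothetical names. The tokenizer is a plain whitespace split so that a real cleaning step can be swapped in later.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (placeholder for the real cleaning step)."""
    return text.lower().split()


def summarize(rows):
    """Summarize an iterable of (question1, question2, is_duplicate) tuples."""
    label_counts = Counter()
    word_counts = Counter()
    lengths = []
    for q1, q2, label in rows:
        label_counts[label] += 1
        for question in (q1, q2):
            tokens = tokenize(question)
            word_counts.update(tokens)
            lengths.append(len(tokens))
    return {
        "label_distribution": dict(label_counts),
        "unique_words": len(word_counts),
        "average_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "word_distribution": word_counts,
    }


# Tiny made-up sample, for illustration only.
sample = [
    ("How do I learn Python", "How can I learn Python", 1),
    ("What is an LSTM", "Where is Paris", 0),
]
stats = summarize(sample)
```

Because `summarize` only depends on `tokenize`, rerunning it after cleaning means replacing that one function.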
Investigate using an LSTM for question similarity. Start with one-hot encoding until the word embedding investigation is done.
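As a rough illustration of the one-hot starting point, the sketch below turns a question into a fixed-length sequence of one-hot rows that a sequence model could consume. The function names, the reserved index 0, and the padding scheme are all assumptions made for this example. No LSTM is built here.

```python
def build_vocab(sentences):
    """Map each distinct token to an index starting at 1; index 0 is reserved."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def one_hot_encode(sentence, vocab, max_len):
    """Return max_len rows, each a one-hot list of width len(vocab) + 1.

    Index 0 is used both for padding and for tokens missing from vocab.
    """
    width = len(vocab) + 1
    indices = [vocab.get(tok, 0) for tok in sentence.lower().split()][:max_len]
    indices += [0] * (max_len - len(indices))
    matrix = []
    for idx in indices:
        row = [0] * width
        row[idx] = 1
        matrix.append(row)
    return matrix


vocab = build_vocab(["what is python"])
encoded = one_hot_encode("is python", vocab, max_len=3)
```

The row width grows with the vocabulary, which is one reason to move to word embeddings once that investigation is done.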
Perform a stratified split on the provided dataset and upload it to the repo.
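A stratified split keeps the label proportions the same in each partition. The sketch below uses only the standard library. The function name, the 20% test fraction, and the fixed seed are assumptions for illustration, not values taken from the repo.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, test_fraction=0.2, seed=0):
    """Split rows into (train, test) while preserving the label ratio.

    label_of is a function that extracts the label from a row.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Made-up data: 25 positive rows and 75 negative rows.
rows = [(i, 1 if i % 4 == 0 else 0) for i in range(100)]
train, test = stratified_split(rows, label_of=lambda row: row[1])
```

With this sample both partitions keep the 1:3 ratio of positive to negative rows.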
Investigate word embedding model
Fuzzy string-matching scores from fuzzywuzzy (https://github.com/seatgeek/fuzzywuzzy) might be a useful feature.
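To get a feel for this kind of feature without installing anything, the sketch below uses the standard library's `difflib.SequenceMatcher` as a stand-in. It is not fuzzywuzzy itself. The two helpers are hypothetical and only loosely mirror the library's `fuzz.ratio` and `fuzz.token_sort_ratio`, so scores from the real library may differ.

```python
from difflib import SequenceMatcher


def similarity_ratio(q1, q2):
    """Return a 0-100 similarity score based on matching character blocks."""
    matcher = SequenceMatcher(None, q1.lower(), q2.lower())
    return int(round(100 * matcher.ratio()))


def token_sort_ratio(q1, q2):
    """Like similarity_ratio, but ignores word order by sorting tokens first."""
    def sort_tokens(text):
        return " ".join(sorted(text.lower().split()))

    return similarity_ratio(sort_tokens(q1), sort_tokens(q2))


# Word order differs, so the plain ratio is lower than the token-sorted one.
plain = similarity_ratio("learn python how", "how learn python")
sorted_score = token_sort_ratio("learn python how", "how learn python")
```

Either score could be added as a numeric column next to the model's other features.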
Devise a consistent way to clean the text, map similar words together, etc.
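One possible shape for such a cleaning step is sketched below: lowercase, strip punctuation, then map variants onto a canonical form through a lookup table. The `SYNONYMS` table and its entries are hypothetical placeholders. The real mapping would have to come from inspecting the data.

```python
import re

# Hypothetical mapping of variants to canonical forms, for illustration only.
SYNONYMS = {"whats": "what is", "u": "you", "usa": "united states"}


def clean_text(text, synonyms=SYNONYMS):
    """Lowercase, drop punctuation, and replace known variants."""
    text = text.lower()
    text = text.replace("'", "")              # "what's" -> "whats"
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # other punctuation becomes a space
    tokens = [synonyms.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


cleaned = clean_text("What's the capital of the USA?")
```

Keeping all of this in a single function means every notebook and model applies identical cleaning.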
In the Initial Analysis notebook (notebooks/Initial Analysis.ipynb, commit 89c4a74), we saw that certain questions were frequently linked as duplicates of other questions. Perhaps we can use this pattern as a mechanism for augmenting our duplicate identification process.
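One simple way to turn this pattern into a feature is to count how often each question id appears in a pair labeled as duplicate. This is only a sketch: it assumes rows of (qid1, qid2, is_duplicate), and `duplicate_link_counts` is a hypothetical name. To avoid leaking labels, the counts would need to be computed on training data only.

```python
from collections import Counter


def duplicate_link_counts(rows):
    """Count, per question id, how many duplicate-labeled pairs it appears in."""
    counts = Counter()
    for qid1, qid2, is_duplicate in rows:
        if is_duplicate:
            counts[qid1] += 1
            counts[qid2] += 1
    return counts


# Made-up pairs: question 1 is linked as a duplicate three times.
pairs = [(1, 2, 1), (1, 3, 1), (4, 5, 0), (1, 6, 1)]
link_counts = duplicate_link_counts(pairs)
```

For a new pair, the counts of its two question ids could then be passed to the model as extra features.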
A theory was floated at the 4/29 meeting that removing emotionally charged passages from the questions may make it easier to identify the core question being asked, which in turn would make our models more accurate.
Let's test that theory.
Create a notebook / code that predicts sentiment for given window sizes. Find a way that for a given window size / threshold value:
Use the above two to identify optimal window size / threshold value / other algorithm details.
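To make the window size / threshold idea concrete, here is a toy sketch. It assumes a tiny hand-written lexicon in place of a real sentiment model, scores each sliding window by its mean absolute lexicon score, and drops the tokens of any window at or above the threshold. `TOY_LEXICON` and both function names are hypothetical.

```python
# Hypothetical stand-in for a real sentiment model.
TOY_LEXICON = {"hate": -1.0, "stupid": -1.0, "terrible": -1.0,
               "love": 1.0, "amazing": 1.0}


def window_scores(tokens, window_size, lexicon=TOY_LEXICON):
    """Mean absolute lexicon score for each sliding window of tokens."""
    if not tokens:
        return []
    scores = []
    for start in range(max(1, len(tokens) - window_size + 1)):
        window = tokens[start:start + window_size]
        scores.append(sum(abs(lexicon.get(t, 0.0)) for t in window) / len(window))
    return scores


def strip_charged(text, window_size=2, threshold=1.0, lexicon=TOY_LEXICON):
    """Remove tokens covered by any window whose score reaches the threshold."""
    tokens = text.lower().split()
    drop = set()
    for start, score in enumerate(window_scores(tokens, window_size, lexicon)):
        if score >= threshold:
            drop.update(range(start, min(start + window_size, len(tokens))))
    return " ".join(t for i, t in enumerate(tokens) if i not in drop)


stripped = strip_charged("why is this stupid terrible phone so slow")
```

Sweeping `window_size` and `threshold` over a grid and comparing downstream model accuracy would be one way to test the theory.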