mkonicek / nlp
Simple experiments with word embeddings
Hi Martin,
Amazing work, thank you for making this more accessible to beginners. I've recently been fascinated by the idea of using word embeddings to 'intersect' or 'combine' word meanings and was curious if you had any ideas on how to apply this using your code.
I got this idea from this paper: https://www.aclweb.org/anthology/W16-0203.pdf
There, you give it two words and it outputs a word that combines them. For example:
flame and caring would output: cook
life and road would output: journey
I'm not familiar with the math used in the paper or with how to implement it, so I'd like to know if you have any ideas on how to do this with your existing code. For example, could there be a new function that takes 2 words (or n words) and combines or intersects their meanings?
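One simple baseline for this kind of "combination" is to average the unit-length vectors of the input words and return the vocabulary words nearest to that average by cosine similarity. The sketch below illustrates that baseline only; it is not the method from the linked paper and not a function from this repository. The name `combine`, the `vectors` dictionary, and the toy 3-dimensional vectors are all made up for illustration; real embeddings would have hundreds of dimensions.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def combine(words, vectors, top_n=3):
    """Hypothetical helper: return the top_n words closest to the mean
    of the input words' unit vectors, excluding the inputs themselves."""
    units = [normalize(vectors[w]) for w in words]
    dim = len(units[0])
    mean = [sum(u[i] for u in units) / len(units) for i in range(dim)]
    candidates = [
        (cosine(mean, vec), word)
        for word, vec in vectors.items()
        if word not in words
    ]
    candidates.sort(reverse=True)
    return [word for _, word in candidates[:top_n]]

# Toy vectors, invented for this example; not taken from any real model.
toy_vectors = {
    'life': [1.0, 0.0, 0.1],
    'road': [0.0, 1.0, 0.1],
    'journey': [0.7, 0.7, 0.1],
    'banana': [0.0, 0.1, 1.0],
}

result = combine(['life', 'road'], toy_vectors, top_n=1)
print(result)
```

With real embeddings the averaged vector often lands near the input words and their close synonyms, so excluding the inputs (as done here) and looking at several top results rather than one is usually necessary.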
I am using a fastText pre-trained model based on English Wikipedia. It works as expected:
https://github.com/shantanuo/pandas_examples/blob/master/nlp/fasttext_english.ipynb
But when I try the same code with a model for another language, I get an error, as shown in this notebook:
https://github.com/shantanuo/pandas_examples/blob/master/nlp/fasttext_marathi.ipynb
The error is related to Unicode decoding:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 15: invalid start byte
I tried opening the file in raw binary mode by changing the open call in the function load_words_raw in load.py:
with open(file_path, 'rb') as f:
And now I get a different error:
ValueError: could not convert string to float: b'\x00l\x02'
I have no idea how to handle this.
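Both errors are consistent with the file not being plain UTF-8 text, for example a gzip-compressed .vec.gz file or a fastText binary .bin model; this is a guess from the error messages, not something confirmed in the thread. The sketch below assumes the text .vec format (an optional "count dimension" header line, then one word followed by its floats per line) and checks for the gzip magic number before reading. The function names `open_vectors` and `load_text_vectors` are hypothetical and are not part of load.py.

```python
import gzip

def open_vectors(path):
    """Open a word-vector file as UTF-8 text, decompressing gzip if needed."""
    with open(path, 'rb') as f:
        magic = f.read(2)
    if magic == b'\x1f\x8b':  # gzip magic number
        return gzip.open(path, 'rt', encoding='utf-8')
    return open(path, 'r', encoding='utf-8')

def load_text_vectors(path):
    """Hypothetical loader for text-format (.vec) embeddings.

    Returns a dict mapping each word to a list of floats.
    """
    vectors = {}
    with open_vectors(path) as f:
        for line_no, line in enumerate(f):
            parts = line.rstrip().split(' ')
            # Skip the optional header line: "<word count> <dimension>".
            if (line_no == 0 and len(parts) == 2
                    and parts[0].isdigit() and parts[1].isdigit()):
                continue
            word, values = parts[0], parts[1:]
            vectors[word] = [float(v) for v in values]
    return vectors
```

If the file is a fastText .bin model rather than a .vec file, this loader will still fail with a UnicodeDecodeError, because that format is not line-oriented text; in that case the text .vec file for the same language would be needed instead.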