makcedward / nlp

:memo: This repository recorded my NLP journey.

Home Page: https://makcedward.github.io/

Python 61.82% Shell 1.43% sed 1.59% Jupyter Notebook 35.16%
ai data-science deep-learning machine-learning nlp

nlp's Issues

ULMFiT from fastai?

Hi,

I just found your blog, and there is a lot of useful information in it.
I noticed that you put ULMFiT under OpenAI in your README table, but as far as I know, it is from fastai.
Thanks for compiling and sharing your knowledge.

InferSent error (help needed)

Hi, I am getting an error while generating InferSent embeddings. The error is as follows, with the full traceback at the end of this message:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 11: invalid start byte

The error occurs when I run infer_sent_embs.build_vocab(x_train, tokenize=True).

Note that I ran your code in Google Colab. Also, the InferSent download links in the Python file infersent.py have expired and need to be updated.

The new links are:

INFERSENT_GLOVE_MODEL_URL = 'https://dl.fbaipublicfiles.com/infersent/infersent1.pkl'
INFERSENT_FASTTEXT_MODEL_URL = 'https://dl.fbaipublicfiles.com/infersent/infersent2.pkl'

UnicodeDecodeError                        Traceback (most recent call last)
in ()
----> 1 infer_sent_embs.build_vocab(x_train, tokenize=True)
      2 x_train_t = infer_sent_embs.encode(x_train, tokenize=True)
      3 x_test_t = infer_sent_embs.encode(x_test, tokenize=True)

3 frames
/usr/lib/python3.6/codecs.py in decode(self, input, final)
    319 # decode input (taking the buffer into account)
    320 data = self.buffer + input
--> 321 (result, consumed) = self._buffer_decode(data, self.errors, final)
    322 # keep undecoded input until the next call
    323 self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 11: invalid start byte
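The traceback ends in Python's incremental UTF-8 decoder, which suggests that some file read as text during build_vocab contains a byte that is not valid UTF-8. One plausible cause (an assumption, not confirmed in this thread) is that the word-vector file is not plain text, for example an incomplete download or a still-compressed archive. A minimal sketch for checking a file before passing it to the model follows; the helper name first_invalid_utf8_offset and the max_bytes parameter are hypothetical and not part of the repository.

```python
import codecs


def first_invalid_utf8_offset(path, max_bytes=1 << 20):
    """Return the offset of the first invalid UTF-8 byte in the first
    max_bytes of the file, or None if that prefix decodes cleanly.

    Hypothetical helper for diagnosis only; not part of the repository.
    """
    with open(path, "rb") as f:
        head = f.read(max_bytes)
    # final=False tolerates a multi-byte character cut off at the end of the prefix.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError as err:
        return err.start
    return None
```

If the helper returns a small offset for the vector file, the file is probably binary or corrupted rather than a text file with one odd character, and downloading it again is a reasonable first step.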

Doc2Vec

@makcedward I am trying to retrieve documents similar to a given document. Here is the code snippet:

x_train_t = doc2vec_embs.encode(documents=x_train)
x_test_t = doc2vec_embs.encode(documents=x_test)

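def similiar_docs(doc2vec_embs, test_sample):
    # Look up the training document vector closest to the query vector
    sims = doc2vec_embs.model.docvecs.most_similar([test_sample], topn=1)
    for s in sims:
        # s is a (tag, similarity) pair; the tag is used as an index into x_train
        print(x_train[s[0]])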

test_sample = x_test_t[0]
print(x_test[0])
similiar_docs(doc2vec_embs, test_sample)

However, the retrieved docs aren't similar. Am I missing something here?
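One way to sanity-check the retrieval independently of gensim is to rank the encoded training vectors against the test vector directly with cosine similarity. The sketch below assumes that x_train_t holds one vector per training document, in the same order as x_train; the function names cosine_similarity and rank_by_cosine are hypothetical. If this ranking disagrees with most_similar, the tags stored in the model may not line up with the positions in x_train.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, train_vecs, topn=1):
    """Return the topn (index, similarity) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(train_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]
```

With the variables from the snippet above, rank_by_cosine(x_test_t[0], x_train_t, topn=1) gives an index that can be used as x_train[index].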

About DataSet

How can I get glove.6B.50d.vec, which is imported in sample/nlp-word_mover_distance.ipynb in this repository?
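The thread does not say how the file was produced. One possibility, stated here purely as an assumption, is that glove.6B.50d.vec is the publicly distributed GloVe text file glove.6B.50d.txt rewritten in word2vec text format, which differs only by a first line holding the vector count and the dimension. A minimal sketch of that conversion follows; the function name glove_to_word2vec_text is hypothetical.

```python
def glove_to_word2vec_text(glove_path, output_path):
    """Copy a GloVe text file and prepend the 'count dimension' header
    that the word2vec text format expects.

    Assumes one vector per line: a token followed by space-separated floats.
    Returns (count, dimension).
    """
    count = 0
    dim = 0
    # First pass: count the vectors and read the dimension from the first line.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if count == 0:
                dim = len(line.rstrip().split(" ")) - 1
            count += 1
    # Second pass: write the header, then copy the vectors unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        dst.write(f"{count} {dim}\n")
        for line in src:
            dst.write(line)
    return count, dim
```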

nlp-character_embedding.ipynb

Hi, thank you for the awesome code for the character embedding model; I had a lot of fun playing with it.
One small suggestion: in CharCNN, build_char_dictionary calls chars = list(set(chars)), which scrambles the order of the characters in char_dict. Every time I start a new notebook the characters come out in a different order, which results in a different dictionary. In my case, I tried to load my trained Keras model in a new notebook and found that the model did not work. In the end I figured out that the char_indices built in the preprocessing step were totally different from the old ones. I hadn't saved the old char_indices, so I had no choice but to retrain the model, lol.
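The behaviour described above follows from string hash randomization in Python: the iteration order of a set of strings can change from one interpreter session to the next. A minimal sketch of a deterministic alternative is shown below, sorting the characters and saving the mapping next to the model. The function names and the "UNK" token are hypothetical and are not the repository's actual CharCNN code.

```python
import json


def build_sorted_char_indices(texts, unknown_token="UNK"):
    """Map each character to an index in sorted order, so the mapping is
    identical across interpreter sessions. Index 0 is reserved for unknowns."""
    chars = sorted({ch for text in texts for ch in text})
    char_indices = {unknown_token: 0}
    for ch in chars:
        char_indices[ch] = len(char_indices)
    return char_indices


def save_char_indices(char_indices, path):
    """Persist the mapping so a reloaded model can reuse exactly the same indices."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(char_indices, f)


def load_char_indices(path):
    """Load a mapping previously written by save_char_indices."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Sorting alone makes the mapping reproducible for the same training text; saving it as well guards against the training text changing between sessions.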

Cell #31 ValueError master/sample/nlp-model_interpretation_shap.ipynb

explainer = shap.DeepExplainer(pipeline.model, encoded_x_train[:10])
shap_values = explainer.shap_values(encoded_x_test[:1])

x_test_words = prepare_explanation_words(pipeline, encoded_x_test)
y_pred = pipeline.predict(x_test[:1])
print('Actual Category: %s, Predict Category: %s' % (y_test[0], y_pred[0]))

shap.force_plot(explainer.expected_value[0], shap_values[0][0], x_test_words[0])

RETURNS:

ValueError: Dimensions must be equal, but are 10 and 100 for '{{node gradient_tape/functional_1/global_max_pooling1d/truediv_1}} = RealDiv[T=DT_FLOAT](gradient_tape/functional_1/global_max_pooling1d/sub_1, gradient_tape/functional_1/global_max_pooling1d/sub)' with input shapes: [10,512], [10,100,512].

No module named aion. Issue while importing aion

Hello Edward,

Thank you for the great article and detailed blog post about ELMo. I previously used one of the resources you posted for ELMo in Keras, but I thought that implementation lacked some things, such as the ability to normalize vectors. Your implementation is certainly better than the one in the resources.

When I try to import aion, I get the error "ModuleNotFoundError: No module named 'aion'". I know this is a fairly common packaging issue, but I wasn't able to resolve it, and I couldn't find a reliable online source to help fix it. Please let me know if you have run into this issue; any pointers would be appreciated.

I was able to install aion successfully with "pip install aion".

I'm using:

Python 3.6.7

Windows 10

Thanks
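A possible explanation, offered as an assumption rather than a confirmed diagnosis, is that the aion imported by the notebooks is a package directory inside a clone of this repository, and that the distribution installed by "pip install aion" is an unrelated project that shares the name. Under that assumption, a minimal sketch is to put the clone's root directory on sys.path before importing; the helper name add_repo_to_path and its parameters are hypothetical.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def add_repo_to_path(repo_root, package_name="aion"):
    """Put repo_root at the front of sys.path and report whether
    package_name can then be found by the import system.

    Hypothetical helper; repo_root is assumed to be the directory that
    contains the package folder.
    """
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    # Make sure the import system notices directories created after startup.
    importlib.invalidate_caches()
    return importlib.util.find_spec(package_name) is not None
```

If the helper returns True for the clone's root directory, a plain import of the package should work in the same session.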
