makcedward / nlp
:memo: This repository recorded my NLP journey.
Home Page: https://makcedward.github.io/
Hi,
Just found your blog, and there is a lot of useful information there.
I noticed that you put ULMFiT under OpenAI in your README table, while as far as I know, it is from fast.ai.
Thanks for compiling and sharing your knowledge.
Line 105 in 2f12277
In the above line, you assign the result of the LSA to the variables x_train_lda and x_test_lda. Shouldn't it be lsa in both cases?
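For what it's worth, a minimal numpy sketch of the naming point, using a toy term-document matrix as a stand-in for the repository's real TF-IDF features (the matrices below are hypothetical):

```python
import numpy as np

# Toy term-document matrices (stand-ins for the real TF-IDF features).
x_train = np.array([[1., 0., 2.], [0., 1., 1.], [2., 1., 0.]])
x_test = np.array([[1., 1., 1.]])

# LSA is a truncated SVD of the training matrix.
n_components = 2
u, s, vt = np.linalg.svd(x_train, full_matrices=False)
components = vt[:n_components]  # top singular directions

# Name the results after the method that produced them: lsa, not lda.
x_train_lsa = x_train @ components.T
x_test_lsa = x_test @ components.T
print(x_train_lsa.shape, x_test_lsa.shape)
```

Keeping the suffix consistent with the method avoids exactly the confusion raised in this issue.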
Hi, I am getting an error while generating InferSent embeddings. The error is as follows, with the full traceback at the end of this message:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 11: invalid start byte
The error occurs after I run infer_sent_embs.build_vocab(x_train, tokenize=True).
Note that I ran your code in Google Colab. Also note that the InferSent model links in the Python file infersent.py need to be updated (the old links have expired).
The new links are
INFERSENT_GLOVE_MODEL_URL = 'https://dl.fbaipublicfiles.com/infersent/infersent1.pkl'
INFERSENT_FASTTEXT_MODEL_URL = 'https://dl.fbaipublicfiles.com/infersent/infersent2.pkl'
UnicodeDecodeError Traceback (most recent call last)
in ()
----> 1 infer_sent_embs.build_vocab(x_train, tokenize=True)
2 x_train_t = infer_sent_embs.encode(x_train, tokenize=True)
3 x_test_t = infer_sent_embs.encode(x_test, tokenize=True)
/usr/lib/python3.6/codecs.py in decode(self, input, final)
319 # decode input (taking the buffer into account)
320 data = self.buffer + input
--> 321 (result, consumed) = self._buffer_decode(data, self.errors, final)
322 # keep undecoded input until the next call
323 self.buffer = data[consumed:]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 11: invalid start byte
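Since the old links had expired, one possibility is that the downloaded .pkl is actually an HTML error page rather than the binary model, which would explain the decode failure. A minimal sanity check (the toy.pkl path below is a hypothetical stand-in for the downloaded file):

```python
def looks_like_html(path):
    """Return True if the file starts like an HTML page rather than binary data."""
    with open(path, "rb") as f:
        head = f.read(64).lstrip().lower()
    return head.startswith(b"<!doctype") or head.startswith(b"<html")

# Simulate a failed download that saved an error page instead of the model.
with open("toy.pkl", "wb") as f:
    f.write(b"<!DOCTYPE html><html>Not Found</html>")
print(looks_like_html("toy.pkl"))  # True -> re-download from the new URLs
```

If the check returns True for your local file, re-fetching from the dl.fbaipublicfiles.com URLs above should fix it.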
@makcedward I am trying to retrieve similar documents from the given document. Here is the code snippet:
x_train_t = doc2vec_embs.encode(documents=x_train)
x_test_t = doc2vec_embs.encode(documents=x_test)
def similiar_docs(doc2vec_embs, test_sample):
    sims = doc2vec_embs.model.docvecs.most_similar([test_sample], topn=1)
    for s in sims:
        print(x_train[s[0]])
test_sample = x_test_t[0]
print(x_test[0])
similiar_docs(doc2vec_embs, test_sample)
However, the retrieved docs aren't similar. Am I missing something here?
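most_similar ranks the stored training vectors by cosine similarity to the query vector, so one way to sanity-check the encodings is to run that retrieval by hand. A pure-numpy sketch with hypothetical toy embeddings (stand-ins for the output of doc2vec_embs.encode):

```python
import numpy as np

# Hypothetical document embeddings and their source texts.
x_train = ["doc about cats", "doc about stocks", "a mixed doc"]
x_train_t = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])

def similar_docs(train_vecs, test_vec, topn=1):
    # Cosine similarity between the query and every training vector.
    sims = train_vecs @ test_vec / (
        np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(test_vec))
    return [x_train[i] for i in np.argsort(sims)[::-1][:topn]]

print(similar_docs(x_train_t, np.array([0.9, 0.1])))  # ['doc about cats']
```

If a hand check like this looks fine, the mismatch may simply be that Doc2Vec needs more training epochs or data before its neighbours become meaningful.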
Hello,
Please help me understand how to execute "from aion.util.spell_check import SymSpell". I tried using sys and os to fix the import path but got lost.
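One common fix, assuming aion is a package directory inside a local clone of the repository (the ./nlp path below is an assumption; adjust it to wherever you cloned the repo):

```python
import os
import sys

# Assumption: the repository was cloned into ./nlp, so ./nlp/aion exists.
repo_root = os.path.abspath("nlp")
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)  # let Python resolve the aion package

# from aion.util.spell_check import SymSpell  # should now resolve
```

With the clone's root on sys.path, the import in the notebook should work without installing anything.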
How can I get glove.6B.50d.vec, which is imported in sample/nlp-word_mover_distance.ipynb of this repository?
I can't find or use the exact file that you have used. Any idea on that?
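glove.6B.50d.vec isn't distributed directly; the usual route is to download glove.6B.zip from the Stanford NLP site (it contains glove.6B.50d.txt) and prepend the word2vec-style "vocab_size dim" header, which is all the .vec format adds. A minimal sketch of that conversion, demonstrated on a tiny fake GloVe file (gensim's glove2word2vec script does the same thing):

```python
def glove_to_vec(glove_path, vec_path, dim):
    """Prepend the "<vocab_size> <dim>" header that word2vec-format loaders expect."""
    with open(glove_path, encoding="utf-8") as f:
        lines = f.readlines()
    with open(vec_path, "w", encoding="utf-8") as f:
        f.write(f"{len(lines)} {dim}\n")
        f.writelines(lines)

# Demo with a tiny fake GloVe file (two words, two dimensions).
with open("toy_glove.txt", "w", encoding="utf-8") as f:
    f.write("cat 0.1 0.2\ndog 0.3 0.4\n")
glove_to_vec("toy_glove.txt", "toy.vec", dim=2)
print(open("toy.vec", encoding="utf-8").readline().strip())  # 2 2
```

Run the same function on glove.6B.50d.txt with dim=50 to produce the .vec file the notebook expects.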
Hi, thank you for the awesome code on the character embedding model, I had a lot of fun playing with it.
One little suggestion on the code: in CharCNN's build_char_dictionary, you use chars = list(set(chars)). This scrambles the character order in char_dict: every time I start a new notebook, the characters come out in a different order, which results in a different dictionary. What happened to me is that I tried to load my trained Keras model in a new notebook and found that it wasn't working. In the end I figured out that the char_indices from the preprocessing step were totally different from the old ones. I hadn't saved the old char_indices, so I had no choice but to retrain the model, lol.
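Sorting the de-duplicated characters would make the mapping deterministic across sessions. A minimal sketch (the function name mirrors the one in the issue; its exact shape in the repository is an assumption):

```python
def build_char_dictionary(texts):
    # sorted() instead of bare set() -> stable order across runs and notebooks.
    chars = sorted(set("".join(texts)))
    return {c: i + 1 for i, c in enumerate(chars)}  # 0 reserved for padding

d1 = build_char_dictionary(["abc", "cab"])
d2 = build_char_dictionary(["cab", "abc"])
print(d1 == d2)  # True: same dictionary regardless of input order
```

Saving char_indices alongside the trained model is still a good belt-and-braces measure, but with a sorted build it is no longer strictly required.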
explainer = shap.DeepExplainer(pipeline.model, encoded_x_train[:10])
shap_values = explainer.shap_values(encoded_x_test[:1])
x_test_words = prepare_explanation_words(pipeline, encoded_x_test)
y_pred = pipeline.predict(x_test[:1])
print('Actual Category: %s, Predict Category: %s' % (y_test[0], y_pred[0]))
shap.force_plot(explainer.expected_value[0], shap_values[0][0], x_test_words[0])
RETURNS:
ValueError: Dimensions must be equal, but are 10 and 100 for '{{node gradient_tape/functional_1/global_max_pooling1d/truediv_1}} = RealDiv[T=DT_FLOAT](gradient_tape/functional_1/global_max_pooling1d/sub_1, gradient_tape/functional_1/global_max_pooling1d/sub)' with input shapes: [10,512], [10,100,512].
Hello Edward,
Thank you for the great article and detailed blog post about ELMo. I had previously used one of the resources you posted for ELMo in Keras; that implementation lacked some things, like the ability to normalize vectors, so your implementation is certainly better than the one in the resources.
When I try to import aion, I get an error that says "ModuleNotFoundError: No module named 'aion'". I know this is a fairly common packaging issue, but I wasn't able to resolve it, and I couldn't find a reliable online source to fix it either. Please let me know if you have faced this issue at all; any pointers will be appreciated.
I was able to successfully download aion by "pip install aion"
I'm using:
Python 3.6.7
Windows 10
Thanks
The cosine similarity code has a typographic error: it reads transfaormed_results[i] and should be transformed_results[i].