In the above Jupyter notebook, in function:

practical-nlp/Ch4/03_Word2Vec_Example.ipynb about practical-nlp-code HOT 5 CLOSED

sai-teja-ponugoti commented on August 29, 2024

practical-nlp/Ch4/03_Word2Vec_Example.ipynb

from practical-nlp-code.

Comments (5)

varunp2k commented on August 29, 2024

Thank you @sai-teja-ponugoti for bringing this to our notice.
The fix to this most likely would be just having
np.mean(w2v_model[token],axis=0)
instead of the current
w2v_model[token]

Would you like to send a PR to fix this issue?

brianalexander commented on August 29, 2024

Hi @varunp2k,

I believe the fix would be a modification of

feats.append(feat_for_this)
to
feats.append(feat_for_this/count_for_this if count_for_this > 0 else feat_for_this)

In the example, we categorize each tweet by averaging the word vectors of the words in the sentence. This also explains why count_for_this is already computed in the code but never ends up being used.

The conditional (if count_for_this > 0 else feat_for_this) is needed to handle tweets with no token matches and avoid dividing by zero.
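To see why the guard matters, here is a minimal sketch of what happens when no token matches and the division is done unconditionally. It uses a 3-dimensional vector instead of 300 purely for brevity; that size is an assumption for illustration, not something from the notebook.

```python
import numpy as np

dims = 3  # small illustrative size; the notebook uses 300
feat_for_this = np.zeros(dims)
count_for_this = 0  # no token of the tweet was found in the model

# Unguarded division: NumPy does not raise here, it emits a RuntimeWarning
# and returns NaNs. The warning is silenced only to keep the demo output clean.
with np.errstate(divide="ignore", invalid="ignore"):
    unguarded = feat_for_this / count_for_this

# Guarded division: fall back to the all-zero vector when nothing matched.
guarded = feat_for_this / count_for_this if count_for_this > 0 else feat_for_this

print(np.isnan(unguarded).all())
print(guarded)
```

The unguarded result is all NaN, which would likely break a downstream classifier, while the guarded version keeps a plain zero vector for such tweets.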

Here is the updated cell in its entirety:

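import numpy as np

def embedding_feats(list_of_lists, dims=300):
    # w2v_model is the pretrained Word2Vec model loaded earlier in the notebook.
    feats = []
    for tokens in list_of_lists:
        feat_for_this = np.zeros(dims)  # running sum of the word vectors
        count_for_this = 0  # number of tokens found in the model
        for token in tokens:
            if token in w2v_model:
                feat_for_this += w2v_model[token]
                count_for_this += 1
        # Average over the matched tokens; keep the zero vector if none matched.
        feats.append(feat_for_this/count_for_this if count_for_this > 0 else feat_for_this)
    return feats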

train_vectors = embedding_feats(texts_processed)
print(len(train_vectors))
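
A quick way to sanity-check the cell without loading the pretrained embeddings is to run the same logic against a toy stand-in for w2v_model. In the sketch below, the dict toy_model and its 3-dimensional vectors are made up for illustration and are not from the notebook. Passing the model in as a parameter is also a hypothetical variation on the cell above, done only so the function can be tested in isolation.

```python
import numpy as np

# Hypothetical stand-in for the pretrained model: token -> vector.
toy_model = {
    "good": np.array([1.0, 0.0, 2.0]),
    "movie": np.array([3.0, 2.0, 0.0]),
}

def embedding_feats(list_of_lists, model, dims):
    """Average the vectors of the tokens found in `model` for each token list."""
    feats = []
    for tokens in list_of_lists:
        feat_for_this = np.zeros(dims)
        count_for_this = 0
        for token in tokens:
            if token in model:
                feat_for_this += model[token]
                count_for_this += 1
        # Divide only when at least one token matched.
        feats.append(feat_for_this / count_for_this if count_for_this > 0 else feat_for_this)
    return feats

sample_texts = [["good", "movie"], ["good", "unknownword"], ["unknownword"]]
vectors = embedding_feats(sample_texts, toy_model, dims=3)
for v in vectors:
    print(v)
```

The first vector is the mean of the two known vectors ([2. 1. 1.]), the second averages over the single known token only, and the third stays all-zero because nothing matched.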

varunp2k commented on August 29, 2024

Looks good @brianalexander
Please send a PR

sai-teja-ponugoti commented on August 29, 2024

> Thank you @sai-teja-ponugoti for bringing this to our notice.
> The fix to this most likely would be just having
> np.mean(w2v_model[token],axis=0)
> instead of the current
> w2v_model[token]
>
> Would you like to send a PR to fix this issue?

@brianalexander, have you sent one already?

varunp2k commented on August 29, 2024

@sai-teja-ponugoti
PR stands for Pull Request. This article explains what it is in detail:
https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests
@brianalexander has yet to send a PR.
You can check the active PRs for this repo under the pull requests section, right next to the issues. The link to that is here.
