Comments (5)

varunp2k commented on August 29, 2024

Thank you @sai-teja-ponugoti for bringing this to our notice.
The fix is most likely to use
np.mean(w2v_model[token],axis=0)
instead of the current
w2v_model[token]

Would you like to send a PR to fix this issue?

from practical-nlp-code.

brianalexander commented on August 29, 2024

Hi @varunp2k,

I believe the fix would be a modification of

feats.append(feat_for_this)
to
feats.append(feat_for_this/count_for_this if count_for_this > 0 else feat_for_this)

In the example, each tweet is represented by the average of the word vectors of its tokens. This also makes sense because count_for_this is already computed in the current code but never used.

The conditional, if count_for_this > 0 else feat_for_this, handles the case where no tokens match and avoids dividing by zero.

Here is the updated cell in its entirety:

import numpy as np

# w2v_model and texts_processed are defined in earlier cells.
def embedding_feats(list_of_lists, dims=300):
    feats = []
    for tokens in list_of_lists:
        feat_for_this = np.zeros(dims)
        count_for_this = 0
        for token in tokens:
            if token in w2v_model:
                feat_for_this += w2v_model[token]
                count_for_this += 1
        # Average over the matched tokens; keep the zero vector if none matched.
        feats.append(feat_for_this/count_for_this if count_for_this > 0 else feat_for_this)
    return feats

train_vectors = embedding_feats(texts_processed)
print(len(train_vectors))
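
To see the difference between summing and averaging on a small input, here is a minimal sketch. It assumes a plain dict of 3-dimensional vectors (toy_model) in place of the real pretrained model, and average_embedding is a hypothetical helper name used only for this illustration. It is not the notebook's code.

```python
import numpy as np

# Hypothetical stand-in for the pretrained model: a plain dict of 3-d vectors.
toy_model = {
    "good": np.array([1.0, 0.0, 2.0]),
    "day": np.array([3.0, 2.0, 0.0]),
}

def average_embedding(tokens, model, dims):
    """Return the mean vector of the tokens found in model, or zeros if none match."""
    total = np.zeros(dims)
    count = 0
    for token in tokens:
        if token in model:
            total += model[token]
            count += 1
    # Guard against division by zero when no token is in the vocabulary.
    return total / count if count > 0 else total

summed = toy_model["good"] + toy_model["day"]
averaged = average_embedding(["good", "day", "unseen"], toy_model, dims=3)
no_match = average_embedding(["unseen"], toy_model, dims=3)

print(summed)    # [4. 2. 2.]
print(averaged)  # [2. 1. 1.]
print(no_match)  # [0. 0. 0.]
```

Without the division, longer tweets would get larger feature vectors simply because they contain more matched tokens. Averaging removes that dependence on length.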

varunp2k commented on August 29, 2024

Looks good, @brianalexander.
Please send a PR.

sai-teja-ponugoti commented on August 29, 2024

Thank you @sai-teja-ponugoti for bringing this to our notice.
The fix is most likely to use
np.mean(w2v_model[token],axis=0)
instead of the current
w2v_model[token]

Would you like to send a PR to fix this issue?

@brianalexander, have you sent one already?

varunp2k commented on August 29, 2024

@sai-teja-ponugoti
PR stands for Pull Request. This article explains what it is in detail:
https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests
brianalexander has yet to send a PR.
You can check the active PRs for this repo in the pull requests section, right next to the issues. The link to that is here.
