
Comments (7)

abelriboulot avatar abelriboulot commented on June 11, 2024

Hey @ankitkr3 !
So one easy way to calculate cosine similarity between two sentences, which I have used in the past, is to simply compute the encoder embeddings for each of them, average them across tokens, and then compute the cosine similarity. I've added an example in examples/compute_cosine_similarity.py!
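
For readers following along, here is a minimal sketch of the averaging and cosine similarity steps described above. It is not the contents of examples/compute_cosine_similarity.py: the random arrays are hypothetical stand-ins for encoder outputs of shape (num_tokens, hidden_dim), and the helper names `mean_pool` and `cosine_similarity` are assumptions made for illustration.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Average a (num_tokens, hidden_dim) array over the token axis."""
    return token_embeddings.mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)


# Hypothetical stand-ins for per-token encoder embeddings of two sentences.
# In practice these would come from the model's encoder, one row per token.
rng = np.random.default_rng(0)
sentence_a_tokens = rng.normal(size=(5, 8))  # 5 tokens, hidden size 8
sentence_b_tokens = rng.normal(size=(7, 8))  # 7 tokens, hidden size 8

similarity = cosine_similarity(
    mean_pool(sentence_a_tokens), mean_pool(sentence_b_tokens)
)
print(f"cosine similarity: {similarity:.3f}")
```

Because the token axis is averaged away first, the two sentences can have different lengths and still be compared.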

from onnxt5.

ankitkr3 avatar ankitkr3 commented on June 11, 2024

@abelriboulot Hey!
Thanks, I will check it out.


ankitkr3 avatar ankitkr3 commented on June 11, 2024

Hi @abelriboulot
The accuracy is not as good as with BERT models. How can we increase the accuracy to get better contextualized embeddings?


abelriboulot avatar abelriboulot commented on June 11, 2024

Hey @ankitkr3 ,
Could you explain what you mean by accuracy? For embeddings, adding a task prefix (e.g. "summarize: ") will often help (and I'd be curious whether that improves things for you). Otherwise, there are other models designed to produce nicely structured embedding spaces, such as this one: https://tfhub.dev/google/universal-sentence-encoder-multilingual-qa/3


ankitkr3 avatar ankitkr3 commented on June 11, 2024

@abelriboulot I am comparing the similarity between two paragraphs here, and I want the comparison to be semantically accurate.


abelriboulot avatar abelriboulot commented on June 11, 2024

When you compute the cosine similarity, you are basically measuring how far apart two embeddings are, so a single score is not very meaningful in the abstract. The way to figure out whether things are well distanced or not is to compare how far apart two similar embeddings are with how far one of those embeddings is from something dissimilar. For instance, "The sky is blue" and "The weather is clear" should be closer to each other than "The sky is blue" and "I ate pudding today".
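
A rough sketch of that relative check, assuming you already have one embedding vector per sentence. The three vectors below are hand-made placeholders, not real model outputs, and `ranks_correctly` is a hypothetical helper name.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def ranks_correctly(anchor, similar, dissimilar) -> bool:
    """True if the anchor is closer to the similar item than to the dissimilar one."""
    return cosine_similarity(anchor, similar) > cosine_similarity(anchor, dissimilar)


# Placeholder sentence embeddings (made up for illustration only).
sky_is_blue = np.array([0.9, 0.1, 0.0])       # "The sky is blue"
weather_is_clear = np.array([0.8, 0.3, 0.1])  # "The weather is clear"
ate_pudding = np.array([0.0, 0.2, 0.9])       # "I ate pudding today"

# Each triplet is (anchor, similar, dissimilar); add more to get a useful score.
triplets = [(sky_is_blue, weather_is_clear, ate_pudding)]
accuracy = sum(ranks_correctly(*t) for t in triplets) / len(triplets)
print(f"fraction of triplets ranked correctly: {accuracy:.2f}")
```

The fraction of triplets ranked correctly gives a number that can be compared across models, which a single raw cosine score does not.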


ankitkr3 avatar ankitkr3 commented on June 11, 2024

@abelriboulot Yes, but how can we achieve better accuracy for such semantic comparisons? Can we train the model on such a task and then just output the embeddings?

