
ganjinzero / coder

67 stars, 2 watchers, 5 forks, 5.76 MB

CODER: Knowledge-infused cross-lingual medical term embedding for term normalization. [JBI, ACL-BioNLP 2022]

Home Page: https://www.sciencedirect.com/science/article/pii/S1532046421003129

Languages: Python 97.73%, Shell 2.27%
Topics: nlp, medical, pretrained-language-model, umls, multi-language, embeddings

coder's People

Contributors

ganjinzero, zengsihang

coder's Issues

coder++

Hello! I know you have been working on biomedical coding, and I saw in your issues that you mentioned a new study, CODER++, which builds on and improves this work. However, I could not find the related paper or GitHub link. Could you please share them if convenient? Thanks!

Requirements file?

Could you please include a conda environment file or a requirements.txt file to list all of the dependencies to run your code? Thanks!
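For reference, a hypothetical starting point is sketched below. The package list is a guess based on this appearing to be a PyTorch/Transformers project (per the repo topics); it is not taken from the repo and would need to be confirmed against the actual imports:

```
# Hypothetical requirements.txt -- package names are assumptions, not from the repo
torch
transformers
numpy
scikit-learn
tqdm
```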

Impact of padding strategy on CODER embeddings

Dear Authors,

Thank you for the great work!

I was reviewing the code and noticed that the way you extract embeddings differs from what is typically done: inputs are padded to a fixed maximum length (32 tokens). I rarely see this when others extract embeddings; usually they just tokenize the inputs and pass them through the model.

I experimented with different token lengths, both with and without additional padding. The resulting cosine similarity scores between embeddings differ significantly depending on whether padding is used and on how much (i.e., what the maximum token length is).

I re-read your CODER papers and found nothing about padding, nor could I find anything more in this repo. Can you explain why you chose this padding strategy? Have you experimented with removing or adjusting the padding, and measured its impact on cosine similarity between embeddings and on overall performance?
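For concreteness, here is a minimal sketch of the kind of comparison described above. The checkpoint id and [CLS] pooling are assumptions for illustration, not necessarily the authors' exact extraction pipeline:

```python
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "GanjinZero/coder_eng"  # assumed checkpoint id for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

def embed(term: str, pad_to_max: bool) -> torch.Tensor:
    # pad_to_max=True pads every input to a fixed 32 tokens, mirroring the
    # fixed-length strategy discussed above; False leaves inputs unpadded.
    inputs = tokenizer(
        term,
        max_length=32,
        truncation=True,
        padding="max_length" if pad_to_max else False,
        return_tensors="pt",
    )
    with torch.no_grad():
        out = model(**inputs)
    # [CLS] token embedding; the pooling choice is an assumption
    return out.last_hidden_state[:, 0]

for pad in (True, False):
    a = embed("myocardial infarction", pad)
    b = embed("heart attack", pad)
    print(pad, torch.nn.functional.cosine_similarity(a, b).item())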
