Dear Authors,
Thank you for the great work!
I was reviewing the code and noticed that the way you extract embeddings differs from what is typically done: inputs are padded to a fixed maximum length (32 tokens). Most embedding-extraction code I have seen does not do this; it simply tokenizes the inputs (padding only to the longest sequence in the batch, if at all) and passes them through the model.
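For concreteness, here is a minimal sketch of the two approaches I mean. The checkpoint name and the 32-token limit are placeholders standing in for whatever your code actually uses; I'm assuming a standard Hugging Face `transformers` setup with CLS pooling, not your exact pipeline:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Placeholder checkpoint; substitute the model actually used in this repo.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

texts = ["myocardial infarction", "heart attack"]  # arbitrary example inputs

# Approach A: pad every input to a fixed max length (as this repo appears to do).
fixed = tokenizer(texts, padding="max_length", truncation=True,
                  max_length=32, return_tensors="pt")

# Approach B: the more common dynamic padding, only to the longest item in the batch.
dynamic = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    emb_fixed = model(**fixed).last_hidden_state[:, 0]    # CLS embeddings
    emb_dynamic = model(**dynamic).last_hidden_state[:, 0]
```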
I experimented with different maximum token lengths, both with and without the additional padding. The resulting cosine similarity scores between embeddings differ significantly depending on whether padding is used and how much is applied (i.e., what the maximum token length is set to).
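This is roughly how I measured it, reusing the `tokenizer` and `model` from the sketch above; the helper and the example texts are my own, not from your code:

```python
import torch
import torch.nn.functional as F

def embed(texts, max_length=None):
    """CLS embeddings with either fixed-length or dynamic padding."""
    kwargs = dict(truncation=True, return_tensors="pt")
    if max_length is not None:
        kwargs.update(padding="max_length", max_length=max_length)
    else:
        kwargs["padding"] = True  # dynamic padding to the longest sequence
    batch = tokenizer(texts, **kwargs)
    with torch.no_grad():
        return model(**batch).last_hidden_state[:, 0]

# Compare the same pair of terms under dynamic padding and several fixed lengths.
for max_len in (None, 16, 32, 64):
    a, b = embed(["myocardial infarction", "heart attack"], max_len)
    sim = F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
    print(f"max_length={max_len}: cosine similarity = {sim:.4f}")
```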
I re-read your CODER papers and found nothing about padding, nor could I find anything more in this repo. Can you explain why you chose this padding strategy? Have you experimented with removing or adjusting the padding, and with its ultimate impact on cosine similarity between embeddings and on overall performance?