scai-bio / index
Intelligent data steward toolbox using Large Language Model embeddings for automated Data-Harmonization
Home Page: https://index.bio.scai.fraunhofer.de
License: Apache License 2.0
Adapt code so that cosine similarity is computed in the DB layer, see:
https://stackoverflow.com/questions/42310655/sql-computation-of-cosine-similarity
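As a rough illustration of what computing the similarity in the DB layer could look like, the sketch below stores each embedding as one row per dimension, following the general idea in the linked Stack Overflow question, and lets SQLite do the ranking. The table name `embedding`, its columns, and both helper functions are hypothetical and not taken from the project. Vectors are normalized to unit length before storage, so cosine similarity reduces to a dot product and no SQL `sqrt` function is needed.

```python
import math
import sqlite3


def store_embedding(conn, item_id, vector):
    """Store a unit-normalized copy of `vector`, one row per dimension."""
    norm = math.sqrt(sum(v * v for v in vector))
    conn.executemany(
        "INSERT INTO embedding (item_id, dim, value) VALUES (?, ?, ?)",
        [(item_id, i, v / norm) for i, v in enumerate(vector)],
    )


def closest(conn, query_id, limit=1):
    """Rank all other items by cosine similarity to `query_id`, inside SQL."""
    return conn.execute(
        """
        SELECT b.item_id, SUM(a.value * b.value) AS similarity
        FROM embedding AS a
        JOIN embedding AS b ON a.dim = b.dim
        WHERE a.item_id = ? AND b.item_id != ?
        GROUP BY b.item_id
        ORDER BY similarity DESC
        LIMIT ?
        """,
        (query_id, query_id, limit),
    ).fetchall()


# Hypothetical schema, used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embedding ("
    "item_id TEXT, dim INTEGER, value REAL, PRIMARY KEY (item_id, dim))"
)
store_embedding(conn, "query", [1.0, 0.0])
store_embedding(conn, "same_direction", [2.0, 0.0])
store_embedding(conn, "orthogonal", [0.0, 3.0])

result = closest(conn, "query", limit=2)
```

A row-per-dimension layout keeps the computation in plain SQL, but it is likely to be slow for high-dimensional embeddings; a dedicated vector store may be the better fit there.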
During the container build, the API version in routes.py should be updated by an action to the current version tag. This ensures that the resulting Docker build has the current release version as its API version.
curl -X PUT "https://index.bio.scai.fraunhofer.de/concepts/id001/mappings?terminology_id=test_ab3&concept_name=cough&text=erkaeltung" -H "accept: application/json"
{"detail":"Failed to create or update concept: (sqlite3.IntegrityError) UNIQUE constraint failed: concept.id\n[SQL: INSERT INTO concept (id, name, terminology_id) VALUES (?, ?, ?)]\n[parameters: ('id001', 'cough', 'test_ab3')]\n(Background on this error at: https://sqlalche.me/e/20/gkpj)"}
{"detail":"Failed to create or update terminology: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (sqlite3.IntegrityError) UNIQUE constraint failed: concept.id\n[SQL: INSERT INTO concept (id, name, terminology_id) VALUES (?, ?, ?)]\n[parameters: ('id001', 'cough', 'test_ab3')]\n(Background on this error at: https://sqlalche.me/e/20/gkpj) (Background on this error at: https://sqlalche.me/e/20/7s2a)"}
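The first error shows a plain `INSERT` hitting an existing `concept.id`, which suggests the PUT endpoint does not yet handle the "update" half of "create or update". One way to make such a write idempotent is an upsert. The sketch below uses the standard-library `sqlite3` module rather than the project's SQLAlchemy layer; the `concept` table and its columns are taken from the error message, while the `upsert_concept` helper is hypothetical. It assumes SQLite 3.24 or newer, which introduced `ON CONFLICT ... DO UPDATE`.

```python
import sqlite3


def upsert_concept(conn, concept_id, name, terminology_id):
    """Insert a concept, or update it in place if the id already exists."""
    conn.execute(
        """
        INSERT INTO concept (id, name, terminology_id) VALUES (?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            name = excluded.name,
            terminology_id = excluded.terminology_id
        """,
        (concept_id, name, terminology_id),
    )
    conn.commit()


# Minimal stand-in for the real schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept (id TEXT PRIMARY KEY, name TEXT, terminology_id TEXT)"
)

upsert_concept(conn, "id001", "cough", "test_ab3")
upsert_concept(conn, "id001", "cough", "test_ab3")  # repeating the call no longer fails

rows = conn.execute("SELECT id, name, terminology_id FROM concept").fetchall()
```

The second error message points to a separate problem: after a failed flush, the SQLAlchemy session has to be rolled back (`Session.rollback()`) before it can be reused for the next request.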
Also add a parameter that sets the number of closest matches to return (default = 1), and add those matches as columns to the resulting dataframe.
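A minimal sketch of how such a parameter might work is shown below. The function name `match_rows`, the parameter name `n_matches`, and the column naming scheme ("Target Variable 1", "Similarity 1", and so on) are assumptions for illustration, not the project's API. Embeddings are passed in as plain lists so the example stays self-contained.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_rows(sources, targets, n_matches=1):
    """Build one row per source variable with its n closest targets as columns."""
    rows = []
    for source_name, source_vec in sources.items():
        # Rank every target by similarity and keep the best n_matches.
        ranked = sorted(
            ((cosine(source_vec, vec), name) for name, vec in targets.items()),
            reverse=True,
        )[:n_matches]
        row = {"Source Variable": source_name}
        for rank, (similarity, target_name) in enumerate(ranked, start=1):
            row[f"Target Variable {rank}"] = target_name
            row[f"Similarity {rank}"] = similarity
        rows.append(row)
    return rows


# Toy embeddings, not real model output.
sources = {"s1": [1.0, 0.0]}
targets = {"t1": [1.0, 0.1], "t2": [0.0, 1.0], "t3": [1.0, 1.0]}

rows = match_rows(sources, targets, n_matches=2)
```

The returned list of dictionaries can be passed directly to `pandas.DataFrame(rows)` to obtain the wide table described above.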
Having the DB file in the same directory as a Python package causes issues when mounting the directory in a data container or as a PVC.
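One common way to decouple the database location from the package directory is to read it from configuration. The sketch below assumes a hypothetical environment variable `INDEX_DB_PATH` and a hypothetical default under the user's home directory; neither name comes from the project.

```python
import os
from pathlib import Path


def resolve_db_path(env=os.environ):
    """Return the database file location, kept outside the package directory."""
    # Hypothetical default; any writable location outside the package works.
    default = Path.home() / ".index" / "index.db"
    return Path(env.get("INDEX_DB_PATH", default))


# In a container, the variable would point at the mounted volume.
db_path = resolve_db_path({"INDEX_DB_PATH": "/data/index.db"})
fallback_path = resolve_db_path({})
```

With this approach, a data container or PVC can be mounted at `/data` without shadowing the installed package files.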
Implement a DB adapter for Weaviate:
https://weaviate.io/developers/weaviate
Use the local in-memory / file-based DB in a first implementation.
It should be possible to:
In row 8 of the target table below, the 'var' value is 4220292. After the text mapping (with both GPT4Adapter and MPNetAdapter), however, the output table shows 4220292Â for the same target variable.
Source:
| | var | desc |
|---|---|---|
| 0 | edmmtyp | Multiples Myelom (symptomatisch) |
| 1 | edmmtyp | Smouldering Myeloma (asymptomatisch) |
| 2 | edmmtyp | MGUS - monoklonale Gammopathie unklarer Signifikanz |
| 3 | edmmtyp | Solitäres Plasmozytom |
| 4 | edmmtyp | Plasmazell-Leukämie |
| 5 | *sympmws | Schmerzenim Bereich der mittleren WS |
| 6 | *sympuws | Schmerzen im Bereich der unteren WS |
| 7 | *sympknoch | Knochenschmerzen |
| 8 | *sympleist | Leistungsverlust |
| 9 | *sympmued | Müdigkeit |
| 10 | *sympschwae | Schwäche |
Target:
| | var | desc |
|---|---|---|
| 0 | 437233 | Multiple myeloma |
| 1 | 4184985 | Smoldering myeloma |
| 2 | 4082463 | Monoclonal gammopathy of uncertain significance |
| 3 | 4216139 | Plasmacytoma |
| 4 | 133154 | Plasma cell leukemia |
| 5 | 4169580 | Pain in spine |
| 6 | 4169580 | Pain in spine |
| 7 | 4129418 | Bone pain |
| 8 | 4220292 | Impaired psychomotor performance |
| 9 | 4223659 | Fatigue |
| 10 | 437113 | Asthenia |
Output:
| | Source Variable | Target Variable | Similarity |
|---|---|---|---|
| 0 | edmmtyp1 | 437233 | 0.898915 |
| 1 | edmmtyp2 | 4184985 | 0.910589 |
| 2 | edmmtyp3 | 4082463 | 0.903709 |
| 3 | edmmtyp4 | 4216139 | 0.847672 |
| 4 | edmmtyp5 | 133154 | 0.897126 |
| 5 | *sympmws1 | 4169580 | 0.813263 |
| 6 | *sympuws2 | 4169580 | 0.81713 |
| 7 | *sympknoch | 4169580 | 0.844623 |
| 8 | *sympleist | 4220292Â | 0.790407 |
| 9 | *sympmued | 4223659 | 0.899128 |
| 10 | *sympschwae | 437113 | 0.828928 |
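The report does not identify the cause, but a stray "Â" after a value is a typical symptom of a UTF-8 non-breaking space (U+00A0) being decoded with a single-byte encoding such as Latin-1. The sketch below reproduces that effect under this assumption and shows one possible cleanup; `clean_identifier` is a hypothetical helper, not part of the toolbox.

```python
# Assumption: the target value is followed by a non-breaking space in the file.
raw = "4220292\u00a0".encode("utf-8")

# Decoding the UTF-8 bytes as Latin-1 turns the two-byte NBSP into "Â" plus NBSP.
misread = raw.decode("latin-1")

# Stripping whitespace removes the trailing NBSP but leaves the "Â" behind.
reproduced = misread.strip()


def clean_identifier(raw_bytes):
    """Decode as UTF-8 and trim whitespace, including non-breaking spaces."""
    return raw_bytes.decode("utf-8").strip()


cleaned = clean_identifier(raw)
```

If this is indeed the cause, reading the target file with an explicit UTF-8 encoding and stripping the identifiers before mapping should remove the artifact.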