scai-bio / index

Intelligent data steward toolbox using Large Language Model embeddings for automated Data-Harmonization

Home Page: https://index.bio.scai.fraunhofer.de

License: Apache License 2.0

Python 99.56% Dockerfile 0.44%
data-harmonization data-stewardship embeddings large-language-models semantic-mapping

index's Issues

Update api version in routes.py during release

During the container build, a GitHub Action should update the API version in routes.py to the current version tag. This ensures that the resulting Docker image reports the current release version as its API version.
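
A minimal sketch of the substitution step such an action could run. The issue does not show how routes.py stores the version, so the `version="..."` keyword pattern, the `create_app` call in the usage line, and the function name `set_api_version` are all assumptions for illustration.

```python
import re

# Hypothetical pattern: assumes routes.py contains a keyword argument
# of the form version="0.0.0". Adjust to the real file layout.
VERSION_PATTERN = re.compile(r'(version\s*=\s*")[^"]*(")')


def set_api_version(source: str, tag: str) -> str:
    """Return `source` with the first version="..." value replaced by `tag`."""
    version = tag.removeprefix("v")  # "v1.2.3" -> "1.2.3" (Python 3.9+)
    # A function replacement avoids ambiguity between group references
    # and the digits of the version string.
    updated, count = VERSION_PATTERN.subn(
        lambda m: m.group(1) + version + m.group(2), source, count=1
    )
    if count == 0:
        raise ValueError("no version assignment found")
    return updated


# Usage with a made-up line of source text:
example = set_api_version('app = create_app(version="0.0.1")', "v1.2.3")
```

In a release workflow, the tag would typically come from the CI environment and the result would be written back to routes.py before the image is built.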

Bug: Duplicate entries crash DB

curl -X PUT "https://index.bio.scai.fraunhofer.de/concepts/id001/mappings?terminology_id=test_ab3&concept_name=cough&text=erkaeltung" -H "accept: application/json"
{"detail":"Failed to create or update concept: (sqlite3.IntegrityError) UNIQUE constraint failed: concept.id\n[SQL: INSERT INTO concept (id, name, terminology_id) VALUES (?, ?, ?)]\n[parameters: ('id001', 'cough', 'test_ab3')]\n(Background on this error at: https://sqlalche.me/e/20/gkpj)"}
{"detail":"Failed to create or update terminology: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (sqlite3.IntegrityError) UNIQUE constraint failed: concept.id\n[SQL: INSERT INTO concept (id, name, terminology_id) VALUES (?, ?, ?)]\n[parameters: ('id001', 'cough', 'test_ab3')]\n(Background on this error at: https://sqlalche.me/e/20/gkpj) (Background on this error at: https://sqlalche.me/e/20/7s2a)"}
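
One possible way to make the PUT idempotent is an upsert, so that a repeated request updates the existing row instead of violating the UNIQUE constraint. The sketch below uses the standard-library `sqlite3` module rather than the project's SQLAlchemy session. The column names come from the error message above, while the column types and the function name `upsert_concept` are assumptions. The `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or newer.

```python
import sqlite3


def upsert_concept(conn, concept_id, name, terminology_id):
    """Insert a concept, or update it if the id already exists."""
    conn.execute(
        """
        INSERT INTO concept (id, name, terminology_id) VALUES (?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            name = excluded.name,
            terminology_id = excluded.terminology_id
        """,
        (concept_id, name, terminology_id),
    )
    conn.commit()


# Assumed schema for illustration only; the real table definition may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept (id TEXT PRIMARY KEY, name TEXT, terminology_id TEXT)"
)

# Sending the same concept twice no longer raises IntegrityError.
upsert_concept(conn, "id001", "cough", "test_ab3")
upsert_concept(conn, "id001", "cough", "test_ab3")
rows = conn.execute("SELECT id, name, terminology_id FROM concept").fetchall()
```

The second response above also indicates that the session was left in a failed state after the first error. Independently of the upsert, the error message itself states that `Session.rollback()` must be issued before the session can be used again.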

Move db file to own directory

Keeping the DB file in the same directory as a Python package will cause issues when that directory is mounted in a data container or as a PVC.
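
A minimal sketch of resolving the DB location from configuration instead of the package directory. The environment variable `INDEX_DB_DIR`, the default directory `data`, and the file name `index.db` are hypothetical names chosen for illustration, not taken from the project.

```python
import os
import tempfile
from pathlib import Path


def resolve_db_path(env=None):
    """Return the DB file path inside a dedicated, mountable directory."""
    env = os.environ if env is None else env
    # Hypothetical variable name; falls back to ./data when unset.
    db_dir = Path(env.get("INDEX_DB_DIR", "data"))
    db_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    return db_dir / "index.db"  # hypothetical file name


# Usage: point the directory at a temporary location, as a volume mount would.
with tempfile.TemporaryDirectory() as tmp:
    path = resolve_db_path({"INDEX_DB_DIR": os.path.join(tmp, "db")})
    created = path.parent.is_dir()
    name = path.name
```

With a layout like this, only the dedicated directory needs to be mounted, and the package code stays outside the volume.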

Add DB adapter for Weaviate (vector db)

Implement a DB adapter for Weaviate:
https://weaviate.io/developers/weaviate

Use the local in-memory / file-based DB in a first implementation.

It should be possible to:

Store a computed embedding together with

  • A terminology label / ID (String)
  • Label of the Model used for generating this embedding (String)
  • The original String
  • A concept label / ID (String)

Retrieve an embedding

  • Based on the highest (cosine) similarity
  • Up to limit=n most similar vectors

Retrieve limit=n random vectors from the DB for visualization
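
The requirements above can be sketched as a small interface. This is not the Weaviate client API: it is a plain in-memory stand-in, using only the standard library, that illustrates the three operations an adapter would need to expose. All class, method, and field names are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class StoredEmbedding:
    terminology: str  # terminology label / ID
    model: str        # label of the model that produced the vector
    text: str         # the original string
    concept: str      # concept label / ID
    vector: list      # the computed embedding


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Stand-in for a vector DB adapter; a real adapter would delegate to the DB."""

    def __init__(self):
        self._items = []

    def store(self, item):
        self._items.append(item)

    def most_similar(self, query, limit=5):
        # Rank all stored vectors by cosine similarity, highest first.
        ranked = sorted(
            self._items, key=lambda i: cosine(query, i.vector), reverse=True
        )
        return ranked[:limit]

    def sample(self, limit=5):
        # Random subset for visualization, capped at the number of stored items.
        return random.sample(self._items, min(limit, len(self._items)))


store = InMemoryStore()
store.store(StoredEmbedding("demo-terminology", "demo-model", "cough", "c1", [1.0, 0.0]))
store.store(StoredEmbedding("demo-terminology", "demo-model", "fever", "c2", [0.0, 1.0]))
store.store(StoredEmbedding("demo-terminology", "demo-model", "dry cough", "c3", [0.9, 0.1]))
top = store.most_similar([1.0, 0.0], limit=2)
```

Keeping the operations behind an interface like this would let the Weaviate-backed implementation replace the in-memory one without changes to calling code.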

Target name mismatching in output

In row 8 of the target table below, the 'var' value is 4220292. However, after the text mapping (with both GPT4Adapter and MPNetAdapter), the output table shows 4220292Â for the same target variable.

Source:

  var desc
0 edmmtyp Multiples Myelom (symptomatisch)
1 edmmtyp Smouldering Myeloma (asymptomatisch)
2 edmmtyp MGUS - monoklonale Gammopathie unklarer Signifikanz
3 edmmtyp Solitäres Plasmozytom
4 edmmtyp Plasmazell-Leukämie
5 *sympmws Schmerzenim Bereich der mittleren WS
6 *sympuws Schmerzen im Bereich der unteren WS
7 *sympknoch Knochenschmerzen
8 *sympleist Leistungsverlust
9 *sympmued Müdigkeit
10 *sympschwae Schwäche

Target:

  var desc
0 437233 Multiple myeloma
1 4184985 Smoldering myeloma
2 4082463 Monoclonal gammopathy of uncertain significance
3 4216139 Plasmacytoma
4 133154 Plasma cell leukemia
5 4169580 Pain in spine
6 4169580 Pain in spine
7 4129418 Bone pain
8 4220292 Impaired psychomotor performance
9 4223659 Fatigue
10 437113 Asthenia

Output:

  Source Variable Target Variable Similarity
0 edmmtyp1 437233 0.898915
1 edmmtyp2 4184985 0.910589
2 edmmtyp3 4082463 0.903709
3 edmmtyp4 4216139 0.847672
4 edmmtyp5 133154 0.897126
5 *sympmws1 4169580 0.813263
6 *sympuws2 4169580 0.81713
7 *sympknoch 4169580 0.844623
8 *sympleist 4220292Â 0.790407
9 *sympmued 4223659 0.899128
10 *sympschwae 437113 0.828928
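
A stray `Â` after a value is the typical result of a UTF-8 non-breaking space (bytes `C2 A0`) being decoded as Latin-1. Whether that is the cause here is an assumption, since the issue does not show the input file. Under that assumption, a minimal sketch of reproducing and repairing the identifier (the helper name `repair_identifier` is hypothetical):

```python
def repair_identifier(value: str) -> str:
    """Undo UTF-8-read-as-Latin-1 mojibake if present, then strip whitespace."""
    try:
        # Reverse the wrong decoding step; raises if the text is not mojibake.
        value = value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        pass  # text was already decoded correctly, keep it unchanged
    # str.strip() also removes non-breaking spaces (U+00A0).
    return value.strip()


# Reproduce the suspected cause: an ID followed by a non-breaking space,
# encoded as UTF-8 and then wrongly decoded as Latin-1.
garbled = "4220292\u00a0".encode("utf-8").decode("latin-1")
repaired = repair_identifier(garbled)
unchanged = repair_identifier("Müdigkeit")
```

If this is the cause, reading the target table with the correct encoding and stripping whitespace from the 'var' column before mapping would address the problem at the source.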
