candig / candigv2-ingest

Ingest microservice for CanDIGv2

Home Page: https://editor.swagger.io/?url=https://raw.githubusercontent.com/CanDIG/candigv2-ingest/develop/ingest_openapi.yaml

License: GNU Lesser General Public License v3.0

Python 98.93% Dockerfile 0.99% Shell 0.08%

candigv2-ingest's People

Contributors

daisieh, dependabot[bot], justin-ys, kcranston, mshadbolt, ordineu, snyk-bot, sonqbchau, yavyx

Forkers

mshadbolt

candigv2-ingest's Issues

How to wipe out test data

Hi Marion @mshadbolt and dev team folks:

A quick question about ingest: in the CanDIG server.env, KEEP_TEST_DATA was set to 'true' so that the post-install test results would be visible.

What would be the best way to clean up the test data?

Thanks,
Ray

Question about the post-install test data and schema

Hi @mshadbolt and all dev folks,

I have a question about the post-installation test data and the schema. @mshadbolt mentioned last time that dates are now objects in dateInterval format instead of strings. Each object holds the 'number of days and months since first diagnosis':

image

The 'date_of_diagnosis' object is supposed to be {days: 0, months: 0}, as it is in most places in the JSON data file, but on lines 200-205 the days and months are not 0, see:
image

Why is the date_of_diagnosis object not 0 there? What is the logic? I would appreciate a hint.

Thanks,
Ray
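
To make the question concrete, here is a minimal sketch of how such an interval could be derived, assuming the offsets are measured from the date of first diagnosis as described above. The helper names and the whole-month rounding rule are assumptions for illustration, not the clinical_etl implementation. Under this reading, only the earliest diagnosis would sit at zero, and any later event might carry a non-zero offset.

```python
from datetime import date


def whole_months_between(start: date, end: date) -> int:
    """Count complete calendar months from start to end (expects start <= end)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


def date_interval(reference: date, event: date) -> dict:
    """Express event as an offset from reference (hypothetical helper).

    The key names follow the objects seen in tests/clinical_ingest.json.
    """
    day_interval = (event - reference).days
    if event >= reference:
        month_interval = whole_months_between(reference, event)
    else:
        # Events before the reference date get negative offsets.
        month_interval = -whole_months_between(event, reference)
    return {"month_interval": month_interval, "day_interval": day_interval}


first_diagnosis = date(2020, 1, 15)  # illustrative date, not from the test data
print(date_interval(first_diagnosis, first_diagnosis))
# {'month_interval': 0, 'day_interval': 0}
print(date_interval(first_diagnosis, date(2020, 3, 20)))
# {'month_interval': 2, 'day_interval': 65}
```

The actual tool may round months differently, so treat the numbers as a way to reason about the format rather than values to compare against the test file.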

test data issue

Hi @mshadbolt and the develop team,

I am continuing to try out CanDIGv2 and still have some issues after the upgrade to 4.0.0.

  1. requires.txt
PyYAML==6.0.1
requests==2.31.0
requests-mock>=1.11.0
urllib3==1.26.18
minio==7.1.7
jsoncomparison~=1.1.0
candigv2-authx@git+https://github.com/CanDIG/[email protected]
awscli==1.29.5
python-dotenv==0.14.0
dateparser~=1.2.0
pandas~=2.1.4
numpy>=1.22.2 # not directly required, pinned by Snyk to avoid a vulnerability
clinical_etl@git+https://github.com/CanDIG/[email protected]

git submodule status
image

  1. make test-integration
    make-test-integration.txt

  2. Issues:
    python katsu_ingest.py --input tests/clinical_ingest.json

image

There are still a lot of failures in make test-integration. Please suggest what we did wrong with the setup and, for the dates in the ingest test clinical data (see the screenshot above), how we can get rid of the date-related errors.

Thanks a lot,
Ray

Date format in the tests/clinical_ingest.json

Hi there,

I ran into an issue when trying the data ingest. Here is what I did:

  1. CanDIGv2 is set up and works fine;
  2. I checked out the 'development' branch of this 'candigv2-ingest' repository;
  3. I ran python katsu_ingest.py --input tests/clinical_ingest.json in a conda env.

I got the date-format error shown in the traceback below. In the data file tests/clinical_ingest.json, dates are in this format:

            "date_of_birth": {
                "month_interval": -768,
                "day_interval": -23376
            },

What did I miss here? Any suggestion is highly appreciated.

Thanks,
Ray

input string: {'month_interval': 70, 'day_interval': 2100} 
Traceback (most recent call last):
  File "/home/rocky/ingest-candigv2/katsu_ingest.py", line 335, in <module>
    main()
  File "/home/rocky/ingest-candigv2/katsu_ingest.py", line 330, in main
    result, status_code = ingest_clinical_data(ingest_json, headers)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rocky/ingest-candigv2/katsu_ingest.py", line 286, in ingest_clinical_data
    schemas_to_ingest = prepare_clinical_data_for_ingest(ingest_json)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rocky/ingest-candigv2/katsu_ingest.py", line 252, in prepare_clinical_data_for_ingest
    schema.validate_ingest_map(by_program[program_id])
  File "/home/rocky/CanDIGv2/bin/miniconda3/envs/raycandig/lib/python3.11/site-packages/clinical_etl/schema.py", line 325, in validate_ingest_map
    self.validate_schema(root_schema, map_json[root_schema][x])
  File "/home/rocky/CanDIGv2/bin/miniconda3/envs/raycandig/lib/python3.11/site-packages/clinical_etl/schema.py", line 411, in validate_schema
    eval(f"self.validate_{schema_name}({map_json})")
  File "<string>", line 1, in <module>
  File "/home/rocky/CanDIGv2/bin/miniconda3/envs/raycandig/lib/python3.11/site-packages/clinical_etl/mohschema.py", line 219, in validate_donors
    death = dateparser.parse(map_json["date_of_death"]).date()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rocky/CanDIGv2/bin/miniconda3/envs/raycandig/lib/python3.11/site-packages/dateparser/conf.py", line 100, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/rocky/CanDIGv2/bin/miniconda3/envs/raycandig/lib/python3.11/site-packages/dateparser/__init__.py", line 79, in parse
    data = parser.get_date_data(date_string, date_formats)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rocky/CanDIGv2/bin/miniconda3/envs/raycandig/lib/python3.11/site-packages/dateparser/date.py", line 511, in get_date_data
    raise TypeError("Input type must be str")
TypeError: Input type must be str

image
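
The traceback above ends in dateparser.parse rejecting a dict ("Input type must be str"), which suggests the installed clinical_etl validator expects string dates while the data file supplies interval objects. One way to check which format a file uses is to walk the JSON and report the type of every date field. The sketch below is an illustration under assumptions: the function name, the "date_" key prefix rule, and the sample record (including its string date) are hypothetical, not part of candigv2-ingest.

```python
import json


def find_date_fields(node, path=""):
    """Yield (path, kind) for every key starting with 'date_' in nested JSON data."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key.startswith("date_"):
                if isinstance(value, dict):
                    kind = "interval object"
                elif isinstance(value, str):
                    kind = "string"
                else:
                    kind = type(value).__name__
                yield child, kind
            else:
                yield from find_date_fields(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_date_fields(item, f"{path}[{index}]")


# Illustrative record mixing both date styles; real files would be loaded
# with json.load(open("tests/clinical_ingest.json")).
sample = json.loads("""
{"donors": [{"date_of_birth": {"month_interval": -768, "day_interval": -23376},
             "date_of_death": "2020-01"}]}
""")

for field_path, kind in find_date_fields(sample):
    print(field_path, "->", kind)
# donors[0].date_of_birth -> interval object
# donors[0].date_of_death -> string
```

If every field reports "interval object" but the validator still calls dateparser.parse on it, a version mismatch between the data file and the installed clinical_etl package is one possible explanation worth checking.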

how do I get the authorized bearer token for API ingest?

Hi dev team folks,

I am testing the ingest function and have a question. The "How to use candigv2-ingest" section of README.md describes ingesting via the API; how do I get the authorized bearer token for that? This may be a basic question, but it is beyond my knowledge. I tried curl -k -c cookie -d username=user2 -d password=user2Password http://docker-container:8080/ but got nothing back.

Any instructions would be appreciated.

Thanks a lot,
Ray

image
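
For context on the question above, a bearer token is usually obtained from the identity provider rather than from the application URL. The sketch below assumes the deployment exposes a standard OAuth 2.0 / OpenID Connect token endpoint that permits the password grant (Keycloak can be configured this way). The token URL, client ID, and client secret are placeholders that must come from your own deployment's configuration, and the function names are hypothetical; this is not the documented candigv2-ingest procedure.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret, username, password):
    """Build a form-encoded POST for the OAuth 2.0 password grant."""
    form = urllib.parse.urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
        "scope": "openid",
    }).encode("utf-8")
    return urllib.request.Request(
        token_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(request):
    """Send the request and return the access_token field of the JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]


# Usage (all values are placeholders, not real CanDIG settings):
# request = build_token_request(
#     "https://<identity-provider>/<token-endpoint>",
#     "<client-id>", "<client-secret>", "user2", "<password>",
# )
# token = fetch_access_token(request)
# headers = {"Authorization": f"Bearer {token}"}
```

The returned token would then go in an Authorization header of the form "Bearer <token>" on the ingest API call.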
