candig / candigv2-ingest
Ingest microservice for CanDIGv2
License: GNU Lesser General Public License v3.0
Hi Marion @mshadbolt and dev team folks:
One quick question about the ingest: in the CanDIG server.env, KEEP_TEST_DATA was set to 'true' in order to see the post-install test results.
What would be the best way to clean up the test data?
Thanks,
Ray
Hi @mshadbolt and all dev folks,
I have a question about the post-installation test data and the schema. @mshadbolt mentioned last time that dates are now objects in dateInterval format instead of strings. Each object holds the 'number of days and months since first diagnosis':
The 'date_of_diagnosis' object is supposed to be {days: 0, months: 0}, as it is in most places in the JSON data file, but on lines 200-205 the days and months are not 0, see:
Why is the date_of_diagnosis object not 0 there? What is the logic? I would appreciate a hint.
Thanks,
Ray
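For readers following this thread, here is a minimal sketch of how an interval of this kind could be derived, assuming (as described above) that every date is stored as months and days relative to the donor's first diagnosis. Under that assumption, a diagnosis that is not the first one for a donor would naturally carry non-zero values. The function name, the truncation rule for months, and the example dates are hypothetical and are not taken from clinical_etl.

```python
from datetime import date


def interval_since(reference: date, event: date) -> dict:
    """Return the offset of `event` from `reference` as whole months and days.

    Hypothetical helper: the key names mirror the test data
    (month_interval, day_interval), but the rounding rule is an assumption.
    """
    days = (event - reference).days
    months = (event.year - reference.year) * 12 + (event.month - reference.month)
    # Truncate toward zero when the final month is incomplete.
    if months > 0 and event.day < reference.day:
        months -= 1
    elif months < 0 and event.day > reference.day:
        months += 1
    return {"month_interval": months, "day_interval": days}


# Hypothetical donor whose first diagnosis was on 2020-01-15.
first_diagnosis = date(2020, 1, 15)

# The first diagnosis measured against itself is zero.
print(interval_since(first_diagnosis, first_diagnosis))
# -> {'month_interval': 0, 'day_interval': 0}

# A birth date 64 years earlier gives a negative interval.
print(interval_since(first_diagnosis, date(1956, 1, 15)))
# -> {'month_interval': -768, 'day_interval': -23376}
```

With these made-up dates, the second result happens to match the date_of_birth values that appear in tests/clinical_ingest.json, which is consistent with negative intervals meaning "before first diagnosis".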
Hi @mshadbolt and the develop team,
I am continuing to try out CanDIGv2 and still have some issues after the upgrade to 4.0.0.
requires.txt
PyYAML==6.0.1
requests==2.31.0
requests-mock>=1.11.0
urllib3==1.26.18
minio==7.1.7
jsoncomparison~=1.1.0
candigv2-authx@git+https://github.com/CanDIG/[email protected]
awscli==1.29.5
python-dotenv==0.14.0
dateparser~=1.2.0
pandas~=2.1.4
numpy>=1.22.2 # not directly required, pinned by Snyk to avoid a vulnerability
clinical_etl@git+https://github.com/CanDIG/[email protected]
make test-integration
make-test-integration.txt
Issues:
python katsu_ingest.py --input tests/clinical_ingest.json
There are still a lot of failures in make test-integration. Please suggest what we did wrong with the setup and, for the dates in the ingest test clinical data (see the above screen), how we can get rid of the date-related errors.
Thanks a lot,
Ray
Hi there,
I ran into an issue when I was trying the data ingest. Here is what I did:
python katsu_ingest.py --input tests/clinical_ingest.json
in a conda env. I got the following date-format-related issue. In the data file tests/clinical_ingest.json, the date is in this format:
"date_of_birth": {
"month_interval": -768,
"day_interval": -23376
},
What did I miss here? Any suggestions would be highly appreciated.
Thanks,
Ray
input string: {'month_interval': 70, 'day_interval': 2100}
Traceback (most recent call last):
File "/home/rocky/ingest-candigv2/katsu_ingest.py", line 335, in <module>
main()
File "/home/rocky/ingest-candigv2/katsu_ingest.py", line 330, in main
result, status_code = ingest_clinical_data(ingest_json, headers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rocky/ingest-candigv2/katsu_ingest.py", line 286, in ingest_clinical_data
schemas_to_ingest = prepare_clinical_data_for_ingest(ingest_json)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rocky/ingest-candigv2/katsu_ingest.py", line 252, in prepare_clinical_data_for_ingest
schema.validate_ingest_map(by_program[program_id])
File "/home/rocky/CanDIGv2/bin/miniconda3/envs/raycandig/lib/python3.11/site-packages/clinical_etl/schema.py", line 325, in validate_ingest_map
self.validate_schema(root_schema, map_json[root_schema][x])
File "/home/rocky/CanDIGv2/bin/miniconda3/envs/raycandig/lib/python3.11/site-packages/clinical_etl/schema.py", line 411, in validate_schema
eval(f"self.validate_{schema_name}({map_json})")
File "<string>", line 1, in <module>
File "/home/rocky/CanDIGv2/bin/miniconda3/envs/raycandig/lib/python3.11/site-packages/clinical_etl/mohschema.py", line 219, in validate_donors
death = dateparser.parse(map_json["date_of_death"]).date()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rocky/CanDIGv2/bin/miniconda3/envs/raycandig/lib/python3.11/site-packages/dateparser/conf.py", line 100, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/rocky/CanDIGv2/bin/miniconda3/envs/raycandig/lib/python3.11/site-packages/dateparser/__init__.py", line 79, in parse
data = parser.get_date_data(date_string, date_formats)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rocky/CanDIGv2/bin/miniconda3/envs/raycandig/lib/python3.11/site-packages/dateparser/date.py", line 511, in get_date_data
raise TypeError("Input type must be str")
TypeError: Input type must be str
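The traceback above shows validate_donors passing map_json["date_of_death"] straight to dateparser.parse, which only accepts strings, while the input printed on the first line is an interval object. One possible reading is that the installed clinical_etl validator expects string dates and the test data uses the newer interval format; that is a guess, not a confirmed diagnosis. The sketch below is a hypothetical pre-check (not part of katsu_ingest.py or clinical_etl) that lists which fields of a record hold interval objects, so the mismatch can be spotted before validation.

```python
def classify_date_field(value):
    """Return 'string', 'interval', or 'other' for a field value."""
    if isinstance(value, str):
        return "string"
    # Interval objects in the test data carry both of these keys.
    if isinstance(value, dict) and {"month_interval", "day_interval"} <= value.keys():
        return "interval"
    return "other"


def find_interval_fields(record):
    """List the keys of `record` whose values are interval objects."""
    return sorted(
        key for key, value in record.items()
        if classify_date_field(value) == "interval"
    )


# Hypothetical donor record; the field values are taken from the post above.
donor = {
    "submitter_donor_id": "DONOR_1",  # made-up identifier
    "date_of_birth": {"month_interval": -768, "day_interval": -23376},
    "date_of_death": {"month_interval": 70, "day_interval": 2100},
}

print(find_interval_fields(donor))
# -> ['date_of_birth', 'date_of_death']
```

If fields show up here while the validator still calls a string date parser on them, checking that the installed clinical_etl version matches the one pinned by the ingest repository would be a reasonable next step.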
Hi dev team folks,
I am testing the ingest function and have a question. In the README.md file, in the How to use candigv2-ingest section on using the API method to ingest, how do I get the authorized bearer token? This may be a basic question, but it is beyond my knowledge. I tried curl -k -c cookie -d username=user2 -d password=user2Password http://docker-container:8080/ but got nothing.
Any instructions would be appreciated.
Thanks a lot,
Ray
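As a rough sketch for this question: assuming the deployment authenticates through Keycloak (not stated in this thread), a bearer token is usually requested from the realm's OpenID Connect token endpoint with a password grant, rather than by posting credentials to the service root. The realm URL, client ID, and client secret below are placeholders, not values from the CanDIG documentation; the real ones would come from your own environment configuration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(realm_url, client_id, client_secret, username, password):
    """Build (but do not send) a password-grant token request.

    `realm_url` is the base URL of the Keycloak realm (assumed setup);
    the token endpoint lives under /protocol/openid-connect/token.
    """
    token_url = realm_url.rstrip("/") + "/protocol/openid-connect/token"
    form = urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
        "scope": "openid",
    }).encode("utf-8")
    return Request(
        token_url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# All values here are placeholders for illustration only.
req = build_token_request(
    "http://keycloak.example:8080/auth/realms/candig",
    "local_candig",
    "client-secret-here",
    "user2",
    "user2Password",
)
print(req.full_url)
```

Sending the request (for example with urllib.request.urlopen) should return JSON containing an access_token field if the credentials and client settings are right; that value then goes into an "Authorization: Bearer <token>" header on the ingest API call.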