pgscatalog / pgs_catalog
An open database of polygenic scores and relevant metadata needed to apply and evaluate them correctly.
License: Apache License 2.0
Would it be possible to add the PGS Catalog header at the top of the tracker? That would make finding the validator easier.
I am guessing this is something that needs to be corrected at the database level, and not at the REST API server level.
Example:
curl -X GET "https://www.pgscatalog.org/rest/performance/search?pgs_id=PGS000004&offset=0&limit=20&format=json" -H "accept: application/json" | jq '.' | grep -in "unknown"
56: "name_full": "UNKNOWN"
76: "name_full": "UNKNOWN"
676: "name_full": "UNKNOWN"
Hi PGS Catalog team
This is a feature request.
While answering one of the comments to your medRxiv preprint, @smlmbrt mentioned that you were "(...) exploring ways to to add population reference calculations and distributions (e.g. percentiles) to aid end-user applications and interpretation , and hope to add those features in the future." (from a reply to Charles Warden).
So, I am just adding this request here to GitHub issues to keep track of its progress.
See EBISPOT/goci#435 for more information re: CORS. Also it appears the GCST endpoint they need is also not working (e.g. https://www.pgscatalog.org/rest/gwas/get_score_ids/GCST004988), despite the matched landing page working (https://www.pgscatalog.org/gwas/GCST004988/).
The endpoint /rest/cohort/ for cohort CARE_b returns 200 but is empty. However, this cohort exists, e.g., it is associated with PGS000010, PPM000015 and PSS000009.
Hi PGS Catalog Team
This is a feature request.
It would be really nice to have a way of retrieving scores by variant identifiers (variant_id), and the reverse, to get variant identifiers by score ids (pgs_id):
/rest/variant/search?pgs_id=PGS000001
/rest/score/search?variant_id=rs123213
As I understand it, currently, there is no way of finding these relationships between variants and scores in the metadata. So, to find this information, we have to download all PGS scoring files and look in there.
I understand though that you'd probably need to fix #150 before implementing such a feature.
Oftentimes there are missing values which come coded as "NR" (for "Not Reported", I believe).
For the REST API user it would be best if these came as null in the JSON responses. I already have to re-code null values to NA (Not Available) anyway. I don't know how easy it would be for you to do this, but if you could do this re-coding server-side it would simplify parsing on the client side and make it more efficient.
Some examples of object keys whose values I found to be "NR":
"ancestry_free": "NR"
"ancestry_country": "Australia, U.K., NR, U.S."
"variants_genomebuild": "NR"
"ancestry_free": "NR"
"ancestry_broad": "NR"
"ancestry_country": "NR"
"method_params": "NR"
"ancestry_country": "U.S., Australia, Canada, NR"
Not sure about those cases where the value is not a single "NR", e.g., "Australia, U.K., NR, U.S." or "U.S., Australia, Canada, NR". It is probably best to leave these cases as they are.
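Until something like this exists server-side, here is a minimal client-side sketch of the re-coding described above. It assumes the response has already been parsed into Python dicts and lists; the function name recode_nr is my own and not part of any PGS Catalog tooling. Only values that are exactly "NR" are replaced, so composite strings such as "Australia, U.K., NR, U.S." are left untouched.

```python
from typing import Any


def recode_nr(value: Any, missing: str = "NR") -> Any:
    """Recursively replace values that are exactly `missing` with None.

    Composite strings that merely contain "NR" are left as they are.
    """
    if isinstance(value, dict):
        return {key: recode_nr(item, missing) for key, item in value.items()}
    if isinstance(value, list):
        return [recode_nr(item, missing) for item in value]
    if isinstance(value, str) and value.strip() == missing:
        return None
    return value


# Example values taken from the keys listed above.
sample = {
    "ancestry_free": "NR",
    "ancestry_country": "Australia, U.K., NR, U.S.",
    "variants_genomebuild": "NR",
}
cleaned = recode_nr(sample)
```

Serialising cleaned with json.dumps would then emit null for the re-coded keys.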
For debugging reasons and for the sake of completeness it would be nice to have these endpoints:
/rest/cohort/all
/rest/sample_set/all
The web page is currently broken because of the large amount of data and charts to display.
The idea is to add a search form to limit the scores displayed (for a given trait, for instance).
As per the PGS scoring file schema, each row in the PGS scoring file pertains to a variant.
Currently, each variant is identified either by its "rsID" or by the combination of "chr_name" and "chr_position". For analyses involving several scoring files it would be nice to have one single identifier column. So here is a suggestion:
Create a new column (to be the first one), named e.g. "variant_id", whose value is preferably the "rsID" if it exists; otherwise, it is the concatenation of the chromosome name and the position (e.g., "1:757640"). This way the reader of PGS scoring files would always know to look at this column for the identifier. Usually I find myself applying this logic on my side.
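The logic I apply on my side can be sketched roughly as follows. This is only an illustration: the function name make_variant_id is hypothetical, and it assumes each row has been read into a mapping keyed by the scoring file column names ("rsID", "chr_name", "chr_position").

```python
from typing import Mapping, Optional


def make_variant_id(row: Mapping[str, object]) -> Optional[str]:
    """Return the rsID if present, else "chr_name:chr_position", else None."""
    rsid = str(row.get("rsID") or "").strip()
    if rsid:
        return rsid
    chrom = str(row.get("chr_name") or "").strip()
    pos = str(row.get("chr_position") or "").strip()
    if chrom and pos:
        return f"{chrom}:{pos}"
    # Neither kind of identifier is available for this row.
    return None


with_rsid = make_variant_id(
    {"rsID": "rs123213", "chr_name": "1", "chr_position": "757640"}
)
without_rsid = make_variant_id(
    {"rsID": "", "chr_name": "1", "chr_position": 757640}
)
```

Returning None when neither identifier can be built keeps such rows easy to spot, rather than silently producing a partial identifier.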
We should have a flag, badge, or something that marks a Publication as a preprint. Maybe a button with hover text that explains what a preprint is: "manuscript has not undergone peer review"
Ideas:
Hi PGS Catalog Team
Now that you have updated your ancestries' ontology by including the multi-ancestry categories, namely, Multi-Ancestry (including Europeans) and Multi-Ancestry (excluding Europeans), and also added the display categories, I would like to take the opportunity to ascertain if my assumptions are correct about how each relates to the others, and whether some renaming would be appropriate.
As I see it, we now have three levels of ancestry description (using my own wording here to refer to these levels):
(ancestry_free in Sample). This corresponds to the most basic level of description. Examples include Chinese, Japanese, Korean, Spanish, Swedish, Brazilian, Mexican, etc. Essentially, the concepts given as examples in the third column of Table 1 of Morales et al.
(ancestry_broad in Sample). This is the ancestry category from the NHGRI-EBI GWAS Catalog framework, first column of Table 1 of Morales et al., plus a couple of new categories, i.e., the multi-ancestry categories defined by you guys: Multi-Ancestry (including Europeans) and Multi-Ancestry (excluding Europeans).
As you can see from my bullet point no. 3, I am planning to refer to display categor(y/ies) as ancestry class(es) in quincunx. The reasons are:
Regarding the /rest/ancestry_categories endpoint, which provides the mapping between ancestry symbols and their names, I think there are two issues here:
The names returned by /rest/ancestry_categories are in some cases simplified, i.e., not including the terms in brackets, e.g., "Additional Asian Ancestries" instead of "Additional Asian Ancestries (including Central, and South East Asian)". This is fine if all one is doing is consultation, but for programmatic analysis, where these values might be matched across the database, this can be a problem.
I think it would be more useful if a full table of all these ancestries were returned. Currently, in quincunx, I have a saved dataset of ancestries that simultaneously provides the mapping between ancestry categories and classes, the mapping between ancestry classes and their symbol (as provided by /rest/ancestry_categories), the hexadecimal colour code of the class, and the ancestry category definition as given in your documentation. Perhaps you could provide these data in /rest/ancestry_categories instead?
# A tibble: 19 x 6
ancestry_category ancestry_class ancestry_class_sy… ancestry_class_co… definition examples
<chr> <chr> <chr> <chr> <chr> <chr>
1 Aboriginal Australian Additional Diverse Ancestries OTH #999999 "Includes individuals who either self-report or have been described by authors as Australian Aboriginal. These are expected to b… Martu Australian Ab…
2 African American or Afro-Caribbean African AFR #FFD900 "Includes individuals who either self-report or have been described by authors as African American or Afro-Caribbean. This categ… African American, A…
3 African unspecified African AFR #FFD900 "Includes individuals that either self-report or have been described as African, but there was not sufficient information to all… African, non-Hispan…
4 Asian unspecified Additional Asian Ancestries (including … ASN #B15928 "Includes individuals that either self-report or have been described as Asian but there was not sufficient information to allow … Asian, Asian Americ…
5 Central Asian Additional Asian Ancestries (including … ASN #B15928 "Includes individuals who either self-report or have been described by authors as Central Asian. We note that there does not app… Silk Road (founder/…
6 East Asian East Asian EAS #4DAF4A "Includes individuals who either self-report or have been described by authors as East Asian or one of the sub-populations from … Chinese, Japanese, …
7 European European EUR #377EB8 "Includes individuals who either self-report or have been described by authors as European, Caucasian, white, or one of the sub-… Spanish, Swedish
8 Greater Middle Eastern (Middle Eastern… Greater Middle Eastern (Middle Eastern,… GME #00CED1 "Includes individuals who self-report or were described by authors as Middle Eastern, North African, Persian, or one of the sub-… Tunisian, Arab, Ira…
9 Hispanic or Latin American Hispanic or Latin American AMR #E41A1C "Includes individuals who either self-report or are described by authors as Hispanic, Latino, Latin American, or one of the sub-… Brazilian, Mexican
10 Native American Additional Diverse Ancestries OTH #999999 "Includes indigenous individuals of North, Central, and South America, descended from the original human migration into the Amer… Pima Indian, Plains…
11 Not reported Ancestry Not Reported NR #BBBBBB "Includes individuals for which no ancestry or country of recruitment information is available" NA
12 Oceanian Additional Diverse Ancestries OTH #999999 "Includes individuals that either self-report or have been described by authors as Oceanian or one of the sub-populations from t… Solomon Islander, M…
13 Other Additional Diverse Ancestries OTH #999999 "Includes individuals where an ancestry descriptor is known but insufficient information is available to allow assignment to one… Surinamese, Russian
14 Other admixed ancestry Additional Diverse Ancestries OTH #999999 "Includes individuals who either self-report or have been described by authors as admixed and do not fit the definition of the o… NA
15 South Asian South Asian SAS #984EA3 "Includes individuals who either self-report or have been described by authors as South Asian or one of the sub-populations from… Bangladeshi, Sri La…
16 South East Asian Additional Asian Ancestries (including … ASN #B15928 "Includes individuals who either self-report or have been described by authors as South East Asian or one of the sub-populations… Thai, Malay
17 Sub-Saharan African African AFR #FFD900 "Includes individuals who either self-report or have been described by authors as Sub-Saharan African or one of the sub-populati… Yoruban, Gambian
18 Multi-Ancestry (including Europeans) Multi-Ancestry (including Europeans) MAE #A6CEE3 "Combined sample of multiple ancestries that includes European ancestry individuals. Used when ancestry-specific sample sizes ar… NA
19 Multi-Ancestry (excluding Europeans) Multi-Ancestry (excluding Europeans) MAO #FF7F00 "Combined sample of multiple ancestries that does not include any European ancestry individuals. Used when ancestry-specific sam… NA
I am happy to hear your thoughts on this.
curl -X GET "https://www.pgscatalog.org/rest/performance/search?pgs_id=PGS000004&offset=0&limit=20&format=json" -H "accept: application/json" | jq '.' | grep -n "\b \""
40: "name_full": "Agricultural Health Study "
44: "name_full": "Breakthrough Generations Study "
48: "name_full": "European Prospective Investigation into Cancer "
60: "name_full": "Nurses Health Study "
64: "name_full": "Nurses Health Studies II "
etc.
The descriptions of some of the EFO traits are enclosed in ['...'] or ["..."].
Examples:
efo_id trait description
<chr> <chr> <chr>
1 EFO_0009460 ACPA-negative rheumatoid arthritis "['A subtype of rheumatoid arthritis defined by the abse…
2 EFO_0009459 ACPA-positive rheumatoid arthritis "['A subtype of rheumatoid arthritis defined by the pres…
3 EFO_0005056 age at death "['The age at which death occurs.']"
4 EFO_0007878 alcohol consumption measurement "['quantification of some aspect of alcohol consumption …
5 EFO_0007835 alcohol dependence measurement "['quantification of some aspect of alcohol dependence o…
6 EFO_0004533 alkaline phosphatase measurement "['Alkaline phosphatase measurement is a quantification …
7 EFO_1001870 late-onset Alzheimers disease "['This is the most common form of the disease, which ha…
8 EFO_0003913 angina pectoris "['The symptom of paroxysmal pain consequent to MYOCARDI…
9 EFO_0004614 apolipoprotein A 1 measurement "['Is a quantification of serum lipoprotein A. Apolipopr…
10 EFO_0004615 apolipoprotein B measurement "['The measurement of ApoB in blood. Apolipoprotein B is…
11 EFO_0004736 aspartate aminotransferase measurement "['Is a quantification of aspartate aminotransferase, an…
12 EFO_0010934 aspartate aminotransferase to alanine aminotransferase ratio "['The ratio between the levels of aspartate aminotransf…
13 EFO_0005090 basophil count "['quantification of basophils in the blood', 'The numbe…
14 EFO_0007992 basophil percentage of leukocytes "['A calculated measurement in which the number of basop…
15 EFO_0007992 basophil percentage of leukocytes "['A calculated measurement in which the number of basop…
16 EFO_0004570 bilirubin measurement "['A bilirubin measurement is a quantification of biliru…
17 EFO_0007937 blood protein measurement "['quantification of the levels of some protein in a blo…
18 EFO_0008036 BMI-adjusted fasting blood glucose measurement "[\"fasting blood glucose measurement that has been adju…
19 EFO_0008037 BMI-adjusted fasting blood insulin measurement "[\"fasting insulin measurement that has been adjusted f…
20 EFO_0007788 BMI-adjusted waist-hip ratio "['waist-hip ratio that has been adjusted by subjects’ b…
21 EFO_0004339 body height "['The distance from the sole to the crown of the head w…
22 EFO_0004340 body mass index "['An indicator of body density as determined by the rel…
23 EFO_0003923 bone density "['The amount of mineral per square centimeter of BONE. …
24 EFO_0007772 calcaneal bone quantitative ultrasound measurement "['bone quantitiave ultrasound of the main bone in the h…
25 EFO_0004838 calcium measurement "['Is a quantification of calcium, typically in serum. C…
26 EFO_1001958 high grade ovarian serous adenocarcinoma "['A rapidly growing serous adenocarcinoma that arises f…
27 EFO_1001516 ovarian serous carcinoma "['serous carcinoma located in the ovary']"
28 EFO_0008328 chronotype measurement "['quantification of some aspect of chronotype such as e…
29 EFO_0007710 cognitive decline measurement "[\"quantification of some aspect of cognitive decline s…
30 EFO_0009819 comparative body size at age 10, self-reported "[\"Description of an individual's body size at age 10 c…
31 EFO_0009518 complication "['Any disease or disorder that occurs during the course…
32 EFO_0004458 C-reactive protein measurement "['C-reactive protein (CRP) measurement is a measurement…
33 EFO_0007934 creatinine clearance measurement "['The clearance rate of creatinine, that is, the volume…
34 EFO_0004518 creatinine measurement "['A creatinine measurement is a measure of the metaboli…
35 EFO_0004617 cystatin C measurement "['is a quantification of serum cystatin C C (formerly g…
36 EFO_0007006 depressive symptom measurement "['quantification of the existence and severity of depre…
37 EFO_0006336 diastolic blood pressure "['The blood pressure after the contraction of the heart…
38 EFO_0004842 eosinophil count "['Is a quantification of eosinphils in blood.', 'The nu…
39 EFO_0007991 eosinophil percentage of leukocytes "['A calculated measurement in which the number of eosin…
40 EFO_0007991 eosinophil percentage of leukocytes "['A calculated measurement in which the number of eosin…
41 EFO_0004305 erythrocyte count "['The number of red blood cells\\xa0per unit volume in …
42 EFO_0004465 fasting blood glucose measurement "['An fasting blood glucose measurement is a measurement…
43 EFO_0008036 BMI-adjusted fasting blood glucose measurement "[\"fasting blood glucose measurement that has been adju…
44 EFO_0004466 fasting blood insulin measurement "['A fasting blood insulin measurement is a measurement …
45 EFO_0008037 BMI-adjusted fasting blood insulin measurement "[\"fasting insulin measurement that has been adjusted f…
46 PATO_0000383 female "['A biological sex quality inhering in an individual or…
47 EFO_1001958 high grade ovarian serous adenocarcinoma "['A rapidly growing serous adenocarcinoma that arises f…
48 EFO_1001516 ovarian serous carcinoma "['serous carcinoma located in the ovary']"
49 EFO_0006829 GFR change measurement "[\"A quantification of the variation in an individual's…
50 EFO_0005208 glomerular filtration rate "['measurement of the flow rate of filtered fluid throug…
51 EFO_0006829 GFR change measurement "[\"A quantification of the variation in an individual's…
52 EFO_0004468 glucose measurement "['Is any quantification of glucose.']"
53 EFO_0004465 fasting blood glucose measurement "['An fasting blood glucose measurement is a measurement…
54 EFO_0008036 BMI-adjusted fasting blood glucose measurement "[\"fasting blood glucose measurement that has been adju…
55 EFO_0004541 HbA1c measurement "['A quantification of glycated A1c hemoglobin in blood …
56 EFO_0004541 HbA1c measurement "['A quantification of glycated A1c hemoglobin in blood …
57 EFO_0004348 hematocrit "['The volume of packed RED BLOOD CELLS in a blood speci…
58 EFO_0004509 hemoglobin measurement "['hemoglobin levels', 'Hemoglobin measurement is a meas…
59 EFO_0004541 HbA1c measurement "['A quantification of glycated A1c hemoglobin in blood …
60 EFO_0004528 mean corpuscular hemoglobin concentration "['The mean corpuscular hemoglobin concentration is a me…
61 EFO_0004612 high density lipoprotein cholesterol measurement "['The measurement of HDL cholesterol in blood used as a…
62 EFO_1001958 high grade ovarian serous adenocarcinoma "['A rapidly growing serous adenocarcinoma that arises f…
63 EFO_0004627 IGF-1 measurement "['Is the quantification of Insulin-like growth factor 1…
64 EFO_0002614 insulin resistance "['diminished effectiveness of insulin in lowering plasm…
65 EFO_0008001 insulin secretion measurement "['Measurement of compounds, generally C-peptide or matu…
66 EFO_0004695 intraocular pressure measurement "['Is a quantification of intraocular pressure. Increase…
67 EFO_1001870 late-onset Alzheimers disease "['This is the most common form of the disease, which ha…
68 EFO_0008206 left ventricular systolic function measurement "['quantification of some aspect of the systolic functio…
69 EFO_0004308 leukocyte count "['The number of\\xa0WHITE BLOOD CELLS\\xa0per unit volu…
70 EFO_0007992 basophil percentage of leukocytes "['A calculated measurement in which the number of basop…
71 EFO_0007990 neutrophil percentage of leukocytes "['A calculated measurement in which the number of neutr…
72 EFO_0007989 monocyte percentage of leukocytes "['A calculated measurement in which the number of monoc…
73 EFO_0005091 monocyte count "['quantification of monocytes in the blood']"
74 EFO_0005090 basophil count "['quantification of basophils in the blood', 'The numbe…
75 EFO_0004587 lymphocyte count "['A quantification of lymphocytes in blood.']"
76 EFO_0004842 eosinophil count "['Is a quantification of eosinphils in blood.', 'The nu…
77 EFO_0004833 neutrophil count "['Is a quantification of neutrophils in blood.', 'The n…
78 EFO_0007993 lymphocyte percentage of leukocytes "['A calculated measurement in which the number of lymph…
79 EFO_0007991 eosinophil percentage of leukocytes "['A calculated measurement in which the number of eosin…
80 EFO_0006925 lipoprotein A measurement "['quantification of some lipoprotein A in a sample']"
81 EFO_0010821 liver fat measurement "['A quantification of the fat content of the liver such…
82 EFO_0004300 longevity "[\"The length of time of an organism's life.\"]"
83 EFO_0004611 low density lipoprotein cholesterol measurement "['The measurement of LDL cholesterol in blood used as a…
84 EFO_0004587 lymphocyte count "['A quantification of lymphocytes in blood.']"
85 EFO_0007993 lymphocyte percentage of leukocytes "['A calculated measurement in which the number of lymph…
86 EFO_0007993 lymphocyte percentage of leukocytes "['A calculated measurement in which the number of lymph…
87 PATO_0000384 male "['A biological sex quality inhering in an individual or…
88 EFO_0004527 mean corpuscular hemoglobin "['The MCH is the average mass of hemoglobin per red bl…
89 EFO_0004528 mean corpuscular hemoglobin concentration "['The mean corpuscular hemoglobin concentration is a me…
90 EFO_0004526 mean corpuscular volume "['A mean corpuscular volume is the result of calculatio…
91 EFO_0004584 mean platelet volume "['A measurement of mean platelet volume is a machine-ca…
92 EFO_0010701 mean reticulocyte volume "['Mean volume of reticulocyte cells']"
93 EFO_0005091 monocyte count "['quantification of monocytes in the blood']"
94 EFO_0007989 monocyte percentage of leukocytes "['A calculated measurement in which the number of monoc…
95 EFO_0007989 monocyte percentage of leukocytes "['A calculated measurement in which the number of monoc…
96 EFO_0004833 neutrophil count "['Is a quantification of neutrophils in blood.', 'The n…
97 EFO_0007990 neutrophil percentage of leukocytes "['A calculated measurement in which the number of neutr…
98 EFO_0007990 neutrophil percentage of leukocytes "['A calculated measurement in which the number of neutr…
99 EFO_0008421 non-alcoholic fatty liver disease severity measurement "['Quantification of the severity of non-alcoholic fatty…
100 EFO_1001516 ovarian serous carcinoma "['serous carcinoma located in the ovary']"
101 EFO_1001958 high grade ovarian serous adenocarcinoma "['A rapidly growing serous adenocarcinoma that arises f…
102 EFO_1001516 ovarian serous carcinoma "['serous carcinoma located in the ovary']"
103 EFO_1001958 high grade ovarian serous adenocarcinoma "['A rapidly growing serous adenocarcinoma that arises f…
104 EFO_1001516 ovarian serous carcinoma "['serous carcinoma located in the ovary']"
105 EFO_0010968 phosphate measurement "['Quantification of phosphate levels in a sample.']"
106 EFO_0007984 platelet component distribution width "['The determination of the amount of platelet shape cha…
107 EFO_0004309 platelet count "['The number of\\xa0PLATELETS\\xa0per unit volume in a …
108 EFO_0007985 platelet crit "['The proportion of blood volume that is occupied by pl…
109 EFO_0004462 PR interval "[\"A PR interval is an electrocardiography measurement…
110 EFO_0005055 QRS duration "[\"QRS duration is a measurement of the combined durati…
111 EFO_0004682 QT interval "[\"The QT interval is a measure of the time between the…
112 EFO_0010246 recurrent "['Episodes of disease that occur in individuals who hav…
113 EFO_0005192 red blood cell distribution width "['measure of the variation of red blood cell (RBC) volu…
114 EFO_0007766 response to beta blocker "['Any process that results in a change in state or acti…
115 GO_0097366 response to bronchodilator "['Any process that results in a change in state or acti…
116 EFO_0004351 resting heart rate "['quantification of the number of times the heart beats…
117 EFO_0007986 reticulocyte count "['The number of reticulocytes per unit volume of blood.…
118 EFO_0008579 risk-taking behaviour "['The tendency to take risks. Risk-taking behaviour is …
119 EFO_0009820 seeing a general practitioner for nerves, anxiety, tension or depression, self-reported "['Seeing a general practitioner for nerves, anxiety, te…
120 EFO_0009821 seeing a psychiatrist for nerves, anxiety, tension or depression, self-reported "['Seeing a psychiatrist for nerves, anxiety, tension or…
121 EFO_0009799 self-reported trait "['Characteristics of an individual that are reported by…
122 EFO_0009821 seeing a psychiatrist for nerves, anxiety, tension or depression, self-reported "['Seeing a psychiatrist for nerves, anxiety, tension or…
123 EFO_0009820 seeing a general practitioner for nerves, anxiety, tension or depression, self-reported "['Seeing a general practitioner for nerves, anxiety, te…
124 EFO_0009819 comparative body size at age 10, self-reported "[\"Description of an individual's body size at age 10 c…
125 EFO_0004735 serum alanine aminotransferase measurement "['Is a quantification of serum alanine aminotransferase…
126 EFO_0004535 serum albumin measurement "['An albumin measurement is a quantification of albumin…
127 EFO_0004532 serum gamma-glutamyl transferase measurement "['Serum gamma-glutamyl transferase level measurement is…
128 EFO_0004579 serum IgE measurement "[\"A serum immunoglobulin E measurement is the measurem…
129 EFO_0004568 serum non-albumin protein measurement "['The measurement of the non-albumin portion of blood p…
130 EFO_0009795 serum urea measurement "['Quantification of the amount of urea in serum.']"
131 EFO_0004696 sex hormone-binding globulin measurement "['Is a quantification of sex hormone binding globulin. …
132 EFO_0009282 sodium measurement "['A quantitative measurement of the amount of sodium pr…
133 EFO_0006335 systolic blood pressure "['The blood pressure during the contraction of the left…
134 EFO_0004908 testosterone measurement "['is a quantification of testosterone, typically in ser…
135 EFO_0009933 Thyroid preparation use measurement "['Quantification of some aspect of the use of thyroid p…
136 EFO_0004536 total blood protein measurement "['A total blood protein measurement is a quantification…
137 EFO_0004574 total cholesterol measurement "['A total cholesterol measurement is the quantification…
138 EFO_0004530 triglyceride measurement "['A triglyceride measurement is a quantification of tr…
139 EFO_0003761 unipolar depression "['A mood disorder having a clinical course involving on…
140 EFO_0004531 urate measurement "['A urate measurement is the quantification of some ura…
141 EFO_0004761 uric acid measurement "['Is a quantification of uric acid, typically in blood.…
142 EFO_0007778 urinary albumin to creatinine ratio "['quantification of the ratio of albumin to creatinine …
143 EFO_0005116 urinary metabolite measurement "['quantification of some metabolite in urine']"
144 EFO_0010952 urinary potassium measurement "['A quantitative measurement of the total amount of pot…
145 EFO_0010967 urinary microalbumin measurement "['The quantification of microalbumin in urine.']"
146 EFO_0007778 urinary albumin to creatinine ratio "['quantification of the ratio of albumin to creatinine …
147 EFO_0010967 urinary microalbumin measurement "['The quantification of microalbumin in urine.']"
148 EFO_0010952 urinary potassium measurement "['A quantitative measurement of the total amount of pot…
149 EFO_0004631 vitamin D measurement "['A quantification of Vitamin D levels, typically in bl…
150 EFO_0004342 waist circumference "['The measurement around the body at the level of the\\…
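The wrapped descriptions above look like the string representation of a Python list (single or double quotes, \xa0 escapes, sometimes more than one element). Assuming that is indeed what they are, a client can unwrap them as sketched below; parse_description is a hypothetical helper name, and anything that does not parse as a list of strings is returned unchanged.

```python
import ast
from typing import List


def parse_description(raw: str) -> List[str]:
    """Unwrap a description stored as the text of a Python list of strings.

    Falls back to a one-element list holding the original text when the
    input is not a list literal.
    """
    text = raw.strip()
    if text.startswith("[") and text.endswith("]"):
        try:
            parsed = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            return [raw]
        if isinstance(parsed, list) and all(isinstance(p, str) for p in parsed):
            return parsed
    return [raw]


single = parse_description("['The age at which death occurs.']")
double = parse_description('["The length of time of an organism\'s life."]')
plain = parse_description("serous carcinoma located in the ovary")
```

ast.literal_eval only evaluates literals, so it is a safer choice here than eval. Ideally, though, the API would return a plain string (or a proper JSON array) so that no such unwrapping is needed.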
According to the schema, the Demographic object may contain an interval object, whose type value must be one of: range, iqr or ci. It would be nice if this were followed strictly, including letter case.
curl -X GET "https://www.pgscatalog.org/rest/performance/all?offset=1260&limit=20&format=json" -H "accept: application/json" | jq '.' | grep -n iqr
1158: "type": "iqr",
1223: "type": "iqr",
1288: "type": "iqr",
1353: "type": "iqr",
curl -X GET "https://www.pgscatalog.org/rest/performance/all?offset=380&limit=20&format=json" -H "accept: application/json" | jq '.' | grep -n IQR
260: "type": "IQR",
325: "type": "IQR",
390: "type": "IQR",
455: "type": "IQR",
520: "type": "IQR",
585: "type": "IQR",
650: "type": "IQR",
715: "type": "IQR",
780: "type": "IQR",
845: "type": "IQR",
910: "type": "IQR",
975: "type": "IQR",
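As a client-side workaround for the mixed "iqr"/"IQR" values shown above, the type can be normalised before validating it. This is a sketch under assumptions: the function name normalise_interval_type is my own, and the "lower" and "upper" keys in the example are placeholders rather than something taken from the responses above.

```python
ALLOWED_INTERVAL_TYPES = {"range", "iqr", "ci"}


def normalise_interval_type(interval: dict) -> dict:
    """Return a copy of `interval` with its "type" lower-cased and validated."""
    raw = interval.get("type")
    normalised = raw.strip().lower() if isinstance(raw, str) else raw
    if normalised not in ALLOWED_INTERVAL_TYPES:
        raise ValueError(f"unexpected interval type: {raw!r}")
    # Build a new dict so the caller's object is not modified in place.
    return {**interval, "type": normalised}


fixed = normalise_interval_type({"type": "IQR", "lower": 1.0, "upper": 2.0})
```

Raising on unknown values, rather than passing them through, makes any further deviation from the schema visible immediately.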
List of things to check/consider:
.
as is given to us in some files? Need to check how many PGS this affects.
Hi PGS Catalog team,
perhaps it would be nice to have a bit more information about releases:
date as a kind of key but this won't work if you make more than one release in one day. So perhaps returning a timestamp with resolution to the second would be best.
catalog_version and rest_server_version.
Currently this is how I parse your releases endpoints in R:
An object of class "releases"
Slot "releases":
# A tibble: 23 x 5
date n_pgs n_ppm n_pgp notes
<date> <int> <int> <int> <chr>
1 2021-02-03 58 265 8 This release contains 58 new Score(s), 8 new Publication(s) and 265 new Performance metric(s)
2 2021-01-07 6 31 6 This release contains 6 new Score(s), 6 new Publication(s) and 31 new Performance metric(s)
3 2020-12-15 306 313 4 This release contains 306 new Score(s), 4 new Publication(s) and 313 new Performance metric(s)
4 2020-12-08 9 65 6 This release contains 9 new Score(s), 6 new Publication(s) and 65 new Performance metric(s)
5 2020-11-20 4 50 5 This release contains 4 new Score(s), 5 new Publication(s) and 50 new Performance metric(s)
6 2020-11-05 4 19 4 This release contains 4 new Score(s), 4 new Publication(s) and 19 new Performance metric(s)
7 2020-10-19 79 79 1 This release contains 79 new Score(s), 1 new Publication(s) and 79 new Performance metric(s)
8 2020-10-16 1 1 1 This release contains 1 new Score(s), 1 new Publication(s) and 1 new Performance metric(s)
9 2020-09-18 10 34 3 This release contains 10 new Score(s), 3 new Publication(s) and 34 new Performance metric(s)
10 2020-09-04 6 17 3 This release contains 6 new Score(s), 3 new Publication(s) and 17 new Performance metric(s)
# … with 13 more rows
Slot "pgs_ids":
# A tibble: 721 x 2
date pgs_id
<date> <chr>
1 2021-02-03 PGS000668
2 2021-02-03 PGS000669
3 2021-02-03 PGS000670
4 2021-02-03 PGS000671
5 2021-02-03 PGS000672
6 2021-02-03 PGS000673
7 2021-02-03 PGS000674
8 2021-02-03 PGS000675
9 2021-02-03 PGS000676
10 2021-02-03 PGS000677
# … with 711 more rows
Slot "ppm_ids":
# A tibble: 1,533 x 2
date ppm_id
<date> <chr>
1 2021-02-03 PPM001396
2 2021-02-03 PPM001397
3 2021-02-03 PPM001398
4 2021-02-03 PPM001399
5 2021-02-03 PPM001400
6 2021-02-03 PPM001401
7 2021-02-03 PPM001402
8 2021-02-03 PPM001403
9 2021-02-03 PPM001404
10 2021-02-03 PPM001405
# … with 1,523 more rows
Slot "pgp_ids":
# A tibble: 133 x 2
date pgp_id
<date> <chr>
1 2021-02-03 PGP000128
2 2021-02-03 PGP000129
3 2021-02-03 PGP000130
4 2021-02-03 PGP000132
5 2021-02-03 PGP000133
6 2021-02-03 PGP000134
7 2021-02-03 PGP000135
8 2021-02-03 PGP000136
9 2021-01-07 PGP000122
10 2021-01-07 PGP000123
# … with 123 more rows
Sometimes we index GWAS studies which are pre-release and don’t have curated metadata (e.g. missing sample numbers that are reported as NR). We should check which can be updated periodically. A current example is: https://www.ebi.ac.uk/gwas/studies/GCST90137411
Create django admin infrastructure to track and annotate PGS Catalog publication eligibility and curation status. Requirements:
Restart the new web display, as quite a lot of code has changed since the first PR.
Possible mistake here?
curl -X GET "https://www.pgscatalog.org/rest/cohort/NICCC" -H "accept: application/json" | jq '.'
{
"size": 2,
"count": 2,
"next": null,
"previous": null,
"results": [
{
"name_short": "NICCC",
"name_full": "National Israeli Cancer Control Centre",
"associated_pgs_ids": {
"development": [
"PGS000721"
],
"evaluation": []
}
},
{
"name_short": "NICCC",
"name_full": "National Israeli Cancer Control Center",
"associated_pgs_ids": {
"development": [],
"evaluation": [
"PGS000004",
"PGS000005",
"PGS000006",
"PGS000351",
"PGS000352"
]
}
}
]
}
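A quick way to spot this kind of duplicate across cohorts is to group the returned results by name_short and report any symbol that maps to more than one name_full. The sketch below uses the two entries from the response above; conflicting_cohort_names is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def conflicting_cohort_names(results: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each name_short to its distinct name_full values, if more than one."""
    names = defaultdict(set)
    for cohort in results:
        # Strip whitespace so trailing spaces are not counted as differences.
        names[cohort["name_short"]].add(cohort["name_full"].strip())
    return {short: sorted(full) for short, full in names.items() if len(full) > 1}


results = [
    {"name_short": "NICCC", "name_full": "National Israeli Cancer Control Centre"},
    {"name_short": "NICCC", "name_full": "National Israeli Cancer Control Center"},
]
conflicts = conflicting_cohort_names(results)
```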
curl -s -X GET "https://www.pgscatalog.org/rest/score/PGS000737?format=json" -H "accept: application/json" | jq '.' | grep method_params
"method_params": "NR",
Add the PGS Catalog release date information for Publication and Score models in the API and downloads.
The score https://www.pgscatalog.org/score/PGS000737/ indicates a sample size of 1,427 individuals. However, in the source publication 10.1093/eurheartj/ehz435, I only find the numbers 1,400 and 1,368 after filtering.
Mistake here?
In the PGS Catalog objects returned by REST API endpoints, the "stage" annotation of pgs_ids and samples is always done implicitly, i.e., in the names of keys of parent JSON elements, e.g.:
/rest/score includes the objects samples_variants and samples_training, which in turn include the actual samples whose stage annotation has to be read from the parent elements, i.e., samples_variants corresponds to samples annotated with stage being "gwas", or "gwas variants", or perhaps "discovery", and samples_training ought to be annotated with "training". It would be better if these data were actual values, and not keys of parent elements, as they are now.
associated_pgs_ids, which is split into development and evaluation, again two stage annotations of pgs_ids that would be best moved out of the parent elements' names by creating a new key:value pair ("stage": "development" or "stage": "evaluation").
Would it not be better to settle on a new categorical variable, named "stage", whose possible levels would be:
and then annotate PGSes and samples with "stage": .
This will decrease the current nesting level where this info is needed and make the parsing more straightforward on the user end. Here's a scribble indicating where I think changes would be required: out.pdf.
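To illustrate the idea on the client side, here is a minimal sketch that moves the stage out of the parent key and into a value. The helper name `flatten_stages` is hypothetical, and the input mirrors the `associated_pgs_ids` object of PGP000013.

```python
def flatten_stages(associated_pgs_ids):
    """Turn {"development": [...], "evaluation": [...]} into flat records."""
    return [
        {"pgs_id": pgs_id, "stage": stage}
        for stage, pgs_ids in associated_pgs_ids.items()
        for pgs_id in pgs_ids
    ]


# Stage is a key of the parent element here...
nested = {"development": ["PGS000023"], "evaluation": ["PGS000021", "PGS000023"]}
# ...and becomes an ordinary value on each record here.
flat = flatten_stages(nested)
print(flat)
```

With the stage stored as a value, filtering by stage is a plain comparison rather than a lookup by key name.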
PGS000116 is anti-correlated to all of the other scores in EFO_0001645. Moreover, the journal article associated with it never mentions PGS000116, not even in supplementary text. So, it makes me think that the creator didn't upload it but someone else did. Was it mistakenly multiplied by -1 and is therefore a resilience rather than disease risk score? I also wonder about PGS003727 ...
Could the score overview web page have additional detail about precisely who uploaded the score?
Some PGS scoring files include an undocumented column in the schema: 'OR'.
Is it a mistake in these files or is it a missing schema documentation entry in here? In any case, it is not consistent across PGS scoring files. Here I provide two inconsistent examples, but this issue seems to happen throughout many of the PGS scoring files provided in the ftp server.
curl -s 'http://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000001/ScoringFiles/PGS000001.txt.gz' | gunzip -c | grep -P '\bOR\b'
rsID chr_name effect_allele reference_allele effect_weight locus_name OR
curl -s 'http://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000004/ScoringFiles/PGS000004.txt.gz' | gunzip -c | grep -P '\bchr_name\b'
chr_name chr_position effect_allele reference_allele effect_weight allelefrequency_effect
PGS000662.txt.gz has the following columns:
rsID
chr_name
chr_position
effect_allele
reference_allele
effect_weight
weight_type
allelefrequency_effect_European
allelefrequency_effect_African
allelefrequency_effect_Asian
allelefrequency_effect_Hispanic
allelefrequency_effect is part of the documented columns (10.1101/2020.05.20.20108217v1):
However, these variations are not:
allelefrequency_effect_European
allelefrequency_effect_African
allelefrequency_effect_Asian
allelefrequency_effect_Hispanic
curl -sX GET 'http://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000662/ScoringFiles/PGS000662.txt.gz' | gunzip | grep rsID
rsID chr_name chr_position effect_allele reference_allele effect_weight weight_type allelefrequency_effect_European allelefrequency_effect_African allelefrequency_effect_Asian allelefrequency_effect_Hispanic
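A quick way to flag such columns is to compare each header against a set of documented names. The sketch below is only illustrative: the `DOCUMENTED` set is an assumption assembled from column names mentioned in these reports, not the official schema, and `undocumented_columns` is a made-up helper.

```python
# Assumed set of documented column names (illustrative, NOT the official schema).
DOCUMENTED = {
    "rsID", "chr_name", "chr_position", "effect_allele", "reference_allele",
    "effect_weight", "locus_name", "weight_type", "allelefrequency_effect",
}


def undocumented_columns(header_line, documented=DOCUMENTED):
    """Return header columns that are missing from the documented set."""
    return [col for col in header_line.split() if col not in documented]


pgs000001 = "rsID chr_name effect_allele reference_allele effect_weight locus_name OR"
pgs000662 = (
    "rsID chr_name chr_position effect_allele reference_allele effect_weight "
    "weight_type allelefrequency_effect_European allelefrequency_effect_African "
    "allelefrequency_effect_Asian allelefrequency_effect_Hispanic"
)
print(undocumented_columns(pgs000001))
print(undocumented_columns(pgs000662))
```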
One of your latest changes to the REST API was the split of the array in the field associated_pgs_ids (from the /rest/publication/ endpoint) into two arrays, development and evaluation. I believe this change should be accompanied by other changes to keep things consistent, or alternatively be partly reverted.
Take this example snippet of PGP000013 from a response to https://www.pgscatalog.org/rest/publication/PGP000013:
{
"id": "PGP000013",
"title": "Type 1 Diabetes Risk in African-Ancestry Participants and Utility of an Ancestry-Specific Genetic Risk Score.",
"doi": "10.2337/dc18-1727",
"PMID": 30659077,
"journal": "Diabetes Care",
"firstauthor": "Onengut-Gumuscu S",
"date_publication": "2019-01-18",
"authors": "Onengut-Gumuscu S, Chen WM, Robertson CC, Bonnie JK, Farber E, Zhu Z, Oksenberg JR, Brant SR, Bridges SL, Edberg JC, Kimberly RP, Gregersen PK, Rewers MJ, Steck AK, Black MH, Dabelea D, Pihoker C, Atkinson MA, Wagenknecht LE, Divers J, Bell RA, SEARCH for Diabetes in Youth, Type 1 Diabetes Genetics Consortium, Erlich HA, Concannon P, Rich SS.",
"associated_pgs_ids": {
"development": [
"PGS000023"
],
"evaluation": [
"PGS000021",
"PGS000023"
]
}
}
Given that PGS000021 is associated with PGP000013, I would expect that querying for score PGS000021 would also list PGP000013 as an associated publication. However, the response to https://www.pgscatalog.org/rest/score/PGS000021?format=json only shows PGP000011:
{
"id": "PGS000021",
"name": "GRS1",
"ftp_scoring_file": "http://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000021/ScoringFiles/PGS000021.txt.gz",
"publication": {
"id": "PGP000011",
"title": "A Type 1 Diabetes Genetic Risk Score Can Aid Discrimination Between Type 1 and Type 2 Diabetes in Young Adults.",
"doi": "10.2337/dc15-1111",
"PMID": 26577414,
"journal": "Diabetes Care",
"firstauthor": "Oram RA",
"date_publication": "2015-11-17"
},
...
I am guessing that it would now be important to also split the publication field into development and evaluation, and have arrays of publications inside them. Or, alternatively, include a new field in publications, i.e., stage, and simply have an array of publications with this extra variable.
Nevertheless, my understanding was that the /rest/score endpoints returned information associated only with the development of a PGS, and not its evaluation. So the publications returned in this context would only be related to development. I think that would be a good idea, as you already have the PPM concept that lists associations with the respective PGSes and whose publications are also listed in the objects returned by /rest/performance/.
So right now we have a hybrid situation, where PGPs map to "development" and "evaluation" PGSes, but PGSes only map to "development" PGPs. Finally, PPMs only map to "evaluation" PGPs.
The PGS traits dataset (pgs_traits_data.csv) entry for ischemic stroke (http://purl.obolibrary.org/obo/HP_0002140) is, I think, malformed and contains
The current value is "https://www.ebi.ac.uk/ols/ontologies/efo/terms?iri=http://purl.obolibrary.org/obo/HP_0002140"
Whereas it should be "http://purl.obolibrary.org/obo/HP_0002140"
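Until the entry is corrected, a client could recover the expected value from the `iri` query parameter. This is a minimal sketch using only the standard library; the helper name `extract_iri` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def extract_iri(url):
    """Return the `iri` query parameter if present, else the URL unchanged."""
    iri = parse_qs(urlsplit(url).query).get("iri")
    return iri[0] if iri else url


current = (
    "https://www.ebi.ac.uk/ols/ontologies/efo/terms"
    "?iri=http://purl.obolibrary.org/obo/HP_0002140"
)
fixed = extract_iri(current)
print(fixed)
```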
Hi PGS Catalog Team
Another question here, about the count field in ancestry_distribution in scores.
With count you are conflating two different concepts: sample size and number of sample sets. Have you considered splitting them into two different fields?
Here's how I'm representing this data on the client side, e.g., PGS000018:
$ancestries
# A tibble: 3 x 7
pgs_id stage sample_size n_sample_sets ..resource ..timestamp ..page
<chr> <chr> <dbl> <dbl> <chr> <dttm> <int>
1 PGS000018 gwas 382026 0 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
2 PGS000018 dev 3000 0 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
3 PGS000018 eval NA 16 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
$ancestry_frequencies
# A tibble: 12 x 7
pgs_id stage ancestry_class_symbol frequency ..resource ..timestamp ..page
<chr> <chr> <chr> <dbl> <chr> <dttm> <int>
1 PGS000018 gwas AFR 0.8 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
2 PGS000018 gwas AMR 1.1 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
3 PGS000018 gwas EAS 3 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
4 PGS000018 gwas EUR 37 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
5 PGS000018 gwas GME 0.6 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
6 PGS000018 gwas MAE 50.9 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
7 PGS000018 gwas SAS 6.7 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
8 PGS000018 dev MAE 100 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
9 PGS000018 eval AFR 12.5 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
10 PGS000018 eval AMR 12.5 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
11 PGS000018 eval EUR 68.8 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
12 PGS000018 eval MAE 6.2 https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
$multi_ancestry_composition
# A tibble: 6 x 7
pgs_id stage multi_ancestry_class_symbol ancestry_class_symbol ..resource ..timestamp ..page
<chr> <chr> <chr> <chr> <chr> <dttm> <int>
1 PGS000018 gwas MAE EUR https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
2 PGS000018 gwas MAE SAS https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
3 PGS000018 dev MAE EUR https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
4 PGS000018 dev MAE NR https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
5 PGS000018 eval MAE EUR https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
6 PGS000018 eval MAE NR https://www.pgscatalog.org/rest/score/PGS000018?format=json 2021-05-07 13:08:04 1
So I split count into sample_size and n_sample_sets, and then I set n_sample_sets to zero for stages gwas and dev, and make sample_size NA (Not Available) for stage eval. Perhaps you could provide the total sample size for stage eval too?
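The split described above can be sketched as follows. The shape of the input (one object per stage, each with a `count` key) is an assumption about the `ancestry_distribution` object, and `split_count` is a made-up helper; the numbers are those shown for PGS000018.

```python
def split_count(ancestry_distribution):
    """Split the overloaded `count` into sample_size and n_sample_sets per stage."""
    rows = []
    for stage, info in ancestry_distribution.items():
        is_eval = stage == "eval"
        rows.append({
            "stage": stage,
            # For gwas and dev, count is a number of individuals.
            "sample_size": None if is_eval else info["count"],
            # For eval, count is a number of sample sets.
            "n_sample_sets": info["count"] if is_eval else 0,
        })
    return rows


# Assumed minimal shape of the object; other keys are omitted.
example = {"gwas": {"count": 382026}, "dev": {"count": 3000}, "eval": {"count": 16}}
rows = split_count(example)
print(rows)
```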
In your medRxiv preprint, page 14, Supplemental Note 1, it reads Suggested.
Does that mean optional?
https://www.pgscatalog.org/rest/score/search?pmid=PGP000003&format=json triggers a 500 Internal Server Error.
NB: While I do understand that PGP000003 is not a possible value for pmid, the error should not be an internal server error.
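As a rough illustration of the behaviour I would expect, the parameter could be validated before it reaches the database, with a 400 returned for malformed values. This is a framework-agnostic sketch, not the server's actual code; `parse_pmid` and the error payload are hypothetical.

```python
def parse_pmid(raw):
    """Return (status_code, payload): 200 with an int PMID, or 400 with a message."""
    # Accept only plain ASCII digits, so that int() below cannot fail.
    if raw is not None and raw.isascii() and raw.isdigit():
        return 200, {"pmid": int(raw)}
    return 400, {"error": f"'pmid' must be a positive integer, got {raw!r}"}


print(parse_pmid("PGP000003"))
print(parse_pmid("26577414"))
```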
Investigate the replacement of the | separated TextFields of synonyms and mapped_terms (from the model EFOTrait_Base) by JSON fields.
e.g.: for breast cancer
synonyms = [ {'name': 'cancer of breast'}, {'name': 'mammary cancer'}, ... ]
mapped_terms = [ {'name': 'DOID:1612'}, {'name': 'ICD10CM:C50'}, ...]
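The conversion of existing values could be sketched like this. It assumes the current fields hold a single string with `|` as the separator, possibly padded with spaces; `pipe_to_json` is a hypothetical helper name.

```python
def pipe_to_json(text):
    """Convert a '|'-separated string into a list of {'name': ...} objects."""
    return [{"name": part.strip()} for part in text.split("|") if part.strip()]


synonyms = pipe_to_json("cancer of breast | mammary cancer")
mapped_terms = pipe_to_json("DOID:1612|ICD10CM:C50")
print(synonyms)
print(mapped_terms)
```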
This will require changes in:
Dear PGScatalog developers,
Thanks for this very nice and comprehensive resource.
We want to calculate a PGS for height and we tried using https://www.pgscatalog.org/score/PGS001405/. However, we noticed that in some rows there are multiple nucleotides listed for the effect and/or reference allele. How should this be interpreted? Here are some examples:
rs71579661 1 160151624 GC G 2.117951e-02
rs35769739 12 106500158 G GT -2.205375e-02
rs200768101 17 42981237 GGGA G 2.630668e-02
rs138731997 19 41888850 A AGGGGACTGGGC -4.669412e-02
Also on some rows the rsID is missing:
10 26518418 T C 1.159232e-02
19 49244218 CAA C 4.904787e-02
19 52004795 GT G 5.072118e-02
Many thanks for your help!
The endpoint /rest/trait/{trait_id} should return an EFOTrait_OntologyChild element as per the schema. This object type should include a child_traits element at the top level.
I have just played with the examples here: https://www.pgscatalog.org/rest/#/Trait%20endpoints/getTrait but it seems that the element child_traits is never included in the response object, which therefore resembles a response of type EFOTrait_Ontology.
BTW: Would you consider making both endpoints return objects of type EFOTrait_OntologyChild?
It would be nice to provide the parameter include_children to the /rest/trait/all endpoint.
Right now what I do, as an alternative, is to get all EFO identifiers with /rest/trait/all first, and then run one request to /rest/trait/{trait_id} for each EFO identifier; not ideal.
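For reference, the workaround amounts to something like the sketch below. The `fetch` argument is a caller-supplied function mapping a URL to decoded JSON (for instance a thin wrapper around an HTTP client); the `next`/`results` pagination keys follow the paginated responses shown in other reports here, while the `id` key on trait objects is an assumption. The demo uses canned data so that it runs offline.

```python
def iter_results(fetch, url):
    """Yield items from every page of a paginated endpoint."""
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")


def traits_with_children(fetch, base="https://www.pgscatalog.org/rest"):
    """One request to /trait/all (paginated), then one request per trait id."""
    return [
        fetch(f"{base}/trait/{trait['id']}")
        for trait in iter_results(fetch, f"{base}/trait/all")
    ]


# Canned responses standing in for the real API (illustrative only).
fake = {
    "https://www.pgscatalog.org/rest/trait/all": {
        "next": None,
        "results": [{"id": "EFO_0001645"}],
    },
    "https://www.pgscatalog.org/rest/trait/EFO_0001645": {
        "id": "EFO_0001645",
        "child_traits": [],
    },
}
traits = traits_with_children(fake.__getitem__)
print(traits)
```

The cost is one extra request per trait, which is what an include_children parameter on /rest/trait/all would avoid.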
In JSON responses some of the values are empty; it would be best to replace them with null.
Example:
curl -X GET "https://www.pgscatalog.org/rest/performance/search?pgs_id=PGS000004&offset=0&limit=20&format=json" -H "accept: application/json" | jq '.' | grep -n "\"\""
189: "source_GWAS_catalog": "",
190: "source_PMID": "",
240: "covariates": "",
241: "performance_comments": ""
271: "source_GWAS_catalog": "",
272: "source_PMID": "",
322: "covariates": "",
323: "performance_comments": ""
353: "source_GWAS_catalog": "",
354: "source_PMID": "",
404: "covariates": "",
405: "performance_comments": ""
435: "source_GWAS_catalog": "",
436: "source_PMID": "",
486: "covariates": "",
487: "performance_comments": ""
517: "source_GWAS_catalog": "",
518: "source_PMID": "",
568: "covariates": "",
569: "performance_comments": ""
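In the meantime, clients can normalise these values after parsing. A minimal sketch, where `empty_to_none` is a hypothetical helper and the record is a cut-down stand-in using field names from the output above:

```python
def empty_to_none(value):
    """Recursively replace empty strings with None (JSON null) in parsed JSON."""
    if isinstance(value, dict):
        return {key: empty_to_none(item) for key, item in value.items()}
    if isinstance(value, list):
        return [empty_to_none(item) for item in value]
    return None if value == "" else value


record = {
    "source_PMID": "",
    "covariates": "",
    "results": [{"performance_comments": "", "source_GWAS_catalog": "GCST004988"}],
}
cleaned = empty_to_none(record)
print(cleaned)
```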
If we go here, we see "age, sex and the first ten principal components of genetic ancestry" are controlled for in performance metrics. Other PGSs have more or less.
Are the covariates (as reported in the PGS Catalog):
An example of the question that A could answer is "how well does PRS predict without the effect of X Y Z known risk factors?"
An example of the question that B could answer is "how much does a model with both PRS and X Y Z known risk factors predict?"
I believe it is A, but I wanted to verify that it is not C. Thank you.
Related to #152.
Trailing space still prevails in other variables:
curl -X GET "https://www.pgscatalog.org/rest/performance/search?pgs_id=PGS000004&offset=0&limit=20&format=json" -H "accept: application/json" | jq '.' | grep -n "\b \""
1478: "phenotyping_reported": "Contralateral breast cancer ",
1776: "covariates": "Age, country ",
2080: "covariates": "Age, country ",
2086: "phenotyping_reported": "Contralateral breast cancer ",
2694: "phenotyping_reported": "Contralateral breast cancer ",
2772: "covariates": "Age, country ",
2856: "covariates": "Age, country ",
curl -X GET "https://www.pgscatalog.org/rest/score/PGS000014?format=json" -H "accept: application/json" | jq '.' | grep -n "\b \""
111: "method_name": "LDPred ",
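The same check can be run on the parsed JSON instead of with grep, which reports where each offending value lives. This is a sketch; `trailing_space_paths` and its path notation are made up for illustration.

```python
def trailing_space_paths(value, path="$"):
    """Yield the paths of string values that end in whitespace."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield from trailing_space_paths(item, f"{path}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from trailing_space_paths(item, f"{path}[{index}]")
    elif isinstance(value, str) and value != value.rstrip():
        yield path


doc = {
    "method_name": "LDPred ",
    "results": [{"covariates": "Age, country "}, {"covariates": "Age"}],
}
paths = list(trailing_space_paths(doc))
print(paths)
```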
PGS002777 and PGS002778 should be in EFO_0000537 and not in EFO_0001645, shouldn't they? Correlation of these two is about 0 with all other EFO_0001645 members but all other scores are highly positively correlated with each other.
The schema for the response from /rest/cohort/{cohort_symbol} is a Cohort_extended object.
According to the documentation, the object associated_pgs_ids should be an array. Both the schema description and the cached response example given agree with this. However, an actual response returns associated_pgs_ids as an object of two elements: development and evaluation. I believe the documentation needs an update.
The reported traits (in Score models) need some cleanup in order to improve the display of the Trait entries in the Search results.
Here are a few examples:
Potential typos
cardiovascular measurement:
- Heart rate
- Heart rate (AR)
- Heat rate <==== Typo ?
Same trait but reported slightly differently
cardiovascular measurement:
- LDL
- LDL Cholesterol
- LDL cholesterol
...
- QT interval
- QT-interval
Will need to merge similar reported traits and fix typos
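A first pass at finding merge candidates could group reported traits by a normalised key. The sketch below folds case and treats hyphens like spaces; it is only a heuristic and the helper names are hypothetical. Genuine typos such as "Heat rate" are not caught this way and would still need fuzzy matching or manual review.

```python
import re
from collections import defaultdict


def normalise(trait):
    """Collapse runs of whitespace and hyphens to one space and fold case."""
    return re.sub(r"[\s\-]+", " ", trait).strip().casefold()


def merge_candidates(traits):
    """Group reported traits sharing a normalised key; keep groups of two or more."""
    groups = defaultdict(list)
    for trait in traits:
        groups[normalise(trait)].append(trait)
    return {key: names for key, names in groups.items() if len(names) > 1}


reported = [
    "Heart rate", "Heat rate", "LDL", "LDL Cholesterol", "LDL cholesterol",
    "QT interval", "QT-interval",
]
candidates = merge_candidates(reported)
print(candidates)
```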
The current headers are hard to parse and work with in the harmonisation pipeline. Propose the following fix:
Current:
### PGS CATALOG SCORING FILE - see www.pgscatalog.org/downloads/#dl_ftp for additional information
## POLYGENIC SCORE (PGS) INFORMATION
# PGS ID = PGS000001
# Reported Trait = Breast Cancer
# Original Genome Build = NR
# Number of Variants = 77
## SOURCE INFORMATION
# PGP ID = PGP000001
# Citation = Mavaddat N et al. J Natl Cancer Inst (2015). doi:10.1093/jnci/djv036
Potentially:
###PGS CATALOG SCORING FILE - see www.pgscatalog.org/downloads/#dl_ftp for additional information
##POLYGENIC SCORE (PGS) INFORMATION
#pgs_id=PGS000001
#trait_reported=Breast Cancer
#genome_build=NR
#variants_number=77
##SOURCE INFORMATION
#pgp_id=PGP000001
#citation=Mavaddat N et al. J Natl Cancer Inst (2015). doi:10.1093/jnci/djv036
This would change code in the initial loading of the PGS files from the raw/source files.
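With the proposed layout, reading the metadata becomes a simple key=value split. A minimal sketch, assuming single-# lines carry the fields and lines starting with ## or ### are section titles; `parse_header` is a hypothetical helper.

```python
def parse_header(lines):
    """Collect '#key=value' lines into a dict, skipping '##' and '###' titles."""
    meta = {}
    for line in lines:
        if line.startswith("#") and not line.startswith("##") and "=" in line:
            # Split on the first '=' only, so values may contain '='.
            key, value = line[1:].split("=", 1)
            meta[key.strip()] = value.strip()
    return meta


header = [
    "###PGS CATALOG SCORING FILE - see www.pgscatalog.org/downloads/#dl_ftp for additional information",
    "##POLYGENIC SCORE (PGS) INFORMATION",
    "#pgs_id=PGS000001",
    "#trait_reported=Breast Cancer",
    "#genome_build=NR",
    "#variants_number=77",
    "##SOURCE INFORMATION",
    "#pgp_id=PGP000001",
    "#citation=Mavaddat N et al. J Natl Cancer Inst (2015). doi:10.1093/jnci/djv036",
]
meta = parse_header(header)
print(meta)
```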
Hi PGS Catalog team,
This is not an issue/bug but more of a question or feature request.
I noticed you had included a few new endpoints, some of which were requested by me. Thank you so much, really appreciated!
Regarding the new /rest/info, may I leave a few suggestions?
You've probably thought about this, but still here are my five cents. I think it would be nice to leave the /rest/info endpoint only with details about the software side of the REST API, and the /rest/release/ endpoint reserved only for data-related info.
So this would imply removing this JSON element from the /rest/info response:
"latest_release": {
"date": "2021-04-28",
"scores": 761,
"publications": 167,
"traits": 204
},
and adding extra fields in the /rest/release response by also including the number of traits, and perhaps the number of sample sets. I understand that in /rest/release you were providing the increments in new entities whereas in /rest/info you are giving the totals. I think it would be nice to stick to increments, as we can always add them together to get the total at a given point in time.
In the R package quincunx, the main table resulting from a request to /rest/release/all looks like:
# A tibble: 27 x 5
date n_pgs n_ppm n_pgp notes
<date> <int> <int> <int> <chr>
1 2021-04-28 4 25 7 This release contains 4 new Score(s), 7 new Publication(s) and 25 new Performance metric(s)
2 2021-04-07 6 22 5 This release contains 6 new Score(s), 5 new Publication(s) and 22 new Performance metric(s)
3 2021-03-22 13 144 11 This release contains 13 new Score(s), 11 new Publication(s) and 144 new Performance metric(s)
4 2021-02-23 17 118 11 This release contains 17 new Score(s), 11 new Publication(s) and 118 new Performance metric(s)
5 2021-02-03 58 265 8 This release contains 58 new Score(s), 8 new Publication(s) and 265 new Performance metric(s)
6 2021-01-07 6 31 6 This release contains 6 new Score(s), 6 new Publication(s) and 31 new Performance metric(s)
7 2020-12-15 306 313 4 This release contains 306 new Score(s), 4 new Publication(s) and 313 new Performance metric(s)
8 2020-12-08 9 65 6 This release contains 9 new Score(s), 6 new Publication(s) and 65 new Performance metric(s)
9 2020-11-20 4 50 5 This release contains 4 new Score(s), 5 new Publication(s) and 50 new Performance metric(s)
10 2020-11-05 4 19 4 This release contains 4 new Score(s), 4 new Publication(s) and 19 new Performance metric(s)
# … with 17 more rows
So it would be nice to have in addition, as I said, the n_efo (number of new traits) and n_pss (number of new sample sets) migrated from the response from /rest/info.
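To make the point about increments concrete, totals at each release date are just a running sum. A minimal sketch using three of the n_pgs increments from the table above, oldest first; since earlier releases are not shown, the sums are relative to the release before 2020-11-05, not catalogue-wide totals. `running_totals` is a hypothetical helper.

```python
from itertools import accumulate


def running_totals(increments):
    """Map each release date to the cumulative sum of increments up to it."""
    dates = [date for date, _ in increments]
    totals = accumulate(count for _, count in increments)
    return dict(zip(dates, totals))


# (date, n_pgs) pairs, oldest first.
n_pgs = [("2020-11-05", 4), ("2020-11-20", 4), ("2020-12-08", 9)]
totals = running_totals(n_pgs)
print(totals)
```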
Also move the citation and terms_of_use to the response from /rest/release. Don't you think they belong there more than in /rest/info/?
"citation": {
"title": "The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation",
"doi": "10.1038/s41588-021-00783-5",
"PMID": 33692568,
"authors": "Samuel A. Lambert, Laurent Gil, Simon Jupp, Scott C. Ritchie, Yu Xu, Annalisa Buniello, Aoife McMahon, Gad Abraham, Michael Chapman, Helen Parkinson, John Danesh, Jacqueline A. L. MacArthur and Michael Inouye.",
"journal": "Nature Genetics",
"year": 2021
},
"terms_of_use": "https://www.ebi.ac.uk/about/terms-of-use"
It would be nice to provide a few extra endpoints under /rest/info, namely:
- /rest/info/all for all REST API versions (this would be the most useful of the endpoints suggested here because, as of now, I can only see the latest changes to the API, and oftentimes it would be nice to review the changelog over a longer period of time; otherwise, I am left with no alternative but to check the GitHub repository and review the commit history, which is not very efficient)
- /rest/info/{release_date}, analogous to /rest/release/{release_date}
- /rest/info/{version}, e.g., /rest/info/1.7
So in its final form, the JSON from /rest/info would be simply an array of:
"date": "2021-04-28",
"version": 1.7,
"changelog": [
"New data 'ancestry_distribution' in the `/rest/score` endpoints, providing information about ancestry distribution on each stage of the PGS",
"New endpoint `/rest/ancestry_categories` providing the list of ancestry symbols and names."
]
Again, all in all, thanks for the terrific work! These are just some ideas, and are not really that important aspects of the REST API.
Would be nice to have, but not high priority.
Do I understand correctly that, although the endpoint /rest/trait/search allows for inclusion of child traits with include_children=1, in reality it is not possible to tell apart the traits that are direct matches of the query from those that are children thereof?
Instead, would it not be possible to make /rest/trait/search return EFOTrait_OntologyChild responses, like the endpoint /rest/trait/{trait_id} does?