Hi @punkish, here are our first comments with regards to the "v2/treatments" endpoint.
It looks very promising!
I saw you have pagesize and pagenum, but I didn't see any "sortBy" options.
For all facet groups we will need a list of values with the respective count, i.e.:
facets: {
    journalTitles: {
        displayName: "Journal Titles",
        total: 524,
        data: [
            {
                displayName: "Zookeys",
                total: 10
            },
            {
                displayName: "Zootaxa",
                total: 16
            },
            ...
        ]
    },
    ...
}
In the current design we have "Article Author". @mguidoti could you please check if that is correct?
I guess it should be "authorityName"?
Here are the missing facet groups for the treatments:
relatedMaterialCitations: {},
relatedTreatmentCitations: {},
hasFigures: {},
collectionCodes: {},
Here are the missing fields for the treatment record:
records: [
    {
        figuresCnt: 10,
        materialsCnt: 10,
        externalLinks: {
            plazi: {href:"", name:"Plazi"},
            zenodo: {href:"", name:"Zenodo"},
            gbif: {href:"", name:"GBIF"},
        },
        ...
    },
    ...
]
We have to discuss (@myrmoteras ?) whether we want to, or can, split the current "treatmentTitle"
into "treatmentTaxon" and "treatmentAuthority", i.e.:
"treatmentTitle": "Maratus felinus Schubert, 2019, sp. nov."
into
"treatmentTaxon": "Maratus felinus",
"treatmentAuthority": "Schubert, 2019, sp. nov.",
or we can simply change the design of a treatment in the list of results.
A general question with regard to the dashboards for the treatments: how do you plan to return them, as a separate endpoint (i.e. /v2/treatmentsDashboards) or as a part of the "/v2/treatments" endpoint?
... more thoughts after Biodiversity_Next ...
from blr-website.
I saw you have pagesize and pagenum, but I didn't see any "sortBy" options.
sortBy is something that is a client-side requirement. The API will provide the results sorted by the primary key. But since every client can have different requirements, and since it is trivial to do a JavaScript sort, that is best done by the client.
For all facet groups we will need a list of values with the respective count, i.e.:
facets: { journalTitles: { displayName: "Journal Titles", total: 524, data: [ { displayName: Zookeys, total: 10 }, { displayName: Zootaxa, total: 16 }, --- ] }, --- }
I investigated providing counts. There are two issues here: One, looking at several implementations (Amazon comes to mind), facets don't show any counts. Two, I tried doing counts, but the performance is really bad (other than for the first run, which can be cached).
Interestingly, if you look at https://zenodo.org, it does provide facets with counts, but the counts are really misleading. As you click on the facets, the counts don't change. So, perhaps they are facing the same issue that I am facing – the first time the counts are probably cached so easily provided. But with every click, the result set becomes smaller and yet, the facet counts don't change. That gets really confusing for the user.
My suggestion: try using just the facets, without counts. With every click on a facet, a new result set will be fetched, because the entire result set is bigger than the pageSize worth of results that are displayed.
In the current design we have "Article Author". @mguidoti could you please check if that is correct?
I guess it should be "authorityName"?
I am not sure what the above means. Can you clarify?
Here are the missing facet groups for the treatments:
relatedMaterialCitations: {}, relatedTreatmentCitations: {}, hasFigures: {}, collectionCodes: {},
The above are not fields in the treatments table. Please look at the treatments document that @tcatapano made. If the above are required as facets, I have to figure out how to provide them, if at all possible. For example, if you want the count of relatedMaterialCitations, we run into the same problem as I described above regarding counts.
Here are the missing fields for the treatment record:
records: [ { figuresCnt: 10, materialsCnt: 10, externalLinks: { plazi: {href:"", name:"Plazi"}, zenodo: {href:"", name:"Zenodo"}, gbif: {href:"", name:"GBIF"}, }, ... }, ... ]
I will check if the above fields exist in the treatments table as is or if they have to be created. Will get back to you soon. Also, if the above fields can be returned, they have to be added to the specs document that @tcatapano made.
We have to discuss (@myrmoteras ?) whether we want to, or can, split the current "treatmentTitle" into "treatmentTaxon" and "treatmentAuthority", i.e.: "treatmentTitle": "Maratus felinus Schubert, 2019, sp. nov." into "treatmentTaxon": "Maratus felinus", "treatmentAuthority": "Schubert, 2019, sp. nov.",
or we can simply change the design of a treatment in the list of results.
A general question with regard to the dashboards for the treatments: how do you plan to return them, as a separate endpoint (i.e. /v2/treatmentsDashboards) or as a part of the "/v2/treatments" endpoint?
As I explained in an earlier post (have to find the reference), think of the dashboards as a summary of the current result set (the result of any query). These summaries will be provided as a part of the treatments endpoint. There is no resource called treatmentsDashboards, so that can't be an endpoint. An endpoint has to be a legitimate resource, and for now the resources are treatments, materialsCitations, figureCitations, bibRefCitations, treatmentCitations, and treatmentAuthors.
Hola @teodorgeorgiev, I have just pushed some improvements to Zenodeo. Please check out the facets being returned now. For example, https://zenodeo.punkish.org/v2/treatments returns the following (only part of the output shown below):
{
    "value": {
        "num-of-records": 308587,
        "search-criteria": {
            "page": "1",
            "size": "30",
            "limit": 30,
            "offset": 0
        },
        "_links": {
            "self": {
                "href": "https://zenodeo.punkish.org/v2/treatments?page=1&size=30"
            }
        },
        "facets": {
            "journalTitle": [
                {
                    "journalTitle": "& al. • Phylogeny of Iresine and pollen evolution (Amaranthaceae)",
                    "c": 36
                },
                {
                    "journalTitle": "1",
                    "c": 1
                },
                {
                    "journalTitle": "AMERICAN MUSEUM NOVITATES",
                    "c": 2
                },
                {
                    "journalTitle": "AMERICAN MUSEUM Novitates",
                    "c": 7
                },
                {
                    "journalTitle": "Abhandlungen herausgegeben von der Senckenbergischen Naturforschenden Gesellschaft",
                    "c": 1
                },
                {
                    "journalTitle": "Abhandlungen und Berichte des Naturkundemuseums Görlitz",
                    "c": 3
                },
                {
                    "journalTitle": "Acarologia",
                    "c": 4
                },
                {
                    "journalTitle": "Acarology",
                    "c": 4
                },
                {
                    "journalTitle": "Acta Arachnologica",
                    "c": 78
                },
                {
                    "journalTitle": "Acta Arachnologica Sinica",
                    "c": 2
                },
                {
                    "journalTitle": "Acta Biol., Venez",
                    "c": 70
                },
                {
                    "journalTitle": "Acta Entomologica Musei Nationalis Pragae",
                    "c": 6
                },
                ...
The performance is still not up to what I would call satisfactory, but the cached values are returned instantly, of course. I am going to continue to chip away to make this better.
cc @myrmoteras
Hi @punkish, please check below our comments regarding the treatments endpoint - Teodor
Hi! I am Georgi from the Pensoft team.
I saw the changes to the facets in the treatments endpoint and I think they are good, except that these facets are still missing:
species: [
    {
        species: string,
        c: integer
    },
],
journalVolume: [
    {
        journalVolume: string,
        c: integer
    },
],
relatedMaterialCitations: {
    yes: integer, // count
    no: integer // count
},
relatedTreatmentCitations: {
    yes: integer, // count
    no: integer // count
},
hasFigures: {
    yes: integer, // count
    no: integer // count
},
collectionCodes: {
    yes: integer, // count
    no: integer // count
},
I saw you have pagesize and pagenum, but I didn't see any "sortBy" options.
sortBy is something that is a client-side requirement. The API will provide the results sorted by the primary key. But since every client can have different requirements, and since it is trivial to do a JavaScript sort, that is best done by the client.
We expect the sortBy request option to sort all records. We cannot sort on the client, because the set of results we receive is just one chunk of the whole set.
sortBy: [oneOf] ASC|DESC
- treatmentAuthors
- journalYear
- materialsCitations
- figureCitations
- treatmentCitations?
The requirements can be seen here
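The point about paginated results can be illustrated with a small sketch. The data, the page size, and the `page` helper below are all made up for illustration; the sketch only shows that sorting one page on the client is not the same as asking the server to sort the whole set before paging.

```python
# Hypothetical data: ten treatments in primary-key order, each with a journalYear.
YEARS = [2019, 1987, 2005, 1999, 2012, 1975, 2001, 2018, 1990, 2008]
records = [
    {"treatmentId": i, "journalYear": year}
    for i, year in enumerate(YEARS, start=1)
]

PAGE_SIZE = 3  # assumed page size, far smaller than a real one


def page(rows, page_num, size=PAGE_SIZE):
    """Return one page (1-based) of rows."""
    start = (page_num - 1) * size
    return rows[start:start + size]


# Client-side sort: the API returns page 1 in primary-key order,
# and the client can only sort the three rows it received.
client_sorted = sorted(page(records, 1), key=lambda r: r["journalYear"])

# Server-side sort: the whole set is sorted first, then page 1 is cut from it.
server_sorted = page(sorted(records, key=lambda r: r["journalYear"]), 1)

print([r["journalYear"] for r in client_sorted])  # [1987, 2005, 2019]
print([r["journalYear"] for r in server_sorted])  # [1975, 1987, 1990]
```

The client never sees the 1975 record on page 1, so no amount of client-side sorting can produce the second list.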
I have been testing various facets and I really don't think they make much sense as is. For example, I added species to the mix and got almost 90,000 rows, many of them with really janky data. I tried journalVolume and got similar results: almost 4000 rows and meaningless numbers for volumes (journal volumes are, after all, just numbers; is it really meaningful to say that '38' occurs '72' times?). In any case, the biggest problem is the size of the result. When no params are provided, the default result set is almost 5 MB in size. You really don't want to be making users download 5 MB of data just to be able to populate their search widget. This has to be really rethought or scaled down in its ambitions.
Then there is the issue of relatedMaterialCitations, relatedTreatmentCitations, hasFigures, and collectionCodes. These are not columns in the treatments table. I can get the numbers via joins, but they are not similar to the other facets. Even in terms of their structure in the JSON depicted above, they are just objects with 'yes' and 'no' values while the other facets are arrays of objects. Mixing data types for something that should be logically similar doesn't feel right.
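One possible way around the mixed data types, sketched here purely as an illustration, is to return the yes/no facets in the same array-of-objects shape as the column facets. The table layout (a `figureCitations` table with a `treatmentId` column), the sample rows, and the `facet` helper are assumptions made for this sketch, not the actual Zenodeo schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily simplified tables
    CREATE TABLE treatments (treatmentId TEXT PRIMARY KEY, journalTitle TEXT);
    CREATE TABLE figureCitations (figureCitationId TEXT PRIMARY KEY, treatmentId TEXT);
    INSERT INTO treatments VALUES ('t1', 'Zootaxa'), ('t2', 'Zootaxa'), ('t3', 'ZooKeys');
    INSERT INTO figureCitations VALUES ('f1', 't1'), ('f2', 't1'), ('f3', 't3');
""")


def facet(conn, name, sql):
    """Run a two-column facet query and return [{name: value, "c": count}, ...]."""
    return [{name: value, "c": c} for value, c in conn.execute(sql)]


# A yes/no facet computed from a related table, one row per answer.
has_figures = facet(conn, "hasFigures", """
    SELECT CASE WHEN EXISTS (
               SELECT 1 FROM figureCitations f WHERE f.treatmentId = t.treatmentId
           ) THEN 'yes' ELSE 'no' END AS hasFigures,
           Count(*) AS c
    FROM treatments t
    GROUP BY hasFigures
    ORDER BY hasFigures
""")

# A plain column facet, in the same shape.
journal_title = facet(conn, "journalTitle", """
    SELECT journalTitle, Count(*) AS c
    FROM treatments
    GROUP BY journalTitle
    ORDER BY journalTitle
""")

print(has_figures)
print(journal_title)
```

With this shape a client can render every facet with the same code, at the cost of the related-table lookup for each yes/no facet.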
Let's rethink this facets business. For starters, let's say you go to the website and hit search with no params provided. Think of this query as
SELECT Count(*) AS c FROM treatments;
The answer comes back, "There are 250000 treatments" and perhaps the first 30 treatments are shown. Note that the "first 30" is dependent on the sort order. But since the sort order is not provided (kinda pointless when one is viewing only 30 records), the default sort order is the primary key.
Facets should allow you to narrow the result. But the facets themselves should not be overwhelming. For example, if all 250K treatments came from five journals, you could provide the names of those five journals and the number of treatments from each. Clicking on any one of those journals would give you the number of treatments from that journal. The effective SQL query would be
SELECT Count(*) AS c FROM treatments WHERE journal = ?;
Now imagine that instead of 5, all those 250K treatments came from 3000 different journals. There is no way you would provide a list of all those 3000 journals so the user could narrow the records. The web page would be a mess.
So, rethink the facets and use only those that result in a small number of distinct values.
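The idea of keeping only facets with a small number of distinct values could be sketched roughly as follows. This is an illustration under assumptions, not the Zenodeo implementation: the cut-off `MAX_FACET_VALUES`, the helper `small_facets`, the simplified table, and the sample rows are all invented here.

```python
import sqlite3

MAX_FACET_VALUES = 5  # hypothetical cut-off for a usable facet list

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE treatments "
    "(treatmentId INTEGER PRIMARY KEY, journalTitle TEXT, species TEXT)"
)
# 20 made-up treatments: two journals, but a different species in every row.
rows = [(i, "Zootaxa" if i % 2 else "ZooKeys", f"species {i}") for i in range(1, 21)]
conn.executemany("INSERT INTO treatments VALUES (?, ?, ?)", rows)

# Fixed list of candidate columns; never built from user input,
# because the names are interpolated into the SQL below.
FACET_COLUMNS = ("journalTitle", "species")


def small_facets(conn, columns=FACET_COLUMNS, limit=MAX_FACET_VALUES):
    """Return value counts only for columns with at most `limit` distinct values."""
    facets = {}
    for col in columns:
        distinct = conn.execute(
            f"SELECT Count(DISTINCT {col}) FROM treatments"
        ).fetchone()[0]
        if distinct > limit:
            continue  # too many values to be a useful facet
        cur = conn.execute(
            f"SELECT {col}, Count(*) AS c FROM treatments "
            f"GROUP BY {col} ORDER BY c DESC, {col}"
        )
        facets[col] = [{col: value, "c": c} for value, c in cur]
    return facets


facets = small_facets(conn)
print(facets)

# Clicking a facet value narrows the result, as in the WHERE query above.
narrowed = conn.execute(
    "SELECT Count(*) AS c FROM treatments WHERE journalTitle = ?", ("Zootaxa",)
).fetchone()[0]
print(narrowed)  # 10
```

Here `species` is dropped because it has 20 distinct values, while `journalTitle` survives with two.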
This is the link to our test website:
http://blr.uplaysandbox.website/
You can play with it and you can see what is available till now.
This is the link to our test website:
http://blr.uplaysandbox.website/
You can play with it and you can see what is available till now.
I like it 👍 I am working on enhancements to the API and will update you soon
Hi @punkish, please check below our comments regarding the treatments endpoint - Teodor
…
We expect the sortBy request option to sort all records. We cannot sort on the client, because the set of results we receive is just one chunk of the whole set.
sortBy: [oneOf] ASC|DESC
- treatmentAuthors
- journalYear
- materialsCitations
- figureCitations
- treatmentCitations?
The requirements can be seen here
Wanted to let you know that I've got sorting working now although I haven't yet pushed the changes to the public API. Am still testing it. Hope to push it up by this weekend. In advance though, please see the following notes:
The following columns are not part of the 'treatments' table. They are related records for every treatment
- treatmentAuthors
- materialsCitations
- figureCitations
I also don't have 'treatmentCitations' in my table. That leaves only 'journalYear' from what you asked for.
Remember, I can only sort by the columns in my table. The columns are
- treatmentTitle
- articleDoi
- zenodoDep
- zoobank
- articleTitle
- publicationDate
- journalTitle
- journalYear
- journalVolume
- journalIssue
- pages
- authorityName
- authorityYear
- kingdom
- phylum
- order
- family
- genus
- species
- status
- taxonomicNameLabel
- rank
Though sorting by many of the above may not make sense. Note that the default sort is by 'treatmentId' with sort order ASC. The syntax is
?sortBy=<column:DIR>
// for example
?sortBy=journalYear:ASC
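How a server might validate such a sortBy value before using it in a query can be sketched as below. This is an assumption-laden illustration, not the actual Zenodeo code: the function names `parse_sort_by` and `order_by_clause` are hypothetical, and treating a missing direction as ASC is a guess. The column list is the one given above.

```python
# Sortable columns, copied from the list above.
SORTABLE_COLUMNS = {
    "treatmentTitle", "articleDoi", "zenodoDep", "zoobank", "articleTitle",
    "publicationDate", "journalTitle", "journalYear", "journalVolume",
    "journalIssue", "pages", "authorityName", "authorityYear", "kingdom",
    "phylum", "order", "family", "genus", "species", "status",
    "taxonomicNameLabel", "rank",
}
DEFAULT_SORT = ("treatmentId", "ASC")  # default stated in the thread


def parse_sort_by(value):
    """Split 'column:DIR' and validate both parts against whitelists."""
    if not value:
        return DEFAULT_SORT
    column, sep, direction = value.partition(":")
    # Assumption: a missing direction falls back to ASC.
    direction = direction.upper() if sep else "ASC"
    if column not in SORTABLE_COLUMNS or direction not in ("ASC", "DESC"):
        raise ValueError(f"unsupported sortBy value: {value!r}")
    return column, direction


def order_by_clause(value):
    """Build an ORDER BY clause; the column is quoted because 'order' is an SQL keyword."""
    column, direction = parse_sort_by(value)
    return f'ORDER BY "{column}" {direction}'


print(order_by_clause("journalYear:ASC"))  # ORDER BY "journalYear" ASC
print(order_by_clause(None))               # ORDER BY "treatmentId" ASC
```

Because the column and direction are checked against fixed sets, a request such as `?sortBy=materialsCitations:ASC` is rejected instead of reaching the database.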
More before this week ends.
hello @howkins and @teodorgeorgiev
apologies for the delay in delivering this API, but I've been busy with testing it and trying to make it fast enough to be usable. I am pushing a working version now but I want you to be aware of a breaking change that is easily fixed.
Now the treatments end-point (and eventually all the end-points) will not return facets and stats automatically. Instead, you will have to explicitly ask for them like so
/treatments?q=maratus&journalYear=2005&facets=true&stats=true
In other words, you have to append facets=true and/or stats=true for the API to return the respective data. This is because the same end-point can be used for other purposes where facets or stats may not be needed. And, actually, facets are really burdensome both for querying and for sending back. On a default query (with no query params), the facets, as you guys want them, add almost 4.5MB to the data. This is really inefficient. But I am not going to have a specific API for just facets, since API end-points are only for nouns (resources such as images, treatments, publications, etc.).
So, with this caveat, I am pushing the changes now. Your BLR application will break because you are not asking for facets explicitly. Just add facets=true to the query string and all will be fine.
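On the client side, the change could look roughly like this. The helper `treatments_url` is hypothetical; only the `facets=true` and `stats=true` parameters and the base URL come from this thread.

```python
from urllib.parse import urlencode

BASE_URL = "https://zenodeo.punkish.org/v2/treatments"


def treatments_url(params=None, facets=False, stats=False):
    """Build a treatments query URL, asking for facets/stats only when wanted."""
    query = dict(params or {})
    if facets:
        query["facets"] = "true"
    if stats:
        query["stats"] = "true"
    return f"{BASE_URL}?{urlencode(query)}" if query else BASE_URL


# A search page that needs the facet sidebar and the summary stats:
print(treatments_url({"q": "maratus", "journalYear": 2005}, facets=True, stats=True))

# A plain listing that needs neither stays small:
print(treatments_url({"q": "maratus"}))
```

Keeping both flags off by default mirrors the API's new behaviour, so each caller has to opt in to the heavier response.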
Many thanks for your patience as I have worked on this.
hello @howkins and @teodorgeorgiev,
I have pushed updates to the queries so that they are much faster. I have tried a few queries and there is no timeout happening anymore. But, please do check, and if you face a problem, please open an issue immediately. That is the only way I can solve things. I am hoping the query responses will be in the sub-second range, even with all the facets and stats, but that is an ambitious goal. Hopefully we can get there.
Many thanks
Hello @punkish, I have tried a wide range of queries and found that performance is much better, but sometimes I receive status code 504 Gateway Time-out for queries that I had executed successfully before.
OK, as soon as I receive this response again, I will send you a report with an example.
I noticed another issue. When I open Query1, for example, I see journalVolume as a search option in the facets, but when I try to use it (Query2), I receive an object with statusCode: 400.
Query1: https://zenodeo.punkish.org/v2/treatments?facets=true&stats=true&q=temnothorax
Query2: https://zenodeo.punkish.org/v2/treatments?facets=true&stats=true&journalVolume=11&q=temnothorax