mguidoti commented on July 28, 2024
Dashboards Specs

from blr-website.

Comments (15)

teodorgeorgiev commented on July 28, 2024

Hi @punkish, here are our first comments regarding the "v2/treatments" endpoint.

It looks very promising!

I saw you have pagesize and pagenum, but I didn't see any "sortBy" option.

For all facet groups we will need a list of values with the respective count, i.e.:

   facets: {
       journalTitles: {
           displayName: "Journal Titles",
           total: 524,
           data: [
               {
                   displayName: "Zookeys",
                   total: 10
               },
               {
                   displayName: "Zootaxa",
                   total: 16
               },
               ...
           ]
       },
       ...
   }

In the current design we have "Article Author". @mguidoti could you please check if that is correct?
I guess it should be "authorityName"?

Here are the missing facet groups for the treatments:

            relatedMaterialCitations: {},
            relatedTreatmentCitations: {}, 
            hasFigures: {}, 
            collectionCodes: {}, 

Here are the missing fields for the treatment record:

records: [
            {

                figuresCnt: 10,
                materialsCnt: 10,
                externalLinks: {                    
                    plazi: {href:"", name:"Plazi"},
                    zenodo: {href:"", name:"Zenodo"},
                    gbif: {href:"", name:"GBIF"},                    
                },
                ...
            },
            ...
        ]

We have to discuss (@myrmoteras ?) whether we want to, and can, split the current "treatmentTitle"
into "treatmentTaxon" and "treatmentAuthority", i.e.:

     "treatmentTitle": "Maratus felinus Schubert, 2019, sp. nov."
into
      "treatmentTaxon": "Maratus felinus", 
      "treatmentAuthority": "Schubert, 2019, sp. nov.", 

or we can simply change the design of a treatment in the list of results.
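
A minimal sketch of how such a split could work, assuming the taxon is the first word plus any following all-lowercase epithets and everything after that is the authority. The function name `split_treatment_title` and the heuristic itself are illustrative assumptions, not part of the API:

```python
import re


def split_treatment_title(title: str) -> tuple[str, str]:
    """Split a title into (taxon, authority) with a simple heuristic.

    Assumption: the taxon is the first token (genus or higher rank) plus
    any directly following all-lowercase tokens (species or subspecies
    epithets). Everything after that is treated as the authority.
    """
    tokens = title.split()
    taxon = tokens[:1]
    for token in tokens[1:]:
        # Epithets are plain lowercase words; an authority usually starts
        # with a capital letter or an opening parenthesis.
        if re.fullmatch(r"[a-z][a-z-]*", token):
            taxon.append(token)
        else:
            break
    authority = " ".join(tokens[len(taxon):])
    return " ".join(taxon), authority


taxon, authority = split_treatment_title("Maratus felinus Schubert, 2019, sp. nov.")
```

The heuristic would misplace authorities that begin with a lowercase particle (for example "de" or "van"), so a split done at the data source is likely to be more reliable than one done on the title string.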

A general question regarding the dashboards for the treatments: how do you plan to return them, as a separate endpoint (i.e. /v2/treatmentsDashboards) or as a part of the "/v2/treatments" endpoint?

... more thoughts after Biodiversity_Next ...

punkish commented on July 28, 2024

I saw you have pagesize and pagenum, but I didn't see any "sortBy" option.

sortBy is something that is a client-side requirement. The API will provide the results sorted by the primary key. But since every client can have different requirements, and since it is trivial to do a JavaScript sort, that is best done by the client.

For all facet groups we will need a list of values with the respective count, i.e.:

   facets: {
       journalTitles: {
           displayName: "Journal Titles",
           total: 524,
           data: [
               {
                   displayName: "Zookeys",
                   total: 10
               },
               {
                   displayName: "Zootaxa",
                   total: 16
               },
               ...
           ]
       },
       ...
   }

I investigated providing counts. There are two issues here: One, looking at several implementations (Amazon comes to mind), facets don't show any counts. Two, I tried doing counts, but the performance is really bad (other than for the first run, which can be cached).

Interestingly, if you look at https://zenodo.org, it does provide facets with counts, but the counts are really misleading. As you click on the facets, the counts don't change. So, perhaps they are facing the same issue that I am facing – the first time the counts are probably cached so easily provided. But with every click, the result set becomes smaller and yet, the facet counts don't change. That gets really confusing for the user.

My suggestion, try using just the facets. With every click on a facet, a new result set will be fetched because the entire result set is bigger than just the pageSize worth of results that are displayed.
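
To illustrate why cached counts become misleading, here is a small sketch that recomputes facet counts over the filtered result set. The sample rows and the helper `facet_counts` are hypothetical; the real data lives in the treatments table and would be counted in SQL:

```python
from collections import Counter

# Hypothetical sample rows standing in for the treatments table.
treatments = [
    {"journalTitle": "ZooKeys", "journalYear": 2018},
    {"journalTitle": "ZooKeys", "journalYear": 2019},
    {"journalTitle": "Zootaxa", "journalYear": 2019},
    {"journalTitle": "Zootaxa", "journalYear": 2019},
]


def facet_counts(rows, field, **filters):
    """Count the values of `field` among rows that match every filter."""
    matching = [r for r in rows if all(r[k] == v for k, v in filters.items())]
    return Counter(r[field] for r in matching)


# Counts for the default query (these are the ones that can be cached).
unfiltered = facet_counts(treatments, "journalTitle")

# Counts after the user clicks the journalYear=2019 facet.
filtered = facet_counts(treatments, "journalTitle", journalYear=2019)
```

Here the ZooKeys count drops from 2 to 1 once the year filter is applied, so showing the cached unfiltered counts next to a filtered result set would overstate it. Every distinct filter combination needs its own count, which is where the cost comes from.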

In the current design we have "Article Author". @mguidoti could you please check if that is correct?
I guess it should be "authorityName"?

I am not sure what the above means. Can you clarify?

Here are the missing facet groups for the treatments:

            relatedMaterialCitations: {},
            relatedTreatmentCitations: {}, 
            hasFigures: {}, 
            collectionCodes: {}, 

The above are not fields in the treatments table. Please look at the treatments document that @tcatapano made. If the above are required as facets, I have to figure out how to provide them, if at all possible. For example, if you want the count of relatedMaterialCitations, we run into the same problem as I described above regarding counts.

Here are the missing fields for the treatment record:

records: [
            {

                figuresCnt: 10,
                materialsCnt: 10,
                externalLinks: {                    
                    plazi: {href:"", name:"Plazi"},
                    zenodo: {href:"", name:"Zenodo"},
                    gbif: {href:"", name:"GBIF"},                    
                },
                ...
            },
            ...
        ]

I will check if the above fields exist in the treatments table as is or if they have to be created. Will get back to you soon. Also, if the above fields can be returned, they have to be added to the specs document that @tcatapano made.

We have to discuss (@myrmoteras ?) whether we want to, and can, split the current "treatmentTitle"
into "treatmentTaxon" and "treatmentAuthority", i.e.:

     "treatmentTitle": "Maratus felinus Schubert, 2019, sp. nov."
into
      "treatmentTaxon": "Maratus felinus", 
      "treatmentAuthority": "Schubert, 2019, sp. nov.", 

or we can simply change the design of a treatment in the list of results.

A general question regarding the dashboards for the treatments: how do you plan to return them, as a separate endpoint (i.e. /v2/treatmentsDashboards) or as a part of the "/v2/treatments" endpoint?

As I explained in an earlier post (have to find the reference), think of the dashboards as a summary of the current result set (the result of any query). These summaries will be provided as a part of the treatments endpoint. There is no resource called treatmentsDashboards, so that can't be an endpoint. An endpoint exists only for a legitimate resource; for now, those are treatments, materialsCitations, figureCitations, bibRefCitations, treatmentCitations, and treatmentAuthors.

punkish commented on July 28, 2024

hola @teodorgeorgiev, I have just pushed some improvements to Zenodeo. Please check out the facets being returned now. For example, https://zenodeo.punkish.org/v2/treatments returns the following (only part of the output shown below)

{
  "value": {
    "num-of-records": 308587,
    "search-criteria": {
      "page": "1",
      "size": "30",
      "limit": 30,
      "offset": 0
    },
    "_links": {
      "self": {
        "href": "https://zenodeo.punkish.org/v2/treatments?page=1&size=30"
      }
    },
    "facets": {
      "journalTitle": [
        {
          "journalTitle": "& al. • Phylogeny of Iresine and pollen evolution (Amaranthaceae)",
          "c": 36
        },
        {
          "journalTitle": "1",
          "c": 1
        },
        {
          "journalTitle": "AMERICAN MUSEUM NOVITATES",
          "c": 2
        },
        {
          "journalTitle": "AMERICAN MUSEUM Novitates",
          "c": 7
        },
        {
          "journalTitle": "Abhandlungen herausgegeben von der Senckenbergischen Naturforschenden Gesellschaft",
          "c": 1
        },
        {
          "journalTitle": "Abhandlungen und Berichte des Naturkundemuseums Görlitz",
          "c": 3
        },
        {
          "journalTitle": "Acarologia",
          "c": 4
        },
        {
          "journalTitle": "Acarology",
          "c": 4
        },
        {
          "journalTitle": "Acta Arachnologica",
          "c": 78
        },
        {
          "journalTitle": "Acta Arachnologica Sinica",
          "c": 2
        },
        {
          "journalTitle": "Acta Biol., Venez",
          "c": 70
        },
        {
          "journalTitle": "Acta Entomologica Musei Nationalis Pragae",
          "c": 6
        },

The performance is still not up to what I would call satisfactory, but the cached values are returned instantly, of course. I am going to continue to chip away to make this better.

cc @myrmoteras

howkins commented on July 28, 2024

Hi @punkish, please check below our comments regarding the treatments endpoint - Teodor


Hi! I am Georgi from the Pensoft team.
I saw the changes to the facets in the treatments endpoint and think they are good, except for these missing resources:

species: [
    {
        species: string,
        c: integer
    },
],
journalVolume: [
    {
        journalVolume: string,
        c: integer
    },
],
relatedMaterialCitations: {
    yes: integer, // count
    no: integer // count
},
relatedTreatmentCitations: {
    yes: integer, // count
    no: integer // count
}, 
hasFigures: {
    yes: integer, // count
    no: integer // count
}, 
collectionCodes: {
    yes: integer, // count
    no: integer // count
}, 

I saw you have pagesize and pagenum, but I didn't see any "sortBy" option.

sortBy is something that is a client-side requirement. The API will provide the results sorted by the primary key. But since every client can have different requirements, and since it is trivial to do a JavaScript sort, that is best done by the client.

We expect the sortBy request option to sort all records. We cannot sort on the client because the set of results we receive is just one chunk of the whole set.
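
A small sketch of the difference, using hypothetical records and a page size of 2. Sorting one fetched page on the client does not give the same page as sorting the whole set on the server and then paging:

```python
# Hypothetical records, stored in primary-key (treatmentId) order.
records = [
    {"treatmentId": 1, "journalYear": 2019},
    {"treatmentId": 2, "journalYear": 2005},
    {"treatmentId": 3, "journalYear": 2012},
    {"treatmentId": 4, "journalYear": 1998},
]
PAGE_SIZE = 2


def by_year(record):
    """Sort key: the journal year of a record."""
    return record["journalYear"]


# Client-side: fetch the first page in primary-key order, then sort it.
client_page = sorted(records[:PAGE_SIZE], key=by_year)

# Server-side: sort the whole set first, then cut the first page.
server_page = sorted(records, key=by_year)[:PAGE_SIZE]

client_ids = [r["treatmentId"] for r in client_page]
server_ids = [r["treatmentId"] for r in server_page]
```

The client-side page contains treatments 2 and 1, while the server-side page contains treatments 4 and 2. The oldest record never reaches the client unless the sort happens before paging.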

sortBy: [oneOf] ASC|DESC

  • treatmentAuthors
  • journalYear
  • materialsCitations
  • figureCitations
  • treatmentCitations?

The requirements can be seen here

punkish commented on July 28, 2024

I have been testing various facets and I really don't think they make much sense as is. For example, I added species to the mix and got almost 90,000 rows, many of them with really janky data. Tried journalVolume and got similar results… almost 4000 rows and meaningless numbers for volumes (journal volumes are, after all, just numbers – is it really meaningful to say that '38' occurs '72' times?). In any case, the biggest problem is the size of the result. When no params are provided, the default result set is almost 5 MB in size. You really don't want to be making users download 5 MB of data just to be able to populate their search widget. This has to be really rethought or scaled down in its ambitions.

Then there is the issue of relatedMaterialCitations, relatedTreatmentCitations, hasFigures, collectionCodes. These are not columns in the treatments table. I can get the numbers via joins, but they are not similar to the other facets. Even in terms of their structure in the JSON depicted above, they are just objects with 'yes' and 'no' values while the other facets are arrays of objects. Mixing data types for something that should be logically similar doesn't feel right.
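
One possible way to keep the data types uniform, sketched under the assumption that a yes/no facet can be expressed in the same array-of-objects shape as the other facets. The helper `normalize_boolean_facet` and the counts are hypothetical; this is not something the API currently does:

```python
def normalize_boolean_facet(name, counts):
    """Turn {"yes": n, "no": m} into the array-of-objects facet shape."""
    return [{name: key, "c": counts[key]} for key in ("yes", "no")]


# Hypothetical counts for illustration only.
facets = {
    "hasFigures": normalize_boolean_facet("hasFigures", {"yes": 120, "no": 30}),
}
```

With this shape a client can render every facet with the same loop, at the cost of a slightly more verbose payload for the yes/no facets.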

punkish commented on July 28, 2024

Let's rethink this facets business. For starters, let's say you go to the website and hit search with no params provided. Think of this query as

SELECT Count(*) AS c FROM treatments;

The answer comes back, "There are 250000 treatments" and perhaps the first 30 treatments are shown. Note that the "first 30" is dependent on the sort order. But since the sort order is not provided, (kinda pointless when one is viewing only 30 records), the default sort order is the primary key.

Facets should allow you to narrow the result. But the facets themselves should not be overwhelming. For example, if all 250K treatments came from five journals, you could provide the names of those five journals and the number of treatments from each. Clicking on any one of those journals would give you the number of treatments from that journal. The effective SQL query would be

SELECT Count(*) AS c FROM treatments WHERE journal = ?;

Now imagine that instead of 5, all those 250K treatments came from 3000 different journals. There is no way you would provide a list of all those 3000 journals so the user could narrow the records. The web page would be a mess.

So, rethink the facets and use only those that result in a small number of distinct values.
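
A rough sketch of how candidate facets could be screened, using an in-memory SQLite table. The sample rows, the threshold `MAX_DISTINCT`, and the helper `usable_facets` are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE treatments ("
    "treatmentId INTEGER PRIMARY KEY, journalTitle TEXT, status TEXT)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO treatments (journalTitle, status) VALUES (?, ?)",
    [
        ("ZooKeys", "sp. nov."),
        ("Zootaxa", "sp. nov."),
        ("Acarologia", "comb. nov."),
        ("Acta Arachnologica", "sp. nov."),
    ],
)

MAX_DISTINCT = 3  # hypothetical cut-off for a facet that is still usable


def usable_facets(conn, columns, max_distinct):
    """Return the columns whose number of distinct values is small enough."""
    usable = []
    for column in columns:
        # Column names come from a fixed list here, never from user input.
        (n,) = conn.execute(
            f"SELECT COUNT(DISTINCT {column}) FROM treatments"
        ).fetchone()
        if n <= max_distinct:
            usable.append(column)
    return usable


result = usable_facets(conn, ["journalTitle", "status"], MAX_DISTINCT)
```

In this toy data journalTitle has four distinct values and is rejected, while status has two and is kept. A real cut-off would depend on what the search widget can reasonably display.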

howkins commented on July 28, 2024

This is the link to our test website:
http://blr.uplaysandbox.website/
You can play with it and see what is available so far.

punkish commented on July 28, 2024

This is the link to our test website:
http://blr.uplaysandbox.website/
You can play with it and see what is available so far.

I like it 👍 I am working on enhancements to the API and will update you soon

punkish commented on July 28, 2024

Hi @punkish, please check below our comments regarding the treatments endpoint - Teodor

We expect the sortBy request option to sort all records. We cannot sort on the client because the set of results we receive is just one chunk of the whole set.

sortBy: [oneOf] ASC|DESC

  • treatmentAuthors
  • journalYear
  • materialsCitations
  • figureCitations
  • treatmentCitations?

The requirements can be seen here

Wanted to let you know that I've got sorting working now although I haven't yet pushed the changes to the public API. Am still testing it. Hope to push it up by this weekend. In advance though, please see the following notes:

The following columns are not part of the 'treatments' table. They are related records for every treatment
- treatmentAuthors
- materialsCitations
- figureCitations

I also don't have 'treatmentCitations' in my table. That leaves only 'journalYear' from what you asked for.

Remember, I can only sort by the columns in my table. The columns are
- treatmentTitle
- articleDoi
- zenodoDep
- zoobank
- articleTitle
- publicationDate
- journalTitle
- journalYear
- journalVolume
- journalIssue
- pages
- authorityName
- authorityYear
- kingdom
- phylum
- order
- family
- genus
- species
- status
- taxonomicNameLabel
- rank

Sorting by many of the above may not make sense, though. Note that the default sort is by 'treatmentId' with sort order ASC. The syntax is

?sortBy=<column:DIR>

// for example

?sortBy=journalYear:ASC
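
A minimal sketch of how a `column:DIR` value could be validated before it reaches the query. The names `SORTABLE_COLUMNS`, `parse_sort_by` and `order_by_clause` are hypothetical, the column set is only a subset of the list above, and treating a missing direction as ASC is an assumption:

```python
# Subset of the sortable columns listed above, plus the default sort key.
SORTABLE_COLUMNS = {
    "treatmentId",
    "treatmentTitle",
    "journalTitle",
    "journalYear",
    "publicationDate",
}


def parse_sort_by(value):
    """Parse 'column:DIR' into (column, direction); raise ValueError if invalid."""
    column, sep, direction = value.partition(":")
    # Assumption: a missing direction falls back to ASC.
    direction = direction.upper() if sep else "ASC"
    if column not in SORTABLE_COLUMNS:
        raise ValueError(f"cannot sort by {column!r}")
    if direction not in ("ASC", "DESC"):
        raise ValueError(f"invalid sort direction {direction!r}")
    return column, direction


def order_by_clause(value="treatmentId:ASC"):
    """Build an ORDER BY clause from a validated sortBy value."""
    column, direction = parse_sort_by(value)
    # Safe to interpolate: both parts were checked against fixed lists.
    return f"ORDER BY {column} {direction}"


clause = order_by_clause("journalYear:ASC")
```

Checking against a fixed set of column names rejects related records such as treatmentAuthors, and it also keeps arbitrary text from the query string out of the SQL.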

More before this week ends.

punkish commented on July 28, 2024

hello @howkins and @teodorgeorgiev

apologies for the delay in delivering this API, but I've been busy with testing it and trying to make it fast enough to be usable. I am pushing a working version now but I want you to be aware of a breaking change that is easily fixed.

Now the treatments end-point (and eventually all the end-points) will not return facets and stats automatically. Instead, you will have to explicitly ask for them like so

/treatments?q=maratus&journalYear=2005&facets=true&stats=true

In other words, you have to append facets=true and/or stats=true for the API to return the respective data. This is because the same end-point can be used for other purposes where facets or stats may not be needed. And, actually, facets are really burdensome both for querying and for sending back. On a default query (with no query params), the facets, as you guys want them, add almost 4.5MB to the data. This is really inefficient. But I am not going to have a specific API for just facets (API end-points are only for nouns, resources such as images, treatments, publications, etc.), so they have to be requested explicitly.

So, with this caveat, I am pushing the changes now. Your BLR application will break because you are not asking for facets explicitly. Just add facets=true to the query string and all will be fine.
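
For client code, the change could look roughly like this sketch. The helper `treatments_url` is hypothetical; the base URL and the parameter names are the ones used in this thread:

```python
from urllib.parse import urlencode

BASE_URL = "https://zenodeo.punkish.org/v2/treatments"


def treatments_url(facets=False, stats=False, **params):
    """Build a treatments query URL, adding facets/stats only on request."""
    query = dict(params)
    if facets:
        query["facets"] = "true"
    if stats:
        query["stats"] = "true"
    return f"{BASE_URL}?{urlencode(query)}" if query else BASE_URL


url = treatments_url(q="maratus", journalYear=2005, facets=True, stats=True)
```

Leaving both flags off by default keeps plain queries small, and a search page that needs the facets opts in explicitly.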

Many thanks for your patience as I have worked on this.

punkish commented on July 28, 2024

hello @howkins and @teodorgeorgiev,

I have pushed updates to the queries so that they are much faster. I have tried a few queries and there is no timeout happening anymore. But, please do check, and if you face a problem, please open an issue immediately. That is the only way I can solve things. I am hoping the query responses will be in the sub-second range, even with all the facets and stats, but that is an ambitious goal. Hopefully we can get there.

Many thanks

howkins commented on July 28, 2024

Hello @punkish, I have tried a wide range of queries and found that performance is much better, but sometimes I receive status code 504 Gateway Time-out for queries that I executed successfully before.

punkish commented on July 28, 2024

howkins commented on July 28, 2024

OK, as soon as I receive this response again I will send you a report with an example.

I noticed another issue. When I open Query1, for example, I see journalVolume as a search option in the facets, but when I try to use it (Query2) I receive an object with statusCode: 400.
Query1: https://zenodeo.punkish.org/v2/treatments?facets=true&stats=true&q=temnothorax

Query2: https://zenodeo.punkish.org/v2/treatments?facets=true&stats=true&journalVolume=11&q=temnothorax

punkish commented on July 28, 2024
