s2-folks's Introduction

Overview

Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.

Disclaimers

All the information you contribute to this repository, including GitHub issues and code samples, is public and open. Please do not include sensitive, confidential, or personal information unless you want to be credited for the information you provide.

s2-folks's People

Contributors

amberrose2, androbin, ashleyleeaiheng, biogeek, cfiorelli, dirkraft, joe32140, milescrawford, rodneykinney, seanxuu, yoganandc, yvonne-chou

s2-folks's Issues

Lookup authors by orcid ids

Feedback from conference chair who used the peer review APIs:

A main thing that we struggled with is mapping names to Semantic Scholar IDs. ACM now requires ORCID IDs, so it would be nice for Semantic Scholar to work with ORCID IDs. Also, I tried to look up Semantic Scholar IDs from names using the Semantic Scholar API. Unfortunately, the metadata wasn't good enough to narrow down the list returned. For EC, I would have liked to know whether the author was in Economics or Computer Science, but this data didn't seem to be returned directly. I could have asked for their papers and then made a guess based on the primary field of the papers.
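The guess described at the end of that feedback could be sketched roughly as below. This is only an illustration: it assumes each paper record carries a `fieldsOfStudy` list (which may be missing or null), and `guess_primary_field` and the sample records are hypothetical, not real API output.

```python
from collections import Counter


def guess_primary_field(papers):
    """Return the most frequent field of study across an author's papers, or None."""
    counts = Counter()
    for paper in papers:
        # Assumption: "fieldsOfStudy" is a list of strings, possibly absent or None.
        for field in paper.get("fieldsOfStudy") or []:
            counts[field] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical sample records for one candidate author.
sample_papers = [
    {"title": "A", "fieldsOfStudy": ["Economics"]},
    {"title": "B", "fieldsOfStudy": ["Economics", "Computer Science"]},
    {"title": "C", "fieldsOfStudy": None},
]
print(guess_primary_field(sample_papers))
```

A majority vote like this is crude; an author who publishes across both fields would still need a manual check.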

Does the API provide paper full text?

Hi! I had a quick question about the Semantic Scholar API: is there a part of the API that I can connect to where it contains all the text in the research papers? I noticed there are datasets for headers, dates, authors, references, but I'm not sure if I missed the one that included the text in its entirety in research papers. Thanks!
Feedback was from https://www.semanticscholar.org/product/api

From freshdesk

API health page

The API health page will (eventually) provide our users with timely information about operational metrics (e.g., error rate and response time) as well as data coverage metrics (e.g., # of abstracts, # of non-empty tldr fields) for the data that backs the APIs.

When using web search and API search, I noticed that I get different search results.

When I search for the paper 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding', I get different results on these two pages: https://www.semanticscholar.org/search?q=BERT%3A%20Pre-training%20of%20Deep%20Bidirectional%20Transformers%20for%20Language%20Understanding&sort=relevance https://api.semanticscholar.org/graph/v1/paper/search?query=BERT:%20Pre-training%20of%20Deep%20Bidirectional%20Transformers%20for%20Language%20Understanding Do you know if there's a reason for this discrepancy? Can I access the API used for web search? Thank you!

From freshdesk.

10k limit for citation details of a paper

Hi everyone, I have been building a social network analysis tool for articles and authors as my graduation project. We give users the option to retrieve the citations of a paper and create a graph from them. However, S2AG cannot return all the citations of a paper when citationCount is more than 10k and we use the 'Details about a paper's citations' endpoint. The limit is 10k for this endpoint. Is there any workaround for this issue?

How to verify my API key works as intended?

I’m getting an error code 403 back from the API when using my api key with the following headers:
"headers": {
"Content-type": "application/json",
"x-api-key": my_precious_key
},
It looks like I am sending it correctly, but somehow I'm getting a 403 Forbidden back. Am I missing something?

From slack.
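One way to isolate the problem is to send a single request with only the key header and look at the status code. The sketch below uses the standard library; the search URL comes from another issue in this list, while the `BERT` query, `build_headers`, and `check_key` are illustrative assumptions. A non-403 status only shows that the key was not rejected, not that every endpoint is enabled for it.

```python
import urllib.error
import urllib.request

# Assumption: any cheap Graph API request is good enough for a smoke test.
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search?query=BERT"


def build_headers(api_key):
    """Headers as in the report above: the key is sent as x-api-key."""
    return {"x-api-key": api_key}


def check_key(api_key, url=SEARCH_URL, timeout=10):
    """Send one authenticated request and return the HTTP status code.

    This performs a network call; it is defined here but not invoked.
    """
    request = urllib.request.Request(url, headers=build_headers(api_key))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A 403 here would reproduce the Forbidden response described above.
        return err.code
```

Calling `check_key("<your key>")` from the same machine and from a plain terminal can help tell a key problem apart from a client or proxy problem.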

API returning empty responses.

Since 17th of March, the API seems to be returning empty responses in 10% of our users' searches. This is new behaviour (we didn't account for that in our code and no errors were raised until then). Did something significant change or is this a bug? The response we get is:

{
offset: 0, 
total: 0
}
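Until the cause is known, client code can at least treat this shape defensively. A minimal sketch, assuming search results normally arrive under a `data` key next to `offset` and `total`; `extract_papers` is a hypothetical helper name.

```python
def extract_papers(response_json):
    """Return (total, papers) without raising when the result list is absent."""
    total = response_json.get("total", 0)
    # Assumption: matches are listed under "data"; the empty response above has no such key.
    papers = response_json.get("data") or []
    return total, papers


empty_response = {"offset": 0, "total": 0}
print(extract_papers(empty_response))  # (0, [])
```

Logging the query whenever `total` is 0 would also make it easier to report which searches are affected.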

How to download dataset

I am sorry, but I have run into some difficulties. I am at a loss as to what to do after reading the README files on how to download the dataset.

A bunch of PubMed IDs (PMIDs) are missing. Why?

Ken Church:

A colleague asked me to do something with about 1k PMIDs. I found most of them in semantic scholar, but not these:

PMID:34995702
PMID:34457137
PMID:34831513
PMID:32662296
PMID:34932685
PMID:35015933
PMID:25666784
PMID:33226074

Confusing authors and program committee

https://www.semanticscholar.org/search?q=Map-Reduce%20for%20Machine%20Learning%20on%20Multicore&sort=relevance

This paper appeared in NIPS (now called NeurIPS). It should not be hard to find:

https://www.researchgate.net/profile/Gary-Bradski-4/publication/221617998_Map-Reduce_for_Machine_Learning_on_Multicore/links/00b4951c369007c315000000/Map-Reduce-for-Machine-Learning-on-Multicore.pdf

The authors should not be confused with the editors (program committee).

There are links on semantic scholar to slides (as opposed to paper): https://pdfs.semanticscholar.org/38af/f6df1accc456f6cda7d16d4b9ecf418ef21e.pdf

It would be good to distinguish authors from editors, and slides from papers.

The API is hard to use. HELP!

Dear Semanticscholar Team, I hope this email finds you well. I am writing to express my admiration for the work that you do at Semanticscholar. Your platform has been a valuable resource for me in my academic research, and I appreciate the comprehensive and up-to-date database of scholarly articles that you provide. As I delve deeper into my research, I am interested in exploring some of the data and features that you offer through your Dataset API. However, I have encountered some issues accessing the API, and I was hoping you could offer some guidance on how to proceed. I have tried several times to access the API, but I have not been successful in doing so. I have checked the documentation and followed the instructions carefully, but I keep receiving blank messages when attempting to access the data. I would greatly appreciate it if you could provide some assistance on this matter, as I am eager to use your API to further my research. Once again, thank you for your excellent work and for providing such a valuable resource for the academic community. I look forward to hearing back from you soon.

From freshdesk

How are the S2 identifiers defined?

There are two types of S2 identifiers as described in the API:

<sha> - a Semantic Scholar ID, e.g. 649def34f8be52c8b66281af98ae884c09aef38b
CorpusId:<id> - Semantic Scholar numerical ID, e.g. 215416146

Can we always assume the sha is a string of 40 lowercase hex characters?
Can we always assume the CorpusId is a 32 bit integer?
Is there a way to convert from one to the other?
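While these questions are open, a client can check the two shapes explicitly instead of relying on them. The sketch below encodes the assumptions from the questions above (40 lowercase hex characters, a `CorpusId:` prefix followed by digits); neither is confirmed here, and `classify_s2_id` is a hypothetical helper. Python integers are unbounded, so the sketch makes no 32-bit assumption.

```python
import re

# Assumption under test: a sha is exactly 40 lowercase hex characters.
SHA_PATTERN = re.compile(r"[0-9a-f]{40}")
# Assumption under test: a corpus ID is "CorpusId:" followed by decimal digits.
CORPUS_PATTERN = re.compile(r"CorpusId:(\d+)")


def classify_s2_id(identifier):
    """Return ("sha", str), ("corpus", int), or ("unknown", str)."""
    if SHA_PATTERN.fullmatch(identifier):
        return ("sha", identifier)
    match = CORPUS_PATTERN.fullmatch(identifier)
    if match:
        return ("corpus", int(match.group(1)))
    return ("unknown", identifier)


print(classify_s2_id("649def34f8be52c8b66281af98ae884c09aef38b"))
print(classify_s2_id("CorpusId:215416146"))
```

Anything that falls into the "unknown" bucket is worth logging, since it would be a counterexample to the assumptions.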

How to check the status of my API key request?

Hi there, I requested an API key last week (to enable downloading files) and was wondering how I can follow up on this request (in case additional information is required). When should I expect to get an answer normally?

From freshdesk

How SPECTER embeddings are used in recommendations API?

When it comes to the recommendations being returned by the API, I was wondering if you were using the provided [email protected] embedding on your end to get a measure of similarity and then return them (maybe you are using a different version of the embeddings). If so, I was wondering what similarity algorithm/formula you were using as I’m noticing that if I run cosine similarity (as well as euclidean distance and simple dot-product) the order of the returned recommendations is random-looking. I am assuming you must be using some other formula though or that maybe the embeddings used are slightly different?

From a private message on slack.
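For reference, the cosine-similarity ranking the question describes can be reproduced along these lines. This is a sketch of the asker's experiment with toy vectors, not of whatever the recommendations API does internally; `cosine_similarity` and `rank_by_similarity` are illustrative names.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, candidates):
    """candidates maps an ID to its vector; returns IDs by descending similarity."""
    return sorted(
        candidates,
        key=lambda pid: cosine_similarity(query_vec, candidates[pid]),
        reverse=True,
    )


# Toy 2-d vectors standing in for real embeddings.
toy = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(rank_by_similarity([1.0, 0.0], toy))  # ['a', 'c', 'b']
```

Comparing this ordering against the order returned by the API is a quick way to quantify how "random-looking" the mismatch is.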

Affiliation requirements

Does the API support affiliations?
If not, any idea what I should do if I want to fetch data for a specific affiliation?

Where to find PDF URL that appears in the web UI but not in the API?

The publication with paperId=65b190b353c15a6670fc98614c2a1542286bbb2e has a link in the web UI to download a PDF. However, when I look up the paper by ID using the Graph API, it is listed as NOT open access, with no openAccessPdf link. What field is the PDF link in the front end drawn from?

Been more than five (5) business days since I submitted the partner request form

Several users have reported that they submitted the partner form at https://www.semanticscholar.org/product/api#Partner-Form more than 5 business days ago but didn't receive an API key.

Is it in your spam folder?
Often times, this happens because the email which contains the API key is sent to your spam folder. Please check to verify that this is not the case.

Did you submit your request before Mar 31, 2023?
If this is the case, please resubmit the form https://www.semanticscholar.org/product/api#Partner-Form . Why? We've modified our process for issuing API keys around this time. We did our best to process requests which were submitted before the new process was in effect, but processing older requests is a manual and error-prone process and we may have accidentally missed your request.

The API key is not in your spam folder AND your request was submitted after Mar 31, 2023?
Please comment on this issue and provide the email and affiliation you used to submit the form and we'll get back to you as soon as possible.

S2 paperId for S2ORC papers

Hello,
Is there a way to have the S2 id (paperId field) for the papers in S2ORC? The corpus_id seems to be the id specific to S2ORC and is not the same as the S2 id associated to all papers that can be found using keyword search in the API.

Thanks!

How does the S2 Recommendation API work under the hood?

Hey guys, I am building a basic paper recommendation system for my bachelor's thesis, and I was wondering whether there is any public information on how the S2 recommendation API works under the hood, or whether that can be shared. Are there any details on the architecture? Is it another neural network or KNN? Any information will be of great help! Please let me know!

I am also curious if any of the code for the recommendations is available from the original SPECTER paper? Specifically section 3.4 here: https://arxiv.org/pdf/2004.07180.pdf.

I want to get more than 10K papers using the search endpoint

From freshdesk

I just want to know how to work around the limitation on "offset" and "limit" in the API. For example, I cannot get all the papers about customer service in 2022 using the API, because "The sum of offset and limit must be < 10000" while the number of papers that I want EXCEEDS 10000. Please help me solve the problem, thank you so much.
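To make the constraint concrete, the sketch below enumerates the (offset, limit) pairs that stay inside the quoted rule. It does not lift the limit; it only shows how far plain paging can reach. `page_windows` and the page size of 100 are assumptions for illustration.

```python
def page_windows(total, page_size=100, cap=10_000):
    """Yield (offset, limit) pairs such that offset + limit < cap."""
    # The rule quoted above is strict, so at most cap - 1 results are reachable.
    reachable = min(total, cap - 1)
    offset = 0
    while offset < reachable:
        limit = min(page_size, reachable - offset)
        yield offset, limit
        offset += limit


windows = list(page_windows(25_000))
print(len(windows), windows[-1])  # 100 (9900, 99)
```

Getting past that point would need narrower queries (so that each one matches fewer results) or a different data source such as the bulk datasets; which option fits depends on the use case.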

Publication date vs. arxiv submission date

I'm looking into the publication dates of papers and I see that for papers that were submitted to ArXiv, the publication date is the same as the ArXiv submission date for 90% of Semantic Scholar papers. Is this because:
i) these papers were submitted to ArXiv before they were published in the journal, or
ii) the publication dates for these papers were not known so the ArXiv dates were used as publication dates?
I see that the official documentation says that the publication date is either the print date or the journal publication date as provided by the source, but details are not listed. It would be great if someone could provide some insight into how these dates are collected.
More generally, our research would greatly profit from having the actual journal publication dates for papers that have ArXiv submission dates as their publication dates. Is there a way to get them? We (@Przemyslaw Grabowicz and I) study, at UMass Amherst, the effects of early promotion and revisions of scientific publications on the number of citations.

From slack

Internal server error when using /paper/batch with specific paper IDs

When using the 'get multiple papers' endpoint of the S2AG API, any request that includes the IDs ea74a98158624ebc2fc03f7aba2b9056737dc9ae or 317cebe9aef0cc1f0ee36707c06ba38e978ac6bd results in an Internal Server Error. Accessing either paper via the single paper endpoint works with no problems. Some other IDs are also affected, though most work perfectly well.
Feedback was from https://www.semanticscholar.org/paper/Developing-machine-learning-methods-for-automatic-Wang-Zhang/ea74a98158624ebc2fc03f7aba2b9056737dc9ae

From freshdesk

Order results from bulk paper fetch

Reported by @kyleclo :

Describe the bug
Using the bulk API query to get papers. Given an input list of paper IDs [a, b, c] the papers can come back out-of-order [b, c, a] or even be silently missing entries [b, c].

To Reproduce
Steps to reproduce the behavior:

In Python:

# out of order
import requests

MAX_REQUEST_SIZE = 500

def chunks(lst, chunk_size=MAX_REQUEST_SIZE):
    """Split a longer list into pieces that respect the batch size."""
    for i in range(0, len(lst), chunk_size):
        yield lst[i : i + chunk_size]

QUERY = 'https://api.semanticscholar.org/graph/v1/paper/batch'
chunk = ['b10ab3b45876dcd75e96feecdd5dee5c9633bc1a', 'a14897448e472a430439e2e24abba637f9db7a27', 'fbbb80bfeb35569ee8473b80aa98c716c4a6b034']

response = requests.post(QUERY, json={'ids': chunk})

print([p['paperId'] for p in response.json()])
# > ['b10ab3b45876dcd75e96feecdd5dee5c9633bc1a', 'fbbb80bfeb35569ee8473b80aa98c716c4a6b034', 'a14897448e472a430439e2e24abba637f9db7a27']

# silently missing entries
chunk = ['1e42f34a97365192e2ed144bbc1cffe90250fa56', 'dafecfed46fa849be587bd3cfcc44dce03b6f96f', 'd4f216955046faeaa0737aa4c6de760e4594d6f4']
response = requests.post(QUERY, json={'ids': chunk})

print([p['paperId'] for p in response.json()])
# > ['dafecfed46fa849be587bd3cfcc44dce03b6f96f', '1e42f34a97365192e2ed144bbc1cffe90250fa56']

Expected behavior
Expected behavior is the API returns the entries in the same order as the inputs in the POST. The bulk Authors query in API preserves input order.

Similarly, if an ID has no match, the API should still return something for that entry rather than a shorter list. For example, the bulk Authors query still returns None for unmatched author IDs:

QUERY = 'https://api.semanticscholar.org/graph/v1/author/batch'
response = requests.post(QUERY, json={'ids': ['46258841', '999999999', '3328733']})
print(response.json())
# > [{'authorId': '46258841', 'name': 'Kyle Lo'}, None, {'authorId': '3328733', 'name': 'Luca Soldaini'}]
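Until the paper batch endpoint behaves like the author one, the expected behavior can be restored on the client side. A minimal sketch, assuming every requested ID is the same sha that comes back in `paperId`; `align_to_input` is a hypothetical helper and the records below are trimmed to the ID only.

```python
def align_to_input(ids, papers):
    """Reorder batch results to match ids, with None where an ID has no match."""
    # Skip None entries in case the endpoint starts returning them.
    by_id = {paper["paperId"]: paper for paper in papers if paper is not None}
    return [by_id.get(paper_id) for paper_id in ids]


requested = [
    "1e42f34a97365192e2ed144bbc1cffe90250fa56",
    "dafecfed46fa849be587bd3cfcc44dce03b6f96f",
    "d4f216955046faeaa0737aa4c6de760e4594d6f4",
]
# Shape of the second response in the report: out of order and one entry short.
returned = [
    {"paperId": "dafecfed46fa849be587bd3cfcc44dce03b6f96f"},
    {"paperId": "1e42f34a97365192e2ed144bbc1cffe90250fa56"},
]
aligned = align_to_input(requested, returned)
print([p["paperId"] if p else None for p in aligned])
```

If papers are requested by other ID types (DOI, CorpusId, and so on), the lookup key would need to change accordingly, since the response is keyed by `paperId`.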

Fields should be part of the request payload in paper/batch, but documentation says otherwise

Originally reported by @codeviking in https://github.com/allenai/scholar/issues/35457

The docs suggest that the fields parameter should be submitted as a query string argument. This doesn't work. Or rather, the response in this scenario doesn't include the desired fields.

In practice the fields parameter should also be a part of the request payload. This works as expected.

In other words, this didn't work:

curl -d '{ "ids": ["...", "..."] }' https://api.semanticscholar.org/graph/v1/paper/batch?fields=...,...
But this did:

curl -d '{ "ids": ["...", "..."], "fields": ["...", "..."] }' https://api.semanticscholar.org/graph/v1/paper/batch
This makes sense, intuitively, which is why I was able to figure it out.
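The same request shape in Python, following the report above rather than the documentation: `fields` travels in the JSON body as a list, next to `ids`. The helper name `build_batch_request` and the field names `title` and `year` are illustrative assumptions; the request is built but not sent.

```python
import json
import urllib.request

BATCH_URL = "https://api.semanticscholar.org/graph/v1/paper/batch"


def build_batch_request(ids, fields):
    """Build a POST request with ids and fields both in the JSON payload."""
    payload = {"ids": list(ids), "fields": list(fields)}
    return urllib.request.Request(
        BATCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_batch_request(
    ["649def34f8be52c8b66281af98ae884c09aef38b"],
    ["title", "year"],  # illustrative field names
)
print(json.loads(request.data))
```

Passing the object to `urllib.request.urlopen(request)` would send it; whether the body or the query string is honored should be checked against the current API behavior.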

API requests from cloud server gives Internal Server Error, but requests from local machine works fine.

Hi, I am aware of #16. Problem for me is when I send requests from my cloud server (Azure), it gives Internal Server Error almost every time, but when I send requests from my local machine it never fails. I wonder if this is because of some blocking mechanism that you are using to fight the DDoS. If you have any idea about this case please contact me, I would like to solve this ASAP for our product testing.

Search operators or support for exact text matching?

Hi, thank you for building a great and accessible product! I had a quick suggestion/question about search on the platform. I am attempting to use Semantic Scholar to search for papers in really niche topics, but I'm running into some problems with the search interface exposed by the API. I read about your ranking system detailed in a blog post linked on the API product page. I think the machine learning re-ranker weights paper popularity too highly in cases when I am trying to search for few results that have an exact string match (as is the case in niche areas of expertise). Instead, Semantic Scholar seems to return some popular papers in adjacent fields in the top results. I'd like to reference the NIH RePORT search interface as one that seems to work well with exact text matching combined with embedding search. I get fewer results but the results are much more applicable. Is there a plan to start supporting search operators or at least an option for exact text matching search?

From freshdesk
