allenai / s2-folks

Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.

License: Other

s2-folks's Introduction

Disclaimers

All the information you contribute to this repository, including GitHub issues and code samples, is public and open. Please do not include sensitive, confidential, or personal information unless you want to be credited for the information you provide.

s2-folks's Issues

How to get an authentication key to use the Semantic Scholar APIs?

I am a computer science student and would like to develop a speech assistant that helps students research papers online. Would it be possible to provide me with an API key? Thanks in advance!

Lookup authors by orcid ids

Feedback from conference chair who used the peer review APIs:

A main thing that we struggled with is mapping names to Semantic Scholar IDs. ACM now requires ORCID iDs, so it would be nice for Semantic Scholar to work with them. Also, I tried to look up Semantic Scholar IDs from names using the Semantic Scholar API. Unfortunately, the metadata wasn’t good enough to narrow down the list returned. For EC, I would have liked to know whether the author was in Economics or Computer Science, but this data didn’t seem to be returned directly. I could have asked for their papers and then made a guess based on the primary field of the papers.

Does the API provide paper full text?

Hi! I had a quick question about the Semantic Scholar API: is there a part of the API that I can connect to where it contains all the text in the research papers? I noticed there are datasets for headers, dates, authors, references, but I'm not sure if I missed the one that included the text in its entirety in research papers. Thanks!
Feedback was from https://www.semanticscholar.org/product/api

API health page

The API health page will (eventually) provide our users with timely information about operational metrics (e.g., error rate and response time) as well as data coverage metrics (e.g., number of abstracts, number of non-empty tldr fields) for the data that backs the APIs.

When using web search and API search, I noticed that I get different search results.

When I search for the paper 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding', I get different results on these two pages:
https://www.semanticscholar.org/search?q=BERT%3A%20Pre-training%20of%20Deep%20Bidirectional%20Transformers%20for%20Language%20Understanding&sort=relevance
https://api.semanticscholar.org/graph/v1/paper/search?query=BERT:%20Pre-training%20of%20Deep%20Bidirectional%20Transformers%20for%20Language%20Understanding
Do you know if there's a reason for this discrepancy? Can I access the API used for web search? Thank you!

From freshdesk.

10k limit for citation details of a paper

Hi everyone, I have built a social network analysis tool on articles and authors for my graduation project. We give users the option to retrieve the citations of a paper and create a graph from them. However, S2AG cannot return all the citations of a paper if citationCount is more than 10k when we use the 'Details about a paper's citations' endpoint. The limit is 10k for this endpoint. Is there any workaround for this issue?

How can I download the paper dataset according to the field-of-research?

Two questions:

The bulk dataset API returns 30 links for S2ORC. I wonder if there is a link for the metadata file.
How can I download the paper dataset according to the field-of-research?

Copied from a Slack conversation.

How to verify my API key works as intended?

I’m getting an error code 403 back from the API when using my API key with the following headers:
"headers": {
    "Content-type": "application/json",
    "x-api-key": my_precious_key
},
It looks like I am sending it correctly, but somehow I’m getting a Forbidden 403 back. Am I missing something?

From slack.
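
One way to narrow down a 403 like this is to send a single small request with the key in the x-api-key header and look at the status code that comes back. The following is a minimal sketch, not an official check: build_headers, describe_status, and check_api_key are hypothetical helper names, the request reuses the paper search endpoint quoted elsewhere in these issues, and the status-code readings are general HTTP conventions rather than confirmed behavior of this API.

```python
import urllib.error
import urllib.parse
import urllib.request

# Search endpoint quoted elsewhere in these issues; used here only as a cheap test call.
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_headers(api_key):
    """Return request headers carrying the key in x-api-key."""
    return {"x-api-key": api_key}


def describe_status(status_code):
    """Map an HTTP status code to a rough diagnosis (general HTTP conventions)."""
    if status_code == 200:
        return "ok"
    if status_code == 403:
        return "forbidden: key rejected, inactive, or mistyped"
    if status_code == 429:
        return "rate limited"
    return f"unexpected status {status_code}"


def check_api_key(api_key, timeout=10):
    """Send one small search request and report how the server answered."""
    query = urllib.parse.urlencode({"query": "literature graph", "limit": 1})
    request = urllib.request.Request(
        f"{SEARCH_URL}?{query}", headers=build_headers(api_key)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return describe_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, so the status code is read from the exception.
        return describe_status(err.code)
```

Calling check_api_key("your-key-here") makes one network request. If the same call without the header succeeds while the call with the header returns 403, the key itself is the likely culprit rather than the way it is sent.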

Can the Semantic Scholar API be used to access data from other APIs/databases?

We are looking for an AI platform that can access and export data from one of the following scientific databases/APIs:
Google Scholar: Google Scholar API - SerpApi
ORCID: Public API
Crossref: REST API - Crossref
CORE: CORE - Aggregating the world's open access research papers
Dimensions: https://www.dimensions.ai/
Scitrus:
Would this be possible with the Semantic Scholar platform/API?

API returning empty responses.

Since 17th of March, the API seems to be returning empty responses in 10% of our users' searches. This is new behaviour (we didn't account for that in our code and no errors were raised until then). Did something significant change or is this a bug? The response we get is:

{
offset: 0, 
total: 0
}
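
A response like the one above has no list of results at all, so client code that indexes straight into the results will fail. The sketch below shows one defensive way to read such a payload. It assumes that non-empty search responses carry their results under a "data" key, which is not shown in this issue; extract_papers and is_empty_result are hypothetical helper names.

```python
def extract_papers(payload):
    """Return the list of papers from a search response, or [] when none came back."""
    if not isinstance(payload, dict):
        return []
    # Assumption: results live under "data"; the key is absent when total is 0.
    return payload.get("data") or []


def is_empty_result(payload):
    """True when the response carries no papers, whatever the reason."""
    return not extract_papers(payload)
```

Treating the empty case explicitly lets the caller decide whether to retry, log the query, or show a "no results" message instead of raising an unexpected KeyError.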

Performing exact paper title search

Can the Semantic Scholar API perform an exact paper title search? I discovered that while I can obtain relevant results by conducting an exact title search on the Semantic Scholar website (https://www.semanticscholar.org/), the same is not possible through the API search (https://www.semanticscholar.org/product/api).
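
Whatever the search endpoint returns, an exact-title check can be applied on the client side to the results. The sketch below is one possible approach under stated assumptions: each result is a dict with a "title" key, and "exact" means equal after lowercasing and ignoring punctuation. normalize_title and exact_title_matches are hypothetical helpers, not part of the API.

```python
import re


def normalize_title(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def exact_title_matches(wanted_title, papers):
    """Keep only the papers whose normalized title equals the wanted one."""
    target = normalize_title(wanted_title)
    return [p for p in papers if normalize_title(p.get("title") or "") == target]
```

This only filters what the search already returned, so it cannot recover a paper that the relevance ranking left out of the result page.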

How to download dataset

I am having some difficulty: after reading the README files on how to download the dataset, I am at a loss as to what to do next.

A bunch of PubMed IDs (PMIDs) are missing. Why?

Ken Church:

A colleague asked me to do something with about 1k PMIDs. I found most of them in Semantic Scholar, but not these:

PMID:34995702
PMID:34457137
PMID:34831513
PMID:32662296
PMID:34932685
PMID:35015933
PMID:25666784
PMID:33226074

Confusing authors and program committee

https://www.semanticscholar.org/search?q=Map-Reduce%20for%20Machine%20Learning%20on%20Multicore&sort=relevance

This paper appeared in NIPS (now called NeurIPS). It should not be hard to find.

https://www.researchgate.net/profile/Gary-Bradski-4/publication/221617998_Map-Reduce_for_Machine_Learning_on_Multicore/links/00b4951c369007c315000000/Map-Reduce-for-Machine-Learning-on-Multicore.pdf

The authors should not be confused with the editors (program committee).

There are links on Semantic Scholar to slides (as opposed to the paper): https://pdfs.semanticscholar.org/38af/f6df1accc456f6cda7d16d4b9ecf418ef21e.pdf

It would be good to distinguish authors from editors, and slides from papers.

How to download the datasets?

I am interested in looking at the datasets you have available, but when I go to this link https://api.semanticscholar.org/api-docs/datasets#tag/Release-Data there is not really any helpful information on how to actually obtain the dataset. Now that I have an API key, what steps are necessary to download the datasets and potentially use them?
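
As a rough illustration of the steps being asked about, the sketch below requests the file list for one dataset and returns the download links. Everything specific in it is an assumption to verify against the datasets documentation linked above: the base URL, the "latest" release alias, the "files" key in the response, and the helper names dataset_url and list_dataset_files.

```python
import json
import urllib.request

# Assumed base URL of the datasets API; check it against the official docs.
DATASETS_BASE = "https://api.semanticscholar.org/datasets/v1"


def dataset_url(dataset_name, release_id="latest"):
    """Build the URL that describes one dataset within one release."""
    return f"{DATASETS_BASE}/release/{release_id}/dataset/{dataset_name}"


def list_dataset_files(dataset_name, api_key, release_id="latest", timeout=30):
    """Return the download links for a dataset (assumes a 'files' key in the response)."""
    request = urllib.request.Request(
        dataset_url(dataset_name, release_id), headers={"x-api-key": api_key}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("files", [])
```

Each returned link would then be downloaded like any other file, for example with urllib.request.urlretrieve.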

The default rate limit is too low

I receive only 429 responses to my requests. I didn't make 100 requests in 5 minutes, which is the stated limit, so can you please tell me what the issue is?
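
Independently of why the 429s occur, a client can back off and retry instead of failing outright. The sketch below shows exponential backoff around an arbitrary request function. call_with_backoff is a hypothetical helper, and the delays are illustrative values, not limits published by the API.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    The wait doubles after every 429: base_delay, 2 * base_delay, and so on.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        sleep(delay)
        delay *= 2
```

The sleep parameter exists so the function can be exercised without real waiting; in normal use it stays at its default.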

The API is hard to use. HELP!

Dear Semanticscholar Team, I hope this email finds you well. I am writing to express my admiration for the work that you do at Semanticscholar. Your platform has been a valuable resource for me in my academic research, and I appreciate the comprehensive and up-to-date database of scholarly articles that you provide. As I delve deeper into my research, I am interested in exploring some of the data and features that you offer through your Dataset API. However, I have encountered some issues accessing the API, and I was hoping you could offer some guidance on how to proceed. I have tried several times to access the API, but I have not been successful in doing so. I have checked the documentation and followed the instructions carefully, but I keep receiving blank messages when attempting to access the data. I would greatly appreciate it if you could provide some assistance on this matter, as I am eager to use your API to further my research. Once again, thank you for your excellent work and for providing such a valuable resource for the academic community. I look forward to hearing back from you soon.

Is the list of recommended papers sorted (closest first)?

Question from an S2 intern via email.

How are the S2 identifiers defined?

There are two types of S2 identifiers as described in the API:

<sha> - a Semantic Scholar ID, e.g. 649def34f8be52c8b66281af98ae884c09aef38b
CorpusId:<id> - Semantic Scholar numerical ID, e.g. 215416146

Can we always assume the sha is a string of 40 lowercase hex characters?
Can we always assume the CorpusId is a 32 bit integer?
Is there a way to convert from one to the other?
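
The first two questions can be phrased as concrete checks. The sketch below encodes the assumptions being asked about; it does not confirm that they hold. looks_like_sha, parse_corpus_id, and fits_in_int32 are hypothetical helpers, and "32 bit integer" is read here as a signed 32-bit range.

```python
import re

SHA_PATTERN = re.compile(r"[0-9a-f]{40}")
CORPUS_ID_PATTERN = re.compile(r"CorpusId:([0-9]+)")


def looks_like_sha(value):
    """True if value is exactly 40 lowercase hex characters."""
    return SHA_PATTERN.fullmatch(value) is not None


def parse_corpus_id(value):
    """Return the integer from 'CorpusId:<id>', or None if the format differs."""
    match = CORPUS_ID_PATTERN.fullmatch(value)
    return int(match.group(1)) if match else None


def fits_in_int32(number):
    """True if number fits in a signed 32-bit integer."""
    return -(2**31) <= number < 2**31
```

Running these checks over a sample of real identifiers is a cheap way to find counterexamples before committing to a storage schema.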

How to check the status of my API key request?

Hi there, I requested an API key last week (to enable downloading files) and was wondering how I can follow up on this request (in case additional information is required). When should I expect to get an answer normally?

Some papers have abstracts on semanticscholar.org but not in the API

How SPECTER embeddings are used in recommendations API?

When it comes to the recommendations being returned by the API, I was wondering if you were using the provided SPECTER embedding on your end to get a measure of similarity and then return them (maybe you are using a different version of the embeddings). If so, I was wondering what similarity algorithm/formula you were using, as I’m noticing that if I run cosine similarity (as well as Euclidean distance and simple dot product) the order of the returned recommendations looks random. I am assuming you must be using some other formula, or that maybe the embeddings used are slightly different?

From a private message on slack.
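
For reference, the cosine-similarity comparison described in the question can be written as follows. This is a sketch of the check the reporter ran, not of how the recommendations API ranks papers, which the issue leaves unanswered. cosine_similarity and rank_by_similarity are hypothetical helpers that operate on plain lists of floats.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, candidates):
    """Given {paper_id: vector}, return paper ids sorted from most to least similar."""
    scored = [
        (cosine_similarity(query_vector, vector), paper_id)
        for paper_id, vector in candidates.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [paper_id for _, paper_id in scored]
```

Comparing this ranking with the order returned by the API makes the mismatch measurable, for example by counting how many of the top results the two orderings share.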

TLDR - getting a 500 error on API for some records

If I try the API on https://api.semanticscholar.org/graph/v1/paper/10.1128/mSystems.00606-19/?fields=tldr,year,citationCount then it throws a 500 error until I remove the tldr field. I only get this on very few papers in a list of about 1500. Any idea why?

API Request Timed out & Internal Server Error

I'm getting these two messages:
{'message': 'Internal Server Error'}
{'message': 'Endpoint request timed out'}
How can I fix these errors?

Affiliation requirements

Does the API support affiliations?
If not, any idea what I should do if I want to fetch data for a specific affiliation?

How to download abstract using DOI (Digital Object Identifier)?

How to download abstract using doi?
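
Another issue in this list looks up a paper by placing its DOI directly in the Graph API path and choosing fields with the fields query parameter. Following that pattern, the sketch below builds the lookup URL and reads the abstract from the response. It assumes "title" and "abstract" are accepted field names and that the abstract may be missing for some papers; paper_url_for_doi and fetch_abstract are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

GRAPH_PAPER_URL = "https://api.semanticscholar.org/graph/v1/paper"


def paper_url_for_doi(doi, fields=("title", "abstract")):
    """Build the single-paper lookup URL for a DOI with the requested fields."""
    # Keep "/" in the DOI and "," between fields unescaped, as in the URLs quoted in these issues.
    path = urllib.parse.quote(doi, safe="/")
    query = urllib.parse.urlencode({"fields": ",".join(fields)}, safe=",")
    return f"{GRAPH_PAPER_URL}/{path}?{query}"


def fetch_abstract(doi, timeout=10):
    """Return the abstract for a DOI, or None when the API has none for that paper."""
    with urllib.request.urlopen(paper_url_for_doi(doi), timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("abstract")
```

For example, fetch_abstract("10.1128/mSystems.00606-19") would request the paper mentioned in the TLDR issue.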

Where to find PDF URL that appears in the web UI but not in the API?

The publication with paperId=65b190b353c15a6670fc98614c2a1542286bbb2e (linked here) has a link in the web UI to download a PDF. However, when I look up the paper by ID using the Graph API, it is listed as NOT open access with no openAccessPdf link. What field is the PDF link in the front end drawn from?

Been more than five (5) business days since I submitted the partner request form

Several users have reported that they submitted the partner form more than 5 days ago on https://www.semanticscholar.org/product/api#Partner-Form but didn't receive the API key.

Is it in your spam folder?
Often, this happens because the email that contains the API key is sent to your spam folder. Please check to verify that this is not the case.

Did you submit your request before Mar 31, 2023?
If this is the case, please resubmit the form https://www.semanticscholar.org/product/api#Partner-Form . Why? We've modified our process for issuing API keys around this time. We did our best to process requests which were submitted before the new process was in effect, but processing older requests is a manual and error-prone process and we may have accidentally missed your request.

The API key is not in your spam folder AND your request was submitted after Mar 31, 2023?
Please comment on this issue and provide the email and affiliation you used to submit the form and we'll get back to you as soon as possible.

What are 504 errors in the S2AG API and how to address them?

From Andrew White on slack

Batch Paper API Endpoint Limited to 50 IDs

The API Documentation states that the batch paper endpoint is capable of handling up to 1000 IDs; however, when more than 50 IDs are POSTed, this endpoint errs with an unhelpful 500 error message.

S2 paperId for S2ORC papers

Hello,
Is there a way to have the S2 id (paperId field) for the papers in S2ORC? The corpus_id seems to be the id specific to S2ORC and is not the same as the S2 id associated to all papers that can be found using keyword search in the API.

Thanks!

Feature request: Search authors by external ID and affiliation

The author search API is very limited: it searches by name only. It would be good to be able to look up authors by DBLP external ID and by affiliation. It makes little sense that I have to search by name and then check the ID fields on the results.

Where can I find the api_endpoint URL?

Dear All,
I hope my email finds you well.
I'm looking for the api_endpoint URL. Where can I find it?

From https://semanticscholar.freshdesk.com/a/tickets/55774

I want to continue using the APIs but my API key is about to expire

My API key expires tomorrow at noon. Is it possible to ask for an extension? or a new key?

From slack.

How does the S2 Recommendation API work under the hood?

Hey guys, I am building a basic paper recommendation system for my bachelor's thesis, and I was wondering if there is any public information on how the S2 recommendation API works under the hood, or if that can be shared? Like any details on the architecture? Is it another neural network or KNN? Any information will be of great help! Please let me know!

I am also curious if any of the code for the recommendations is available from the original SPECTER paper? Specifically section 3.4 here: https://arxiv.org/pdf/2004.07180.pdf.

What's the most up to date paper which describes the Semantic Scholar APIs?

Q: What's the most up to date paper which describes the Semantic Scholar APIs?

I want to get more than 10K papers using the search endpoint

I would like to know how to work around the limitation on "offset" and "limit" in the API. For example, I cannot get all the papers about customer service in 2022 by using the API, because "The sum of offset and limit must be < 10000" while the number of papers that I want exceeds 10000. Please help me solve the problem, thank you so much.
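
The constraint quoted above fixes how far a single query can be paged. The sketch below generates (offset, limit) pairs that stay inside it. It does not lift the limit; it only shows the most that one query can reach, so anything beyond that would need narrower queries. page_windows is a hypothetical helper and the page size of 100 is an illustrative choice.

```python
def page_windows(page_size=100, cap=10000):
    """Yield (offset, limit) pairs such that offset + limit < cap for every pair."""
    offset = 0
    while offset + 1 < cap:
        # Shrink the last page so the sum stays strictly below the cap.
        limit = min(page_size, cap - 1 - offset)
        yield offset, limit
        offset += limit
```

With the default values the last window is (9900, 99), so at most 9999 results are reachable per query.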

Publication date vs. arxiv submission date

I'm looking into the publication dates of papers and I see that for papers that were submitted to ArXiv, the publication date is the same as the ArXiv submission date for 90% of Semantic Scholar papers. Is this because:
i) these papers were submitted to ArXiv before they were published in the journal, or
ii) the publication dates for these papers were not known so the ArXiv dates were used as publication dates?
I see that the official documentation says that the publication date is either the print date or the journal publication date as provided by the source, but details are not listed. It would be great if someone could provide some insight into how these dates are collected.
More generally, our research would greatly profit from having the actual journal publication dates for papers that have ArXiv submission dates as their publication dates. Is there a way to get them? We (@Przemyslaw Grabowicz and I) study at UMass Amherst the effects of early promotion and revisions of scientific publications on the number of citations.

From slack

Internal server error when using /paper/batch with specific paper IDs

When using the 'get multiple papers' endpoint of the S2AG API, any request that includes the IDs ea74a98158624ebc2fc03f7aba2b9056737dc9ae or 317cebe9aef0cc1f0ee36707c06ba38e978ac6bd results in an Internal Server Error. Accessing either paper via the single paper endpoint works with no problems. Some other IDs are also affected, though most work perfectly well.
Feedback was from https://www.semanticscholar.org/paper/Developing-machine-learning-methods-for-automatic-Wang-Zhang/ea74a98158624ebc2fc03f7aba2b9056737dc9ae

Do I need a different API key to download the datasets?

Does the S2 API Key also work for downloading full-corpus datasets or would I need a different API key?

Order results from bulk paper fetch

Reported by @kyleclo :

Describe the bug
Using the bulk API query to get papers. Given an input list of paper IDs [a, b, c] the papers can come back out-of-order [b, c, a] or even be silently missing entries [b, c].

To Reproduce
Steps to reproduce the behavior:

In Python:

import requests

MAX_REQUEST_SIZE = 500


def chunks(lst, chunk_size=MAX_REQUEST_SIZE):
    """Split a longer list into chunks that respect the batch size."""
    for i in range(0, len(lst), chunk_size):
        yield lst[i : i + chunk_size]


QUERY = 'https://api.semanticscholar.org/graph/v1/paper/batch'

# Case 1: results come back out of order
chunk = ['b10ab3b45876dcd75e96feecdd5dee5c9633bc1a', 'a14897448e472a430439e2e24abba637f9db7a27', 'fbbb80bfeb35569ee8473b80aa98c716c4a6b034']
response = requests.post(QUERY, json={'ids': chunk})
print([p['paperId'] for p in response.json()])
# > ['b10ab3b45876dcd75e96feecdd5dee5c9633bc1a', 'fbbb80bfeb35569ee8473b80aa98c716c4a6b034', 'a14897448e472a430439e2e24abba637f9db7a27']

# Case 2: an entry is silently missing
chunk = ['1e42f34a97365192e2ed144bbc1cffe90250fa56', 'dafecfed46fa849be587bd3cfcc44dce03b6f96f', 'd4f216955046faeaa0737aa4c6de760e4594d6f4']
response = requests.post(QUERY, json={'ids': chunk})
print([p['paperId'] for p in response.json()])
# > ['dafecfed46fa849be587bd3cfcc44dce03b6f96f', '1e42f34a97365192e2ed144bbc1cffe90250fa56']

Expected behavior
The expected behavior is that the API returns the entries in the same order as the inputs in the POST. The bulk Authors query in the API preserves input order.

Similarly, if an ID has no match, the API should still return something for that entry rather than a shorter list. For example, the bulk Authors query in the API still returns None for unmatched author IDs:

QUERY = 'https://api.semanticscholar.org/graph/v1/author/batch'
response = requests.post(QUERY, json={'ids': ['46258841', '999999999', '3328733']})
print(response.json())
# > [{'authorId': '46258841', 'name': 'Kyle Lo'}, None, {'authorId': '3328733', 'name': 'Luca Soldaini'}]
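
Until the paper endpoint behaves this way, a caller can realign the results on the client side. The sketch below is a minimal version of that workaround, assuming each returned entry carries the same paperId that was requested; align_to_input is a hypothetical helper, not part of any Semantic Scholar client.

```python
def align_to_input(ids, papers, key="paperId"):
    """Reorder papers to match ids, inserting None where an id has no result."""
    # Skip None entries so the helper also accepts responses that already contain them.
    by_id = {paper[key]: paper for paper in papers if paper is not None}
    return [by_id.get(paper_id) for paper_id in ids]
```

Applied to the second example in the bug report, the output has three entries with None in the last position. If the API answers with a different paperId than the one requested, that entry would also show up as None and would need separate handling.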

JSON documentation for the files returned by the full text dataset

Where can I find documentation of the JSON format of the files returned when downloading the full text dataset?

Can I bulk-download the Semantic Scholar database?

Is there any possibility for bulk-download of the Semantic Scholar database?

What is the version of SPECTER embeddings we serve in the API?

jaron: v0.1.1 -- you can see this when you return the embedding field for a given paper, it will return both model number and an array of vectors

Get full text for a specific paper using the /graph endpoint.

I have integrated the paper search feature and need to show the paper information on the detail page, but I cannot find any resource that provides the full content. Please advise. Thanks

Fields should be part of the request payload in paper/batch, but documentation says otherwise

Originally reported by @codeviking in https://github.com/allenai/scholar/issues/35457

The docs suggest that the fields parameter should be submitted as a query string argument. This doesn't work. Or rather, the response in this scenario doesn't include the desired fields.

In practice the fields parameter should also be a part of the request payload. This works as expected.

In other words, this didn't work:

curl -d '{ "ids": ["...", "..."] }' https://api.semanticscholar.org/graph/v1/paper/batch?fields=...,...

But this did:

curl -d '{ "ids": ["...", "..."], "fields": ["...", "..."] }' https://api.semanticscholar.org/graph/v1/paper/batch

This makes sense, intuitively, which is why I was able to figure it out.

API calls involving high counts for nested authors/papers time out

To Reproduce
Examples (taken from the PR above):

author/search endpoint
- localhost8080
- prod
author/ endpoint
- localhost8080
- prod

Expected behavior
Instead of timing out, return an error message? Or extend the timeout limits?

S2AG API: Search endpoint is down

https://api.semanticscholar.org/graph/v1/paper/search?query=literature+graph

This doesn't seem to work right now. I am getting "Endpoint request timed out". Did I reach the limit on my API usage?

API requests from cloud server gives Internal Server Error, but requests from local machine works fine.

Hi, I am aware of #16. The problem for me is that when I send requests from my cloud server (Azure), I get an Internal Server Error almost every time, but when I send requests from my local machine they never fail. I wonder if this is because of some blocking mechanism that you are using to fight DDoS attacks. If you have any idea about this case please contact me; I would like to solve this ASAP for our product testing.

Search operators or support for exact text matching?

Hi, thank you for building a great and accessible product! I had a quick suggestion/question about search on the platform. I am attempting to use Semantic Scholar to search for papers in really niche topics, but I'm running into some problems with the search interface exposed by the API. I read about your ranking system detailed in a blog post linked on the API product page. I think the machine learning re-ranker weights paper popularity too highly in cases when I am trying to search for few results that have an exact string match (as is the case in niche areas of expertise). Instead, Semantic Scholar seems to return some popular papers in adjacent fields in the top results. I'd like to reference the NIH RePORT search interface as one that seems to work well with exact text matching combined with embedding search. I get fewer results but the results are much more applicable. Is there a plan to start supporting search operators or at least an option for exact text matching search?
