
semanticscholar's Introduction

semanticscholar


Unofficial Python client library for Semantic Scholar APIs, currently supporting the Academic Graph API and Recommendations API.


How to install

pip install semanticscholar

Usage

Programmatically retrieve paper and author data by ID or query string. It can be used to access both the public API and, with a private key, the S2 Data Partner's API.

Paper Lookup

To access paper data:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
paper = sch.get_paper('10.1093/mind/lix.236.433')
paper.title

Output:

'Computing Machinery and Intelligence'

Author Lookup

To access author data:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
author = sch.get_author(2262347)
author.name

Output:

'Alan M. Turing'

Retrieve multiple items at once

You can fetch up to 1000 distinct papers or authors in a single API call. To do that, provide a list of IDs (an array of strings).

Get details for multiple papers:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
list_of_paper_ids = [
     'CorpusId:470667',
     '10.2139/ssrn.2250500',
     '0f40b1f08821e22e859c6050916cec3667778613'
]
results = sch.get_papers(list_of_paper_ids)
for item in results:
     print(item.title)

Output:

Improving Third-Party Audits and Regulatory Compliance in India
How Much Should We Trust Differences-in-Differences Estimates?
The Miracle of Microfinance? Evidence from a Randomized Evaluation

Get details for multiple authors:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
list_of_author_ids = ['3234559', '1726629', '1711844']
results = sch.get_authors(list_of_author_ids)
for item in results:
     print(item.name)

Output:

E. Dijkstra
D. Parnas
I. Sommerville

Search for papers and authors

To search for papers by keyword:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('Computing Machinery and Intelligence')
print(f'{results.total} results.', f'First occurrence: {results[0].title}.')

Output:

492 results. First occurrence: Computing Machinery and Intelligence.

Warning

From the official documentation: "Because of the subtleties of finding partial phrase matches in different parts of the document, be cautious about interpreting the total field as a count of documents containing any particular word in the query."

To search for authors by name:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_author('Alan M. Turing')
print(f'{results.total} results.', f'First occurrence: {results[0].name}.')

Output:

4 results. First occurrence: A. Turing.

Traversing search results

Each call to search_paper() and search_author() paginates through results, returning the list of papers or authors up to the given limit (the default is 100). You can retrieve the next batch of results by calling next_page() or simply iterate over all of them:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('Computing Machinery and Intelligence')
for item in results:
     print(item.title)

Output:

Computing Machinery and Intelligence
Computing Machinery and Intelligence (1950)
Artificial intelligence in the research of consciousness and in social life (in honor of 70-years anniversary of A. Turing’s paper “Computing Machinery and Intelligence” (papers of the “round table”)
Studies on computing machinery and intelligence
On Computing Machinery and Intelligence
...
Information revolution: Impact of technology on global workforce

When iterating over the value returned by the search methods, the client library will traverse all results, regardless of the number of pages. If the first batch is enough, you can avoid further API calls by handling only the current results:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('Computing Machinery and Intelligence')
for item in results.items:
     print(item.title)

Output:

Computing Machinery and Intelligence
Computing Machinery and Intelligence (1950)
Artificial intelligence in the research of consciousness and in social life (in honor of 70-years anniversary of A. Turing’s paper “Computing Machinery and Intelligence” (papers of the “round table”)
Studies on computing machinery and intelligence
On Computing Machinery and Intelligence
...
Building Thinking Machines by Solving Animal Cognition Tasks
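
If you want one more batch without traversing everything, you can request it explicitly. A minimal sketch, assuming next_page() appends the next batch to items, as the iteration examples above imply (check the docs for the exact behavior when no further page exists):

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('Computing Machinery and Intelligence')
results.next_page()  # fetch the next batch explicitly
print(len(results.items))  # now holds up to two pages of results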

Recommended papers

To get recommended papers for a given paper:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.get_recommended_papers('10.2139/ssrn.2250500')
for item in results:
     print(item.title)

Output:

Microcredit: Impacts and promising innovations
MIT Open Access
The Econmics of Badmouthing: Libel Law and the Underworld of the Financial Press in France before World War I
Give Biden a 6-Point
Getting more value from Australian Intergenerational Reports
...
Structural Change and Economic Dynamics

To get recommended papers based on a list of positive and negative paper examples:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
positive_paper_ids = ['10.1145/3544585.3544600']
negative_paper_ids = ['10.1145/301250.301271']
results = sch.get_recommended_papers_from_lists(positive_paper_ids, negative_paper_ids)
for item in results:
     print(item.title)

Output:

BUILDING MINIMUM SPANNING TREES BY LIMITED NUMBER OF NODES OVER TRIANGULATED SET OF INITIAL NODES
Recognition of chordal graphs and cographs which are Cover-Incomparability graphs
Minimizing Maximum Unmet Demand by Transportations Between Adjacent Nodes Characterized by Supplies and Demands
Optimal Near-Linear Space Heaviest Induced Ancestors
Diameter-2-critical graphs with at most 13 nodes
...
Advanced Heuristic and Approximation Algorithms (M2)

You can also omit the list of negative paper IDs, in which case the API will return recommended papers based on the positive paper IDs only.
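
A minimal sketch of that call, reusing the positive example ID from above:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
# negative_paper_ids omitted: recommendations from positive examples only
results = sch.get_recommended_papers_from_lists(['10.1145/3544585.3544600'])
for item in results:
    print(item.title)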

Query parameters for all methods

fields: list

The list of fields to be returned. By default, the response includes all fields. As explained in the official documentation, fields like papers (in author lookup and search) may result in responses larger than usual and affect performance. Consider reducing the list. Check the official documentation for the list of available fields.

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('software engineering', fields=['title','year'])
for item in results:
     print(item)

Output:

{'paperId': 'd0bc1501ae6f54dd16534e651d90d2aeeeb1cfc1', 'title': 'Software engineering: What is it?', 'year': 2018}
{'paperId': 'f70b2f20be241f445a61f33c4b8e76e554760340', 'title': 'Software Engineering for Machine Learning: A Case Study', 'year': 2019}
{'paperId': '55bdaa9d27ed595e2ccf34b3a7847020cc9c946c', 'title': 'Performing systematic literature reviews in software engineering', 'year': 2006}
{'paperId': '27e57cc2f22c1921d2a1c3954d5062e3fe391553', 'title': 'Guidelines for conducting and reporting case study research in software engineering', 'year': 2009}    
{'paperId': '81dbfc1bc890368979399874e47e0529ddceaece', 'title': "Software Engineering: A Practitioner's Approach", 'year': 1982}
...

Query parameters for all search methods

limit: int

This parameter sets the maximum number of results returned on each API call; its value can't exceed 100, which is also the default. According to the official documentation, setting a smaller limit reduces output size and latency.

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('software engineering', limit=5)
len(results)

Output:

5

Query parameters for search papers

year: str

Restrict results to a specific publication year or a given range, following the patterns '{year}' or '{start}-{end}'. You can also omit the start or the end. Examples: '2000', '1991-2000', '1991-', '-2000'.

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('software engineering', year=2000)
results[0].year

Output:

2000

fields_of_study: list

Restrict results to a given list of fields of study. Check official documentation for a list of available fields.

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('software engineering', fields_of_study=['Computer Science','Education'])
results[0].s2FieldsOfStudy

Output:

[{'category': 'Computer Science', 'source': 'external'}, {'category': 'Computer Science', 'source': 's2-fos-model'}]

Other options

timeout: int

You can set how long to wait for a response. By default, requests to the API wait 10 seconds before a timeout exception is raised. To change the default value, specify it when creating the SemanticScholar instance:

from semanticscholar import SemanticScholar
sch = SemanticScholar(timeout=5)

or set the timeout property:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
sch.timeout = 5

Accessing the Data Partner's API

If you are a Semantic Scholar Data Partner, you can provide the private key as an optional argument:

from semanticscholar import SemanticScholar
s2_api_key = '40-CharacterPrivateKeyProvidedToPartners'
sch = SemanticScholar(api_key=s2_api_key)

Semantic Scholar API official docs and additional resources

If you have concerns or feedback specific to this library, feel free to open an issue. For broader API-related questions, the official documentation provides additional resources.

semanticscholar's People

Contributors

danielnsilva, dependabot[bot], gabriel-trigo, liqiankun1111, shauryr


semanticscholar's Issues

How can I filter by conference without specifying a query?


For example, I want to retrieve the paper with the most citations at AAAI, but I cannot retrieve it without specifying a query. How can I retrieve papers filtered only by conference, without specifying a query?
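
One possible approach, assuming your version of the library exposes the search endpoint's venue filter as a search_paper parameter (a query string is still required by the API, and sorting by citations has to be done client-side):

from semanticscholar import SemanticScholar
sch = SemanticScholar()
# venue parameter is an assumption: check that your library version supports it
results = sch.search_paper('artificial intelligence', venue=['AAAI'])
# results.items is only the first batch; traverse further pages for a fuller picture
top = max(results.items, key=lambda p: p.citationCount or 0)
print(top.title, top.citationCount)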

some get_paper results are empty

Here are two examples:

print(sch.get_paper("c77cd64f26228442ffff9219bfd870c83c8747c0"))
print(sch.get_paper("d7701e78e0bfc92b03a89582e80cfb751ac03f26"))

results in

{}
{}

But those are real papers according to the API:

curl https://api.semanticscholar.org/graph/v1/paper/c77cd64f26228442ffff9219bfd870c83c8747c0
curl https://api.semanticscholar.org/graph/v1/paper/d7701e78e0bfc92b03a89582e80cfb751ac03f26

results in:

{"paperId": "c77cd64f26228442ffff9219bfd870c83c8747c0", "title": "Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints"}
{"paperId": "d7701e78e0bfc92b03a89582e80cfb751ac03f26", "title": "Explaining Explanations: An Overview of Interpretability of Machine Learning"}

My version is "0.3.2".

Difference between fieldsOfStudy and s2FieldsOfStudy?

What is the difference between fieldsOfStudy and s2FieldsOfStudy? I also notice that every paper contains a valid s2FieldsOfStudy, but a few of them have a None fieldsOfStudy. If I need to analyze the field distribution of the papers, is it better to use s2FieldsOfStudy, since it is more complete?

HTTP status 403 Forbidden error

When I'm trying to access paper data, I get this error:

raise PermissionError('HTTP status 403 Forbidden.')

I was able to get some data yesterday, but today it is not working again.

Example code:

from semanticscholar import SemanticScholar
sch = SemanticScholar(timeout=2)
paper = sch.paper("10.2307/2393374")
test = paper['references']
print(test)

Filter issue on Python library (Google Colab)

The issue concerns the semanticscholar library used to interact with the Semantic Scholar API. When the search_paper function is used with a limit argument to cap the number of results returned, the limit does not appear to be applied correctly, and more results than specified are returned.


But when we make a direct request to the Semantic Scholar API using HTTP requests, rather than the Python library, the limit seems to work.

I don't know if I missed something, but if not, maybe a little fix would be great. Thanks for your attention and have a nice day!

get_authors sometimes returns None values in a list of Author objects.

sch.get_authors(['2148555752']) raises a TypeError because the SS API returns None for this author ID, and the module does not seem to account for this. The error is then generated when trying to construct an Author object.


TypeError Traceback (most recent call last)
Cell In [286], line 1
----> 1 sch.get_authors(['2148555752'])

File ~/anaconda3/envs/py39/lib/python3.9/site-packages/semanticscholar/SemanticScholar.py:397, in SemanticScholar.get_authors(self, author_ids, fields)
385 '''Get details for multiple authors at once
386
387 :calls: `POST /author/batch <https://api.semanticscholar.org/api-docs/\
(...)
393 :raises: BadQueryParametersException: if no author was found.
394 '''
396 loop = asyncio.get_event_loop()
--> 397 authors = loop.run_until_complete(
398 self._AsyncSemanticScholar.get_authors(
399 author_ids=author_ids,
400 fields=fields
401 )
402 )
404 return authors

File ~/anaconda3/envs/py39/lib/python3.9/site-packages/nest_asyncio.py:89, in _patch_loop.<locals>.run_until_complete(self, future)
86 if not f.done():
87 raise RuntimeError(
88 'Event loop stopped before Future completed.')
---> 89 return f.result()

File ~/anaconda3/envs/py39/lib/python3.9/asyncio/futures.py:201, in Future.result(self)
199 self.__log_traceback = False
200 if self._exception is not None:
--> 201 raise self._exception
202 return self._result

File ~/anaconda3/envs/py39/lib/python3.9/asyncio/tasks.py:256, in Task.__step(failed resolving arguments)
252 try:
253 if exc is None:
254 # We use the send method directly, because coroutines
255 # don't have __iter__ and __next__ methods.
--> 256 result = coro.send(None)
257 else:
258 result = coro.throw(exc)

File ~/anaconda3/envs/py39/lib/python3.9/site-packages/semanticscholar/AsyncSemanticScholar.py:543, in AsyncSemanticScholar.get_authors(self, author_ids, fields)
539 payload = { "ids": author_ids }
541 data = await self._requester.get_data_async(
542 url, parameters, self.auth_header, payload)
--> 543 authors = [Author(item) for item in data]
545 return authors

File ~/anaconda3/envs/py39/lib/python3.9/site-packages/semanticscholar/AsyncSemanticScholar.py:543, in <listcomp>(.0)
539 payload = { "ids": author_ids }
541 data = await self._requester.get_data_async(
542 url, parameters, self.auth_header, payload)
--> 543 authors = [Author(item) for item in data]
545 return authors

File ~/anaconda3/envs/py39/lib/python3.9/site-packages/semanticscholar/Author.py:57, in Author.__init__(self, data)
55 self._papers = None
56 self._url = None
---> 57 self._init_attributes(data)

File ~/anaconda3/envs/py39/lib/python3.9/site-packages/semanticscholar/Author.py:131, in Author._init_attributes(self, data)
129 def _init_attributes(self, data):
130 self._data = data
--> 131 if 'affiliations' in data:
132 self._affiliations = data['affiliations']
133 if 'authorId' in data:

TypeError: argument of type 'NoneType' is not iterable

Support for asynchronous programming

Hello! I'm building a database that requires a lot of requests. The requests library makes this process very slow. Can you support aiohttp to achieve asynchrony and improve request efficiency?

Add debug option

Implement debugging feature that outputs essential request details such as the request URL, HTTP method, headers and payload.

Retrieve papers from conference

Question

This is a follow-up question to #58, because I am not clear on the query issue.

How could we get the most cited papers for a given conference, say NeurIPS 2023? Could I use this tool to retrieve the top-10 most cited papers?

This would be the URL to query: https://www.semanticscholar.org/venue?name=Neural%20Information%20Processing%20Systems&year%5B0%5D=2023&year%5B1%5D=2023&page=2&sort=influence, but I am not sure if it is possible to retrieve this.

Return None in get_paper and get_papers when data is none instead of failing to construct a Paper

I am using semanticscholar to get a large number of papers while traversing the S2AG iteratively, and sometimes a SemanticScholar.get_paper query results in data being None.

This causes an error here in Paper._init_attributes of course, because that method assumes that data is a dict.

As a temporary workaround, I've written custom get_paper and get_papers methods that change the line return Paper(data) to return Paper(data) if data is not None else None. (See here for an example.) Otherwise I like to use semanticscholar as is. But this gets tedious to maintain in parallel with updates to semanticscholar; for example, I now need to write more complicated functions to keep up with AsyncSemanticScholar.

Would the developers consider returning None, or otherwise not causing SemanticScholar to throw an error that would stop a loop that I might not be monitoring?

I suppose the alternative is to include try/except blocks in my code and manually return None, but that seems uglier. On the other hand, the developers might prefer that approach. Thanks for your consideration.
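
For reference, a caller-side sketch of the try/except alternative mentioned above (the broad except is for illustration; a narrower library exception would be preferable if one applies):

from semanticscholar import SemanticScholar

sch = SemanticScholar()

def get_paper_or_none(paper_id):
    # Return the Paper, or None if the lookup fails for any reason
    try:
        return sch.get_paper(paper_id)
    except Exception:  # ideally a narrower library exception
        return None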

Including year in paper search increases number of results

I noticed some unexpected behavior when including year as a query parameter in a paper search. Specifically, including the year drastically increases the number of results returned. This does not seem right, as it should instead filter out a large number of publications.

Example:

from semanticscholar import SemanticScholar
sch = SemanticScholar()

results = sch.search_paper('Computing Machinery and Intelligence')
print(f'{results.total} results without year parameter.')

results = sch.search_paper('Computing Machinery and Intelligence', year=1950)
print(f'{results.total} results with year parameter.')

The first search returns 510 results, the second more than 250k results.

Any idea why this might be the case?

Source of the references and inconsistency

What is the source of the references returned by sch.get_paper_references?

I notice that a few of them are not consistent with the ground-truth references in the PDF (i.e., missing references, extra references, or references that are not the same ones).

Also, the ordering is not the same as the ground truth, so I cannot map them to the in-text citation index.

Adding some additional filter arguments to get_recommended_papers_from_lists

Would it be possible to add one or two additional filter arguments to the great get_recommended_papers_from_lists function? I may have missed it, but being able to add e.g. a publication date range would be fantastic. The papers returned seem to be the most recent, but being able to filter on this would allow the package to be used as a more general weekly or monthly recommendation system. I saw "pool" by "recent" etc. in other functions, so even something similar to that would be ideal. The only option I can think of is to query frequently, hope that the returned results include new recommendations since the last search, and keep a record of the ones that have been used before.

    def get_recommended_papers_from_lists(
                self,
                positive_paper_ids: List[str],
                negative_paper_ids: List[str] = None,
                fields: list = None,
                limit: int = 100,
                publicationDate_min: datetime = None,  ## for example
                publicationDate_max: datetime = None   ## adding something similar
            ) -> List[Paper]:

Thanks for the great package; I'd be using it every week should this option become available.

Method to search author by name

Hello! Great software. However, it proves difficult to find a list of the author ID strings, e.g.
author = sch.author(2262347)
Is there a method to search by name? Otherwise, what would be a good way to compile a list of the IDs?

PDF of a paper

Papers on Semantic Scholar often include a link to the paper's PDF (example). Is there a way to access this link via the Semantic Scholar API or not? If not, then what's the easiest way to obtain said PDF?

I suppose that if ACL is in externalIds then one can construct the PDF link as f"https://aclanthology.org/{ACL}.pdf", which however is not very elegant.

This is maybe a general question about the API rather than this project specifically, but I'd still hope that someone here knows the answer.
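
One possibility, assuming the Graph API's openAccessPdf field (not populated for every record) is exposed on Paper objects in your version:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
# ID reused from the examples above; any paper ID works here
paper = sch.get_paper('CorpusId:470667', fields=['title', 'openAccessPdf', 'externalIds'])
# openAccessPdf is None when Semantic Scholar knows of no open-access PDF
if paper.openAccessPdf:
    print(paper.openAccessPdf['url'])
elif 'ACL' in (paper.externalIds or {}):
    # fallback from the workaround above: construct the ACL Anthology link
    print(f"https://aclanthology.org/{paper.externalIds['ACL']}.pdf")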

Citations list maxes out at 1000

I'm trying to use semanticscholar to do some cross-referencing of papers and their citations. For this, I'm iterating over and examining every citation. I thought that paper.citations would list all citations, but it seems to list just 1,000 of them:

from semanticscholar import SemanticScholar
sch = SemanticScholar(timeout=10)
paper = sch.get_paper('a6cb366736791bcccc5c8639de5a8f9636bf87e8')
print(f"len(paper.citations) = {len(paper.citations)} but paper.citationCount = {paper.citationCount}")

Is there a way to get all citations?
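
A sketch using the dedicated citations endpoint, which recent versions of the library expose as get_paper_citations and which paginates past the 1,000 citations embedded in the paper object (the method name is an assumption if you are on an older version):

from semanticscholar import SemanticScholar
sch = SemanticScholar(timeout=10)
citations = sch.get_paper_citations('a6cb366736791bcccc5c8639de5a8f9636bf87e8')
count = 0
for item in citations:  # iterating fetches successive pages from the API
    count += 1
print(count)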

Would you please upgrade the pip version?

Daniel,

Would you please upgrade the pip version?

You might add a .travis config just like the following one: https://github.com/wyh/semanticscholar/blob/master/.travis.yml

After adding Travis, it will build and deploy the pip package automatically.

I can't use offset and limit parameters in the get_paper method

I'm using the method https://api.semanticscholar.org/api-docs/graph#tag/Paper-Data/operation/get_graph_get_paper_citations

Here's my code snippet:

  references = sch.get_paper(str(paper['paperId']) + "/references", fields=['contexts','paperId' ,'externalIds','intents','isInfluential','abstract','publicationDate ','title','venue','authors','year'])

It works very well; however, the API limits the results to 1,000 citations, and in my dataset I have articles with more than 4,000 references. For this reason, I tried to use offset and limit, as suggested in the referenced documentation. However, I cannot include offset within fields, as it returns an error, and it was also not possible to include either the offset or the limit parameter directly in the URL.

It would be very useful for the library to be able to use these parameters.
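
A sketch of the same idea with the dedicated references endpoint, which recent versions of the library expose as get_paper_references and which handles offset/limit pagination for you (again, check that your version has it):

from semanticscholar import SemanticScholar
sch = SemanticScholar()
references = sch.get_paper_references(
    '10.2139/ssrn.2250500',  # ID reused from the examples above
    fields=['contexts', 'intents', 'isInfluential'],
    limit=100,  # page size; iterating fetches further pages
)
for ref in references:
    print(ref)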

Unexpected, frequent timeout

The timeout happens frequently. My current workaround is to retry up to three times (a minimal sketch of that retry loop follows the traceback below).

Is this expected? Could you help me a bit?

Traceback (most recent call last):
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/anyio/streams/tls.py", line 133, in _call_sslobject_method
    result = func(*args)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/ssl.py", line 889, in read
    v = self._sslobj.read(len)
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:2633)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/anyio/_core/_tasks.py", line 115, in fail_after
    yield cancel_scope
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpcore/_backends/anyio.py", line 34, in read
    return await self._stream.receive(max_bytes=max_bytes)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/anyio/streams/tls.py", line 198, in receive
    data = await self._call_sslobject_method(self._ssl_object.read, max_bytes)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/anyio/streams/tls.py", line 140, in _call_sslobject_method
    data = await self.transport_stream.receive()
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 1095, in receive
    await self._protocol.read_event.wait()
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/asyncio/locks.py", line 226, in wait
    await fut
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/asyncio/futures.py", line 284, in __await__
    yield self  # This tells Task to wait for completion.
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup
    future.result()
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/asyncio/futures.py", line 196, in result
    raise exc
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f709761e580

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpcore/_backends/anyio.py", line 36, in read
    return b""
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/anyio/_core/_tasks.py", line 118, in fail_after
    raise TimeoutError
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpx/_transports/default.py", line 66, in map_httpcore_exceptions
    yield
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpx/_transports/default.py", line 366, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpcore/_async/connection_pool.py", line 268, in handle_async_request
    raise exc
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpcore/_async/connection_pool.py", line 251, in handle_async_request
    response = await connection.handle_async_request(request)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
    return await self._connection.handle_async_request(request)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpcore/_async/http11.py", line 133, in handle_async_request
    raise exc
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpcore/_async/http11.py", line 111, in handle_async_request
    ) = await self._receive_response_headers(**kwargs)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpcore/_async/http11.py", line 176, in _receive_response_headers
    event = await self._receive_event(timeout=timeout)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpcore/_async/http11.py", line 212, in _receive_event
    data = await self._network_stream.read(
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpcore/_backends/anyio.py", line 36, in read
    return b""
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/nlpgridio3/data/liyifei/survey-generation-dev/src/generate_refs.py", line 54, in <module>
    generate_refs('test')
  File "/mnt/nlpgridio3/data/liyifei/survey-generation-dev/src/generate_refs.py", line 26, in generate_refs
    results = sch.search_paper(ref_title)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/semanticscholar/SemanticScholar.py", line 266, in search_paper
    results = loop.run_until_complete(
  File "/home1/l/liyifei/.local/lib/python3.9/site-packages/nest_asyncio.py", line 90, in run_until_complete
    return f.result()
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/semanticscholar/AsyncSemanticScholar.py", line 342, in search_paper
    results = await PaginatedResults.create(
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/semanticscholar/PaginatedResults.py", line 56, in create
    await obj._async_get_next_page()
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/semanticscholar/PaginatedResults.py", line 121, in _async_get_next_page
    results = await self._request_data()
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/semanticscholar/PaginatedResults.py", line 112, in _request_data
    return await self._requester.get_data_async(
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/tenacity/_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/tenacity/_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/semanticscholar/ApiRequester.py", line 62, in get_data_async
    r = await client.request(
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpx/_client.py", line 1530, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpx/_client.py", line 1617, in send
    response = await self._send_handling_auth(
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpx/_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpx/_transports/default.py", line 366, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/nlp/data/liyifei/.conda/envs/lsg/lib/python3.9/site-packages/httpx/_transports/default.py", line 83, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout
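
A minimal sketch of the retry workaround mentioned at the top of this issue, assuming the failure surfaces as httpx.ReadTimeout as in the traceback above:

import httpx
from semanticscholar import SemanticScholar

sch = SemanticScholar()

def search_with_retries(query, attempts=3):
    # Retry the search up to `attempts` times on read timeouts
    for attempt in range(attempts):
        try:
            return sch.search_paper(query)
        except httpx.ReadTimeout:
            if attempt == attempts - 1:
                raise  # give up after the last attempt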

key error in pagination

The unit test failed, so I am catching the exception now. IMHO, there should not be an exception.

    def testSearchPaper(self):
        """
        test paper search
        """
        titles=["M. Agosti, C. Thanos (Eds). Post-proceedings of the First Italian Research Conference on Digital Library Management Systems (IRCDL 2005), Padova, 28th January, 2005. September 2005."]
        for title in titles:
            try:
                results=self.semscholar.sch.search_paper(title)
                self.showResults(results)
            except Exception as ex:
                print(f"exception on testSearchPaper: {str(ex)}")
                pass

Extract Bibtex

Using get_paper() we can access a paper's metadata, but none of it appears to include a preformatted version of its BibTeX citation. Is this something that is available, or can it be reliably constructed with automatic methods?
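
One possibility, assuming the Graph API's citationStyles field (which carries a preformatted BibTeX entry) is available in your version:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
paper = sch.get_paper('10.1093/mind/lix.236.433', fields=['citationStyles'])
# citationStyles is a dict; the 'bibtex' key holds a ready-made entry
print(paper.citationStyles['bibtex'])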

Use.

Hi Daniel. Can you explain how I can use this script? With the info posted, I get errors. I'm really new at Python, sorry to bother you.

Thank you in advance.

Jonathan

How to get list of outputs without concerning several invalid inputs

When we process a bunch of documents, we first collect the list of corpus IDs and pass them all at once to sch.get_papers(list_of_corpusid) to get a list of outputs and save time.

However, when one or several of the inputs are invalid corpus IDs, the whole call fails. This is painful when processing a large corpus, because we cannot guarantee that all corpus IDs are valid, and the batch processing is always interrupted.
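
A caller-side sketch that degrades gracefully under the current behavior: try the batch call, and on failure fall back to per-ID lookups so a bad ID only loses itself (broad excepts for illustration; narrower library exceptions would be preferable):

from semanticscholar import SemanticScholar

sch = SemanticScholar()

def get_papers_tolerant(ids):
    # Batch lookup; on failure, retry IDs one by one, skipping bad ones
    try:
        return sch.get_papers(ids)
    except Exception:  # ideally a narrower library exception
        papers = []
        for paper_id in ids:
            try:
                papers.append(sch.get_paper(paper_id))
            except Exception:
                pass  # skip invalid IDs
        return papers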

Can we wrap the 500 or 504 Exception to a more specific Exception

Proposed change description

Sometimes, when we try to get data from the Semantic Scholar server, the HTTP response status code may be 500 or 504, and its JSON message is "Endpoint request timeout".

Then the semanticscholar.ApiRequester raises a plain, generic Exception.

For some reason, we may want to ignore this error and retry calling that endpoint again. Therefore, if this exception were more specific, it would be much more helpful to us.

What about creating a new semanticscholar.ServerInternalException and raising ServerInternalException(data['message']) instead when we get a 500 or 504 response? Then we can catch the ServerInternalException like this:

client = AsyncSemanticScholar()
try:
    papers = await client.get_papers(['some-id'])
except ServerInternalException:
    papers = await client.get_papers(['some-id'])  # call it again

Thanks.

KeyError: 'data'

https://www.semanticscholar.org/paper/Detection-of-Repackaged-Android-Malware-with-Tian-Yao/c4685a31bcd83dd2c3c7ec8e741058c43203c61e

>>> from semanticscholar import SemanticScholar
>>> sch = SemanticScholar()
>>> title = 'Detection of Repackaged Android Malware with Code-Heterogeneity Features.'
>>> search_query = sch.search_paper(title)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/lib/python3.10/site-packages/semanticscholar/SemanticScholar.py", line 353, in search_paper
    results = PaginatedResults(
  File "/opt/homebrew/lib/python3.10/site-packages/semanticscholar/PaginatedResults.py", line 38, in __init__
    self.__get_next_page()
  File "/opt/homebrew/lib/python3.10/site-packages/semanticscholar/PaginatedResults.py", line 101, in __get_next_page
    self._data = results['data']
KeyError: 'data'

A paper with DOI, title, etc., is not showing

Thanks a lot for the package!
For example, the paper with the DOI 10.2147/OAJSM.S133406 has all the details, but when I try to retrieve the title or the authors' names, I get nothing:
paper = sch.paper('10.2147/OAJSM.S133406')
Thanks, Igor
