
pybliometrics-dev / pybliometrics


Python-based API-Wrapper to access Scopus

Home Page: https://pybliometrics.readthedocs.io/en/stable/

License: Other

Python 100.00%
api-wrapper bibliometrics scopus scopus-api

pybliometrics's People

Contributors

aieri, ale-de-vries, astrochun, bz-dev, crew102, fabiosangregorio, fredson, gilad1bl, herreio, jakobhoffmann, james-geiger, jdagdelen, jkitchin, jlodoesit, jpcaveiro, juanfcocontreras, katrinleinweber, kolanich, liuq, loganamcnichols, mbrcic, michael-e-rose, mindlessjoker, mkyl, mmagaldi77, mrshepardd, nils-herrmann, nurzhsapargali, raffaem, silask


pybliometrics's Issues

EOFError: EOF when reading a line

In this file: scopus/utils/create_config.py

if py3:
    key = input(prompt_key)

I just created the config file with my API key, but I get this error: EOFError: EOF when reading a line

Cache for ScopusAbstract does not take view into account

ScopusAbstract does not take the view parameter into account when fetching items from the cache. This is problematic when requesting the same abstract multiple times but with different views. For example (starting from an empty cache):

a = ScopusAbstract('2-s2.0-84937398919', view='META') # Correct: fetches META from the API.
b = ScopusAbstract('2-s2.0-84937398919', view='FULL') # Bug: returns the cached META result instead of fetching FULL.

I noticed this when I switched from META to META_ABS in my script and stopped getting the abstracts. Clearing the cache resolved the issue.
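A minimal sketch of one possible fix, assuming a file-per-item cache keyed by EID (the helper name and layout are illustrative, not the package's actual code): include the view in the cache key so that different views of the same document are stored separately.

import os


def cache_file(cache_dir, eid, view):
    # Hypothetical helper: include the view in the cache filename so that
    # e.g. META and FULL results for the same EID do not collide.
    return os.path.join(cache_dir, '{}_{}'.format(eid, view))


print(cache_file('/tmp/scopus_abstract', '2-s2.0-84937398919', 'FULL'))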

get_coauthors() throwing UnicodeEncodeError

When trying to get coauthors from an author, I receive the following exception:

line 232, in get_coauthors
    coauthor_name = '{0} {1}'.format(given_name, surname)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 2: ordinal not in range(128)

(I couldn't wait so I fixed this locally on my machine by simply encoding the given name and surname as UTF-8 at line 232)
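For reference, the local workaround described above amounts to something like the following Python 2 sketch (the variable names come from the traceback, the values are made up):

# -*- coding: utf-8 -*-
# Keep the name as a unicode string (or encode the parts as UTF-8) instead
# of letting Python 2 implicitly encode it with the ASCII codec.
given_name, surname = u'J\xf6rg', u'Schl\xf6gl'
coauthor_name = u'{0} {1}'.format(given_name, surname)
print(coauthor_name.encode('utf-8'))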

UnicodeEncodeError on examples

I was trying to replicate the examples given in the README.org using Python 2.7.

from scopus.scopus_api import ScopusAbstract

ab = ScopusAbstract("2-s2.0-84930616647", refresh=True)

results in


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/scopus/scopus_api.py", line 133, in __init__
    results = ET.fromstring(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1301, in XML
    return parser.close()
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1654, in close
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

What's happening here? Is this a bug?

https and instToken

Hey,

Every time I update the scopus module, I run into problems.
Elsevier gave me an API key and an InstToken to access the Scopus API.
Elsevier told me to use HTTPS, not HTTP.

Every time, I need to check the code to see what has changed...

And now, with the new update (Citation Overview), I am trying to find out what is wrong...

Can you add an option for the InstToken (it's easy) and, more importantly, use the HTTPS URL?

Thanks a lot for your help

Aymeric
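Until this lands in the package, a minimal sketch of calling the API directly over HTTPS with both headers (X-ELS-APIKey and X-ELS-Insttoken are Elsevier's standard authentication headers; the key, token and document are placeholders):

import requests

MY_API_KEY = "your-api-key"        # placeholder
MY_INST_TOKEN = "your-inst-token"  # placeholder

headers = {
    "X-ELS-APIKey": MY_API_KEY,
    "X-ELS-Insttoken": MY_INST_TOKEN,
    "Accept": "application/json",
}
# Note the https:// scheme instead of http://
url = "https://api.elsevier.com/content/abstract/eid/2-s2.0-84930616647"
resp = requests.get(url, headers=headers)
resp.raise_for_status()
print(sorted(resp.json().keys()))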

Add ability to query Scopus without downloading records

I think it'd be useful for Search classes to have the ability to just query the Scopus APIs without actually downloading all the data. The user may want to know how many records match their query before actually committing to downloading those records. The way I'd implement this would be to have a download parameter added to Search classes which is set to True by default. When the user just wants to find out how many records match their query, they'd set download=False.

Related to this, I'd suggest adding a property for the number of matching records. Adding this property would allow the user to, for example, programmatically decide whether to download the records based on the size of the result set.

Happy to put together a PR for this.
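A runnable toy sketch of the proposed behaviour (the class and method names are illustrative, and the Scopus calls are stubbed out):

class SearchSketch(object):
    """Toy sketch of the proposed download=False behaviour."""

    def __init__(self, query, download=True):
        self.query = query
        # One cheap request would fetch only the number of matches.
        self._n_results = self._count_matches(query)
        # Records are only downloaded when the user asks for them.
        self.results = self._fetch_records(query) if download else None

    @property
    def n_results(self):
        """Number of records matching the query, available without downloading."""
        return self._n_results

    def _count_matches(self, query):
        return 42                                  # stub for the count request

    def _fetch_records(self, query):
        return ["record"] * self._n_results        # stub for the full download


s = SearchSketch('AU-ID(12345678900)', download=False)
print(s.n_results)   # decide whether the result set is worth downloading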

ISSN and E-ISSN for abstracts

Dear all,

we are dealing with a massive dataset and need ISSNs for further matching. For journals that have both an ISSN and an E-ISSN, the request ScopusAbstract('').issn returns two values, e.g.:
'09670750 14680351'

I just want to make sure: in this pair, is the ISSN always the first one?

Thank you!

Unicode errors on RIS export, suggest to use unicode templates

If a ScopusAbstract contains unicode, the .ris property (and probably others like .bibtex) fails. This can be fixed by using a unicode template, e.g.:

template = u'''TY  - JOUR

with the u prefix added for unicode. The same change may be needed for the other templates, e.g. bibtex.

If you want a test, just try:

from scopus import ScopusAbstract
print(ScopusAbstract("2-s2.0-84896986100").ris)

The difference between request results and the information on the Scopus page

Hello!
I've noticed the following issue and so far have not found it addressed. There is a divergence between the results of a request and the actual information on the author's page.

E.g. I have the following output

au = ScopusAuthor("7004130277", refresh = True)
print (au.name)
print (au._current_affiliation)
print (au.ndocuments)
print (au.ncitations)
print (au.ncited_by)

Annamária R. Várkonyi -Kóczy
Selye Janos University, Department of Mathematics and Computer Science
164
494
364

whereas according to the author's page https://www.scopus.com/authid/detail.uri?authorId=7004130277
it should be
Annamária R. Várkonyi -Kóczy
Selye Janos University, Department of Mathematics and Computer Science
169
608
435

I suspected that it counts only first-author papers, but that did not hold. Could you please help me, either by explaining why this happens or by pointing out what I am missing?

Thanks! Tanya

ScopusAbstract references

Hi Michael,

I am trying to use ScopusAbstract to get references (environment: Python 3, macOS). The code looks like the following:

from scopus import ScopusAbstract
ab = ScopusAbstract("2-s2.0-84930616647")
print(ab.refcount)
print(ab.references)

It returns "None". Other attributes work well, like ab. volume, ab. authors. Could you help me with this?

Thanks a lot,
Chao

switching API key

Hi,

I am hoping to switch to another API key because each key comes with a quota. However, even after uninstalling (or deleting) the package, the same old API key was automatically applied after reinstall. Do you know how to switch to another key?

Thank you!
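The key is not stored in the installed package itself but in the configuration file written by scopus/utils/create_config.py (referenced in the first issue above), which is why reinstalling does not reset it. A hedged sketch of editing that file with configparser; the path and the [Authentication]/APIKey names below are assumptions that may differ between versions, so check your own config file:

import configparser
import os

# Adjust to wherever your installation writes its configuration file.
CONFIG_FILE = os.path.expanduser("~/.scopus/config.ini")   # assumed location

config = configparser.ConfigParser()
config.read(CONFIG_FILE)
config["Authentication"]["APIKey"] = "my-new-api-key"       # assumed section/option names
with open(CONFIG_FILE, "w") as f:
    config.write(f)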

AttributeError: 'NoneType' object has no attribute 'find'

Hi, I am using Mac OS X 10 and Python 2.7. I have created my API key in the required file. However, I always get this error:

from scopus.scopus_api import ScopusAbstract
ab = ScopusAbstract("2-s2.0-84930616647", refresh=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/scopus/scopus_api.py", line 244, in __init__
    sl = coredata.find('dtd:link[@rel="scopus"]', ns).get('href')
AttributeError: 'NoneType' object has no attribute 'find'

Look forward to your reply. Thanks a lot.

Refactor ScopusAuthor functions

The functions .get_document_eids() and .n_journal_articles() and maybe .n_yearly_publications() are in fact all the same: Under the hood they search for published articles. They can be refactored into one more general function that searches a set of publications (default: articles) for a given (range of) years.

For example, the user could request the EIDs for an author, which by default remains as it is now: all publications, all years. One parameter would steer which document types the returned EIDs cover (as a list), another would restrict the search to a given (list of) years. We do not need a function to return the number of journal articles; the user would simply call len() on the EIDs they receive.
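A sketch of what such a unified function could look like, reduced here to building the search query (parameter names are illustrative; DOCTYPE and PUBYEAR are Scopus advanced-search fields, and the real function would run the search and return the EIDs):

def get_document_eids(author_id, doc_types=("ar",), years=None):
    """Illustrative only: one query instead of three helper functions."""
    query = "AU-ID({})".format(author_id)
    if doc_types:
        query += " AND (" + " OR ".join(
            "DOCTYPE({})".format(t) for t in doc_types) + ")"
    if years:
        years = [years] if isinstance(years, int) else years
        query += " AND (" + " OR ".join(
            "PUBYEAR IS {}".format(y) for y in years) + ")"
    return query


print(get_document_eids(7004212771, years=[2015, 2016]))
# In the real version, len() of the returned EID list gives the article count.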

Add a progress bar to download function?

It would be nice if download() could provide an indication of its progress, especially for large result sets that may take a while to download. I don't know if this can be done cleanly without relying on an external lib like https://github.com/tqdm/tqdm, but either way I think it's worth considering.
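For illustration, a minimal sketch of wrapping a pagination loop with tqdm while keeping it an optional dependency (the loop body and page size are placeholders):

try:
    from tqdm import tqdm
except ImportError:                    # keep tqdm optional
    def tqdm(iterable, **kwargs):
        return iterable

n_results = 1000                       # would come from the first API response
per_page = 200
for start in tqdm(range(0, n_results, per_page), desc="Downloading", unit="page"):
    pass                               # here: request the page starting at `start`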

version 1.x should be able to deal with missing fields

It seems that the new module version has some issues when some information is missing. This was not the case with the 0.x releases.
I'll give a few examples.

Author 55663915100 is recorded in Scopus as having no first name. When he is picked up as a coauthor, I get a KeyError:

In [129]: AuthorRetrieval(7006053052).get_coauthors()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-129-c797e4eae889> in <module>()
----> 1 AuthorRetrieval(7006053052).get_coauthors()

~/scopus/lib/python3.6/site-packages/scopus/author_retrieval.py in get_coauthors(self)
    270                     areas = [entry['subject-area']['$']]
    271                 new = coauth(surname=entry['preferred-name']['surname'],
--> 272                              given_name=entry['preferred-name']['given-name'],
    273                              id=entry['dc:identifier'].split(':')[-1],
    274                              areas='; '.join(areas),

KeyError: 'given-name'

Similarly, author 25823668100 has no subject-area:

In [130]: AuthorRetrieval(7007019154).get_coauthors()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-130-149d28fd7cdb> in <module>()
----> 1 AuthorRetrieval(7007019154).get_coauthors()

~/scopus/lib/python3.6/site-packages/scopus/author_retrieval.py in get_coauthors(self)
    266                 aff = entry.get('affiliation-current', {})
    267                 try:
--> 268                     areas = [a['$'] for a in entry['subject-area']]
    269                 except TypeError:  # Only one subject area given
    270                     areas = [entry['subject-area']['$']]

KeyError: 'subject-area'

In [134]: AuthorRetrieval(25823668100).subject_areas
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-134-c19d35ceb413> in <module>()
----> 1 AuthorRetrieval(25823668100).subject_areas

~/scopus/lib/python3.6/site-packages/scopus/author_retrieval.py in subject_areas(self)
    180         areas = []
    181         area = namedtuple('Subjectarea', 'area abbreviation code')
--> 182         for item in self._json['subject-areas'].get('subject-area', []):
    183             new = area(area=item['$'], code=item['@code'],
    184                        abbreviation=item['@abbrev'])

AttributeError: 'NoneType' object has no attribute 'get'

Perhaps entries should be "padded" with None for any field that isn't returned by Scopus.
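A sketch of that padding idea with dict.get defaults, applied to the two failing spots above (field names are taken from the tracebacks; this is not the package's actual parsing code):

def parse_coauthor(entry):
    """Tolerate missing 'given-name' and 'subject-area' fields."""
    preferred = entry.get('preferred-name', {})
    subjects = entry.get('subject-area') or []        # missing/None -> empty list
    if isinstance(subjects, dict):                    # only one subject area given
        subjects = [subjects]
    return {
        'surname': preferred.get('surname'),
        'given_name': preferred.get('given-name'),    # None if Scopus omits it
        'areas': '; '.join(s.get('$', '') for s in subjects),
    }


print(parse_coauthor({'preferred-name': {'surname': 'Doe'}}))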

401 Client Error

Got this error
401 Client Error: Unauthorized for url: http://api.elsevier.com/content/author/author_id/7004212771?author_id=7004212771&view=ENHANCED

Here's my code:

import scopus
scopus.ScopusAuthor(7004212771)

I have modified my_scopus.py to add MY_API_KEY.
I have an InstToken as well; not sure how to include that. Thanks!

Discrepancy between ScopusAuthor and ScopusAbstract

There is a discrepancy in the data retrieved between these lines:

from scopus import ScopusAuthor, ScopusSearch, ScopusAbstract
author= "24340839100"
auDetails = ScopusAuthor(author, refresh=True)
print(auDetails.ndocuments)
# n=43

and these

s = ScopusSearch('AU-ID(' + str(author) + ')', refresh=True)
print(len(s.EIDS))
# n= 45

Why? Is this an issue with the API itself or with the function?
I have noticed that not all authors are affected by this.
The second number is the one displayed on the official Scopus website.

Update 1.1 query issue

After updating to 1.1, all search queries result in the following error:

KeyError: 'Directories' followed by

NoSectionError: No section: 'Directories'
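One possible workaround until this is fixed: add the missing section to the configuration file by hand or with configparser. The path below is an assumption, and the directory options you need to add depend on your installation:

import configparser
import os

CONFIG_FILE = os.path.expanduser("~/.scopus/config.ini")   # assumed location

config = configparser.ConfigParser()
config.read(CONFIG_FILE)
if not config.has_section("Directories"):
    config.add_section("Directories")   # then populate it with your cache paths
    with open(CONFIG_FILE, "w") as f:
        config.write(f)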

Why no references in ScopusAbstract() class?

The ScopusAbstract() class from scopus_api.py explicitly downloads the "META_ABS" view. This does not include the references of the article. Is there a reason why they should be excluded? If so, what is the next best way to obtain the references of an article?

The general view (i.e. removing '?view=META_ABS') would include the references and still provide the same information currently used.

AbstractRetrieval with full view returns no references although the API REF view does

Hi,
I am using the AbstractRetrieval to get the references of a document, using this code:

AbstractRetrieval(paper_id, view="FULL").references

but it returns None.
As stated in the documentation:

Note: Requires the FULL view of the article. Might be empty even if refcount is positive.

Using Scopus API with a simple get request and the REF view, like this:

https://api.elsevier.com/content/abstract/EID:2-s2.0-84951753303?apiKey=MY_API_KEY&view=REF&httpAccept=application/json

it returns an array of 40 references (even though the value of the @total-references field is 48).

Using Scopus API with the FULL view, I get an array of 48 references.

I would expect to get the same results as the FULL view.

Error from AbstractRetrieval references

When getting the AbstractRetrieval for 2-s2.0-85028692635 and looking at the references, I get a TypeError. I think it is enough to add TypeError to the except clause at line 20 of utils\parse_content.py.

s = scopus.AbstractRetrieval('2-s2.0-85028692635', view='FULL', refresh=True)

s.references
Traceback (most recent call last):

  File "<ipython-input-15-5705d616f366>", line 1, in <module>
    s.references

  File "C:\anaconda3\lib\site-packages\scopus\abstract_retrieval.py", line 383, in references
    items = listify(chained_get(self._json, path, []))

  File "C:\anaconda3\lib\site-packages\scopus\utils\parse_content.py", line 19, in chained_get
    container = container[key]

TypeError: 'NoneType' object is not subscriptable
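A sketch of the suggested change, reconstructed from the traceback above (the real chained_get in utils/parse_content.py may differ in details):

def chained_get(container, path, default=None):
    """Walk nested dicts along `path`, returning `default` on any miss."""
    for key in path:
        try:
            container = container[key]
        except (KeyError, TypeError):   # TypeError: an intermediate value is None
            return default
    return container


print(chained_get({'a': None}, ['a', 'b'], default=[]))   # [] instead of an exception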

issue with creating qfile for complex scopus queries

When you have complex (and long) Scopus queries, the qfile creation mechanism fails, for example for the search query "test" (with double quotes). Rather than just replacing slashes in the query (query.replace('/', 'slash') in scopus_search.py), how about implementing a solution like the one mentioned here (see the accepted answer): http://stackoverflow.com/questions/7406102/create-sane-safe-filename-from-any-unsafe-string
Then again, when the query is very long, this will still generate a very long file name. Therefore it may be better to hash the query string and use the hash to create the file name.
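A sketch of the hashing idea (md5 is used here only to derive a short, filesystem-safe name, not for security; the directory is a placeholder):

import hashlib
import os


def cache_path(cache_dir, query):
    # Arbitrarily long or quote-laden queries all map to a fixed-length,
    # filesystem-safe filename.
    digest = hashlib.md5(query.encode('utf-8')).hexdigest()
    return os.path.join(cache_dir, digest)


print(cache_path('/tmp/scopus_search', 'TITLE-ABS-KEY("test") AND PUBYEAR > 2010'))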

count and max_entries parameters need to be passable to __init__ method of all classes that inherit from Search

Right now there are some subclasses of Search that don't have a count or max_entries parameter. For example, ScopusSearch automatically sets max_entries to 5,000 and count to either 25 or 200, depending on the value of view (see the code snippet below). This is problematic because the 5,000-document limit no longer applies to Scopus subscribers, but subscribers are nevertheless forced to use 5,000 as max_entries because it's hard-coded in the constructor. Meanwhile, non-subscribers can't use ScopusSearch because they only have access to the "Standard" view, which means that count is set to 200 in the __init__() method (resulting in an error, because their access level requires that count be <= 25 - https://dev.elsevier.com/api_key_settings.html).

https://github.com/scopus-api/scopus/blob/d8791c162cef4c2f830d983b435333d9d8eaf472/scopus/scopus_search.py#L143-L148
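A sketch of a constructor that keeps today's defaults but lets users override them (the class is illustrative, not the package's actual ScopusSearch; the view/count mapping follows the description above):

class ScopusSearchSketch(object):
    """Illustrative constructor only."""

    def __init__(self, query, view="COMPLETE", count=None, max_entries=None):
        if count is None:
            count = 200 if view == "STANDARD" else 25   # today's behaviour
        if max_entries is None:
            max_entries = 5000                          # today's hard-coded limit
        self.query, self.view = query, view
        self.count, self.max_entries = count, max_entries


# A non-subscriber could then stay within their access level:
s = ScopusSearchSketch("AU-ID(7004212771)", view="STANDARD", count=25)
print(s.count, s.max_entries)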

Abstract description is not being extracted

Even if an abstract is present in the original Scopus .json file, the abstract.description call does not work and returns None, even though a manual API call does contain information in the dc:description part.
Since the abstract class parses the description (self.description = get_encoded_text(coredata, 'dc:description')), there must be something wrong with the parser.

I am just using a random article as an example:

from scopus.scopus_api import ScopusAbstract

ab = ScopusAbstract('2-s2.0-84871396493', refresh=True)
print(ab.description)
print(ab.doi)

Link to abstract:
https://bmcinfectdis.biomedcentral.com/articles/10.1186/1471-2334-12-366

function get_encoded_text(container, xpath) seems to be dysfunctional

Dear Prof. Kitchin,
Thanks for your Git.
I used the example you posted:
from scopus.scopus_author import ScopusAuthor
au = ScopusAuthor(7004212771)
print([a.name for a in au.get_coauthors()])

It worked for me, then it suddenly broke with the following error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

It looks like the function get_encoded_text cannot parse the correct number of elements.
I hope you have noted this.
Thanks

Author Search counts doubling?

I'm writing a Python script to get info about authors' countries. I think the API might have a strange behaviour: each time an AuthorSearch is made, e.g. s1 = AuthorSearch('AU-ID(' + str(au_id) + ')', refresh=False), it counts as 2 against the 5k limit that Scopus imposes on this type of query.

Has anyone else seen this behaviour?

raw_input error on python3

Hi,

I installed the package with pip under Python 3, but it throws an error on raw_input. I suspect that is because I didn't have MY_API_KEY specified, but on Python 3 the call should be input, not raw_input.
Thanks,
Luke

scopus.utils not found

This will probably be a really newbie question, and for that I apologize. I am just starting with Python on Windows and hitting a lot of speed bumps. :) I am running Anaconda 3 and already have my environment set up to use the scopus libraries. When I change directory to where scopus_api.py is and try to run it, I get an error:

(C:\Users\schultz\Anaconda3\envs\scopus) C:\Users\schultz\scopus>scopus_api.py
Traceback (most recent call last):
  File "C:\Users\schultz\scopus\scopus_api.py", line 5, in <module>
    from scopus.utils import get_content, get_encoded_text, ns
ImportError: No module named scopus.utils

Elsevier Announcement

Hi,

On the developer website, Elsevier announces the end of the "Object Search API":
"Please note: Object Search API will no longer be available after October 2017."
Is it the same as "Scopus Search"?
Do you know what's next?

We have developed a new methodology to identify collaboration clusters in our university. But we need access to Scopus data for authors and documents.

Thanks a lot

Aymeric

get_encoded_text fails in ScopusJournal

Hi,
Running the example about ScopusJournal:
"""
from scopus.scopus_api import ScopusAbstract, ScopusJournal
ab = ScopusAbstract("2-s2.0-84930616647")
print(ScopusJournal(ab.issn))
"""

I get the following error:
Traceback (most recent call last):
  File "E:\Jaime\PAPELEO\Acreditacion\temp aneca CU\produccion 2017-06-19\prueba_journal.py", line 10, in <module>
    print(ScopusJournal(ab.issn))
  File "C:\Anaconda3\lib\site-packages\scopus\scopus_api.py", line 602, in __init__
    self.publisher = get_encoded_text(self.xml, 'entry/dc:publisher')
  File "C:\Anaconda3\lib\site-packages\scopus\utils\get_encoded_text.py", line 30, in get_encoded_text
    return container.find(xpath, ns).text
TypeError: slice indices must be integers or None or have an index method

The xml seems to be OK.

The strange thing is that get_encoded_text works fine in ScopusAbstract.

UnicodeEncodeError with scopus_affiliation.py

There still seems to be a unicode issue in Python 2.7 with certain authors.

This time it's with https://www.scopus.com/authid/detail.uri?authorId=6602884607

>>> from scopus.scopus_author import ScopusAuthor
>>> au = ScopusAuthor("6602884607", refresh=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/scopus/scopus_author.py", line 178, in __init__
    if el is not None]]
  File "/usr/local/lib/python2.7/dist-packages/scopus/scopus_affiliation.py", line 81, in __init__
    f.write(resp.text)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 1299: ordinal not in range(128)
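The usual Python 2 fix for this kind of failure is to write the cached response with an explicit encoding; a minimal sketch (the filename is a placeholder, and io.open behaves the same on Python 2 and 3):

import io

resp_text = u'Universit\xe4t'   # stands in for resp.text from requests

# An explicit encoding avoids the implicit ASCII encode on Python 2.
with io.open('cached_affiliation.xml', 'w', encoding='utf-8') as f:
    f.write(resp_text)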

egg_info error on installing scopus

Hi guys,

I got this error when trying to install scopus with pip (pip install scopus). There are more error lines, but these are the last ones; I am not sure if it's worth posting everything.

....
 File "/Users/USERNAME/anaconda/lib/python3.5/locale.py", line 486, in _parse_localename
        raise ValueError('unknown locale: %s' % localename)
    ValueError: unknown locale: UTF-8
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/__/rw3vn8f92zb6mfxw07hzvcnm015yc9/T/pip-build-j1jx4iyv/scopus/ 

I'm using Python 3.6 with Anaconda. What could be a possible solution to this?

400 Client Error with ScopusSearch

I am getting the following error when I try to reproduce your query sample. It seems like it's trying to access an incorrect Scopus URL.

s = ScopusSearch('FIRSTAUTH ( kitchin  j.r. )', refresh=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/scopus/scopus_search.py", line 91, in __init__
    resp = download(url=url, params=params, accept="json")
  File "/Library/Python/2.7/site-packages/scopus/utils/get_content.py", line 41, in download
    resp.raise_for_status()  # Raise status code if necessary
  File "/Library/Python/2.7/site-packages/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://api.elsevier.com/content/search/scopus?count=200&query=FIRSTAUTH+%28+kitchin++j.r.+%29&start=0&fields=eid

UnicodeDecodeError when installing

Fresh install of Anaconda and git command line.

Sorry if this is a simple problem; I am just getting started on a couple of different levels, so most of the messages below do not mean much to me.

(C:\Users\sac\AppData\Local\Continuum\Anaconda3\envs\scopus) C:\Users\sac>pip install git+git://github.com/scopus-api/scopus
Collecting git+git://github.com/scopus-api/scopus
Cloning git://github.com/scopus-api/scopus to c:\users\sac\appdata\local\temp\1\pip-qwjphy1x-build

Complete output from command python setup.py egg_info:

Installed c:\users\sac\appdata\local\temp\1\pip-qwjphy1x-build\.eggs\pbr-3.1.1-py3.6.egg
[pbr] Generating ChangeLog
ERROR:root:Error parsing
Traceback (most recent call last):
  File "c:\users\sac\appdata\local\temp\1\pip-qwjphy1x-build\.eggs\pbr-3.1.1-py3.6.egg\pbr\core.py", line 111, in pbr
    attrs = util.cfg_to_args(path, dist.script_args)
  File "c:\users\sac\appdata\local\temp\1\pip-qwjphy1x-build\.eggs\pbr-3.1.1-py3.6.egg\pbr\util.py", line 251, in cfg_to_args
    kwargs = setup_cfg_to_setup_kwargs(config, script_args)
  File "c:\users\sac\appdata\local\temp\1\pip-qwjphy1x-build\.eggs\pbr-3.1.1-py3.6.egg\pbr\util.py", line 315, in setup_cfg_to_setup_kwa
    value += description_file.read().strip() + '\n\n'
  File "C:\Users\sac\AppData\Local\Continuum\Anaconda3\envs\scopus\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 23899: character maps to <undefined>
error in setup command: Error parsing C:\Users\sac\AppData\Local\Temp\1\pip-qwjphy1x-build\setup.cfg: UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 23899: character maps to <undefined>

----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\sac\AppData\Local\Temp\1\pip-qwjphy1x-build\`

Redesign API to use Pandas

It would be really interesting if some of the API could return pandas DataFrames.

Moreover, I was not able to understand why in some cases the h-index shown on the Scopus page of an author is greater than the value that I get using the API.
I was also not able to get the total number of papers. (Do I have to call author_impact_factor(..) with a specific year? It seems to start from 2014.)
How do I get the total number of papers of an author?
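On the DataFrame idea, a minimal sketch of turning a list of result records into pandas, assuming the search classes expose their results as dicts or namedtuples (the field names below are made up):

import pandas as pd

# `results` stands in for whatever list of records a search class returns.
results = [
    {"eid": "2-s2.0-0000000001", "title": "Paper A", "citedby_count": 10},
    {"eid": "2-s2.0-0000000002", "title": "Paper B", "citedby_count": 3},
]
df = pd.DataFrame(results)
print(df.sort_values("citedby_count", ascending=False).head())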

Scopus Api Error

Dear Michael-E-Rose, dear programmers,

I am using your Scopus API wrapper and I like several aspects of it. But when I create an instance of the class ScopusAbstract (in scopus_api.py), passing an EID like "2-s2.0-63449122615" as a string object, I get the error "NoneType object has no attribute 'find'". The specific problem is in line 244, which is

self.scopus_link = coredata.find('link[@rel="scopus"]', ns).get('href')

Would you be so kind as to help me? I really need this for my course of study.

Here is an example call:

from scopus.scopus_api import ScopusAbstract
from scopus.scopus_search import ScopusSearch
 
query_str = "This paper presents a new approach to select events of interest to users in a social"
query = ScopusSearch("ABS({})".format(query_str), refresh=False)
id = query.EIDS[0]
ab = ScopusAbstract(id)

-> Error "NoneType object has no attribute 'find'" in line 244 of scopus_api.py

Thank you.

Kind regards

Stefan Wilharm

Cache filenames too long

When the query is too long, the module fails when saving cache results:

[Errno 63] File name too long

The issue is in scopus_search.py:

qfile = os.path.join(SCOPUS_SEARCH_DIR,
                             # We need to remove / in a DOI here so we can save
                             # it as a file.
                             query.replace('/', '_slash_'))

[...]

with open(qfile, 'wb') as f:

Could the cache filename be shortened, or at least, truncated?

can't get scopus API working for ScopusAbstract

Hi,

I installed scopus-api both in Python 2 and Python 3 environments (Mac), set up my API key as instructed, but can't seem to get the basic Scopus Abstract function working:

>>> ab = ScopusAbstract("2-s2.0-84930616647", refresh=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ajain/miniconda3/envs/py3/lib/python3.5/site-packages/scopus/scopus_api.py", line 239, in __init__
    self.scopus_link = coredata.find('link[@rel="scopus"]', ns).get('href')
AttributeError: 'NoneType' object has no attribute 'find'

Note that ScopusAuthor seems to work fine, hopefully suggesting that my API key setup is in place:

>>> au = ScopusAuthor(7004212771)
>>> from scopus.scopus_author import ScopusAuthor
>>> au = ScopusAuthor(7004212771)
>>> print([a.name for a in au.get_coauthors()])
['Jens Kehlet Nørskov', 'Bruce C. Gates', 'Matthias Scheffler', 'Dionisios G. Vlachos', 'R. J. Gorte',  ...]

get_corresponding_author_info()

I downloaded the abstract corresponding to: 2-s2.0-84930616647 and got the following error.

UnboundLocalError                         Traceback (most recent call last)
<ipython-input-88-452337c57a07> in <module>()
----> 1 ab.get_corresponding_author_info()

C:\Users\Silas\Miniconda3\envs\py2\lib\site-packages\scopus\scopus_api.py in get_corresponding_author_info(self)
    280                             name = aa.text
    281 
--> 282                         return (scopus_url, name, email)
    283 
    284     def __str__(self):

UnboundLocalError: local variable 'scopus_url' referenced before assignment

There is a small bug in the if clause at line 278 of scopus_api.py, but I don't know how to resolve it.
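Without seeing the surrounding code, the traceback suggests the usual cause of this error: the variables are assigned only inside a conditional branch and then returned unconditionally. A generic sketch of the fix, initializing them up front (names taken from the traceback, the condition and entry structure are hypothetical):

def get_corresponding_author_info_sketch(entries):
    """Illustrative only: initialize before the conditional assignment."""
    scopus_url, name, email = None, None, None
    for entry in entries:
        if entry.get("rel") == "scopus":            # hypothetical condition
            scopus_url = entry.get("href")
            name = entry.get("name")
            email = entry.get("email")
    return scopus_url, name, email


print(get_corresponding_author_info_sketch([]))     # (None, None, None), no error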

Bugs after recent update?

I installed this package on April 6 and it worked fine, although there were some minor bugs. Today (April 16) I updated to the new version and found some problems.
I typed the following code in IPython (Python 2.7) on Mac OS X:

from scopus.scopus_api import ScopusAbstract
ab = ScopusAbstract("2-s2.0-84930616647", view='META', refresh=True) 
/usr/local/lib/python2.7/site-packages/scopus/scopus_api.pyc in __init__(self, EID, view, refresh)
239         # Parse authors
240         authors = xml.find('dtd:authors', ns)
--> 241         self._authors = [_ScopusAuthor(author) for author in authors]
    242         self._affiliations = [_ScopusAffiliation(aff) for aff
    243                               in xml.findall('dtd:affiliation', ns)]
TypeError: 'NoneType' object is not iterable

then I tried view='FULL',

from scopus.scopus_api import ScopusAbstract
ab = ScopusAbstract("2-s2.0-84930616647", view='FULL', refresh=True) #it is OK
print ab.refcount 
/usr/local/lib/python2.7/site-packages/scopus/scopus_api.pyc in refcount(self)
    146             return self._references.attrib['refcount']
    147         else:
--> 148             raise TypeError("Could not load article references. "
    149                             "Did you load with view=FULL?")
    150

TypeError: Could not load article references. Did you load with view=FULL?

Other information:
I installed the package using: pip install git+git://github.com/jkitchin/scopus
updated the package using: pip install --upgrade git+git://github.com/jkitchin/scopus

Error with ScopusAbstract

Dear All,

ScopusAbstract requests just stopped working for me overnight (it's not a quota issue).

E.g. for the query from the readme I get this error.

from scopus.scopus_api import ScopusAbstract
ab = ScopusAbstract("2-s2.0-84930616647", refresh=True)
print(ab)

AttributeError Traceback (most recent call last)
in ()
1 from scopus.scopus_api import ScopusAbstract
2
----> 3 ab = ScopusAbstract("2-s2.0-84930616647", refresh=True)
4 print(ab)

C:\ProgramData\Anaconda3\lib\site-packages\scopus\scopus_api.py in __init__(self, EID, view, refresh)
221 self.creator = get_encoded_text(coredata, 'dc:creator')
222 self.description = get_encoded_text(coredata, 'dc:description')
--> 223 sl = coredata.find('dtd:link[@rel="scopus"]', ns).get('href')
224 self_link = coredata.find('dtd:link[@rel="self"]', ns).get('href')
225 cite_link = coredata.find('dtd:link[@rel="cited-by"]', ns)

AttributeError: 'NoneType' object has no attribute 'find'

It happens to all other ScopusAbstract queries as well. I would really appreciate your advice.

Best, Tatiana

Author affiliation missing in ScopusAbstract

Hi, I am using ScopusAbstract and have a problem.

from scopus import ScopusAbstract
ab = ScopusAbstract("2-s2.0-84930616647")
for au in ab.authors:
        print(au)

I can't get the author's affiliation ID now, which is different from your example. My question is how I can get this information. I mean the author's affiliation when the paper was published, rather than their current affiliation.

Thanks a lot!

error from ScopusSearch

Good day,

I am writing a Python script which will eventually map given authors to a list of author IDs. To start with, I am attempting a somewhat simple search on a known author, but I receive an error:

map_authors.py:

from scopus import ScopusSearch

s = ScopusSearch('AUTHFIRST ( Bo ) and AUTHLASTNAME ( Li ) and AFFIL ( University of Illinois )', refresh=True)
print(s)

(C:\Users\sac\AppData\Local\Continuum\Anaconda3\envs\scopus) c:\Users\sac\scopus_base>python map_authors.py
Traceback (most recent call last):
  File "map_authors.py", line 4, in <module>
    print(s)
  File "C:\Users\sac\AppData\Local\Continuum\Anaconda3\envs\scopus\lib\site-packages\scopus\scopus_search.py", line 110, in __str__
    entries='\n '.join(self.EIDS))
KeyError: 'query'

The error does not seem to map back to the Scopus Search API known errors. What is the root issue?

Class for citation overview API

I have recently started using Scopus' Citation Overview API. That's a special API to retrieve yearly citations to an author. It is a necessary add-on because the yearly citation count from the Author Retrieval API equals the sum of total citations to all articles published in a given year, which in most cases is not what one wants.

As the Citation Overview API requires special permission (see page), it was probably not implemented in this package. Are you, @jkitchin, interested in adding the class for this API? I'm happy to create a PR in the next few days.

ContentAffiliationRetrieval __str__ method AttributeError

The __str__ method gives the following error:
AttributeError: 'ContentAffiliationRetrieval' object has no attribute 'name'

The attribute was renamed in scopus 1.x, so it should be
s = '''{self.affiliation_name} ... '''
instead of
s = '''{self.name} ...'''

Using:
Windows 10 64bit
scopus.version 1.4.3
python 3.6

Support querying the Citations Overview API with multiple EIDs at once

The current CitationOverview class makes requesting citation reports for a single document easy, but its strict one-query-per-document structure makes it cumbersome to fetch non-trivial amounts of data.
Since Scopus allows adding a multitude of &scopus_id= parameters to citation overview queries, being able to request multiple documents at once would drastically improve the efficiency of querying Scopus. This is particularly true when factoring in the API limitation of (currently) 20k requests per week per enabled API key (at only 3 requests/second).

I have thought a bit about the problem, and since the CitationOverview class doesn't really lend itself to representing a multitude of results (well, not at all), I think the best approach would be to add a very non-OO function to seed the cache. A generic description could be:

  • given a list of EIDs,
  • construct a CitationOverview API query (this is easy since requests natively supports lists as parameter values),
  • use scopus.utils.download to retrieve the whole set of results at once,
  • split the results saving one cache file per requested document (retaining the EID-as-filename structure)

After a first seeding run, CitationOverview classes could then be quickly instantiated from the cache.

I think the above would be the most natural way to implement a bulk downloader without having to mess with the CitationOverview class, or having to duplicate most of it into a new and weird CitationOverviewBulk class.

I could try to take a stab at the above in the next few days, do you think it would be the right approach?
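A rough sketch of the bulk request under the assumptions above; the endpoint, the scopus_id and date parameters, and the response handling are assumptions to be checked against the Citation Overview API, and scopus.utils.download would replace requests in the real implementation. Splitting the returned JSON into one cache file per EID (keeping the EID-as-filename convention) would then happen on the result.

import requests

CITATION_OVERVIEW_URL = "https://api.elsevier.com/content/abstract/citations"   # assumed endpoint


def fetch_citation_overviews(eids, api_key, **extra_params):
    """Request citation overviews for many documents in a single call."""
    headers = {"X-ELS-APIKey": api_key, "Accept": "application/json"}
    params = dict(extra_params)
    # requests expands the list into repeated scopus_id= parameters.
    params["scopus_id"] = [eid.replace("2-s2.0-", "") for eid in eids]
    resp = requests.get(CITATION_OVERVIEW_URL, headers=headers, params=params)
    resp.raise_for_status()
    return resp.json()


# Usage (requires a key with Citation Overview permission; 'date' is an assumed parameter):
# data = fetch_citation_overviews(["2-s2.0-84930616647", "2-s2.0-84937398919"],
#                                 "MY_API_KEY", date="2015-2018")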
