Giter Site home page Giter Site logo

massimoaria / dimensionsr Goto Github PK

View Code? Open in Web Editor NEW
36.0 2.0 7.0 102 KB

Gathering metadata about publications, patents, grants, clinical trials and policy documents from DS Dimensions using DSL API

License: Other

R 48.68% HTML 51.32%
gathering-bibliographic-records ds-dimensions dsl-api bibliometrix bibliometrics grants grants-search bibliographic-database bibliography publications

dimensionsr's Introduction

dimensionsR

An R-package to gather bibliographic data from DS Dimenions.ai

Project Status: Active - The project has reached a stable, usable state and is being actively developed. Build Status rstudio mirror downloads cran version

 

The goal of dimensionsR is to gather metadata about publications, grants, policy documents,
and clinical trials from DS Dimensions using DSL API.

 

http://github.com/massimoaria/dimensionsR

 

by Massimo Aria

Full Professor in Social Statistics

PhD in Computational Statistics

Laboratory and Research Group STAD Statistics, Technology, Data Analysis

Department of Economics and Statistics

University of Naples Federico II

email [email protected]

https://www.massimoaria.com

 

Installation

You can install the developer version of the dimensionsR from GitHub with:

install.packages("devtools")
devtools::install_github("massimoaria/dimensionsR")

You can install the released version of dimensionsR from CRAN with:

install.packages("dimensionsR")

 

 

Load the package

library(dimensionsR)

 

 

A brief example

Imagine, we want to download a metadata collection of journal articles which (1) have used bibliometric approaches in their researches, (2) have been published for the past 20 years (3) and have been written in the English language.

The workflow mainly consists of five steps:

  1. Generate a valid token

  2. Write the query

  3. Check the effectiveness of the query

  4. Download the collection of document metadata

  5. Convert the download object into a "readable" and and "usable" format

 

First step: Generate a valid token

The access to DS Dimensions API system (https://www.dimensions.ai/) requires an "API token". Dimensions provides free data access for scientometric research projects, but you have to submit a request to obtain valid credentials (to generate API tokens) at https://digital-science.ccgranttracker.com/.

Once your request is approved, you can generate a valid token using the function dsAuth with your dimensions credentials or your dimensions API key.

Token by Dimensions credentials:

token <- dsAuth(username = "your_username", password = "your_password")

Token by Dimensions API key:

token <- dsAuth(key = "your_apikey")

The token is temporary and needs to be generated again after a few hours.

 

Second step: Write a valid query

Dimensions defined a custom query language called Dimensions Search Language (DSL). You can choose to write a valid query using that language or, in alternaative, using the function dsQueryBuild.

dsQueryBuild generates a valid query, written following the Dimensions Search Language (DSL), from a set of search parameters provided by the user.

Following our example, we have to define a query to download metadata about: (1) a set of journal articles which (2) have used bibliometric approaches in their researches, (4) in the field of management sciences, (3) and have been published for the past 20 years.

To write that query, we should set the function arguments as following:

  • a set of journal articles

    item = "publications"

    type = "articles"

  • have used bibliometric approaches in their researches

    words = "bibliometric*"

  • in the field of management sciences

    categories = "management"

  • documents published from 2000 to 2020:

    start_year = 2000

    end_year = 2020

  • and finally, export the following fields

    output_fields = c("basics", "extras", "authors", "concepts")

    (you could also export all database fields setting output_fields = "all")

Passing all these arguments to the function dsQueryBuild, we obtain the final query:

query <- dsQueryBuild(item = "publications", 
                  words = "bibliometric*", 
                  type = "article", 
                  categories = "management", 
                  start_year = 1980, end_year = 2020,
                  output_fields = c("basics", "extras", "authors", "concepts"))
                  
query

# "search publications in title_abstract_only  for \"\\\"bibliometric*\\\"\" where year in [ 1980 : 2020 ] and type in [ \"article\" ] and (category_for.name ~\"management\") return publications[type + basics + extras + authors + concepts]"

A complete list of output items is available at https://docs.dimensions.ai/dsl/data-sources.html  

Third step: Check the effectiveness of the query

Now, we want to know how many documents could be retrieved by our query.

To do that, we use the function dsApiRequest, setting the argument limit = 0:

res <- dsApiRequest(token = token, query = query, limit = 0)

res$total_count

# [1] 1148

 

Fourth step: Download the collection of document metadata

We could decide to change the query or continue to download the whole collection or a part of it (setting the limit argument lower than res$total_count).

Image, we decided to download the whole collection composed by 1148 documents:

D <- dsApiRequest(token = token, query = query, step = 200, limit = res$total_count)

# Records:  200 of 1148 
# Records:  400 of 1148 
# Records:  600 of 1148 
# Records:  800 of 1148 
# Records:  1000 of 1148 
# Records:  1148 of 1148 

The function dsApiRequest returns a list D composed by 4 objects:

  • "data". It is the xml-structured list containing the bibliographic metadata collection downloaded from the DS Dimensions database.

  • "item". It idicates the type of record retrieved. The argument can be equal to c("publications", "grants", "patents", "clinical_trials", "policy_documents").

  • "query". It a character object containing the query formulated with dsQueryBuild (or by the user) in the DSL language.

  • "total_counts". It is an integer object indicating the total number of records matching the query.

 

Fifth step: Convert the download object into a "readable" and and "usable" format

From the xml-structured object to a "classical" data frame

Finally, we transform the xml-structured object D into a data frame, with cases corresponding to documents and variables to Field Tags as used in the bibliometrix R package (https://CRAN.R-project.org/package=bibliometrix, https://bibliometrix.org/, https://github.com/massimoaria/bibliometrix).

M <- dsApi2df(D)

str(M)

'data.frame':	1148 obs. of  36 variables:
# $ AU     : chr  "DUQUE-ACEVEDO M;BELMONTE-UREÑA LJ;CORTÉS-GARCÍA FJ;CAMACHO-FERRE F" "SINGH S;DHIR S;DAS VM;SHARMA A" "FARINHA L;SEBASTIÃO JR;SAMPAIO C;LOPES J" "BILDOSOLA I;GARECHANA G;ZARRABEITIA E;CILLERUELO E" ...
# $ AF     : chr  "DUQUE-ACEVEDO, MÓNICA;BELMONTE-UREÑA, LUIS J.;CORTÉS-GARCÍA, FRANCISCO JOAQUÍN;CAMACHO-FERRE, FRANCISCO" "SINGH, SHIWANGI;DHIR, SANJAY;DAS, V. MUKUNDA;SHARMA, ANUJ" "FARINHA, LUÍS;SEBASTIÃO, JOÃO RENATO;SAMPAIO, CARLOS;LOPES, JOÃO" "BILDOSOLA, IÑAKI;GARECHANA, GAIZKA;ZARRABEITIA, ENARA;CILLERUELO, ERNESTO" ...
# $ TI     : chr  "AGRICULTURAL WASTE: REVIEW OF THE EVOLUTION, APPROACHES AND PERSPECTIVES ON ALTERNATIVE USES" "BIBLIOMETRIC OVERVIEW OF THE TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE JOURNAL: ANALYSIS FROM 1970 TO 2018" "SOCIAL INNOVATION AND SOCIAL ENTREPRENEURSHIP: DISCOVERING ORIGINS, EXPLORING CURRENT AND FUTURE TRENDS" "CHARACTERIZATION OF STRATEGIC EMERGING TECHNOLOGIES: THE CASE OF BIG DATA" ...
# $ SO     : chr  "GLOBAL ECOLOGY AND CONSERVATION" "TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE" "INTERNATIONAL REVIEW ON PUBLIC AND NONPROFIT MARKETING" "CENTRAL EUROPEAN JOURNAL OF OPERATIONS RESEARCH" ...
# $ SO_LIST: chr  "DOAJ;ERA 2018;NORWEGIAN REGISTER LEVEL 1" "NORWEGIAN REGISTER LEVEL 1;ERA 2015;VABB-SHW;ERA 2018" "ERA 2015;ERA 2018" "NORWEGIAN REGISTER LEVEL 1;ERA 2015;VABB-SHW;ERA 2018" ...
# $ LA     : chr  "ENGLISH" "ENGLISH" "ENGLISH" "ENGLISH" ...
# $ DT     : chr  "ARTICLE" "ARTICLE" "ARTICLE" "ARTICLE" ...
# $ DE     : chr  "RESEARCH;ANALYSIS;SCIENTIFIC PRODUCTION;PRODUCTION;ARTICLE;BIBLIOMETRIC METHODS;EVOLUTION;APPROACH;TRENDS;TRANS"| __truncated__ "PURPOSE;PAPER;TECHNOLOGICAL FORECASTING;JOURNALS;PERIOD;ARTICLE;SCOPE;DIVERSITY;FIELD;FRAGMENTATION;BELIEFS;REV"| __truncated__ "NONPROFIT SECTOR;SECTOR;IMPORTANT PART;PART;ECONOMY;CRITICAL FACTOR;FACTORS;SOCIAL CHANGE;CHANGES;SOCIAL INNOVA"| __truncated__ "CURRENT ENTERPRISES;ENTERPRISES;EMERGING TECHNOLOGIES;TECHNOLOGY;LATE ADOPTION;ADOPTION;COMPETITIVENESS;FRAME;A"| __truncated__ ...
# $ ID     : chr  "RESEARCH;ANALYSIS;SCIENTIFIC PRODUCTION;PRODUCTION;ARTICLE;BIBLIOMETRIC METHODS;EVOLUTION;APPROACH;TRENDS;TRANS"| __truncated__ "PURPOSE;PAPER;TECHNOLOGICAL FORECASTING;JOURNALS;PERIOD;ARTICLE;SCOPE;DIVERSITY;FIELD;FRAGMENTATION;BELIEFS;REV"| __truncated__ "NONPROFIT SECTOR;SECTOR;IMPORTANT PART;PART;ECONOMY;CRITICAL FACTOR;FACTORS;SOCIAL CHANGE;CHANGES;SOCIAL INNOVA"| __truncated__ "CURRENT ENTERPRISES;ENTERPRISES;EMERGING TECHNOLOGIES;TECHNOLOGY;LATE ADOPTION;ADOPTION;COMPETITIVENESS;FRAME;A"| __truncated__ ...
# $ AB     : chr  "NA" "NA" "NA" "NA" ...
# $ C1     : chr  "UNIVERSITY OF ALMERÍA, SPAIN;UNIVERSITY OF ALMERÍA, SPAIN;UNIVERSIDAD AUTÓNOMA DE CHILE, CHILE;UNIVERSITY OF ALMERÍA, SPAIN" "DEPARTMENT OF MANAGEMENT STUDIES, INDIAN INSTITUTE OF TECHNOLOGY (IIT) DELHI, HAUZ KHAS, NEW DELHI - 110016, IN"| __truncated__ "POLYTECHNIC INSTITUTE OF CASTELO BRANCO AND NECE RESEARCH CENTER IN BUSINESS SCIENCES, LARGO DO MUNICÍPIO, 6060"| __truncated__ "UNIVERSITY OF THE BASQUE COUNTRY, SPAIN;UNIVERSITY OF THE BASQUE COUNTRY, SPAIN;UNIVERSITY OF THE BASQUE COUNTR"| __truncated__ ...
# $ RP     : chr  "UNIVERSITY OF ALMERÍA,SPAIN" "DEPARTMENT OF MANAGEMENT STUDIES, INDIAN INSTITUTE OF TECHNOLOGY (IIT) DELHI, HAUZ KHAS, NEW DELHI - 110016, INDIA,NA" "POLYTECHNIC INSTITUTE OF CASTELO BRANCO AND NECE RESEARCH CENTER IN BUSINESS SCIENCES, LARGO DO MUNICÍPIO, 6060"| __truncated__ "UNIVERSITY OF THE BASQUE COUNTRY,SPAIN" ...
# $ OI     : chr  "0000-0001-5860-5000;0000-0003-0209-0997" "0000-0001-8693-5947" "" "0000-0002-4964-9762;0000-0002-1913-3239;0000-0002-2347-3885" ...
# $ FU     : chr  "" "" "" "" ...
# $ CR     : chr  "PUB.1026056866;PUB.1033294208;PUB.1022712617;PUB.1049702372;PUB.1008318289;PUB.1121499517;PUB.1120414045;PUB.11"| __truncated__ "PUB.1059655231;PUB.1069907290;PUB.1009741249;PUB.1028082114;PUB.1026178255;PUB.1023011736;PUB.1035913069;PUB.10"| __truncated__ "PUB.1067499492;PUB.1027037063;PUB.1026026695;PUB.1101848308;PUB.1037291488;PUB.1049249540;PUB.1041384155;PUB.10"| __truncated__ "PUB.1040547089;PUB.1009922941;PUB.1009413848;PUB.1109700874;PUB.1019443976;PUB.1048606603;PUB.1017000773;PUB.10"| __truncated__ ...
# $ ALT    : chr  "2" NA NA "1" ...
# $ TC     : num  1 0 0 1 2 3 0 0 0 0 ...
# $ TCR    : num  1 0 0 1 2 3 0 0 0 0 ...
# $ PU     : chr  "ELSEVIER" "ELSEVIER" "SPRINGER NATURE" "SPRINGER NATURE" ...
# $ SN     : chr  "2351-9894" NA NA NA ...
# $ J9     : chr  NA NA NA NA ...
# $ JI     : chr  NA NA NA NA ...
# $ PY     : num  1982 1982 1982 1982 1982 ...
# $ VL     : chr  "22" "154" "17" "28" ...
# $ IS     : chr  NA NA "1" "1" ...
# $ DI     : chr  "10.1016/j.gecco.2020.e00902" "10.1016/j.techfore.2020.119963" "10.1007/s12208-020-00243-6" "10.1007/s10100-018-0597-9" ...
# $ PG     : chr  "E00902" "119963" "77-96" "45-60" ...
# $ SC     : chr  "ENVIRONMENTAL SCIENCES; BIOLOGICAL SCIENCES; ECOLOGY; ENVIRONMENTAL SCIENCE AND MANAGEMENT" "TECHNOLOGY; ECONOMICS; COMMERCE, MANAGEMENT, TOURISM AND SERVICES" "MARKETING; COMMERCE, MANAGEMENT, TOURISM AND SERVICES" "COMMERCE, MANAGEMENT, TOURISM AND SERVICES; APPLIED MATHEMATICS; BUSINESS AND MANAGEMENT; NUMERICAL AND COMPUTA"| __truncated__ ...
# $ OA     : chr  "ALL OA" "CLOSED" "CLOSED" "CLOSED" ...
# $ URL    : chr  "https://doi.org/10.1016/j.gecco.2020.e00902" NA NA NA ...
# $ DB     : chr  "DIMENSIONS" "DIMENSIONS" "DIMENSIONS" "DIMENSIONS" ...
# $ AU_UN  : chr  "UNIVERSITY OF ALMERÍA;UNIVERSITY OF ALMERÍA;UNIVERSIDAD AUTÓNOMA DE CHILE;UNIVERSITY OF ALMERÍA" "DEPARTMENT OF MANAGEMENT STUDIES, INDIAN INSTITUTE OF TECHNOLOGY (IIT) DELHI, HAUZ KHAS, NEW DELHI - 110016, IN"| __truncated__ "POLYTECHNIC INSTITUTE OF CASTELO BRANCO AND NECE RESEARCH CENTER IN BUSINESS SCIENCES, LARGO DO MUNICÍPIO, 6060"| __truncated__ "UNIVERSITY OF THE BASQUE COUNTRY;UNIVERSITY OF THE BASQUE COUNTRY;UNIVERSITY OF THE BASQUE COUNTRY;UNIVERSITY O"| __truncated__ ...
# $ AU1_UN : chr  "UNIVERSITY OF ALMERÍA" "DEPARTMENT OF MANAGEMENT STUDIES, INDIAN INSTITUTE OF TECHNOLOGY (IIT) DELHI, HAUZ KHAS, NEW DELHI - 110016, INDIA" "POLYTECHNIC INSTITUTE OF CASTELO BRANCO AND NECE RESEARCH CENTER IN BUSINESS SCIENCES, LARGO DO MUNICÍPIO, 6060"| __truncated__ "UNIVERSITY OF THE BASQUE COUNTRY" ...
# $ AU_CO  : chr  "SPAIN;SPAIN;CHILE;SPAIN" "" "PORTUGAL;PORTUGAL;PORTUGAL" "SPAIN;SPAIN;SPAIN;SPAIN" ...
# $ AU1_CO : chr  "SPAIN" NA "PORTUGAL" "SPAIN" ...
# $ SR_FULL: chr  NA NA NA NA ...

 

An overview to the collection using bibliometrix

Now, we can use some bibliometrix functions to get an overview of the bibliographic collection.

bibliometrix is an R-tool for quantitative research in scientometrics and bibliometrics that includes all the main bibliometric methods of analysis (https://CRAN.R-project.org/package=bibliometrix, https://bibliometrix.org/, https://github.com/massimoaria/bibliometrix).

First, we install and load the bibliometrix package:

install.packages("bibliometrix")
library(bibliometrix)

 

Main information about the collection

Then, we use the biblioAnalysis and summary functions to perform a descriptive analysis of the data frame: Then, we add some metadata to the dimensions collection, and we use the biblioAnalysis and summary functions to perform a descriptive analysis of the data frame:

M <- convert2df(D, dbsource = "dimensions", format = "api")

results <- biblioAnalysis(M)
summary(results)

    # MAIN INFORMATION ABOUT DATA
    # 
    # Timespan                              1982 : 2020 
    # Sources (Journals, Books, etc)        568 
    # Documents                             1148 
    # Average years from publication        4.81 
    # Average citations per documents       19.54 
    # Average citations per year per doc    2.876 
    # References                            0 
    # 
    # DOCUMENT TYPES                     
    # article      1148 
    # 
    # DOCUMENT CONTENTS
    # Keywords Plus (ID)                    12874 
    # Author's Keywords (DE)                12874 
    #  
    # AUTHORS
    #  Authors                               2783 
    #  Author Appearances                    3538 
    #  Authors of single-authored documents  140 
    #  Authors of multi-authored documents   2643 
    #  
    # AUTHORS COLLABORATION
    #  Single-authored documents             148 
    #  Documents per Author                  0.413 
    #  Authors per Document                  2.42 
    #  Co-Authors per Documents              3.08 
    #  Collaboration Index                   2.66 
    #  
    # 
    # Annual Scientific Production
    # 
    #  Year    Articles
    #     1982        1
    #     1983        1
    #     1985        2
    #     1986        1
    #     1987        3
    #     1988        4
    #     1989        1
    #     1990        2
    #     1992        3
    #     1993        3
    #     1994        3
    #     1995        5
    #     1996        4
    #     1997        3
    #     1998        4
    #     1999       10
    #     2000        3
    #     2001        8
    #     2002        6
    #     2003        1
    #     2004        4
    #     2005       10
    #     2006       17
    #     2007       13
    #     2008       13
    #     2009       16
    #     2010       22
    #     2011       32
    #     2012       32
    #     2013       48
    #     2014       37
    #     2015       68
    #     2016       85
    #     2017      108
    #     2018      147
    #     2019      283
    #     2020      145
    # 
    # Annual Percentage Growth Rate 13.99298 
    # 
    # 
    # Most Productive Authors
    # 
    #    Authors        Articles Authors        Articles Fractionalized
    # 1     MERIGÓ JM         20    MERIGÓ JM                      5.43
    # 2     ZHANG Y           14    KUMAR S                        4.08
    # 3     KUMAR S           13    KOSTOFF RN                     3.69
    # 4     PORTER AL         11    ZHANG Y                        3.47
    # 5     KOSTOFF RN        10    MCMILLAN GS                    3.33
    # 6     WANG J            10    HO Y                           3.25
    # 7     CAPUTO A           9    VOGEL R                        3.17
    # 8     MARZI G            9    FERREIRA MP                    3.00
    # 9     ALON I             8    PASADEOS Y                     3.00
    # 10    FERREIRA MP        8    PORTER AL                      2.94
    # 
    # 
    # Top manuscripts per citations
    # 
    #                                                     Paper            TC TCperYear
    # 1  PARK SH, 1996, STRATEGIC MANAGEMENT JOURNAL                     1011     40.44
    # 2  RAMOS‐RODRÍGUEZ A, 2004, STRATEGIC MANAGEMENT JOURNAL            472     27.76
    # 3  DAIM TU, 2006, TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE       461     30.73
    # 4  FAHIMNIA B, 2015, INTERNATIONAL JOURNAL OF PRODUCTION ECONOMICS  368     61.33
    # 5  DE BAKKER FGA, 2005, BUSINESS & SOCIETY                          359     22.44
    # 6  KOSTOFF RN, 2001, IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT    351     17.55
    # 7  MURRAY F, 2002, RESEARCH POLICY                                  300     15.79
    # 8  ZUPIC I, 2015, ORGANIZATIONAL RESEARCH METHODS                   296     49.33
    # 9  MOED H, 1985, RESEARCH POLICY                                    283      7.86
    # 10 GALVAGNO M, 2014, MANAGING SERVICE QUALITY                       222     31.71
    # 
    # 
    # Corresponding Author's Countries
    # 
    # Country Articles   Freq SCP MCP MCP_Ratio
    # 1  UNITED STATES       116 0.1420  79  37     0.319
    # 2  CHINA               104 0.1273  57  47     0.452
    # 3  BRAZIL               79 0.0967  61  18     0.228
    # 4  SPAIN                69 0.0845  48  21     0.304
    # 5  UNITED KINGDOM       49 0.0600  31  18     0.367
    # 6  ITALY                47 0.0575  34  13     0.277
    # 7  AUSTRALIA            45 0.0551  22  23     0.511
    # 8  GERMANY              34 0.0416  25   9     0.265
    # 9  PORTUGAL             24 0.0294  16   8     0.333
    # 10 NETHERLANDS          22 0.0269  17   5     0.227
    # 
    # 
    # SCP: Single Country Publications
    # 
    # MCP: Multiple Country Publications
    # 
    # 
    # Total Citations per Country
    # 
    # Country      Total Citations Average Article Citations
    # 1  UNITED STATES                     4936                    42.552
    # 2  UNITED KINGDOM                    1654                    33.755
    # 3  SPAIN                             1618                    23.449
    # 4  CHINA                             1472                    14.154
    # 5  AUSTRALIA                         1416                    31.467
    # 6  NETHERLANDS                       1252                    56.909
    # 7  BRAZIL                             985                    12.468
    # 8  ITALY                              890                    18.936
    # 9  GERMANY                            662                    19.471
    # 10 FINLAND                            559                    50.818
    # 
    # 
    # Most Relevant Sources
    # 
    # Sources        Articles
    # 1  TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE           48
    # 2  RESEARCH POLICY                                       44
    # 3  JOURNAL OF BUSINESS RESEARCH                          24
    # 4  TECHNOLOGY ANALYSIS AND STRATEGIC MANAGEMENT          24
    # 5  TECHNOVATION                                          14
    # 6  INDUSTRIAL MARKETING MANAGEMENT                       11
    # 7  INTERNATIONAL JOURNAL OF HOSPITALITY MANAGEMENT       10
    # 8  THE JOURNAL OF TECHNOLOGY TRANSFER                    10
    # 9  IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT            9
    # 10 R AND D MANAGEMENT                                     9
    # 
    # 
    # Most Relevant Keywords
    # 
    # Author Keywords (DE)      Articles Keywords-Plus (ID)     Articles
    # 1      RESEARCH                   683  RESEARCH                   683
    # 2      ANALYSIS                   678  ANALYSIS                   678
    # 3      STUDY                      442  STUDY                      442
    # 4      PAPER                      428  PAPER                      428
    # 5      ARTICLE                    406  ARTICLE                    406
    # 6      BIBLIOMETRIC ANALYSIS      398  BIBLIOMETRIC ANALYSIS      398
    # 7      LITERATURE                 376  LITERATURE                 376
    # 8      JOURNALS                   320  JOURNALS                   320
    # 9      APPROACH                   307  APPROACH                   307
    # 10     AUTHORS                    299  AUTHORS                    299

"# dimensionsR"

dimensionsr's People

Contributors

massimoaria avatar natesheehan avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

dimensionsr's Issues

Searching metadata for specific DOIs

Is there any way to search for metadata based on specific DOIs?

I tried this approach:

QUERY1 <- "10.1109/wfcs.2015.7160587 OR 10.1109/irps.2019.8720541 OR 10.1364/assl.2017.jtu2a.48 OR 10.1097/01.hjh.0000467419.26688.b0 OR 10.3139/147.110446 OR 10.1002/bapi.201610010 OR 10.1002/adma.202002525 OR 10.1515/ms-2017-0245 OR 10.1364/oe.26.009963 OR 10.1364/ol.40.002068"

token <- dsAuth(key = myapikey) # replace this with your API key
query1 <- dsQueryBuild(words = QUERY1)
D1 <- dsApiRequest(token = token, query = QUERY1, limit = 5000, verbose = TRUE)

... but that doesn't work.

At https://docs.dimensions.ai/dsl/language.html I can find no mention of DOIs at all. Does DSL simply does not support this function?

difficulty running query

New to DimensionsR. Trying to run the example code from [here] (https://cran.r-project.org/web/packages/dimensionsR/vignettes/A_Brief_Example.html) to test it out. The example code is below alongside this error code:

Error in res$total_count : $ operator is invalid for atomic vectors

Any help would be appreciated!

`#query dimensions
query <- dsQueryBuild(item = "publications",
words = "bibliometric*",
type = "article",
categories = "management",
start_year = 1980, end_year = 2023,
output_fields = "all")

query

#check effectiveness of query
res <- dsApiRequest(token = token, query = query, limit = 0)

res$total_count

#Error in res$total_count : $ operator is invalid for atomic vectors`

Missing data from API pulls after using dsApi2df

Hi there,

I work at Digital-Science (home of Dimensions) and I spent some time doing a few basic data pulls from the Dimensions API using your package. It works really well overall, but I noticed there are some fields that aren't coming through from dsApi2df when the API query includes fieldsets other than [all].
Fields coming through as "" (blank) are:
-AU
-AF
-SC
-CR
-AU_UN
-AU1_UN.
-AU_CO
-AU1_CO

There might be more than this, but these are the ones I noticed right away. I think it may have something to do with differences in the field names between the API results and what dsApi2df is expecting.

Here is the query I used that dsApi2df had a problem with:

'search publications 
in title_abstract_only for \"\\\"nano*\\\" AND \\\"drug delivery\\\"\" 
where year in [2010:2020] return publications[basics+extras]'

Running the same query with return publications[all] does not have any issues, though.

Let me know if you have any questions for us. I'd be happy to try to answer them myself or connect you with someone from our API team.

Cheers,

Sam

Obligatory session info:

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] dimensionsR_0.0.1 forcats_0.4.0     stringr_1.4.0     dplyr_0.8.3       purrr_0.3.3       readr_1.3.1      
 [7] tidyr_1.0.0       tibble_2.1.3      ggplot2_3.2.1     tidyverse_1.2.1  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2       cellranger_1.1.0 pillar_1.4.2     compiler_3.6.1   tools_3.6.1      zeallot_0.1.0    lubridate_1.7.4 
 [8] jsonlite_1.6     lifecycle_0.1.0  nlme_3.1-140     gtable_0.3.0     lattice_0.20-38  pkgconfig_2.0.3  rlang_0.4.1     
[15] cli_1.1.0        rstudioapi_0.10  curl_4.2         haven_2.1.1      xfun_0.10        withr_2.1.2      xml2_1.2.2      
[22] httr_1.4.1       knitr_1.25       generics_0.0.2   vctrs_0.2.0      hms_0.5.1        grid_3.6.1       tidyselect_0.2.5
[29] glue_1.3.1       R6_2.4.0         readxl_1.3.1     modelr_0.1.5     magrittr_1.5     backports_1.1.5  scales_1.0.0    
[36] rvest_0.3.5      assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.3    lazyeval_0.2.2   munsell_0.5.0    broom_0.5.2     
[43] crayon_1.3.4 

ID missing in pub2df --> dsApi2df

Dear developers,

Thank you so much for this great library! I just wanted to let you know that I noticed that id (a["id"]) is a missing field in the pub2df function.

Best
Philipp Baaden

OA column returns NA values

When fetching the basics data fields the Open Access column always returns NA's.

I have looked at the response and have a fix for this.

make dsApi2df more flexibel?

Hey,

I have found that using dsApi2df depends on a standard return field (i.e. containing the standard information). It would be great if dsApi2df was more flexible in that it could convert any publications return object (or other entity) to a df.

E.g.
search publications return publications
-> can be converted to a df

search publications return publications[id+doi+research_orgs]
-> does not work

Returning author data?

I am not sure how to access certain author data using this package. 1) how do I change a search "in title_abstract_only" to "in authors"? 2) How can I return Dimensions author ID numbers of the form "ur.[numeric string]"? I can only return publication ID numbers of the form "PUB.[numeric string]." Thank you!

formatting for dsApiRequest query not built with dsQueryBuild

Hi Massimo,

I recently submitted an issue for bibliometrix regarding difficulty uploading a dimensions csv file. As a workaround, I wanted to re-run the query myself using dimensionsR (to hopefully avoid the formatting issues with the existing files). However, my query asks for things that appear to be outside the current functionality of either dsQueryBuild or biblioshiny (specifically, searching for all publications from a given research org). I see that in #3 you mention that dsApiRequest can be used to run queries not built with dsQueryBuild. However, since the dsApiRequest documentation only gives an example using dsQueryBuild, I'm not sure about any special formatting or other requirements.

The query I ran in Dimensions is:

search publications where type != "preprint" and year in [2022 : 2023]
and research_orgs = "grid.25879.31" return publications[title + type +
doi + journal + publisher + date + year + open_access + authors + researchers
+ research_orgs + funders + category_for + category_sdg + altmetric + times_cited]

Do I simply assign this (in quotes) as query and pass it into dsApiRequest that way? Are any additional formatting or other changes required?

Thanks,
Kelly

dsApiRequest: make skip argument optional (error with facets)

When querying for facets, offset parameters, i.e. "skip" are not supported. However, dsApiRequest automatically adds skip.

E.g.
search publications where research_orgs in ["grid.10420.37"] and year>2021 return year limit 1000 skip 100
-> Semantic errors found: Offset is not supported for facet results

search publications where research_orgs in ["grid.10420.37"] and year>2021 return year limit 1000
-> works

Request: Optional arguments for other API endpoints

Hi Massimo,

We have other instances of the Dimensions API that we would like to access through dimensionsR. Could you add an optional argument to the necessary functions to specify endpoints other than app.dimensions.ai?

Thanks,

Sam

Possibility of using facets?

Dear Massimo

I love all the resources you/your group provide for bibliometric analyses - what a great resource. I am trying to filter my outputs using other fields in particular the Fields of Research codes in Dimensions but have not succeeded in returning this data from an API query.

The category_for is the Fields of Research codes is available for filtering the data on the online GUI, but I haven't figured out how to incorporate it into the query easily from R using the API. See more about the use of these codes - its related to concepts but seems to be more detailed. If the link doesn't work the title is "Recategorising research: Mapping from FoR 2008 to FoR 2020 in Dimensions"

The category_for field seems to be available as a facet (does that mean a one to many relationship per publication? I think so).

Can you figure out how to add facets to your query functions? I am struggling to get the escapes and quotations right.

Cheers

Chris Buddenhagen

Question about "categories" in query

Hi, thanks for the helpful package. I'd like to better understand how "category" in the dsQueryBuild command corresponds to the Field of Research in the Dimensions data. If, for example, we would like to pull all the articles for field of research == "46 Information and computing sciences" for a particular time range, is there a way to achieve that using this R package? I tried doing this and the number of articles obtained does not seem to match the number I get by specifying a field of research on the Dimensions website. Thank you!

dsApi2df for extracting more columns

Hello,
Thank you for this package. The dsApi2df function only returns selected columns. However, I have more nested data about funder information (funders, funder_countries) for clinical trial IDs. Could the function be adapted for extracting these data frames too.

Thank you again.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.