michalovadek / eurlex
An R package for retrieving official data on European Union laws and policies
Home Page: https://michalovadek.github.io/eurlex/
I am suddenly getting a weird problem after using the eurlex package for a few months. When I run the elx_fetch_data() function, I get the following error:
Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: 32001E0555
Reprex:
# Load package
library(eurlex)
# Run function
elx_fetch_data("32001E0555", "title")
Thanks for all your work. Amazing package.
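The error suggests curl is treating the bare CELEX number as a hostname. One possible workaround (a sketch only; whether elx_fetch_data() accepts a full Cellar URL as its first argument is an assumption) is to prefix the Cellar CELEX base so curl has a real host to resolve:

```r
# Sketch of a workaround for the "Could not resolve host" error: build a
# full Cellar URL from the CELEX number instead of passing the bare
# identifier. Assumes elx_fetch_data() accepts a URL as its first argument.
url <- paste0("http://publications.europa.eu/resource/celex/", "32001E0555")
# eurlex::elx_fetch_data(url, "title")
```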
On another note, is it possible to gather the CELEX IDs for all acts in a given directory code, e.g. CC = 18, which is Common Foreign and Security Policy, via REST instead of SPARQL?
returning the URI for each resource is wasteful
Like the EUR-Lex expert search, is it possible to add a directory code (CC) argument to the elx_make_query() function?
This would be incredibly useful for finding all legal acts in a larger policy area. For example, in tracking a country's EU defence policy, you would need to find all acts relating to Common Foreign and Security Policy (CC = 18).
On the expert search function of the EUR-Lex website, you are able to find EU legal acts by directory code, which is very useful for finding acts within larger areas, e.g. Common Foreign and Security Policy (CC = 18). I have attached a screenshot of this below.
EUR-Lex Expert Search: https://eur-lex.europa.eu/expert-search-form.html
It might be problematic to find a general option due to different resource types having different (often several) subject matter descriptors.
Not sure if this is possible without a clumsy workaround given the limitations of the underlying SPARQL syntax, but worth looking into.
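In the meantime, one possible workaround (a sketch only; the CDM property name for directory codes and the directory-code URI are assumptions that would need checking against the Cellar ontology) is to inject an extra triple pattern into the SPARQL string that elx_make_query() produces:

```r
# Sketch: append a directory-code restriction to a generated SPARQL query.
# NOTE: "cdm:resource_legal_is_about_concept_directory-code" is an ASSUMED
# property name -- verify it against the Cellar CDM ontology before use.
add_directory_filter <- function(query, cc_uri) {
  pattern <- sprintf(
    "?work cdm:resource_legal_is_about_concept_directory-code <%s>. }",
    cc_uri
  )
  # replace the final closing brace of the WHERE clause with the new triple
  sub("\\}\\s*$", pattern, query)
}

# query <- eurlex::elx_make_query(resource_type = "decision", include_celex = TRUE)
# query_cc18 <- add_directory_filter(query, "<hypothetical directory-code URI for CC = 18>")
# results <- eurlex::elx_run_query(query_cc18)
```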
Hi Michal,
Thanks for releasing v0.4.0, I updated R and eurlex and I am using it.
I recently used elx_download_xml and I wanted to suggest some improvements:
- The notice argument could be validated, e.g. stopifnot("notice type must be correctly specified" = notice %in% c("tree", "branch", "object")).
- (This is more of an issue.) file = basename(url) could be file = paste0(basename(url), ".xml"), so the downloaded file gets an .xml extension.
- When "object" is passed to notice, the object expression notice is retrieved (p. 44 of the Cellar documentation); however, this does not contain metadata. I'd suggest dropping the language header and appending ?language= to the end of the URL when "object" is passed (p. 42 of Cellar), so that the object notice with the object metadata is retrieved.
- With include_authors = TRUE, it could help to use (group_concat(distinct ?author_;separator=", ") as ?author) in the SELECT statement and OPTIONAL{?work cdm:work_created_by_agent ?author_.} in the WHERE statement of the SPARQL query. The URI would still be inside, but I see cleaning it afterwards as less of an issue. This would help avoid duplicated works when running queries.
What do you think about these?
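The author-aggregation suggestion can be sketched as a SPARQL string built in R (the SELECT and OPTIONAL fragments are taken from the suggestion itself; the surrounding query skeleton is a simplified assumption):

```r
# Sketch of the suggested author aggregation: one row per work, with all
# authors collapsed into a single comma-separated column via group_concat,
# so multi-author works no longer produce duplicated rows.
build_author_query <- function() {
  paste(
    "PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>",
    "SELECT ?work (group_concat(distinct ?author_;separator=\", \") as ?author)",
    "WHERE {",
    "  ?work a cdm:work .",
    "  OPTIONAL { ?work cdm:work_created_by_agent ?author_ . }",
    "}",
    "GROUP BY ?work",
    sep = "\n"
  )
}
cat(build_author_query())
```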
All the best
elx_label_eurovoc is bugged because it currently doesn't take into account the server-imposed limit of 200 URIs per request
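Until that is fixed, a possible client-side workaround (a sketch; the 200-URI ceiling comes from the note above, and whether elx_label_eurovoc accepts a character vector and returns a row-bindable table is an assumption) is to batch the URIs:

```r
# Sketch: split a vector of EuroVoc URIs into chunks of at most 200
# (the server-imposed limit noted above) and label each chunk separately.
# label_fun is assumed to take a character vector and return a data frame.
label_in_batches <- function(uris, label_fun, batch_size = 200) {
  chunks <- split(uris, ceiling(seq_along(uris) / batch_size))
  do.call(rbind, lapply(chunks, label_fun))
}

# labels <- label_in_batches(eurovoc_uris, eurlex::elx_label_eurovoc)
```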
return prefLabel or equivalent when include_directory = TRUE
it should be possible to return document titles via SPARQL queries, but need to move from WORK to EXPRESSION (language)
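The WORK-to-EXPRESSION move described above could look like the following raw SPARQL sketch (the CDM property names are assumptions drawn from the Cellar ontology and would need verifying; the point is that titles sit on the language-specific EXPRESSION, not on the WORK):

```r
# Sketch: retrieve English titles by going through the EXPRESSION level.
# Property names (expression_belongs_to_work, expression_uses_language,
# expression_title) are ASSUMPTIONS -- check them against the CDM ontology.
title_query <- paste(
  "PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>",
  "SELECT ?work ?title WHERE {",
  "  ?work a cdm:regulation .",
  "  ?expr cdm:expression_belongs_to_work ?work .",
  "  ?expr cdm:expression_uses_language <http://publications.europa.eu/resource/authority/language/ENG> .",
  "  ?expr cdm:expression_title ?title .",
  "}",
  sep = "\n"
)
# results <- eurlex::elx_run_query(title_query)  # untested sketch
```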
Is it also possible to access the Council votes on legislative acts?
Dear Michal,
Thank you for developing such a useful package, writing useful and clear documentation, and also congratulations on the very interesting article published in Political Research Exchange.
I tried your package and noticed that when I run a large query, the results are limited to 10e6 rows. Is there a way to resolve this limit?
A reproducible example is provided here:
"
library(eurlex)
library(dplyr)
library(ggplot2)
legal <- elx_make_query(resource_type = "any", sector = 3,
include_celex = TRUE, include_force = TRUE,
include_date = TRUE, include_date_force = TRUE,
include_date_endvalid = TRUE, include_eurovoc = TRUE,
include_directory = TRUE, include_citations = TRUE) %>%
elx_run_query()
preparatory <- elx_make_query(resource_type = "any", sector = 5,
include_celex = TRUE, include_date = TRUE,
include_eurovoc = TRUE, include_directory = TRUE,
include_citations = TRUE) %>%
elx_run_query()
dat <- as_tibble(data.frame(X=rep(0, 16000000),y=rep(0, 16000000),z=rep(0, 16000000)))
"
I provide you also with the sessionInfo output
"
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 LC_MONETARY=Italian_Italy.1252
[4] LC_NUMERIC=C LC_TIME=Italian_Italy.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_3.3.3 dplyr_1.0.5 eurlex_0.3.5 RevoUtils_11.0.2 RevoUtilsMath_11.0.0
loaded via a namespace (and not attached):
[1] rstudioapi_0.11 xml2_1.3.2 magrittr_1.5 tidyselect_1.1.0 munsell_0.5.0 colorspace_2.0-0
[7] R6_2.4.1 rlang_0.4.10 httr_1.4.1 tools_4.0.2 grid_4.0.2 gtable_0.3.0
[13] withr_2.2.0 ellipsis_0.3.1 digest_0.6.25 tibble_3.0.2 lifecycle_1.0.0 crayon_1.3.4
[19] farver_2.1.0 tidyr_1.1.3 purrr_0.3.4 vctrs_0.3.7 curl_4.3 glue_1.4.1
[25] compiler_4.0.2 pillar_1.4.6 generics_0.1.0 scales_1.1.1 pkgconfig_2.0.3
"
Another very useful feature would be the possibility to define a start date and an end date for the query.
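Until such arguments exist, a client-side workaround (a sketch; it assumes the query was built with include_date = TRUE, so the result contains a "date" column in ISO format) is to filter the returned table:

```r
library(dplyr)

# Sketch: restrict results to a date window after retrieval.
# Assumes a "date" column in ISO format (e.g. "2016-06-01") is present,
# which include_date = TRUE in elx_make_query() is expected to provide.
filter_by_date <- function(df, start, end) {
  df %>%
    mutate(date = as.Date(date)) %>%
    filter(date >= as.Date(start), date <= as.Date(end))
}

# results <- elx_make_query(resource_type = "any", sector = 3,
#                           include_celex = TRUE, include_date = TRUE) %>%
#   elx_run_query()
# results_window <- filter_by_date(results, "2015-01-01", "2019-12-31")
```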
Thank you once again.
Best regards.
Dear Michal,
I use the eurlex package on a regular basis to extract EU policy documents, with the purpose of mapping terms related to the UN 2030 SDGs in these EU policy documents.
I use elx_fetch_data to batch download the raw text of the documents.
I would like to propose two enhancements for this function:
Rather than returning just the 'out' for the requested resource type, the function could return both the 'out' and the HTTP status code of the request: a named list where the first element is 'out' and the second the HTTP code.
This would make it easy to check whether a resource was not retrieved, which is useful when dealing with a large number of documents.
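A sketch of the proposed return shape, using httr (the Cellar URL pattern and Accept header here are illustrative assumptions, not the package's internals):

```r
library(httr)

# Sketch: fetch a resource and return both the body and the HTTP status
# code, so failures are easy to flag in a batch run. The URL pattern and
# Accept header are ASSUMPTIONS for illustration only.
fetch_with_status <- function(celex) {
  url <- paste0("http://publications.europa.eu/resource/celex/", celex)
  resp <- GET(url, add_headers(Accept = "application/xml; notice=object"))
  list(out = content(resp, as = "text", encoding = "UTF-8"),
       status = status_code(resp))
}

# res <- fetch_with_status("32001E0555")
# if (res$status != 200) warning("resource not retrieved")
```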
Insert the document XML notice among the resource type options.
This could be a useful and efficient way to get a plethora of information for each document. The XML could then be parsed locally to extract data of interest, like directory codes, the subject matter, the instruments cited, related documents, etc.
In many cases it might be easier and faster to work with the XML notice than to develop and run a (complex) SPARQL query.
For an easier implementation, this option could ignore the language parameters, so that one would get the document XML notice the same way it is obtained from EUR-Lex.
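Parsing such a notice locally could look like the following sketch (xml2 is a real package; the node names searched for are assumptions about the notice schema and should be checked against an actual notice, e.g. with xml_structure()):

```r
library(xml2)

# Sketch: extract selected fields from a downloaded Cellar XML notice.
# The node names below are ASSUMPTIONS about the notice schema --
# inspect a real notice before relying on them.
parse_notice <- function(path) {
  doc <- read_xml(path)
  list(
    directory_codes = xml_text(xml_find_all(doc, ".//DIRECTORY_CODE//IDENTIFIER")),
    cited_works     = xml_text(xml_find_all(doc, ".//WORK_CITES_WORK//IDENTIFIER"))
  )
}
```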
What do you think about these enhancements, would it be difficult to implement them?
Once again, many thanks for developing and releasing such a useful and easy-to-use package.
Many thanks and have a nice day.
Best
provide options for alternative identifiers, in particular the Official Journal number. Many documents are not CELEX-indexed (especially preparatory documents, e.g. COM proposals, and sector 5 more generally)
First of all, thanks for this great project.
I wonder if there is any way of getting summaries of regulation documents using the elx_fetch_data function?
Mysterious error which appears and disappears every other call to elx_run_query
in the latest iterations of EUR-Lex there seems to be an increasing focus on event data. It would be useful to be able to retrieve these, but it is likely to require a completely new function and a new type of SPARQL query
E.g.
# the pipe and str_replace() require these packages
library(eurlex)
library(magrittr)
library(stringr)

paste("http://publications.europa.eu/resource/celex/", "32019H1115(01)", sep = "") %>%
  str_replace("\\(", "%28") %>%
  str_replace("\\)", "%29") %>%
  elx_fetch_data(., "title")