michalovadek / eurlex
An R package for retrieving official data on European Union laws and policies
Home Page: https://michalovadek.github.io/eurlex/
I am suddenly getting a weird problem after using the eurlex package for a few months. When I run the elx_fetch_data() function, I get the following error:
Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: 32001E0555
Reprex:
# Load package
library(eurlex)
# Run function
elx_fetch_data("32001E0555", "title")
Thanks for all your work. Amazing package.
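The error suggests curl is treating the bare CELEX number as a hostname. One possible workaround (a sketch only; whether elx_fetch_data() accepts a full Cellar URL as its first argument is an assumption) is to prefix the Cellar CELEX base so curl has a real host to resolve:

```r
# Sketch of a workaround for the "Could not resolve host" error: build a
# full Cellar URL from the CELEX number instead of passing the bare
# identifier. Assumes elx_fetch_data() accepts a URL as its first argument.
url <- paste0("http://publications.europa.eu/resource/celex/", "32001E0555")
# eurlex::elx_fetch_data(url, "title")
```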
On another note, is it possible to gather the CELEX IDs for all acts in a given directory code, e.g. CC = 18, which is Common Foreign and Security Policy, via REST instead of SPARQL?
returning the URI for each resource is wasteful
Like the EUR-Lex expert search, is it possible to add a directory code (CC) argument to the elx_make_query() function?
This would be incredibly useful for finding all legal acts in a larger policy area. For example, in tracking a country's EU defence policy, you would need to find all acts relating to Common Foreign and Security Policy (CC = 18).
On the expert search function of the EUR-Lex website, you are able to find EU legal acts by directory code, which is very useful for finding acts within larger areas, e.g. Common Foreign and Security Policy (CC = 18). I have attached a screenshot of this below.
EUR-Lex Expert Search: https://eur-lex.europa.eu/expert-search-form.html
It might be problematic to find a general option due to different resource types having different (often several) subject matter descriptors.
Not sure if this is possible without a clumsy workaround given the limitations of the underlying SPARQL syntax, but worth looking into.
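In the meantime, one possible workaround (a sketch only; the CDM property name for directory codes and the directory-code URI are assumptions that would need checking against the Cellar ontology) is to inject an extra triple pattern into the SPARQL string that elx_make_query() produces:

```r
# Sketch: append a directory-code restriction to a generated SPARQL query.
# NOTE: "cdm:resource_legal_is_about_concept_directory-code" is an ASSUMED
# property name -- verify it against the Cellar CDM ontology before use.
add_directory_filter <- function(query, cc_uri) {
  pattern <- sprintf(
    "?work cdm:resource_legal_is_about_concept_directory-code <%s>. }",
    cc_uri
  )
  # replace the final closing brace of the WHERE clause with the new triple
  sub("\\}\\s*$", pattern, query)
}

# query <- eurlex::elx_make_query(resource_type = "decision", include_celex = TRUE)
# query_cc18 <- add_directory_filter(query, "<hypothetical directory-code URI for CC = 18>")
# results <- eurlex::elx_run_query(query_cc18)
```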
Hi Michal,
Thanks for releasing v0.4.0, I updated R and eurlex and I am using it.
I recently used elx_download_xml and I wanted to suggest some improvements:
- The notice argument could be validated, e.g. stopifnot("notice type must be correctly specified" = notice %in% c("tree", "branch", "object")).
- (This is more of an issue.) file = basename(url) could be file = paste0(basename(url), ".xml"), so the downloaded file gets an .xml extension.
- When "object" is passed to notice, the object expression notice is retrieved (p. 44 of the Cellar documentation); however, this does not contain metadata. I'd suggest dropping the language header and appending ?language= to the end of the URL when "object" is passed (p. 42 of Cellar), so that the object notice with the object metadata is retrieved.
- With include_authors = TRUE, it could help to use (group_concat(distinct ?author_;separator=", ") as ?author) in the SELECT statement and OPTIONAL{?work cdm:work_created_by_agent ?author_.} in the WHERE statement of the SPARQL query. The URI would still be inside, but I see cleaning it afterwards as less of an issue. This would help avoid duplicated works when running queries.
What do you think about these?
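The author-aggregation suggestion can be sketched as a SPARQL string built in R (the SELECT and OPTIONAL fragments are taken from the suggestion itself; the surrounding query skeleton is a simplified assumption):

```r
# Sketch of the suggested author aggregation: one row per work, with all
# authors collapsed into a single comma-separated column via group_concat,
# so multi-author works no longer produce duplicated rows.
build_author_query <- function() {
  paste(
    "PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>",
    "SELECT ?work (group_concat(distinct ?author_;separator=\", \") as ?author)",
    "WHERE {",
    "  ?work a cdm:work .",
    "  OPTIONAL { ?work cdm:work_created_by_agent ?author_ . }",
    "}",
    "GROUP BY ?work",
    sep = "\n"
  )
}
cat(build_author_query())
```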
All the best
elx_label_eurovoc is bugged because it currently doesn't take into account the server-imposed limit of 200 URIs per request
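Until that is fixed, a possible client-side workaround (a sketch; the 200-URI ceiling comes from the note above, and whether elx_label_eurovoc accepts a character vector and returns a row-bindable table is an assumption) is to batch the URIs:

```r
# Sketch: split a vector of EuroVoc URIs into chunks of at most 200
# (the server-imposed limit noted above) and label each chunk separately.
# label_fun is assumed to take a character vector and return a data frame.
label_in_batches <- function(uris, label_fun, batch_size = 200) {
  chunks <- split(uris, ceiling(seq_along(uris) / batch_size))
  do.call(rbind, lapply(chunks, label_fun))
}

# labels <- label_in_batches(eurovoc_uris, eurlex::elx_label_eurovoc)
```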
return prefLabel or equivalent when include_directory = TRUE
it should be possible to return document titles via SPARQL queries, but need to move from WORK to EXPRESSION (language)
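The WORK-to-EXPRESSION move described above could look like the following raw SPARQL sketch (the CDM property names are assumptions drawn from the Cellar ontology and would need verifying; the point is that titles sit on the language-specific EXPRESSION, not on the WORK):

```r
# Sketch: retrieve English titles by going through the EXPRESSION level.
# Property names (expression_belongs_to_work, expression_uses_language,
# expression_title) are ASSUMPTIONS -- check them against the CDM ontology.
title_query <- paste(
  "PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>",
  "SELECT ?work ?title WHERE {",
  "  ?work a cdm:regulation .",
  "  ?expr cdm:expression_belongs_to_work ?work .",
  "  ?expr cdm:expression_uses_language <http://publications.europa.eu/resource/authority/language/ENG> .",
  "  ?expr cdm:expression_title ?title .",
  "}",
  sep = "\n"
)
# results <- eurlex::elx_run_query(title_query)  # untested sketch
```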
Is it also possible to access the Council votes on legislative acts?
Dear Michal,
Thank you for developing such a useful package, writing useful and clear documentation, and also congratulations on the very interesting article published in Political Research Exchange.
I tried your package and noticed that when I run a large query, the results are limited to 10e6 rows. Is there a way to resolve this limit?
A reproducible example is provided here:
"
library(eurlex)
library(dplyr)
library(ggplot2)
legal <- elx_make_query(resource_type = "any", sector = 3,
include_celex = TRUE, include_force = TRUE,
include_date = TRUE, include_date_force = TRUE,
include_date_endvalid = TRUE, include_eurovoc = TRUE,
include_directory = TRUE, include_citations = TRUE) %>%
elx_run_query()
preparatory <- elx_make_query(resource_type = "any", sector = 5,
include_celex = TRUE, include_date = TRUE,
include_eurovoc = TRUE, include_directory = TRUE,
include_citations = TRUE) %>%
elx_run_query()
dat <- as_tibble(data.frame(X=rep(0, 16000000),y=rep(0, 16000000),z=rep(0, 16000000)))
"
I provide you also with the sessionInfo output
"
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 LC_MONETARY=Italian_Italy.1252
[4] LC_NUMERIC=C LC_TIME=Italian_Italy.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_3.3.3 dplyr_1.0.5 eurlex_0.3.5 RevoUtils_11.0.2 RevoUtilsMath_11.0.0
loaded via a namespace (and not attached):
[1] rstudioapi_0.11 xml2_1.3.2 magrittr_1.5 tidyselect_1.1.0 munsell_0.5.0 colorspace_2.0-0
[7] R6_2.4.1 rlang_0.4.10 httr_1.4.1 tools_4.0.2 grid_4.0.2 gtable_0.3.0
[13] withr_2.2.0 ellipsis_0.3.1 digest_0.6.25 tibble_3.0.2 lifecycle_1.0.0 crayon_1.3.4
[19] farver_2.1.0 tidyr_1.1.3 purrr_0.3.4 vctrs_0.3.7 curl_4.3 glue_1.4.1
[25] compiler_4.0.2 pillar_1.4.6 generics_0.1.0 scales_1.1.1 pkgconfig_2.0.3
"
Another very useful feature would be the possibility to define a start date and an end date for the query.
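Until such arguments exist, a client-side workaround (a sketch; it assumes the query was built with include_date = TRUE, so the result contains a "date" column in ISO format) is to filter the returned table:

```r
library(dplyr)

# Sketch: restrict results to a date window after retrieval.
# Assumes a "date" column in ISO format (e.g. "2016-06-01") is present,
# which include_date = TRUE in elx_make_query() is expected to provide.
filter_by_date <- function(df, start, end) {
  df %>%
    mutate(date = as.Date(date)) %>%
    filter(date >= as.Date(start), date <= as.Date(end))
}

# results <- elx_make_query(resource_type = "any", sector = 3,
#                           include_celex = TRUE, include_date = TRUE) %>%
#   elx_run_query()
# results_window <- filter_by_date(results, "2015-01-01", "2019-12-31")
```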
Thank you once again.
Best regards.
Dear Michal,
I use the eurlex package on a regular basis to extract EU policy documents, with the purpose of mapping terms related to the UN 2030 SDGs in these EU policy documents.
I use elx_fetch_data to batch download the raw text of the documents.
I would like to propose two enhancements for this function:
Rather than returning just the 'out' for the requested resource type, the function could return both the 'out' and the HTTP status code of the request: a named list where the first element is 'out' and the second the HTTP code.
This would make it easy to check whether a resource was not retrieved, which is useful when dealing with a large number of documents.
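A sketch of the proposed return shape, using httr (the Cellar URL pattern and Accept header here are illustrative assumptions, not the package's internals):

```r
library(httr)

# Sketch: fetch a resource and return both the body and the HTTP status
# code, so failures are easy to flag in a batch run. The URL pattern and
# Accept header are ASSUMPTIONS for illustration only.
fetch_with_status <- function(celex) {
  url <- paste0("http://publications.europa.eu/resource/celex/", celex)
  resp <- GET(url, add_headers(Accept = "application/xml; notice=object"))
  list(out = content(resp, as = "text", encoding = "UTF-8"),
       status = status_code(resp))
}

# res <- fetch_with_status("32001E0555")
# if (res$status != 200) warning("resource not retrieved")
```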
Insert the document XML notice among the resource type options.
This could be a useful and efficient way to get a plethora of information for each document. The XML could then be parsed locally to extract data of interest, like directory codes, the subject matter, the instruments cited, related documents, etc.
In many cases it might be easier and faster to work with the XML notice than to develop and run a (complex) SPARQL query.
For an easier implementation, this option could ignore the language parameters, so that one would get the document XML notice the same way it is obtained from EUR-Lex.
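Parsing such a notice locally could look like the following sketch (xml2 is a real package; the node names searched for are assumptions about the notice schema and should be checked against an actual notice, e.g. with xml_structure()):

```r
library(xml2)

# Sketch: extract selected fields from a downloaded Cellar XML notice.
# The node names below are ASSUMPTIONS about the notice schema --
# inspect a real notice before relying on them.
parse_notice <- function(path) {
  doc <- read_xml(path)
  list(
    directory_codes = xml_text(xml_find_all(doc, ".//DIRECTORY_CODE//IDENTIFIER")),
    cited_works     = xml_text(xml_find_all(doc, ".//WORK_CITES_WORK//IDENTIFIER"))
  )
}
```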
What do you think about these enhancements, would it be difficult to implement them?
Once again, many thanks for developing and releasing such a useful and easy-to-use package.
Many thanks and have a nice day.
Best
provide options for alternative identifiers, in particular the Official Journal number. Many documents are not CELEX-indexed (especially preparatory documents, e.g. COM proposals, and sector 5 more generally)
First of all, thanks for this great project.
I wonder if there is any way of getting summaries of regulation documents using the elx_fetch_data function?
Mysterious error which appears and disappears every other call to elx_run_query
in the latest iterations of EUR-Lex there seems to be an increasing focus on event data. It would be useful to be able to retrieve these, but it is likely to require a completely new function and a new type of SPARQL query
E.g.
# the pipe and str_replace() require these packages
library(eurlex)
library(magrittr)
library(stringr)

paste("http://publications.europa.eu/resource/celex/", "32019H1115(01)", sep = "") %>%
  str_replace("\\(", "%28") %>%
  str_replace("\\)", "%29") %>%
  elx_fetch_data(., "title")