hublotr's Introduction

hublot

R package for accessing clHub.

Installation

To install the latest stable version of this package, run the following line in your R console:

devtools::install_github("clessn/hublotr")

Set your credentials in .Renviron

  1. Run usethis::edit_r_environ() in the R console
  2. Add the following lines to .Renviron:
# .Renviron
HUB3_URL      = "https://hublot.clessn.cloud/"
HUB3_USERNAME = "my.username"
HUB3_PASSWORD = "my.password"
  3. Restart R (Session -> Restart R, or Ctrl+Shift+F10)
  4. Type Sys.getenv("HUB3_USERNAME") in the R console to confirm that your username appears
  5. Done!
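
The confirmation step can also be scripted. Below is a minimal sketch in base R, assuming the three variable names shown in the .Renviron example; missing_hub_vars() is a hypothetical helper and not part of hublot.

```r
# Hypothetical helper (not part of hublot): returns the names of the
# expected environment variables that are unset or empty.
missing_hub_vars <- function(vars = c("HUB3_URL", "HUB3_USERNAME", "HUB3_PASSWORD")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

missing <- missing_hub_vars()
if (length(missing) > 0) {
  message("Missing in .Renviron: ", paste(missing, collapse = ", "))
}
```

An empty result means all three variables are visible to the current R session.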

Get your credentials from .Renviron

credentials <- hublot::get_credentials(
            Sys.getenv("HUB3_URL"), 
            Sys.getenv("HUB3_USERNAME"), 
            Sys.getenv("HUB3_PASSWORD")
            )

Snippets

Check that the latest version is installed

# Check that the latest version is installed; otherwise, raise an error
hublot::check_version()

# To raise only a warning instead
hublot::check_version(warn_only = TRUE)

Enter your login information

Enter your login information, which is stored in a "credentials" object:

credentials <- hublot::get_credentials("https://clhub.clessn.cloud/")

You will be prompted for your username and password on the command line, or in a dialog window if you are using RStudio. Alternatively, you can pass the values directly:

N.B.: DO NOT LEAVE YOUR USERNAME OR PASSWORD IN A GIT PROJECT.

credentials <- hublot::get_credentials("https://clhub.clessn.cloud/", "admin", "password")

N.B.: AN INCORRECT USERNAME OR PASSWORD IS NOT REPORTED UNTIL A FUNCTION IS FIRST USED.

Passing the values directly is useful when the login information is stored in an environment variable, which can then be retrieved as follows:

username <- Sys.getenv("HUB3_USERNAME")

Functions

hublot::list_tables(credentials) # returns the list of tables
# with the tidyjson package, these lists of lists can be converted to a tibble
tables <- tidyjson::spread_all(hublot::list_tables(credentials))

# suppose I have picked a table and want to extract data from it
my_table <- "clhub_tables_test_table"
hublot::count_table_items(my_table, credentials) # the total number of items in the table
# table items are paginated, generally 1000 at a time. To retrieve every item, request the pages one after another: start with the first page, then request the next one, until the page is NULL

page <- hublot::list_table_items(my_table, credentials) # get the first page and the information needed for the following pages
data <- list() # create an empty list to hold the data
repeat {
    data <- c(data, page$results)
    page <- hublot::list_next(page, credentials)
    if (is.null(page)) {
        break
    }
}
Dataframe <- tidyjson::spread_all(data) # now convert the data to a tibble
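
The pagination loop can be tried offline before pointing it at a real table. The sketch below uses mock pages and a stub in place of hublot::list_next(); the next_index field, mock_next() and collect_all() are assumptions made for illustration, not part of hublot.

```r
# Mock pages standing in for server responses (assumption: each page has a
# `results` list plus some pointer to the next page).
pages <- list(
  list(results = list(list(id = 1), list(id = 2)), next_index = 2),
  list(results = list(list(id = 3)), next_index = NULL)
)

# Stub playing the role of hublot::list_next(): returns the next page,
# or NULL when there is none.
mock_next <- function(page) {
  if (is.null(page$next_index)) NULL else pages[[page$next_index]]
}

# Generic helper: accumulate `results` until the "next" function returns NULL.
collect_all <- function(first_page, next_fn) {
  data <- list()
  page <- first_page
  while (!is.null(page)) {
    data <- c(data, page$results)
    page <- next_fn(page)
  }
  data
}

items <- collect_all(pages[[1]], mock_next)
length(items)
```

With a live connection, the same helper could presumably be called as collect_all(hublot::list_table_items(my_table, credentials), function(p) hublot::list_next(p, credentials)).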

Download a subset of the data using filters

The relevant functions:

hublot::filter_table_items(table_name, credentials, filter)
hublot::filter_next(page, credentials)
hublot::filter_previous(page, credentials)

A filter is a named list whose names determine the structure of the SQL query. A hublot filter is based on Django's QuerySet field lookup API. It is recommended to download a page or an item and inspect its structure before creating a filter.

Here are a few examples. Note that a lookup is separated from the column name by two underscores (__).

my_filter <- list(
    id=27,
    key__exact="potato",
    key__iexact="PoTaTo",
    key__contains="pota",
    key__icontains="POTA",
    key__in=c("potato", "tomato"),
    timestamps__gte="2020-01-01",
    timestamps__lte="2020-01-31",
    timestamps__gt="2020-01-01",
    timestamps__lt="2020-01-31",
    timestamps__range=c("2020-01-01", "2020-01-31"), # untested
    timestamps__year=2020, # untested
    timestamps__month=1, # untested
    timestamps__day=1, # untested
    timestamps__week_day=1, # untested (1=Sunday, 7=Saturday)
    key__regex="^potato" # untested
)
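
Because a filter is just a named list, its names can be assembled programmatically. The following is a minimal sketch in base R; make_lookup() is a hypothetical helper, not a hublot function, and the lookups used (icontains, gte, lt) are standard Django field lookups.

```r
# Hypothetical helper (not part of hublot): build a one-entry filter list
# named "<column>__<lookup>".
make_lookup <- function(column, lookup, value) {
  stats::setNames(list(value), paste0(column, "__", lookup))
}

# Combine a few lookups into a single filter.
my_filter <- c(
  make_lookup("key", "icontains", "pota"),
  make_lookup("timestamps", "gte", "2020-01-01"),
  make_lookup("timestamps", "lt", "2020-02-01")
)

names(my_filter)
```

The resulting list has the same shape as a hand-written filter and could be passed as the filter argument of hublot::filter_table_items().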

Add an item to a table

hublot::add_table_item(table_name,
        body = list(
            key = key,
            timestamps = "2020-01-01",
            data = jsonlite::toJSON(
                list(type = "potato", kind = "vegetable"), # JSON is built from lists (named for objects, unnamed for arrays)
                auto_unbox = TRUE # very important; otherwise JSON values are stored as one-element arrays (i.e. {"type": ["potato"], "kind": ["vegetable"]})
            )
        ),
        credentials
    )
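
The effect of auto_unbox can be checked locally with jsonlite alone, without contacting the server. This short sketch serializes the same list both ways:

```r
library(jsonlite)

item <- list(type = "potato", kind = "vegetable")

# Without auto_unbox, length-one vectors are serialized as JSON arrays.
boxed <- as.character(toJSON(item))

# With auto_unbox = TRUE, they become JSON scalars.
unboxed <- as.character(toJSON(item, auto_unbox = TRUE))

cat(boxed, "\n")    # {"type":["potato"],"kind":["vegetable"]}
cat(unboxed, "\n")  # {"type":"potato","kind":"vegetable"}
```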

Get the warehouse tables vs. the datamart tables

marts <- tidyjson::spread_all(
    hublot::filter_tables(credentials,
        list(metadata__contains=list(type="mart"))
    )
)

warehouses <- tidyjson::spread_all(
    hublot::filter_tables(credentials,
        list(metadata__contains=list(type="warehouse"))
    )
)

Upload a file

To upload a file, endpoints work a bit differently. You need to convert the json yourself (in this example, the metadata):

hublot::add_lake_item(body = list(
    key = "mylakeitem",
    path = "test/items",
    file = httr::upload_file("test_upload.txt"),
    metadata = jsonlite::toJSON(list(type = "text"), auto_unbox = TRUE)
), credentials)

Read a file

To read a file (for example a dictionary)

file_info <- hublot::retrieve_file("dictionnaire_LexicoderFR-enjeux", credentials)
dict <- read.csv(file_info$file)

Logging

hublot::log(app_id, "info", "Starting...", credentials)
hublot::log(app_id, "debug", "test123", credentials)
hublot::log(app_id, "warning", "this might be a problem later", credentials)
hublot::log(app_id, "error", "something went wrong", credentials)
hublot::log(app_id, "critical", "something went terribly wrong", credentials)
hublot::log(app_id, "success", "good! everything worked!", credentials)

hublotr's People

Contributors

judith-bourque, olichose123, p2xcode

hublotr's Issues

Prettify R source code

Issue

Not all code and documentation is indented and spaced in a similar way. It makes it hard to read here and there.

Proposed solution

Instead of doing the changes manually, prettify the source code with a function:

  • styler::style_pkg()
  • Validate changes

Document functions ‘count’ ‘filter_journals’ ‘filter_logs’ ‘handle_response’ as internal and not an export

Issue

── R CMD check results ────────────────────────────────────────────────────────────────────────────────────── hublot 1.6.0 ────
Duration: 6.2s

❯ checking for missing documentation entries ... WARNING
  Undocumented code objects:
    ‘count’ ‘filter_journals’ ‘filter_logs’ ‘handle_response’
  All user-level objects in a package should have documentation entries.
  See chapter ‘Writing R documentation files’ in the ‘Writing R
  Extensions’ manual.

Proposed solution

Document functions ‘count’ ‘filter_journals’ ‘filter_logs’ ‘handle_response’ as internal and not an export with @keywords internal
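
As an illustration of the proposal, a roxygen2 block for an internal function might look like the sketch below. The function body is a placeholder written for this example; the package's real handle_response() is not shown in the issue.

```r
#' Handle an HTTP response
#'
#' Placeholder body for illustration only; this is not the package's
#' actual implementation.
#'
#' @param response A response object.
#' @return The response, unchanged.
#' @keywords internal
handle_response <- function(response) {
  response
}
```

Leaving out the @export tag keeps the function out of the public API, while @keywords internal still produces a help page but hides it from the package index.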

Fix warnings and notes from R CMD check results

Issue

When I run devtools::check(), I get 3 warnings and 3 notes. 🥳

── R CMD check results ────────────────────────────────────────────────────────────────────────────────────── hublot 1.6.0 ────
Duration: 6.2s

❯ checking dependencies in R code ... WARNING
  '::' or ':::' import not declared from: ‘stringr’
  Namespace in Imports field not imported from: ‘tidyjson’
    All declared Imports should be used.

❯ checking for missing documentation entries ... WARNING
  Undocumented code objects:
    ‘count’ ‘filter_journals’ ‘filter_logs’ ‘handle_response’
  All user-level objects in a package should have documentation entries.
  See chapter ‘Writing R documentation files’ in the ‘Writing R
  Extensions’ manual.

❯ checking Rd \usage sections ... WARNING
  Undocumented arguments in documentation object 'add_file'
    ‘id’ ‘body’ ‘credentials’
  
  Undocumented arguments in documentation object 'add_lake_item'
    ‘body’ ‘credentials’
  
# ...
  
  Undocumented arguments in documentation object 'update_tag'
    ‘id’ ‘body’ ‘credentials’
  
  Functions with \usage entries need to have the appropriate \alias
  entries, and all their arguments documented.
  The \usage entries must correspond to syntactically valid R code.
  See chapter ‘Writing R documentation files’ in the ‘Writing R
  Extensions’ manual.

❯ checking DESCRIPTION meta-information ... NOTE
  Malformed Description field: should contain one or more complete sentences.

❯ checking top-level files ... NOTE
  Non-standard file/directory found at top level:
    ‘playzone.R’

❯ checking R code for possible problems ... NOTE
  check_version: no visible global function definition for
    ‘packageVersion’
  Undefined global functions or variables:
    packageVersion
  Consider adding
    importFrom("utils", "packageVersion")
  to your NAMESPACE file.

0 errors ✔ | 3 warnings ✖ | 3 notes ✖

Proposed solutions

Warnings

Notes

Some are related to #34, #39
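
For the packageVersion NOTE, one alternative to the suggested importFrom() line is to call the function with an explicit namespace prefix. The sketch below assumes that utils is also declared under Imports in DESCRIPTION; is_at_least() is a hypothetical helper, not the package's actual check_version().

```r
# Hypothetical version check (not the package's check_version()):
# the explicit utils:: prefix makes the function visible to R CMD check.
is_at_least <- function(pkg, minimum) {
  utils::packageVersion(pkg) >= minimum
}

is_at_least("stats", "3.0.0")
```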

Document arguments in most functions

Issue

── R CMD check results ────────────────────────────────────────────────────────────────────────────────────── hublot 1.6.0 ────
Duration: 6.2s

❯ checking Rd \usage sections ... WARNING
  Undocumented arguments in documentation object 'add_file'
    ‘id’ ‘body’ ‘credentials’
  
  Undocumented arguments in documentation object 'add_lake_item'
    ‘body’ ‘credentials’
  
# ...
  
  Undocumented arguments in documentation object 'update_tag'
    ‘id’ ‘body’ ‘credentials’
  
  Functions with \usage entries need to have the appropriate \alias
  entries, and all their arguments documented.
  The \usage entries must correspond to syntactically valid R code.
  See chapter ‘Writing R documentation files’ in the ‘Writing R
  Extensions’ manual.

Proposed solution

Namespace in Imports field not imported from: ‘tidyjson’

Issue

── R CMD check results ─────────────────────────────────────────────────────────────────────── hublot 1.6.5 ────
Duration: 9.3s

❯ checking dependencies in R code ... WARNING
  Namespace in Imports field not imported from: ‘tidyjson’
    All declared Imports should be used.

tidyjson is not used in the package. It is only used in the README examples:

tables <- tidyjson::spread_all(hublot::list_tables(credentials))

It is very useful to interpret the data from hublot, but should be included in the examples, not in the imports.

Proposed solution

Remove tidyjson from Imports in DESCRIPTION

Declare imports from stringr

Issue

── R CMD check results ────────────────────────────────────────────────────────────────────────────────────── hublot 1.6.0 ────
Duration: 6.2s

❯ checking dependencies in R code ... WARNING
  '::' or ':::' import not declared from: ‘stringr’

Need the hublot::update_file function to be implemented

To let researchers import 'persons' into hub2 or hublot from a file, I need the hublot::update_file function to be implemented, at the very least so that I can change a file's metadata.

The metadata field in question is "imported". If imported=true, the loader ignores the file. If imported=false, the loader takes the file and imports its contents into the 'persons' of hub2 or hublot (depending on another metadata field).

When the scraper runs, it will set imported <- TRUE. That way, the next time it runs it will ignore the file, unless a researcher updates the file's contents and sets imported <- FALSE again.
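
The "imported" flag logic can be sketched with mock data in base R. The record structure below (a key plus a metadata list) is an assumption for illustration and does not reflect the actual shape of hublot file objects.

```r
# Mock file records (assumption: each file carries a logical `imported`
# metadata field, as described in this issue).
files <- list(
  list(key = "persons_a", metadata = list(imported = TRUE)),
  list(key = "persons_b", metadata = list(imported = FALSE))
)

# Keep only the files that still need to be processed.
to_import <- Filter(function(f) !isTRUE(f$metadata$imported), files)

# After processing, flip the flag so that the next run skips these files.
processed <- lapply(to_import, function(f) {
  f$metadata$imported <- TRUE
  f
})

vapply(to_import, function(f) f$key, character(1))
```

Persisting the flipped flag on the server is the step that would need hublot::update_file.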

Thanks

Cannot upload audio file

When trying to upload a lake item with an audio file attached (see screenshot):
[Screenshot, 2022-09-21 at 14:10:30]

output.wav is 476 MB.
[Screenshot, 2022-09-21 at 14:11:25]

It gives this error message:
[Screenshot, 2022-09-21 at 14:07:55]

Generate NAMESPACE using roxygen2

Issue

Every time I run devtools::document(), I get the following warning message:

> document()
ℹ Updating hublot documentation
ℹ Loading hublot
Warning message:
Skipping NAMESPACE
✖ It already exists and was not generated by roxygen2. 

Namespace currently forces ALL functions to be exported, which means that internal functions are exported:

exportPattern("^[[:alpha:]]+")

Proposed solution

In R Packages, it's indicated that exportPattern(): exports all functions that match a pattern. We feel it’s safer to always use explicit exports and we avoid the use of this directive.

In the devtools workflow, the NAMESPACE file is not written by hand! Instead, we prefer to generate NAMESPACE with the roxygen2 package, using specific tags located in a roxygen comment above each function’s definition in the R/*.R files.

To follow best practices, it is suggested to remove the content of NAMESPACE and use devtools::document() to recreate it.

Potential dependency issue

With the current code, all functions are technically exports. If the proposed solution is applied, only the tagged exports will be exported. This could result in breaks with current dependencies. Then again, functions that aren't tagged for exports should only be for internal use.

This issue could be avoided by either

  • Validating that all untagged functions are indeed exported
  • Adding export tags to ALL functions
  • Checking for dependencies in current CLESSN repos

And if one's feeling particularly brave, may I suggest:

  • Messing around and finding out 😎

Reuse existing parameter description

Issue

#61

Proposed solution

To reuse arguments:

  • Use @inheritParams <package::function> to re-use documentation from a function in a separate package.
  • @inheritParams ${1:source}: Inherit argument documentation from another function. Only inherits documentation for arguments that aren't already documented locally. (source)
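
A minimal sketch of @inheritParams within a single package follows. Both functions are stubs invented for this example (the _stub suffix marks them as hypothetical); only the roxygen2 tags are the point.

```r
#' Get credentials
#'
#' Stub for illustration; not the package's real implementation.
#'
#' @param credentials A credentials object.
#' @return The credentials, unchanged.
get_credentials_stub <- function(credentials) credentials

#' List tables
#'
#' The `credentials` argument is documented once, in the function above,
#' and inherited here.
#'
#' @inheritParams get_credentials_stub
#' @return An empty list (stub).
list_tables_stub <- function(credentials) list()
```

Running devtools::document() on such code should copy the credentials description into the second help page.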

Create Rd files

Issue

Documentation for code can't be accessed by users using ?hublot::get_credentials.

Proposed solution

Produce the .Rd files using devtools::document()

Correct a parameter

Issue

When I ran devtools::document() in #25, I got the following warning:

Warning messages:
1: [util.R:229] @cursor is not a known tag 
2: Skipping NAMESPACE
It already exists and was not generated by roxygen2.

@cursor is found here:

#' @cursor Optional. The cursor to use for pagination. Defaults to NULL

Proposed solution

Rewrite it as a proper parameter:

#' @param cursor Optional. The cursor to use for pagination. Defaults to NULL

R version and package versions not up to date

Issue

When I opened the Rproject for branch #25, I got the following error message:

# Bootstrapping renv 0.15.4 --------------------------------------------------
* Downloading renv 0.15.4 ... OK
* Installing renv 0.15.4 ... Done!
* Successfully installed and loaded renv 0.15.4.
* Project '~/Dropbox/github/clessn/hublotr' loaded. [renv 0.15.4]
Warning message:
This project is configured to use R version '3.6.3', but '4.2.2' is currently being used. 
* The project library is out of sync with the lockfile.
* Use `renv::restore()` to install packages recorded in the lockfile.

With further inspection, here's what I learned

> renv::status()
The following package(s) are installed but not recorded in the lockfile:
             _
  tidyjson     [0.3.2]
  bit64        [4.0.5]
  progress     [1.2.2]
  tidyselect   [1.2.0]
  bit          [4.0.5]
  readr        [2.1.3]
  generics     [0.1.3]
  vroom        [1.6.1]
  dplyr        [1.0.10]
  hms          [1.1.2]
  tzdb         [0.3.0]
  tidyr        [1.2.1]
  assertthat   [0.2.1]

Use `renv::snapshot()` to add these packages to your lockfile.

The following package(s) are out of sync:

     Package   Lockfile Version   Library Version
        Rcpp              1.0.9            1.0.10
        brew              1.0-7             1.0-8
       bslib              0.4.0             0.4.2
       callr              3.7.1             3.7.3
         cli              3.3.0             3.6.0
  commonmark              1.8.0             1.8.1
       cpp11              0.4.2             0.4.3
      crayon              1.5.1             1.5.2
        curl              4.3.2             5.0.0
        desc              1.4.1             1.4.2
    devtools              2.4.4             2.4.5
      digest             0.6.29            0.6.31
    evaluate               0.15              0.20
       fansi              1.0.3             1.0.4
 fontawesome              0.3.0             0.4.0
        gert              1.6.0             1.9.2
          gh              1.3.0             1.3.1
    gitcreds              0.1.1             0.1.2
       highr                0.9              0.10
   htmltools              0.5.3             0.5.4
 htmlwidgets              1.5.4             1.6.1
      httpuv              1.6.5             1.6.8
        httr              1.4.2             1.4.4
    jsonlite              1.8.0             1.8.4
       knitr               1.39              1.41
   lifecycle              1.0.1             1.0.3
     openssl              2.0.0             2.0.5
      pillar              1.8.0             1.8.1
    pkgbuild              1.3.1             1.4.0
     pkgdown              2.0.6             2.0.7
     pkgload              1.3.0             1.3.2
    processx              3.7.0             3.8.0
          ps              1.7.1             1.7.2
       purrr              0.3.4             1.0.1
        ragg              1.2.2             1.2.5
       rlang              1.0.4             1.0.6
   rmarkdown               2.14              2.20
    roxygen2              7.2.1             7.2.3
  rstudioapi               0.13              0.14
   rversions              2.1.1             2.1.2
        sass              0.4.2             0.4.4
       shiny              1.7.2             1.7.4
     stringi              1.7.8            1.7.12
     stringr              1.4.0             1.5.0
         sys                3.4             3.4.1
    testthat              3.1.4             3.1.6
     tinytex               0.40              0.43
       vctrs              0.4.1             0.5.1
     whisker                0.4             0.4.1
        xfun               0.31              0.36
        yaml              2.3.5             2.3.6
         zip              2.2.0             2.2.2

Use `renv::snapshot()` to save the state of your library to the lockfile.
Use `renv::restore()` to restore your library from the lockfile.

Proposed solution

  • Use renv::snapshot() to add the packages to your lockfile.

batch_create_table_items: need one or more parameters to allow updating table rows whose key already exists

Ideally, batch_create_table_items would have a parameter, either:

  1. refresh_data = TRUE/FALSE, which, if TRUE, updates the table rows whose key already exists; or

  2. return_existing_keys = TRUE/FALSE, which, if TRUE, returns the keys (or ids) that could not be INSERTed because they already exist. That way, in the clessnverse wrapper, I can use update_table_item to update them individually.
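
On the client side, the second option amounts to partitioning rows by whether their key already exists. The sketch below shows that partition in base R with mock data; the variable names are hypothetical and nothing here calls hublot.

```r
# Keys already present in the table (mock data).
existing_keys <- c("a", "b", "c")

# Rows we want to write.
rows <- data.frame(
  key = c("b", "d", "e"),
  value = c(1, 2, 3),
  stringsAsFactors = FALSE
)

already_there <- rows$key %in% existing_keys
to_insert <- rows[!already_there, , drop = FALSE]  # candidates for a batch insert
to_update <- rows[already_there, , drop = FALSE]   # candidates for individual updates

to_insert$key
to_update$key
```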

Add license

Issue

When running devtools::check() for #25, got the following error message:

── R CMD check results ───────────────────────────────────────────────── hublot 1.5.0 ────
Duration: 522ms

❯ checking for file ‘hublot/DESCRIPTION’ ... ERROR
  Required field missing or empty:
    ‘License’

1 error ✖ | 0 warnings ✔ | 0 notes ✔

Proposed solution

Add license with usethis::use_mit_license()

Document functions

  • Add titles
  • Add parameter description with roxygen2
  • Combine related functions in Rd files
  • Add reusable credentials parameter

Please DELETE tables from hublot

Because of the table rename bug, please delete the following tables from hublot so that they disappear from the Django admin:

[clhub_tables_clhub_tables_mart_political_parties_press_releases_freqmart_political_parties_press_releases_freq] which was a copy/paste fuck-up: I tried to rename it, but that didn't work; I then tried to delete it, but got an error 500.

[clhub_tables_mart_vd_shiny_medias_prototype] which I tried to rename and now it's no longer accessible

Set up basic automated tests

Issue

To facilitate continuous integration or development, there should be automated tests.

Proposed solution

Set up GitHub Actions to run R CMD check.

For a basic test, use:

usethis::use_github_actions()

Eventually, tests for Linux, macOS, and Windows can be set up with

usethis::use_github_action_check_standard()

Remove `@export` tag from not implemented functions

Issue

Some functions aren't implemented, and their parameters aren't documented. This throws a warning during R CMD check, as shown in #61.

Proposed solution

For a quick fix, remove @export tag from functions that aren't implemented

Batch commit of rows in a table

When building a data mart of 1.5M rows, I realized that it takes approximately 1 second to write one row. This means that my data mart will be ready in about 17 days.

This is because my refiner reads the 'tweets' table from the data warehouse into a data frame, then it processes each row of the dataframe, enriches it, refines it and then writes it to hublot.

It would be faster if my refiner worked on the dataframe in memory and then pushed this dataframe in batch to an existing mart table in hublot, either in "overwrite mode" (overwrite the entire table) or in "append mode" (append the dataframe to the existing content of the table).

Can this be done?
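
The estimate and the batching idea can both be sketched in base R. split_batches() is a hypothetical helper written for this example; how each batch would actually be sent to hublot is not specified in this issue.

```r
# Back-of-the-envelope estimate: 1.5M rows at about 1 second per row.
rows_total <- 1.5e6
days <- rows_total * 1 / (60 * 60 * 24)
round(days, 1)

# Split a data frame into batches of at most `batch_size` rows.
split_batches <- function(df, batch_size) {
  groups <- ceiling(seq_len(nrow(df)) / batch_size)
  split(df, groups)
}

df <- data.frame(id = 1:2500)
batches <- split_batches(df, 1000)
vapply(batches, nrow, integer(1))
```

Each element of batches is an ordinary data frame, so a refiner could process the full table in memory and then write one batch per request instead of one row per request.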

Missing or unexported objects

Issue

── R CMD check results ─────────────────────────────────────────────────────────────────────── hublot 1.6.6 ────
Duration: 9.4s

❯ checking dependencies in R code ... WARNING
  Missing or unexported objects:
    ‘hublot::count’ ‘hublot::handle_response’

count() and handle_response() are probably internal functions that used to be exported before #34.

Proposed solution

Either:

  • Remove hublot prefix
  • Add the functions as exports with @keywords internal
