{linkrot}

Project Status: Concept. Minimal or no implementation has been done yet, or the repository is only intended to be a limited example, demo, or proof-of-concept.

An R package to help detect linkrot, which is when links on a web page break because the pages they point to have been taken down or moved.

Very much a concept. I wrote it to detect linkrot on my personal blog and it works for my needs. Feel free to contribute.

Install

This package is only available on GitHub. Install from an R session with:

install.packages("remotes")
remotes::install_github("matt-dray/linkrot")

Example

Pass a webpage URL to detect_rot() and get a tibble with each link on that page and what its response status code is (ideally we want 200).

Here’s a check on one of my older blog posts. The printout tells you the URL of the page being checked, with a period printed for each link checked on it.

library(linkrot)
page <- "https://www.rostrum.blog/2018/04/14/r-trek-exploring-stardates/"
rot_page <- detect_rot(page)
#> Checking <https://www.rostrum.blog/2018/04/14/r-trek-exploring-stardates/> ..............................
rot_page
#> # A tibble: 30 x 6
#>    page    link_url    link_text response_code response_catego… response_success
#>    <chr>   <chr>       <chr>             <dbl> <chr>            <lgl>           
#>  1 https:… https://ww… R statis…           200 Success          TRUE            
#>  2 https:… https://en… Star Tre…           200 Success          TRUE            
#>  3 https:… http://www… Star Tre…           200 Success          TRUE            
#>  4 https:… https://gi… regex               200 Success          TRUE            
#>  5 https:… http://vit… tidy                200 Success          TRUE            
#>  6 https:… https://en… Wikipedia           200 Success          TRUE            
#>  7 https:… http://sel… Selector…           200 Success          TRUE            
#>  8 https:… https://cr… how-to v…           404 Client error     FALSE           
#>  9 https:… https://ww… htmlwidg…           200 Success          TRUE            
#> 10 https:… https://gi… ggsci               200 Success          TRUE            
#> # … with 20 more rows

Uh oh, at least one is broken: it has a response_code of 404.
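To pull out only the broken links, you could subset on the response_success column. This is a minimal sketch in base R, not a feature of the package: the rot_page data frame below is a hand-made stand-in for the output of detect_rot(), and its URLs and link text are placeholders rather than real results.

```r
# Hand-made stand-in for the tibble returned by detect_rot().
# The URLs and link text are placeholders, not real check results.
rot_page <- data.frame(
  link_url = c(
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c"
  ),
  link_text = c("first", "second", "third"),
  response_code = c(200, 404, 200),
  response_success = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only the rows where the check did not succeed.
broken <- rot_page[
  !rot_page$response_success,
  c("link_url", "link_text", "response_code")
]
broken
```

The same subsetting should work on a real result, since the printout above shows response_success as a logical column.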

You could iterate over multiple pages with {purrr}:

pages <- c(
  "https://www.rostrum.blog/2018/04/14/r-trek-exploring-stardates/",
  "https://www.rostrum.blog/2018/04/27/two-dogs-in-toilet-elderly-lady-involved/",
  "https://www.rostrum.blog/2018/05/19/pokeballs-in-super-smash-bros/"
)

library(purrr)
rot_pages <- set_names(map(pages, detect_rot), basename(pages))
#> Checking <https://www.rostrum.blog/2018/04/14/r-trek-exploring-stardates/> ..............................
#> Checking <https://www.rostrum.blog/2018/04/27/two-dogs-in-toilet-elderly-lady-involved/> ........................................
#> Checking <https://www.rostrum.blog/2018/05/19/pokeballs-in-super-smash-bros/> .....................
rot_pages
#> $`r-trek-exploring-stardates`
#> # A tibble: 30 x 6
#>    page    link_url    link_text response_code response_catego… response_success
#>    <chr>   <chr>       <chr>             <dbl> <chr>            <lgl>           
#>  1 https:… https://ww… R statis…           200 Success          TRUE            
#>  2 https:… https://en… Star Tre…           200 Success          TRUE            
#>  3 https:… http://www… Star Tre…           200 Success          TRUE            
#>  4 https:… https://gi… regex               200 Success          TRUE            
#>  5 https:… http://vit… tidy                200 Success          TRUE            
#>  6 https:… https://en… Wikipedia           200 Success          TRUE            
#>  7 https:… http://sel… Selector…           200 Success          TRUE            
#>  8 https:… https://cr… how-to v…           404 Client error     FALSE           
#>  9 https:… https://ww… htmlwidg…           200 Success          TRUE            
#> 10 https:… https://gi… ggsci               200 Success          TRUE            
#> # … with 20 more rows
#> 
#> $`two-dogs-in-toilet-elderly-lady-involved`
#> # A tibble: 40 x 6
#>    page     link_url   link_text response_code response_catego… response_success
#>    <chr>    <chr>      <chr>             <dbl> <chr>            <lgl>           
#>  1 https:/… https://w… @mattdray           200 Success          TRUE            
#>  2 https:/… https://d… the Lond…           200 Success          TRUE            
#>  3 https:/… https://g… the sf p…           200 Success          TRUE            
#>  4 https:/… https://r… interact…           200 Success          TRUE            
#>  5 https:/… https://e… eastings…           200 Success          TRUE            
#>  6 https:/… https://e… latitude            200 Success          TRUE            
#>  7 https:/… https://e… longitude           200 Success          TRUE            
#>  8 https:/… https://r… leaflet             200 Success          TRUE            
#>  9 https:/… https://w… R                   200 Success          TRUE            
#> 10 https:/… https://g… sf (‘sim…           200 Success          TRUE            
#> # … with 30 more rows
#> 
#> $`pokeballs-in-super-smash-bros`
#> # A tibble: 21 x 6
#>    page    link_url    link_text response_code response_catego… response_success
#>    <chr>   <chr>       <chr>             <dbl> <chr>            <lgl>           
#>  1 https:… https://en… Super Sm…           200 Success          TRUE            
#>  2 https:… https://en… Super Sm…           400 Client error     FALSE           
#>  3 https:… https://en… SSB Mele…           200 Success          TRUE            
#>  4 https:… https://en… SSB Braw…           200 Success          TRUE            
#>  5 https:… https://en… SSB ‘4’,…           200 Success          TRUE            
#>  6 https:… https://ww… a series…           200 Success          TRUE            
#>  7 https:… https://en… the Supe…           200 Success          TRUE            
#>  8 https:… https://en… Zelda               200 Success          TRUE            
#>  9 https:… https://en… EarthBou…           200 Success          TRUE            
#> 10 https:… https://en… the Poké…           400 Client error     FALSE           
#> # … with 11 more rows

Uh-oh, more broken links: this time a few with a response_code of 400.
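With a named list of results like rot_pages, it can be handy to stack everything into one data frame and count the broken links per post. The sketch below uses base R only, so it runs without extra packages. The list is a hand-made stand-in with placeholder post names and URLs, and the post column is a name introduced here for illustration.

```r
# Hand-made stand-in for a named list of detect_rot() results.
# Post names and URLs are placeholders, not real check results.
rot_pages <- list(
  `post-one` = data.frame(
    link_url = c("https://example.com/a", "https://example.com/b"),
    response_code = c(200, 404),
    response_success = c(TRUE, FALSE),
    stringsAsFactors = FALSE
  ),
  `post-two` = data.frame(
    link_url = "https://example.com/c",
    response_code = 400,
    response_success = FALSE,
    stringsAsFactors = FALSE
  )
)

# Tag each result with the name of the post it came from, then stack them.
tagged <- Map(
  function(df, post) cbind(post = post, df),
  rot_pages,
  names(rot_pages)
)
all_links <- do.call(rbind, tagged)
rownames(all_links) <- NULL

# Keep the failed checks and count them per post.
broken <- all_links[!all_links$response_success, ]
table(broken$post)
```

If you already have {dplyr} loaded, bind_rows() with its .id argument is another way to do the stacking step.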

Code of Conduct

Please note that the {linkrot} project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
