Comments (24)

realauggieheschmeyer avatar realauggieheschmeyer commented on August 23, 2024 2

I dug around into the source code for the crypto_history function which led me to the scraper function and I believe that is where the error lies. Within your code, you have:

table   <- rvest::html_nodes(page, css = "table") %>% .[1] %>%
    rvest::html_table(fill = TRUE) %>%
    replace(!nzchar(.), NA)

I believe the error lies in pulling the first object within the output of html_nodes. It looks like the way that Coin Market Cap is now set up stores the historical data within the third object of html_nodes. When I copied the code and ran the following...

table   <- rvest::html_nodes(page, css = "table") %>% .[3] %>%
    rvest::html_table(fill = TRUE) %>%
    replace(!nzchar(.), NA)

I was able to scrape the data with no problems (minus the fact that I don't have the cool progress bar function). I'll try opening a pull request to get this fixed.

from crypto.

thelmmortal1 avatar thelmmortal1 commented on August 23, 2024

I'm going to second this issue on crypto_history. Unlike @realauggieheschmeyer, I am not smart enough to figure out why it is happening, but it is happening for me as well.

I generally run something similar to this code:

crypto_history(start_date = 20190101, limit = 6, sleep = 7.1)

I get the subscript error, but I am still able to look at the data frame. A couple of things stand out: there are now 4 additional columns that simply count the rows, and the pull is not correctly limited to the top 6 market caps.

realauggieheschmeyer avatar realauggieheschmeyer commented on August 23, 2024

@thelmmortal1, thanks for the kind words!

I haven't been able to open a PR to fix this issue yet, but I'm going to post the code I adapted that has been working for me. If you copy these into your script and have them override the original crypto functions, then it should be functional. It is for me at least ¯\_(ツ)_/¯

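library(magrittr)  # provides the %>% pipe used below

scraper <- function(attributes, slug, sleep = NULL) {
  .            <- "."  # dummy binding so R CMD check does not flag the pipe's dot
  history_url  <- as.character(attributes)
  coin_slug    <- as.character(slug)
  if (!is.null(sleep)) Sys.sleep(sleep)
  
  # Fetch the page; on failure keep the error object instead of stopping
  page <- tryCatch(
    xml2::read_html(history_url,
                    handle = curl::new_handle("useragent" = "Mozilla/5.0")),
    error = function(e) e)
  
  # Treat any fetch error as a rate limit: wait, then retry once
  if (inherits(page, "error")) {
    closeAllConnections()
    message("\n")
    message(cli::cat_bullet("Rate limit hit. Sleeping for 60 seconds.", bullet = "warning", bullet_col = "red"), appendLF = TRUE)
    Sys.sleep(65)  # waits slightly longer than the advertised 60 seconds
    page <- xml2::read_html(history_url,
                            handle = curl::new_handle("useragent" = "Mozilla/5.0"))
  }
  
  # The historical data is now in the third <table> on the page, not the first
  table   <- rvest::html_nodes(page, css = "table") %>% .[3] %>%
    rvest::html_table(fill = TRUE) %>%
    replace(!nzchar(.), NA)
  
  # Convert to a tibble and tag every row with the coin's slug
  scraper <- table[[1]] %>% tibble::as_tibble() %>%
    dplyr::mutate(slug = coin_slug)
  
  return(scraper)
}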
crypto_list <- function(coin = NULL,
                        start_date = NULL,
                        end_date = NULL,
                        coin_list = NULL) {
  if (is.null(coin_list)) {
    json   <- "https://s2.coinmarketcap.com/generated/search/quick_search.json"
    coins  <- jsonlite::fromJSON(json)
  } else {
    # Plain if/else: ifelse() is meant for vectors, not for control flow
    coins <- if (coin_list == "api") get_coinlist_api() else get_coinlist_static()
  }
  
  if (!is.null(coin)) {
    name   <- coins$name
    slug   <- coins$slug
    symbol <- coins$symbol
    c1     <- subset(coins, toupper(name) %in% toupper(coin))
    c2     <- subset(coins, symbol %in% toupper(coin))
    c3     <- subset(coins, slug %in% tolower(coin))
    coins  <- tibble::tibble()
    if (nrow(c1) > 0) { coins     <- rbind(coins, c1) }
    if (nrow(c2) > 0) { coins     <- rbind(coins, c2) }
    if (nrow(c3) > 0) { coins     <- rbind(coins, c3) }
    if (nrow(coins) > 1L) { coins <- unique(coins) }
  }
  coins <-
    tibble::tibble(
      symbol = coins$symbol,
      name   = coins$name,
      slug   = coins$slug,
      rank   = coins$rank
    )
  if (is.null(start_date)) { start_date <- "20130428" }
  if (is.null(end_date)) { end_date <- gsub("-", "", lubridate::today()) }
  exchangeurl <- paste0("https://coinmarketcap.com/currencies/", coins$slug, "/#markets")
  historyurl <-
    paste0(
      "https://coinmarketcap.com/currencies/",
      coins$slug,
      "/historical-data/?start=",
      start_date,
      "&end=",
      end_date
    )
  exchange_url       <- c(exchangeurl)
  history_url        <- c(historyurl)
  coins$symbol       <- as.character(toupper(coins$symbol))
  coins$name         <- as.character(coins$name)
  coins$slug         <- as.character(coins$slug)
  coins$exchange_url <- as.character(exchange_url)
  coins$history_url  <- as.character(history_url)
  coins$rank         <- as.numeric(coins$rank)
  return(coins)
}
crypto_history <- function(coin = NULL, limit = NULL, start_date = NULL, end_date = NULL,
                           coin_list = NULL, sleep = NULL) {
  pink <- crayon::make_style(grDevices::rgb(0.93, 0.19, 0.65))
  options(scipen = 999)
  i <- "i"
  low <- NULL
  high <- NULL
  close <- NULL
  ranknow <- NULL
  
  message(cli::cat_bullet("If this helps you become rich please consider donating",
                          bullet = "heart", bullet_col = pink))
  message("ERC-20: 0x375923Bf82F0b728d23A5704261a6e16341fd860", appendLF = TRUE)
  message("XRP: rK59semLsuJZEWftxBFhWuNE6uhznjz2bK", appendLF = TRUE)
  message("\n")
  
  coins <- crypto_list(coin, start_date, end_date, coin_list)
  
  if (!is.null(limit))
    coins <- utils::head(coins, limit)  # unlike coins[1:limit, ], never pads with NA rows
  
  coin_names <- tibble::tibble(symbol = coins$symbol, name = coins$name, rank = coins$rank,
                               slug = coins$slug)
  to_scrape <- tibble::tibble(attributes = coins$history_url, slug = coins$slug)
  loop_data <- vector("list", nrow(to_scrape))
  
  message(cli::cat_bullet("Scraping historical crypto data", bullet = "pointer",
                          bullet_col = "green"))
  
  for (i in seq_len(nrow(to_scrape))) {
    loop_data[[i]] <- scraper(to_scrape$attributes[i], to_scrape$slug[i], sleep)
  }
  
  results <- do.call(rbind, loop_data) %>% tibble::as_tibble()
  
  if (length(results) == 0L)
    stop("No data currently exists for this crypto currency.", call. = FALSE)
  
  market_data <- merge(results, coin_names, by = "slug")
  colnames(market_data) <- c("slug", "date", "open", "high", "low", "close", "volume",
                             "market", "symbol", "name", "ranknow")
  market_data <- market_data[c("slug", "symbol", "name", "date", "ranknow", "open",
                               "high", "low", "close", "volume", "market")]
  market_data$date <- lubridate::mdy(market_data$date, locale = platform_locale())
  
  market_data[, 5:11] <- apply(market_data[, 5:11], 2, function(x) gsub(",", "", x))
  market_data[, 7:11] <- apply(market_data[, 7:11], 2, function(x) gsub("-", "0", x))
  market_data$volume <- market_data$volume %>% tidyr::replace_na(0) %>% as.numeric()
  market_data$market <- market_data$market %>% tidyr::replace_na(0) %>% as.numeric()
  market_data[, 5:11] <- apply(market_data[, 5:11], 2, function(x) as.numeric(x))
  market_data <- na.omit(market_data)
  
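  # Parenthesize the ratio: %>% binds tighter than /, so without the outer
  # parentheses round() would apply only to (high - low)
  market_data <- market_data %>%
    dplyr::mutate(close_ratio = ((close - low) / (high - low)) %>% round(4) %>% as.numeric(),
                  spread      = (high - low) %>% round(2) %>% as.numeric())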
  
  market_data$close_ratio <- market_data$close_ratio %>% tidyr::replace_na(0)
  history_results <- market_data %>% dplyr::arrange(ranknow, date)
  return(history_results)
}

JesseVent avatar JesseVent commented on August 23, 2024

Hey guys, sorry I haven’t posted sooner. I’ve fixed the issue in the latest version, which you can install from GitHub, and I have submitted it to CRAN.

It’s because CoinMarketCap has changed the way its pages render, so multiple tables were being returned and the one holding the data could sit at a different index depending on the currency. Now I’m dynamically working out the size of all the tables and returning the one with the most rows. Please retest.
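
A minimal sketch of the "return the table with the most rows" idea described here. This is not the package's actual code: the helper name `pick_largest_table` and the inline HTML are made up for illustration, and only `xml2::read_html`, `rvest::html_nodes` and `rvest::html_table` are real library calls.

```r
library(rvest)  # depends on xml2, which provides read_html()

# Hypothetical helper: return the table with the most rows, or NULL if none.
pick_largest_table <- function(page) {
  nodes <- rvest::html_nodes(page, css = "table")
  if (length(nodes) == 0L) return(NULL)
  tables <- lapply(nodes, rvest::html_table)   # one data frame per <table>
  sizes  <- vapply(tables, nrow, integer(1))   # row count of each table
  tables[[which.max(sizes)]]
}

# Made-up page: a one-row table followed by a three-row price table.
html <- paste0(
  "<html><body>",
  "<table><tr><th>Ad</th></tr><tr><td>x</td></tr></table>",
  "<table><tr><th>Date</th><th>Close</th></tr>",
  "<tr><td>Jan 01, 2019</td><td>3843.52</td></tr>",
  "<tr><td>Jan 02, 2019</td><td>3943.41</td></tr>",
  "<tr><td>Jan 03, 2019</td><td>3836.74</td></tr>",
  "</table></body></html>"
)

page    <- xml2::read_html(html)
history <- pick_largest_table(page)
```

Selecting by size rather than by position means the scraper does not depend on the table sitting at index 1 or 3, although it still assumes the price history is the largest table on the page.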

thelmmortal1 avatar thelmmortal1 commented on August 23, 2024

It seems to work if I run it for only a limited number of coins. When I run it for a broader set of coins, it still fails:

crypto_history(start_date = 20190101, limit = 600, sleep = 7.1)
♥ If this helps you become rich please consider donating

ERC-20: 0x375923Bf82F0b728d23A5704261a6e16341fd860
XRP: rK59semLsuJZEWftxBFhWuNE6uhznjz2bK

Scraping historical crypto data

| [332 / 600] [=================================================================>-----------------------------------------------------] 55% in 00:47:07 ETA: 38m
Error in result[[1]] : subscript out of bounds

realauggieheschmeyer avatar realauggieheschmeyer commented on August 23, 2024

It worked for me when I tried it. However, I mapped crypto_history to a list of currency names, so I may not have opened myself up to the same problem as @thelmmortal1.

Thanks for fixing this issue, @JesseVent!

thelmmortal1 avatar thelmmortal1 commented on August 23, 2024

I'm still getting the same error. Any ideas?

thelmmortal1 avatar thelmmortal1 commented on August 23, 2024

crypto_history(start_date = 20190101, limit = 600, sleep = 7.1)
♥ If this helps you become rich please consider donating

ERC-20: 0x375923Bf82F0b728d23A5704261a6e16341fd860
XRP: rK59semLsuJZEWftxBFhWuNE6uhznjz2bK

Scraping historical crypto data

| [316 / 600] [==============================================================>--------------------------------------------------------] 53% in 00:44:58 ETA: 40m
Error in result[[1]] : subscript out of bounds

dmrodz avatar dmrodz commented on August 23, 2024

Having this same issue, intermittently. Any updates?
Thanks!

thelmmortal1 avatar thelmmortal1 commented on August 23, 2024

I'm still getting the same error. Any potential fixes out there?

JesseVent avatar JesseVent commented on August 23, 2024

I'm about to commit a fix for something else, but the only thing I can think of, without being able to reproduce the issue, is to remove the start_date argument from your function call. It should be more reliable to retrieve all the rows for the coin (hence populating the table) and then filter out the rows you don't need yourself, rather than getting the web service to apply the date filter.

Only an idea - not tested or verified.
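
A rough sketch of the "fetch everything, then filter locally" suggestion. The data frame is made up and `filter_from` is a hypothetical helper, not part of the package; it assumes the result has a `date` column of class Date.

```r
# Made-up full history for one coin.
history <- data.frame(
  date  = as.Date(c("2018-12-30", "2018-12-31", "2019-01-01", "2019-01-02")),
  close = c(3865.95, 3742.70, 3843.52, 3943.41)
)

# Hypothetical helper: keep rows on or after a yyyymmdd start date.
filter_from <- function(history, start_date) {
  cutoff <- as.Date(as.character(start_date), format = "%Y%m%d")
  history[history$date >= cutoff, , drop = FALSE]
}

recent <- filter_from(history, 20190101)
```

With the real package, the same filter could presumably be applied to the output of a crypto_history() call made without start_date.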

thelmmortal1 avatar thelmmortal1 commented on August 23, 2024

Thanks for the suggestion.

I ran it without the date and still got the subscript out of bounds error. The following is what I used.

crypto_history(limit = 600, sleep = 7.5)

I get the error at around the 160th coin.

dukes00 avatar dukes00 commented on August 23, 2024

Hi, I'm having the same error at exactly the 160th coin just as @thelmmortal1 mentioned.
Any updates?

MarkYueMa avatar MarkYueMa commented on August 23, 2024

@realauggieheschmeyer, could you share how many coins were in your map and how long the sleep time was between queries? Also, did you use furrr for multiprocessing? Any comments would be appreciated.

MarkYueMa avatar MarkYueMa commented on August 23, 2024

And how big was the final dataset? Thanks! @realauggieheschmeyer

dukes00 avatar dukes00 commented on August 23, 2024

@MarkYueMa I tried to code my own solution to this problem, and I believe it is an issue with certain currencies, not with the number of coins. As of now, I was able to download 2369 out of 3410 coins by wrapping either a custom scraper or crypto_history() in a tryCatch.
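
A minimal sketch of the tryCatch approach mentioned here, so that one failing currency does not abort the whole run. `scrape_all` and `fake_scraper` are hypothetical stand-ins for illustration; in practice `scrape_fun` would be a custom scraper or a call to crypto_history() for a single coin.

```r
# Hypothetical wrapper: run scrape_fun on each slug and record the failures.
scrape_all <- function(slugs, scrape_fun) {
  results <- list()
  failed  <- character(0)
  for (slug in slugs) {
    # Any error (e.g. "subscript out of bounds") becomes NULL instead of stopping
    out <- tryCatch(scrape_fun(slug), error = function(e) NULL)
    if (is.null(out)) {
      failed <- c(failed, slug)
    } else {
      results[[length(results) + 1L]] <- out
    }
  }
  list(data = do.call(rbind, results), failed = failed)
}

# Stand-in scraper that fails for one made-up slug.
fake_scraper <- function(slug) {
  if (slug == "broken-coin") stop("subscript out of bounds")
  data.frame(slug = slug, close = 1, stringsAsFactors = FALSE)
}

out <- scrape_all(c("bitcoin", "broken-coin", "ethereum"), fake_scraper)
```

The `failed` vector lists the slugs that raised an error, which makes it easier to check whether the failures are tied to specific currencies.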

MarkYueMa avatar MarkYueMa commented on August 23, 2024

dukes00 avatar dukes00 commented on August 23, 2024

I suppose so, all the "major" ones were downloaded successfully.

realauggieheschmeyer avatar realauggieheschmeyer commented on August 23, 2024

Hey @MarkYueMa. I only had about 15 or so coins in my map. I use Coinbase as my crypto trading platform and they only have so many tradable currencies. I didn't change the sleep time, but when I hit about 10 queries, the query puts itself to sleep for 60 seconds. As for furrr, I didn't feel it was necessary for this particular request as there were only a small number of currencies. If I was doing 1500 currency requests, then I would definitely think about parallel processing that request.

neelanjanghosh avatar neelanjanghosh commented on August 23, 2024

Hey @JesseVent,
I am still getting this issue when running:

x = crypto_history("DOT",start_date = 20200101,limit = 100,sleep = 7.1)

alienalex6 avatar alienalex6 commented on August 23, 2024

Hey @JesseVent I'm having the same issue as @neelanjanghosh.

Probably another change to the CoinMarketCap page structure? It would be awesome if you could help out. This package has helped a lot!

demirelesad avatar demirelesad commented on August 23, 2024

Is it necessary to switch to the pro plan to access historical data? Do you guys know of another option? Thanks
