eddelbuettel / anytime

Anything to POSIXct or Date Converter

Home Page: https://eddelbuettel.github.io/anytime

License: GNU General Public License v2.0

R 66.98% C++ 27.42% Shell 0.71% TeX 4.89%
r posixct date datetime boost conversions cran c-plus-plus-11 rcpp cpp11

anytime's Introduction

anytime: Anything to 'POSIXct' or 'Date' Converter

Motivation

R excels at computing with dates and times. Using a typed representation for your data is highly recommended, not only because of the functionality offered but also because of the added safety stemming from proper representation.

But there is a small nuisance cost in interactive work as well as in programming. Users must have told as.POSIXct() about a million times that the origin is (of course) the epoch. Do we really have to say it a million more times? Similarly, when parsing dates that are some form of YYYYMMDD format, do we really have to manually convert from integer or numeric or factor or ordered to character? Having one of several common separators and/or date / time month forms (YYYY-MM-DD, YYYY/MM/DD, YYYYMMDD, YYYY-mon-DD and so on, with or without times), do we really need a format string? Or could a smart converter function do this?

anytime() aims to be that general purpose converter returning a proper POSIXct (or Date) object no matter the input (provided it was somewhat parseable), relying on Boost date_time for the (efficient, performant) conversion. anydate() is an additional wrapper returning a Date object instead.

Documentation

Package documentation, help pages, a vignette, and more are available at https://eddelbuettel.github.io/anytime.

Examples

We show some simple examples on Date types.

(Note that in the first few examples, and for numeric conversion in this range, we now only use anydate, as anytime is consistent in computing seconds since the epoch. If you want the behaviour of versions older than 0.3.0, set oldHeuristic=TRUE; see help(anytime) for more.)

From Integer or Numeric or Factor or Ordered

library(anytime)                      ## also caches TZ in local env
options(digits.secs=6)                ## for fractional seconds below

## integer
anydate(20160101L + 0:2)              ## older version used anytime for this too
[1] "2016-01-01" "2016-01-02" "2016-01-03"

## numeric
anydate(20160101 + 0:2)
[1] "2016-01-01" "2016-01-02" "2016-01-03"

## factor
anydate(as.factor(20160101 + 0:2))
[1] "2016-01-01" "2016-01-02" "2016-01-03"

## ordered
anydate(as.ordered(20160101 + 0:2))
[1] "2016-01-01" "2016-01-02" "2016-01-03"

Character: Simple

## Dates: Character
anydate(as.character(20160101 + 0:2))
[1] "2016-01-01" "2016-01-02" "2016-01-03"

## Dates: alternate formats
anydate(c("20160101", "2016/01/02", "2016-01-03"))
[1] "2016-01-01" "2016-01-02" "2016-01-03"

Character: ISO

## Datetime: ISO with/without fractional seconds
anytime(c("2016-01-01 10:11:12", "2016-01-01 10:11:12.345678"))
[1] "2016-01-01 10:11:12.000000 CST" "2016-01-01 10:11:12.345678 CST"

## Datetime: ISO alternate (?) with 'T' separator
anytime(c("20160101T101112", "20160101T101112.345678"))
[1] "2016-01-01 10:11:12.000000 CST" "2016-01-01 10:11:12.345678 CST"

Character: Textual month formats

## ISO style
anytime(c("2016-Sep-01 10:11:12", "Sep/01/2016 10:11:12", "Sep-01-2016 10:11:12"))
[1] "2016-09-01 10:11:12 CDT" "2016-09-01 10:11:12 CDT" "2016-09-01 10:11:12 CDT"

## Datetime: Mixed format (cf https://stackoverflow.com/questions/39259184)
anytime(c("Thu Sep 01 10:11:12 2016", "Thu Sep 01 10:11:12.345678 2016"))
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"

Character: Dealing with DST

This shows an important aspect. When not working in local time (by overriding to UTC), the changing difference to UTC is correctly covered (which the underlying Boost Date_Time library does not do by itself).

## Datetime: pre/post DST
anytime(c("2016-01-31 12:13:14", "2016-08-31 12:13:14"))
[1] "2016-01-31 12:13:14 CST" "2016-08-31 12:13:14 CDT"
anytime(c("2016-01-31 12:13:14", "2016-08-31 12:13:14"), tz="UTC")  # important: catches change
[1] "2016-01-31 18:13:14 UTC" "2016-08-31 17:13:14 UTC"

Technical Details

The heavy lifting is done by a combination of Boost lexical_cast to go from anything to string representation which is then parsed by Boost Date_Time. We use the BH package to access Boost, and rely on Rcpp for a seamless C++ interface to and from R.

Further, as the Boost Date_Time library cannot resolve timezones on the Windows platform (where timezone information is typically provided by R itself for its use), we offer a fallback of calling into R (via facilities from Rcpp); see the help for the useR argument for more details.

Status

The package should work as expected.

Example Uses

Several different CRAN packages import this package. Among them are the following research-focused packages:

  • adheRenceRX by Beal assesses medication adherence;
  • AGread by Hibbing et al which reads and transforms ActiGraph physical activity measures;
  • cqcr by Odell accesses 'Care Quality Commission' data from the health and adult social care regulator for England;
  • datadogr by Yutani queries metrics from Datadog;
  • E4tools by Kleiman which reads data from Empatica wearable physiology monitors;
  • nprcgenekeepr by Raboin et al provides genetic tools for colony management;
  • RDS by Handcock et al which is part of the "RDS Analyst" suite for analysing respondent-driven sampling data;
  • rtsdata by RTSVizTeam manages time series data storage;
  • threesixtygiving by Odell accesses charitable grants from the '360Giving' Platform;
  • tsbox by Sax for format-agnostic time series data representation and conversions;
  • tsibble by Wang et al for temporal data in an explicit data- and model-oriented format.

Changes

See the NEWS.Rd file on CRAN or GitHub. In particular, version 0.3.0 corrects an overly optimistic heuristic for integer or numeric arguments and now behaves more like R itself. Specifically, epoch offsets are interpreted as seconds for datetime objects, and days for date objects. The prior behaviour can be restored with an option which can also be set globally; see the help page for details.

Installation

The package is now on CRAN and can be installed via a standard

install.packages("anytime")

Continued Testing

As we rely on the tinytest package, the already-installed package can also be verified via

tinytest::test_package("anytime")

at any later point.

Contributing

Any problems, bug reports, or feature requests for the package can be submitted and handled most conveniently as GitHub issues in the repository.

Before submitting pull requests, it is frequently preferable to first discuss need and scope in such an issue ticket. See the file Contributing.md (in the Rcpp repo) for a brief discussion.

Author

Dirk Eddelbuettel

License

GPL (>= 2)

anytime's People

Contributors

bobjansen, christophsax, eddelbuettel, nachti, russellpierce, stephenbfroehlich

anytime's Issues

Parsing ISO 8601 compatible standard timestamp format (yyyy-mm-ddThh:mm:ss+-ZONE)

Is there any possibility to support ISO 8601 compatible standard timestamp format (yyyy-mm-ddThh:mm:ss+-ZONE) with timezone offset?

library(anytime)
Sys.setenv(TZ=anytime:::getTZ())      ## helper function to try to get TZ
anytime("2016-12-13T17:09:48+01:00")

> anytime("2016-12-13T17:09:48+01:00")
[1] "2016-12-12 23:00:00 UTC"

Thanks,
Dusan
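
A possible workaround in base R is sketched here, under the assumption that the offset always has the form +hh:mm or -hh:mm. The helper name parse_iso8601_offset is hypothetical and not part of anytime; it removes the colon from the offset so that the "%z" conversion can read it.

```r
# Hypothetical helper (not part of anytime): parse an ISO 8601 timestamp
# with a numeric UTC offset using base R only.
parse_iso8601_offset <- function(x, tz = "UTC") {
  # "%z" expects "+0100", so remove the colon from a trailing "+01:00"
  x <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S%z", tz = tz)
}

res <- parse_iso8601_offset(c("2016-12-13T17:09:48+01:00",
                              "2011-09-08T18:49:05-07:00"))
res_utc <- format(res, "%Y-%m-%d %H:%M:%S", tz = "UTC")
res_utc
```

The offset is applied during parsing, so the result is the same instant no matter which zone is later used for display.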

Occasionally producing inconsistent results

Occasionally anydate() produces inconsistent results with integers representing dates in yyyyMMdd format, but it is not easy to reproduce consistently.

> anytime::anydate(20160101L)
[1] "1400-01-01"

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 24 (Server Edition)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.3.3        Rcpp_0.12.11       RApiDatetime_0.0.3 anytime_0.3.0     

new format, possibly

Not a bug. Old USGS river flow files, widely consulted, use the following datetime format:
19881001001500 PDT. Anytime seems unhappy with this format.

Milliseconds rounding issues

Dirk,

I was glancing through the anytime functionalities and ended up discovering there must be an issue with rounding.

> options(digits.secs=6)
> format(anytime("2016-02-25 17:34:00.376", tz="America/New_York"), "%Y-%m-%d %H:%M:%OS %Z")
[1] "2016-02-25 11:34:00.375999 EST"
> format(anytime("2016-02-25 17:34:00.377", tz="America/New_York"), "%Y-%m-%d %H:%M:%OS %Z")
[1] "2016-02-25 11:34:00.377000 EST"
> format(anytime("2016-02-25 17:34:00.375", tz="America/New_York"), "%Y-%m-%d %H:%M:%OS %Z")
[1] "2016-02-25 11:34:00.375000 EST"
> options(digits.secs=3)
> format(anytime("2016-02-25 17:34:00.376", tz="America/New_York"), "%Y-%m-%d %H:%M:%OS %Z")
[1] "2016-02-25 11:34:00.375 EST"
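
The effect can likely be traced to floating-point storage rather than to parsing. The base R sketch below (the timestamp and variable names are chosen for illustration only) shows that the fractional part of a POSIXct value is only approximately the decimal that was typed in, and that R's "%OSn" output truncates instead of rounding.

```r
# A POSIXct value is a double holding seconds since the epoch, and most
# decimal fractions have no exact binary representation at that magnitude.
secs <- as.numeric(as.POSIXct("2016-02-25 17:34:00", tz = "UTC")) + 0.376
frac <- secs - floor(secs)

sprintf("%.9f", frac)               # close to, but generally not exactly, 0.376
rounded <- sprintf("%.3f", frac)    # explicit rounding to milliseconds
rounded
```

Rounding explicitly before display, as in the last step, is one way to get the expected milliseconds.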

another date format - day missing preceding zeros

I'd like the package to handle this, if possible

library(anytime)
library(stringr)
library(dplyr)
 df <- data.frame(date=c("Apr 3, 2016","Apr 13, 2016"),stringsAsFactors = FALSE)

As expected

df_new <-df %>% 
  mutate(newdate=anydate(date))
glimpse(df_new)

Observations: 2
Variables: 2
$ date    <chr> "Apr 3, 2016", "Apr 13, 2016"
$ newdate <date> NA, 2016-04-13

I started on a hack (which would insert a 0 if the ", " was in a particular position), but at the first stage the newdate was returned as a double:

df_new <-df %>% 
  mutate(newdate=ifelse(str_sub(date,6,6)==",",anydate(Sys.Date()),anydate(date)))
glimpse(df_new)

Observations: 2
Variables: 2
$ date    <chr> "Apr 3, 2016", "Apr 13, 2016"
$ newdate <dbl> 17288, 16904
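
The double column is most likely a side effect of ifelse(), not of anydate(). A small base R sketch (the flag and the replacement value are made up for illustration) shows the class being dropped, along with an index-assignment alternative that keeps it.

```r
# ifelse() returns a plain vector and drops the "Date" class, which is why
# the column shows up as <dbl>. Assigning by logical index keeps the class.
dates  <- as.Date(c("2016-04-03", "2016-04-13"))
is_bad <- c(TRUE, FALSE)                  # hypothetical flag for rows to patch
filler <- as.Date("2000-01-01")           # hypothetical replacement value

via_ifelse <- ifelse(is_bad, filler, dates)   # numeric days since 1970-01-01
via_index  <- dates
via_index[is_bad] <- filler                   # still a Date vector

class(via_ifelse)
class(via_index)
```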

ISO 8601 dates: TZ is ignored

Hi,

I just ran into this issue and was wondering why anytime-0.2.0 truncated the TZ information.
For example, anytime("2011-09-08T18:49:05-07:00") returns [1] "2011-09-08 18:49:05 CEST", which is just wrong.

Wouldn't it be better to

  • support the TZ (compare parsedate for ISO 8601)?
  • at least give a warning in that case?

Thx!

Parsing single digit months

I wanted to answer this question on Stack Overflow with "anytime".

These are the dates they have; the format is "single digit month, day, full year":

2/10/2016  
4/4/2016  
5/8/2016  
10/1/2016

However, anydate() only works on the first entry, not the rest. In the PDF at CRAN, we find:

Issues
The Boost Date_Time library cannot parse single digit months or days. So while ‘2016/09/02’
works (as expected), ‘2016/9/2’ will not. Other non-standard formats may also fail.
There is a known issue (discussed at length in issue ticket 5) where Australian times are off by an hour.
This seems to affect only Windows, not Linux.

So, apparently this is a known bug/issue. Are there any workarounds?

Or we should just use as.POSIXct(df$final_date, format = "%d/%m/%Y") ?

anydate("2/10/2016")
[1] "2016-02-10"
anydate("4/4/2016")
[1] NA
anydate("5/8/2016")
[1] NA
anydate("10/1/2016")
[1] NA

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Spanish_Peru.1252  LC_CTYPE=Spanish_Peru.1252   
[3] LC_MONETARY=Spanish_Peru.1252 LC_NUMERIC=C                 
[5] LC_TIME=Spanish_Peru.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] anytime_0.0.4 dplyr_0.5.0  

loaded via a namespace (and not attached):
[1] lazyeval_0.2.0  magrittr_1.5    R6_2.1.3       
[4] assertthat_0.1  rsconnect_0.4.3 DBI_0.5        
[7] tools_3.3.1     tibble_1.2      Rcpp_0.12.7
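
Two possible workarounds for single-digit months and days are sketched below in base R. The helper pad_single_digits is hypothetical (not part of anytime), and the month/day/year order is an assumption taken from the description in the question.

```r
# Hypothetical pre-processing step (not part of anytime): left-pad every
# stand-alone single digit so that "4/4/2016" becomes "04/04/2016".
pad_single_digits <- function(x) {
  gsub("\\b([0-9])\\b", "0\\1", x, perl = TRUE)
}

raw    <- c("2/10/2016", "4/4/2016", "5/8/2016", "10/1/2016")
padded <- pad_single_digits(raw)
padded

# Base R's strptime-based parser accepts the unpadded strings directly,
# here assuming month/day/year order as described in the question.
parsed <- as.Date(raw, format = "%m/%d/%Y")
parsed
```

Padding keeps the data usable with parsers that insist on two-digit fields, while the explicit format string avoids any guessing about field order.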

anytime function changing the input variable

Hi,
The function anytime(x) will change the object x when applied, in the following example (anytime 0.2.2 with R 3.3.3). Is this the normal behavior of the package?

library(anytime)

# as.numeric(as.POSIXct("2017-01-01 12:34:56", tz = "Asia/Shanghai"))
unix_time1 = 1483245296 
unix_time2 = unix_time1
unix_time3 = unix_time1+1-1

time1 = anytime(unix_time1)
print(time1)
print(unix_time1)
print(unix_time2)
print(unix_time3)

My output is (respectively):
"2017-01-01 12:34:56 CST"
"2017-01-01 12:34:56 CST" #I expected 1483245296
"2017-01-01 12:34:56 CST" #I expected 1483245296
1483245296

asUTC=TRUE isn't working as expected

I could be misunderstanding the purpose of the parameter, but I was hoping that passing asUTC = TRUE would result in the first response having a Z at the end indicating that it is in UTC time.

rfc3339(anytime(today()-1, asUTC = TRUE))
[1] "2017-07-15T20:00:00-0400"
rfc3339(anytime(today()-1, asUTC = FALSE))
[1] "2017-07-15T20:00:00-0400"
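
For comparison, a trailing Z can be produced in base R by formatting the instant in UTC. This is only a sketch of the display side and makes no claim about what asUTC does internally; the example timestamp is an assumption chosen to match the output above.

```r
# A POSIXct value is an instant; the zone only matters when it is printed.
# Formatting with tz = "UTC" and a literal "Z" yields an RFC 3339 style string.
x <- as.POSIXct("2017-07-15 20:00:00", tz = "America/New_York")
utc_string <- format(x, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
utc_string
```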

Implement a robust testing strategy

Results of calls to anytime depend on local settings which makes it a bit more tricky to write good tests, they can fail in different environments. Ideally we make sure that

  1. Tests run correctly independent of the environment;
  2. Still correctly simulate a wide array of locales inside 1 R session (no stop session, change setting, start session)
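
One way to approach both points is a small helper that switches the TZ environment variable for the duration of one expression. The sketch below uses base R only; the name with_tz is hypothetical, and locale settings such as LC_TIME would need a similar wrapper.

```r
# Minimal sketch of a test helper (hypothetical name with_tz, base R only):
# evaluate an expression under a temporary TZ and restore the old value.
with_tz <- function(tz, expr) {
  old <- Sys.getenv("TZ", unset = NA)
  on.exit(if (is.na(old)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old))
  Sys.setenv(TZ = tz)
  force(expr)   # lazy evaluation: expr runs only now, under the new TZ
}

epoch <- .POSIXct(0)   # the epoch instant, without a time zone attribute
hour_utc   <- with_tz("UTC",        format(epoch, "%H"))
hour_tokyo <- with_tz("Asia/Tokyo", format(epoch, "%H"))
c(hour_utc, hour_tokyo)
```

Because the old value is restored in on.exit(), a failing expectation inside the expression does not leak the temporary setting into later tests.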

Wrong recognition with %m/%d/%y

I want to convert strings that are formatted like 01/15/17 (January 15th 2017). But when I add the %m/%d/%y format, it recognises random strings as dates.

library(anytime)
string <- "121013_3_1"
anytime(string) #NA

addFormats(c("%m/%d/%y"))
anytime(string) #1999-12-01 UTC

Interestingly, it seems like many characters are actually ignored.

vector <- c("121013_3_1", "1210111", "12z01z_3_1", "121013_3$1")
all(anytime(vector) == "1999-12-01 UTC") #TRUE

Is there another more specific format I could use?
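
Until a stricter format is available, one option is to validate the shape of the string before parsing. The sketch below uses base R only; parse_mdy_strict is a hypothetical helper and the regular expression is an assumption about what counts as a valid input.

```r
# Sketch of a stricter pre-check in base R (hypothetical helper name):
# only strings that fully match two digits / two digits / two digits are
# handed to the parser; everything else becomes NA.
parse_mdy_strict <- function(x) {
  ok  <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{2}$", x)
  out <- rep(as.Date(NA), length(x))
  out[ok] <- as.Date(x[ok], format = "%m/%d/%y")
  out
}

checked <- parse_mdy_strict(c("01/15/17", "121013_3_1", "1210111"))
checked
```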

Problems if NA first in vector

A vector with date and NA (in that order) works as expected:

> anytime::anydate(c(Sys.Date(), NA))
[1] "2016-10-06" NA          

But the reversed order does not:

> anytime::anydate(c(NA, Sys.Date()))
Error in as.POSIXlt.numeric(anytime(x = x, tz = tz)) : 
  'origin' must be supplied

Would it be possible to recognise dates even if preceded by NAs?
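
Part of the explanation probably lies in base R itself: the vector has already lost its class before it reaches anydate(). The following sketch, which does not involve anytime at all, shows the difference and a typed NA as a workaround.

```r
# Base R illustration: c() dispatches on its first argument, so a leading
# logical NA drops the Date class before any converter sees the vector.
today <- as.Date("2016-10-06")

class(c(today, NA))             # "Date"
class(c(NA, today))             # "numeric": the class is lost
typed <- c(as.Date(NA), today)  # a typed NA keeps the vector a Date
class(typed)
```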

Apparently anytime() not handling July in "%d-%b-%Y" format

Here is an interesting issue I had with anytime().
Consider this:

library(anytime)
dtchar = c( "30-Jun-2014 23:30:00", "30-Jun-2014 23:45:00", "01-Jul-2014 00:00:00", "01-Jul-2014 00:15:00", "01-Jul-2014 00:30:00")

Here is what anytime() gave me:

anytime(dtchar)
 [1] "2014-06-30 18:30:00 EDT" "2014-06-30 18:45:00 EDT"
 [3] NA                        NA                       
 [5] NA

Not sure if this is a bug, but it seems to me that in this format ("%d-%b-%Y %H:%M:%S") anytime() is returning NA for the month of July.
Testing with all twelve months shows this:

dtchar2 = c("01-Jan-2014 00:30:00", "01-Feb-2014 00:30:00","01-Mar-2014 00:30:00",
       "01-Apr-2014 00:30:00","01-May-2014 00:30:00", "30-Jun-2014 23:30:00",
  "30-Jul-2014 23:45:00", "01-Aug-2014 00:00:00", "01-Sep-2014 00:15:00",
 "01-Oct-2014 00:30:00", "01-Nov-2014 00:30:00", 
 "01-Dec-2014 00:30:00")

anytime(dtchar2)
[1] "2014-01-01 00:30:00 EST" "2014-02-01 00:30:00 EST"
 [3] "2014-03-01 00:30:00 EST" "2014-04-01 00:30:00 EDT"
 [5] "2014-05-01 00:30:00 EDT" "2014-06-30 23:30:00 EDT"
 [7] NA                        "2014-08-01 00:00:00 EDT"
 [9] "2014-09-01 00:15:00 EDT" "2014-10-01 00:30:00 EDT"
[11] "2014-11-01 00:30:00 EDT" "2014-12-01 00:30:00 EST"

while the base as.POSIXct() handles it correctly

as.POSIXct(dtchar2, format = "%d-%b-%Y %H:%M:%S")
[1] "2014-01-01 00:30:00 EST" "2014-02-01 00:30:00 EST"
 [3] "2014-03-01 00:30:00 EST" "2014-04-01 00:30:00 EDT"
 [5] "2014-05-01 00:30:00 EDT" "2014-06-30 23:30:00 EDT"
 [7] "2014-07-30 23:45:00 EDT" "2014-08-01 00:00:00 EDT"
 [9] "2014-09-01 00:15:00 EDT" "2014-10-01 00:30:00 EDT"
[11] "2014-11-01 00:30:00 EDT" "2014-12-01 00:30:00 EST"

anydate sometimes yields NAs

Hi Dirk, for some input values I am getting NAs. Is this a bug?

Might be related to bug#33 which I reported before but was fixed SOv report

library(anytime)  # V 0.3.0
a <- c("3/22/2013 0:00", "3/21/2012 0:00", "2/19/2014 0:00", "12/5/2013 0:00", "5/8/2013 0:00", "10/15/2010 0:00")
anydate(a)
# [1] "2013-03-22" "2012-03-21" "2014-02-19" NA           NA          
# [6] "2010-10-15"

sessionInfo()

R version 3.3.2 (2016-10-31)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] anytime_0.3.0

loaded via a namespace (and not attached):
[1] tools_3.3.2 Rcpp_0.12.11 RApiDatetime_0.0.3

Dates before the epoch cause R to crash

In my data set I had a date of birth that was before January 1, 1970. It crashed Rstudio and R. It took me a long time to find out what was going on.

I don't really know what an epoch is in software but I know that anytime can't handle <1970-01-01

library (anytime)  
anydate("10/20/1970")
anydate("01/01/1970")
anydate("12/01/1969")

The first two work as expected. The last line crashes R
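
For readers unfamiliar with the term: the epoch is the reference point 1970-01-01 from which R counts dates and times. The base R sketch below (independent of anytime, with an explicit format string) shows that dates before the epoch are simply stored as negative numbers and are valid values.

```r
# Base R illustration: a Date is stored as the number of days relative to
# 1970-01-01 (the epoch); earlier dates are simply negative numbers.
d <- as.Date(c("10/20/1970", "01/01/1970", "12/01/1969"), format = "%m/%d/%Y")
days_since_epoch <- as.numeric(d)
days_since_epoch
```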

anytime not handling Dutch month abbreviations

Today I ran into an issue with parsing Dutch date strings.

A reproducible example:

# setting the locale to Dutch
Sys.setlocale("LC_TIME", "nl_NL.UTF-8")

Using anydate seems to fall back to UTC:

# Dutch month abbreviation doesn't work
> anydate("2013-mrt-14")
[1] NA

# English one does
> anydate("2013-mar-14")
[1] "2013-03-14"

Specifying the timezone with the tz parameter doesn't work either:

> anydate("2013-mrt-14", tz = 'Europe/Amsterdam')
[1] NA

Any idea what might be wrong?
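
A locale-independent workaround can be sketched in base R by translating the month abbreviation before parsing. The lookup table and the helper parse_dutch_date are hypothetical and only cover the year-month-day layout used in the example above.

```r
# Hypothetical workaround (base R only, not part of anytime): translate
# Dutch month abbreviations to month numbers before parsing, which avoids
# any dependence on the LC_TIME locale.
dutch_months <- c(jan = "01", feb = "02", mrt = "03", apr = "04",
                  mei = "05", jun = "06", jul = "07", aug = "08",
                  sep = "09", okt = "10", nov = "11", dec = "12")

parse_dutch_date <- function(x) {
  parts <- strsplit(tolower(x), "-", fixed = TRUE)
  iso <- vapply(parts, function(p) {
    # unknown abbreviations give NA here and therefore an NA date
    paste(p[1], dutch_months[p[2]], p[3], sep = "-")
  }, character(1))
  as.Date(iso, format = "%Y-%m-%d")
}

nl <- parse_dutch_date(c("2013-mrt-14", "2013-okt-01", "2013-xyz-01"))
nl
```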

anytime is overwriting inputs

This is such a useful package, thanks!

But when I use any of the functions on numeric input, it overwrites the input variable for some reason...

It works fine with integer or character input.

Here's an example:

  library(anytime)

  # this works fine
  x <- "1949-01-01"
  b <- anytime(x)
  x  # i.e., x is not modified
#> [1] "1949-01-01"

  x <- -662688000
  b <- anytime(x)
  x  # x is now equal to b !!!
#> [1] "1949-01-01 01:00:00 CET"

  # If I am feeding an integer, everything is fine again...
  x <- -662688000L
  b <- anytime(x)
  x  
#> [1] -662688000

is it possible to force the assumption of a constant format

I get a variety of different date formats, but every date column is internally consistent. But I may have to convert several dates on 30 million rows. So a data set with 30 million start and end dates requires 60 million conversions, which is time consuming.

I was thinking that, in this use case, once anytime figures out what the date format of the first non-missing date is, it doesn't need to do anymore checking. From there, it can parse all the rest of the dates using the same format.

My feature request is a function parameter to "assume" consistent date formatting, strictly for speed purposes.

Feel free to close if this doesn't seem consistent with the purpose of the package. And thanks for putting out anytime and all your other packages.
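
The idea can be sketched in base R as follows. This is not how anytime works internally; parse_constant_format and the list of candidate formats are assumptions made for illustration.

```r
# Sketch of the requested behaviour in base R (hypothetical helper, and the
# candidate list is an assumption): detect the format once on the first
# non-missing value, then reuse it for the whole column.
candidate_formats <- c("%Y-%m-%d", "%Y/%m/%d", "%m/%d/%Y", "%d.%m.%Y")

parse_constant_format <- function(x, formats = candidate_formats) {
  first <- x[!is.na(x)][1]
  for (fmt in formats) {
    if (!is.na(as.Date(first, format = fmt))) {
      return(as.Date(x, format = fmt))   # single vectorised pass
    }
  }
  rep(as.Date(NA), length(x))            # no candidate matched
}

fixed <- parse_constant_format(c(NA, "20.04.2018", "19.05.2018"))
fixed
```

The trade-off is that a column which is not in fact consistent yields NA for every value that deviates from the first detected format.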

Boost / R conversion differences

In a comment to #36, @statquant shows some useful R code with datetime conversion between R and Boost.

It shows some residual differences for a fraction of the inputs, and we need to drill down to where they stem from. In commits d5e3417 and a4fd956 we add a little helper script which converts numeric time using Boost to a string and back. For core years (from 1902 onwards) this works without discrepancy. We should expand from here to get to the bottom of the other differences.

Thanks & Get back index into format array that matched each date/time?

Dirk, this is fantastic. I all too frequently get files with inconsistent date/time formats in a column (thanks mostly to Excel!). One feature request: when doing data quality audits, it would be useful to show the frequency of each format in the input stream, which would be trivial if there were an option to return the index of the format rather than the date/time. -Jim
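
Such an audit can be approximated in base R with an explicit list of formats. The sketch below does not use anytime's internal format table; matched_format_index and the three formats are assumptions for illustration.

```r
# Sketch (base R, hypothetical helper): report which candidate format first
# parses each element, so the frequency of formats can be tabulated.
matched_format_index <- function(x, formats) {
  idx <- rep(NA_integer_, length(x))
  for (i in seq_along(formats)) {
    hit <- is.na(idx) & !is.na(as.Date(x, format = formats[i]))
    idx[hit] <- i
  }
  idx
}

fmts <- c("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")
idx  <- matched_format_index(c("2016-01-02", "01/15/2017", "20.04.2018", "n/a"), fmts)
idx
table(fmts[idx], useNA = "ifany")
```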

Milliseconds are not parsed for certain formats

When I run (from a fresh install from CRAN and when running on my version of master)

options(digits.secs = 6)
dput(anytime(c("20160901 101112.345678", "01Sep2016 101112.345678")))
structure(c(1472688672.34568, 1472688672), 
          class = c("POSIXct","POSIXt"), tzone = "Australia/NSW")

the milliseconds seem not to be parsed. Something that doesn't seem to happen in tests/allFormats.Rout.save.

incorrect time returned from yyyymmddhhmmss

I often use a timestamp created from Sys.time() as a human-readable key (either as a char or int to show when scripts were run). Getting this back into POSIXct with anytime would be a nice addition, however, it doesn't quite get the time stamp correct:

t <- gsub("-|:| ", "", Sys.time())
> t
[1] "20160913122601"
> anytime(t, tz = "Australia/Melbourne")
[1] "2016-09-13 22:01:00 AEST"  ## the time appears incorrect

Is this something you think should be added?
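
As a point of comparison, base R handles this layout once the format is spelled out. The sketch reuses the timestamp from the report.

```r
# Base R sketch: with an explicit format string the compact timestamp
# round-trips without ambiguity.
stamp  <- "20160913122601"
parsed <- as.POSIXct(stamp, format = "%Y%m%d%H%M%S", tz = "Australia/Melbourne")
shown  <- format(parsed, "%Y-%m-%d %H:%M:%S")
shown
```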

ubsan warning at CRAN

See https://www.stats.ox.ac.uk/pub/bdr/memtests/clang-UBSAN/anytime/tests/allFormats.Rout:

> anytime(c("2016-09-01 10:11:12", "2016-09-01 10:11:12.345678"))
/data/gannet/ripley/R/test-clang/BH/include/boost/lexical_cast/detail/converter_lexical_streams.hpp:235:43: runtime error: downcast of address 0x7fffce329628 which does not point to an object of type 'buffer_t' (aka 'basic_unlockedbuf<std::basic_streambuf<char, char_traits<char> >, char>')
0x7fffce329628: note: object is of type 'std::__1::basic_stringbuf<char, std::__1::char_traits<char>, std::__1::allocator<char> >'
 9b 7f 00 00  50 15 42 38 9b 7f 00 00  80 01 46 46 9b 7f 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'std::__1::basic_stringbuf<char, std::__1::char_traits<char>, std::__1::allocator<char> >'
SUMMARY: AddressSanitizer: undefined-behavior /data/gannet/ripley/R/test-clang/BH/include/boost/lexical_cast/detail/converter_lexical_streams.hpp:235:43 in 
[1] "2016-09-01 09:11:12.000000 BST" "2016-09-01 09:11:12.345678 BST"

Dismissed at first as a Boost issue; is actually related to the SEXP conversion.

Issue with %d.%m.%Y format

Hi, I am having an issue with the "%d.%m.%Y" format.

dates<-c("20.04.2018", "19.05.2018")
anydate(dates)

Output

[1] NA NA

I am using a Windows 10 64-bit machine.

 packageVersion("anytime")
[1] ‘0.3.0’

R version 3.4.3

Thanks!

Convert timestamp in milliseconds?

Thanks for a great package. I recently encountered a timestamp in milliseconds (it is strange, I know). Would it be worthwhile to have a function for converting these into POSIXct or Date objects?
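
In the meantime the conversion is a one-liner in base R, since POSIXct counts seconds since the epoch. The millisecond value below is made up for illustration.

```r
# Base R sketch: a millisecond timestamp only needs to be divided by 1000
# before it is interpreted as seconds since 1970-01-01.
ms <- 1483245296123
ts <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
ts_string <- format(ts, "%Y-%m-%d %H:%M:%S")
ts_string
```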

New format

R> anytime:::testFormat("%a %d %b %Y %H:%M:%S %q", "Sat, 22 Oct 2016 09:06:43 -0400")
[1] "2016-10-22 09:06:43 CDT"
R>

The string Sat, 22 Oct 2016 09:06:43 -0400 is a common email header time format. If I recall correctly this is the RFC822 format (with the 2-digits year expanded to 4-digits). Would be good to have it.

The timezone offset is not parsed on input, though, per the Boost Date_Time documentation.
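
Base R's own parser does read the numeric offset on input, which may serve as a reference for the expected result. The sketch assumes an English or C LC_TIME locale, because "%a" and "%b" are locale dependent.

```r
# Base R sketch: strptime's "%z" reads a numeric offset such as "-0400" on
# input. "%a" and "%b" depend on the LC_TIME locale; an English or C locale
# is assumed here.
hdr <- "Sat, 22 Oct 2016 09:06:43 -0400"
ts  <- as.POSIXct(hdr, format = "%a, %d %b %Y %H:%M:%S %z", tz = "UTC")
hdr_utc <- format(ts, "%Y-%m-%d %H:%M:%S", tz = "UTC")
hdr_utc
```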

Another format?

A common format is "Thu Jan 17 09:29:10 EST 2013" but the %Z conversion only works on output.
Hence:

R> library(anytime)
R> anytime("Thu Jan 17 09:29:10 EST 2013")
[1] NA
R> anytime:::testFormat("%a %b %d %H:%M:%S %z %Y", "Thu Jan 17 09:29:10 EST 2013")
[1] NA
R> anytime:::testFormat("%a %b %d %H:%M:%S %Z %Y", "Thu Jan 17 09:29:10 EST 2013")
[1] NA

One can substitute the timezone out:

R> anytime("Thu Jan 17 09:29:10  2013")
[1] "2013-01-17 09:29:10 CST"
R> 

but that is hackish.

A better trick seems to be to just block the three letters with some text, here xxx:

R> anytime:::testFormat("%a %b %d %H:%M:%S xxx %Y", "Thu Jan 17 09:29:10 EST 2013")
[1] "2013-01-17 09:29:10 CST"
R> 

That is probably more useful and worth adding.

[Feature request] Support for Japanese formatted dates

The date format yyyy年mm月dd日 is often used in Japan, which is similar to yyyyYmmMddD format in English.

Anytime handles the English case:

> library(anytime)
> anytime("2016Y12M31D")
[1] "2016-12-31 EST"

It'd be nice if it handled the Japanese case too:

> anytime("2016年12月31日")
[1] NA
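
A possible stopgap in base R is to replace the kanji markers with ISO separators before parsing. The helper parse_japanese_date is hypothetical and not part of anytime.

```r
# Hypothetical workaround in base R: replace the kanji markers for year,
# month and day with ISO separators before parsing. Unicode escapes keep
# the source file ASCII-only.
parse_japanese_date <- function(x) {
  x <- gsub("\u5e74|\u6708", "-", x)   # year and month markers
  x <- gsub("\u65e5", "", x)           # day marker
  as.Date(x, format = "%Y-%m-%d")
}

jp <- parse_japanese_date("2016\u5e7412\u670831\u65e5")
format(jp)
```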

Europe/London TZ Goes back an Hour

I haven't been able to independently reproduce this issue, however, as I'm sure you are aware anytime is failing on R-devel for some platforms. I didn't see an open issue here, so I thought I'd open one such that I could track it. It could very well be a fault in R-devel as opposed to anytime per-se. However, given that I haven't been able to reproduce the issue, I'm in a tough place in terms of evaluating whether that is so. What are your thoughts/impressions?

Unexpected shift of day when forcing timezone to GMT

The default timezone for my R session is 'Europe/Berlin':

> anytime:::getTZ()
[1] "Europe/Berlin"

However, we strive to store all our data using GMT, so in order to convert the date 20161010 to POSIXct I ran

> anytime(20161010, tz = "GMT")
[1] "2016-10-09 22:00:00 GMT"

which, to my surprise, returned 10pm on the previous day.

Similarly, anydate returns the previous day:

anydate(20161010, tz = "GMT")
[1] "2016-10-09"

Is this intentional?

This is what is stored under the hood:

> dput(anytime(20161010, tz = "GMT"))
structure(1476050400, class = c("POSIXct", "POSIXt"), tzone = "GMT")
> dput(anytime(20161010))
structure(1476050400, class = c("POSIXct", "POSIXt"), tzone = "Europe/Berlin")
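
The stored value can be reproduced in base R, which suggests the input was parsed as local midnight in Berlin and only displayed in GMT. The sketch below is an illustration of that reading, not a statement about anytime's internals.

```r
# Base R illustration of the stored value: local midnight in Berlin is
# 22:00 on the previous day in GMT, so the calendar date depends on the
# zone used when converting the instant to a Date.
x <- as.POSIXct("2016-10-10 00:00:00", tz = "Europe/Berlin")

as.numeric(x)                                  # seconds since the epoch
in_gmt      <- format(x, "%Y-%m-%d %H:%M:%S", tz = "GMT")
date_gmt    <- as.Date(x, tz = "GMT")
date_berlin <- as.Date(x, tz = "Europe/Berlin")
in_gmt
c(date_gmt, date_berlin)
```

The same mechanism explains a date that appears shifted to the previous day whenever a local-midnight instant is converted to a Date using UTC.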

R session crashes with nonsensical values on Windows

This crashes the R session on Windows:

anytime::anytime(c("2.343423423", "3.435435345"))

It works fine with useR = TRUE:

anytime::anytime(c("2.343423423", "3.435435345"), useR = TRUE)
# [1] NA NA

and also on other platforms.

Using R 3.5 on Windows 7, with latest anytime.

Counter intuitive result when using anydate, is it a config problem I have ?

I have been trying to use anytime these days and I see results that seem counterintuitive to me.
I have the following:

anytime:::getTZ()
[1] "Europe/London"
anytime('2016-05-12 15:00:00')
[1] "2016-05-12 14:00:00 BST" # which I would have expected to be "2016-05-12 15:00:00 BST"
anytime('2016-12-12 15:00:00')
[1] "2016-12-12 14:00:00 GMT" # which I would have expected to be "2016-12-12 15:00:00 GMT"

Reading the vignette, I understand from the signature
function (x, tz = getTZ(), asUTC = FALSE) that tz is NOT the timezone anytime uses to parse the data (parsing defaults to local time, or to UTC if asUTC is TRUE) but the one used for display. This makes sense, as when you read data (from a string) you probably assume that the data is in your timezone.

When I use anydate I then get

anydate('2016-12-12')         
[1] "2016-12-11"
anydate(as.Date('2016-12-12')) # as as.Date('2016-12-12') is already a date 
[1] "2016-12-12"

Is this the expected behaviour ?
I am not sure what I miss here.

anydate transforms to previous day

anydate transforms date to previous day, while anytime correctly transforms the dates:

> anydate(20150101)
[1] "2014-12-31"
> anydate("2015/01/01")
[1] "2014-12-31"
> anytime(20150101)
[1] "2015-01-01 CET"
> anytime("2015/01/01")
[1] "2015-01-01 CET"

POSIXct to Date conversion could use timezone

anydate() seems to be missing the timezone argument to as.Date() internally ...

library(anytime)

tzone <- "America/New_York"
x <- as.POSIXct("2017-01-01 19:00",tzone)
anydate(x,tzone)
# "2017-01-02"

utcdate() should also be changed

Thanks for the awesome package!

Parsing time in a TZ other than localtime or UTC?

Have you ever considered expanding asUTC's functionality to allow users to parse time in any timezone they specify?

I regularly work with data that is neither in my local timezone nor in UTC. For example, I might receive data with timestamps in strings that should eventually be formatted as POSIXct with an Eastern timezone. I am physically located in California, and my computer's default timezone is Pacific.

In my standard workflow I would do something like:

> tz <- "America/New_York"
> Sys.setenv(TZ = tz)
> imported_data <- "2016-08-01 01:00:00"
> imported_data <- as.POSIXct(imported_data, format = "%Y-%m-%d %H:%M:%S",  tz = tz)
> imported_data
[1] "2016-08-01 01:00:00 EDT"

I'd love to switch to using anytime (so much faster! so much more intuitive!), but I can't get dates to parse in anything other than UTC (with asUTC = T) or my computer's actual localtime (ie PST, regardless of what I set in my R environment). The output is then adjusted to whatever timezone I've specified after the parsing happens, so the above example returns as "2016-08-01 04:00:00" instead of "2016-08-01 01:00:00". Creating the string after changing the R environment timezone also doesn't fix the issue:

> library(anytime)
> imported_data <- "2016-08-01 01:00:00"
> anytime(imported_data)
[1] "2016-08-01 01:00:00 PDT"
> tz <- "America/New_York"
> anytime:::setTZ(tz)
> anytime(imported_data)
[1] "2016-08-01 04:00:00 EDT"
> imported_data2 <- "2016-08-01 01:00:00"
> anytime(imported_data2)
[1] "2016-08-01 04:00:00 EDT"

In case it helps, I'm running Windows 10. A colleague running Ubuntu 1604 had the same thing happen:

On Ubuntu, R version 3.3.2:
> library(anytime)
Warning message:
In fun(libname, pkgname) : No TZ information found. Falling back to UTC.
> imported_data <- "2016-08-01 01:00:00"
> anytime(imported_data)
[1] "2016-08-01 08:00:00 UTC"
> tz <- "America/New_York"
> anytime:::setTZ(tz)
> anytime(imported_data)
[1] "2016-08-01 04:00:00 EDT"
> imported_data2 <- "2016-08-01 01:00:00"
> anytime(imported_data2)
[1] "2016-08-01 04:00:00 EDT" 

As far as I can tell from reading through some of the closed issues, this is the same problem (wanting to actually PARSE in UTC, as opposed to just convert an already-parsed date to UTC) that led to the creation of the asUTC feature. Would it be possible to get a more flexible implementation that would allow parsing in any specified timezone?

anytime(numeric) kills R session

Running the code below kills the R session (RStudio Version 1.0.136).

library(anytime) #anytime_0.2.1

anytime(41275)

Didn't test it on dev version, could be a duplicate of issue: #56

sessionInfo

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] anytime_0.2.1

loaded via a namespace (and not attached):
[1] tools_3.3.2  Rcpp_0.12.10

Crash in RStudio (not in R) when calling testFormat

I'm on:

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=nl_NL.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=nl_NL.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.2.3

If I run

anytime:::testFormat("%m/%d/%Y", "22/3/2016")

RStudio (Version 0.99.903) crashes immediately, but R run from the command line does not.

My first hypothesis is that memory corruption happens in anytime in a way that only causes problems when the rsession is started the way RStudio starts it. I will try to confirm this.

utcdate and anydate don't seem to play well with factors (OSX)

I am running R on OSX within Emacs or from the terminal and run into problems like the ones below:

as.factor(20160101 + 0:2)
### [1] 20160101 20160102 20160103
### Levels: 20160101 20160102 20160103
anytime(as.factor(20160101 + 0:2))
### [1] "2015-12-31 23:00:00 GMT" "2016-01-01 23:00:00 GMT"
### [3] "2016-01-02 23:00:00 GMT"
utctime(as.factor(20160101 + 0:2))
### [1] "2016-01-01 GMT" "2016-01-02 GMT" "2016-01-03 GMT"
utcdate(as.factor(20160101 + 0:2))
### [1] "1400-01-01" "1400-02-01" "1400-03-01"
anydate(as.factor(20160101 + 0:2))
### [1] "1400-01-01" "1400-02-01" "1400-03-01"
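One possible contributing factor (an assumption, not something confirmed in this report) is the usual factor pitfall in R: a factor carries integer codes that differ from its labels, so any conversion that goes through the codes sees 1, 2, 3 rather than 20160101 and so on. A base R sketch of the difference, and of converting via the labels:

```r
# A factor stores integer codes plus character labels (levels).
f <- as.factor(20160101 + 0:2)

codes  <- as.integer(f)    # the internal codes: 1 2 3
labels <- as.character(f)  # the labels: "20160101" "20160102" "20160103"

# Converting via the labels gives the intended dates in base R.
dates <- as.Date(labels, format = "%Y%m%d")
format(dates)
```

Going through `as.character()` first is the standard way to recover what a factor displays, whatever parser is applied afterwards.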

Support for Excel day count as input

Excel stores dates internally as the number of days since 1899-12-30; for example, today is 42664. Importing Excel files doesn't always work perfectly, and sometimes these dates are not recognized as dates by the import functionality. In that scenario, I'd like to convert these numbers using anytime.

Is support for this date format wanted? Could Excel type dates overlap with other formats?
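As a point of reference, the conversion described here can be done in base R by supplying the Excel origin explicitly. The sketch also shows why overlap is a real concern: the same number is a perfectly valid count of seconds since the Unix epoch. This is base R only and is not a statement about what anytime does with such input.

```r
# Excel-style day count: days since 1899-12-30.
excel_serial <- 42664
as_excel_date <- as.Date(excel_serial, origin = "1899-12-30")
format(as_excel_date)  # "2016-10-21"

# The same number read as seconds since the Unix epoch lands in 1970,
# so a converter cannot tell the two apart from the value alone.
as_epoch_secs <- as.POSIXct(excel_serial, origin = "1970-01-01", tz = "UTC")
format(as_epoch_secs, "%Y-%m-%d %H:%M:%S")  # "1970-01-01 11:51:04"
```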

Allow format strings to be added

This would let users go beyond the currently fixed set of formats and handle odd ones like

  • %y for two-digit year, or even
  • %I %p for 12 hour time and am/pm,
  • any other format we don't currently have.
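To make the two formats in the list concrete, here is how they behave with explicit format strings in base R. This is only an illustration of the formats themselves, using `as.Date()` and `as.POSIXct()`; it does not show any anytime interface for registering formats. Note that `%p` matching is locale-dependent, so the second call assumes a locale that recognises AM/PM.

```r
# %y: two-digit year (base R maps 00-68 to 20xx and 69-99 to 19xx).
two_digit <- as.Date("22/03/16", format = "%d/%m/%y")
format(two_digit)

# %I with %p: 12-hour clock plus AM/PM indicator.
# Assumes an English or C locale for the AM/PM text.
twelve_hour <- as.POSIXct("2016-03-22 01:30 PM",
                          format = "%Y-%m-%d %I:%M %p", tz = "UTC")
format(twelve_hour, "%Y-%m-%d %H:%M")
```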

Condition handling for NA values

Hello,

Are there any plans to change condition handling for NA values so that their presence in a vector simply throws a warning versus an error? For instance...

anydate(c("20010101", "02/02/1902", NA))

... gives me the error:

Error in eval(substitute(expr), envir, enclos) : Inadmissable input: NA

I have gotten around this issue of parsing date formats while accounting for the presence of NA values by using lubridate::parse_date_time(), but think that adding such functionality to your package, combined with its parsing flexibility (and the keystrokes saved by not having to specify the orders parameter in parse_date_time()) would make it an extremely attractive option for daily use.

Thank you,
Mike
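Until NA handling changes, one generic workaround is to parse only the non-missing elements and leave the rest as NA. The helper below, `parse_skipping_na`, is a hypothetical name introduced for this sketch and is not part of anytime; base R's `as.Date` stands in for the parser so the example runs without extra packages, and `anytime::anydate` could be passed instead.

```r
# Hypothetical helper: apply a date parser to the non-NA elements only.
parse_skipping_na <- function(x, parser) {
  out <- rep(as.Date(NA), length(x))  # start with all-NA Date vector
  ok <- !is.na(x)                     # positions that can be parsed
  out[ok] <- parser(x[ok])            # fill in parsed values
  out
}

res <- parse_skipping_na(c("2001-01-01", NA, "1902-02-02"), as.Date)
is.na(res)  # FALSE TRUE FALSE
```

The result keeps the input length and order, so it can be assigned straight back to a data frame column.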
