Comments (2)
@rion-saeon Sorry if I wasn't clear. I was trying to keep this issue as focused as possible. I will give more context to clarify and to address your points:
- The data in `toy` is a sample of a column from a dataset collected by microcontroller-based hardware that saves it as CSV.
- Data contamination occurs due to noisy electrical transmission lines or other buggy software that I have little control over. Therefore, it is potentially challenging to know how to "fix" the random contamination using `str_replace`, and it may even be scientifically wrong to do so given the uncertainty about the actual timestamp (in this case, the month needed to be fixed, which carries less uncertainty, but we don't get to choose how electrical noise corrupts the data). Instead, I would prefer it to be `NA`, as the coercion normally fails under `parse_date_time()`. We can discuss filling `NA`s, but that's beside the point of reporting the bug concisely here.
- I did not write the library these devices use and have little to no control over the choices made in it. Hence, as much as I would like to name the data however I please, I have to work with what the device gives me.
- Since I am the developer of an R package that munges data from these devices, I do control what happens to corrupted data. I use `parse_date_time` to convert the `string` I receive from the CSV into a properly formatted `datetime`. The function's inconsistent behavior broke my package: when it is used inside `mutate`, you don't get `NA` but a more insidious form of corruption, a valid datetime which is incorrect!
- Finally, I am not asking how to fix the corruption in this string. I'm documenting what I believe is a bug. I used the comparison with the `clock` package to show what I believe should be the expected behavior in `lubridate`.
I believe the core of the issue still holds: `parse_date_time()` should return `NA` both when called directly and when called inside a `mutate`. The incorrect conversion to a valid but wrong datetime that I saw might affect other people as well.
Other people have reported weird behavior in similar issues. See here and here.
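The expected behavior described above (an unparseable value becomes missing, and a value gets the same result whether it is parsed on its own or as part of a column) can be sketched outside R. The snippet below is a minimal, language-neutral Python illustration, not lubridate or clock code: `None` stands in for R's `NA`, the helper names `parse_strict` and `parse_column` are hypothetical, and the format string `"%Y-%m-%d %H:%M:%S"` is an assumption, since the thread does not show the device's actual timestamp format.

```python
from datetime import datetime
from typing import List, Optional

# ASSUMPTION: the real device timestamp format is not shown in the thread;
# this format string is only for illustration.
FMT = "%Y-%m-%d %H:%M:%S"


# Hypothetical helper; not part of lubridate or clock.
def parse_strict(value: str, fmt: str = FMT) -> Optional[datetime]:
    """Parse one timestamp; return None (the stand-in for R's NA) on failure."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        # Malformed strings and impossible fields (e.g. month 13, Feb 30)
        # are reported as missing instead of being "repaired".
        return None


# Hypothetical helper; not part of lubridate or clock.
def parse_column(values: List[str], fmt: str = FMT) -> List[Optional[datetime]]:
    """Parse a column element by element.

    Because every element goes through parse_strict independently, a value
    gets the same result whether it is parsed alone or inside a column.
    """
    return [parse_strict(v, fmt) for v in values]


if __name__ == "__main__":
    column = ["2021-03-04 10:00:00", "2021-13-04 10:00:00", "2021-02-30 10:00:00"]
    for raw, parsed in zip(column, parse_column(column)):
        print(raw, "->", parsed)
```

Running the script, the first value parses and the two corrupted values come back as `None`, rather than as a plausible-looking but wrong datetime.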
from lubridate.
@matiasandina OK, cool, and thanks for pointing out this potential bug. I now understand that this could be a bug that needs to be looked at, because I am also running `parse_date_time` inside `mutate`, and there shouldn't be any corruption of my data. In my case, I specify several timestamp formats for the incoming data, over which, like you, I have little control (data downloaded onto different computers causes annoying timestamp discrepancies among datasets).
I also agree that `NA` is the best result, as one may end up introducing more errors, or resolving only some of them, when trying to rectify things with elaborate workarounds.
However, if you explore your corrupted data, there may be a pattern you could work around using the `if_else` or `case_when` functions, regular expressions, and the `stringr` package.
In any case, it seems wise to move over to `clock` to be safe, and the package seems to be more advanced. Pity it's not part of the `tidyverse`, as I am a bit of a tidy snob.
Finally, I am surprised and somewhat disappointed that this has not been looked at since your OP!
PS: I deleted my earlier post, as it's irrelevant at this point.
Related Issues (20)
- Problems with POSIXct in R 4.3.2
- 'OO' format option not recognized for parsing ISO 8601 time zone offsets
- `parse_date_time()` cannot match missing zeroes
- month() and others fail on objects from class 'timeDate'
- ymd_hms() function left-pads some dates that have missing "seconds" values
- round_date in 0.1 sec doesn't work correctly
- FR: int_overlaps with exclusive endpoints
- unique() always zero for periods
- Implement Set Operations methods for Dates
- Parsing dates with `my` seems to have a limit size
- Do we need something like `%m-%` to subtract years from leap 02/29?
- Cannot compute `<date> + lubridate::year(1)` when `<date>` is a leap day.
- m:s:ms time data
- `dmy()` not failing (and returning incorrect date) on wrong date format
- ceiling_date() issue when using multi units
- Fractional Seconds with conversion and rounding/truncation?
- mdy("04 July 2019") GIVES "2019-04-20" : Instead should give an error.
- data.table merge doesn't work with intervals
- yearmonth() throws an error with C_force_tz