Comments (2)

matiasandina commented on June 20, 2024

@rion-saeon Sorry if I wasn't clear. I was trying to keep this issue as focused as possible. I will give more context to help clarify and try to address your points:

  • The data in toy is a sample of a column from a dataset collected from microcontroller-based hardware that saves it as CSV.

  • Data contamination occurs due to noisy electrical transmission lines or some other buggy piece of software that I have little control over. It is therefore potentially challenging to know how to "fix" the random contamination using str_replace, and maybe even scientifically wrong to do so given the uncertainty about the actual timestamp (in this case it was the month that needed fixing, which carries less uncertainty, but we don't get to choose how electrical noise corrupts the data). Instead, I would prefer to have it be NA, as the coercion normally fails under parse_date_time(). We can discuss filling NAs, but that's beside the point of reporting the bug concisely here.

  • I did not write the library these devices use and have little to no control over the choices made. Hence, as much as I would like to name the data however I please, I have to work with what the device gives me.

  • Since I am the developer of an R package that munges data from these devices, I do control what to do with corrupted data. I use parse_date_time to convert the string I receive from the CSV into a properly formatted datetime. The inconsistent behavior of the function broke my package (i.e., when using it inside mutate, you don't get NA but a more insidious form of corruption: a valid datetime that is incorrect!).

  • Finally, I am not asking how to fix the corruption in this string. I'm documenting what I believe is a bug. I used the comparison with the clock package to show what I believe should be the expected behavior in lubridate.

I believe the core of the issue still holds: parse_date_time() should return NA both when called directly and when called inside mutate. The behavior I saw, with the string converted to a valid but incorrect datetime, might affect other people.

Other people have reported weird behavior in similar issues. See here and here.

from lubridate.

rion-saeon commented on June 20, 2024

@matiasandina OK cool, and actually, thanks for pointing out this potential bug. I now understand that this could be a bug and needs to be looked at, because I am also running parse_date_time inside mutate and my data should not end up corrupted. In my case, I specify the different timestamp formats that come in, which, similarly to you, I have little control over (data downloaded onto different computers causes annoying timestamp discrepancies among datasets).

I also agree that NA is the best result, as elaborate workarounds may introduce more errors or resolve only some of them.

However, if one explores your corrupted data, there should be a pattern that you can work around using the if_else or case_when functions, regex, and the stringr package.
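A minimal sketch of that idea, which also keeps the NA behavior discussed above: validate each string against a strict regex first and only hand the matching ones to parse_date_time, so anything malformed stays NA. The pattern, the "Ymd HMS" order, and the function name parse_or_na are assumptions for illustration only; the real timestamp layout from the device is not shown in this thread. This does not fix the underlying behavior in lubridate, it only guards against it.

```r
library(lubridate)
library(stringr)

# Hypothetical layout "YYYY-MM-DD HH:MM:SS"; adjust to what the device writes.
strict_pattern <- "^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}$"

# Hypothetical helper: parse only strings that match the strict pattern,
# leaving everything else (including missing values) as NA.
parse_or_na <- function(x, pattern = strict_pattern,
                        orders = "Ymd HMS", tz = "UTC") {
  # Start from an all-NA POSIXct vector of the right length.
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)

  # TRUE only for non-missing strings with exactly the expected shape.
  ok <- !is.na(x) & str_detect(x, pattern)

  if (any(ok)) {
    out[ok] <- parse_date_time(x[ok], orders = orders, tz = tz, quiet = TRUE)
  }
  out
}

# Made-up example values, not taken from the original dataset.
parse_or_na(c("2024-06-20 12:00:00", "2024-0x-20 12:00:00", NA))
```

Because the helper is vectorised, it can be used inside mutate in the same way as parse_date_time, e.g. `dplyr::mutate(df, datetime = parse_or_na(raw_timestamp))`, where `df` and `raw_timestamp` are placeholder names.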

In any case, it seems wise to move over to clock to be safe, and the package seems to be more advanced. Pity it's not part of the tidyverse, as I am a bit of a tidy snob.

Finally, I am surprised and somewhat disappointed that this has not been looked at since your OP!

PS I deleted my post as it's irrelevant at this point.

