
Comments (5)

sdispater commented on May 18, 2024

It's not limited to parse():

>>> import pendulum
>>> pendulum.create(2016, 11, 12, 2, 9, 39, 594000, 'America/Panama').microsecond
593999

After checking, it seems that the wrong value comes from the C extension used to calculate offsets. If you install pendulum without the C extensions, it works properly:

PENDULUM_EXTENSIONS=0 pip install pendulum


dokai commented on May 18, 2024

It seems the difference between the Python and C versions is that the Python version rounds the value before casting to int, which the C version does not do.

C

/* The cast truncates toward zero; no rounding is applied first. */
microsecond = (int64_t) (unix_time * 1000000) % 1000000;
if (microsecond < 0) {
    microsecond += 1000000;
}

Python

microsecond = int(round(unix_time % 1, 6) * 1e6)

Using the 594000 microseconds from above, we can see that, because of how floats are represented, the result of the modulo varies with the integer part:

>>> 1.594000 % 1
0.5940000000000001

>>> 123.594000 % 1
0.5939999999999941

Without the round() call, the Python version would behave like the C version:

>>> int((123.594000 % 1 ) * 1e6)
593999
>>> int(round(123.594000 % 1, 6) * 1e6)
594000


danilobellini commented on May 18, 2024

As @dokai pointed out, it's a floating-point truncation error, but I think this is similar to what I described in #71. Always rounding instead of truncating would be a workaround for the issue, but I think Pendulum should never use floating-point numbers internally, unless the result itself has to be a float.

In this case, it goes back to the timezone _normalize method (pendulum.tz.Timezone._normalize) when it calculates the unix timestamp as:

unix_time = tr.unix_time - (tr.pre_time - dt).total_seconds()
unix_time = tr.unix_time + (dt - tr.time).total_seconds()

The method total_seconds returns a floating-point number. Every single call to delta.total_seconds() internal to Pendulum should be replaced by delta.days * 86400 + delta.seconds, with the microseconds part handled elsewhere (an extra local_time parameter in both the C and Python implementations). This way, nothing is a float in between.
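A minimal sketch of that integer-only idea, using just the standard library. The helper name split_delta and the naive epoch are assumptions made for illustration; they are not part of Pendulum.

```python
from datetime import datetime, timedelta


def split_delta(delta):
    """Return (whole_seconds, microseconds) using only integer arithmetic.

    Hypothetical helper, not a Pendulum API. timedelta keeps days, seconds
    and microseconds as normalized integers, so no float is involved.
    """
    return delta.days * 86400 + delta.seconds, delta.microseconds


# Naive epoch, chosen only to keep the example self-contained.
epoch = datetime(1970, 1, 1)
dt = datetime(2316, 11, 12, 2, 9, 39, 857)

seconds, microseconds = split_delta(dt - epoch)

# The pair can be carried around separately and recombined without loss.
rebuilt = epoch + timedelta(seconds=seconds, microseconds=microseconds)
```

For negative deltas, timedelta normalizes days to a negative value while keeping seconds and microseconds non-negative, so the pair still describes floor seconds plus a microsecond part in the range 0 to 999999.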

The given example:

>>> import pendulum
>>> from datetime import datetime
>>> dt = datetime(2016, 11, 12, 2, 9, 39, 594000)
>>> tz = pendulum.timezone("America/Panama")
>>> tr = tz.transitions[-1]

This method would be internally called on creation:

>>> tz._normalize(dt, "post")
(2016, 11, 12, 2, 9, 39, 593999, <TimezoneInfo [America/Panama, -18000, False]>)

And it does this:

>>> unix_time = tr.unix_time + (dt - tr.time).total_seconds()
>>> offset = tz._tzinfos[tr._transition_type_index].offset
>>> pendulum._extensions._helpers.local_time(unix_time, offset)
(2016, 11, 12, 2, 9, 39, 593999)

But for very high year values, the microsecond is simply lost (i.e., the unix_time value itself isn't valid), regardless of whether the Python or the C implementation is used:

>>> dt = datetime(2316, 11, 12, 2, 9, 39, 857)
>>> unix_time = tr.unix_time + (dt - tr.time).total_seconds()
>>> "%.18f" % unix_time # Rounding/truncating wouldn't be enough
'10945955379.000856399536132812'
>>> dt = datetime(2222, 11, 12, 2, 9, 39, 1454)
>>> unix_time = tr.unix_time + (dt - tr.time).total_seconds()
>>> "%.18f" % unix_time # Neither rounding nor truncating would do it, again
'7979584179.001453399658203125'
>>> dt = datetime(2180, 7, 4, 13, 16, 8, 12)
>>> unix_time = tr.unix_time + (dt - tr.time).total_seconds()
>>> "%.18f" % unix_time
'6643016168.000011444091796875'
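One way to see why the microsecond cannot survive at these magnitudes is to look at the spacing between adjacent floats near each timestamp. This is a sketch that assumes IEEE 754 double-precision floats (the usual case for CPython) and Python 3.9 or later for math.ulp; the first value is an approximate 2016 timestamp picked for illustration.

```python
import math

# Spacing between adjacent representable floats near each timestamp.
ulp_2016 = math.ulp(1478934579.0)   # approximate 2016 timestamp (assumed)
ulp_2222 = math.ulp(7979584179.0)   # integer part of the 2222 example above
ulp_2316 = math.ulp(10945955379.0)  # integer part of the 2316 example above

for label, ulp in (("2016", ulp_2016), ("2222", ulp_2222), ("2316", ulp_2316)):
    # A microsecond is 1e-6 seconds; compare it with the float spacing.
    print(label, ulp, "seconds, i.e.", ulp * 1e6, "microseconds")
```

Once the spacing approaches or exceeds one microsecond, the float cannot reliably carry the microsecond value, and any arithmetic on it adds further error, which is consistent with rounding not being enough in the examples above.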


sdispater commented on May 18, 2024

@danilobellini I agree that working with floating-point numbers is prone to errors and approximations.

I will check if I can cook up a better implementation to be sure we don't lose information.


sdispater commented on May 18, 2024

Commit 184b94a on the develop branch fixes the issue:

>>> import pendulum
>>> dt = pendulum.parse('2016-11-12T02:09:39.594000', 'America/Panama')
>>> dt.isoformat()
'2016-11-12T02:09:39.594000-05:00'
>>> dt.microsecond
594000
>>> dt = pendulum.create(2316, 11, 12, 2, 9, 39, 857, 'America/Panama')
>>> dt.isoformat()
'2316-11-12T02:09:39.000857-05:00'
>>> dt.microsecond
857

Basically, microseconds are now handled separately from the timestamp, which avoids having to round the value.

