
Comments (16)

PhoenixDai avatar PhoenixDai commented on August 23, 2024 2

I also tested loading the data with timezone=UTC set and saving it in Parquet format, then loading the Parquet file without setting the timezone. The date didn't change back. I hope this helps those who worry that setting timezone=UTC may mess up timestamps in other data sources.

from spark-sas7bdat.

witwall avatar witwall commented on August 23, 2024 1

If everything is in the same timezone, you get the correct answer without further configuration.
If the timezones differ, you need to set Spark's timezone to get the correct answer.


kylebarron avatar kylebarron commented on August 23, 2024 1

I think this should be added to the README because it's very easy to get tripped up by this and not realize.


Tagar avatar Tagar commented on August 23, 2024 1

There is a very active discussion of timestamp incompatibilities on the Spark dev list.

Here's a good summary, with details and suggestions, from the Spark development community:
https://goo.gl/VV88c5

In short, the following is being proposed:

  • The TIMESTAMP WITHOUT TIME ZONE type should have LocalDateTime
    semantics.
  • The TIMESTAMP WITH LOCAL TIME ZONE type should have Instant semantics.
  • The TIMESTAMP WITH TIME ZONE type should have OffsetDateTime semantics.
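As a rough illustration of the three semantics listed above, here is a plain-Python analogy using only the standard library. It is not Spark code: the mapping onto Python's naive and aware datetime objects, and the UTC-5 and UTC+9 offsets, are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

# LocalDateTime-like: a wall-clock value with no zone attached.
# It reads the same everywhere because there is nothing to convert.
local_dt = datetime(2018, 3, 3, 0, 0)

# Instant-like: a point on the UTC timeline. How it is displayed
# depends on the zone of whoever is looking at it.
instant = datetime(2018, 3, 3, 0, 0, tzinfo=timezone.utc)
viewer_zone = timezone(timedelta(hours=9))  # hypothetical session zone
instant_as_seen = instant.astimezone(viewer_zone)

# OffsetDateTime-like: the value carries its own offset with it.
offset_dt = datetime(2018, 3, 3, 0, 0, tzinfo=timezone(timedelta(hours=-5)))
offset_in_utc = offset_dt.astimezone(timezone.utc)

print(local_dt.isoformat())
print(instant_as_seen.isoformat())
print(offset_dt.isoformat(), "is", offset_in_utc.isoformat())
```

Only the Instant-like value changes its displayed wall-clock time when the viewer's zone changes, which is the behaviour that tends to surprise people in this thread.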

This proposal is in accordance with the SQL standard and many major DB
engines.

If the stars align, this will be part of the Spark 3.0 release.


nelson2005 avatar nelson2005 commented on August 23, 2024

Putting this on hold while I test more


witwall avatar witwall commented on August 23, 2024

@nelson2005 I just noticed this issue too.
xy.sas7bdat.zip

The correct date is 2018-03-03, but I got 2018-03-02.
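One plausible way a date can slip back by a day, sketched in plain Python under assumptions that are not confirmed in this thread: a value meaning midnight on 2018-03-03 is treated as a UTC instant and then rendered in a zone that is behind UTC. The UTC-5 offset below is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical default zone of the machine doing the rendering (assumption).
local_zone = timezone(timedelta(hours=-5))

# Midnight on 2018-03-03, interpreted as an instant in UTC.
instant = datetime(2018, 3, 3, 0, 0, tzinfo=timezone.utc)

# Rendering the same instant in a zone behind UTC lands on the previous day.
shifted_date = instant.astimezone(local_zone).date()

# Rendering it in UTC keeps the original calendar date.
utc_date = instant.astimezone(timezone.utc).date()

print(shifted_date.isoformat(), utc_date.isoformat())
```

This is why pinning the zone to UTC on every machine involved can make the off-by-one disappear, if this mechanism is indeed the cause.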


nelson2005 avatar nelson2005 commented on August 23, 2024

Okay, this does appear to be really happening as described in the top post. To get the correct dates I've added a Spark UDF like this (Try is scala.util.Try):

spark.udf.register(Tags.date2timestamp_p1day, (date: java.sql.Date) => Try(java.sql.Timestamp.valueOf(date.toLocalDate.atStartOfDay().plusDays(1))).toOption)
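For readers following along in Python, the same workaround can be sketched with the standard library alone. This is a plain-Python analogue, not a registered Spark UDF, and the function name is made up for illustration.

```python
from datetime import date, datetime, time, timedelta

def date_to_timestamp_plus_one_day(d):
    """Return midnight of the day after `d`, or None if that fails.

    Hypothetical helper mirroring the Scala snippet above: the
    Try(...).toOption there becomes a try/except returning None here.
    """
    try:
        return datetime.combine(d + timedelta(days=1), time.min)
    except (TypeError, OverflowError):
        # TypeError: d is None or not a date; OverflowError: d is date.max.
        return None

print(date_to_timestamp_plus_one_day(date(2018, 3, 2)))
```

Note that shifting every date by one day only compensates for the symptom, so it would produce wrong results once the timezone is configured correctly.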


witwall avatar witwall commented on August 23, 2024

After double-checking, I am afraid it is not a bug but a timezone issue on your server.


nelson2005 avatar nelson2005 commented on August 23, 2024

@witwall Can you elaborate? Both the SAS server where the sas7bdat file was created and the Spark server are in the same timezone.


PhoenixDai avatar PhoenixDai commented on August 23, 2024

@nelson2005 With a hint from witwall, I've figured out the following working configs:

  • For the Spark shell: spark-shell --master=local[*] --driver-java-options="-Duser.timezone=UTC" (I believe spark-submit would work the same way, but I didn't test it.)
  • Inside a program such as a PySpark script: .set('spark.driver.extraJavaOptions','-Duser.timezone=UTC').set('spark.executor.extraJavaOptions','-Duser.timezone=UTC')
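A minimal sketch of how the two settings above might be applied when building a session in PySpark. The helper names and the application name are hypothetical; the configuration keys and values are the ones quoted in this comment. The pyspark import is kept inside the function so the first helper can be inspected without Spark installed.

```python
# JVM flag quoted in the comment above.
UTC_OPTION = "-Duser.timezone=UTC"

def utc_timezone_conf():
    """Return the driver and executor settings that pin the JVM to UTC."""
    return {
        "spark.driver.extraJavaOptions": UTC_OPTION,
        "spark.executor.extraJavaOptions": UTC_OPTION,
    }

def build_session(app_name="sas-utc-example"):
    """Create a SparkSession with the UTC settings applied (requires pyspark)."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in utc_timezone_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()

print(utc_timezone_conf())
```

These options only take effect for JVMs that have not started yet, so they need to be set before the first session or context is created.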

Hope this can help you.


nelson2005 avatar nelson2005 commented on August 23, 2024

@PhoenixDai thanks, that seems to work.


nelson2005 avatar nelson2005 commented on August 23, 2024

It is easy to get tripped up on this (and it might be worth a mention in the README), but I think it's fundamentally a Spark ecosystem issue, nothing specific to this project. People (including me) get tripped up by this all the time, for example here because Hive applies the server timezone to timestamps.


kylebarron avatar kylebarron commented on August 23, 2024

Interesting. I'm a relative newcomer to Spark, so this is the first time I'd gotten tripped up by it.


ss23697 avatar ss23697 commented on August 23, 2024

Hi,

Can someone please help me? I'm facing a similar issue. I tried setting the timezone to UTC, but it does not work in Spark YARN mode; it only works in local mode.

Can someone please provide a snippet showing how to get the correct date by setting the timezone in PySpark YARN mode?

Thanks
Subhankar


nelson2005 avatar nelson2005 commented on August 23, 2024

Do these options work for you?

--driver-java-options="-Droot.logger=ERROR,console -Duser.timezone=UTC" --conf "spark.yarn.extraJavaOptions=-Duser.timezone=UTC" --conf "spark.executor.extraJavaOptions=-Duser.timezone=UTC"


mrugesh1989 avatar mrugesh1989 commented on August 23, 2024

I am having a similar issue with date conversion, where the date is off by 2 days for years > 4000. I think that issue is fixed in the latest parso version, 2.14, but the latest saurfang version uses parso 2.11. Does anyone know how to make parso 2.14 work with saurfang version 3.0 or 2.10?
