logoshot commented on May 18, 2024

I printed the data just before to_json and found that the time values are still correct at that point, so the datetime values must be changed during to_json.

    @staticmethod
    def to_jsonl(dataset, export_path, num_proc=1, **kwargs):
        """
        Export method for json/jsonl target files.

        :param dataset: the dataset to export.
        :param export_path: the path to store the exported dataset.
        :param num_proc: the number of processes used to export the dataset.
        :param kwargs: extra arguments.
        :return:
        """
        print(dataset['time'])  # debug: the time values are still correct here
        dataset.to_json(export_path, force_ascii=False, num_proc=num_proc)
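The same before/after comparison can also be done on the exported file itself. A minimal sketch, assuming the export is a JSONL file whose time field holds naive ISO 8601 strings (for example, written with date_format='iso'): read each line back with the standard json module and compare it with the value printed before to_json. The helpers read_jsonl_column and find_mismatches are hypothetical names, not part of data-juicer; both sides are parsed with datetime.fromisoformat so that '2023-10-13 16:06:31' and '2023-10-13T16:06:31.000' compare equal.

```python
import json
from datetime import datetime


def read_jsonl_column(path, column):
    """Return the value of `column` from each non-empty line of a JSONL file."""
    values = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                values.append(json.loads(line).get(column))
    return values


def find_mismatches(expected, path, column="time"):
    """Return (row, expected, exported) for every row whose datetime changed.

    Assumes both sides are naive ISO 8601 strings; fromisoformat accepts a
    space or "T" separator and an optional fractional-seconds part.
    """
    exported = read_jsonl_column(path, column)
    if len(exported) != len(expected):
        raise ValueError(f"expected {len(expected)} rows, found {len(exported)}")
    mismatches = []
    for row, (before, after) in enumerate(zip(expected, exported)):
        if datetime.fromisoformat(before) != datetime.fromisoformat(after):
            mismatches.append((row, before, after))
    return mismatches
```

An empty result means every printed value survived the export unchanged; any returned tuple points at the exact row that was altered.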

from data-juicer.

zhijianma commented on May 18, 2024

The image below shows the content exported with date_format='iso' on my local machine: the values remain unchanged, only rendered in ISO format. Please check your local time settings and your Python dependencies, such as datasets, pandas, and pyarrow.
[Screenshot: exported jsonl content on the maintainer's machine]

logoshot commented on May 18, 2024

I installed these packages with pip install -v -e .[all], so I think they are the default versions. Could you check your versions? Here are mine:

pandas 2.0.0
datasets 2.11.0
pyarrow 14.0.1
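To compare environments like this without reading pip output by hand, a small standard-library sketch can collect the installed versions. The helper name installed_versions is hypothetical; it relies only on importlib.metadata (Python 3.8+) and reports None for a package that is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # The three dependencies compared in this thread
    for name, ver in installed_versions(["pandas", "datasets", "pyarrow"]).items():
        print(name, ver or "not installed")
```

Running it on both machines and diffing the output makes a dependency mismatch easy to spot.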

zhijianma commented on May 18, 2024

Input time value: '2023-10-13 16:06:31'

This may be a bug in pyarrow introduced in v13.0.
I tested v11.0 through v14.0 with date_format='iso'; here are my results:

  • v11.0.0 and v12.0.0 write the correct text 2023-10-13T16:06:31.000 to the jsonl file.
  • v13.0.0 and v14.0.0 write the wrong text 1970-01-01T00:00:01.697 to the jsonl file.

So we suggest downgrading pyarrow to v12.0.0.
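One possible reading of the wrong value, offered as a hypothesis rather than a confirmed diagnosis: 1970-01-01T00:00:01.697 is exactly what you get when the number of seconds since the epoch for 2023-10-13 16:06:31 is reinterpreted as a number of nanoseconds. The sketch below does that arithmetic with the standard library only and treats the original value as naive UTC (an assumption; the thread does not state a time zone). A whole-hour time zone offset would only change digits below the millisecond, so the .697 holds either way.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_ms(dt):
    """Format like the exported jsonl: no offset, millisecond precision."""
    return dt.replace(tzinfo=None).isoformat(timespec="milliseconds")


# Original value, assumed to be naive UTC
original = datetime(2023, 10, 13, 16, 6, 31, tzinfo=timezone.utc)
seconds = int(original.timestamp())  # 1697213191

# Correct reading: the integer counts seconds since the epoch
as_seconds = EPOCH + timedelta(seconds=seconds)

# Hypothesized misreading: the same integer counted as nanoseconds
# (timedelta has microsecond resolution, so drop the last three digits)
as_nanoseconds = EPOCH + timedelta(microseconds=seconds // 1000)

print(iso_ms(as_seconds))      # 2023-10-13T16:06:31.000
print(iso_ms(as_nanoseconds))  # 1970-01-01T00:00:01.697
```

Both printed strings match the two outputs reported above, which is consistent with (though not proof of) a time-unit mix-up somewhere in the export path.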

github-actions commented on May 18, 2024

This issue is marked as stale because there has been no activity for 21 days. Remove the stale label or add new comments, or this issue will be closed in 3 days.

github-actions commented on May 18, 2024

Close this stale issue.
