
Comments (2)

dannguyen commented on July 1, 2024

tl;dr: This doesn't answer your question, but it's a long explanation of why this feature (probably) doesn't exist. And also, why the behavior you've noticed ("001" being interpreted as a string and not a number, by an importing program) is just a quirk specific to that program – it's not CSV standard behavior.

FWIW, how a CSV file is interpreted, particularly how each column is typecast, is generally up to the interpreting program. For example, importing your latter example (i.e. with quoted COUNTYFP10 values) into a pandas.DataFrame gives the same result as importing the non-quoted data: the COUNTYFP10 column is interpreted as integers:

>>> import pandas as pd
>>> df = pd.read_csv('/tmp/thedata.csv')
>>> print(df)
       name  COUNTYFP10
0     Adair           1
1    Andrew           3
2  Atchison           5

>>> print(df.dtypes)
name          object
COUNTYFP10     int64
dtype: object

Not sure how Agate handles the import; I'm just using pandas as an example of another client program that has its own way of auto-guessing the data types of imported CSVs.

The upshot of all this: I'd be really surprised if Agate or csvkit had a way to specify quoting by column when exporting to CSV, because there wouldn't be any point. There's no accepted standard for inferring datatypes from imported CSV, because CSV treats every value as text, as opposed to something like JSON, which has a concept of strings, numbers, and booleans.
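To make the "every value is text" point concrete, here is a minimal sketch using only Python's standard library. The sample rows mirror the county data above; the variable names and the `count` field in the JSON record are made up purely for illustration.

```python
import csv
import io
import json

# The same rows, once with the county code quoted and once unquoted.
quoted = 'name,COUNTYFP10\nAdair,"001"\nAndrew,"003"\n'
unquoted = 'name,COUNTYFP10\nAdair,001\nAndrew,003\n'

rows_quoted = list(csv.reader(io.StringIO(quoted)))
rows_unquoted = list(csv.reader(io.StringIO(unquoted)))

# With default settings, csv.reader returns the same lists of strings
# either way: the quotes are syntax, not type information.
print(rows_quoted == rows_unquoted)
print(rows_quoted[1])

# JSON, by contrast, records the string/number distinction in the file itself.
# ("count" is a hypothetical field, not part of the original data.)
record = json.loads('{"name": "Adair", "COUNTYFP10": "001", "count": 1}')
print(type(record["COUNTYFP10"]).__name__, type(record["count"]).__name__)
```

Any typing beyond this (such as pandas turning 001 into the integer 1) is a decision made by the importing program, not something the CSV file encodes.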

My advice is that you should accept having to configure datatyping on the importing program's side, e.g. in Excel's Text Import Wizard.

Or in Python pandas, it would be to set the dtype argument, e.g.

>>> df = pd.read_csv('/tmp/thedata.csv', dtype={'COUNTYFP10': str})
>>> print(df)
       name COUNTYFP10
0     Adair        001
1    Andrew        003
2  Atchison        005
>>> print(df.dtypes)
name          object
COUNTYFP10    object
dtype: object

It's a pain in the ass, but this kind of vagueness is just inherent to the CSV format. You don't want to depend on application-specific quirks when figuring out the import-export workflow.

from agate.

jpmckinney commented on July 1, 2024

to_csv passes through keyword arguments to Python's csv module. You can import csv and pass quoting=csv.QUOTE_ALL or maybe even quoting=csv.QUOTE_MINIMAL to to_csv to achieve what you want. https://docs.python.org/3/library/csv.html#csv.QUOTE_ALL
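As a rough sketch of what those quoting constants do, the example below calls Python's csv.writer directly rather than agate's to_csv, on the assumption (per the comment above) that to_csv forwards the same keyword. The `dump` helper and the sample rows are illustrative only.

```python
import csv
import io

rows = [
    ["name", "COUNTYFP10"],
    ["Adair", "001"],
    ["Andrew", "003"],
    ["Atchison", "005"],
]

def dump(rows, quoting):
    """Write rows to an in-memory CSV string with the given quoting mode."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=quoting, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

# QUOTE_ALL wraps every field in quotes, header included.
all_quoted = dump(rows, csv.QUOTE_ALL)
print(all_quoted)

# QUOTE_MINIMAL (the csv module's default) quotes only fields that need it,
# e.g. ones containing the delimiter, a quote character, or a newline.
minimal = dump(rows, csv.QUOTE_MINIMAL)
print(minimal)
```

Note that the quoting mode applies to the whole file, not to individual columns, so this quotes the name column as well as COUNTYFP10.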

from agate.
