Giter Site home page Giter Site logo

xls2txt's Introduction

xls2txt: converting spreadsheets to text

Purpose

xls2txt and xsl2csv allow converting spreadsheet files to text for compatibility with terminals and command-line utilities (e.g. diff or less). Despite the name, they should work with both excel (xls, xlsx or xlsb) and OpenDocument (ods) files.

The two executables provided by this package broadly work the same way, returning one line per row separated by a unix newline (\n) and quoting field values if necessary, they differ only in their default field (cell) separator:

  • xls2txt separates cells with a TAB character
  • xls2csv uses a comma (,)

The tab separator seems like a better default for readability and compatibility with various unix utilities.

Interface

Both utilities have the same parameter and options, differing only by their default:

  • PATH is a mandatory path to a spreadsheet file to convert.

  • --sheet (-s) is the sheet to convert. By default, the first sheet is converted.

    sheet can be either the name of a sheet, or its position in the workbook (starting at 1, though 0 will be treated as 1).

  • --record-separator (-r) is the separator used between sheet rows, it defaults to a unix newline for both utilities.

  • --field-separator (f) is the separator used between sheet cells, it defaults to a TAB character for xsl2txt and a COMMA for xls2csv.

  • --formula specifies the formula display mode:

    • by default (cached-value), formulas are not displayed, if a formula cell has a cached value that is displayed, otherwise the cell is empty
    • if-empty will show the cached value if there is one, but will show the formula if there isn't one
    • always will always display the formula of a formula cell, never the cached value
  • the converted data is written to stdout

  • error messages can be written to stdout, including on success e.g. if an error occurs while parsing formulas and the formula mode is not the default, an error will be signaled then the formula mode will be reset to default

  • returns 0 if the entire conversion succeeded, 1 otherwise:

    • no input file was provided
    • the input file was not found or not recognized as a valid spreadsheet file
    • the provided record or field separator is invalid (it must be a single ascii character)
    • the specified sheet was not found
    • an error was found in one of the sheet cells
    • an problem occurred while writing data to stdout

Recipes

git text conversion

This allows viewing textual diffs of spreadsheet files using git log or git diff rather than get an unhelpful "binary files differ":

  1. create a $HOME/.gitattributes file, or set an arbitrary file as the attributes file (git config --global core.attributesFile <filename>)

  2. in that file, associate the relevant spreadsheet extensions with the proper category (hunk-header):

     *.ods diff=spreadsheet
     *.xls diff=spreadsheet
     *.xlsx diff=spreadsheet
     *.xlsb diff=spreadsheet
    
  3. set xls2txt (or xls2csv), possibly configured as you desire, as the diff text converter:

     git config --global diff.spreadsheet.textconv xls2txt
    

Changelog

1.1.0

  • Add support for displaying formulas

    • if there is no cached value for a formula cell
    • or instead of the cached value

1.0.2

  • Update dependencies to latest versions, bump edition

1.0.1

  • Switch back to calamine proper as 0.16.1 guarantees sheets are in-order

Thanks

  • calamine makes getting data out of spreadsheet files a breeze
  • rust-csv is not strictly necessary as generating value-separated data is fairly easy (as opposed to consuming it), but it provides confidence that the output will be properly quoted / escaped when necessary

xls2txt's People

Contributors

masklinn avatar w1m0r avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

viapeople-inc

xls2txt's Issues

Please commit Cargo.lock file

Please consider committing your Cargo.lock file: https://doc.rust-lang.org/cargo/faq.html#why-do-binaries-have-cargolock-in-version-control-but-not-libraries

The purpose of a Cargo.lock lockfile is to describe the state of the world at the time of a successful build. Cargo uses the lockfile to provide deterministic builds on different times and different systems, by ensuring that the exact same dependencies and versions are used as when the Cargo.lock file was originally generated.

This property is most desirable from applications and packages which are at the very end of the dependency chain (binaries). As a result, it is recommended that all binaries check in their Cargo.lock.

I'd like to make use of your package from a Nix derivation, and the tooling is built to consume Cargo.lock files: https://github.com/NixOS/nixpkgs/blob/master/doc/languages-frameworks/rust.section.md#buildrustpackage-compiling-rust-applications-with-cargo-compiling-rust-applications-with-cargo

Defaults are broken

❯ xls2csv thing.txt
error: the following required arguments were not provided:
  --record-separator <RECORD_SEPARATOR>
  --field-separator <FIELD_SEPARATOR>

Usage: xls2csv --record-separator <RECORD_SEPARATOR> --field-separator <FIELD_SEPARATOR> <PATH>

For more information, try '--help'.

Probably because of the messing about with the declarative API, this can be worked around by providing the default parameters explicitely but it's not great.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.