Giter Site home page Giter Site logo

martijn / xsv Goto Github PK

View Code? Open in Web Editor NEW
190.0 6.0 19.0 708 KB

High performance, lightweight .xlsx parser for Ruby that provides nothing a CSV parser wouldn't

Home Page: https://storck.io/posts/announcing-xsv-1-0-0/

License: MIT License

Ruby 99.72% Shell 0.28%
ruby xlsx excel

xsv's Introduction

Xsv .xlsx reader for Ruby

Test badge Yard Docs badge Gem Version badge

Xsv is a high performance, lightweight, pure Ruby parser for ISO/IEC 29500 Office Open XML spreadsheets (commonly known as Excel or .xlsx files). It strives to be minimal in the sense that it provides nothing a CSV reader wouldn't. This means it only deals with the minimal required formatting and cannot create or modify documents. Xsv can handle very large Excel files with minimal resources thanks to a custom streaming XML parser that is optimized for the Excel file format.

Xsv is designed for worksheets with a single table of data, optionally with a header row. It only casts values to basic Ruby types (integer, float, date and time) and does not deal with most formatting or more advanced functionality. Xsv has been production-ready since the initial release.

Xsv stands for 'Excel Separated Values', because Excel just gets in the way.

Installation

Add this line to your application's Gemfile:

gem 'xsv'

And then execute:

$ bundle

Or install it yourself as:

$ gem install xsv

Xsv targets ruby >= 2.7 and has a just single dependency, rubyzip. It has been tested successfully with MRI, JRuby, and TruffleRuby. It has no native extensions and is designed to be thread-safe.

Usage

Array and hash mode

Xsv has two modes of operation. By default, it returns an array for each row in the sheet:

workbook = Xsv.open("sheet.xlsx") # => #<Xsv::Workbook sheets=1>

# Access worksheet by index, 0 is the first sheet
sheet = workbook[0]
# or, access worksheet by name
sheet = workbook["Sheet1"]

# Iterate over rows
sheet.each do |row|
  row # => ["header1", "header2"]
end

# Access row by index (zero-based)
sheet[1] # => ["value1", "value2"]

Alternatively, it can load the headers from the first row and return a hash for every row by calling parse_headers! on the sheet or setting the parse_headers option on open:

# Parse headers for all sheets on open

workbook = Xsv.open("sheet.xlsx", parse_headers: true)

# Get the first row from the first sheet
workbook.first.first # => {"header1" => "value1", "header2" => "value2"}

# Manually parse headers for a single sheet

workbook = Xsv.open("sheet.xlsx")

sheet = workbook.first

sheet.first # => ["header1", "header2"]

sheet.parse_headers!

sheet.first # => {"header1" => "value1", "header2" => "value2"}

Xsv will raise Xsv::DuplicateHeaders if it detects duplicate values in the header row when calling #parse_headers! or when opening a workbook with parse_headers: true to ensure hash keys are unique.

Xsv::Sheet implements Enumerable so along with #each you can call methods like #first, #filter/#select, and #map on it. Likewise these methods can be used on Xsv::Workbook to iterate over sheets, for example:

# Get the name of all the sheets in a workbook
sheet_names = @workbook.map(&:name)

Opening a string or buffer instead of filename

Xsv.open accepts a filename, or an IO or String containing a workbook. Optionally, you can pass a block which will be called with the workbook as parameter, like File#open. Example of this together:

# Use an existing IO-like object as source

file = File.open("sheet.xlsx")

Xsv.open(file) do |workbook|
  puts workbook.inspect
end

# or even:

Xsv.open(file.read) do |workbook|
  puts workbook.inspect
end

Prior to Xsv 1.1.0, Xsv::Workbook.open was used instead of Xsv.open. The parameters are identical and the former is maintained for backwards compatibility.

Assumptions

Since Xsv treats worksheets like csv files it makes certain assumptions about your sheet:

  • In array mode, your data starts on the first row

  • In hash mode the first row of the sheet contains headers, followed by rows of data

If your data or headers do not start on the first row of the sheet you can tell Xsv to skip a number of rows:

sheet = workbook[0]
sheet.row_skip = 1

All operations will honour this offset, making the skipped rows unreachable.

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Performance and Benchmarks

Xsv is faster and more memory efficient than other gems because of two things: it only reads values from Excel files and it's based on a SAX-based parser instead of a DOM-based parser. If you want to read some background on this, check out my blog post on Efficient XML parsing in Ruby.

Jamie Schembri did a shootout of Xsv against various other Excel reading gems comparing parsing speed, memory usage, and allocations. Check our his blog post: Faster Excel parsing in Ruby.

Pre-1.0, Xsv used a native extension for XML parsing, which was faster than the native Ruby one (on MRI). But even the current native Ruby parser generally outperforms the competition. For maximum performance, it is recommended to enable YJIT.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/martijn/xsv. Please provide an .xlsx file with a minimum breaking example that is acceptable for inclusion in the source code repository.

License

Copyright © Martijn Storck and Xsv contributors

The gem is available as open source under the terms of the MIT License.

xsv's People

Contributors

dependabot[bot] avatar jdufresne avatar kevin-j-m avatar martijn avatar myabc avatar petergoldstein avatar sebikeller avatar senhalil avatar shkm avatar sinoue-1003 avatar texpert avatar til avatar victorlcampos avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

xsv's Issues

issue parsing xlsx (formatting?)

Hello when reading the following file only the numeric values are getting parsed correctly. all text-values are parsed to nil
formatting_issue.xlsx

this is what I did:
x = Xsv::Workbook.open(file_path)
sheet = x.sheets[0]
# Iterate over rows
sheet.each_row do |row|
p '-----------------', row # => ["header1", "header2"], etc.
end

This is what is printed:
"-----------------"
[nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil]
"-----------------"
[nil, nil, nil, nil, nil, 0, nil, nil, nil, nil, 1, nil, nil, nil, nil, nil, nil, nil, 1, nil, nil, nil, nil, nil, nil, nil, nil, 1, nil]
`

when copying all data from the xlsx into a simple txt-editor and then copying it back into excel. the data is getting parsed correctly:
"-----------------"
["PartNumber", "Brand", "SubBrand", "PopulationStatus", "PartType", "UniversalPart", "BasePartNumber", "Gtin", "GtinQualifierCode", "GtinQualifierName", "QuantitySize", "QuantitySizeUom", "ContainerType", "QuantityPerApplication", "QuantityPerApplicationUom", "QuantityPerApplicationQualifier", "EffectiveDate", "AvailableDate", "MinimumOrderQuantity", "MinimumOrderQuantityUom", "PartGroup", "PartSubGroup", "PartCategoryCode", "Unspsc", "VmrsCode", "Position", "DefaultMfrLabel", "DefaultApplicationQty", "Tags"]
"-----------------"
["P76550M-2", 1234, 1234, "Active (Publish to receiver database)", "Fuel Pump Module Assembly", 0, nil, 19826103970, "UP", "UPC", 1, "EA", "BX", 1, "EA", "MAX", nil, Fri, 23 Aug 2019, 1, "EA", nil, nil, 550303, 40151532, 44003258, nil, nil, 1, nil]

"Encountered unknown column type d"

I saw this issue #10 and i guess is the same error but for a different Column type. I have the stack from the sidekiq job:

2021-07-03T01:51:22.084Z 43432 TID-ovi6i5kkw WARN: Xsv::Error: Encountered unknown column type d
2021-07-03T01:51:22.084Z 43432 TID-ovi6i5kkw WARN: /Users/christian/.rvm/gems/ruby-2.6.1/gems/xsv-1.0.0/lib/xsv/sheet_rows_handler.rb:111:in `format_cell'
/Users/christian/.rvm/gems/ruby-2.6.1/gems/xsv-1.0.0/lib/xsv/sheet_rows_handler.rb:60:in `end_element'
/Users/christian/.rvm/gems/ruby-2.6.1/gems/xsv-1.0.0/lib/xsv/sax_parser.rb:68:in `block in parse'
/Users/christian/.rvm/gems/ruby-2.6.1/gems/xsv-1.0.0/lib/xsv/sax_parser.rb:19:in `loop'
/Users/christian/.rvm/gems/ruby-2.6.1/gems/xsv-1.0.0/lib/xsv/sax_parser.rb:19:in `parse'
/Users/christian/.rvm/gems/ruby-2.6.1/gems/xsv-1.0.0/lib/xsv/sheet.rb:64:in `each_row'
...

I put a log inside the format_cell method and i saw the problem was with the column of type custom with a date value "2021-07-01T05:11:26.00000026635825825".

I Attatch a sample excel file.

github_sample.xlsx

decimal records are not processed correctly

I have a file containing multiple decimal values.
0.001 | 0.01 | 0.1

when parsing the data as described in README the first decimal is not returned correctly!

This is what I do:

x = Xsv::Workbook.open("sheet.xlsx")
sheet = x.sheets[0]
sheet.each_row do |row|
  p '---row', row
end

This is what I see in the console (the first value is just 1 instead of 0.001):
[1, 0.01, 0.1]

Decimal values with %

I have a file with values: 0.0%, 0.1%, -0.3%

These are being interpreted as 0.000275, 0.000827, 0.003448

Is there a solution for this? I just need 0.0, 0.1, -0.3.

Xsv::Workbook.open does not accept Tempfile

Xsv::Workbook.open does not accept Tempfile, so its not possible to parse ActiveStorage::Blob files, its return

NoMethodError: undefined method `start_with?' for #<Tempfile: (closed)>

Encoding::UndefinedConversionError when parsing non-ASCII character

To reproduce, just create a new XLSX with a single cell with content :

Encoding::UndefinedConversionError ("\xC3" from ASCII-8BIT to UTF-8)

Also, after having input the date 05/05/1995, xsv gives me 1995-05-05. Is this expected? And is there a way to override this behavior?

I'm using version 1.0.0.pre

Thanks!

If PhoneticProperties data is included, NoMethodError (undefined method `+' for nil:NilClas) will be raised.

Asian strings such as Japanese can contain phonetic string data (furigana/yomigana)
In this case, the data structure will be as follows, and SharedStringsParser will not initialize @current_string, which will cause an error.

In order to deal with this, I would like to add a process to skip when an rPh is received.
Can I create a PR for this?

exapmle

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="35" uniqueCount="33">
<rPh><t>phonetic</t></rPh>
<si><t>Some strings</t></si>

error

irb(main):006:0>     @file = File.open("test/files/phonetic.xml")
irb(main):007:0>     strings = Xsv::SharedStringsParser.parse(@file)
Traceback (most recent call last):
        7: from bin/console:14:in `<main>'
        6: from (irb):7
        5: from /xsv/lib/xsv/shared_strings_parser.rb:9:in `parse'
        4: from /xsv/lib/xsv/sax_parser.rb:19:in `parse'
        3: from /xsv/lib/xsv/sax_parser.rb:19:in `loop'
        2: from //xsv/lib/xsv/sax_parser.rb:43:in `block in parse'
        1: from /xsv/lib/xsv/shared_strings_parser.rb:33:in `characters'
NoMethodError (undefined method `+' for nil:NilClass)

Generic HTML character entities not supported

Generic HTML entities in the source XML such as &#8212; or &#x2014; are not supported, but passed through unchanged.

I came across an XLSX file that had german umlauts encoded this way in it. They were not decoded to the actual characters but were output as &#x00E4; (should be ä).

Encountered unknown column type

Hi,
I'am try upload a xlsx file and i'm getting a erro on format cell:

"Encountered unknown column type n":["/opt/rubies/ruby-2.6.5/lib/ruby/gems/2.6.0/bundler/gems/xsv-027d612d4f74/lib/xsv/sheet_rows_handler.rb:28:in format_cell'","/opt/rubies/ruby-2.6.5/lib/ruby/gems/2.6.0/bundler/gems/xsv-027d612d4f74/lib/xsv/sheet_rows_handler.rb:98:in end_element'","/opt/rubies/ruby-2.6.5/lib/ruby/gems/2.6.0/bundler/gems/xsv-027d612d4f74/lib/xsv/sheet.rb:62:in `sax_parse'","/opt/rubies/ruby-2.6.5/lib/ruby/gems/2.6.0/bundler/gems/xsv-027d612d4f74/lib/xsv/sheet.rb:62:in

Can you give me a light to fix it?

Enhance parsing by headers

Hello there! Thank you for the gem, you made a really good job!

I wanna suggest one feature that I can implement as a contributor if we'll decide to do that. So, there's the subject.

In a gem called roo (definitely you know about it), there's a very good feature that allows passing a hash by a set of headers. My team uses roo for parsing datasheets with headers in Russian, and then we use the content of a datasheet to create some AR entities for example.

In roo it looks like:

SET_OF_HEADERS = {
  name: /Название организации|Название/i,
  inn: /ИНН/i,
  kpp: /КПП/i
}.freeze

xlsx = Roo::Excelx.new(filepath)
raw_data = xlsx.sheet(0).parse(SET_OF_HEADERS)

raw_data.first.keys # => [:name, :inn, :kpp]

That allows you to define the keys of your data items so there is no need to transform the keys of every hash to pass data to the next method for example. And with that feature, you can also automatically detect the offset between the first significant row of your data and some blank space, because sometimes docs that we parse looks like this:

image

As you can see, there are two rows that shouldn't be present in parsed data – it just the information to one who works with this template on how to fill rows.

So if this interesting for you I could contribute some time to implement that feature in xsv too.

Best regards.

NameError: uninitialized constant Xsv in Sidekiq Worker

It works great in my development environment and specs run successfully, but it keeps throwing that message in the AWS deployment. I'm using docker-compose to manage the deployments and I have erased the project images and containers multiple times.
Previosuly, I was using Roo Gem in this Job without loading problems.
Can you give me a hand please?

# With this call it throws _**NameError: uninitialized constant Xsv**_
x = ::Xsv::Workbook.open(report_result.document.url) 
# With this call it throws _**NameError: uninitialized constant ImportWorker::Xsv**_
x = Xsv::Workbook.open(report_result.document.url) 
**NameError: uninitialized constant Xsv**
/app/app/workers/import_worker.rb:16:in `perform'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:196:in `execute_job'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:164:in `block (2 levels) in process'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/middleware/chain.rb:138:in `block in invoke'
/usr/local/bundle/gems/sidekiq-failures-1.0.0/lib/sidekiq/failures/middleware.rb:9:in `call'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/middleware/chain.rb:140:in `block in invoke'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/middleware/chain.rb:143:in `invoke'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:163:in `block in process'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/job_retry.rb:111:in `local'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/rails.rb:43:in `block in call'
/usr/local/bundle/gems/activesupport-5.2.4.4/lib/active_support/execution_wrapper.rb:87:in `wrap'
/usr/local/bundle/gems/activesupport-5.2.4.4/lib/active_support/reloader.rb:73:in `block in wrap'
/usr/local/bundle/gems/activesupport-5.2.4.4/lib/active_support/execution_wrapper.rb:87:in `wrap'
/usr/local/bundle/gems/activesupport-5.2.4.4/lib/active_support/reloader.rb:72:in `wrap'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/rails.rb:42:in `call'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:257:in `stats'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/job_logger.rb:13:in `call'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/job_retry.rb:78:in `global'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:124:in `block in dispatch'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/logger.rb:10:in `with'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/job_logger.rb:33:in `prepare'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:123:in `dispatch'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:162:in `process'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:78:in `process_one'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/processor.rb:68:in `run'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/util.rb:15:in `watchdog'
/usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/util.rb:24:in `block in safe_thread'

Get raw value of the cell

I have xlsx file that has a date column. The underlying value is a float (has a time component). I want to get the raw value of the cell to convert it to datetime even though the cell is formatted as Date.

I looked through code and could not find a good was to access the cell's raw value. Is there way to return cell's raw value or override the formatting?

1.0.2 NoMethodError: undefined method `scan' for nil:NilClass

Staring with version 1.0.2 I'm seeing the following error:

     NoMethodError:
       undefined method `scan' for nil:NilClass
     # ./.bundle/ruby/2.7.0/gems/xsv-1.0.2/lib/xsv/helpers.rb:108:in `parse_number_format'
     # ./.bundle/ruby/2.7.0/gems/xsv-1.0.2/lib/xsv/sheet_rows_handler.rb:104:in `format_cell'
     # ./.bundle/ruby/2.7.0/gems/xsv-1.0.2/lib/xsv/sheet_rows_handler.rb:60:in `end_element'
     # ./.bundle/ruby/2.7.0/gems/xsv-1.0.2/lib/xsv/sax_parser.rb:68:in `block in parse'
     # ./.bundle/ruby/2.7.0/gems/xsv-1.0.2/lib/xsv/sax_parser.rb:19:in `loop'
     # ./.bundle/ruby/2.7.0/gems/xsv-1.0.2/lib/xsv/sax_parser.rb:19:in `parse'
     # ./.bundle/ruby/2.7.0/gems/xsv-1.0.2/lib/xsv/sheet.rb:64:in `each_row'

I'm guessing this was introduced in c653bb8

It seems the format passed to parse_number_format isn't always a string.

The document was generated by caxlsx.

Breakage on Excel files without cell column names

Hi, I happened to hit upon an Excel file that has cells without a column name. Crazy, right? Anyway, a few quick fixes sorted that out. What do you think?

helpers.rb:

    # Return the index number for the given Excel column name (i.e. "A1" => 0)
    def column_index(col)
      return if col == nil # <-- if receiving a nil column name, return
      col.each_codepoint.reduce(0) do |sum, n|
        break sum - 1 if n < A_CODEPOINT # reached a number

        sum * 26 + (n - A_CODEPOINT + 1)
      end
    end

sheet_bounds_handler.rb:

    def start_element(name, attrs)
      case name
      #...
      when "v"
        col = column_index(@cell)
        @max_column = col if col != nil && col > @max_column # <-- Don't update @max_column
        @max_row = @row if @row > @max_row
      #...

sheet_rows_handler.rb:

    def initialize(mode, empty_row, workbook, row_skip, last_row, &block)
      #...
      @current_cell = {}
      @current_cell_number = 0 # <-- Trackable cell number
      #...

    def start_element(name, attrs)
      case name
      when "c"
        @current_cell = attrs
        @current_cell_number += 1 # <-- Advance cell number
        @current_value.clear
      when "v", "is", "t"
        @store_characters = true
      when "row"
        @current_row = @empty_row.dup
        @current_cell_number = 0 # <-- Reset cell number
        @current_row_number = attrs[:r].to_i
      end
    end

    def end_element(name)
      case name
      #...
      when "c"
        col_index = column_index(@current_cell[:r])
        col_index = @current_cell_number unless col_index != nil # <-- use tracked cell number if there is no column name
      #...

Sorry for the quick code drop, don't have time for a pull request. Does this help?

multiple spaces in cell value converted to 1 space

multiple spaces in cell value converted to 1 space
had cell value with "foo--bar" where each "-" was a space.
when read, xsv reported the cell value as "foo-bar", having only a single space.
windows, ruby 2.5.8p224, xsv (0.3.13)

Cell values are mapped to incorrect keys in Hash mode

To reproduce:

Parsing the following file complex-headers.xlsx

@sheet = Xsv.open("./complex-headers.xlsx").sheets[0]
@sheet.parse_headers!
p @sheet[0]

Expected

{nil=>nil, "Customer nr."=>537066, "Customer name"=>"Unilever", "Item number"=>391790102, "Description"=>"Soap", "Qty outst."=>6, "Another Ref"=>"18-01-2024", "Reference"=>"#JL-65568", "Site address"=>"5 Cross Street", "Zdepot"=>"GG"}

Actual

=> {nil=>"GG", "Customer nr."=>nil, "Customer name"=>537066, "Item number"=>"Unilever", "Description"=>391790102, "Qty outst."=>"Soap", "Another Ref"=>6, "Reference"=>"18-01-2024", "Site address"=>"#JL-65568", "Zdepot"=>"5 Cross Street"}

See #55 for a failing test case.

Handle null values in cells.

It turns out Excel completely skips the <c /> element for cells without a value as illustrated in cell D3 and E3 in the example below. This calls for some more sophisticated row loading

<row r="3" spans="1:18" x14ac:dyDescent="0.3">
  <c r="A3" s="4" t="str"><f t="shared" si="0"/><v>ABCD</v></c>
  <c r="B3" t="s"><v>97</v></c>
  <c r="C3"><v>16</v></c>
  <c r="F3" t="s"><v>147</v></c>
  <c r="G3" t="s"><v>8</v></c>

Wrongly parsed hash

The hash parsing the F2022-10.xlsx is wrong:

$ pry
[1] pry(main)> require 'xsv'
=> true
[2] pry(main)> x = Xsv.open 'F2022-10.xlsx', :parse_headers => true
=> #<Xsv::Workbook:300 sheets=1 trim_empty_rows=false>
[3] pry(main)> s = x.sheets.first
=> #<Xsv::Sheet:320 mode=hash>
[4] pry(main)> s[0]
=> {"Interní číslo"=>"2022009",
 nil=>61,
 "Typ faktury"=>"FAKTURA",
 "Název firmy nebo jméno osoby"=>"Foo bar",
 "IČO"=>1234567,
 "DIČ"=>"CZ1234567",
 "Ulice"=>nil,
 "PSČ"=>"Velehradská 1",
 "Město"=>"686 03",
 "Datum vystavení"=>"Staré Město",
 "Datum splatnosti"=>nil,
 "Měna"=>#<Date: 2022-08-01 ((2459793j,0s,0n),+0s,2299161j)>,
 "Pořadí"=>#<Date: 2022-08-15 ((2459807j,0s,0n),+0s,2299161j)>,
 "Název"=>nil,
 "Množství"=>"EUR",
 "Cena za MJ"=>nil}

The correct output should be:

--- wrong
+++ correct
@@ -7,18 +7,17 @@
 => #<Xsv::Sheet:320 mode=hash>
 [4] pry(main)> s[0]
 => {"Interní číslo"=>"2022009",
- nil=>61,
  "Typ faktury"=>"FAKTURA",
  "Název firmy nebo jméno osoby"=>"Foo bar",
  "IČO"=>1234567,
  "DIČ"=>"CZ1234567",
- "Ulice"=>nil,
- "PSČ"=>"Velehradská 1",
- "Město"=>"686 03",
- "Datum vystavení"=>"Staré Město",
- "Datum splatnosti"=>nil,
- "Měna"=>#<Date: 2022-08-01 ((2459793j,0s,0n),+0s,2299161j)>,
- "Pořadí"=>#<Date: 2022-08-15 ((2459807j,0s,0n),+0s,2299161j)>,
- "Název"=>nil,
- "Množství"=>"EUR",
- "Cena za MJ"=>nil}
+ "Ulice"=>"Velehradská 1",
+ "PSČ"=>"686 03",
+ "Město"=>"Staré Město",
+ "Datum vystavení"=>#<Date: 2022-08-01 ((2459793j,0s,0n),+0s,2299161j)>,
+ "Datum splatnosti"=>#<Date: 2022-08-15 ((2459807j,0s,0n),+0s,2299161j)>,
+ "Měna"=>"EUR",
+ "Pořadí"=>1,
+ "Název"=>"Item 1,
+ "Množství"=>2,
+ "Cena za MJ"=>61}

I think that the parser is confused by the empty columns.

NoMethodError: undefined method `[]' for nil:NilClass

I'm using xsv v1.1.0, and trying to open this file.

The error I get is:

Xsv::Workbook.open(path_to_file)
NoMethodError: undefined method `[]' for nil:NilClass
from /Users/fionasanggang/.asdf/installs/ruby/2.7.1/lib/ruby/gems/2.7.0/gems/xsv-1.1.0/lib/xsv/sheet.rb:36:in `initialize'

I believe this document was generated using the Open XML SDK 2.5 for Office, due to the <x:worksheet>, <x:sheetData> elements in the sheet.xml file.

Do you have any hints or suggestions?

hash mode as an argument or parse_headers! can return self

It is possible to write the following for array mode but for hash mode one is forced to create intermediate variables.

Xsv::Workbook.open('file.xlsx').sheets[0].each_row do |row|
  row # => ["value1", "value2"]
end

If there was a :mode argument to open function or if the function parse_headers! would return self instead of a fixed true (or both? 🤔) this would let the user do one of the following:

Xsv::Workbook.open('file.xlsx', mode: :hash).sheets[0].each_row do |row|
  row # => {"header1" => "value1", "header2" => "value2"}
end

Xsv::Workbook.open('file.xlsx').sheets[0].parse_headers!.each_row do |row|
  row # => {"header1" => "value1", "header2" => "value2"}
end

:symbolize_names option for hash mode

It would fit well with other hash returning functions like JSON.parse and would be of use.

I am not sure but I suspect there would be less memory allocation since currently, each row allocates a new string key for the header labels in the :hash mode.

Similar to #24 this could be done in two ways (or both? 🤔) with a symbolize_names: true argument i) to open function and/or ii) to parse_headers! function.

Parse empty file

Cannot parse empty excel with code:

workbook = Xsv::Workbook.open(@io, trim_empty_rows: true)
sheet = workbook.sheets[0]
sheet.parse_headers!

sheet.each_row do |row|
 row # => ["header1", "header2"], etc.
end

because of error:

NoMethodError: undefined method `zip' for nil:NilClass
from /Users/me/.rbenv/versions/2.7.1/lib/ruby/gems/2.7.0/gems/xsv-0.3.16/lib/xsv/sheet.rb:123:in `empty_row'
Caused by CSV::MalformedCSVError: Invalid byte sequence in UTF-8 in line 1.
from /Users/me/.rbenv/versions/2.7.1/lib/ruby/2.7.0/csv/parser.rb:346:in `rescue in parse'
Caused by CSV::Parser::InvalidEncoding: CSV::Parser::InvalidEncoding
from /Users/me/.rbenv/versions/2.7.1/lib/ruby/2.7.0/csv/parser.rb:229:in `read_chunk'

Thanks for help.

#<NoMethodError: undefined method `get_input_stream' for nil:NilClass>

Hi, I tried to parse this xlsx file, but I am getting this error:

x = Xsv::Workbook.open(file)

NoMethodError: undefined method get_input_stream' for nil:NilClass from /Users/tomasdurcak/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/gems/xsv-0.3.14/lib/xsv/workbook.rb:78:in fetch_shared_strings'
Caused by NoMethodError: undefined method get_input_stream' for nil:NilClass from /Users/tomasdurcak/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/gems/xsv-0.3.14/lib/xsv/workbook.rb:78:in fetch_shared_strings'
Caused by CSV::MalformedCSVError: Illegal quoting in line 2.
from /Users/tomasdurcak/.rbenv/versions/2.6.6/lib/ruby/2.6.0/csv/parser.rb:879:in `parse_quotable_robust'

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.