ruby-docx / docx

a ruby library/gem for interacting with .docx files

License: MIT License

Ruby 100.00%

docx word ruby rubygem office-open-xml ooxml ooxml-parser hacktoberfest

docx's Introduction

docx

A ruby library/gem for interacting with .docx files. Current capabilities include reading paragraphs and bookmarks, inserting text at bookmarks, reading tables (rows, columns, and cells), and saving the document.

Usage

Prerequisites

Ruby 2.6 or later

Install

Add the following line to your application's Gemfile:

gem 'docx'

And then execute:

bundle install

Or install it yourself as:

gem install docx

Reading

require 'docx'

# Create a Docx::Document object for our existing docx file
doc = Docx::Document.open('example.docx')

# Retrieve and display paragraphs
doc.paragraphs.each do |p|
  puts p
end

# Retrieve and display bookmarks, returned as hash with bookmark names as keys and objects as values
doc.bookmarks.each_pair do |bookmark_name, bookmark_object|
  puts bookmark_name
end

Don't have a local file but a buffer? Docx handles those too:

require 'docx'

# Create a Docx::Document object from an in-memory buffer (e.g. the contents of a remote file)
doc = Docx::Document.open(buffer)

# Everything about reading is the same as shown above

Rendering html

require 'docx'

# Retrieve and display paragraphs as html
doc = Docx::Document.open('example.docx')
doc.paragraphs.each do |p|
  puts p.to_html
end

Reading tables

require 'docx'

# Create a Docx::Document object for our existing docx file
doc = Docx::Document.open('tables.docx')

first_table = doc.tables[0]
puts first_table.row_count
puts first_table.column_count
puts first_table.rows[0].cells[0].text
puts first_table.columns[0].cells[0].text

# Iterate through tables
doc.tables.each do |table|
  table.rows.each do |row| # Row-based iteration
    row.cells.each do |cell|
      puts cell.text
    end
  end

  table.columns.each do |column| # Column-based iteration
    column.cells.each do |cell|
      puts cell.text
    end
  end
end

Writing

require 'docx'

# Create a Docx::Document object for our existing docx file
doc = Docx::Document.open('example.docx')

# Insert a single line of text after one of our bookmarks
doc.bookmarks['example_bookmark'].insert_text_after("Hello world.")

# Insert multiple lines of text at our bookmark
doc.bookmarks['example_bookmark_2'].insert_multiple_lines_after(['Hello', 'World', 'foo'])

# Remove paragraphs
doc.paragraphs.each do |p|
  p.remove! if p.to_s =~ /TODO/
end

# Substitute text, preserving formatting
doc.paragraphs.each do |p|
  p.each_text_run do |tr|
    tr.substitute('_placeholder_', 'replacement value')
  end
end

# Save document to specified path
doc.save('example-edited.docx')

Writing to tables

require 'docx'

# Create a Docx::Document object for our existing docx file
doc = Docx::Document.open('tables.docx')

# Iterate over each table
doc.tables.each do |table|
  last_row = table.rows.last
  
  # Copy last row and insert a new one before last row
  new_row = last_row.copy
  new_row.insert_before(last_row)

  # Substitute text in each cell of this new row
  new_row.cells.each do |cell|
    cell.paragraphs.each do |paragraph|
      paragraph.each_text_run do |text|
        text.substitute('_placeholder_', 'replacement value')
      end
    end
  end
end

doc.save('tables-edited.docx')

Advanced

require 'docx'

d = Docx::Document.open('example.docx')

# The Nokogiri::XML::Node on which an element is based can be accessed using #node
d.paragraphs.each do |p|
  puts p.node.inspect
end

# The #xpath and #at_xpath methods are delegated to the node from the element, saving a step
p_element = d.paragraphs.first
p_children = p_element.xpath("child::*") # selects all child elements of the paragraph
p_child = p_element.at_xpath("child::*") # selects the first child element

Writing and Manipulating Styles

require 'docx'

d = Docx::Document.open('example.docx')
existing_style = d.styles_configuration.style_of("Heading 1")
existing_style.font_color = "000000"

# see attributes below
new_style = d.styles_configuration.add_style("Red", name: "Red", font_color: "FF0000", font_size: 20)
new_style.bold = true

d.paragraphs.each do |p|
  p.style = "Red"
end

d.paragraphs.each do |p|
  p.style = "Heading 1"
end

d.styles_configuration.remove_style("Red")

Style Attributes

The following is a list of attributes and what they control within the style.

id: The unique identifier of the style. (required)
name: The human-readable name of the style. (required)
type: Indicates the type of the style (e.g., paragraph, character).
keep_next: Boolean value controlling whether to keep a paragraph and the next one on the same page. Valid values: true/false.
keep_lines: Boolean value specifying whether to keep all lines of a paragraph together on one page. Valid values: true/false.
page_break_before: Boolean value indicating whether to insert a page break before the paragraph. Valid values: true/false.
widow_control: Boolean value controlling widow and orphan lines in a paragraph. Valid values: true/false.
shading_style: Defines the shading pattern style.
shading_color: Specifies the color of the shading pattern. Valid values: Hex color codes.
shading_fill: Indicates the background fill color of shading.
suppress_auto_hyphens: Boolean value controlling automatic hyphenation. Valid values: true/false.
bidirectional_text: Boolean value indicating if the paragraph contains bidirectional text. Valid values: true/false.
spacing_before: Defines the spacing before a paragraph.
spacing_after: Specifies the spacing after a paragraph.
line_spacing: Indicates the line spacing of a paragraph.
line_rule: Defines how line spacing is calculated.
indent_left: Sets the left indentation of a paragraph.
indent_right: Specifies the right indentation of a paragraph.
indent_first_line: Indicates the first line indentation of a paragraph.
align: Controls the text alignment within a paragraph.
font: Sets the font for different scripts (ASCII, complex script, East Asian, etc.).
font_ascii: Specifies the font for ASCII characters.
font_cs: Indicates the font for complex script characters.
font_hAnsi: Sets the font for high ANSI characters.
font_eastAsia: Specifies the font for East Asian characters.
bold: Boolean value controlling bold formatting. Valid values: true/false.
italic: Boolean value indicating italic formatting. Valid values: true/false.
caps: Boolean value controlling capitalization. Valid values: true/false.
small_caps: Boolean value specifying small capital letters. Valid values: true/false.
strike: Boolean value indicating strikethrough formatting. Valid values: true/false.
double_strike: Boolean value defining double strikethrough formatting. Valid values: true/false.
outline: Boolean value specifying outline effects. Valid values: true/false.
outline_level: Indicates the outline level in a document's hierarchy.
font_color: Sets the text color. Valid values: Hex color codes.
font_size: Controls the font size.
font_size_cs: Specifies the font size for complex script characters.
underline_style: Indicates the style of underlining.
underline_color: Specifies the color of the underline. Valid values: Hex color codes.
spacing: Controls character spacing.
kerning: Sets the space between characters.
position: Controls the position of characters (superscript/subscript).
text_fill_color: Sets the fill color of text. Valid values: Hex color codes.
vertical_alignment: Controls the vertical alignment of text within a line.
lang: Specifies the language tag for the text.

Development

todo

Calculate element formatting based on values present in element properties as well as properties inherited from parents
Default formatting of inserted elements to inherited values
Implement formattable elements.
Easier multi-line text insertion at a single bookmark (inserting paragraph nodes after the one containing the bookmark)

docx's Issues

SECURITY VULNERABILITY: Pls update the required rubyzip version to 1.2.1

The version of the rubyzip gem that is currently required is vulnerable to directory traversal attacks, see this issue. This also affects this gem.

Updating the rubyzip to version 1.2.1 does not break anything afaik, so it's no effort.

Zip::ZipError, I need help

Hi
I installed the docx gem in my project and tried to open a docx file using Docx::Document.open("test.docx"). I then got an error: "Zip::ZipError: zip end of central directory signature not found". I know it's related to my environment, but I don't know how to solve it. PS: I use Ubuntu.
Thx

[Ask] Install in rails 4

Dear all

I am a new Rails developer.
Can I install this gem in Rails 4?
I have put it in my Gemfile, but when I run bundle install I get a conflict saying my rubyzip version is not compatible. This is the error message:
In Gemfile:
docx (>= 0) ruby depends on
rubyzip (~> 0.9) ruby

roo (>= 0) ruby depends on
  rubyzip (1.1.0)

Please let me know how to solve this problem.
Sorry for my bad English.
Thanks
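
The conflict above follows from how the pessimistic operator works: "~> 0.9" accepts versions from 0.9 up to, but not including, 1.0, so the rubyzip 1.1.0 that roo resolves to cannot satisfy it. A minimal sketch using RubyGems' own version classes; the helper name satisfies? is hypothetical and only there for illustration:

```ruby
require "rubygems"

# Hypothetical helper: does a concrete version satisfy a Gemfile-style constraint?
def satisfies?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

puts satisfies?("~> 0.9", "1.1.0") # the version roo pulled in
puts satisfies?("~> 0.9", "0.9.9") # a version inside the allowed range
```

Bundler can only resolve the Gemfile when a single rubyzip version satisfies every gem's constraint, so one of the two gems has to move to a release with a compatible requirement.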

Can't open document using a Pathname. Breaking change between 0.4.0 and 0.6.0

Describe the bug

Upgrading gem from 0.4 to 0.6. Code no longer works since I call Document.open with a Pathname

To Reproduce

see example

example

require 'docx'
pn = Pathname.new("/home/me/mydoc.docx")
doc = Docx::Document.new(pn)

=> undefined method `close' for nil:NilClass

Sample docx file

N/A

Expected behavior

document object could be opened with a Pathname as in version 0.4

Environment

Ruby version: (jruby 9.2.14.0 - ruby 2.5.7)
docx gem version: 0.6.0
OS: Ubuntu
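
A possible workaround, sketched under the assumption that plain String paths still work in 0.6.0 (as the README examples suggest): convert the Pathname to a String before handing it to the gem. The helper name docx_source is hypothetical and not part of the docx API.

```ruby
require "pathname"

# Hypothetical helper: turn a Pathname into a String path and
# pass anything else (String paths, IO-like buffers) through unchanged.
def docx_source(source)
  source.is_a?(Pathname) ? source.to_s : source
end

pn = Pathname.new("/home/me/mydoc.docx")
puts docx_source(pn).class
# Assumed usage: Docx::Document.open(docx_source(pn))
```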

Update TOC

Hi, I'm looking for a library that does a simple task:

Open a word document,
Update all tables, graphs and table of contents.
Is docx able to do this task?

Thanks in advance

undefined method `node' for "some text":String

Looks like there's an issue when replacing text:

doc.bookmarks['abookmark'].insert_after("some text")

Following the example you supply this gives the error above.

gem 'nokogiri',">1.5"
gem 'rubyzip', ">0.9"
rails 3.2.1
ruby 1.9.2

Add support for Office 365 files (with word/document2.xml)

When I create a brand new Word online file and save (download) it to my PC and then try and open it with this library I get the following error:

No such file or directory - word/document.xml (Errno::ENOENT)
/var/task/vendor/bundle/ruby/2.5.0/gems/rubyzip-2.2.0/lib/zip/file.rb:397:in `get_entry'
/var/task/vendor/bundle/ruby/2.5.0/gems/rubyzip-2.2.0/lib/zip/file.rb:255:in `get_input_stream'
/var/task/vendor/bundle/ruby/2.5.0/gems/rubyzip-2.2.0/lib/zip/file.rb:287:in `read'
/var/task/vendor/bundle/ruby/2.5.0/gems/docx-0.4.0/lib/docx/document.rb:26:in `initialize'
/var/task/vendor/bundle/ruby/2.5.0/gems/docx-0.4.0/lib/docx/document.rb:50:in `new'
/var/task/vendor/bundle/ruby/2.5.0/gems/docx-0.4.0/lib/docx/document.rb:50:in `open'
/var/task/event.rb:87:in `stamp_docx_properties'
/var/task/event.rb:26:in `block in handle'
/var/task/event.rb:20:in `each'
/var/task/event.rb:20:in `handle'

Upon inspection of the docx file on my system it seems to have a word/document2.xml file instead of a word/document.xml file.

To replicate this blank docx file navigate to https://www.office.com/launch/word and create a blank document then save it to your local PC.

I am not sure why docx files created in word online have word/document2.xml instead of word/document.xml but could you please add support for this?

Can we fetch Page Number while reading the docx?

Hi, folks!

Is there a way to fetch the page numbers of a docx file while reading it?

Aligned images breaks the docx file

After extensive testing of a document that won't open when saved with this gem, I've been able to pinpoint that a docx containing an aligned image will break after saving.

Here is an example docx with just one aligned image:
www.docx

Here is the resulting docx when saved with this gem:
www-broken.docx

Delete a paragraph

What would be the easiest way to delete a paragraph? The spec shows deleting the text content of a paragraph using #blank! but the paragraph still remains within the document. I'm thinking about a method like #delete! which would remove the entire w:p node.

I'm thinking about something within paragraph.rb like:

def delete!
  self.node.remove
end

Dewayne
o-*

README out of date

The Install paragraph of README.md has code for specifying version '~> 0.2.07' but the master branch has been bumped to version 3.

Before I submit a pull request, is there any reason for this?

Additionally, I believe the require option :require => ["docx"] is unnecessary as the gem's main file is identical to the gem name.

Accessing existing header and footer

I don't think there is a way to add a header and footer directly but if I open a document that already has a header and footer, how would I access it? I tried referencing a bookmark that I placed inside the header but it doesn't seem to be included in doc.bookmarks. Any ideas?

Replace Header/Footer bookmarks doesn't work

Hello,

Targeting a bookmark that is located in the footer or the header of a given .docx doesn't work for me. If I move those bookmarks out of these special document parts, they get replaced normally, but a bookmark located in the header or footer is not found by my Ruby on Rails app.

Any ideas? Is this a feature to come?

Thanks in advance for your answer.

[Question] Some question about the gem

docx is the most popular gem for manipulating docx documents. However, I have some questions that are not covered in the documentation.

Must we use a valid existing docx file to read or write?

The answer is probably "YES", judging from some issues in the repo.

Can we insert tables that contain only text?

How to use docx gem with s3 files

I have this create action to extract data from doc and docx files (if available) using the docx gem and the msworddoc-extractor gem

if @subject.save
  if @subject.odoc.present?
    @odoc_url = @subject.odoc.url
    if File.extname(URI.parse(@odoc_url).path) == ".docx"
      @subject.homework= ""
      doc = Docx::Document.open(@odoc_url)
      doc.paragraphs.each do |p|
        @subject.homework = @subject.homework+p.to_html
      end
    else
      MSWordDoc::Extractor.load(@odoc_url) do |doc|
        @subject.homework= doc.whole_contents
      end
    end
    @subject.save
  end

Now, doc files work fine. My problem is with doc = Docx::Document.open(@odoc_url): on my local machine it works fine, but when I push to production I get the error Zip::Error: File s3.amazonaws.com/~~~ not found. I'm not sure how to load the file so that it is accessible to the docx gem.
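
One way to approach this, sketched under the assumption that the gem is being handed a URL where it expects a local path: copy the remote bytes into a temporary file first and open that instead. The helper name with_local_copy is hypothetical; only stdlib calls are used in the runnable part, and the commented usage relies on open-uri's URI.open.

```ruby
require "tempfile"

# Hypothetical helper: copy any readable IO into a temporary .docx file,
# yield its local path, and delete the file when the block returns.
def with_local_copy(io, suffix: ".docx")
  Tempfile.create(["download", suffix]) do |tmp|
    tmp.binmode
    IO.copy_stream(io, tmp)
    tmp.flush
    yield tmp.path
  end
end

# Assumed usage with a remote file (not executed here):
#   require "open-uri"
#   require "docx"
#   URI.open(@odoc_url) do |remote|
#     with_local_copy(remote) do |path|
#       @subject.homework = Docx::Document.open(path).paragraphs.map(&:to_html).join
#     end
#   end
```

The temporary file only lives for the duration of the block, so everything that needs the document should happen inside it.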

Change tracking

Is it possible to hook into the change tracking functionality with this gem?

Errno::ENOENT: No such file or directory - word/styles.xml

I'm getting the following error when I try to open a .docx file. Errno::ENOENT: No such file or directory - word/styles.xml in the following line.

[4] pry(#)> doc = Docx::Document.open("./tmp/511676831_1.docx")
Errno::ENOENT: No such file or directory - word/styles.xml

Here is the file I'm trying to open:

https://www.dropbox.com/s/guhjbcevwu8nrb7/511676831_1.docx?dl=0

Replacing text in paragraphs

I'm trying to figure out how to replace the paragraphs with translated text. I can't quite figure out how the 'replace_entry' function works. Is that the best way to go about it? The documentation is pretty scarce, it would be awesome if someone could give me some pointers!

Support Macro detection

It would be useful if there were a function to simply identify when Macros are present. This way this library could be used to help reject documents that are uploaded that contain macros, for security reasons.

This OWASP page discusses how to do this in Java:
https://www.owasp.org/index.php/Protect_FileUpload_Against_Malicious_File

Note this is not a request to support macros beyond being able to identify if a macro is present.
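
A rough sketch of what such a check could look like, under stated assumptions: a .docx/.docm file is a ZIP archive, and macro-enabled Word files usually carry their VBA code in a part named vbaProject.bin. The helpers zip_entry_names and macros_present? are hypothetical and not part of this gem; the code reads the ZIP central directory with the standard library only and ignores ZIP64 archives. In an application that already depends on rubyzip, listing the entries through that library would be the more natural route.

```ruby
# Hypothetical helper: list entry names by walking the ZIP central directory.
def zip_entry_names(path)
  data = File.binread(path)
  eocd = data.rindex("PK\x05\x06".b) # end-of-central-directory record
  return [] if eocd.nil?

  # Total entry count, central directory size and offset live at bytes 10..19.
  count, _size, pos = data.byteslice(eocd + 10, 10).unpack("vVV")
  names = []
  count.times do
    break unless data.byteslice(pos, 4) == "PK\x01\x02".b

    name_len, extra_len, comment_len = data.byteslice(pos + 28, 6).unpack("vvv")
    names << data.byteslice(pos + 46, name_len).force_encoding("UTF-8")
    pos += 46 + name_len + extra_len + comment_len
  end
  names
end

# Assumption: a VBA project part signals that macros are present.
def macros_present?(path)
  zip_entry_names(path).any? { |name| name.end_with?("vbaProject.bin") }
end
```

An upload handler could call macros_present?(uploaded_path) and reject the file when it returns true.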

ENOENT error because internal doc is word/document22.xml

Describe the bug

I found a .docx file that appears to be completely valid (it opens in Word) but raises an error ENOENT when opening with this gem.

To Reproduce

Docx::Document.new with this file: weird_docx.docx

The code looks for word/document.xml and word/document2.xml but not word/document22.xml, which is what's inside this doc.

Isn't Word just obnoxious?
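
A minimal sketch of a more tolerant lookup, assuming the list of entry names inside the archive is already available: match any word/document<digits>.xml instead of a fixed pair of names. The constant MAIN_PART and the helper main_document_entry are hypothetical and do not reflect how the gem resolves the part today. A more robust approach would be to follow the officeDocument relationship declared in _rels/.rels rather than guessing the file name.

```ruby
# Hypothetical pattern: word/document.xml, word/document2.xml, word/document22.xml, ...
MAIN_PART = %r{\Aword/document\d*\.xml\z}

# Returns the first entry name that looks like the main document part, or nil.
def main_document_entry(entry_names)
  entry_names.find { |name| name.match?(MAIN_PART) }
end

entries = ["[Content_Types].xml", "word/styles.xml", "word/document22.xml"]
puts main_document_entry(entries)
```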

How to open password protected documents?

Is this possible if I have the document password? If not, can someone point me to where/how I can add this functionality?

insert_text_before and insert_text_after are switched

From bookmark.rb:

# Insert text before bookmarkStart node
def insert_text_before(text)
  text_run = get_run_after
  text_run.text = "#{text}#{text_run.text}"
end

# Insert text after bookmarkStart node
def insert_text_after(text)
  text_run = get_run_before
  text_run.text = "#{text_run.text}#{text}"
end

As you can see, insert_text_before does what insert_text_after should do, and vice versa.

Thanks

Unable to read Binary String

Describe the bug

I am reading a docx file stored in a BLOB field in a MySQL database. The value comes out of the MySQL table as a binary string, extracted from a Logstash "Event". I am able to write the binary string to a file and then read it using Docx. However, if I pass the data directly to Docx, it raises an error.

To Reproduce

Steps to reproduce the behavior or put a short code to reproduce the bug.

example

require 'docx'
require 'stringio'

# Writing the binary string to a file and reading it back works
File.binwrite('c:\path\filename.doc', event.get('Blob field'))
doc = Docx::Document.new('/path/to/your/docx/filename.docx')
# ERROR: passing the binary string directly does not work
doc = Docx::Document.new(event.get('Blob field'))
# Converting the data to a StringIO did not work either
file_to_read = StringIO.new(event.get('Blob field'))
doc = Docx::Document.new(file_to_read)

Expected behavior

Is there a way to pass a StringIO directly to Docx, or any other way to avoid writing the file to disk and then reading it back?
Sorry for the wrong label.

Environment

Ruby version: [e.g 2.7.1]
docx gem version: [e.g 0.5.0]
OS: Windows
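
The README's "Reading" section says a buffer can be passed to Docx::Document.open, so one thing to try is wrapping the blob in a binary StringIO and calling open rather than new. This is only a sketch: the helper docx_buffer is hypothetical, and whether the installed gem version accepts buffers is an assumption to verify.

```ruby
require "stringio"

# Hypothetical helper: wrap raw docx bytes in a binary-encoded StringIO.
def docx_buffer(blob)
  StringIO.new(String.new(blob, encoding: Encoding::BINARY))
end

# Assumed usage (not executed here):
#   doc = Docx::Document.open(docx_buffer(event.get('Blob field')))
```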

doccex

Hi,

there is a rails engine called doccex - https://github.com/mustardseeddatabase/doccex.
Is this a fork of this library? And/or what are the different basic characteristics of your project?

Thankx

How to get XML from docx file

I'm trying to convert a docx file into PDF. The process I thought about was as follows, convert the docx file into an HTML file and from HTML into PDF. However, using this process the outcome wasn't what I expected.
testing.pdf

This is what it looks like after the process mentioned above. Here is a link to the origin docx file
https://www.dropbox.com/s/f1klwguv4r9iyje/testing.docx?dl=0

I think Word documents use XML, so it might improve how documents are displayed if I converted the file from docx to XML and then into PDF. (You might have a better direction on this.)

So far I have doc = Docx::Document.open('testing.docx') When I try to get the XML from the document I get nil.

[61] pry(#<PDFProducer>)> doc.xml
=> nil

Can one get XML from the word document? Or am I wrong in my assumption that word documents use XML?

https://stackoverflow.com/questions/56450113/font-size-convert-docx-into-pdf-in-ruby-using-wickedpdf-and-docx

Import picture in place of bookmark

I need to import a picture in place of a bookmark.

to_s crash, presumably when doc has word/document22.xml inside

As discussed in #103, one bug led to another.

With the file referenced in that PR, a crash occurs when getting the file's text:

Source:

    Docx::Document.open(file) do |doc|
      return doc.to_s
    end

Trace:

     NoMethodError:
       undefined method `xpath' for nil:NilClass
     # /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:94:in `hyperlink_relationships'
     # /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:88:in `hyperlinks'
     # /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:48:in `document_properties'
     # /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:189:in `parse_paragraph_from'
     # /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:61:in `block in paragraphs'
     # /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/nokogiri-1.11.3-x86_64-darwin/lib/nokogiri/xml/node_set.rb:239:in `block in each'
     # /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/nokogiri-1.11.3-x86_64-darwin/lib/nokogiri/xml/node_set.rb:238:in `upto'
     # /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/nokogiri-1.11.3-x86_64-darwin/lib/nokogiri/xml/node_set.rb:238:in `each'
     # /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:61:in `map'
     # /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:61:in `paragraphs'
     # /Users/bmulholland/.rvm/gems/ruby-2.7.2@recital-backend/gems/docx-0.6.1/lib/docx/document.rb:110:in `to_s'

Edit content from node

Is there some way to replace the content returned by inspect?
For example, could I do something like this:

d = Docx::Document.open('example.docx')

d.paragraphs.each do |p|
  p.node.inspect = p.node.inspect.gsub('Old text', 'Next text')
end

is it possible to insert tables?

I wondered if this gem could be used to insert more complex elements into a document like tables.
Thanks

Does the docx gem support .doc too?

Is there support for .doc as well?
doc = Docx::Document.open("example.doc")

[7] pry(LambdaFunctions::LambdaHandler)> File.open("test.doc", 'wb') do |f|
[7] pry(LambdaFunctions::LambdaHandler)*   f.write raw
[7] pry(LambdaFunctions::LambdaHandler)* end  
=> 69632
[8] pry(LambdaFunctions::LambdaHandler)> doc = Docx::Document.open("test.doc")
Errno::ENOENT: No such file or directory - word/document.xml
from /Users/staguilar/.rvm/gems/ruby-2.5.3/gems/rubyzip-1.2.3/lib/zip/file.rb:361:in `get_entry'
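
One possible explanation for errors like this is that the file is not a docx package at all: .docx files are ZIP containers, while legacy .doc files are OLE compound files with a different signature. The sketch below checks the first bytes of a file before it is handed to the gem; the helper name sniff_word_format and the returned symbols are hypothetical.

```ruby
ZIP_MAGIC = "PK\x03\x04".b                          # start of a ZIP local file header
OLE_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b    # OLE compound file (legacy .doc)

# Hypothetical helper: classify a file by its leading bytes.
def sniff_word_format(path)
  return :empty if File.zero?(path)

  head = File.binread(path, 8)
  if head.start_with?(ZIP_MAGIC)
    :zip_container
  elsif head.start_with?(OLE_MAGIC)
    :legacy_ole
  else
    :unknown
  end
end
```

Only files that come back as :zip_container are worth passing to Docx::Document.open; a :legacy_ole result means the file needs a .doc-capable tool or a conversion step first.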

Can't call .stream on Document

I need to get the current state as an IO - ready to be uploaded. I can see the stream method when I look at the Document class but get the following error when I try and call it:

NoMethodError:
       undefined method `stream' for #<Docx::Document:0x000055fa7bb472a8>

Example of what I am trying to do:

doc = Docx::Document.open(downloaded_file)
# make changes to doc.zip
stream = doc.stream

File out.docx has zero size

Hi, I am getting the following error while trying to read an empty docx file created using the File.new("file.docx", "w") command:

error: Zip::Error (File file.docx has zero size. Did you mean to pass the create flag?)

Can anybody tell me what is happening?

is it possible to insert multiple lines after bookmark?

I'm looking to output the contents of an array into a document line by line, but so far I have not been able to figure it out. Everything so far is displayed on a single line.

Get Doc contents with respect to pages

Hello,

Thanks for making this gem!

I'm working on a web app which requires reading content from doc files through the docx gem. It just prints the content regardless of page; I'd like to retrieve content with respect to pages.
How can I achieve this with docx gem?

Thanks,
Ankit

`get_entry': No such file or directory - word/document.xml (Errno::ENOENT)

Please see the title.

Why do I hit this problem when I use the docx gem?

The following code:

require 'docx'

d = Docx::Document.open('111.doc')

d.each_paragraph do |p|
  puts p
end

error: `get_entry': No such file or directory - word/document.xml (Errno::ENOENT)

Note that I ran this program as the root user.

show diff between 2 docx versions.

Given I have uploaded two versions of a docx document to the web app
When I click "View Difference"
Then I should see reds and greens of text deleted and added between the documents.

Is that possible with this gem?
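
The gem itself only exposes the document content, but a paragraph-level comparison can be built on top of it. The sketch below is one possible approach, not a feature of the gem: it assumes each document has already been reduced to an array of paragraph strings (for example with doc.paragraphs.map(&:to_s)) and uses a longest-common-subsequence table to label paragraphs as unchanged, removed, or added. The name diff_paragraphs is hypothetical.

```ruby
# Hypothetical helper: LCS-based diff over two arrays of paragraph strings.
def diff_paragraphs(old_paras, new_paras)
  n = old_paras.size
  m = new_paras.size
  # lcs[i][j] = length of the LCS of old_paras[i..] and new_paras[j..]
  lcs = Array.new(n + 1) { Array.new(m + 1, 0) }
  (n - 1).downto(0) do |i|
    (m - 1).downto(0) do |j|
      lcs[i][j] = if old_paras[i] == new_paras[j]
                    lcs[i + 1][j + 1] + 1
                  else
                    [lcs[i + 1][j], lcs[i][j + 1]].max
                  end
    end
  end

  diff = []
  i = j = 0
  while i < n && j < m
    if old_paras[i] == new_paras[j]
      diff << [:same, old_paras[i]]
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      diff << [:removed, old_paras[i]]
      i += 1
    else
      diff << [:added, new_paras[j]]
      j += 1
    end
  end
  # Whatever is left on either side is a pure removal or addition.
  diff.concat(old_paras[i..].map { |para| [:removed, para] })
  diff.concat(new_paras[j..].map { |para| [:added, para] })
end
```

A view could then render :removed entries in red and :added entries in green. Diffing within a paragraph (word by word) would need a second pass over the changed pairs.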

paragraph remove function

create password protected documents?

is it possible to create a password protected document using this library?

font color?

Hi, is there any way to set the color of the text that is inserted using insert_text_after?

If not, I can try to add it, but can someone give me a hint of where to look or generally what to do? (I don't know anything about docx format).

thanks
Joel

Update the nokogiri dependency?

Hi!

There's a security vulnerability in nokogiri fixed in version 1.10.4. This gem requires ~>1.8, holding back upgrades of nokogiri. Could you release a version requiring the newer nokogiri, or just >1.8 open-endedly? Thanks!

How to write data without bookmarks ?

Describe the bug

1/ My Word document has no bookmarks. How can I write data to it?
2/ I have the word

How to keep the format when I get it?

Expected behavior

1/ I want to write (append) data and then save a new file
2/ Keep format

Thanks!

Environment

Ruby version: [e.g 2.7.1]
docx gem version: [e.g 0.5.0]
OS: [e.g. iOS]

Zlib buffer error

Hello.

I'm trying to use the gem to make a simple edit to some paragraphs in a docx file. The processing goes well, but when attempting to call #save on the document object, I get a Zlib buffer error, with the following stack trace:

["ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/inflater.rb:44:in `inflate'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/inflater.rb:44:in `internal_produce_input'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/inflater.rb:15:in `sysread'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/input_stream.rb:82:in `sysread'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/ioextras/abstract_input_stream.rb:33:in `read'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/file.rb:264:in `block in read'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/entry.rb:501:in `get_input_stream'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/file.rb:232:in `get_input_stream'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/file.rb:264:in `read'", "ruby-2.3.4/gems/docx-0.3.0/lib/docx/document.rb:110:in `block (2 levels) in save'", "ruby-2.3.4/gems/rubyzip-1.2.1/lib/zip/entry_set.rb:38:in `block in each'"]

Any ideas on what the problem may be?

Add support for Content Control fields in paragraph text runs

Some word documents I was parsing use Content Control fields to create forms like the following screenshot shows:

(I cannot provide the example Word doc I was using, but I can possibly make an example if necessary.)

The following example would not print the value "Approved" that was in the Content Control field above:

# Retrieve and display paragraphs as html
doc = Docx::Document.open('example.docx')
doc.paragraphs.each do |p|
  puts p.to_html
end

However if I modified paragraph.rb to look like the following, the Content Control fields are parsed:

@node.xpath('w:r|w:hyperlink/w:r|w:sdt/w:sdtContent/w:r').map { |r_node| Containers::TextRun.new(r_node, @document_properties) }

LoadError: cannot load such file -- zip

I have the following code:

Gemfile

gem 'docx', '~> 0.2.07'

Rake task

require 'docx'
docx = Docx::Document.open(template_docx)
...

And I get the following error:

LoadError: cannot load such file -- zip

Am I doing something wrong?

Not getting docx class working properly on Rails

Describe the bug

I'm trying to create a .docx template using Rails, but I'm having a problem loading the 'docx' class.

Repository

NoMethodError in ClientsController#create
`undefined method `close' for nil:NilClass`

Image

Controller

def templater
  require "docx"
  doc = Docx::Document.open("base.docx")
  doc.paragraphs.each do |p|
    p.each_text_run do |tr|
      tr.substitute('_placeholder', 'teste')
    end
  end
  doc.save('base-edit.docx')
end

Gemfile
gem 'docx', :require => ["docx"]

Environment

Ruby version: [e.g 2.6.5]
docx gem version: generic gem docx (last version)
OS: [e.g. Ubuntu]
gem 'rails', '~> 5.2.4', '>= 5.2.4.1'

Question: only paragraphs that are not part of a table?

Hello! Since Office Open XML distinguishes between paragraphs and tables, but every table cell also appears as a paragraph: is there any way to access only those paragraphs that are not inside a table (MWE where table cells also appear in the list of paragraphs)?

undefined method `close' for nil:NilClass

I wrote this simple code:

require 'docx'

# Create a Docx::Document object for our existing docx file
doc = Docx::Document.open("a.docx")

# Retrieve and display paragraphs
doc.paragraphs.each do |p|
  puts p
end

...and keep getting "undefined method `close' for nil:NilClass".

I am sure the file is there, and tried many different locations for the file, so I'm suspecting this gem is not maintained.

Can anyone guide me? Thank you

Already initialized constant TAG, Ruby On Rails

I am getting the following errors when using docx in my rails app. I have a rails model called Tag.

/Users/me/.rvm/gems/jruby-1.7.4@neo4jtest/gems/docx-0.2.03/lib/docx/elements/bookmark.rb:10 warning: already initialized constant TAG
/Users/me/.rvm/gems/jruby-1.7.4@neo4jtest/gems/docx-0.2.03/lib/docx/elements/text.rb:7 warning: already initialized constant TAG
/Users/me/.rvm/gems/jruby-1.7.4@neo4jtest/gems/docx-0.2.03/lib/docx/containers/text_run.rb:17 warning: already initialized constant TAG
/Users/me/.rvm/gems/jruby-1.7.4@neo4jtest/gems/docx-0.2.03/lib/docx/containers/paragraph.rb:12 warning: already initialized constant TAG

I am guessing this can be fixed with better use of namespaces.

NameError: uninitialized constant Zip::File

when I try to read the doc by doing
doc = Docx::Document.open("tmp/document.docx")
I get an error saying "NameError: uninitialized constant Zip::File"
I have gem 'zip' & gem 'rubyzip' both in my gemfile.
What can be the possible cause and solution for this?

Editing *.docx files by WPS Office.

Hello @chrahunt, some of my colleagues edit docx files with WPS Office. The resulting docx files are processed successfully by your gem, but cannot be saved. There is an exception:

/var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/file.rb:262:in `block in read': undefined method `read' for Zip::NullInputStream:Module (NoMethodError)
    from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/entry.rb:483:in `get_input_stream'
    from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/file.rb:230:in `get_input_stream'
    from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/file.rb:262:in `read'
    from /vagrant/lib/docx/document.rb:111:in `block (2 levels) in save'
    from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/entry_set.rb:42:in `block in each'
    from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/entry_set.rb:41:in `each'
    from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/entry_set.rb:41:in `each'
    from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/central_directory.rb:182:in `each'
    from /vagrant/lib/docx/document.rb:104:in `block in save'
    from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/output_stream.rb:53:in `open'
    from /vagrant/lib/docx/document.rb:103:in `save'

Apparently the rubyzip gem processes those docx files differently from docx files produced by MS Office.
