Comments (4)
Pushed a commit with nascent support for this. Please read the new manual page for read_docx().
# original
read_docx(
system.file("examples/trackchanges.docx", package="docxtractr")
) %>%
docx_extract_all_tbls(guess_header = FALSE)
#> NOTE: header=FALSE but table has a marked header row in the Word document
#> [[1]]
#> # A tibble: 1 x 1
#> V1
#> <chr>
#> 1 21
# accept
read_docx(
system.file("examples/trackchanges.docx", package="docxtractr"),
track_changes = "accept"
) %>%
docx_extract_all_tbls(guess_header = FALSE)
#> [[1]]
#> # A tibble: 1 x 1
#> V1
#> <chr>
#> 1 2
# reject
read_docx(
system.file("examples/trackchanges.docx", package="docxtractr"),
track_changes = "reject"
) %>%
docx_extract_all_tbls(guess_header = FALSE)
#> [[1]]
#> # A tibble: 1 x 1
#> V1
#> <chr>
#> 1 1
If this does work for you, would you be open to submitting a PR that adds you as a contributor in a new person() record in the DESCRIPTION?
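For anyone unsure what such a record looks like, here is a minimal sketch using base R's person(). The name and email are placeholders (not anyone from this thread), and the exact layout of the package's Authors@R field is an assumption.

```r
# Hypothetical contributor entry; the name and email are placeholders.
contributor <- person(
  given  = "Jane",
  family = "Doe",
  email  = "jane.doe@example.com",
  role   = "ctb"  # "ctb" is the standard role code for a contributor
)

# In DESCRIPTION, a record like this would be appended to the existing
# Authors@R vector, e.g. c(<existing person() records>, person(...)).
print(contributor)
```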
from docxtractr.
Thanks for the quick response. This sounds promising. However, I am getting a Pandoc error that is difficult to understand. Any clue?
> library(docxtractr)
>
> path<-"C:\\Users\\tomas_hovorka\\Documents\\docxtractr_bug.docx"
>
> d1<-read_docx(path,track_changes = "accept")
Warning message:
running command '"C:/Users/tomas_hovorka/Documents/TomasH/SW/RStudio/bin/pandoc" -f docx -t docx -o C:\Users\TOMAS_~1\AppData\Local\Temp\RtmpW6mfe7\file166425a052de.zip --track-changes=accept C:\Users\TOMAS_~1\AppData\Local\Temp\RtmpW6mfe7\file166425a052de.zip' had status 127
> t1a<-docx_extract_tbl(d1, 1)
> t1a
A tibble: 0 x 1
... with 1 variable: `21` <chr>
>
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Czech_Czech Republic.1250 LC_CTYPE=Czech_Czech Republic.1250 LC_MONETARY=Czech_Czech Republic.1250 LC_NUMERIC=C
[5] LC_TIME=Czech_Czech Republic.1250
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] docxtractr_0.6.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 utf8_1.1.4 crayon_1.3.4 dplyr_0.7.4 assertthat_0.1 R6_2.2.2 magrittr_1.5
[8] pillar_1.2.3 httr_1.3.1 rlang_0.2.0 rstudioapi_0.7.0-9000 bindrcpp_0.2 xml2_1.2.0 tools_3.3.1
[15] glue_1.2.0 purrr_0.2.4 pkgconfig_2.0.1 bindr_0.1.1 tibble_1.4.2
> library(rmarkdown)
Warning message:
package ‘rmarkdown’ was built under R version 3.3.3
> pandoc_available(version = NULL, error = FALSE)
[1] TRUE
> pandoc_version()
[1] ‘1.17.2’
Thanks for testing! And, #sigh. The pandoc version is very likely the culprit. I build pandoc from source on my Linux systems and run RStudio dailies on my non-Linux systems, and both install pandoc v2.x.y rather than v1.x.y; only pandoc 2+ has the MS Word track changes integration. I'll add some version checks, but you'll have to either wait until RStudio's forthcoming release candidate is ready or live "dangerously" and use the RStudio Preview builds, since they ship pandoc 2.x (FWIW, I run the dailies and they never impede my $DAYJOB work). Given the legacy operating system you're using, I'd be wary of trying to build pandoc on your system, but there are Windows binary packages for pandoc 2.x.y at https://github.com/jgm/pandoc/releases/tag/2.3.1 (I may need to add a specific check for the directory that installer tends to put pandoc in).
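Until such checks land in the package, a version guard can be sketched on the user side. This is only an illustration: the 2.0 threshold is taken from the comment above rather than verified against pandoc's changelog, and pandoc_ok_for_track_changes() is a hypothetical helper, not part of docxtractr.

```r
# Assumption from the comment above: track-changes handling needs pandoc 2.x.
min_pandoc <- numeric_version("2.0")

# Hypothetical helper: TRUE when a version string meets the assumed minimum.
pandoc_ok_for_track_changes <- function(version) {
  numeric_version(version) >= min_pandoc
}

pandoc_ok_for_track_changes("1.17.2")  # the version reported earlier in the thread
pandoc_ok_for_track_changes("2.3.1")   # the release linked above

# Check the pandoc that rmarkdown finds, if rmarkdown is installed.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  installed <- as.character(rmarkdown::pandoc_version())
  if (!pandoc_ok_for_track_changes(installed)) {
    message("pandoc ", installed, " found; track_changes may not work")
  }
}
```

Running the guard before calling read_docx(path, track_changes = "accept") makes an outdated pandoc visible up front instead of surfacing as a failed system command.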
I do not get errors, but I do get incorrect behavior: the same result in all three cases (and the same problem with the file I am actually using).
NOTE: header=FALSE but table has a marked header row in the Word document
[[1]]
# A tibble: 1 x 1
V1
<chr>
1 21
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
system code page: 65001
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] docxtractr_0.6.5 bookdown_0.24 MCMCglmm_2.32 ape_5.5 coda_0.19-4 Matrix_1.3-4 knitr_1.36
[8] kableExtra_1.3.4 flextable_0.6.10 magrittr_2.0.1 Hmisc_4.6-0 Formula_1.2-4 survival_3.2-13 lattice_0.20-45
[15] pewmethods_1.0 stringi_1.7.5 forcats_0.5.1 dplyr_1.0.7 purrr_0.3.4 readr_2.1.0 tidyr_1.1.4
[22] tibble_3.1.6 tidyverse_1.3.1 devtools_2.4.2 usethis_2.1.3 mice_3.13.0 pdftables_0.1 pdftools_3.0.1
[29] tabulizer_0.2.2 stringr_1.4.0 ggplot2_3.3.5 labelled_2.9.0 haven_2.4.3 data.table_1.14.2 readxl_1.3.1
loaded via a namespace (and not attached):
[1] cubature_2.0.4.2 colorspace_2.0-2 ellipsis_0.3.2 rprojroot_2.0.2 htmlTable_2.3.0 corpcor_1.6.10 base64enc_0.1-3
[8] fs_1.5.0 rstudioapi_0.13 remotes_2.4.1 fansi_0.5.0 lubridate_1.8.0 ranger_0.13.1 xml2_1.3.2
[15] splines_4.1.0 cachem_1.0.6 pkgload_1.2.3 jsonlite_1.7.2 rJava_1.0-5 broom_0.7.10 cluster_2.1.2
[22] dbplyr_2.1.1 png_0.1-7 compiler_4.1.0 httr_1.4.2 backports_1.3.0 assertthat_0.2.1 fastmap_1.1.0
[29] survey_4.1-1 cli_3.1.0 htmltools_0.5.2 prettyunits_1.1.1 tools_4.1.0 gtable_0.3.0 glue_1.5.0
[36] Rcpp_1.0.7 cellranger_1.1.0 vctrs_0.3.8 nlme_3.1-153 svglite_2.0.0 tensorA_0.36.2 xfun_0.28
[43] ps_1.6.0 openxlsx_4.2.4 testthat_3.1.0 rvest_1.0.2 lifecycle_1.0.1 scales_1.1.1 hms_1.1.1
[50] parallel_4.1.0 RColorBrewer_1.1-2 yaml_2.2.1 memoise_2.0.0 gridExtra_2.3 gdtools_0.2.3 rpart_4.1-15
[57] latticeExtra_0.6-29 desc_1.4.0 checkmate_2.0.0 pkgbuild_1.2.0 zip_2.2.0 systemfonts_1.0.3 rlang_0.4.12
[64] pkgconfig_2.0.3 evaluate_0.14 tabulizerjars_1.0.1 htmlwidgets_1.5.4 processx_3.5.2 tidyselect_1.1.1 R6_2.5.1
[71] generics_0.1.1 DBI_1.1.1 pillar_1.6.4 foreign_0.8-81 withr_2.4.2 nnet_7.3-16 modelr_0.1.8
[78] crayon_1.4.2 uuid_1.0-3 utf8_1.2.2 officer_0.4.1 tzdb_0.2.0 rmarkdown_2.11 jpeg_0.1-9
[85] grid_4.1.0 qpdf_1.1 callr_3.7.0 webshot_0.5.2 reprex_2.0.1 digest_0.6.28 munsell_0.5.0
[92] viridisLite_0.4.0 mitools_2.4 sessioninfo_1.2.1 askpass_1.1
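A report like this one can be reduced to a short check that reads the bundled example in all three modes and tests whether the results differ. The sketch below assumes docxtractr is installed and still ships examples/trackchanges.docx; all_identical() and read_tables() are hypothetical helpers written for this illustration.

```r
# Hypothetical helper: TRUE when every element of a list equals the first one.
all_identical <- function(xs) {
  all(vapply(xs[-1], identical, logical(1), xs[[1]]))
}

if (requireNamespace("docxtractr", quietly = TRUE)) {
  path <- system.file("examples/trackchanges.docx", package = "docxtractr")

  if (nzchar(path)) {
    # Read the tables with track changes left as-is, accepted, or rejected.
    read_tables <- function(mode = NULL) {
      doc <- if (is.null(mode)) {
        docxtractr::read_docx(path)
      } else {
        docxtractr::read_docx(path, track_changes = mode)
      }
      docxtractr::docx_extract_all_tbls(doc, guess_header = FALSE)
    }

    results <- list(
      original = read_tables(),
      accept   = read_tables("accept"),
      reject   = read_tables("reject")
    )

    # TRUE here would reproduce the behavior described in this comment.
    print(all_identical(results))
  }
}
```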