spgarbet / tangram

Table Grammar package for R

tangram's Introduction

A Grammar of Tables 'tangram'

Quick, show me some really impressive results in Rmarkdown! See example.html, or the LaTeX equivalent, example.pdf.

Version 0.8.2:

Fixed bug in backtick operator for naming.

Version 0.7.x: longtable

Tables are now all longtables in LaTeX. Vignettes are split to another project: tangram-vignettes.

NOTICE: Major Refactor

Just a quick note that this release is a major refactor that makes the tangram call context aware when used with Rmarkdown. Calls to rendering for html or latex are no longer required (but can still be used). Also, each cell's rendering now has a dispatch table in the transform, so one can easily override how numbers are formatted. This release will be pushed to CRAN once I finish testing the entire LaTeX UNICODE support.

Quick Overview

What began as an extensible library to quickly generate tables from formulas has evolved into a library that supports magrittr %>% style commands on abstract table objects. The formula interface is a complicated piece of code in its own right, but it is only one of many methods now available for generating tables. There were a lot of lessons learned to get to this point, and it's worth talking about what is now the core of the library and what have become the best practices in the design of the interface.

It has now been used to make 30-40 page reproducible DSMB (data safety monitoring board) reports on multiple clinical trials. Internally, several biostatistical reports are using it to improve the quality of presentation. This shakedown of formats and usage has vastly improved the overall quality of the package. See fda-example.html for some examples--don't be put off by the size of the examples. They are reusable over and over for custom content devoted to a given task. These two transforms and their constructions, once built, have to date been used on at least 5 different submitted reports.

A tangram object is at its heart a list of lists containing cells, which can be subclassed from just about anything, but the best overall choice is a character vector containing text with minor extensions to Rmarkdown. There are two types of style. One is internal to a cell and its formatting of text. The other is the overall styling of a table, which is a choice best left to the rendering call. The Rmarkdown/extensions supported are as follows:

*italic* or _italic_ Makes the font chosen italic.
**bold** or __bold__ Makes the font chosen bold.
`inline code` for inline code.
~~strikethrough~~ for ~~strikethrough~~.
# Header for header font, and the various other multiples of hash marks.
~subscript~ for a subscript (extension).
^superscript^ for a superscript (extension).
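
As a rough illustration of what these markers mean, here is a minimal base R sketch that rewrites a few of them as HTML tags. The helper md_inline_to_html is hypothetical and is not tangram's renderer; it only handles simple, non-nested cases and ignores backticks, underscores, and headers.

```r
# Hypothetical helper: translate a few inline markers into HTML.
# Not part of tangram; it only illustrates what the markup stands for.
md_inline_to_html <- function(x) {
  x <- gsub("~~(.+?)~~", "<s>\\1</s>", x, perl = TRUE)          # strikethrough before subscript
  x <- gsub("\\*\\*(.+?)\\*\\*", "<b>\\1</b>", x, perl = TRUE)  # bold before italic
  x <- gsub("\\*(.+?)\\*", "<i>\\1</i>", x, perl = TRUE)        # italic
  x <- gsub("~(.+?)~", "<sub>\\1</sub>", x, perl = TRUE)        # subscript (extension)
  x <- gsub("\\^(.+?)\\^", "<sup>\\1</sup>", x, perl = TRUE)    # superscript (extension)
  x
}

md_inline_to_html("0.70 **1.30** 3.60")
md_inline_to_html("X^2^~2~")
```

The ordering matters: the doubled markers are replaced first so that the single-character patterns do not consume them.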

The original library had custom objects for specific statistical meanings. However, most of these are not required or needed anymore and with each version I am working to eliminate most if not all of them. There is one exception that is difficult and that is the fraction. There is no clean way to support fractions in Rmarkdown that I've come up with. I am considering another extension in the support. However, in the long run all of these custom cell objects and their handling will probably vanish.

This tangram object representing the abstract table in memory now has all the internal formatting of a cell representable in a simple and direct manner. A cell object can have the following attributes:

sep a separator to use for rendering between character strings in a vector
reference a reference character to append as superscript to the field. Not required, as this can be handled now via the Rmarkdown above. Will probably be deprecated soon.
units the units of the described label. Maintains compatibility with Hmisc handling of units on objects.
colspan the ability of a cell to take up multiple columns that follow to the right. The cells it covers should be set to NA.
rowspan the ability of a cell to take up multiple rows that follow below. The cells it covers should be set to NA.
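
To make the attribute list concrete, the sketch below builds a cell by hand in base R. In practice the cell_* helpers do this for you; the class vectors and attribute values here are illustrative assumptions, not output captured from the package.

```r
# A hand-built stand-in for a cell: a character vector plus attributes.
iqr <- structure(
  c("0.70", "**1.30**", "3.60"),
  class = c("cell_value", "cell", "character"),
  sep   = " ",      # separator used when the vector is rendered
  units = "mg/dl"   # units of the described label
)

# A header meant to span two columns; the cell it covers is set to NA.
header_row <- list(
  structure("Treatment",
            class   = c("cell_header", "cell_label", "cell", "character"),
            colspan = 2),
  NA
)

# Rendering the vector with its own separator.
rendered <- paste(as.character(iqr), collapse = attr(iqr, "sep"))
rendered
```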

A tangram object itself can have an attribute footnote to contain footnotes to display. Some formula transforms automatically supply this with their bundle for reference.

Additional subclassing of cells carries through to special handling. For example, in the HTML rendering code all of these become CSS classes, so that one can specify CSS rendering however an end user likes. Several of the core statistical cells also allow for flexbox rendering in CSS. LaTeX is more fixed in its rendering, but deals with these issues handily.

Cell Helpers

Functions like cell_label will generate a cell label.

> cell_label("Joe")
Joe
> class(cell_label("Joe"))
[1] "cell_label" "cell"       "character"

The class information is important for later style decisions, in the manner of CSS, and helper functions exist for the top level classes, all named cell_*. The classes currently handled by the provided styles (you can of course write your own) are as follows:

cell, character
cell_label, cell, character
cell_header, cell_label, cell, character
cell_subheader, cell_label, cell, character
cell_value, cell, any base class
cell_n, cell_value, cell, numeric

Note: Most of the classes from previous versions are deprecated in favor of straight Markdown.

Wickham Style

Tables are composable.

> tbl <- tangram(drug ~ sex, pbc) + tangram(drug ~ bili, pbc)
> tbl
===========================================================================================================
                          N   D-penicillamine       placebo        not randomized       Test Statistic     
                                    154               158               106                                
-----------------------------------------------------------------------------------------------------------
sex : female             418   0.903  139/154    0.867  137/158    0.925   98/106     X^2_2=2.38, P=0.304  
Serum Bilirubin (mg/dl)  418  0.70 *1.30* 3.60  0.80 *1.40* 3.22  0.70 *1.40* 3.12  F_{2,415}=0.03, P=0.972
===========================================================================================================

There are some basic operators, but adding more is quite easy. Just drop me a suggestion and I can generally turn it around pretty quickly. In fact, I'm focused on adding these in general right now.

> tbl %>% 
  del_row(2) %>%
  del_col(2) %>%
  insert_row(1, "", "Yabba", "Dabba", cell_header("DOOOO"), "", class="cell_header") %>%
  drop_statistics() %>%
  add_indent(2)
===============================================================================
                           D-penicillamine       placebo        not randomized 
                                Yabba             Dabba             DOOOO      
-------------------------------------------------------------------------------
  sex : female              0.903  139/154    0.867  137/158    0.925   98/106 
  Serum Bilirubin (mg/dl)  0.70 *1.30* 3.60  0.80 *1.40* 3.22  0.70 *1.40* 3.12
===============================================================================

By the way, referring to specific rows and columns in a table and using operators on them introduces brittleness into your table reproducibility. Say a variable is added or removed: all the absolute references are now broken! Sometimes it's necessary, but in general it is to be avoided if possible.

The formula interface allows for reproducible and consistent formatting into tables from data frames. I cannot stress the idea of consistency enough in this regard. While working on this project I've seen numerous professional tables where the method of display for information of the same type changes several times in the same table. The reader is forced to adapt their cognition multiple times, and this makes communication of the message in the data more difficult. Consistency of representation of data in a table is paramount to good design.

dplyr Lovers Delight

But I want to use dplyr to generate my table! Well, you can have your cake and eat it too.

> library(dplyr)
> mtcars %>%
  group_by(cyl) %>%
  summarise(disp = round(mean(disp),2), N=n()) %>%
  tangram()
===============
cyl   disp   N 
---------------
4    105.14  11
6    183.31  7 
8    353.1   14
===============

This allows all the downstream rich rendering choices (LaTeX, HTML5, rmd, or rtf) to work with your summaries from dplyr.

I want percents from Hmisc style summary transform

There were several requests to modify exactly how a cell in a table was generated in the provided Hmisc transform. I've added callback tables as part of the transform object so that things can be overridden. This was used to provide styles as well: the original statistical transforms were just modified with different cell rendering via these callbacks.

The hmisc transform looks like this now:

hmisc <- list(
  Type        = hmisc_data_type,
  Numerical   = list(
                  Numerical   = summarize_spearman,
                  Categorical = summarize_kruskal_horz
            ),
  Categorical = list(
                  Numerical   = summarize_kruskal_vert,
                  Categorical = summarize_chisq
            ),
  Cell        = hmisc_cell,
  Footnote    = "N is the number of non-missing value. ^1^Kruskal-Wallis. ^2^Pearson. ^3^Wilcoxon."
)

The Cell item in the list contains the list of callbacks used by the hmisc transform. Thus one can still define their own set of transforms, or just take an existing one and modify the portion needed.

hmisc_cell <- list(
  n        = cell_n,
  iqr      = hmisc_iqr,
  fraction = hmisc_fraction,
  fstat    = hmisc_fstat,
  chi2     = hmisc_chi2,
  spearman = hmisc_spearman,
  wilcox   = hmisc_wilcox,
  p        = hmisc_p
)

Here's an example that keeps hmisc the same but adds percentages to fractions.

> my_transform <- hmisc
> my_transform[['Cell']][['fraction']] <- function(numerator, denominator, format=3, ...)
  { paste0('%', render_f(100*numerator/denominator, format)) }
> tangram(1 ~ sex[1]+drug[1]+bili, pbc, test=FALSE, id="override", transform=my_transform)
=========================================
                     N         All       
                             (N=418)     
-----------------------------------------
sex : female        418       %89.5      
drug                418                  
   D-penicillamine            %36.8      
   placebo                    %37.8      
   not randomized             %25.4      
Serum Bilirubin     418  0.80 *1.40* 3.40
=========================================
N is the number of non-missing value. ^1 Kruskal-Wallis. ^2 Pearson. ^3 Wilcoxon.

Thus the framework is entirely separate from the specification of transforms at the formula and the cell level.
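
The override pattern itself is ordinary R: a transform carries a named list of functions, and you replace only the entries you care about. The sketch below mimics that in base R with stand-in functions. Both default_cell and the two fraction implementations are hypothetical and only loosely modeled on the output shown above; they are not the package's hmisc_cell.

```r
# Stand-in callback table (hypothetical, not the package's hmisc_cell).
default_cell <- list(
  n        = function(n, ...) as.character(n),
  fraction = function(numerator, denominator, format = 3, ...) {
    ratio <- formatC(numerator / denominator, digits = format, format = "f")
    paste0(ratio, "  ", numerator, "/", denominator)
  }
)

# Override one entry and leave the rest untouched.
my_cell <- modifyList(default_cell, list(
  fraction = function(numerator, denominator, format = 1, ...) {
    paste0(formatC(100 * numerator / denominator, digits = format, format = "f"), "%")
  }
))

default_cell$fraction(139, 154)  # proportion plus counts
my_cell$fraction(139, 154)       # percentage only
my_cell$n(418)                   # inherited unchanged
```

Keeping the callbacks in a plain list means an override never touches the code that walks the formula; only the final formatting step changes.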

Look in the R/cell-hmisc.R file for details on cell rendering and the interface expected by the hmisc transform.

Email

P.S. I tested copy and paste from the HTML of a vignette into an email with gmail and it worked flawlessly. Nice.

Release Notes

July 21 2018 v0.6 Major Refactor. Callbacks added for cell rendering. Rmarkdown and knitr aware of context. Style support improved and UNICODE to LaTeX mappings in progress.

Jun 4 2018 v0.4 Numerous bug fixes. The package was used for DSMB report submissions to the FDA, and several examples of IRR and other wonderful things have been produced. The full shakedown is complete, and fixes and updates are becoming smaller and smaller as the package stabilizes. LaTeX is a workable render format. It has rtf output that works acceptably. Work continues towards formatted output in Word, with traceability.

Jun 27 2017 v0.3 A major refactor: cells in the table object are no longer 'special' and are just straight S3 objects. Two adapter layers of code are deleted, and the interface is stabilizing. Used tangram as an S3 object and replaced various table generating calls with a single overloaded function.

Sep 27 2017 v0.3.2 Multiple bug fixes from earlier refactor. Added support for LaTeX. A new example from FDA work.

Original Goal

The idea of creating a quick summary of a data set has been around a good while. The use of statistical formulas to create summaries exists in SAS in PROC REPORT, and in the R package Hmisc. SAS has a rich syntax which allows for generation of a wide array of summary tables, but is limited to a subset of SAS functions. The SAS generation is further limited to a fairly crude appearing table, with limited options for output generation. Hmisc offers wonderful output, but is fixed in the analysis that can be performed.

This project intends to create a table grammar that is simple to use, while providing ultimate freedom to the end user when generating summary tables from data sets. This project contains the reference implementation in the language R, but is not limited to R.

For an example using Rmarkdown, see example.html

General Outline of Formula Transforms

A formula, a data frame (spreadsheet), and a transform function input into the framework will output an abstract table, that can be rendered into text, LaTeX, Word, or HTML5.

Formulas will be in the Columns ~ Rows syntax.

A user supplies a set of data and a formula, which produces a summary object. This summary object is then passed to a renderer, which is responsible for the final production of a table in a target language. The user can alter labeling, variable type detection, and output table data generation, or add or alter output formats. Each concern of the pipeline is separated from other concerns.

For example, one may wish for summary tables which match the New England Journal of Medicine format in LaTeX. A provided bundle of table generation will create the desired analysis directly from the data, and allow for specifying a style to the rendered LaTeX. The same formula and data could be used for a statistical report inside a department and the Hmisc table generation could be selected. In the end, the user is no longer bound to any decision in the table summary chain, beyond the grammar, and is free to change at will--or contribute more target bundles to share with others.
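
The separation of concerns described above can be sketched in a few lines of base R. This is not tangram's implementation: summarize_median, render_text and render_html are hypothetical stand-ins, and the abstract table is reduced to a character matrix. The point is only that one summary feeds several independent renderers.

```r
# Step 1: a "transform" turns data into an abstract table (a character matrix here).
summarize_median <- function(data, column, row) {
  groups  <- split(data[[row]], data[[column]])
  medians <- vapply(groups, function(v) format(median(v)), character(1))
  rbind(c("", names(groups)), c(row, medians))
}

# Step 2: independent renderers consume the same abstract table.
render_text <- function(tbl) {
  unname(apply(tbl, 1, paste, collapse = " | "))
}
render_html <- function(tbl) {
  rows <- apply(tbl, 1, function(r) {
    paste0("<tr>", paste0("<td>", r, "</td>", collapse = ""), "</tr>")
  })
  paste0("<table>", paste(rows, collapse = ""), "</table>")
}

tbl <- summarize_median(mtcars, column = "cyl", row = "mpg")
render_text(tbl)
render_html(tbl)
```

Swapping the summary function changes the statistics without touching either renderer, and adding a renderer does not touch the summary.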

Original High Level Requirements

It must render to LaTeX, Text, HTML5, RMarkdown, Index table.
It must allow for user override of any summary generation function.
It must allow for user override of any rendering function.
Determination of type/class of a statistical variable is user overrideable.
Control over labeling of variables must be user overrideable.
It must be easily extensible. I.e., any user overrides should require a minimum of fuss / syntax for the end user.
Index table must be user specified name based, and not numeric numbers.
Index table must be repeatable, and contain search information.
It should reproduce by default as much as possible Hmisc summaryM behaviors.
It must be algebraically well formed.

Grammar Definition of Formulas

A formula consists of a column specification, a tilde "~" and a row-specification.

A specification is a combination of expressions with a "+" joining them. Note one can add more variables to either columns or rows in this manner.

An expression can be a variable name from the data, or a variable name joined with an expression via the "*" operator.

<table-formula>        ::= <expression> "~" <expression>
<expression>           ::= <term> "+" <expression> | <term>
<term>                 ::= <factor> "*" <term> | <factor>
<factor>               ::= <data-name>                             |
                            "(" <expression> ")"                   |
                            <function-name> "(" <r-expression> ")"

The operators + and * are distributive, i.e. term1 * (term2 + term3) == term1 * term2 + term1 * term3

The operators are not commutative, i.e. term1 + term2 =/= term2 + term1

The operators are associative.

Thus this grammar loosely corresponds to a noncommutative ring (+, *), which is non-abelian, a monoid under multiplication, and is distributive. It is not a true ring, in that elements once reduced do not appear back in the set operated on, as the grammar is describing a final product that is non-reducible: the final table.
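
A minimal sketch of the distributive expansion, written against R's own expression parser rather than tangram's: expand_terms is a hypothetical helper that walks a quoted expression and returns the fully distributed terms in order. It is meant to illustrate the algebra, not the package's internals.

```r
# Hypothetical helper: distribute "*" over "+" and keep the written order.
expand_terms <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("("))) {
    return(expand_terms(e[[2]]))                 # drop parentheses
  }
  if (is.call(e) && identical(e[[1]], as.name("+"))) {
    return(c(expand_terms(e[[2]]), expand_terms(e[[3]])))
  }
  if (is.call(e) && identical(e[[1]], as.name("*"))) {
    left  <- expand_terms(e[[2]])
    right <- expand_terms(e[[3]])
    # Every left term paired with every right term, left-major order.
    return(as.vector(t(outer(left, right, paste, sep = "*"))))
  }
  deparse(e)                                     # a variable or function call
}

expand_terms(quote(a * (b + c)))        # distributes over the sum
expand_terms(quote((a + b) * (c + d)))
expand_terms(quote(b + a))              # order is preserved, not sorted
```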

If a function is encountered, this is executed and expected to return a variable with a label that is useable in generating summaries.

A parser creates an abstract syntax tree of the formula. It will apply any distributive requests to requested variables. Functions will create additional data, by passing in the current dataframe and executing the command.

This concludes the syntax phase of compiling a table. The next phase is where semantic meaning of the formula is created.

Statistical Analysis and Summary

The user now has the choice of what semantic content is desired for constructing the statistical summary. One might appreciate the default summary statistics and aesthetic layout of Hmisc. One might want to generate data ready for the New England Journal of Medicine, or it might just be statistics about a particular model that is central to the idea.

At this point, if statistical p-values are to be used, a table should have a consistent viewpoint of what the null hypothesis is. A consistent viewpoint is essential to a reader's understanding of the collection of information being presented. For example, Hmisc takes the viewpoint that the null hypothesis is independence of variables between row and column. Thus the table is exploring what possible relationships exist, and giving the reader a feel for the ranges of the data. Then, based on what data type a variable is, an appropriate statistical test is chosen.

Notes on Data Types

In preparing this reference implementation, it was discovered that there are some fundamental basic types of data in relationship to statistical operations. Unsurprisingly, most class or type definitions in languages represent the underlying machine storage format. This viewpoint of type is at odds with being able to succinctly define how table summaries are generated from provided data. A formal definition of the types of data available is required. However, the user of the library can freely change or amend the types provided and/or type determination.

The default of this library is similar choices as made by Hmisc. That is, a column of data in a data frame will be classified into one of the following types:

Binomial
Categorical
Numerical
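
For a feel of what such a classification can look like, here is a hypothetical classifier in base R. It is not the package's hmisc_data_type; the rules and the category_threshold cutoff are assumptions chosen for illustration.

```r
# Hypothetical type guesser; the threshold is an illustrative assumption.
guess_type <- function(x, category_threshold = 5) {
  n_unique <- length(unique(x[!is.na(x)]))
  if (is.factor(x) || is.character(x) || is.logical(x)) {
    if (n_unique == 2) "Binomial" else "Categorical"
  } else if (is.numeric(x)) {
    if (n_unique == 2) "Binomial"
    else if (n_unique < category_threshold) "Categorical"
    else "Numerical"
  } else {
    stop("Unsupported class: ", class(x)[1])
  }
}

guess_type(mtcars$mpg)  # many distinct values
guess_type(mtcars$am)   # two distinct values
guess_type(mtcars$cyl)  # three distinct values
```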

The consequence of supporting these types must be explored in terms of the algebraic operators. First of all, a Categorical or Binomial will expand to a number of columns or rows corresponding to its groups. Binomial is kept as a special case for handling the dropping of one group for terseness in expression. Numerical is used as is, and will only have a single row or column in correspondence with its variable.

A Categorical * Categorical creates nested groups for consideration, which results in a categorical.

A Numerical * Numerical will just treat this as the numerical product of the two variables.

A Categorical * Numerical, or vice versa creates several numerical variables for consideration, filtered by the category they are in.

Please, note that this is not a constraint of the table grammar language, but simply compiler choices. One is free to consume the abstract syntax tree and make different decisions about the meaning of Numerical * Numerical, and for that matter how types are determined and what they are. The important thing to remember is that all combinations of types be considered if writing a table compiler!

Hmisc Defaults

As mentioned, the default analysis bundle mimics Hmisc. An intersection occurs between variables defined on the columns and rows.

It performs a Chi^2 test for Categorical X Categorical. Each intersection of groups contains the overall fraction in that category.

The Categorical X Numerical intersection provides quantiles, and the results of a Kruskal–Wallis test.

Hmisc does not provide for a Numerical X Numerical intersection, but to remain consistent with the other tests a Spearman correlation test is provided.
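
For reference, the three tests named above are all available in base R's stats package. The snippet below runs each one on mtcars, treating cyl and am as categorical for the sake of the example. It shows the tests themselves, not how the hmisc transform calls or formats them.

```r
# Categorical X Categorical: chi-squared test of independence.
# (Warnings about small expected counts are suppressed for this toy data.)
chi <- suppressWarnings(chisq.test(table(mtcars$cyl, mtcars$am)))

# Categorical X Numerical: Kruskal-Wallis rank sum test.
kw <- kruskal.test(mpg ~ cyl, data = mtcars)

# Numerical X Numerical: Spearman rank correlation test.
sp <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)

c(chisq = chi$p.value, kruskal = kw$p.value, spearman = sp$p.value)
```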

Design as a Table of Tables

Internally, a table consists of cells. A cell may be renderable, or it might be another table. An expansion function for flattening a table is used prerendering.

This choice forces a consistency requirement upon any author of compiler packages for tables. The number of rows and columns that analysis generates must be consistent across types. For example, for the default Hmisc descriptive compiler, the following table shows how many cells (rows X columns) are generated when analysis is done between a row type and a column type:

Row type \ Column type    Binomial     Categorical (M values)   Numerical
Binomial                  1 X 3        1 X (M + 2)              1 X 3
Categorical (N values)    (N+1) X 3    (N+1) X (M + 2)          (N+1) X 3
Numerical                 1 X 3        1 X (M + 2)              1 X 3

Note that the first term is consistent across each row, and the second term is consistent across each column. This ensures that, upon flattening, the number of rows and columns remains consistent.
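
The consistency rule can be checked mechanically. In the sketch below, block_rows, block_cols and block are hypothetical helpers that encode the cell counts from the table above; stacking blocks with rbind only succeeds because the column count depends on the column type alone.

```r
# Rows depend only on the row variable's type, columns only on the column's.
block_rows <- function(type, levels = 0) if (type == "Categorical") levels + 1 else 1
block_cols <- function(type, levels = 0) if (type == "Categorical") levels + 2 else 3

# Placeholder block of the right shape for one row/column intersection.
block <- function(row_type, col_type, n = 0, m = 0) {
  matrix("", nrow = block_rows(row_type, n), ncol = block_cols(col_type, m))
}

# Two row variables against one Categorical column variable with M = 3 values.
stacked <- rbind(
  block("Numerical",   "Categorical", m = 3),
  block("Categorical", "Categorical", n = 4, m = 3)
)
dim(stacked)  # (1 + 5) rows by (3 + 2) columns
```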

Note: Multicolumn and multirow formatting is on the todo list.

Full BNF of formula syntax

<table-formula>        ::= <expression> "~" <expression>
<expression>           ::= <term> "+" <expression> | <term>
<term>                 ::= <factor> "*" <term> | <factor>
<factor>               ::= "(" <expression> ")"                    |
                            <variable>                              |
                            <function-name> "(" <r-expression> ")"
<function-name>        ::= <identifier>
<variable>             ::= <identifier>
                            ( "::" <identifier> )
                            ( "[" ( <integer> | '"' <format> '"' ) "]" )
<identifier>           ::= 1 | ( [A-Za-z_] | .[A-Za-z_] ) [A-Za-z0-9_.]+
 
 
<format>               ::= "%" (<flags>) (<width>) (. <precision>) <specifier> 
<flags>                ::= <flag> | <flag> <flags>
<flag>                 ::=  "-" | "+" | " " | "#" | "0"
<width>                ::= <integer>
<precision>            ::= <integer>
<specifier>            ::= [diuoxXfFeEgGaAcspn]

<integer>              ::= [0-9]+

A variable identifier can specify desired resolution, and / or the type it should be treated as. For example: albumin[2]::Numerical specifies that albumin should be reported with 2 significant digits and treated as a Numerical variable. An alternate approach allows for sprintf specification, such as albumin["%0.2g"]
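
The format grammar follows the usual printf conventions, so base R's sprintf is a convenient way to preview what a given string will produce. The calls below use sprintf directly; how tangram applies the string to each cell is not shown here.

```r
x <- c(3.14159, 0.000123456, 1234.5)

sprintf("%0.2g", x)          # precision 2 with "g": two significant digits
sprintf("%.1f", x)           # precision 1 with "f": one digit after the decimal point
sprintf("%+8.3f", 3.14159)   # flag "+", width 8, precision 3
```

Note that the "g" specifier switches to scientific notation for large values, which may or may not be what you want in a table.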

Aside on Statistical Data Types

Instead of focusing on machine representation as type, what if statistical type were the focus? The following types I feel are more sensible and useful to real world measurements:

Binomial
Categorical
Ordinal
Count
Integer
Rational
DateTime
String
Complex
Vector of any of the above
Matrix of any of the above

This information defines what operations and tests can be done on data far better than worrying about the number of bits in the storage format. The type could define exactly what tests are allowed on data.

tangram's Issues

Investigate Pandoc comment support for Word.

integration with jamovi

I think it would be very useful if there is a tangram extension for Jamovi.
I thought this when reading this discussion:
https://twitter.com/thosjleeper/status/1003227915981553664
https://twitter.com/serdarbalci/status/1008364257505923073

allowing graphs as table elements

Hi Shawn,

wow, this has come a long way since I checked back last time! Congratulations on the amazing progress. I didn't really put much more effort into my take on descriptive tables (https://github.com/kkmann/describr), it is mostly hard coded and in a terrible (mostly working) state ;).

The idea that got me going in the first place was to allow graphs as 'intersection functions' as well so that you could have e.g. histograms or boxplots in your table (I personally feel that there are too many numbers floating around in most medical papers...).

The problem with graphics is obviously the choice of the back-end, but have you given that any thought? describr essentially just formats everything as a plot, thus being more or less back-end independent.

Best,

Kevin

SAS Table Proc Processor

Need similar functionality to SAS's table proc function, except it must take any statistical function that compares two variables within R. This would really open up potential use cases for the library to huge array of possibilities.

Constraints:

A single function for the entire table only. Focus--no multiple choice statistical comparisons. The function may generate multiple outputs, e.g. mean + Standard deviation.
Interface must be as simple as possible.

Questions:

How will it handle the '*' operator? As factor? As multiplication?

Date of Report with Word Fields

Need some method to preserve date of report generation inside word properties / fields.

Setting rowspan / colspan

Is it possible to set rowspan / colspan for a particular cell?

chi square for 1 ~ var1 + var2

Hey,

absolutely love the package, can't wait for the JSS article! One quick question: When I do not have strata in the table, e.g. using the formula 0 ~ var1 + var2, what exactly are the null hypotheses tested by hmisc? How do I suppress testing globally?

Multi line footnotes

Multiple line footnotes cause the table to render multiple times. It should just render multiple line footnotes in HTML5.

Switch LaTeX rendering to longtable

Really long tables don't split properly unless they are specified with longtable in LaTeX. Need to switch everything in render-latex.R to that for better rendering.

Documentation Rollup

Many of the functions are closely related and fall into principal groups. Right now the doxygen documents them separately; it would be better if they had single help pages for all the related functions, to give the user a more consistent view.

P-value Handling

Special formatting of P-values based on significance, and also displaying lower limits, is desirable. This should be done after handling Issue #21.

Matched Cohort Transform

kylerove has provided an example transform for Matched Cohort studies that follows the template layout of the Hmisc statistics (#48). It copies a lot of boilerplate code to accomplish its goal. This ticket is to include this as a default transform, available with an example, and possibly to modify the hmisc transform to handle requests like this with minimal coding.

Word Table Formatting

Can a means to easily format word tables to Lancet / NEJM style be devised? If so, do it!

Unable to create big tables

I was trying to create a large table with ~34 variables (group ~ 1 + 2 + 3 + ... + 34) and consistently would get an error on the 24th variable. I thought it might be a problem with that variable, but even if I removed it, whatever the 24th variable was would cause it to error out:

"Error in self$next_token() : Unparseable input starting at
postop_"

23 variables (most are binary, some categorical) seems like a strange number. Not sure what the issue is.

[edit] This is using the latest code here on github.

Word Update Macro Bundled

For users creating a manuscript external to the R/RStudio environment, a means to update their manuscript automatically is necessary. The table generator provides this via an external indexing method. A macro to read this file and update a manuscript exists in Word. This needs to be added into the package.

Drop embedded flag on table

It would be better if the "embedded" flag was not required. One could simply flatten if necessary. Detecting when this is needed is a bit more difficult and should be recursive.

Correlation value formatting too many digits.

Add LaTeX Rendering

Add more indenting spacing control

I have a request to add the ability to indent row headers in the manner of kable, as well as add breaks with a description in a table. This request is urgent for the user.

Test statistic not showing any longer

I don't know when this changed, but it's been a few months since I ran my code. Now it is not showing the test statistic column. I tried with no transform, hmisc, and my own transform. What am I missing?

table1 <- tangram(group ~ age[2] + sex + height[1] + weight[1] + bmi[4] + bmi_z[2] + tmi[6] + race + language + insurance + distance[1] + match_vp_shunt + match_spina_dx + match_past_surgery + neurologic_lesion + neurologic_lesion_level + match_ambulation + diagnosis_1 , data=recordsData, transform=hmisc, id="table1")

Underscores and Periods in LaTeX titles

Underscores and Periods in table headers have shown some issues with generating LaTeX which doesn't compile.

RMS Example Vignette

A very clear vignette that shows the RMS functionality off for comparing models is needed.

summary_table should be tangram

Hello,

I just want to point out that in your vignettes and examples, the function call summary_table() should be tangram()

At least that's what I had to do for it to work

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tangram_0.2.6        magrittr_1.5         digest_0.6.12        base64enc_0.1-3      stringr_1.2.0       
 [6] R6_2.2.1             ggthemes_3.4.0       wesanderson_0.3.2    Hmisc_4.0-3          Formula_1.2-1       
[11] survival_2.41-3      hrbrthemes_0.3.2     bindrcpp_0.1         rstan_2.15.1         StanHeaders_2.15.0-1
[16] forcats_0.2.0        pscl_1.4.9           lattice_0.20-35      MASS_7.3-47          scales_0.4.1.9000   
[21] dplyr_0.7.0          purrr_0.2.2.2        readr_1.1.1          tidyr_0.6.3          tibble_1.3.3        
[26] ggplot2_2.2.1        tidyverse_1.1.1     

loaded via a namespace (and not attached):
 [1] httr_1.2.1            jsonlite_1.5          splines_3.4.0         modelr_0.1.0          assertthat_0.2.0     
 [6] stats4_3.4.0          latticeExtra_0.6-28   pander_0.6.0          cellranger_1.1.0      yaml_2.1.14          
[11] Rttf2pt1_1.3.4        backports_1.1.0       glue_1.1.0            extrafontdb_1.0       checkmate_1.8.2      
[16] RColorBrewer_1.1-2    rvest_0.3.2           colorspace_1.3-2      htmltools_0.3.6       Matrix_1.2-10        
[21] plyr_1.8.4            psych_1.7.5           broom_0.4.2           haven_1.0.0           htmlTable_1.9        
[26] janitor_0.3.0         nnet_7.3-12           lazyeval_0.2.0        mnormt_1.5-5          readxl_1.0.0         
[31] evaluate_0.10         nlme_3.1-131          xml2_1.1.1            foreign_0.8-68        data.table_1.10.4    
[36] tools_3.4.0           hms_0.3               munsell_0.4.3         cluster_2.0.6         compiler_3.4.0       
[41] rlang_0.1.1           grid_3.4.0            rstudioapi_0.6.0.9000 htmlwidgets_0.8       labeling_0.3         
[46] rmarkdown_1.6         gtable_0.2.0          codetools_0.2-15      inline_0.3.14         reshape2_1.4.2       
[51] gridExtra_2.2.1       lubridate_1.6.0       knitr_1.16            extrafont_0.17        bindr_0.1            
[56] rprojroot_1.2         stringi_1.1.5         parallel_3.4.0        Rcpp_0.12.11          rpart_4.1-11         
[61] acepack_1.4.1

Eliminate table_builder

The original table_builder object is only semantically different by having a row/col position. It should be eliminated such that all operators work on any tangram object. This can be accomplished by adding these attributes to the tangram object. This would really make the library more concise and useful.

Add RTF Rendering

Add cross product '*' categories to Hmisc processing

Handling of cross product '*' categories in Hmisc processing would be a wonderful addition to round out the summary_table hmisc_style processor.

What happened to LaTeX?

Hi,

great project, if this works out well, it might become more important than ggplot considering that there are quite some empirical papers without graphs but few without tables ;)
So, what are the plans concerning LaTeX output? My use case is the usual *.Rmd -> .pdf workflow via RStudio and I saw a mention of a latex() method in the docs but not the code.

Are there any plans on deepening the documentation/publishing this at JSS? I tried to understand the concept from the source files but did not quite get it.
Will it be possible to define custom types via something like X1::MyType[param1, param2, ...] in the formula and defining the behavior for MyType? Otherwise defining custom tables could become quite nasty if you want all numerics in a specific format with say median and IQR.

Will it be possible to define the layout for each row/col intersection? I am thinking of a table with many statistics and many strata - it might be wise to layout the statistics vertically to save horizontal space. In extreme cases it might also be necessary to go the 'ultra-wide' way by aligning the strata vertically as well...

summary_table method dispatch

Instead of continuing to add new summary_XXX routines, use method dispatch based on the class of the incoming object.
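A minimal sketch of what class-based dispatch could look like, using base R's S3 system. The generic name summarize_column and the statistics it returns are hypothetical and are not part of tangram's API; the point is only that one generic replaces a family of separately named routines.

```r
# Hypothetical generic: one entry point, behavior chosen by class(x).
summarize_column <- function(x, ...) UseMethod("summarize_column")

# Numeric input: return the quartiles (Q1, median, Q3).
summarize_column.numeric <- function(x, ...) {
  stats::quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
}

# Factor input: return the count per level.
summarize_column.factor <- function(x, ...) {
  table(x)
}

# Anything else: fail loudly rather than guess.
summarize_column.default <- function(x, ...) {
  stop("no summary defined for class: ", paste(class(x), collapse = "/"))
}

summarize_column(c(1, 2, 3, 4, 5))
summarize_column(factor(c("a", "b", "a")))
```

Supporting a new kind of input then means adding one method, not another top-level function.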

LaTeX Special Character Handling

Need to test rendering in LaTeX of all special characters. There are some issues with math versus text mode in the conversions.

Fix space bug in submitting formula via text.

If specifying a formula, trailing spaces cause issues.
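Until this is fixed, one workaround is to trim the text before handing it over. The sketch below uses only base R; clean_formula_text is a hypothetical helper name, and it assumes the problem is limited to leading or trailing whitespace.

```r
# Hypothetical helper: strip leading/trailing whitespace from a formula string.
clean_formula_text <- function(txt) {
  trimws(txt)
}

txt <- "group ~ age + sex   "      # note the trailing spaces
cleaned <- clean_formula_text(txt)

# The cleaned text can be passed on as text or converted to a formula object.
f <- as.formula(cleaned)
```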

Multi-column / Multi-rows Design

A design of how to deal with multiple column entries and multiple row entries needs to be considered.

CSS Centering Inconsistent

Rework CSS centering of table elements to be consistent for Hmisc, NEJM, and Lancet styles.

Rmarkdown transformation

Special Rmarkdown markup for text should be handled by the library, starting from raw text. In particular, text needs: bold, italic, larger, smaller, superscript, subscript.

Function for any value to be indexed

Need to add documentation for any value to be indexed in a report.

Make transforms style aware, Refactor Cell helpers.

The time has come to make the transforms themselves style aware to simplify the abstract representation. Previously the library contained all information and the renderer made the style decisions for in cell formatting. This has led to some complexity in interfacing with the many different R table packages. The goal of this ticket is to refactor to make all tables in memory contain only Rmarkdown and necessary extensions. Decisions on style can be made now in the various cell_* helper routines, and also overridden.

Index Granularity

There has been an open question of the indexing method granularity. The reproducible manuscript workflow has answered this, and it must be at the most granular level. This is necessary such that cut & paste and later substitution is possible at the lowest possible boundary--which is internal to a table cell.

This effort is blocked by the refactoring of cell definitions inside the table. Issue #6

Composition Functions: cbind and rbind

The ability to paste tables together via cbind and rbind needs to be added. Composition operators for these are needed as well, cbind => "+" and rbind => "%>%". This would satisfy some immediate needs.
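As a rough illustration of the idea, the sketch below defines cbind and rbind methods plus a "+" operator for a toy S3 class. The toy_table class and its cells field are hypothetical stand-ins, not the tangram object. The proposed "%>%" mapping for rbind is left out of the sketch because that operator name is already used by magrittr's pipe.

```r
# Hypothetical stand-in for a table object: a character matrix of cells.
toy_table <- function(cells) {
  structure(list(cells = cells), class = "toy_table")
}

# Paste tables side by side.
cbind.toy_table <- function(..., deparse.level = 1) {
  parts <- lapply(list(...), function(tbl) tbl$cells)
  toy_table(do.call(cbind, parts))
}

# Stack tables vertically.
rbind.toy_table <- function(..., deparse.level = 1) {
  parts <- lapply(list(...), function(tbl) tbl$cells)
  toy_table(do.call(rbind, parts))
}

# Composition operator: "+" as shorthand for cbind.
"+.toy_table" <- function(e1, e2) cbind(e1, e2)

left  <- toy_table(matrix(c("Age", "Sex"), ncol = 1))
right <- toy_table(matrix(c("54", "F", "61", "M"), ncol = 2))

wide    <- left + right       # 2 rows, 3 columns
stacked <- rbind(left, left)  # 4 rows, 1 column
```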

CSV output and markdown

Is there a way to strip markdown formatting for specific output types? For example, I'm trying to export a table to CSV and import it into Excel. Excel does not parse markdown, so the formatting is unnecessary. I was going to try creating my own style, but it seems that markdown is applied regardless?
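One possible workaround, independent of the package, is to post-process the cell text before writing the CSV. The sketch below is plain base R and assumes the cells only use the simple inline markers named in it (bold, italic, superscript, subscript); strip_markdown is a hypothetical helper, not a tangram function.

```r
# Hypothetical helper: remove simple inline markdown markers from text.
strip_markdown <- function(x) {
  x <- gsub("\\*\\*(.+?)\\*\\*", "\\1", x, perl = TRUE)  # **bold**
  x <- gsub("\\*(.+?)\\*", "\\1", x, perl = TRUE)        # *italic*
  x <- gsub("\\^(.+?)\\^", "\\1", x, perl = TRUE)        # ^superscript^
  x <- gsub("~(.+?)~", "\\1", x, perl = TRUE)            # ~subscript~
  x
}

cells <- data.frame(
  label = c("**N**", "*P*^2^"),
  value = c("10", "0.04"),
  stringsAsFactors = FALSE
)
cells[] <- lapply(cells, strip_markdown)

# Write the cleaned cells to a temporary CSV file.
out <- tempfile(fileext = ".csv")
write.csv(cells, out, row.names = FALSE)
```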

Reproducible Manuscript Vignette

A full walkthrough of the reproducible manuscript workflow is needed, covering changing data and collaborators who use Word outside of R.

Refactor Cell Definitions

Cell definitions are too limited and do not include comprehensive Greek letter values.

Any text value can include a subset of LaTeX Greek or math symbols that will render appropriately into the various output formats. Text values can also be bold or italic. This indicates that consideration needs to be given to text formats in general. This should not get too broad; use HTML5 as the guideline and least common denominator.

Current possible cell definitions.

Numerical Vector with possible name. Singles treated as single value, Doubles as Range, Triples as IQR, multiples as joined list.
Text Only (with header/subheader attributes)--can have math symbols like Infinity, etc.
A list of elements that are rendered comma separated.
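The length-based rule for numerical vectors can be sketched as a small function. This is an illustration only: format_cell is a hypothetical name, and the output strings (the " to " separator, the bold median) are assumptions made for the example, not the package's rendering. The open question about N values is not addressed here.

```r
# Hypothetical helper: choose a text form from the length of a numeric vector.
format_cell <- function(x) {
  s <- as.character(x)
  switch(
    as.character(min(length(s), 4L)),
    "1" = s,                                   # single value
    "2" = paste(s[1], "to", s[2]),             # range
    "3" = paste0(s[1], " **", s[2], "** ", s[3]),  # IQR: Q1, median, Q3
    paste(s, collapse = ", ")                  # four or more: joined list
  )
}

format_cell(5)
format_cell(c(1, 9))
format_cell(c(2, 5, 8))
format_cell(1:4)
```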

Open question. What about N value handling?

Thanks and request for descriptive table (a.k.a. Table 1)

Thank you very much for this invaluable package.

I am not sure if this is answered, sorry in advance for duplication.

Would you please guide me if there is a way to produce a table with just descriptive stats, not as a cross table and without statistics?

An example table:

Add RTF Lancet Style

knitr integration

The separate rendering calls are not necessary. Refactor to

a) Make additional rendering information attributes of tangram object

id
caption
style
fragment

b) Create router that detects knitr rendering and calls appropriate renderer

c) Keep interface backward compatible for now if possible. Use deprecation / change warnings.
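A minimal sketch of the router in step b). It relies on two things knitr and rmarkdown set while a document is being knit: the knitr.in.progress option and the rmarkdown.pandoc.to knit option. The function name pick_renderer and the returned labels are hypothetical; a real router would call the matching renderer instead of returning a string.

```r
# Hypothetical router: decide which renderer to use for the current context.
pick_renderer <- function(
  in_knitr  = isTRUE(getOption("knitr.in.progress")),
  pandoc_to = if (in_knitr) knitr::opts_knit$get("rmarkdown.pandoc.to") else NULL
) {
  # Outside knitr (e.g. at the console) fall back to plain text.
  if (!in_knitr || is.null(pandoc_to)) return("text")

  if (grepl("^html", pandoc_to)) {
    "html"
  } else if (pandoc_to %in% c("latex", "beamer")) {
    "latex"
  } else {
    "text"
  }
}

# The arguments can be supplied explicitly, which also makes the logic testable.
pick_renderer(in_knitr = TRUE, pandoc_to = "latex")
```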

AST formula processing

Need to fully support R formula parsing. To do this the following steps are required:

Document using BNF what an R formula is and what it means. Relay this to core R team for inclusion in current documentation.
Expand parser to contain all possibilities of an R formula for table processing.

This will then lead to a greatly expanded number of possible statistical transforms for the Hmisc example.
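For orientation, R already exposes a formula as a nested call object, so a parser can walk it recursively. The sketch below only splits the right-hand side on "+" and is not the package's parser; formula_terms is a hypothetical name.

```r
# Hypothetical helper: flatten a "+"-separated expression into its terms.
formula_terms <- function(expr) {
  is_plus <- is.call(expr) &&
    identical(expr[[1]], as.name("+")) &&
    length(expr) == 3
  if (is_plus) {
    # Binary "+": recurse into the left and right operands.
    c(formula_terms(expr[[2]]), formula_terms(expr[[3]]))
  } else {
    # Leaf (a name, or a call such as age[0]): keep its source text.
    deparse(expr)
  }
}

f <- group ~ age[0] + sex + bmi
lhs <- deparse(f[[2]])       # left-hand side
rhs <- formula_terms(f[[3]]) # right-hand side terms
```

Operators such as '*', ':' or parentheses would each need their own case, which is what the BNF is meant to pin down.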

Auditing Word

When updating Word fields from a report, comments need to be added for the following:

Any value that was changed.
Any value that is orphaned.

This will allow a manuscript preparer to see and audit exactly what has occurred during an update.
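The two categories can be computed with a simple comparison of indexed values. The sketch assumes, purely for illustration, that both the Word document and the new report can be reduced to named character vectors keyed by index; audit_fields and the sample keys are hypothetical.

```r
# Hypothetical helper: compare values held in a document against a new report.
audit_fields <- function(document, report) {
  shared <- intersect(names(document), names(report))
  list(
    # Present in both, but the value differs.
    changed  = shared[document[shared] != report[shared]],
    # Present in the document, no longer produced by the report.
    orphaned = setdiff(names(document), names(report))
  )
}

doc_values    <- c(idx_a = "12.5", idx_b = "0.04", idx_c = "301")
report_values <- c(idx_a = "12.5", idx_b = "0.03")

audit <- audit_fields(doc_values, report_values)
```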

Units on a label seemed to have vanished.

Add RTF NEJM Style

Word Formatting

When converting to Word output (Rmd with indexing), formatting is lost by the VB field conversion macro. Formatting characters need to be moved outside of all special index handling.

Architecture Diagram

Main README needs an architecture diagram to summarize layers and purpose of each layer.

Binary variable inserts extra row

Sorry another bug.

In this example I showed previously, the output for one of the binary variables (postop_mobilization_check) contains an extra row:

table3 <- tangram(group ~ eras_score[0] + eras_teaching_check[0] + preop_carb_check[0] + preop_diet_check[0] + preop_bowel_prep_check[0] + antibiotic_check[0] + dvt_ppx_check[0] + normothermia_check[0] + regional_anesthesia_check[0] + intraop_opioid_check[0] + intraop_fluids_check[0] + intraop_min_invasive_check[0] + intraop_ng_check[0] + intraop_drains_check[0] + postop_diet_check[0] + postop_ivf_check[0] + postop_mobilization_check[0] + postop_excessdrainremoval_check[0] + postop_adjunctive_check[0] + postop_antiemetic_check[0] + anastomosis, data=allSingle, transform=my_percent) %>% del_col(2)

This seems to occur randomly and only occasionally. There is nothing about that variable that is different as far as I can see (type, factor, contents, etc).
