Many tables in this corpus and similar papers make use of multiple rows of column headers. This provides nested or tree-structured headings.
Example:
![nestedtableheadingexample_svg](https://user-images.githubusercontent.com/28900786/28775190-9c188df8-75e8-11e7-8331-fcc7b6145fa4.png)
Currently, only one row of column headers is preserved by norma. This affects the HTML and CSV outputs.
In this example, the upper row of headings (MUSIC | CBTE | CBTT | Total), intended as column headings, is not included in the output formats for the table:
HTML:
![nestedtableheadingexample_html](https://user-images.githubusercontent.com/28900786/28775242-d7c2d160-75e8-11e7-89e4-38c543784e43.png)
CSV:
[extract from start of file]

    "Demographic variable ","(n = 19) ","(n = 18) ","(n = 18) ","(N = 55) "
    Age,40.37 (9.64),40.72 (11.02),41.39 (12.73),40.61 (10.99)
    Gender (female),7 (37%),12 (67%),6 (32%),25 (45%)
    Indigenous status (Aboriginal),0 (0%),2 (11%),0 (0%),2 (4%)

Cause:
The current table output format uses a simple HTML4/XHTML structure:

    <table>
    <caption />
    <tr><th></th> ... </tr>
    <tr><td></td> ... </tr>
    ...
    </table>

Only one header row (i.e., a row of the form `<tr><th></th> ... </tr>`) is included. This is added in module `svg2xml`, in `TableContentCreator.java`, in method `addHeader`.
One solution, which would also simplify future development, is to use the higher-level table-structuring elements introduced in HTML4. Specifically, the HTML4 DTD defines the content model of a table as:

    <!ELEMENT TABLE - -
         (CAPTION?, (COL*|COLGROUP*), THEAD?, TFOOT?, TBODY+)>

So `<thead>` could be used to group multiple header rows semantically, without adding attributes to mark header rows for downstream processing or to differentiate them from rows of observation data.
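
To illustrate the proposed structure, here is a minimal sketch of emitting two header rows inside `<thead>`. The class `NestedHeaderTableSketch` and its methods are hypothetical and are not part of norma or `svg2xml`; the real change would presumably go through `addHeader` and the existing HTML element classes rather than string building.

```java
import java.util.List;

/** Hypothetical helper for illustration only; not norma or svg2xml code. */
final class NestedHeaderTableSketch {

    /** Escapes the characters that are unsafe in XHTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds one <tr> whose cells all use the given tag ("th" or "td"). */
    static String row(String cellTag, List<String> cells) {
        StringBuilder sb = new StringBuilder("<tr>");
        for (String cell : cells) {
            sb.append('<').append(cellTag).append('>')
              .append(escape(cell))
              .append("</").append(cellTag).append('>');
        }
        return sb.append("</tr>").toString();
    }

    /** Puts every header row in <thead> and every data row in <tbody>. */
    static String table(List<List<String>> headerRows, List<List<String>> bodyRows) {
        StringBuilder sb = new StringBuilder("<table><thead>");
        for (List<String> header : headerRows) {
            sb.append(row("th", header));
        }
        sb.append("</thead><tbody>");
        for (List<String> body : bodyRows) {
            sb.append(row("td", body));
        }
        return sb.append("</tbody></table>").toString();
    }
}
```

For the example table, `headerRows` would hold the upper row (MUSIC, CBTE, CBTT, Total) followed by the row that is currently preserved ("Demographic variable", "(n = 19)", ...). How the first cell of the upper row should be filled is an open question that this sketch does not settle.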
Impact: Downstream processing would need a small refactor to handle tables that contain the grouping elements `<thead>`, `<tbody>` and `<tfoot>` as well as tables with bare `<tr>` rows. This would affect `svg2xml`, `html` and possibly other modules.
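
As a rough idea of the size of that refactor, the sketch below reads header rows from either layout using only the JDK's DOM parser. `HeaderRowReaderSketch` is a hypothetical name, the input is assumed to be well-formed XHTML, and the rule used for the flat layout (a direct-child `<tr>` whose cells are all `<th>`) is an assumption about what downstream code would want.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical reader for illustration only; not norma or svg2xml code. */
final class HeaderRowReaderSketch {

    /** Returns the cell texts of every header row, for flat or grouped tables. */
    static List<List<String>> headerRows(String xhtmlTable) {
        Element table;
        try {
            table = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtmlTable.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
        List<List<String>> headers = new ArrayList<>();
        for (Element child : childElements(table)) {
            String tag = child.getTagName();
            if (tag.equals("thead")) {
                // Grouped layout: every <tr> inside <thead> is a header row.
                for (Element tr : childElements(child)) {
                    if (tr.getTagName().equals("tr")) {
                        headers.add(cellTexts(tr));
                    }
                }
            } else if (tag.equals("tr") && isHeaderRow(child)) {
                // Flat layout: a direct-child <tr> made up only of <th> cells.
                headers.add(cellTexts(child));
            }
        }
        return headers;
    }

    /** True when the row has at least one cell and all of its cells are <th>. */
    static boolean isHeaderRow(Element tr) {
        List<Element> cells = childElements(tr);
        if (cells.isEmpty()) {
            return false;
        }
        for (Element cell : cells) {
            if (!cell.getTagName().equals("th")) {
                return false;
            }
        }
        return true;
    }

    /** Collects the trimmed text of each cell in a row. */
    static List<String> cellTexts(Element tr) {
        List<String> texts = new ArrayList<>();
        for (Element cell : childElements(tr)) {
            texts.add(cell.getTextContent().trim());
        }
        return texts;
    }

    /** Returns only the element children, skipping whitespace and text nodes. */
    static List<Element> childElements(Element parent) {
        List<Element> out = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                out.add((Element) nodes.item(i));
            }
        }
        return out;
    }
}
```

Handling both layouts in one place like this would let existing single-header tables keep working while newly generated tables move to `<thead>`.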