Comments (5)
First half of fix is:
    /* 18.4 Shared String Table */
    function parseStrs(data) {
        var s = [];
        var sst = data.match(new RegExp("<sst ([^>]*)>([\\s\\S]*)<\/sst>","m"));
        if(sst) {
            s = sst[2].replace(/<si>/g,"").split(/<\/si>/).map(function(x) {
                var z = {};
                var y=x.match(/<([^>]*)>([\s\S]*)<\/[^>]*>/);
                if(y) z[y[1].split(" ")[0]]=utf8read(unescapexml(y[2]));
                return z;
            });
            sst = parsexmltag(sst[1]); s.Count = sst.count; s.Unique = sst.uniqueCount;
        }
        return s;
    }
(the inner match expression has changed).
Since this is apparently a rich text string, its entry has an 'r' key instead of a 't' key, so the retrieval in parseSheet, which assumes:
    case 's': p.v = strs[parseInt(p.v, 10)].t; break;
will still return undefined.
This should probably still be undefined, and the readme should be updated to note that rich text in cells is not supported by the converter.
The fix to the inner match, however, should be done so that expressions are properly extracted into the Strings array.
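To illustrate why the lookup of .t comes back undefined for rich text, here is a minimal sketch that isolates the key-selection step of the inner match. The helper name firstTagKey is hypothetical and exists only for this illustration; the regular expression is the one from parseStrs above.

```javascript
// Hypothetical helper (not part of the library): returns the key under which
// parseStrs would store a shared string item, i.e. the name of its first tag.
function firstTagKey(si) {
  var y = si.match(/<([^>]*)>([\s\S]*)<\/[^>]*>/);
  return y ? y[1].split(" ")[0] : null;
}

// A plain string item starts with <t>, so the key is "t".
var plainKey = firstTagKey("<t>hello</t>");

// A rich text item starts with a run <r>, so the key is "r" and .t is never set.
var richKey = firstTagKey('<r><rPr><b/></rPr><t>hello</t></r>');

console.log(plainKey, richKey);
```

With the rich text item stored under 'r', strs[i].t does not exist, which matches the undefined value described above.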
from sheetjs.
@mitchellsundt thanks for reporting this.
I think the "Right Thing" looks like this:
- cells have a .r or some other property for the rich text
- function to convert those values to proper HTML (replacing special tags like rFont)
- for each cell, the .t will hold a plaintext rendering (strip out the formatting)
I'm not in front of a computer now; I expect to push this later tonight.
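The plaintext rendering in the last bullet could be sketched roughly as follows. This is not the code that was pushed; plainTextFromSi and unescapeXmlText are hypothetical names, and the sketch assumes that every <t> element inside the item contributes to the text (it does not special-case phonetic runs or self-closing <t/> elements with attributes).

```javascript
// Hypothetical helper: decode the five predefined XML entities.
// "&amp;" is handled last so that "&amp;lt;" becomes "&lt;", not "<".
function unescapeXmlText(s) {
  return s.replace(/&lt;/g, "<").replace(/&gt;/g, ">")
          .replace(/&quot;/g, '"').replace(/&apos;/g, "'")
          .replace(/&amp;/g, "&");
}

// Hypothetical helper: concatenate the contents of every <t>...</t> element
// in the body of one <si> item, dropping all formatting (<rPr> etc.).
function plainTextFromSi(si) {
  var out = "";
  var re = /<t(?:\s[^>]*)?>([\s\S]*?)<\/t>/g;
  var m;
  while ((m = re.exec(si)) !== null) out += unescapeXmlText(m[1]);
  return out;
}
```

For a plain item such as <t>abc</t> this returns the same text as before, so the same function could fill .t for both plain and rich strings.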
On Tue, Apr 30, 2013 at 6:42 PM, Mitch Sundt [email protected]:
from sheetjs.
Here's a revised parseStrs that ensures that the <t>...</t> tags match at the beginning and end and do not have any XML (open angle brackets) between those tags. Strings[] objects then have .t for simple text, and .raw for XML text (because that is a raw series of XML elements -- see below)
    /* 18.4 Shared String Table */
    function parseStrs(data) {
        var s = [];
        var sst = data.match(new RegExp("<sst ([^>]*)>([\\s\\S]*)<\/sst>","m"));
        if(sst) {
            s = sst[2].replace(/<si>/g,"").split(/<\/si>/).map(function(x) {
                var z = {};
                /* plain string: a single <t>...</t> with no nested XML */
                var simpleString=x.match(/^<t>([^<]*)<\/t>$/);
                if (simpleString) {
                    z['t'] = utf8read(unescapexml(simpleString[1]));
                } else {
                    /* anything else (e.g. rich text runs) is kept as raw XML */
                    z['raw'] = x;
                }
                return z;
            });
            sst = parsexmltag(sst[1]); s.Count = sst.count; s.Unique = sst.uniqueCount;
        }
        return s;
    }
In the parseSheet function, I did this for the 's' case:
    case 's': {
        var sval = strs[parseInt(p.v, 10)];
        if ( "t" in sval ) {
            p.v = sval.t;
        } else {
            delete p.v;
            p.raw = sval.raw;
        }
    }
    break;
I.e., put the XML string into the .raw field. My example XLSX file has the following for the 'raw' string:
"<r><t xml:space="preserve">This inserts a 'joins' entry into the column_definitions table for the </t></r><r><rPr><b/><sz val="10"/><color rgb="FF000000"/><rFont val="Arial"/><family val="2"/></rPr><t>household_id</t></r><r><rPr><sz val="11"/><color theme="1"/><rFont val="Calibri"/><family val="2"/><scheme val="minor"/></rPr><t xml:space="preserve"> column of the </t></r><r><rPr><b/><sz val="10"/><color rgb="FF000000"/><rFont val="Arial"/><family val="2"/></rPr><t>household</t></r><r><rPr><sz val="11"/><color theme="1"/><rFont val="Calibri"/><family val="2"/><scheme val="minor"/></rPr><t xml:space="preserve"> table_id of the form:
"[ { table_id: household_member, element_name: household_id } ]"
</t></r>"
Note that this is just a mixed series of <r> and <t> tags, so it isn't wrapped by any enclosing tag.
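As a rough sketch of what could be done with such a 'raw' string, the runs can be split apart and a few properties pulled out of each <rPr>. The names parseRuns and runsToHtml are hypothetical, only <b/> and rFont are examined, and the run text is left XML-escaped (which also keeps it safe to drop into HTML).

```javascript
// Hypothetical helper: split a raw series of <r> runs into simple objects.
// Only bare <b/> is treated as bold; <b val="1"/> is not detected here.
function parseRuns(raw) {
  var runs = [];
  var re = /<r>([\s\S]*?)<\/r>/g;
  var m;
  while ((m = re.exec(raw)) !== null) {
    var body = m[1];
    var t = body.match(/<t(?:\s[^>]*)?>([\s\S]*?)<\/t>/);
    var font = body.match(/<rFont\s+val="([^"]*)"/);
    runs.push({
      text: t ? t[1] : "",          // still XML-escaped
      bold: /<b\s*\/>/.test(body),
      font: font ? font[1] : null
    });
  }
  return runs;
}

// Hypothetical helper: render the runs as HTML, wrapping bold runs in <b>.
function runsToHtml(runs) {
  return runs.map(function (r) {
    return r.bold ? "<b>" + r.text + "</b>" : r.text;
  }).join("");
}
```

Applied to the start of the string above, the first run has no <rPr> and comes out unformatted, while the household_id run comes out bold with font "Arial".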
from sheetjs.
@mitchellsundt I pushed an update that puts the plaintext in the .t field even if the text is rich (and populates the .r field for plain text as well as rich text), so for plaintext uses (like CSV) this should suffice. The other half (actually parsing the rich text and generating HTML or other output) is in the pipeline.
As for missing support, the ultimate goal is 100% coverage (so any bug or missing feature is very much undesired).
from sheetjs.
Thanks!
On Tue, Apr 30, 2013 at 10:05 PM, Niggler [email protected] wrote:
Mitch Sundt
Software Engineer
University of Washington
[email protected]
from sheetjs.