Comments (5)

mitchellsundt avatar mitchellsundt commented on May 5, 2024

First half of fix is:

/* 18.4 Shared String Table */
function parseStrs(data) {
var s = [];
var sst = data.match(new RegExp("<sst ([^>]*)>([\\s\\S]*)<\/sst>","m"));
if(sst) {
    s = sst[2].replace(/<si>/g,"").split(/<\/si>/).map(function(x) { var z = {};
        var y=x.match(/<([^>]*)>([\s\S]*)<\/[^>]*>/);
        if(y) z[y[1].split(" ")[0]]=utf8read(unescapexml(y[2]));
        return z;
    });

    sst = parsexmltag(sst[1]); s.Count = sst.count; s.Unique = sst.uniqueCount;
}
return s;
}

(the inner match expression has changed).

Since this is apparently a rich text string, its entry has an 'r' key instead of a 't' key, so the retrieval in parseSheet, which assumes:

            case 's': p.v = strs[parseInt(p.v, 10)].t; break;

Will still return undefined.

This should probably still be undefined, and the readme should be updated to note that rich text in cells is not supported by the converter.

The fix to the inner match, however, should be done so that expressions are properly extracted into the Strings array.
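
To make the behaviour concrete, here is a minimal sketch of the inner match in isolation. `parseEntry` is a hypothetical name used only for illustration, and the `utf8read`/`unescapexml` calls are left out so the snippet runs on its own; the sample strings are made up.

```javascript
// Hypothetical stand-alone version of the map callback in parseStrs above.
// utf8read/unescapexml are omitted so the snippet is self-contained.
function parseEntry(x) {
  var z = {};
  var y = x.match(/<([^>]*)>([\s\S]*)<\/[^>]*>/);
  // Key the value by the name of the first tag found in the <si> contents.
  if (y) z[y[1].split(" ")[0]] = y[2];
  return z;
}

var plain = parseEntry("<t>Hello</t>");
var rich = parseEntry("<r><t>Hi </t></r><r><rPr><b/></rPr><t>there</t></r>");

console.log(plain.t);       // "Hello"
console.log(rich.t);        // undefined: the first tag is <r>, not <t>
console.log(typeof rich.r); // "string": the unparsed XML captured between the first tag and the last closing tag
```

So a lookup of `.t` on a rich text entry yields undefined even though the entry itself was extracted.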

from sheetjs.

redchair123 avatar redchair123 commented on May 5, 2024

@mitchellsundt thanks for reporting this.

I think the "Right Thing" looks like this:

  • cells have a .r or some other property for the rich text
  • function to convert those values to proper HTML (replacing special tags
    like rFont)
  • for each cell, the .t will hold a plaintext rendering (strip out the
    formatting)
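
A rough sketch of the third point (the plaintext rendering), not the actual implementation: `richToPlain` and `unescapeXml` are hypothetical names, and the regex assumes text only lives in `<t>` elements with no self-closing `<t .../>` forms. Phonetic runs, if present, would be included too.

```javascript
// Hypothetical helper: decode the five predefined XML entities in one pass.
function unescapeXml(text) {
  var entities = { "&lt;": "<", "&gt;": ">", "&quot;": '"', "&apos;": "'", "&amp;": "&" };
  return text.replace(/&(?:lt|gt|quot|apos|amp);/g, function (m) { return entities[m]; });
}

// Hypothetical helper: concatenate the contents of every <t> element
// found in the contents of one <si> element, dropping all formatting.
function richToPlain(siXml) {
  var out = "";
  var re = /<t(?:\s[^>]*)?>([\s\S]*?)<\/t>/g;
  var m;
  while ((m = re.exec(siXml)) !== null) out += unescapeXml(m[1]);
  return out;
}

var raw = '<r><t xml:space="preserve">Join on the </t></r>' +
          '<r><rPr><b/><rFont val="Arial"/></rPr><t>household_id</t></r>' +
          '<r><t xml:space="preserve"> column</t></r>';

console.log(richToPlain(raw));                         // Join on the household_id column
console.log(richToPlain("<t>plain &amp; simple</t>")); // plain & simple
```

The same function handles both the plain and the rich case, which is what lets every entry carry a `.t`.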

I'm not in front of a computer right now; I expect to push this later tonight.

from sheetjs.

mitchellsundt avatar mitchellsundt commented on May 5, 2024

Here's a revised parseStrs that ensures that the <t>...</t> tags match at the beginning and end and do not have any XML (open angle brackets) between those tags. Strings[] objects then have .t for simple text, and .raw for XML text (because that is a raw series of XML elements -- see below)

/* 18.4 Shared String Table */
function parseStrs(data) {
    var s = [];
    var sst = data.match(new RegExp("<sst ([^>]*)>([\\s\\S]*)<\/sst>","m"));
    if(sst) {
        s = sst[2].replace(/<si>/g,"").split(/<\/si>/).map(function(x) { var z = {};
            var simpleString=x.match(/^<t>([^<]*)<\/t>$/);
            if (simpleString) {
                z['t'] = utf8read(unescapexml(simpleString[1]));
            } else {
                z['raw'] = x;
            }
            return z;
        });

        sst = parsexmltag(sst[1]); s.Count = sst.count; s.Unique = sst.uniqueCount;
    }
    return s;
}

In the parseSheet function, I did this for the 's' case:

            case 's': {
                    var sval = strs[parseInt(p.v, 10)];
                    if ( "t" in sval ) {
                        p.v = sval.t;
                    } else {
                        delete p.v;
                        p.raw = sval.raw;
                    }
                }
                break;

I.e., put the XML string into the .raw field. My example XLSX file has the following for the 'raw' string:

"<r><t xml:space="preserve">This inserts a 'joins' entry into the column_definitions table for the </t></r><r><rPr><b/><sz val="10"/><color rgb="FF000000"/><rFont val="Arial"/><family val="2"/></rPr><t>household_id</t></r><r><rPr><sz val="11"/><color theme="1"/><rFont val="Calibri"/><family val="2"/><scheme val="minor"/></rPr><t xml:space="preserve"> column of the </t></r><r><rPr><b/><sz val="10"/><color rgb="FF000000"/><rFont val="Arial"/><family val="2"/></rPr><t>household</t></r><r><rPr><sz val="11"/><color theme="1"/><rFont val="Calibri"/><family val="2"/><scheme val="minor"/></rPr><t xml:space="preserve"> table_id of the form: 

"[ { table_id: household_member, element_name: household_id } ]"
</t></r>"

Note that this is just a mixed series of <r> and <t> tags, so it isn't wrapped by any enclosing tag.
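
For anyone who wants to consume the .raw value, one possible way to break it into runs is sketched below. `parseRuns` and the shape of the returned objects are assumptions made for illustration, not part of the library. The sketch only looks at `<b/>` and `<rFont>`, ignores text outside `<r>` elements, and does not decode XML entities.

```javascript
// Hypothetical helper: split the raw contents of an <si> into runs,
// keeping a couple of formatting properties per run.
function parseRuns(raw) {
  var runs = [];
  var runRe = /<r>([\s\S]*?)<\/r>/g; // matches <r>...</r>, not <rPr>...</rPr>
  var m;
  while ((m = runRe.exec(raw)) !== null) {
    var body = m[1];
    var text = body.match(/<t(?:\s[^>]*)?>([\s\S]*?)<\/t>/);
    var font = body.match(/<rFont\s+val="([^"]*)"/);
    runs.push({
      text: text ? text[1] : "",
      bold: /<b\s*\/>/.test(body), // simplification: ignores <b val="0"/>
      font: font ? font[1] : null
    });
  }
  return runs;
}

var sample = '<r><t xml:space="preserve">column </t></r>' +
             '<r><rPr><b/><sz val="10"/><rFont val="Arial"/></rPr><t>household_id</t></r>';
var runs = parseRuns(sample);

console.log(runs.length);  // 2
console.log(runs[1].text); // household_id
console.log(runs[1].bold); // true
console.log(runs[1].font); // Arial
```

A list of runs like this could then be mapped to HTML or flattened to plain text, depending on the consumer.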

from sheetjs.

redchair123 avatar redchair123 commented on May 5, 2024

@mitchellsundt I pushed an update that puts the plaintext in the .t field even if the text is rich (and populates the .r field for plain text as well as rich text), so for plaintext uses (like CSV) this should suffice. The other half (actually parsing the rich text and generating HTML or other output) is in the pipeline.

As for missing support, the ultimate goal is 100% coverage (so any bug or missing feature is very much undesired).

from sheetjs.

mitchellsundt avatar mitchellsundt commented on May 5, 2024

Thanks!

from sheetjs.
