Comments (4)

ross-spencer commented on August 20, 2024

Okay, this is an interesting one. I hadn't realized I logged the original ticket, though it was useful for context. What we were seeing when that was logged was the corporate firewall getting in the way: JHOVE tries to contact the Harvard servers to download the configuration schema, which it then uses to validate the configuration document.

Actually, we can now recreate it. I found a website running the same firewall and then pointed the configuration at it:

<?xml version="1.0" encoding="UTF-8"?>
<jhoveConfig version="1.0"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns="http://hul.harvard.edu/ois/xml/ns/jhove/jhoveConfig"
 xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/jhove/jhoveConfig
                     https://web.archive.org/web/20190917103633/http://nsa.stuart-hall.org/dynPolLoginRedirect.html">

[Fatal Error] dynPolLoginRedirect.html:1:3: The markup in the document preceding the root element must be well-formed.
May 02, 2020 8:03:51 PM JhoveView errorAlert
WARNING: Error parsing configuration file: The markup in the document preceding the root element must be well-formed.

We can also mess with the configuration with other sources of non-XSD:

Pointing it at GitHub:

<?xml version="1.0" encoding="UTF-8"?>
<jhoveConfig version="1.0"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns="http://hul.harvard.edu/ois/xml/ns/jhove/jhoveConfig"
 xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/jhove/jhoveConfig
                     https://github.com/">

[Error] :32:68: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw 'The world’s leading software development platform · GitHub'.
[Fatal Error] :79:59: Attribute name "data-pjax-transient" associated with an element type "meta" must be followed by the ' = ' character.
May 02, 2020 8:04:53 PM JhoveView errorAlert
WARNING: Error parsing configuration file: Attribute name "data-pjax-transient" associated with an element type "meta" must be followed by the ' = ' character.

Pointing it at the original SourceForge issue:

<?xml version="1.0" encoding="UTF-8"?>
<jhoveConfig version="1.0"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns="http://hul.harvard.edu/ois/xml/ns/jhove/jhoveConfig"
 xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/jhove/jhoveConfig
                     https://sourceforge.net/p/jhove/bugs/51/">

[Error] :35:88: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw 'JHOVE / Bugs / #51 JhoveView: Markup Parsing Error: dynPolLoginRedirect.html'.
[Error] :52:36: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw 'if (!window.SF) { window.SF = {}; }'.
[Error] :53:21: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw 'SF.sandiego = false;'.
[Error] :54:27: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw 'SF.sandiego_chrome = true;'.
[Error] :55:35: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw 'SF.cdn = "https://a.fsdn.com/con";'.
[Error] :87:23: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw 'div.moderate {'.
[Error] :88:24: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw 'color:grey;'.
[Error] :89:10: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw '}'.
[Error] :114:25: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw '/* make URL '.
[Fatal Error] :114:26: The entity name must immediately follow the '&' in the entity reference.
May 02, 2020 8:05:43 PM JhoveView errorAlert
WARNING: Error parsing configuration file: The entity name must immediately follow the '&' in the entity reference.
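
All three attempts above fail for the same reason: the URL given in xsi:schemaLocation returns a web page rather than an XSD. As a rough illustration only, here is a minimal Java sketch of a check that could tell the two apart before validation starts. SchemaProbe and looksLikeXsd are hypothetical names and not part of JHOVE, and the sketch assumes the JDK's built-in XML parser, which supports the disallow-doctype-decl feature.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper, not part of JHOVE: checks whether text looks like an XSD. */
final class SchemaProbe {

    /** Returns true only if the text parses as XML and its root element is xs:schema. */
    static boolean looksLikeXsd(String text) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations so an HTML page cannot trigger a DTD download.
            // Assumption: the JDK's default parser, which supports this feature.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Stop the parser printing "[Fatal Error]" lines to stderr; errors still throw.
            builder.setErrorHandler(new DefaultHandler());
            Element root = builder.parse(new InputSource(new StringReader(text)))
                    .getDocumentElement();
            return XMLConstants.W3C_XML_SCHEMA_NS_URI.equals(root.getNamespaceURI())
                    && "schema".equals(root.getLocalName());
        } catch (Exception e) {
            // Malformed markup (for example a login page) is simply "not a schema".
            return false;
        }
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"/>";
        String html = "<html><head><title>Login</title></head><body>redirect</body></html>";
        System.out.println(looksLikeXsd(xsd));
        System.out.println(looksLikeXsd(html));
    }
}
```

The check works on text that has already been fetched, so it says nothing about how or when JHOVE itself retrieves the schema.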

Impact

So, after all that playing about, the impact is as one might imagine: jhove-view opens correctly, but we can't do much with the window. None of the modules are loaded, so if we drag and drop a file into the window a processing pop-up appears, but no processing happens as far as I can see. If you try selecting a module, there are no entries.

What do we do?

Rightfully this is marked as a low-priority task, but are there some things we might consider doing @carlwilson? e.g. to make this more robust?

A couple of ideas:

  1. XML validation for a config document seems like quite a high bar. Can we skip validation of the config entirely? Is XML still the right choice for a config document?

  2. Do we ask JHOVE to exit entirely and more cleanly, maybe with a clearer message? I.e. once we know we haven't got the schema to validate against, we can let the user know that. Right now we're asking the user to react to an unfiltered message from the XML validation; we could parse and translate that to say the config validation didn't work because the schema was invalid.

  3. Something else? Maybe load a default configuration? (A downside of that is that certain modules may not be installed to be accessed.)
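
To make idea 2 a little more concrete, here is a minimal Java sketch of translating the raw parser exception into a clearer message. It relies on the system ID carried by the SAXParseException to guess whether the failure happened in the config file or in a document fetched on its behalf, which matches the first log above, where the error names dynPolLoginRedirect.html rather than the config. ConfigErrorMessages, describe and the example URIs are hypothetical and not part of JHOVE. Treating a missing system ID as a config error is an assumption.

```java
import org.xml.sax.SAXParseException;

/** Hypothetical helper, not part of JHOVE: rewords a raw XML parser error for the user. */
final class ConfigErrorMessages {

    /**
     * @param e              the exception raised while parsing or validating the config
     * @param configSystemId the system ID (URI) of the config file itself
     */
    static String describe(SAXParseException e, String configSystemId) {
        String failedIn = e.getSystemId();
        String detail = " (line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getMessage() + ")";
        if (failedIn == null || failedIn.equals(configSystemId)) {
            // Assumption: with no other document named, blame the config file itself.
            return "The configuration file is not valid XML" + detail;
        }
        // The error was raised in another document, most likely the schema named in
        // xsi:schemaLocation. A firewall or redirect may have served a web page instead.
        return "Could not validate the configuration: the schema at " + failedIn
                + " could not be read as an XML Schema" + detail;
    }

    public static void main(String[] args) {
        // Example values only; the URIs below are placeholders.
        SAXParseException fromSchema = new SAXParseException(
                "The markup in the document preceding the root element must be well-formed.",
                null, "http://example.org/dynPolLoginRedirect.html", 1, 3);
        System.out.println(describe(fromSchema, "file:///home/user/jhove/conf/jhove.conf"));
    }
}
```

Keeping the original parser text in parentheses means nothing is lost for debugging, while the first half of the message tells the user which document was actually at fault.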

I feel like there is room to do something here, but I'm not sure of the project's appetite for it.

from jhove.

carlwilson commented on August 20, 2024

Nice work @ross-spencer. JHOVE config is an area that requires a little more work as the code's quite old. I do agree about the validation, but I'm happy to take a look a little further down the line once the final stream is underway in a week.

And add me to the assigned list.

MartinSpeller commented on August 20, 2024

JhoveView: Markup Parsing Error: dynPolLoginRedirect.html #116 - Assigned to ross-spencer

ross-spencer commented on August 20, 2024

@carlwilson Nice. Shall we update the ticket name now too, do you reckon? Maybe, to begin with, something like: JhoveView: Markup Parsing Error: when the jhoveConfig.xsd schema location is improperly redirected?
