Giter Site home page Giter Site logo

Comments (8)

elgonzo avatar elgonzo commented on May 27, 2024

EDIT: I forgot the modification necessary for the XPathAttribute(string xpathString) constructor in my code below. My apologies. Included now with this edit.


Yeah, that's a nice catch and a very good question.

If backwards compatibility here is important, i have an idea that both maintains backwards compatibility while also enabling proper usage of the NodeReturnType property.

The foundation underlying my idea is to make XPathAttribute keeping state about whether NodeReturnType has been explicitly set or not. For tracking whether an XPathAttribute instance or attribute annotation sets the NodeReturnType explicitly, XPathAttribute will use a boolean property/field indicating exactly that. I name it IsNodeReturnTypeExplicitlySet. It should be an internal property/field so that the HAP library itself can query it but it otherwise remains invisible to user code consuming the HAP library.

Something like this (abridged version of the suggested XPathAttribute implementation):

public sealed class XPathAttribute : Attribute
{
    ...

    internal bool IsNodeReturnTypeExplicitlySet { get; private set; }
    
    public ReturnType NodeReturnType
    {
        get => _nodeReturnType;
        set
        {
            _nodeReturnType = value;
            IsNodeReturnTypeExplicitlySet = true;
        }
    }
    private ReturnType _nodeReturnType;
    
    ...
    
    public XPathAttribute(string xpathString)
    {
        XPath = xpathString;
        //
        // Setting the backing field and NOT the NodeReturnType property itself
        // to avoid IsNodeReturnTypeExplicitlySet becoming true for this constructor.
        //
        _nodeReturnType = ReturnType.InnerText;
    }    
    
    public XPathAttribute(string xpathString, ReturnType nodeReturnType)
    {
        XPath = xpathString;
        //
        // Assigning the NodeReturnType property will
        // set the IsNodeReturnTypeExplicitlySet to true.
        //
        // Alternatively, to maintain similarity with the other constructor, the backing field
        // could be set plus an asssignment of IsNodeReturnTypeExplicitlySet like:
        //    _nodeReturnType = nodeReturnType;
        //    IsNodeReturnTypeExplicitlySet = true;
        //
        NodeReturnType = nodeReturnType;
        
    }

This is done specifically in a way to keep the IsNodeReturnTypeExplicitlySet flag operable even when a user only updates HAP without recompiling their user code/classes that make use of the [XPath] attribute.

The second step necessary is a modification of those call sites for Tools.GetHtmlForEncapsulation that in the previous HAP codebase didn't account for the NodeReturnType. Those call sites should only pass NodeReturnType values that are explicitly specified in an [XPath] attribute notation/instantiation, otherwise they should pass whatever would match the old code behavior, e.g., something like this (i am basing this on your PR #506):

innerHtmlDocument.LoadHtml(Tools.GetHtmlForEncapsulation(htmlNode, xPathAttribute.NodeReturnType));

would change into something like this:

    innerHtmlDocument.LoadHtml(
        Tools.GetHtmlForEncapsulation(
            htmlNode,
            (xPathAttribute.IsNodeReturnTypeExplicitlySet) ? xPathAttribute.NodeReturnType : ReturnType.InnerHtml
        )
    );

and

innerHtmlDocument.LoadHtml(Tools.GetHtmlForEncapsulation(node, xPathAttribute.NodeReturnType));

would in the same manner change into something like this:

    innerHtmlDocument.LoadHtml(
        Tools.GetHtmlForEncapsulation(
            node,
            (xPathAttribute.IsNodeReturnTypeExplicitlySet) ? xPathAttribute.NodeReturnType : ReturnType.InnerHtml
        )
    );

What do you think about this approach?

from html-agility-pack.

elgonzo avatar elgonzo commented on May 27, 2024

As an addendum, i pondered a while whether it is worthwhile to keep backward compatibility here, even though i gave an idea about how to maintain it. The old behavior from the old codebase is quite a bit inconsistent, and inconsistent behavior plus adding something more is at the end of the day still inconsistent behavior, just with some topping on it. :-P And trying to understand (or explain to someone) these inconsistencies in the GetEncapsulatedData behavior when it is used in somewhat complex scenarios could become a bit of a challenge, i fear.

Backwards compatibility is a rather strong argument, and usually i gravitate towards maintaining backwards compatibility unless there is a good reason not to. But i also feel the inconsistencies in the old GetEncapsulatedData implementation (including the related attributes such as [XPath]) related to the NodeReturnType handling are such a good reason to not keep this particular case of backwards compatibility.

Of course, not maintaining backwards compatibility and a breaking change in behavior of an API would only be warranted for a major version (first version number) or second version number increment in my opinion. So, there is that to consider too.

Just my 2c, of course. No strings attached, i am not a stakeholder in HAP, so feel free without worries to completely dismiss my expressed opinion if you disagree. :-)

from html-agility-pack.

rwecho avatar rwecho commented on May 27, 2024

@elgonzo I'm very glad that you could come and discuss this issue with me.
If we keep the backward compatibility, we can also modify the code :

innerHtmlDocument.LoadHtml(Tools.GetHtmlForEncapsulation(node, xPathAttribute.NodeReturnType==ReturnType.InnerText ? ReturnType.InnerHtml: xPathAttribute.NodeReturnType)); 

Because in the context of the code, if has checked the HasXPathAttribute

if (targetType.IsDefinedAttribute(typeof(HasXPathAttribute)) == true) // Object has xpath attribute (Defined HasXPath)

To the LoadHtml method, It should want an HTML instead of text (Are there any special cases that I haven't thought of?).

After all, this is a very old project, and I believe that maintaining compatibility is of higher priority. Additionally, we should also listen to @JonathanMagnan's advice.

from html-agility-pack.

elgonzo avatar elgonzo commented on May 27, 2024

@elgonzo I'm very glad that you could come and discuss this issue with me. If we keep the backward compatibility, we can also modify the code :

innerHtmlDocument.LoadHtml(Tools.GetHtmlForEncapsulation(node, xPathAttribute.NodeReturnType==ReturnType.InnerText ? ReturnType.InnerHtml: xPathAttribute.NodeReturnType)); 

I don't think this is a good behavior for a public API. Here's why:

If a user defines an annotation like [XPath("//whatever", NodeReturnType = ReturnType.InnerText)], then the behavior of GetEncapsulatedData should be as mandated by this attribute annotation. Which means, it should honor the explicitly specified ReturnType.InnerText. If the GetEncapsulatedData implementation cannot honor a a specified NodeReturnType value for technical reasons or the GetEncapsulatedData API is deemed to not support ReturnType.InnerText for XPathAttribute annotations of properties that are of a type that is also [HasXPath]-annotated, then it should produce some exception/error in that regard.

In my opinion, the GetEncapsulatedData implementation should not try to be smarter than the user and sneakily override a user-specified NodeReturnType, as this will make declaring a NodeReturnType value in the XPathAttribute annotation ambiguous, thus making it more difficult to read code with XPathAttribute annotations using NodeReturnType values correctly. Correct understanding of such user code would then essentially put a burden on the user (or reviewer) about having to understand the additional context or conditions regarding when which NodeReturnType values would mean exactly what in reality. Given the sparse documentation surrounding GetEncapsulatedData, i would be very hesitant to make its behavior and that of its related attributes "clever", fuzzy and less obvious.


Additionally, we should also listen to @JonathanMagnan's advice.

Oh, i fully agree with you. In the end it's zzzproject / Jonathan's decision how to proceed as they have to maintain the library and deal with user feedback etc.


(A pointless side note :-) Yes, HAP is a very old library, dating back to 2005 or so. But the GetEncapsulatedData API is a baby in comparison, added to HAP "only" in 2018. Still five years, though...)

from html-agility-pack.

rwecho avatar rwecho commented on May 27, 2024

@elgonzo As you say, the code should not guess the user purpose, I take your advice and create a PR #509 .
And more , I add a default constructor, to make it possible to get some html from some properties. After all, some front-end writing methods are too flexible these days.

public XPathAttribute(string xpathString, string attributeName, ReturnType nodeReturnType)

@elgonzo @JonathanMagnan We can assess whether this pull request is appropriate

from html-agility-pack.

rwecho avatar rwecho commented on May 27, 2024

@elgonzo

resultCastedToTargetPropertyType = Convert.ChangeType(result, propertyInfo.PropertyType);

In the code, it can handle the conversion of DateTime type, but it is unable to convert DateTime? correctly. I don't know how you handle this. In fact, this is just a feature and does not affect the functionality of the code.

Currently, my project has enabled Nullable globally, but it appears that HAP does not adapt well to this configuration.

from html-agility-pack.

elgonzo avatar elgonzo commented on May 27, 2024

resultCastedToTargetPropertyType = Convert.ChangeType(result, propertyInfo.PropertyType);

In the code, it can handle the conversion of DateTime type, but it is unable to convert DateTime? correctly. I don't know how you handle this. In fact, this is just a feature and does not affect the functionality of the code.

Currently, my project has enabled Nullable globally, but it appears that HAP does not adapt well to this configuration.

That's a valid and real issue, but unrelated to NodeReturnType. Best to move this into its own issue report (and which i just did: #510)

from html-agility-pack.

JonathanMagnan avatar JonathanMagnan commented on May 27, 2024

Hello @rwecho ,

I will close this issue has the v1.11.51 has been released with your latest fix.

Thank again for your contribution ;)

Best Regards,

Jon

from html-agility-pack.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.