Comments (4)

elgonzo commented on May 26, 2024

It seems the web server you are trying to load from is taking issue with HtmlAgilityPack's default UserAgent string (or a combination of request header fields including the user-agent).

private string _userAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:x.x.x) Gecko/20041107 Firefox/x.x";

Set the UserAgent to null or an empty string, and (with respect to the URL you are using) Bob should be your uncle.

Uri LinkProdotto = new Uri("https://www.eaton.com/it/it-it/skuPage.177633.html");
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb()
{
    UserAgent = null
};
...

(Side note: In case you are not already aware of it, keep in mind that HtmlAgilityPack is "just" an HTML parser, not a web browser engine, and therefore does not execute JavaScript, WASM or other dynamic content. It only parses the source HTML as provided by the web server. If the page contains JavaScript, WASM or other dynamic content, the source HTML parsed by HtmlAgilityPack can look quite different from the document a web browser builds dynamically from that same source.)
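
To make the side note concrete, here is a minimal sketch of what "does not execute JavaScript" means in practice. The HTML string, the `app` id, and the class and method names are made up for illustration; only `HtmlDocument.LoadHtml`, `SelectSingleNode` and `InnerText` are actual HAP members.

```csharp
using System;
using HtmlAgilityPack;

public static class StaticParseDemo
{
    // Returns the text HAP sees inside <div id="app"> (hypothetical id).
    public static string AppText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode app = doc.DocumentNode.SelectSingleNode("//div[@id='app']");
        return app == null ? null : app.InnerText;
    }

    public static void Main()
    {
        // Hypothetical page: a browser would fill the div by running the script.
        const string html =
            "<html><body>" +
            "<div id=\"app\"></div>" +
            "<script>document.getElementById('app').textContent = 'Hello';</script>" +
            "</body></html>";

        // HAP only parses the markup and never runs the script,
        // so the div is still empty here.
        Console.WriteLine(AppText(html).Length); // prints 0
    }
}
```

In a browser the same div would contain "Hello" once the script has run, which is exactly the kind of difference described above.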

from html-agility-pack.

elgonzo commented on May 26, 2024

@basicn86 Yeah, if you need to process dynamic web content, HAP by itself is not the right (or sole) tool to use. As I mentioned in the side note in my first comment, HAP is just an HTML parser, not a web browser, and therefore has no facilities such as a JavaScript engine or WASM host needed to generate dynamic page content. In such cases the right tool for the job is, depending on the nature of the application, either a (headless-capable) browser engine such as CEF (with the CefGlue or CefSharp wrapper for .NET) embedded in your application, or a browser automation tool like Selenium WebDriver plus whichever stand-alone web browser Selenium supports.
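
As a rough sketch of the Selenium route, assuming the Selenium.WebDriver NuGet package, a local Chrome installation and a matching driver: let the browser render the page, then hand the resulting HTML to HAP. The URL, class name and `ExtractTitle` helper are placeholders for illustration, and `--headless=new` is a Chrome switch that older Chrome versions may not accept (plain `--headless` there).

```csharp
using System;
using HtmlAgilityPack;
using OpenQA.Selenium.Chrome;

public static class RenderedPageDemo
{
    // Parses already-rendered HTML with HAP and returns the <title> text, or null.
    public static string ExtractTitle(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        return title == null ? null : title.InnerText;
    }

    public static void Main()
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless=new"); // assumption: recent Chrome version

        using (var driver = new ChromeDriver(options))
        {
            // Placeholder URL; substitute the page you actually need.
            driver.Navigate().GoToUrl("https://example.com/");

            // PageSource is the browser's current view of the document,
            // i.e. after scripts have had a chance to run.
            string renderedHtml = driver.PageSource;
            Console.WriteLine(ExtractTitle(renderedHtml));
        }
    }
}
```

Pages that load content asynchronously may still need an explicit wait before reading `PageSource`, and this does nothing about sites that actively block automated browsers.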

JonathanMagnan commented on May 26, 2024

I will close this issue, as it appears to have been answered by @elgonzo.

On my side, whenever I need to do something beyond what HAP is designed for, such as processing dynamic web content, I always use Selenium WebDriver, which lets me do pretty much everything, including automating actions such as logging in and submitting forms.

Best Regards,

Jon

basicn86 commented on May 26, 2024

I ran into the same issue, and I have no clue why it happens either. It appears that these websites have some type of protection against crawlers or scrapers. For example, I can visit https://microcenter.com/ just fine in Firefox, but with HTML Agility Pack it returns a strange page asking me to enable JavaScript. It probably has something to do with cookies or JavaScript. Even when I use curl, I get the same result, and setting the user agent doesn't change a thing. I think some websites are just impossible to crawl.
