Comments (4)
It seems the web server you are trying to load from is taking issue with HtmlAgilityPack's default UserAgent string (or a combination of request header fields including the user-agent).
Set the UserAgent to null or an empty string, and (with respect to the URL you are using) Bob should be your uncle.
Uri LinkProdotto = new Uri("https://www.eaton.com/it/it-it/skuPage.177633.html");
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb()
{
    // Clear HtmlAgilityPack's default User-Agent string, which this server appears to reject
    UserAgent = null
};
...
(Side note: In case you are not already aware of it, keep in mind that HtmlAgilityPack is "just" an HTML parser, not a web browser engine, and therefore does not execute JavaScript, WASM or other dynamic content. It only parses the source HTML page as provided by the web server. If the page contains JavaScript, WASM or other dynamic content, the source HTML parsed by HtmlAgilityPack can look quite different from the HTML document that a web browser builds dynamically from that same source.)
from html-agility-pack.
@basicn86 yeah, if you need to process dynamic web content, HAP by itself is not the right (or sole) tool to use. As I mentioned in the side note in my first comment, HAP is just an HTML parser, not a web browser, and therefore doesn't have facilities such as a JavaScript engine or WASM host required for generating dynamic page content. In such cases, the right tool for the job is, depending on the nature of the application, either a (headless-capable) browser engine such as CEF (with the CefGlue or CefSharp wrapper with respect to .NET) as part of your application, or a browser automation tool like Selenium WebDriver plus whichever stand-alone web browser Selenium supports.
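As a rough illustration of the Selenium route, here is a minimal sketch that lets a headless Chrome render the page and then hands the resulting HTML to HAP for parsing. It assumes the Selenium.WebDriver and HtmlAgilityPack NuGet packages are installed and that Chrome is available locally; the class name and the choice of the title element are purely illustrative, and pages that load content asynchronously may additionally need an explicit wait before reading the page source.

```csharp
using System;
using HtmlAgilityPack;
using OpenQA.Selenium.Chrome;

class RenderedPageExample
{
    static void Main()
    {
        // Assumption: Chrome is installed locally and Selenium can start a matching driver.
        var options = new ChromeOptions();
        options.AddArgument("--headless=new"); // run Chrome without a visible window

        using (var driver = new ChromeDriver(options))
        {
            // Let the real browser load the page and run its scripts.
            driver.Navigate().GoToUrl("https://www.eaton.com/it/it-it/skuPage.177633.html");

            // Take the page source as the browser currently reports it.
            string renderedHtml = driver.PageSource;

            // From here on, HAP is used purely as a parser.
            var doc = new HtmlDocument();
            doc.LoadHtml(renderedHtml);

            HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
            Console.WriteLine(title?.InnerText ?? "(no title element found)");
        }
    }
}
```

The point of splitting it this way is that the browser does the fetching and script execution, while HAP keeps doing what it is good at: querying static HTML.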
I will close this issue as it looks to be answered by @elgonzo.
On my side, whenever I need to do something beyond what HAP is meant for, such as processing dynamic web content, I always use Selenium WebDriver, which lets me do pretty much everything, including automating actions such as logging in, submitting forms, etc.
Best Regards,
Jon
I ran into the same issue, and I have no clue why it happens either. It appears that these websites have some type of protection against crawlers or scrapers. For example, I can visit https://microcenter.com/ just fine in Firefox, but with HTML Agility Pack it returns a strange page asking me to enable JavaScript. It probably has something to do with cookies or JavaScript. Even when I use curl, I get the same result. Setting the user agent doesn't change a thing. I think some websites are just impossible to crawl.