Comments (15)

dgtlmoon commented on June 3, 2024

You need to compare the HTML from both fetchers then: the Chrome JS-rendered version and the one fetched with curl.
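
For anyone wanting to do that comparison, here is a minimal sketch using Python's difflib. It assumes both HTML snapshots have already been saved as strings (for example one from curl and one from the browser-rendered page). The function name diff_fetched_html and the file labels are made up for illustration and are not part of changedetection.io.

```python
import difflib


def diff_fetched_html(rendered_html, plain_html, context=2):
    """Return a unified diff (list of lines) between two HTML snapshots.

    plain_html is the "before" side and rendered_html the "after" side,
    so lines that only exist after JS rendering show up with a "+" prefix.
    """
    return list(
        difflib.unified_diff(
            plain_html.splitlines(),
            rendered_html.splitlines(),
            fromfile="plain-http",   # hypothetical label for the curl result
            tofile="js-rendered",    # hypothetical label for the browser result
            lineterm="",
            n=context,
        )
    )
```

An empty list means the two fetchers returned identical markup, in which case the difference in behaviour has to come from somewhere else.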

from changedetection.io.

Constantin1489 commented on June 3, 2024

@amirt01 Thank you! The case you reported will be fixed by #2351.

[Four images attached, including three screenshots taken 2024-05-13]

dgtlmoon commented on June 3, 2024

I tried the latest elementpath, 4.4.0, and got the same result.

Constantin1489 commented on June 3, 2024

But we have elementpath pinned to 4.1.5:

elementpath==4.1.5

dgtlmoon commented on June 3, 2024

The error comes from elementpath. I tried different versions, with the same outcome.

Constantin1489 commented on June 3, 2024

These are the pip package versions in my custom 45.13 container:

aniso8601             9.0.1
apprise               1.7.2
arrow                 1.3.0
attrs                 23.2.0
Babel                 2.14.0
beautifulsoup4        4.12.3
blinker               1.7.0
Brotli                1.1.0
certifi               2024.2.2
cffi                  1.16.0
chardet               5.2.0
charset-normalizer    3.3.2
click                 8.1.7
cryptography          3.4.8
decorator             5.1.1
dnspython             2.5.0
elementpath           4.2.1
et-xmlfile            1.1.0
feedgen               0.9.0
Flask                 2.3.3
flask-babel           4.0.0
Flask-Compress        1.14
flask-expects-json    1.7.0
Flask-Login           0.6.3
flask-paginate        2023.10.24
Flask-RESTful         0.3.10
Flask-WTF             1.2.1
gevent                23.9.1
greenlet              3.0.3
h11                   0.14.0
idna                  3.6
iniconfig             2.0.0
inscriptis            2.4.0.1
itsdangerous          2.1.2
Jinja2                3.1.3
jinja2-time           0.2.0
jq                    1.6.0
jsonpath-ng           1.5.3
jsonschema            4.17.3
loguru                0.7.2
lxml                  5.1.0
Markdown              3.5.2
MarkupSafe            2.1.5
memory-profiler       0.61.0
oauthlib              3.2.2
openpyxl              3.1.2
outcome               1.3.0.post0
packaging             23.2
paho-mqtt             2.0.0
pillow                10.2.0
pip                   23.2.1
playwright            1.41.2
pluggy                1.4.0
ply                   3.11
psutil                5.9.8
pycparser             2.21
pyee                  11.0.1
pyrsistent            0.20.0
PySocks               1.7.1
pytest                7.4.4
pytest-flask          1.3.0
python-dateutil       2.8.2
pytz                  2024.1
PyYAML                6.0.1
requests              2.31.0
requests-oauthlib     1.3.1
selenium              4.14.0
setuptools            69.0.3
six                   1.16.0
sniffio               1.3.0
sortedcontainers      2.4.0
soupsieve             2.5
timeago               1.0.16
trio                  0.24.0
trio-websocket        0.11.1
types-python-dateutil 2.8.19.20240106
typing_extensions     4.9.0
urllib3               2.2.0
validators            0.22.0
Werkzeug              3.0.1
wheel                 0.41.2
wsproto               1.2.0
WTForms               3.1.2
zope.event            5.0
zope.interface        6.1
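
To compare a listing like the one above against a pin such as elementpath==4.1.5, a small parser is enough. This is only a sketch under assumptions: it expects two-column `pip list` style text, it does not normalise hyphens versus underscores in package names, and the names parse_pip_list and pin_mismatches are hypothetical.

```python
def parse_pip_list(text):
    """Map lower-cased package name -> version from `pip list`-style text."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header and the dashed rule.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        versions[parts[0].lower()] = parts[1]
    return versions


def pin_mismatches(pins, installed):
    """Return {name: (pinned, installed)} for each pin not matched exactly.

    A package that is missing from `installed` is reported with None.
    """
    return {
        name: (wanted, installed.get(name.lower()))
        for name, wanted in pins.items()
        if installed.get(name.lower()) != wanted
    }
```

Fed the listing above and the pin {"elementpath": "4.1.5"}, this reports the 4.2.1 that the container actually has installed.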

dgtlmoon commented on June 3, 2024

> this is my custom 45.13 container's pip package version.

Are you saying you can't reproduce the issue?

Constantin1489 commented on June 3, 2024

I can reproduce the problem, but it is quite weird.
With "Playwright Chromium/Javascript via 'ws://127.0.0.1:3000/?stealth=1&--disable-web-security=true'", elementpath works.
With "Basic fast Plaintext/HTTP Client", it fails with: 'str' object has no attribute '__name__'

[image]

?????

Constantin1489 commented on June 3, 2024

Hi,
I believe the bug originates in libxml2. See also https://gitlab.gnome.org/GNOME/libxml2/-/issues/716

Constantin1489 commented on June 3, 2024

I found a solution, but I need time to verify it.

ezalenski commented on June 3, 2024

I took a look at this just to try and brush up on my pdb skills.

The issue here is that lxml considers the HTML from that site invalid. elementpath.select() seems to assume it is working on a non-empty tree and does not handle the empty case correctly, which is where the exception comes from. One improvement changedetection.io could make is to check parser.error_log for errors, perhaps only when the tree is empty, since I'm not sure how noisy error_log is or how often it is non-empty.

[image]

Here's where I attached the pdb:
[image]
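
The error_log idea above could look roughly like the sketch below. It is not changedetection.io's code: both function names are hypothetical, "empty" is assumed to mean "no root, or a root without child elements", and lxml is assumed to be installed. parser.error_log and etree.XMLSyntaxError are real lxml names, but how much the log contains depends on the libxml2 version underneath.

```python
def explain_empty_tree(root, error_log):
    """Return parser messages, but only when the parsed tree looks empty.

    A tree with at least one child element is treated as usable and the
    (possibly noisy) error log is ignored for it.
    """
    if root is not None and len(root) > 0:
        return []
    return [str(entry) for entry in error_log]


def parse_html_checked(html_bytes):
    """Parse HTML with lxml and return (root, messages)."""
    # Imported here so explain_empty_tree stays usable without lxml.
    from lxml import etree

    parser = etree.HTMLParser()
    try:
        root = etree.fromstring(html_bytes, parser)
    except etree.XMLSyntaxError as exc:
        # lxml raises instead of returning a tree for some inputs.
        return None, [str(exc)]
    return root, explain_empty_tree(root, parser.error_log)
```

A caller could log the returned messages before handing the tree to elementpath, so that an empty parse shows up as a readable warning instead of an unrelated exception later on.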

Constantin1489 commented on June 3, 2024

@ezalenski Try running with python -m pdb -c 'b elementpath/tree_builders.py:229', then evaluate p [e for e in elem.itersiblings()] in pdb. That is the problem. See also https://gitlab.gnome.org/GNOME/libxml2/-/issues/716

Also, please take a look at my test in the PR.

amirt01 commented on June 3, 2024

I encountered the same issue. I'm solving it temporarily using XPath1.0 by prepending xpath1: to the XPath rule.

Constantin1489 commented on June 3, 2024

Hi @amirt01, I would be grateful if you could provide an example URL!

amirt01 commented on June 3, 2024

Certainly @Constantin1489! I use changedetection.io to monitor company job sites like those hosted on Lever. I ran into this issue when filtering for the posting names: //*[contains(@data-qa, 'posting-name')]. I was able to remedy this by changing this filter to: xpath1://*[contains(@data-qa, 'posting-name')].

Here is an arbitrary example using Kinsta:
Here is a link to the broken watch config.
Here is a link to the fixed* watch config.
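
If you have many rules to switch over, the prefix workaround above can be scripted. This is a rough sketch that only manipulates the rule string. The helper names are hypothetical and this is not how changedetection.io parses filters internally. The only thing taken from this thread is that a leading xpath1: selects XPath 1.0.

```python
XPATH1_PREFIX = "xpath1:"


def split_filter_rule(rule):
    """Return (engine, expression) for a filter rule string.

    "default" here just means "no xpath1: prefix was present".
    """
    rule = rule.strip()
    if rule.startswith(XPATH1_PREFIX):
        return "xpath1", rule[len(XPATH1_PREFIX):]
    return "default", rule


def force_xpath1(rule):
    """Prepend xpath1: to a rule unless it is already there."""
    _engine, expression = split_filter_rule(rule)
    return XPATH1_PREFIX + expression
```

Calling force_xpath1 on a rule that already has the prefix leaves it unchanged, so it is safe to run over a mixed list.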

[image]
