cssselect's People

Contributors

annbgn, arthurdarcet, dangra, elacuesta, gallaecio, graingert, hugovk, ianb, julius383, kmike, kolanich, laerte, lopuhin, lrowe, nikolas, pcorpet, redapple, scoder, scop, simonsapin, sjp, sortafreel, varialus, whybin, wrar

cssselect's Issues

:nth-last-child() is wrong

:nth-last-child(1) selects the second-to-last child, but it should select the last. :nth-last-of-type() probably has the same bug too.

The test suite tests the incorrect behavior with the following comment:

# FIXME: I'm not 100% sure this is right:

It’s not.

Encoding Error using utf8

In the code, the "# coding: utf8" alias should be replaced with the actual encoding name: "# -*- encoding: utf-8 -*-"

Negation selector does not accept any selector as argument.

The following are valid CSS3 selectors which are rejected by cssselect:

:not(.foo, .bar)
:not(foo > bar)
:not(foo bar)
:not(:not(a))
:not(<any other selector>)

From the looks of it, disallowing nested selectors was an explicit choice, but the parser doesn't seem to like combinators and separators such as > and , in the negation either, and just raises a syntax error.

cssselect 0.7: Test failures

All tests were passing in cssselect 0.6.1.
Some tests fail in cssselect 0.7.

$ PYTHONPATH="." python2.7 cssselect/tests.py -v
test_parse_errors (__main__.TestCssselect) ... ok
test_parser (__main__.TestCssselect) ... ok
test_pseudo_elements (__main__.TestCssselect) ... FAIL
test_quoting (__main__.TestCssselect) ... ok
test_select (__main__.TestCssselect) ... ok
test_select_shakespeare (__main__.TestCssselect) ... ok
test_series (__main__.TestCssselect) ... ok
test_specificity (__main__.TestCssselect) ... ok
test_tokenizer (__main__.TestCssselect) ... FAIL
test_translation (__main__.TestCssselect) ... ERROR
test_unicode (__main__.TestCssselect) ... ok
test_unicode_escapes (__main__.TestCssselect) ... ok

======================================================================
ERROR: test_translation (__main__.TestCssselect)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "cssselect/tests.py", line 378, in test_translation
    assert xpath(r'di\a0 v') == (
  File "cssselect/tests.py", line 297, in xpath
    return str(GenericTranslator().css_to_xpath(css, prefix=''))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 14: ordinal not in range(128)

======================================================================
FAIL: test_pseudo_elements (__main__.TestCssselect)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "cssselect/tests.py", line 175, in test_pseudo_elements
    assert parse_one('::before') == ('Element[*]', 'before')
  File "cssselect/tests.py", line 161, in parse_one
    result = parse_pseudo(css)
  File "cssselect/tests.py", line 155, in parse_pseudo
    assert pseudo is None or type(pseudo) is _unicode
AssertionError

======================================================================
FAIL: test_tokenizer (__main__.TestCssselect)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "cssselect/tests.py", line 63, in test_tokenizer
    "<EOF at 42>",
AssertionError

----------------------------------------------------------------------
Ran 12 tests in 0.155s

FAILED (failures=2, errors=1)

Doesn’t work on python 3.4

Hi, I’ve installed cssselect with pip and pip3 (on Ubuntu) and I can’t make it work with Python 3.4:

$ python 
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from cssselect import GenericTranslator, SelectorError
>>> 
$ python3
Python 3.4.0 (default, Apr 11 2014, 13:05:11) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from cssselect import GenericTranslator, SelectorError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'GenericTranslator'
>>> 

[feature-request] `:not()` to support generic selectors (not only "simple" ones)

The document (version 0.9.1) says:

:not() accepts a sequence of simple selectors, not just single simple selector. For example, :not(a.important[rel]) is allowed, even though the negation contains 3 simple selectors.

May I ask what is a simple selector? Can :not() support something like :not(a>b)?

>>> import cssselect
>>> cssselect.parse('a:not(p>a)')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/site-packages/cssselect/parser.py", line 355, in parse
    return list(parse_selector_group(stream))
  File "/usr/lib/python3.4/site-packages/cssselect/parser.py", line 370, in parse_selector_group
    yield Selector(*parse_selector(stream))
  File "/usr/lib/python3.4/site-packages/cssselect/parser.py", line 378, in parse_selector
    result, pseudo_element = parse_simple_selector(stream)
  File "/usr/lib/python3.4/site-packages/cssselect/parser.py", line 471, in parse_simple_selector
    raise SelectorSyntaxError("Expected ')', got %s" % (next,))
cssselect.parser.SelectorSyntaxError: Expected ')', got <DELIM '>' at 7>

cssselect can't work on firefox

Firefox is unlike other xpath implementations in that name() returns an upper-cased string. cssselect's translation of the nth-child selector (for example) uses "name() = 'foo'" which will never possibly match, due to the above oddity. One workaround is to set HTMLTranslator.lower_case_element_names to False, and write selectors like 'LI:nth-child(2)', which will result in a working xpath for firefox, but won't work on any other xpath implementation.

I see two possible solutions:

  1. Call lower-case() wherever name() is called.
  2. Factor out the use of name() entirely, replacing [name() = 'foo'] with [self::foo].

Demonstration of the problem and solution here (the contrast between Chrome and Firefox is stark; IE behaves like Chrome):
http://fiddle.jshell.net/J7VrG/10/show/light/
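
To make the second proposal concrete, here is a minimal sketch that rewrites name() tests into self:: tests by post-processing an XPath string. The helper name use_self_axis is hypothetical, and string rewriting is only an illustration; a real fix would presumably change how the translator emits the expression in the first place.

```python
import re

# Hypothetical helper: rewrites [name() = 'foo'] style tests to use the
# self:: axis instead. Only simple, unprefixed element names are handled.
NAME_TEST_RE = re.compile(r"name\(\) = '([A-Za-z_][\w.-]*)'")


def use_self_axis(xpath):
    """Replace every name() = 'foo' comparison with a self::foo test."""
    return NAME_TEST_RE.sub(r"self::\1", xpath)


before = "e/following-sibling::*[name() = 'f' and (position() = 3)]"
after = use_self_axis(before)
print(after)
```

The self::foo form is a node test rather than a string comparison, so it does not depend on the string that name() returns.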

Tokenizer corner cases

Now that the descendant selector bugs are fixed (unless I missed
something) the remaining issues that I see are:

  1. The current tokenizer for Symbol uses something like the '\w' regex,
    while a CSS IDENT token can contain any non-ASCII character (including
    U+00A0 no-break space, for example), can have backslash-escapes but can
    not start with a digit.
  2. Unicode white space (like U+00A0) counts as white space (either
    ignored or a descendant combinator) but should not (related to 1)
  3. 2n+1 or similar strings (arguments to :nth-child()) are tokenized as
    Symbol objects, and are then accepted by the parser as element types,
    class names, IDs, etc.

I think that any valid (for CSS) selector that only uses ASCII without
backslash-escapes should be fine now, so maybe this is not really a
problem ...
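
As a rough illustration of point 1, the identifier rule can be written as a regular expression following the CSS 2.1 token grammar (nmstart, nmchar, escape). This is a sketch, not the tokenizer cssselect uses; the names IDENT_RE and is_css_ident are made up for the example.

```python
import re

# Building blocks modelled on the CSS 2.1 grammar.
NONASCII = r'[^\x00-\x7f]'
UNICODE_ESCAPE = r'\\[0-9a-fA-F]{1,6}(?:\r\n|[ \n\r\t\f])?'
ESCAPE = r'(?:%s|\\[^\n\r\f0-9a-fA-F])' % UNICODE_ESCAPE
NMSTART = r'(?:[_a-zA-Z]|%s|%s)' % (NONASCII, ESCAPE)
NMCHAR = r'(?:[_a-zA-Z0-9-]|%s|%s)' % (NONASCII, ESCAPE)
IDENT_RE = re.compile(r'-?%s%s*' % (NMSTART, NMCHAR))


def is_css_ident(text):
    """Return True if the whole string is a single CSS identifier."""
    return IDENT_RE.fullmatch(text) is not None


print(is_css_ident('div'))        # plain ASCII name
print(is_css_ident('di\xa0v'))    # U+00A0 is allowed inside an identifier
print(is_css_ident('2n'))         # starts with a digit
```

With a rule like this, 2n+1 is not an identifier, which addresses point 3 as well.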

Non-ASCII pseudo-classes

Translating a selector with a non-ASCII pseudo-class causes UnicodeEncodeError on Python 2.x. This is because we are calling getattr() with a name based on the pseudo-class name. No such pseudo-class exists, so the translation should raise ExpressionError instead.
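
A minimal sketch of the intended behaviour, assuming a dispatch-by-method-name design. The class SketchTranslator, the method naming scheme, and the local ExpressionError are stand-ins for illustration and are not taken from the cssselect source.

```python
class ExpressionError(Exception):
    """Stand-in for the ExpressionError named in the issue."""


class SketchTranslator:
    """Hypothetical translator that looks up one method per pseudo-class."""

    def xpath_first_child_pseudo(self, xpath):
        return xpath + '[position() = 1]'

    def translate_pseudo(self, xpath, name):
        method_name = 'xpath_%s_pseudo' % name.replace('-', '_')
        try:
            method = getattr(self, method_name, None)
        except UnicodeEncodeError:
            # Python 2 raises this for non-ASCII attribute names;
            # treat it the same as an unknown pseudo-class.
            method = None
        if method is None:
            raise ExpressionError('The pseudo-class :%s is unknown' % name)
        return method(xpath)


translator = SketchTranslator()
print(translator.translate_pseudo('li', 'first-child'))
```

Any unknown name, ASCII or not, then ends in the same ExpressionError.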

Drop Python 2.4 support

What do you think about dropping Python 2.4 support? It is true that Python 2.4 can still be used in some setups (like old Red Hat machines), but

  • Travis doesn't run Python 2.4 tests;
  • tox also can't run Python 2.4 tests

so there is no easy way to make sure cssselect works under Python 2.4, and this makes contributing to cssselect harder.

Exception on selectors without namespace

Problem

The CSS3 spec allows the namespace field to be left empty, which indicates
an element with no namespace attached. However, cssselect cannot handle
those selectors right now.

For example, suppose we have the following line:

GenericTranslator().css_to_xpath('|foo')

This causes the parser to raise an exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/cssselect/cssselect/xpath.py", line 192, in css_to_xpath
    for selector in parse(css))
  File "/cssselect/cssselect/parser.py", line 354, in parse
    return list(parse_selector_group(stream))
  File "/cssselect/cssselect/parser.py", line 367, in parse_selector_group
    yield Selector(*parse_selector(stream))
  File "/cssselect/cssselect/parser.py", line 375, in parse_selector
    result, pseudo_element = parse_simple_selector(stream)
  File "/cssselect/cssselect/parser.py", line 475, in parse_simple_selector
    "Expected selector, got %s" % (peek,))
cssselect.parser.SelectorSyntaxError: Expected selector, got <DELIM '|' at 0>

Expected behaviour

cssselect should be able to handle a selector like |foo.

Note

Here is the related part from Selectors Level 3 §6.1.1:

ns|E
    elements with name E in namespace ns 
*|E
    elements with name E in any namespace, including those without a namespace 
|E
    elements with name E without a namespace 
E
    if no default namespace has been declared for selectors, this is equivalent to *|E.
    Otherwise it is equivalent to ns|E where ns is the default namespace. 
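
The four forms quoted above can be told apart with very little code. The following is only a sketch of the classification step; the function name split_type_selector and the return convention ('*' for any namespace, '' for no namespace) are assumptions for the example, not cssselect API.

```python
def split_type_selector(selector, default_namespace=None):
    """Split a type selector into (namespace, local_name).

    The namespace part is a prefix, '*' for "any namespace",
    or '' for "no namespace".
    """
    if '|' in selector:
        prefix, name = selector.split('|', 1)
        # 'ns|E' gives 'ns', '*|E' gives '*', '|E' gives ''.
        return prefix, name
    if default_namespace is None:
        # No default namespace declared: E is equivalent to *|E.
        return '*', selector
    return default_namespace, selector


for css in ('ns|E', '*|E', '|E', 'E'):
    print(css, split_type_selector(css))
```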

Release cssselect with pseudo-elements improvements

This is a reminder asking to release 0.9 (or whatever version you prefer) with the recent pseudo-elements improvements.

I didn't want to bother you with this release until the work on Scrapy CSS selectors was ready to merge, but the unavailability on PyPI is inconvenient right now, as it makes travis-ci fail for the scrapy/scrapy#426 pull request.

thanks!

[bug] parse() fails if :scope present in second element of selector list

As of v1.1.0, cssselect.parse() seems to have problems parsing the ":scope" pseudo-class if the input string is a selector list and ":scope" occurs in any clause besides the first one.

Ones that successfully parse as expected:

  • parse(":scope > th")
  • parse(":scope > th, td")
  • parse(":scope > th, table > td")

However, all of the following unexpectedly (at least to me) throw SelectorSyntaxError('Got immediate child pseudo-element ":scope" not at the start of a selector'):

  • parse("th, :scope > td")
  • parse("table > th, :scope > td")
  • parse(":scope > th, :scope > td")

(I'd submit a PR for this, but looking at the location of the error, I'm not familiar enough with the internals to suggest what the right thing to do is!).

element>element selector does not work relative to an element

In version 1.0.3, I get an exception when using cssselect on an element to select its direct children:
element > element (see https://www.w3schools.com/cssref/sel_element_gt.asp)

>>> from lxml import html
>>> html.fromstring('<html><body><div class="parent"><div class="child"><div class="child"></div></div></div></body></html>')
<Element html at 0x7feadf137d08>
>>> tree=html.fromstring('<html><body><div class="parent"><div class="child"><div class="child"></div></div></div></body></html>')
>>> tree.cssselect('div.parent')
[<Element div at 0x7feadf137e10>]
>>> tree.cssselect('div.parent')[0].cssselect('> .child')
*** SelectorSyntaxError: Expected selector, got <DELIM '>' at 0>

In version 0.9.1, the following worked without raising an exception, but it gives an unexpected result, since the second div.child is not a direct child of div.parent:

>>> tree=html.fromstring('<html><body><div class="parent"><div class="child"><div class="child"></div></div></div></body></html>')
# works but should return only one element
>>> tree.cssselect('div.parent')[0].cssselect('> .child')
[<Element div at 0x7fa6e973def0>, <Element div at 0x7fa6e973dfb0>]

> only works when parent selector is given in the selector

>>> tree.cssselect('div.parent > .child')
[<Element div at 0x7fa6e973de90>]

A) is it a regression, that element.cssselect('> .child') raises an exception on recent versions?

B) is there a way to select a direct child given the parent element?
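
Regarding B), one workaround that does not depend on the selector syntax at all is to iterate over the element's children directly. The sketch below uses the standard library's ElementTree so that it is self-contained; the helper direct_children_with_class is hypothetical. lxml elements can be iterated the same way. A :scope-prefixed selector may also be an option in newer versions, but that is not verified here.

```python
import xml.etree.ElementTree as ET

tree = ET.fromstring(
    '<html><body><div class="parent"><div class="child">'
    '<div class="child"></div></div></div></body></html>')
parent = tree.find(".//div[@class='parent']")


def direct_children_with_class(element, class_name):
    """Return only the direct children that carry the given class."""
    return [child for child in element
            if class_name in child.get('class', '').split()]


children = direct_children_with_class(parent, 'child')
print(len(children))
```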

Unable to parse selector with escaped characters.

To select an element with class "width-3:4" one must escape the ':' as per http://www.w3.org/International/questions/qa-escapes

However, this raises an error:

>>> GenericTranslator().css_to_xpath('.width-3\3a 4')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/data/devel/cssselect/cssselect/xpath.py", line 165, in css_to_xpath
    selectors = parse(css)
  File "/data/devel/cssselect/cssselect/parser.py", line 313, in parse
    return list(parse_selector_group(stream))
  File "/data/devel/cssselect/cssselect/parser.py", line 328, in parse_selector_group
    yield Selector(*parse_selector(stream))
  File "/data/devel/cssselect/cssselect/parser.py", line 336, in parse_selector
    result, pseudo_element = parse_simple_selector(stream)
  File "/data/devel/cssselect/cssselect/parser.py", line 446, in parse_simple_selector
    "Expected selector, got %s" % (peek,))
SelectorSyntaxError: Expected selector, got <DELIM '' at 8>
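
One possible explanation, offered as a guess and not confirmed by the issue: in a regular Python string literal, \3 is an octal escape, so the selector reaches the parser with a U+0003 control character at index 8 instead of a backslash. That would be consistent with the empty-looking DELIM reported at position 8. The snippet below only shows the difference between the two literals; whether the raw-string form parses in that cssselect version is not checked here.

```python
plain = '.width-3\3a 4'   # \3 is interpreted by Python as chr(3)
raw = r'.width-3\3a 4'    # the backslash is kept for the CSS parser

print(repr(plain[8]), len(plain))
print(repr(raw[8]), len(raw))
```

Using a raw string (or doubling the backslash) keeps the CSS escape intact.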

A nasty bug lies within the library

I have a crawler based on cssselect (1.0.0). My customer had been complaining for quite some time about not getting all the news he wanted,
so I had a closer look and found something peculiar.

If you want to reproduce the error simply go to this url :
http://www.bbc.com/news/uk-england-40465494
and try this selector :
.story-body__inner > p

The purpose of this selector is to get the news body. When it gets to the following line:

Mr Paget-Brown resigned following sustained criticism of the council and an aborted meeting of its cabinet on Thursday, from which leaders had tried to ban members of the public and press.

it only gets the

Mr Paget-Brown

You can take a look at this picture which I got from crawler
http://imgur.com/a/4UcMb

I have tested my selector using firepath in firefox

Whitespace in series

The grammar for series (like 2n+1, as accepted by :nth-child() and friends) is

nth
  : S* [ ['-'|'+']? INTEGER? {N} [ S* ['-'|'+'] S* INTEGER ]? |
         ['-'|'+']? INTEGER | {O}{D}{D} | {E}{V}{E}{N} ] S*
  ;

Currently, any whitespace will be rejected by the parser, as it expects a single token followed by )
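
A sketch of a parser for this grammar that tolerates whitespace where the grammar allows it. It works on the raw argument string with a regular expression, which is an assumption made for brevity; cssselect itself works on a token stream. The name parse_series is used here only for the example.

```python
import re

NTH_RE = re.compile(r'''
    \s*
    (?:
        (?P<a>[-+]?\d*)n (?:\s*(?P<sign>[-+])\s*(?P<b>\d+))?
      | (?P<int>[-+]?\d+)
      | (?P<odd>odd)
      | (?P<even>even)
    )
    \s*
''', re.VERBOSE | re.IGNORECASE)


def parse_series(text):
    """Return (a, b) for a series written as an+b, an integer, odd or even."""
    match = NTH_RE.fullmatch(text)
    if match is None:
        raise ValueError('Invalid series: %r' % text)
    if match.group('odd'):
        return 2, 1
    if match.group('even'):
        return 2, 0
    if match.group('int') is not None:
        return 0, int(match.group('int'))
    a_text = match.group('a')
    # A bare "n", "+n" or "-n" means a coefficient of 1 or -1.
    a = int(a_text + '1') if a_text in ('', '+', '-') else int(a_text)
    b = int(match.group('sign') + match.group('b')) if match.group('b') else 0
    return a, b


print(parse_series(' 2n + 1 '))
print(parse_series('-n+3'))
```

Whitespace between the integer and n (as in "2 n") is still rejected, since the grammar does not allow it.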

CSS selector finds nothing with invalid HTML

Since this example has invalid HTML, feel free to ignore this issue.

Anyway, here it is (simplified from http://www.weheart.co.uk/2013/02/18/alley-oop-design-exhibition/):

import cssselect
import lxml.html

d = lxml.html.document_fromstring('''
<!DOCTYPE html>
<html/>
<body></body>
''')

t = cssselect.HTMLTranslator()

print(d.xpath(t.css_to_xpath('body')))
print(d.xpath(t.css_to_xpath('body', prefix='//')))

Just a bit unexpected that the first XPath query doesn't find anything.

Move docs to readthedocs

Procedure to build and upload docs to https://pythonhosted.org/cssselect/ used to be:

(pip install sphinx)
python setup.py build_sphinx
python setup.py upload_sphinx

But now you get:

$ python setup.py upload_sphinx
running upload_sphinx
Submitting documentation to https://upload.pypi.org/legacy/
Upload failed (410): Uploading documentation is no longer supported, we recommend using https://readthedocs.org/.

For 1.0.0, I had to manually create the zip file from the docs _build/html folder and upload it with PyPI's web interface.

Drop Python 3.1 support

What do you think about dropping Python 3.1 support?

  • I doubt anybody uses Python 3.1 in practice;
  • Travis can't run Python 3.1 tests;
  • tox also doesn't support Python 3.1 and can't run cssselect tests under Python 3.1.

Web Scraping Youtube Playlist Information

I'm trying to extract information from YouTube, but when I try to parse an element:

from requests_html import HTMLSession

URL = "https://www.youtube.com/user/Urbanroosters/playlists"
with HTMLSession() as session:
    request = session.get(URL)

body = request.html.find('div id="items" class="style-scope ytd-grid-renderer"><ytd-grid-playlist-renderer class="style-scope ytd-grid-renderer" lockup=""')

the following error is displayed:

Traceback (most recent call last):
  File "/Users/JGTB/opt/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 7, in
    body = request.html.find('div id="items" class="style-scope ytd-grid-renderer"><ytd-grid-playlist-renderer class="style-scope ytd-grid-renderer" lockup=""')
  File "/Users/JT/opt/anaconda3/lib/python3.7/site-packages/requests_html.py", line 212, in find
    for found in self.pq(selector)
  File "/Users/JGTB/opt/anaconda3/lib/python3.7/site-packages/pyquery/pyquery.py", line 300, in __call__
    result = self._copy(*args, parent=self, **kwargs)
  File "/Users/JGTB/opt/anaconda3/lib/python3.7/site-packages/pyquery/pyquery.py", line 286, in _copy
    return self.__class__(*args, **kwargs)
  File "/Users/JGTB/opt/anaconda3/lib/python3.7/site-packages/pyquery/pyquery.py", line 271, in __init__
    xpath = self._css_to_xpath(selector)
  File "/Users/JGBT/opt/anaconda3/lib/python3.7/site-packages/pyquery/pyquery.py", line 282, in _css_to_xpath
    return self._translator.css_to_xpath(selector, prefix)
  File "/Users/JGTB/opt/anaconda3/lib/python3.7/site-packages/cssselect/xpath.py", line 192, in css_to_xpath
    for selector in parse(css))
  File "/Users/JGTB/opt/anaconda3/lib/python3.7/site-packages/cssselect/parser.py", line 415, in parse
    return list(parse_selector_group(stream))
  File "/Users/JGBT/opt/anaconda3/lib/python3.7/site-packages/cssselect/parser.py", line 428, in parse_selector_group
    yield Selector(*parse_selector(stream))
  File "/Users/JGTB/opt/anaconda3/lib/python3.7/site-packages/cssselect/parser.py", line 454, in parse_selector
    next_selector, pseudo_element = parse_simple_selector(stream)
  File "/Users/JGTB/opt/anaconda3/lib/python3.7/site-packages/cssselect/parser.py", line 545, in parse_simple_selector
    "Expected selector, got %s" % (peek,))
  File "", line unknown
SelectorSyntaxError: Expected selector, got <DELIM '=' at 6>

HTML :enabled and :disabled are not quite conformant

These should match :enabled, but currently do not:

li elements that are children of menu elements, and that have a child element that defines a command, if the first such element's Disabled State facet is false (not disabled)

(Similarly for :disabled with Disabled State facet is true (disabled))

Form elements should be considered disabled

... if its disabled attribute is set, or if it is a descendant of a fieldset element whose disabled attribute is set and is not a descendant of that fieldset element's first legend element child, if any.

The last part was skipped, so the current implementation is:

... if its disabled attribute is set, or if it is a descendant of a fieldset element whose disabled attribute is set
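
The skipped part of the rule can be sketched in plain Python over an ElementTree document. This is an illustration of the rule as quoted above, not how cssselect expresses it in XPath; the function is_disabled and the parent_map argument are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_disabled(element, parent_map):
    """Apply the quoted rule, including the first-legend exception."""
    if element.get('disabled') is not None:
        return True
    child = element
    ancestor = parent_map.get(child)
    while ancestor is not None:
        if ancestor.tag == 'fieldset' and ancestor.get('disabled') is not None:
            # find() returns the first direct legend child, if any.
            first_legend = ancestor.find('legend')
            # Content reached through the first legend is exempt.
            if child is not first_legend:
                return True
        child = ancestor
        ancestor = parent_map.get(child)
    return False


doc = ET.fromstring(
    '<form><fieldset disabled="">'
    '<legend><input id="in-legend"/></legend>'
    '<legend><input id="in-second-legend"/></legend>'
    '<input id="plain"/>'
    '</fieldset><input id="outside"/></form>')
parent_map = {child: parent for parent in doc.iter() for child in parent}
by_id = {el.get('id'): el for el in doc.iter('input')}

for name in ('in-legend', 'in-second-legend', 'plain', 'outside'):
    print(name, is_disabled(by_id[name], parent_map))
```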

Incorrect use of XPath name() function

The use of the name() function for matching tags breaks with documents that have a default namespace or multiple namespace prefixes mapping to the same namespace.

For example,

The CSS selector

h|p + h|p

becomes

descendant-or-self::h:p/following-sibling::*[name() = 'h:p' and (position() = 1)]

When this query is run on an XHTML document it will produce no matches, because the name() function returns "p". Similarly, if it is run on a document that defines the XHTML namespace with a prefix other than h, it will fail.

A possible solution is to have the css_to_xpath function take a namespaces argument that contains a mapping of prefixes to URIs and then use local-name() and namespace-uri() instead of name(). The argument can default to None, in which case it can use the present behavior, for backward compatibility.

See http://lenzconsulting.com/namespaces-in-xslt/#perils_of_the_name_function for more details on the problems caused by using the name() function.
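
A rough sketch of what the proposed predicate could look like. The function element_test and its namespaces parameter are hypothetical and only mirror the suggestion above; quoting of special characters in names and URIs is not handled.

```python
def element_test(prefix, local_name, namespaces=None):
    """Build the XPath test for an element, with or without a prefix map."""
    if namespaces is None:
        # Present behaviour, kept for backward compatibility.
        qname = '%s:%s' % (prefix, local_name) if prefix else local_name
        return "name() = '%s'" % qname
    uri = namespaces[prefix]
    return "local-name() = '%s' and namespace-uri() = '%s'" % (local_name, uri)


print(element_test('h', 'p'))
print(element_test('h', 'p', {'h': 'http://www.w3.org/1999/xhtml'}))
```

The second form does not depend on which prefix, if any, the document itself uses for the namespace.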

Support for relational pseudo-class :has()

CSS Selectors Level 4 (still in draft) introduce the :has() pseudo-class:

The relational pseudo-class, :has(), is a functional pseudo-class taking a relative selector list as an argument. It represents an element if any of the relative selectors, when absolutized and evaluated with the element as the :scope elements, would match at least one element.

For example, the following selector matches only <a> elements that contain an <img> child:
a:has(> img)
The following selector matches a <dt> element immediately followed by another <dt> element:
dt:has(+ dt)

Although no browser seems to support this yet, it looks like it is here to stay (I may be wrong).

It would be interesting to support this to get a bit more flexibility on predicates (e.g. testing children elements).
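
To show what the first example is expected to match, here is a small sketch using the standard library's ElementTree in place of a selector engine. The sample document and variable names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<div>'
    '<a id="with-img"><img/></a>'
    '<a id="nested"><span><img/></span></a>'
    '<a id="plain">text</a>'
    '</div>')

# Equivalent of a:has(> img): the img must be a direct child.
has_img_child = [a.get('id') for a in doc.iter('a')
                 if a.find('img') is not None]

# Equivalent of a:has(img): any descendant img counts.
has_img_descendant = [a.get('id') for a in doc.iter('a')
                      if a.find('.//img') is not None]

print(has_img_child)
print(has_img_descendant)
```

In XPath terms the child form corresponds to a predicate such as a[img], which suggests the translation is at least expressible.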

Project maintenance

I’m not really interested in cssselect anymore. I think the approach of "translating" selectors to XPath is fundamentally flawed (see #12 for example). I’ve started cssselect2 which implements Selectors "for real", but it’s blocked on deciding what kind of tree it works on.

I’ve also kind of moved on from Python; I mostly work with Rust nowadays.

Still, some people seem to be interested in cssselect. @redapple, @Dobz, @kmike, @bukzor, @sjp, @kovidgoyal, or anyone, would you be interested in maintaining it? I can give push access to this repository and to PyPI.

Some files aren't recorded into plist with --record option

The FreeBSD port is failing:

===> Checking for items in STAGEDIR missing from pkg-plist
Error: Orphaned: %%PYTHON_SITELIBDIR%%/cssselect/__init__.pyc
Error: Orphaned: %%PYTHON_SITELIBDIR%%/cssselect/parser.pyc
Error: Orphaned: %%PYTHON_SITELIBDIR%%/cssselect/xpath.pyc
===> Checking for items in pkg-plist which are not in STAGEDIR

These files aren't recorded by --record.

Version 0.9.1

[attr~='']

http://www.w3.org/TR/selectors/#attribute-selectors

[att~=val]
Represents an element with the att attribute whose value is a whitespace-separated list of words, one of which is exactly "val". If "val" contains whitespace, it will never represent anything (since the words are separated by spaces). Also if "val" is the empty string, it will never represent anything.

The empty-string or whitespace-only cases are not implemented. Similar issues for other attribute operators.
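
The quoted rule for ~= is small enough to state as a reference function. This is a plain-Python sketch of the expected matching behaviour, not a proposed XPath translation; the name matches_includes is hypothetical.

```python
import re

CSS_WHITESPACE = ' \t\n\r\f'


def matches_includes(attribute_value, val):
    """Return True if [att~=val] should match the given attribute value."""
    if attribute_value is None:
        return False
    # An empty val, or one containing whitespace, never matches anything.
    if val == '' or any(ch in CSS_WHITESPACE for ch in val):
        return False
    words = [w for w in re.split('[ \t\n\r\f]+', attribute_value) if w]
    return val in words


print(matches_includes('foo bar', 'foo'))
print(matches_includes('foo bar', ''))
print(matches_includes('foo bar', 'foo bar'))
```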

Support :nth-child(An+B of S)

The current CSS 4 draft has :nth-child(An+B [of S]? ), extending :nth-child(An+B)

The :nth-child(An+B [of S]? ) pseudo-class notation represents the An+Bth element that matches the selector list S among its inclusive siblings.
The CSS Syntax Module [CSS3SYN] defines the An+B notation. If S is omitted, it defaults to *.

By passing a selector argument, we can select the Nth element that matches that selector. For example, the following selector matches the first three “important” list items, denoted by the .important class:
:nth-child(-n+3 of li.important)
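
The semantics can be sketched over a plain list of siblings: filter by S first, then count positions within the filtered list. The tuple layout (id, tag, classes) and the function names are assumptions made for this example.

```python
def matches_an_plus_b(index, a, b):
    """True if the 1-based index equals a*n + b for some integer n >= 0."""
    if a == 0:
        return index == b
    n, remainder = divmod(index - b, a)
    return remainder == 0 and n >= 0


def nth_child_of(siblings, a, b, predicate=lambda item: True):
    """Select the An+B-th items among the siblings that satisfy S."""
    matching = [item for item in siblings if predicate(item)]
    return [item for position, item in enumerate(matching, start=1)
            if matches_an_plus_b(position, a, b)]


siblings = [
    ('one', 'li', {'important'}),
    ('two', 'li', set()),
    ('three', 'li', {'important'}),
    ('four', 'p', {'important'}),
    ('five', 'li', {'important'}),
    ('six', 'li', {'important'}),
]


def is_important_li(item):
    return item[1] == 'li' and 'important' in item[2]


# :nth-child(-n+3 of li.important)
selected = [item[0] for item in nth_child_of(siblings, -1, 3, is_important_li)]
print(selected)
```

With the default predicate, the function reduces to the existing :nth-child(An+B) behaviour.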

Example in docs wrong

Was just fiddling with the cssselect 0.2 package from PyPI. Noticed that the docs
claim a result that doesn't appear to be current/correct. In docs:

>>> from cssselect import css_to_xpath
>>> expression = css_to_xpath('div.content')
>>> expression
"descendant-or-self::div[contains(concat(' ', normalize-space(@class), ' '), ' content ')]"

What I got:

u"descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' content ')]"

(Differences: The u for unicode string and @class and expression.)

ID selector syntax

The spec’ed syntax for ID selectors is # followed by an identifier, not any hash token. See the "type" flag on hash tokens in css-syntax.

:nth-child incorrect with + or ~

As noted in #4, the current implementation of :nth-child and related selectors is incorrect when used after a + or ~ combinator: the selector e ~ f:nth-child(3) is translated to XPath e/following-sibling::*[name() = 'f' and (position() = 3)] which is wrong: it finds the 3rd element after e, not the third child of its parent.

Test case:

diff --git a/cssselect/tests.py b/cssselect/tests.py
index 796537b..d1dc9fa 100755
--- a/cssselect/tests.py
+++ b/cssselect/tests.py
@@ -516,7 +516,8 @@ class TestCssselect(unittest.TestCase):
         assert pcss(':lang("EN")', '*:lang(en-US)', html_only=True) == [
             'second-li', 'li-div']
         assert pcss(':lang("e")', html_only=True) == []
-        assert pcss('li:nth-child(3)') == ['third-li']
+        assert pcss('li:nth-child(3)',
+                    '#first-li ~ :nth-child(3)') == ['third-li']
         assert pcss('li:nth-child(10)') == []
         assert pcss('li:nth-child(2n)', 'li:nth-child(even)',
                     'li:nth-child(2n+0)') == [
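
The difference between the two readings can be shown with a plain list standing in for the children of one parent. The ids below are invented for the illustration and do not reproduce the test document.

```python
children = ['first-li', 'second-li', 'third-li', 'fourth-li', 'fifth-li']

start = children.index('first-li')
following = children[start + 1:]

# What the current XPath computes: the 3rd following sibling.
buggy = following[3 - 1:3]

# What "#first-li ~ :nth-child(3)" means: a following sibling that is
# the 3rd child of its parent.
correct = [child for child in following if children.index(child) + 1 == 3]

print(buggy)
print(correct)
```

In XPath, a position among the parent's children can be expressed by counting preceding siblings, for example count(preceding-sibling::*) = 2, which does not depend on the axis used to reach the element.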

:nth-last-child selector incorrectly starts at 0 instead of 1

Using cssselect 0.9.1, the :nth-last-child selector starts from 0 instead of 1, that is, :nth-last-child(0) selects the last element when it should instead select nothing.

Here is an example using lxml:

import lxml.html

html_fragment = lxml.html.fromstring("""
<div>
    <p>First</p>
    <p>Second</p>
    <p>Second Last</p>
    <p>Last</p>
</div>
""")

for element in html_fragment.cssselect("div > p:nth-last-child(1)"):
    print(element.text_content())

print()

for element in html_fragment.cssselect("div > p:nth-last-child(2n)"):
    print(element.text_content())

Output:

Second Last

Second
Last

Expected output:

Last

First
Second Last
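
For reference, the expected output above can be reproduced with a small pure-Python model of :nth-last-child(), counting positions from the end and starting at 1. It uses the standard library's ElementTree and hypothetical helper names, so it says nothing about how the XPath translation should be fixed.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring(
    '<div>'
    '<p>First</p>'
    '<p>Second</p>'
    '<p>Second Last</p>'
    '<p>Last</p>'
    '</div>')


def matches_an_plus_b(index, a, b):
    """True if the 1-based index equals a*n + b for some integer n >= 0."""
    if a == 0:
        return index == b
    n, remainder = divmod(index - b, a)
    return remainder == 0 and n >= 0


def nth_last_child(parent, a, b):
    """Return the text of children whose index from the end matches an+b."""
    children = list(parent)
    total = len(children)
    return [child.text
            for position, child in enumerate(children, start=1)
            if matches_an_plus_b(total - position + 1, a, b)]


print(nth_last_child(div, 0, 1))   # :nth-last-child(1)
print(nth_last_child(div, 2, 0))   # :nth-last-child(2n)
print(nth_last_child(div, 0, 0))   # :nth-last-child(0)
```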
