
fb55 / css-select


a CSS selector compiler & engine

Home Page: http://feedic.com/css-select/

License: BSD 2-Clause "Simplified" License

TypeScript 100.00%
css-selector cssselect javascript dom-structure html dom htmlparser2

css-select's People

Contributors

andolf, bitdeli-chef, chriseppstein, dbuezas, delgan, dependabot-preview[bot], dependabot[bot], eford36, eightarmcode, eps1lon, fb55, ganeshv, greenkeeper[bot], greenkeeperio-bot, jannispl, jaspreet57, jugglinmike, kevva, kirbysayshi, lrosemberg, nrkn, phated, sqs, webreflection


css-select's Issues

Issue in the version 0.3.5

<html>
<div id="first">
<div class="second main">
<h1 class="title">Title</h1>
<p class="para">Para 1</p>
</div>
</div>
<p class="para">foo</p>
</html>

var pNode = $('#first .para');

The length of pNode is 0. With the previous version of CSSselect (0.3.1) this worked, i.e. the length of pNode was 1. The only change I can see is that CSSselect has been updated from 0.3.1 to 0.3.5.

I am using Cheerio version 0.10.4.
The cheerio-select dependency has been updated to CSSselect 0.3.5.

Can’t use :contains with unbalanced parentheses in text

(I figure this is a css-select issue rather than a css-what issue, since :contains isn’t a real CSS selector, but if not, I can re-file it there.)

Trying to match an element containing text which includes unbalanced parentheses causes a syntax error. Here’s an example:

const cssSelect = require("css-select");

const input = [
  ":contains(hello)",
  ":contains(hello ())",
  ":contains('hello ()')",
  ":contains(\"hello ()\")",
  ":contains('hello (')",
  ":contains(\"hello (\")",
  ":contains(hello \\()",
];

for (const s of input) {
  try {
    console.log(s, typeof cssSelect.compile(s));
  } catch (e) {
    console.error(s, e.message);
  }
}

Output:

:contains(hello) function
:contains(hello ()) function
:contains('hello ()') function
:contains("hello ()") function
:contains('hello (') parenthesis not matched
:contains("hello (") parenthesis not matched
:contains(hello \() parenthesis not matched

Here’s one of the errors in full:

:contains('hello (') SyntaxError: parenthesis not matched
    at parseSelector (/firharvesting/puppeteer/node_modules/css-what/index.js:219:14)
    at parse (/firharvesting/puppeteer/node_modules/css-what/index.js:82:13)
    at compileUnsafe (/firharvesting/puppeteer/node_modules/css-select/lib/compile.js:31:14)
    at Function.compile (/firharvesting/puppeteer/node_modules/css-select/lib/compile.js:20:13)
    at Object.<anonymous> (/firharvesting/puppeteer/test.js:12:30)
    at Module._compile (internal/modules/cjs/loader.js:654:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:665:10)
    at Module.load (internal/modules/cjs/loader.js:566:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:506:12)
    at Function.Module._load (internal/modules/cjs/loader.js:498:3)
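
The failing inputs all have a parenthesis inside quotes or behind a backslash, which suggests the argument scanner counts parentheses without tracking quoting. As a rough sketch of a quote-aware scan (an illustration only, not css-what's actual parser; `findClosingParen` is a hypothetical helper):

```javascript
// Hypothetical helper: given a string and the index just after an opening "(",
// return the index of the matching ")" or -1 if there is none.
// Parentheses inside quotes, and backslash-escaped characters, are ignored.
function findClosingParen(str, start) {
    let depth = 1;
    let quote = null; // the active quote character, if any

    for (let i = start; i < str.length; i++) {
        const ch = str[i];

        if (ch === "\\") {
            i++; // skip the escaped character
        } else if (quote) {
            if (ch === quote) quote = null;
        } else if (ch === '"' || ch === "'") {
            quote = ch;
        } else if (ch === "(") {
            depth++;
        } else if (ch === ")" && --depth === 0) {
            return i;
        }
    }

    return -1;
}

const selector = ":contains('hello (')";
const open = selector.indexOf("(") + 1;
console.log(selector.slice(open, findClosingParen(selector, open)));
```

With this approach the quoted and escaped variants from the list above all yield a complete argument, while a genuinely unbalanced input still reports no match.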

Adapter API: Return for not found/empty

It would be helpful to clarify what an adapter function should return
when no element is found or the result is empty.

For arrays this is most likely just an empty array ([]).
When a single node/element is expected, it is less clear:
getParent will usually hit this case for the root node (which has no parent), and
getText could return an empty string or a non-string value (such as null)
to indicate that the node has no text at all.

Adapter methods where this should be clarified:
getAttributeValue; getName; getParent; getText; findOne
What should these return if nothing (no element, attribute or attribute value) was found:
false, null, undefined, an empty string, or generally anything falsy?
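
One possible convention, sketched over plain objects. It is an assumption for illustration, not documented css-select behaviour: `null` for a missing node, `undefined` for a missing attribute, and an empty string for missing text. The node shape (`name`, `attribs`, `children`, `parent`, `data`) is hypothetical.

```javascript
// Hypothetical node shape: elements have `name`, `attribs`, `children`;
// text nodes have `data`; every node may have `parent`.
function getParent(node) {
    return node.parent || null; // root node: no parent -> null
}

function getAttributeValue(elem, name) {
    const attribs = elem.attribs || {};
    // Missing attribute -> undefined, so it stays distinguishable
    // from an attribute that is present but empty ("").
    return Object.prototype.hasOwnProperty.call(attribs, name)
        ? attribs[name]
        : undefined;
}

function getText(node) {
    if (typeof node.data === "string") return node.data;
    // An element without any text nodes yields "" rather than null.
    return (node.children || []).map((child) => getText(child)).join("");
}

const root = { name: "div", attribs: { id: "first", hidden: "" }, children: [] };
root.children.push({ data: "Title", parent: root });

console.log(getParent(root), getAttributeValue(root, "class"), getText(root));
```

Keeping "absent" (`undefined`) apart from "empty" (`""`) matters for selectors such as `[hidden]`, which should match an attribute that exists with an empty value.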

Add element state support (e.g. :hover, :active)

It would be helpful if :hover and :active were added as supported pseudo-classes, even if they simply never match (falseFunc).
It would be even better if the adapter API were extended to handle state (e.g. getState(...)),
so that states such as link states (hovered link, visited link, ...) could also be matched properly.

How did you publish module with Uppercase in npm?

Hi @fb55,

A few days ago I tried to publish my module on npm, and it was refused with a message like "package name must be lowercase". Now I type npm install CSSselect and it works... How did you manage to publish your package without uglifying its name?

Regards.

custom HTML gets interpreted as pseudo selector

Hello! I'm trying to parse HTML provided through Atlassian Confluence, which uses custom HTML elements like <ac:structured-macro ac:name="details">.

When selecting this element with a selector like ac:structured-macro I get the following error:

SyntaxError: unmatched pseudo-class :structured-macro
at module.exports.compile (as pseudo) in css-select/lib/pseudos.js — line 388

Is there an option where I can allow this kind of HTML and/or escape the colon?

Thank you!
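
In CSS syntax a backslash escapes a character that would otherwise be special, so the colon can be written as `\:` inside the tag name. Whether a given css-select/css-what version honours the escape should be checked against that version. A small sketch for building such a selector (the helper name `escapeName` is made up for this example):

```javascript
// Hypothetical helper: prefix every character that is not an ASCII letter,
// digit, hyphen or underscore with a backslash so it is read as part of the name.
function escapeName(name) {
    return name.replace(/[^a-zA-Z0-9_-]/g, "\\$&");
}

// Note the doubled backslash when the selector is written as a JS string literal:
const byHand = "ac\\:structured-macro";
console.log(escapeName("ac:structured-macro") === byHand);
```

The same escaping applies to other characters with a meaning in selectors, such as the dot in an XML tag name like `android.widget.TextView`.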

Support overloaded pseudo-classes e.g. input:text and div:text(...)

TL;DR: I've implemented a :text() filter which matches an element's text content either by string equality:

span:text("IMDb")
span:text('IMDb')

or pattern matching (with optional flags):

span:text(/^IMDb$/i)

I'd like to use it with Cheerio, but css-select won't let me do it without monkey-patching.


I'm trying to move away from XPath to CSS3/Cheerio/jQuery as much as possible, and the main thing holding me back is text selection e.g. in XPath I can easily select an element by its text content:

//span[text()="IMDb"]

But it's a pain to do in Cheerio/jQuery &c:

$('span').filter(function () { return $(this).text() === 'aaaagh!' })

There's also :contains(...) and :icontains(...), but neither of them works very well on unstructured markup. Identifying nodes by their text content is usually a last resort when there are few (or no) classes or IDs to rely on, so, IMO, we need all the precision we can get.

css-select already provides a jQuery-style :text selector for textboxes, but for various reasons I don't want to change the name of my selector:

  • it matches jQuery's $.text() method
  • it matches XPath's [text()] predicate
  • it matches text!

On my current project, I can happily override the built-in :text pseudo since I don't use it, but I'd like the filter to be available on other projects without hackery and without breaking input:text. Currently, adding parens to the input:text pseudo blows up because it's meant to be parameterless, so supporting both wouldn't break any existing code.

By exposing the filters and pseudos, css-select almost allows me to use both (it happily parses anything I throw at it between those parentheses 👍), but the strict argument verification (verifyArgs) gets in the way i.e. what I want to do is install a callback in filters['text'], and inside the callback either return my function if there's a parameter, or the original function if there isn't:

filters.text = function (next, param) {
    if (param == null) { // input:text
        let fn = pseudos.text

        if (next === boolbase.trueFunc) {
            return fn
        }

        return function text (elem) {
            return fn(elem, param) && next(elem)
        }
    } else { // span:text(...)
        let parsed = parseParam(param)

        if ({}.toString.call(parsed) === '[object RegExp]') {
            return function text (elem) {
                return next(elem) && parsed.test(getText(elem))
            }
        } else {
            return function text (elem) {
                return next(elem) && parsed === getText(elem)
            }
        }
    }
}
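
`parseParam` and `getText` are not shown in the issue. A hypothetical `parseParam` that turns the raw argument into either a string or a `RegExp` might look like this (a sketch that handles only the three forms listed above, and assumes the argument arrives with its quotes or slashes intact):

```javascript
// Hypothetical helper, not part of css-select.
function parseParam(param) {
    const raw = param.trim();

    // /pattern/flags -> RegExp
    const regex = /^\/(.+)\/([a-z]*)$/.exec(raw);
    if (regex) {
        return new RegExp(regex[1], regex[2]);
    }

    // "text" or 'text' -> text without the surrounding quotes
    const first = raw[0];
    const last = raw[raw.length - 1];
    if (raw.length >= 2 && (first === '"' || first === "'") && last === first) {
        return raw.slice(1, -1);
    }

    // Anything else is compared as-is.
    return raw;
}

console.log(parseParam('"IMDb"'), parseParam("/^IMDb$/i"));
```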

This works fine for the filter, but blows up when I try to use the pseudo:

SyntaxError: pseudo-selector :text requires an argument

There are a couple of ways round this:

  1. add the filter to css-select as a new built-in and special case it inside verifyArgs as is currently done for scope
  2. allow functions to signal to compile/verifyArgs that their args shouldn't be verified
  3. add a new export type alongside filters and pseudos, e.g. plugins, which is checked before filters and isn't subject to arg verification.

2) could be something like:

function verifyArgs (func, name, subselect) {
    if (func.noVerifyArgs) {
        return;
    }
}

I'm happy to provide a PR for 1), but I think 2) and 3) provide more flexibility, and 2) is the smallest change and the quickest/easiest to implement.
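
Fleshed out slightly, option 2 could look like the sketch below. The arity check is a guess modelled on the error message quoted above, not a copy of css-select's source; `noVerifyArgs` is the flag name proposed in this issue, and the `type`/`text` fields on the element are invented for the example.

```javascript
function verifyArgs(func, name, subselect) {
    // Functions can opt out of verification (the proposed escape hatch).
    if (func.noVerifyArgs) {
        return;
    }

    if (subselect == null) {
        // Assumed rule: a function declaring a second parameter expects an argument.
        if (func.length > 1) {
            throw new SyntaxError("pseudo-selector :" + name + " requires an argument");
        }
    } else if (func.length === 1) {
        throw new SyntaxError("pseudo-selector :" + name + " doesn't have any arguments");
    }
}

// An overloaded pseudo that accepts both input:text and span:text(...).
function text(elem, param) {
    return param == null ? elem.type === "text" : elem.text === param;
}
text.noVerifyArgs = true;

verifyArgs(text, "text", null); // passes: input:text
verifyArgs(text, "text", "IMDb"); // passes: span:text("IMDb")
```

Functions without the flag keep the current strict behaviour, so existing pseudos would be unaffected.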

depth option

Hey man,

It'd be great to have an option to specify how deep the selection engine will crawl through the children. I think this would be a nice option at the selection level because it could improve performance for certain queries.

For my use case, I'm implementing $('#fruit').children('li'), which only needs to search one level deep.
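
As a sketch of what such an option could mean, here is a depth-limited traversal over plain objects. The `maxDepth` parameter and the node shape (`name`, `children`) are assumptions for illustration; css-select's real traversal goes through its DOM utilities.

```javascript
// Collect every node passing `test`, descending at most `maxDepth` levels.
// maxDepth = 1 inspects only the nodes that were passed in.
function findAll(test, nodes, maxDepth = Infinity) {
    const result = [];
    for (const node of nodes) {
        if (test(node)) result.push(node);
        if (maxDepth > 1 && node.children) {
            result.push(...findAll(test, node.children, maxDepth - 1));
        }
    }
    return result;
}

const fruit = {
    name: "ul",
    children: [
        { name: "li", children: [] },
        { name: "li", children: [{ name: "ul", children: [{ name: "li", children: [] }] }] },
    ],
};
const isLi = (node) => node.name === "li";

// Equivalent of $('#fruit').children('li'): search the direct children only.
console.log(findAll(isLi, fruit.children, 1).length);
```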

"use strict" doesn't seem supported in node 0.4.x

Tested on node version 0.4.11, and 0.4.7

/Users/MattMueller/Node Projects/cheerio-select/node_modules/CSSselect/index.js:485
    class: function(next, data){
 ^^^^^

node.js:134
    throw e; // process.nextTick error, or 'error' event on first tick
    ^
SyntaxError: Unexpected strict mode reserved word

"* html" incorrectly matches elements

(tested using Cheerio & htmlparser2)

The selector * html will incorrectly match the <html> element, because it picks up the virtual type: 'root' element as a parent.
* * html will correctly not match anything.
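
The behaviour can be reproduced in a few lines. In this sketch (plain objects, not css-select's code), the universal selector `*` is satisfied by any ancestor unless it is restricted to tag nodes:

```javascript
const root = { type: "root", parent: null };
const html = { type: "tag", name: "html", parent: root };

const isTag = (node) => node.type === "tag";

// Does any ancestor of `elem` satisfy `test`?
function hasAncestor(elem, test) {
    for (let node = elem.parent; node; node = node.parent) {
        if (test(node)) return true;
    }
    return false;
}

// "* html" with a universal selector that accepts every node, including the root.
const loose = html.name === "html" && hasAncestor(html, () => true);

// The same selector when only tag nodes count as elements.
const strict = html.name === "html" && hasAncestor(html, isTag);

console.log(loose, strict);
```

The loose variant matches `<html>` through the virtual root, while the strict one does not, which is the result a browser would give.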

Adapter API: findOne also required?

Before the README addition, I ran css-select against a custom adapter implementation, and css-select also expected a findOne: ( test:Predicate, nodes:[Node] ) => elem:ElementNode function (see this line).
Is this function mandatory, or would css-select switch to an alternative method when it is not present?
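
If `findOne` turns out to be optional, a generic fallback could be derived from `isTag` and `getChildren` alone. A sketch, assuming the adapter exposes those two functions; whether css-select itself falls back like this is exactly the open question here, and `makeFindOne` is a made-up name:

```javascript
// Build a depth-first findOne from two adapter primitives.
function makeFindOne(adapter) {
    return function findOne(test, nodes) {
        for (const node of nodes) {
            if (!adapter.isTag(node)) continue;
            if (test(node)) return node;
            const match = findOne(test, adapter.getChildren(node));
            if (match) return match;
        }
        return null; // nothing matched
    };
}

// Toy adapter over plain objects, for demonstration only.
const toyAdapter = {
    isTag: (node) => typeof node.name === "string",
    getChildren: (node) => node.children || [],
};
toyAdapter.findOne = toyAdapter.findOne || makeFindOne(toyAdapter);
```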

Release?

Can you release the latest changes? I have a package I need to release that depends on it.

Publish 1.1.0

Hey Felix--I'm having some problems downstream, and I'm wondering if it was a bug fixed in 1.1.0. The latest version available on npm is 0.7.0. Could you publish the latest release?

How to select tags that have dots in their names

Sorry for the noob question...

I'm parsing xml and my tags have dots in them:

<android.widget.TextView index="1" text="Free for all join (good vibes no bs)" class="android.widget.TextView" content-desc="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" scrollable="false" long-clickable="false" password="false" selected="false" bounds="[190,63][1013,210]" instance="0"/>

How would I go about selecting those elements?

I've tried '<android.widget.TextView>' but it doesn't seem to work.

Allow swapping of domutils

To be able to use css-select against an AST that is different than the one from htmlparser2,
the internally used domutils should be swappable for a different module (but with same API).

Currently I have to use mocking to replace it:

[...]
var mock        = require('mock-require'),
    domutilsAlt = require('../domutils-alt');
mock('domutils', domutilsAlt);
var cssSelect   = require('css-select');
[...]

Case insensitive :contains?

Is there a way to do a case insensitive :contains? If not, would :icontains be an acceptable enhancement? I'm willing to do the coding.
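
A minimal sketch of what such a filter would compute, independent of how it gets registered with css-select. The `getText` callback and the `text` field are stand-ins for whatever text accessor the DOM layer provides:

```javascript
// Build a predicate that checks, ignoring case, whether an element's text contains `needle`.
function icontains(getText, needle) {
    const lowered = needle.toLowerCase();
    return (elem) => getText(elem).toLowerCase().includes(lowered);
}

const matchesWorld = icontains((elem) => elem.text, "WORLD");
console.log(matchesWorld({ text: "Hello World" }));
```

Lower-casing the needle once, when the predicate is built, avoids repeating that work for every element tested.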

please add tag for 1.3.0-rc0 release

The tag corresponding to the latest npmjs.com release is missing from this repo. Please add it. We prefer to use GitHub tarballs when creating Debian packages.

(Clarification) Adapter getSiblings(...)

From adapter section of updated README:

get the siblings of the node. Note that unlike jQuery's siblings method,
this is expected to include the current node as well

Does this mean that the current node itself (the one passed to getSiblings in order to get its siblings)
should be added at the start or end of the returned list of siblings, or somewhere in between? Does its position matter?

// to start of list?
getSiblings(passedNode)[...]
  return [ passedNode, sibling1, sibling2, sibling3, ... ]

// to end of list?
getSiblings(passedNode)[...]
  return [ sibling1, sibling2, sibling3, ..., passedNode ]

// its position doesn't matter?
getSiblings(passedNode)[...]
  return [ sibling1, sibling2, passedNode, sibling3, ..., ]
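
One reading of the README text, sketched on plain objects: return the parent's children unchanged, so the passed node keeps its document position. This is an assumption, but it looks like the safest choice, because positional selectors such as `+`, `~` and `:nth-child` depend on where a node sits among its siblings.

```javascript
// Hypothetical node shape: { parent, children }.
function getSiblings(node) {
    const parent = node.parent;
    // A node without a parent is its own only sibling.
    return parent ? parent.children : [node];
}

const list = { parent: null, children: [] };
const first = { parent: list };
const second = { parent: list };
const third = { parent: list };
list.children.push(first, second, third);

console.log(getSiblings(second).indexOf(second));
```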

More consistent signatures for filters and pseudos?

There's a difference between the way the adapter is passed to pseudos:

pseudo(elem, adapter, subselect)

and the way it's passed to filters:

filter(next, subselect, { adapter, ... }, context)

This is both confusing (I can't think of any other APIs which present the same parameter in two different ways like this), and limiting, as it means compile options can't be passed down to pseudos.

I realize it's too late for css-select v2.0, but, given that it's not too late for Cheerio v1.0 (yet), how about making the API more consistent and flexible across these related function types?

At least by 1) using the options object for both:

filter(next, subselect, options, context)
pseudo(elem, options, subselect)

And possibly by 2) making the parameter ordering more consistent as well e.g.:

filter(next, context, subselect, options)
pseudo(elem, subselect, options)

or:

filter(subselect, options, next, context)
pseudo(subselect, options, elem)

Typically, this kind of shared parameter would be defined on this rather than passed around everywhere. This could be done by 3) making CSSselect a constructor and moving the function it currently exports to e.g. a select method:

const cssSelect = new CSSselect({ adapter })
cssSelect.select(...)

But even without going down that route (which would presumably entail a major rewrite), the adapter could still be 4) accessed via this.adapter e.g.:

filter.call(options, subselect, next, context)
pseudo.call(options, subselect, elem)

License?

Are you letting people use this code that you published here?

If so, please put an Open Source license on the code -- e.g. the MIT license is very common for this kind of software. Here's the text you can use. Just add a file called LICENSE, or add this text to your README, and we'll know that you are giving people permission to use your code. If this is your code, put in your name for the copyright. If you work for a company and they own your IP, then use your company's name (with permission of course).

Thanks!

---------------->

MIT License
Copyright (c) 2012 [YOUR NAME or YOUR COMPANY'S NAME]. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

:has pseudo-selector exceeds call stack

I get this error when selecting p:has(a) from the Sizzle tests:

18) select.psuedo-misc Has Children - :has():
    Maximum call stack size exceeded

Tested on node 0.6.6 and 0.7.7.

Describe how to use css-select with Cheerio?

I grepped the documentation for Cheerio and didn't find anything. In a Node.js environment, css-select would be predominantly useful in combination with Cheerio. As such, it would be useful to have general guidance about how the two projects interoperate.

Replace String.prototype.trimLeft usage

This is a slightly silly request, but you might want to remove your use of String.prototype.trimLeft().

It was the only issue I had getting CSSselect to work in IE (I'm doing in-browser screen-scraping using cornet), as IE doesn't implement the non-standard trimLeft method. It was obviously easy to work around by adding a simple polyfill, so I'm content if you just close this without fixing. Your call! :)
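
For reference, a fallback of the kind mentioned can be this small (a sketch; `trimLeft` here is a standalone function, with an optional polyfill installed only where the method is missing):

```javascript
// Strip leading whitespace without relying on String.prototype.trimLeft.
function trimLeft(str) {
    return String(str).replace(/^\s+/, "");
}

// Optional polyfill for environments that lack the method.
if (!String.prototype.trimLeft) {
    String.prototype.trimLeft = function () {
        return trimLeft(this);
    };
}

console.log(JSON.stringify(trimLeft("   div > p ")));
```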

:has and "immediately after" selector not working

I've tried something like

h3:has( + div)

trying to select all headers, but it didn't work. I suppose the + (immediately after) selector doesn't work like it does in jQuery?

Test html:

<html>
    <body>
    <h3>Test1</h3>
    <div id="foo">Test1</div>
    <h3>Test2</h3>
    <div id="bar">Test2</div>
    <h3>Test3</h3>
    <div id="baz">baz
      <ul> <li nn="1">item1</li> <li nn="2">item2</li> </ul>
    </div>
    </body>
</html>
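
If relative selectors inside `:has` are not supported by the installed version, the same result can be computed by hand. A sketch over plain objects shaped like the test HTML above (element nodes only; the `name`, `text` and `id` fields are assumptions for the example):

```javascript
// Element-only children of <body>, mirroring the test HTML.
const body = {
    name: "body",
    children: [
        { name: "h3", text: "Test1" },
        { name: "div", id: "foo" },
        { name: "h3", text: "Test2" },
        { name: "div", id: "bar" },
        { name: "h3", text: "Test3" },
        { name: "div", id: "baz" },
    ],
};

// True if the element directly after `elem` among `siblings` passes `test`.
function nextSiblingMatches(elem, siblings, test) {
    const next = siblings[siblings.indexOf(elem) + 1];
    return next !== undefined && test(next);
}

const isDiv = (node) => node.name === "div";
const headers = body.children.filter(
    (node) => node.name === "h3" && nextSiblingMatches(node, body.children, isDiv)
);

console.log(headers.map((h) => h.text));
```

In a real DOM the sibling list also contains text nodes (the whitespace between tags), so those would have to be skipped before looking at the next element.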

Some functions of adapter don't differ between implementations

Therefore you could require a smaller adapter surface and fill in the general functions if the adapter doesn't already implement them.

I wasn't sure if css-select is the best place to do this, or if it should be standalone, so I created this:

https://github.com/nrkn/css-select-base-adapter

@fb55 if you think this would be good in css-select I will add it, update the readme and submit a PR

Otherwise, maybe just add a link to the base adapter repo in the readme under adapters?
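
The idea can be sketched as a function that fills in derivable methods from a small core. The choice of core (`isTag`, `getChildren`, `getAttributeValue`), the convention that a missing attribute is `undefined`, and the helper name `withDefaults` are assumptions for illustration, not necessarily what css-select-base-adapter does.

```javascript
function withDefaults(core) {
    const generic = {
        // Derivable from getAttributeValue alone.
        hasAttrib(elem, name) {
            return core.getAttributeValue(elem, name) !== undefined;
        },
        // Derivable from isTag + getChildren.
        existsOne(test, nodes) {
            return nodes.some(
                (node) =>
                    core.isTag(node) &&
                    (test(node) || generic.existsOne(test, core.getChildren(node)))
            );
        },
        findAll(test, nodes) {
            const result = [];
            for (const node of nodes) {
                if (!core.isTag(node)) continue;
                if (test(node)) result.push(node);
                result.push(...generic.findAll(test, core.getChildren(node)));
            }
            return result;
        },
    };

    // Methods supplied by the adapter win over the generic ones.
    return Object.assign({}, generic, core);
}

const adapter = withDefaults({
    isTag: (node) => typeof node.name === "string",
    getChildren: (node) => node.children || [],
    getAttributeValue: (elem, name) => (elem.attribs || {})[name],
});
```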

[fix proposal] Super slow with multiple descendant tokens

for reference: cheeriojs/cheerio#1118

Since the descendant token has the option to "consume" an element or to skip to an ancestor, the time complexity explodes with selectors like .row div div div div. In these cases, for all elements that are not descendants of .row the SAME descendant function will be called many times on the same ancestors, resulting in exponential worst case time complexity (relative to number of descendant tokens in the selector).

I have two solutions that cache previous results, and both solve the issue. The fastest one is invasive (it adds an ID to the elements). The other solution uses a local Set to cache results, so it leaves the element object untouched, but it is twice as slow in node v9.2.0 (though not in v6.10.0).

file: lib/general.js

Slow solution:
This one does not mutate the element but requires node > 0.12.8 (or a polyfill for Set)

descendant: function(next){
	var isFalseCache = new Set();

	return function descendant(elem){
		var found = false;

		while(!found && (elem = adapter.getParent(elem))){
			if(!isFalseCache.has(elem)){
				found = next(elem);
				if(!found){
					isFalseCache.add(elem);
				}
			}
		}

		return found;
	};
},

Fast solution (but mutates the element by adding an ID to it):
This one does not add any measurable delay to normal cases in my test.

var elemId = 1;
...
descendant: function(next){
	var isFalseCache = [];

	return function descendant(elem){
		var found = false;

		while(!found && (elem = adapter.getParent(elem))){
			if(!isFalseCache[elem._id]){
				found = next(elem);
				if(!found){
					if(!elem._id) elem._id = elemId++;
					isFalseCache[elem._id] = true;
				}
			}
		}

		return found;
	};
},

(benchmarking done with around 3000 selectors across 30 pages)

If somebody could give me feedback I will make a PR.
Props to the devs btw, this lib is super clever!
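
To make the effect of the cache visible, the sketch below counts predicate calls for a selector of the form `.row div div div div` on a chain of nested divs with no `.row` ancestor. It is a simplified stand-in for lib/general.js (plain objects with `name`, `className` and `parent`; no adapter), using the Set-based variant from above.

```javascript
// Build ".row div ... div" (with `depth` div steps) as nested predicates,
// right to left, counting every predicate call in `counter.calls`.
function compile(depth, useCache, counter) {
    // Leftmost compound selector: .row
    let next = (elem) => {
        counter.calls++;
        return elem.className === "row";
    };

    for (let i = 0; i < depth; i++) {
        const inner = next;
        const isFalseCache = new Set();

        const descendant = (elem) => {
            let found = false;
            while (!found && (elem = elem.parent)) {
                // Skip ancestors already known not to match.
                if (useCache && isFalseCache.has(elem)) continue;
                found = inner(elem);
                if (!found && useCache) isFalseCache.add(elem);
            }
            return found;
        };

        next = (elem) => {
            counter.calls++;
            return elem.name === "div" && descendant(elem);
        };
    }

    return next;
}

// A chain of nested <div>s: the worst case described above.
function buildChain(length) {
    const nodes = [];
    let parent = null;
    for (let i = 0; i < length; i++) {
        const node = { name: "div", className: "", parent };
        nodes.push(node);
        parent = node;
    }
    return nodes;
}

function countCalls(useCache) {
    const counter = { calls: 0 };
    const matches = compile(4, useCache, counter);
    const results = buildChain(16).map((node) => matches(node));
    return { calls: counter.calls, results };
}

const naive = countCalls(false);
const cached = countCalls(true);
console.log("calls without cache:", naive.calls);
console.log("calls with cache:", cached.calls);
```

One caveat of either caching approach: the cache lives as long as the compiled function, so results could go stale if the tree is mutated between calls to the same compiled selector.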
