antonmedv / finder

CSS Selector Generator 🗺

License: MIT License

TypeScript 4.83% HTML 93.45% JavaScript 1.72%

finder's Introduction

finder

The CSS Selector Generator

Features

Generates the shortest selector
Unique selectors per page
Stable and robust selectors
2kB minified + gzipped

Install

npm install @medv/finder

Usage

import {finder} from '@medv/finder'

document.addEventListener('click', event => {
  const selector = finder(event.target)
  console.log(selector)  
})

Example

An example of a generated selector:

.blog > article:nth-child(3) .add-comment

Configuration

const selector = finder(event.target, {
  root: document.body,          // Root of search, defaults to document.body.
  idName: (name) => true,       // Check if this ID can be used.
  className: (name) => true,    // Check if this class name can be used.
  tagName: (name) => true,      // Check if tag name can be used.
  attr: (name, value) => false, // Check if attr name can be used.
  seedMinLength: 1,           
  optimizedMinLength: 2,
  threshold: 1000,
  maxNumberOfTries: 10_000,
  timeoutMs: undefined,
})

seedMinLength

Minimum number of levels used when finding a selector. Starts from 1. For more robust selectors, set this parameter to around 4-5, depending on the depth of your DOM tree. If finder hits the root, this parameter is ignored.

optimizedMinLength

Minimum length for optimising the selector. Starts from 2. For example, the selector body > div > div > p can be optimised to body p.

threshold

Maximum number of selectors to check before falling back to nth-child. Checking a selector for uniqueness is a very costly operation: a DOM tree of depth 5 with 5 classes on each level gives more than 3k selectors to check. The default of 1000 is good enough in most cases.
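The "more than 3k" figure can be reproduced with simple counting. This is only a sketch, assuming exactly one class is picked at each level and nothing else is considered; the function name is illustrative and not part of finder.

```typescript
// Counts candidate selectors when exactly one class is chosen at each level.
// Illustrative arithmetic only: the real candidate set depends on the tags,
// IDs, classes, and attributes each element actually exposes.
function countClassCombinations(depth: number, classesPerLevel: number): number {
  return Math.pow(classesPerLevel, depth)
}

console.log(countClassCombinations(5, 5)) // 3125
```

With the default threshold of 1000, a tree like this would exceed the limit under these assumptions.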

maxNumberOfTries

Max number of tries for the optimization. This is a trade-off between optimization and efficiency. Default 10_000 is good enough in most cases.

timeoutMs

Optional timeout in milliseconds. undefined (no timeout) by default. If timeoutMs: 500 is provided, an error will be thrown if selector generation takes more than 500ms.

Become a sponsor

Every line of code in my repositories 📖 signifies my unwavering commitment to open source 💡. Your support 🤝 ensures these projects keep thriving, innovating, and benefiting all 💼. If my work has ever resonated 🎵 or helped you, kindly consider showing love ❤️ by sponsoring. 🚀 Sponsor Me Today! 🚀

License

MIT

finder's Issues

Allow custom document pointers

I planned on using this for content within a TinyMCE editor document, which is another iframe, but the document reference is not configurable.

I would suggest a new configuration option:

finder(node, {
  document: editor.getDoc()
})

Support UMD/global or add usage notes to README

Can you please either support UMD/global or add some usage notes to the README? I assume you're expecting users to include this into source via build pipeline like webpack, but just to play around I tried including it via script and it fails with this error:

index.js:47 Uncaught ReferenceError: exports is not defined
    at index.js:47

Thanks!

Finder returns non-unique selector

On this page: https://www.buildzoom.com/contractor/live-wire-electric-1-llc

Calling finder on the element returned by this selector: "#credentials > div.team-wrap.row > div:nth-child(2) > div:nth-child(1) > div.biz_info.no-bottom-border.biz-info- > div.bz-content-wrapper > div > table > tbody > tr:nth-child(1) > td.table-item.u-hyphenate"

will make finder return the selector ".table-item", which is non-unique.

Selector was not found for all the elements on page

I am working on a POC to allow element style changes and previews, and the library throws the error "Selector was not found" for all the elements (event.target) on a specific page, even though the root and element values are passed to the finder function correctly.
Sample webpage where it fails: https://www.sanaullastore.com/
It returns the selector when the code runs as an extension, but fails when the same code is added via JavaScript.

Support using numbers as classnames or id values

document.querySelector('.123') => document.querySelector('[class="123"]')
document.querySelector('#123') => document.querySelector('[id="123"]')

https://benfrain.com/when-and-where-you-can-use-numbers-in-id-and-class-names/

Improvement: add second first-match fallback

Add a first-match fallback before the nth-child fallback:

Select first match of each level before moving to next.

Shadow DOM Support

Chrome Dev Tools can generate a "JS Path" for an element:

https://developers.google.com/web/updates/2018/11/devtools#copy

It would be great if finder could generate a Chrome Dev Tools-style JS Path for an element, that would be an important step to developing record/replay integration tests for shadow-dom enabled apps.

AFAICT This is a pre-requisite for checkly/headless-recorder#51

Use nth-of-type instead of nth-child when appropriate

Possibility to set extended CSS selector library

Hi there,

I'm using the SizzleJs (https://sizzlejs.com) library as an extension for my selectors.
Is it possible to extend the default finder selectors with it?

Thanks

IE 11 Not Supported

In IE 11, I get a syntax error that looks to be due to the use of an arrow function, which is not supported in IE 11.

[feature request] Add config option to include structural/semantic elements

So I'm not sure if this is out of scope for this project.

I'm not so much interested in the shortest selector as in the most meaningful one.

From that point of view, any of the following tags are interesting:
https://www.w3schools.com/html/html5_semantic_elements.asp

I'd like those to be always included in the selector if they are present in the ancestor list.

I guess these would have to be qualified if they are not unique, so article:nth-of-type(2).

Does this request make any sense or is it a different project?

Version without module

I love this library, but I am having a lot of trouble using it in some contexts, for example in a Chrome Extension I am implementing.

The fact that it is packaged as a module makes things more complicated.

Is it possible to have a version that does not use modules?

Thanks

Argument of type 'Node' is not assignable to parameter of type 'Element'.

In my case I'm getting a Node as my event target, but finder expects an Element, which makes TypeScript complain about it.

Ignore dynamic Angular attributes

Angular dynamically adds attributes called “_ngcontent-#” to elements by default.
As those attribute names are regenerated on every load, the finder should ignore these attributes while generating css pointers.

Example Angular 9
Pointer generated by your tool:
[_ngcontent-vrx-c260=“”] > [href=“mailto:[email protected]”]
after a reload the same elements pointer would need to be:
[_ngcontent-wok-c260=“”] > [href=“mailto:[email protected]”]

Different Angular versions have different ng-content id structures, but this would be fixed by generally ignoring all attributes that start with "_ngcontent".
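Until something like this is built in, the attr option from the Configuration section could be given a filter along these lines. This is a hedged sketch: the helper name and the prefix list are assumptions, and the "_nghost" prefix is added here on top of what the report mentions.

```typescript
// Prefixes of attributes that Angular generates for view encapsulation.
// "_ngcontent" comes from the report above; "_nghost" is an added assumption.
const GENERATED_ATTR_PREFIXES = ['_ngcontent', '_nghost']

// Returns true when the attribute name does not look auto-generated.
function isStableAttr(name: string): boolean {
  return !GENERATED_ATTR_PREFIXES.some((prefix) => name.startsWith(prefix))
}

// Possible use with the `attr` option:
//   finder(element, { attr: (name) => isStableAttr(name) })
```

The filter only looks at the attribute name, so values such as the href in the example above are still allowed.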

Selector not found

For some pages ->

  function sort(paths: Iterator<Path> & Iterable<Path>): Path[] {
    return Array.from(paths).sort((a, b) => penalty(a) - penalty(b))
  }

Output of "Array.from(paths)" is returning => [ ] (empty array)
Adding a polyfill substitute in this method is solving the issue ->
ex- code

function transformToArray <a>(a: Iterator<a>) {
  let result: a[] = []
  let current = a.next()
  while(current.done == false) {
    result.push(current.value)
    current = a.next()
  }
  return result
}

  function sort(paths: Iterator<Path> & Iterable<Path>): Path[] {
    return transformToArray(paths).sort((a, b) => penalty(a) - penalty(b))
  }

Let me know, I can raise a PR for the same.

Chrome extension

If someone wants to work on this. :)

maxNumberOfTries option does not exist in the bundled index.d.ts

The option 'maxNumberOfTries' exists in the documentation, but when I install the package through npm, the bundled package does not contain this option.

Consider caching/memoizing "unique" function

First of all, thank you for the library! I think this is the best most robust CSS query generator. I use it in a lot of my projects.

For one of my projects, I needed to generate CSS selectors for ALL DOM elements on the page. finder is also perfect for this specific case, but if the page is big, the time taken to generate a unique CSS selector for every DOM element adds up, to as much as 250 seconds, which is unusable for my use case.

After doing performance analysis, "unique" function is called a lot of times to figure out if selector is not unique, and it is called repeatedly with the same arguments. Caching this function drops execution time for that specific page (and other pages) from 250 seconds to under 1 second at the cost of memory.

something like this worked for me:

const cache = new Map();

  function unique(a) {
      const key = JSON.stringify(a);
      if (cache.has(key)) {
       return cache.get(key);
      } else {
        switch (rootDocument.querySelectorAll(selector(a)).length) {
            case 0:
                cache.set(key, !1);
                throw new Error(`Can't select any node with this selector: ${selector(a)}`);
            case 1:
                cache.set(key, !0);
                return !0;
            default:
                cache.set(key, !1);
                return !1;
        }
      }
  }

cssesc vs CSS.escape

Hi, Could you please tell me what is the difference between cssesc and CSS.escape ? Is it possible to replace cssesc with CSS.escape ?

Uncompiled `finder.ts` file is a target of compilation by typescript

I've never seen a .ts in an npm module before.

My typescript tries to typecheck it and complains, and I haven't been able to find a way to get it turned off. I'm not sure why a source file would belong in the published module anyhow-- could we remove it?

Error handling

Hey.

First off, thank you for creating Finder. It is awesome and it has been the backbone of some awesome features we have on our site.

We are having a small issue with how finder handles errors when it can't find nodes etc, because it throws them. This is a problem because it's picked up by Sentry, and it is creating a lot of noise in our error reporting dashboard.

Would you be open to adding an option for this, or changing the error handling so it's a little more graceful? I would personally be happy to write a PR for this if you accept the idea of the improvement.

Thanks again!
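In the meantime, callers can contain the exception themselves. The wrapper below is a minimal sketch, not part of finder: trySelector is a hypothetical helper, and generate stands in for a call such as (el) => finder(el).

```typescript
// Wraps any selector generator so that a thrown error becomes `null`.
// `generate` is a stand-in for something like `(el) => finder(el)`.
function trySelector<T>(generate: (target: T) => string, target: T): string | null {
  try {
    return generate(target)
  } catch {
    // Swallow the error so it never reaches global error reporting.
    return null
  }
}
```

Returning null keeps the failure visible to the caller without throwing, at the cost of losing the error message; logging it inside the catch block is an easy variation.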

Generating unexpected selector when specifying a root selector

Hi, for this test document, I am attempting to find a selector for <p>Line 1</p>. When no selector root is explicitly set, it returns:
#module > p

However, when I set the root selector to document.querySelector("#module"), it still returns the same thing. I would think it should return a selector that doesn't contain #module since we are using that as the selector root?

test("root:setandfind", (t) => {
  const html = `
  <div class="document">
  <div id="module">
    <span id="span1"></span>
    <p>Line 1</p>
    <div id="introduction">
      <h2>
        Introduction
      </h2>
      <p>Line 2</p>
    </div>
  </div>
</div>
  `;
  document.write(html);
  let rootelem = document.querySelector("#module");
  let pElement = rootelem.querySelector("p");

  const css = selector(pElement);
  console.log("Default selector (no options): ", css);

  const newcss2 = selector(pElement, { root: rootelem });
  console.log("Selector after setting root to (#module): ", newcss2);
});

Output:

Default selector (no options):  #module > p
Selector after setting root to (#module):  #module > p

As you can see, the selector still seems to reference the ID of the root selector.
Is this a bug or am I misunderstanding the behavior of setting the root selector?

Hang on generation

Generation hangs if you try to generate css selector for

Your location

element from the attached HTML file.

finder_issue.html.txt

root config doesn't allow Document

For an embedded iframe, I defined root: body but this then fails if I try to find the body element since body.querySelectorAll('body') returns an empty array. This scenario seems to be handled for the default case since the ownerDocument is returned for the rootDocument instead of the body. It seems like the root config should allow both Document and Element. Then the default for root could be updated to document and I think findRootDocument could then be removed.

As a workaround, I am passing the ownerDocument and ignoring the TypeScript error.

Forms with input named "id" causes cssesc to break

Can be tested with something like:

<form id="test">
  <input type="text" name="id">
</form>

In this case input.id in index.ts:168 will be the input element, not "test".
input.getAttribute('id') might work better.

Support for XML namespace prefixes in DOM Element names / tagnames (XHTML, SVG, MathML), with proposed fix

finder/finder.ts

Lines 220 to 229 in feef199

    
    function tagName(input: Element): Knot | null {
      const name = input.tagName.toLowerCase()
      if (config.tagName(name)) {
        return {
          name,
          penalty: 2,
        }
      }
      return null
    }

My current workaround is:

{
name: name.replace(/^(.+:)(.+)$/, "*|$2"),
}

Examples:

m:math ==> *|math
svg:a ==> *|a
div ==> div

Unfortunately the wildcard * prefix is necessary because the querySelector() API does not support namespace prefixes, unlike CSS stylesheets:

https://developer.mozilla.org/en-US/docs/Web/SVG/Element/a

...consequently, for example *|a will match not only SVG link elements but also HTML hyperlinks! :(

https://www.w3.org/TR/selectors-api/#namespace-prefix-needs-to-be-resolved

The execution speed is relatively slow

When the seedMinLength property is set to 20, the execution speed is very slow, and the browser will be stuck for a while. Can the execution speed be improved?

let selector = finder(e.target, {
  className: (name) => {
    return !name.startsWith('is-');
  },
  tagName: (name) => true,
  seedMinLength: 20,
  optimizedMinLength: 15
});

custom penalty

I'd like to customize the penalty so that the generated selectors primarily use a specific attribute, but fall back to a classname if that attribute isn't present on a descendant.

What are your thoughts on adding a new option called penalties that could look like this for example

penalties: { tag: 10, attr: 0, id: 8, classname: 10, any: 20 }
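To make the proposal concrete, here is a rough sketch of how such a table could drive the ordering of candidate paths. Everything in it is hypothetical: the penalties option does not exist in finder, and the Knot shape with a kind field is invented for this illustration.

```typescript
// Hypothetical knot kinds, mirroring the keys of the proposed option.
type KnotKind = 'tag' | 'attr' | 'id' | 'classname' | 'any'

// Hypothetical shape: one selector fragment plus the kind it came from.
interface Knot {
  name: string
  kind: KnotKind
}

type Penalties = Record<KnotKind, number>

// Sum of the per-kind penalties along one candidate path.
function pathPenalty(path: Knot[], penalties: Penalties): number {
  return path.reduce((sum, knot) => sum + penalties[knot.kind], 0)
}

// Returns a sorted copy with the cheapest paths first.
function sortByPenalty(paths: Knot[][], penalties: Penalties): Knot[][] {
  return [...paths].sort((a, b) => pathPenalty(a, penalties) - pathPenalty(b, penalties))
}

const penalties: Penalties = { tag: 10, attr: 0, id: 8, classname: 10, any: 20 }
const byClass: Knot[] = [{ name: '.item', kind: 'classname' }]
const byAttr: Knot[] = [{ name: '[data-test="item"]', kind: 'attr' }]
const sorted = sortByPenalty([byClass, byAttr], penalties)
```

With these values the attribute-based path sorts ahead of the class-based one, which is the preference described in the request.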

src/index.ts is used by index.js.map, but not included in released package

This causes a warning:

(Emitted value instead of an instance of Error) Cannot find source file '../src/index.ts': Error: Can't resolve '../src/index.ts'

CDN Stopped working

Hello @antonmedv, thank you for the amazing library!
I opened this issue to let you know that the CDN (https://medv.io/finder/finder.js) in README.md stopped working.

For anyone having trouble the alternative CDN link https://cdn.jsdelivr.net/npm/@medv/finder is working.

Site crashes when trying to find large amount of selectors with attributes to exclude

This happens to me on many sites: when there is a large number of selectors to generate (more than 2000), it crashes.

My conclusion is that it is caused by excluding values and names in the attributes.
The new "maxNumberOfTries" property doesn't help.

Example of my code for the exclude attributes.

const excludeByAtributeName = ["style", "ng", "on", "event"];
const excludeAttributeValue = ["@", "?", "$", "#", "~", "&", "%", "^", "*", "alert", "console.log", "confirm"];

attr: (name: string, value: string) =>
  !excludeByAtributeName.some(a => name.includes(a)) &&
  !excludeAttributeValue.some(c => value.includes(c))

ERROR Error: Can't select any node with this selector: .h-12

It worked when I clicked on the control after refreshing, but when I went back and clicked it again, it threw an error 😢

document.addEventListener('click', event => {
  const element = event.target;
  console.log('element---------------->', element);

  const selector = finder(element);
  console.log('selector---------------->', selector);
})

On success:
element----------------> <img _ngcontent-serverapp-c96="" src="receive.png" class="h-12 w-12">
selector----------------> .relative:nth-child(3) > .h-12

On failure:
element----------------> <img _ngcontent-serverapp-c96="" src="receive.png" class="h-12 w-12">
ERROR Error: Can't select any node with this selector: .h-12
at unique (index.js:188) [angular]
at findUniquePath (index.js:161) [angular]
at _loop_1 (index.js:136) [angular]
at bottomUpSearch (index.js:145) [angular]
at default_1 (index.js:79) [angular]
at HTMLDocument. (all-track-action-listener.ts:38) [angular]
at Object.onInvokeTask (core.js:41762) [angular]
at ZoneTask.invokeTask [as invoke] (zone-evergreen.js:480) []
at invokeTask (zone-evergreen.js:1621) []
at HTMLDocument.globalZoneAwareCallback (zone-evergreen.js:1658) []

Finder vs Optimal-Select Poor performance

I'm importing the default select and finder, then doing a test against various HTML dom. To my surprise, finder was not able to produce valid id and was a lot slower than select.

Optimal-Select:

Finder:

Surely it produced a little selector, but the cost seems too great (4ms vs 19ms).

I tested it against news.hackernews.com, with this selector,

document.querySelector("#hnmain > tbody > tr:nth-child(3) > td > table > tbody > tr:nth-child(2) > td.subtext > a:nth-child(6)")

This is not noticeable if there are fewer elements.

Observation:

The results are not the same every time, but here are some more tests on the same page, just with more elements inside document.body.

Elements inside document.body: 740
Select: 1.54ms
Finder: 1.84ms

Elements inside document.body: 3496
Select: 4ms
Finder: 16ms

Elements inside document.body: 6941
Select: 6ms
Finder: 31ms

Elements inside document.body: 68951
Select: 60ms
Finder: 506ms

I will try to prepare a reproducible example/benchmark. But it really seems something is not right here.

The benchmark on this repo confirms this. It says it's slower speed but still fast enough for regular usage.

Can someone confirm this issue somehow and how can we address this?

Support UMD bundles too

Can we please have a UMD build too? This would help users run this in browser environments as well, using something like unpkg.

I tried https://unpkg.com/@medv/[email protected]/dist/index.js and it doesn't work in browsers.

Duplicate IDs will cause Finder to throw an error and not return a selector

Hi there 👋

I've run into a bug where finder will throw an error if two elements on the same page have the same ID. This is the error being thrown:

finder/finder.ts

Line 69 in 1757700

throw new Error(`Selector was not found.`)

Very small reproducible example:

<h1 id="duplicate_id">This ID is not unique</h1>
<h1 id="duplicate_id" >This ID is not unique either</h1>

Maybe an unoptimized selector could be returned instead here?
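One thing that might be worth trying on the caller side is to reject duplicated IDs through the idName option from the Configuration section. The sketch below is an assumption-heavy illustration: uniqueIdFilter and QueryRoot are hypothetical names, and whether finder then falls back to another selector is not confirmed by this report.

```typescript
// Minimal shape of what this sketch needs from a Document or Element.
interface QueryRoot {
  querySelectorAll(selector: string): { length: number }
}

// Returns a predicate for the `idName` option that accepts an ID only
// when exactly one element under `root` carries it.
function uniqueIdFilter(root: QueryRoot): (name: string) => boolean {
  return (name) => {
    // Escape backslashes and double quotes so the value is a valid CSS string.
    const escaped = name.replace(/\\/g, '\\\\').replace(/"/g, '\\"')
    return root.querySelectorAll(`[id="${escaped}"]`).length === 1
  }
}

// Possible use: finder(element, { idName: uniqueIdFilter(document) })
```

The attribute form [id="..."] is used for the check because it matches every element with that ID value without needing identifier escaping.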

Prefer `[id="123"]` vs. `#\31 23`

It took me quite a while to work out what was going on with ids when they started with a numeral:

#\\34 69bb17329ad2642a43d .jss138

IMO, the above is not great for display purposes.

Could we switch to the following format instead?

[id="469bb17329ad2642a43d"] .jss138
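For display purposes, the attribute form can also be produced outside the library. This is a minimal sketch, assuming the ID contains no line breaks; idAttributeSelector is a hypothetical helper and not part of finder.

```typescript
// Builds an attribute selector for an ID instead of an escaped `#id` form.
// Backslashes and double quotes are escaped so the value stays a valid CSS string.
function idAttributeSelector(id: string): string {
  const escaped = id.replace(/\\/g, '\\\\').replace(/"/g, '\\"')
  return `[id="${escaped}"]`
}

console.log(idAttributeSelector('469bb17329ad2642a43d'))
// [id="469bb17329ad2642a43d"]
```

Because the value sits inside a quoted string, a leading digit needs no special treatment, which is what makes this form easier to read.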

Selector not found

The script does not work on this website: https://jrdunn.com/
The error is

VM1638:1 Uncaught Error: Selector was not found.
    at finder (<anonymous>:1:724)
    at HTMLDocument.<anonymous> (<anonymous>:2:20)

Provision of js version to download and consume it in html files

Providing a JS variant for use in static pages would be helpful to many. Please consider including it in the repo, or providing some kind of CDN link.

Supporting attributes

I've been doing quite a bit of background research on this space, and finder seems to be smarter than the competition by quite a bit. I also like the testing on real-world fixtures.

One feature that I'd love to see added is the ability to support HTML attributes. Some attributes like type=password are very strong signals and oftentimes unique single-level selectors.

Cheers 👍

Test single pass threshold-free method

Instead of 3 rollbacks by threshold, test selectors generated in a single pass, reducing the number of checked selectors with a heuristic.

[Feature request] Ability to prioritize certain attributes over keeping the selectors as short as possible.

The problem

Currently the library tries to generate selectors that are unique and as short as possible. This is great in most cases, but sometimes, on a changing web page, it results in selectors that are not future-proof.

If a page has one link to start with, the library would probably produce a as the selector for that link. While this works for the web page as is, it may not work once another link is added. (Take blogs or news websites, for example, where the links can keep changing.) In such a case, the ability to specify which attribute should always be kept as part of the selector would help. (In this case, href is a good candidate.)

How could the library support this ?

Perhaps some option where one can specify attributes to keep on a per-tag basis. When the library sees an element with one of these tags, it includes the attribute in the final selector even if that doesn't necessarily result in a shorter selector.

{
  prioritizeAttributes: {
    tag: [<the attributes>]
  }
}

Great library btw 😄
Thank you

Finder Timeout

We want to be able to set a timeout on finder. If we are not able to generate CSS selectors in a certain amount of time, we'd like to return early if the timeout is breached.

Is this something that could be possible?

Happy to create a PR and give it a shot as well.

Is it possible to support custom html elements?

Offending page
Link

Offending Element

In the 'Save extra with 4 offers' section, there is a custom HTML element

<dptags:querylogoperation methodname="addCount" metric="SOPP:sellerPromotionBTF">
..content
</dptags:querylogoperation>

Gibberish character in produced selector

The following HTML element, (taken from the Hacker News website):

<tr class="athing" id="244329320"> ... </tr>

will generate the following, invalid, selector:

#\32 4432930

Type info declaration file (.d.ts)

I'm trying to use this in a typescript project as a dependency.
After importing I get this error in the TS compiler:

semantic error TS7016 Could not find a declaration file for module '@medv/finder'. '/Users/abc/xyz/node_modules/@medv/finder/dist/index.js' implicitly has an 'any' type.
  Try `npm install @types/medv__finder` if it exists or add a new declaration (.d.ts) file containing `declare module 'medv__finder';`

Could you publish the generated typings file? Thanks!

Browser version?

Is there a bundled browser version?

Support multiple elements

Please add support for multiple elements to be provided

Can I get a prebundled version?

I've been struggling to bundle the source code for the past week. I've tried a lot of ways to bundle it, but nothing seems to work well with my product, so I think it would be really helpful for me and others if you provided a prebundled version.

[feature request] - generation of 'alternate' selector

So let's say I have generated a lovely selector like

.opened .single-link > span

My problem is that while the opened class is unique on the page at the moment, with my next click, I will open another 'section' of the page, and then there will be two opened classes present, and my original selector is no longer unique.

I guess I could use a MutationObserver to watch for class changes and build up a list of 'volatile' classnames, however this will not work if the HTML has exactly one .opened class at the start.

So I was thinking that it would be nice to have the ability to generate a 'backup' selector, which would be the equivalent of the following

finder(target,{
    className: (e) => e !== 'opened' && e !== 'single-link',
})

Is that something that is worthwhile to elevate to a single config option, which takes into account what would normally be generated, but then operates on alternative basis that whatever is currently unique, may not be in future?
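Until such an option exists, the backup predicate could be derived from the primary selector itself. The following is a sketch under stated assumptions: both helper names are hypothetical, and the regular expression only handles simple class names, not escaped ones or dots inside attribute values.

```typescript
// Extracts simple class names (".foo") from a selector string.
// Escaped class names and dots inside attribute values are not handled.
function classNamesIn(selector: string): string[] {
  const matches = selector.match(/\.-?[_a-zA-Z][\w-]*/g) ?? []
  return matches.map((m) => m.slice(1))
}

// Returns a predicate suitable for the `className` option that rejects
// every class already used by the primary selector.
function excludeClassesOf(primarySelector: string): (name: string) => boolean {
  const used = new Set(classNamesIn(primarySelector))
  return (name) => !used.has(name)
}

const className = excludeClassesOf('.opened .single-link > span')
// Possible use: finder(target, { className })
```

For the selector above, the predicate rejects opened and single-link and accepts any other class, which matches the hand-written className function in the request.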

No selector found with root node passed in

I have come across an issue with my specific use case and the "unique" functionality.

I am using this library to find a path to the element from a root node (it's an SVG element). This SVG element happens to be duplicated on the page, so when the library runs the unique check, the selectors it generates from the root passed in are not unique (to the page), and I always get no path.

It doesn't look like there is a way to specify that I only need the path to be unique from the root element passed in as that is what I will do the resulting query in. I don't know if this is too much of an edge case to implement into the library but I don't know any other way of doing it (without writing my own).

~~I will open a pull request to see if I can make this work but any pointers are appreciated.~~
I am not familiar with TS and couldn't get it to compile due to some errors already present it seems. Would it be simple to add a way to use the root node as the "rootDocument" for unique checking?
