finder's Introduction

finder

The CSS Selector Generator

Features

  • Generates the shortest selector
  • Unique selectors per page
  • Stable and robust selectors
  • 2kB minified + gzipped

Install

npm install @medv/finder

Usage

import {finder} from '@medv/finder'

document.addEventListener('click', event => {
  const selector = finder(event.target)
  console.log(selector)  
})

Example

An example of a generated selector:

.blog > article:nth-child(3) .add-comment

Configuration

const selector = finder(event.target, {
  root: document.body,          // Root of search, defaults to document.body.
  idName: (name) => true,       // Check if this ID can be used.
  className: (name) => true,    // Check if this class name can be used.
  tagName: (name) => true,      // Check if tag name can be used.
  attr: (name, value) => false, // Check if attr name can be used.
  seedMinLength: 1,           
  optimizedMinLength: 2,
  threshold: 1000,
  maxNumberOfTries: 10_000,
  timeoutMs: undefined,
})

seedMinLength

Minimum number of levels when finding a selector. Starts from 1. For more robust selectors, set this parameter to around 4-5, depending on the depth of your DOM tree. If finder hits the root, this parameter is ignored.

optimizedMinLength

Minimum length for optimising the selector. Starts from 2. For example, the selector body > div > div > p can be optimised to body p.
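As a rough illustration of the optimisation described above, the sketch below drops intermediate levels from a path and rebuilds the selector, using a child combinator only where two remaining levels are directly adjacent. The names shorterPaths and toSelector and the depth field are hypothetical and are not taken from finder's source; the real algorithm may differ.

```javascript
// Build a selector string from a path of { name, depth } nodes.
// Adjacent depths are joined with " > ", skipped levels with a space.
function toSelector(path) {
  return path
    .map((node, i) => {
      if (i === 0) return node.name
      const adjacent = node.depth === path[i - 1].depth + 1
      return (adjacent ? ' > ' : ' ') + node.name
    })
    .join('')
}

// Yield every path obtained by removing interior nodes, never going
// below minLength and always keeping the first and last node.
function* shorterPaths(path, minLength) {
  if (path.length <= 2 || path.length <= minLength) return
  for (let i = 1; i < path.length - 1; i++) {
    const candidate = [...path.slice(0, i), ...path.slice(i + 1)]
    yield candidate
    yield* shorterPaths(candidate, minLength)
  }
}

const path = [
  { name: 'body', depth: 0 },
  { name: 'div', depth: 1 },
  { name: 'div', depth: 2 },
  { name: 'p', depth: 3 },
]

// With a minimum length of 2, "body p" is among the candidates.
const candidates = [...shorterPaths(path, 2)].map(toSelector)
```

Each candidate would still need a uniqueness check against the page before it could replace the longer selector.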

threshold

Maximum number of selectors to check before falling back to nth-child. Checking a selector for uniqueness is a very costly operation: a DOM tree of depth 5 with 5 classes on each level gives more than 3k selectors to check. The default of 1000 is good enough in most cases.
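One plausible reading of the "more than 3k" figure is that a single class is chosen at each level, so the candidates multiply across levels. The helper below is hypothetical and only reproduces that arithmetic; it does not reflect how finder enumerates selectors internally.

```javascript
// Hypothetical helper: number of candidate selectors when exactly one
// class is picked at each level of the path.
function countCandidates(depth, classesPerLevel) {
  return classesPerLevel ** depth
}

const total = countCandidates(5, 5) // 5 * 5 * 5 * 5 * 5
const threshold = 1000 // the documented default
const exceedsThreshold = total > threshold
```

Under this assumption the count is 3125, which is above the default threshold of 1000.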

maxNumberOfTries

Max number of tries for the optimization. This is a trade-off between optimization and efficiency. Default 10_000 is good enough in most cases.

timeoutMs

Optional timeout in milliseconds. undefined (no timeout) by default. If timeoutMs: 500 is provided, an error will be thrown if selector generation takes more than 500ms.

Become a sponsor

Every line of code in my repositories 📖 signifies my unwavering commitment to open source 💡. Your support 🤝 ensures these projects keep thriving, innovating, and benefiting all 💼. If my work has ever resonated 🎵 or helped you, kindly consider showing love ❤️ by sponsoring. 🚀 Sponsor Me Today! 🚀

License

MIT

finder's People

Contributors

antonlapshin, antonmedv, eggachecat, ephetic, jccr, lapritchett, mattmikolay, nolaneo, thepaulmcbride, thisispaul, williamipark

finder's Issues

Allow custom document pointers

I planned on using this for content within a TinyMCE editor document, which is another iframe, but the document reference is not configurable.

I would suggest a new configuration option:

finder(node, {
  document: editor.getDoc()
})

Support UMD/global or add usage notes to README

Can you please either support UMD/global or add some usage notes to the README? I assume you're expecting users to include this into source via build pipeline like webpack, but just to play around I tried including it via script and it fails with this error:

index.js:47 Uncaught ReferenceError: exports is not defined
    at index.js:47

Thanks!

Selector was not found for all the elements on page

I am working on a POC to allow element style changes and preview, and the library is throwing the error "Selector was not found" for all the elements (event.target) on a specific page, even though the root and element values are properly passed to the finder function.
Sample webpage where it is failing - https://www.sanaullastore.com/
It is able to return the selector while using extension version of code but failing if same code is added via javascript.

IE 11 Not Supported

In IE 11, I get a syntax error that looks to be due to the use of an arrow function, which is not supported in IE 11.

[feature request] Add config option to include structural/semantic elements

So I'm not sure if this is out of scope for this project.

I'm not so much interested as the shortest selector, as the most meaningful one.

From that point of view, any of the following tags are interesting:
https://www.w3schools.com/html/html5_semantic_elements.asp

I'd like those to be always included in the selector if they are present in the ancestor list.

I guess these would have to be qualified if they are not unique, so article:nth-of-type(2).

Does this request make any sense or is it a different project?

Version without module

I love this library, but I am having a lot of trouble using it in some contexts, for example in a Chrome Extension I am implementing.

The fact that it is packaged as a module makes things more complicated.

Is it possible to have a version that does not use modules?

Thanks

Ignore dynamic Angular attributes

Angular dynamically adds attributes called "_ngcontent-#" to elements by default.
As those attribute names are regenerated on every load, the finder should ignore these attributes while generating CSS pointers.

Example Angular 9
Pointer generated by your tool:
[_ngcontent-vrx-c260=""] > [href="mailto:[email protected]"]
After a reload the same element's pointer would need to be:
[_ngcontent-wok-c260=""] > [href="mailto:[email protected]"]

Different Angular versions have different ng-content id structures, but it would be fixed by generally ignoring all attributes that start with "_ngcontent".
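A possible workaround on the caller's side, assuming the attr predicate from the README's configuration section is what allows attributes into selectors, is a filter that rejects any name starting with "_ngcontent". The name ignoreAngularAttrs is made up for this sketch, and allowing every other attribute is an arbitrary choice.

```javascript
// Reject Angular's generated attributes; allow every other attribute.
// The value argument is unused here but kept to match the documented
// (name, value) shape of the attr option.
const ignoreAngularAttrs = (name, value) => !name.startsWith('_ngcontent')

// Intended usage (not executed here, needs a browser and the library):
//   finder(element, { attr: ignoreAngularAttrs })
```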

Selector not found

For some pages ->

  function sort(paths: Iterator<Path> & Iterable<Path>): Path[] {
    return Array.from(paths).sort((a, b) => penalty(a) - penalty(b))
  }

Output of "Array.from(paths)" is returning => [ ] (empty array)
Adding a polyfill substitute in this method is solving the issue ->
ex- code

function transformToArray<T>(iterator: Iterator<T>): T[] {
  // Drain the iterator manually instead of relying on Array.from.
  const result: T[] = []
  let current = iterator.next()
  while (!current.done) {
    result.push(current.value)
    current = iterator.next()
  }
  return result
}

  function sort(paths: Iterator<Path> & Iterable<Path>): Path[] {
    return transformToArray(paths).sort((a, b) => penalty(a) - penalty(b))
  }

Let me know, I can raise a PR for the same.

Consider caching/memoizing "unique" function

First of all, thank you for the library! I think this is the best, most robust CSS query generator. I use it in a lot of my projects.

For one of my projects, I needed to generate CSS selectors for ALL DOM elements on the page. Finder is perfect for this case too; unfortunately, if the page is big, the time taken to generate a unique CSS selector for every DOM element adds up to as much as 250 seconds, which is unusable for my use case.

After doing performance analysis, I found that the "unique" function is called many times to check whether a selector is unique, and it is called repeatedly with the same arguments. Caching this function drops execution time for that specific page (and other pages) from 250 seconds to under 1 second at the cost of memory.

something like this worked for me:

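const cache = new Map();

function unique(a) {
  // Cache key: the serialized path, so repeated checks skip querySelectorAll.
  const key = JSON.stringify(a);
  if (cache.has(key)) {
    return cache.get(key);
  }
  const css = selector(a);
  switch (rootDocument.querySelectorAll(css).length) {
    case 0:
      // Nothing matches: cache the negative result, then report the error.
      // Note that a later cache hit returns false instead of throwing.
      cache.set(key, false);
      throw new Error(`Can't select any node with this selector: ${css}`);
    case 1:
      cache.set(key, true);
      return true;
    default:
      cache.set(key, false);
      return false;
  }
}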

cssesc vs CSS.escape

Hi, Could you please tell me what is the difference between cssesc and CSS.escape ? Is it possible to replace cssesc with CSS.escape ?

Error handling

Hey.

First off, thank you for creating Finder. It is awesome and it has been the backbone of some awesome features we have on our site.

We are having a small issue with how finder handles errors when it can't find nodes etc., because it throws them. This is a problem because they are picked up by Sentry, which creates a lot of noise in our error reporting dashboard.

Would you be open to adding an option for, or changing, the error handling so it's a little more graceful? I would personally be happy to write a PR for this if you accept the idea of the improvement.

Thanks again!
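Until the library offers something like this, one caller-side option is a thin wrapper that turns the thrown error into a null result. This is only a sketch: safeFinder is a hypothetical name, and the selector generator is passed in as finderFn so the snippet can run without the library. The two stand-in functions exist purely to exercise it.

```javascript
// Hypothetical wrapper: returns the selector, or null if generation throws.
function safeFinder(finderFn, element, options) {
  try {
    return finderFn(element, options)
  } catch (error) {
    // Swallowing the error keeps it out of error reporting; log it
    // elsewhere if the failure still needs to be tracked.
    return null
  }
}

// Stand-ins for the real generator, used only for demonstration.
const alwaysFails = () => {
  throw new Error('Selector was not found.')
}
const alwaysWorks = () => '.blog > article'

const failed = safeFinder(alwaysFails, {})
const found = safeFinder(alwaysWorks, {})
```

In real code the first argument would be the imported finder function and the second a DOM element.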

Generating unexpected selector when specifying a root selector

Hi, for this test document, I am attempting to find a selector for <p>Line 1</p> When no selector root is explicitly set, it returns:
#module > p

However, when I set the root selector to document.querySelector("#module"), it still returns the same thing. I would think it should return a selector that doesn't contain #module since we are using that as the selector root?

test("root:setandfind", (t) => {
  const html = `
  <div class="document">
  <div id="module">
    <span id="span1"></span>
    <p>Line 1</p>
    <div id="introduction">
      <h2>
        Introduction
      </h2>
      <p>Line 2</p>
    </div>
  </div>
</div>
  `;
  document.write(html);
  let rootelem = document.querySelector("#module");
  let pElement = rootelem.querySelector("p");

  const css = selector(pElement);
  console.log("Default selector (no options): ", css);

  const newcss2 = selector(pElement, { root: rootelem });
  console.log("Selector after setting root to (#module): ", newcss2);
});

Output:

Default selector (no options):  #module > p
Selector after setting root to (#module):  #module > p

As you can see, the selector still seems to reference the ID of the root selector.
Is this a bug or am I misunderstanding the behavior of setting the root selector?

root config doesn't allow Document

For an embedded iframe, I defined root: body but this then fails if I try to find the body element since body.querySelectorAll('body') returns an empty array. This scenario seems to be handled for the default case since the ownerDocument is returned for the rootDocument instead of the body. It seems like the root config should allow both Document and Element. Then the default for root could be updated to document and I think findRootDocument could then be removed.

As a workaround, I am passing the ownerDocument and ignoring the TypeScript error.

Forms with input named "id" causes cssesc to break

Can be tested with something like:

<form id="test">
  <input type="text" name="id">
</form>

In this case input.id in index.ts:168 will be the input element, not "test".
input.getAttribute('id') might work better.
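The difference between the property and the attribute can be sketched without a browser. The fake form below imitates the clobbered case, where the id property points at a child control instead of a string; readIdAttribute is a hypothetical helper and is not part of finder.

```javascript
// Read the id through getAttribute so a form control named "id"
// cannot shadow it. Returns null when there is no usable id.
function readIdAttribute(element) {
  const id = element.getAttribute('id')
  return typeof id === 'string' && id !== '' ? id : null
}

// Fake form mimicking <form id="test"><input name="id"></form>:
// the id property resolves to the input, the attribute stays "test".
const fakeForm = {
  id: { tagName: 'INPUT' },
  getAttribute: (name) => (name === 'id' ? 'test' : null),
}

const idFromAttribute = readIdAttribute(fakeForm)
```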

Support for XML namespace prefixes in DOM Element names / tagnames (XHTML, SVG, MathML), with proposed fix

finder/finder.ts

Lines 220 to 229 in feef199

function tagName(input: Element): Knot | null {
  const name = input.tagName.toLowerCase()
  if (config.tagName(name)) {
    return {
      name,
      penalty: 2,
    }
  }
  return null
}

My current workaround is:

{
name: name.replace(/^(.+:)(.+)$/, "*|$2"),
}

Examples:

  • m:math ==> *|math
  • svg:a ==> *|a
  • div ==> div
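The mapping in these examples can be checked in isolation. The function name wildcardNamespace is invented for this sketch; the regular expression is the one from the workaround above.

```javascript
// Replace a namespace prefix such as "svg:" with the "*|" wildcard.
// Names without a prefix are returned unchanged.
function wildcardNamespace(tagName) {
  return tagName.replace(/^(.+:)(.+)$/, '*|$2')
}

const results = ['m:math', 'svg:a', 'div'].map(wildcardNamespace)
```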

Unfortunately the wildcard * prefix is necessary because the querySelector() API does not support namespace prefixes, unlike CSS stylesheets:

https://developer.mozilla.org/en-US/docs/Web/SVG/Element/a

...consequently, for example *|a will match not only SVG link elements but also HTML hyperlinks! :(

https://www.w3.org/TR/selectors-api/#namespace-prefix-needs-to-be-resolved

The execution speed is relatively slow

When the seedMinLength property is set to 20, the execution speed is very slow, and the browser will be stuck for a while. Can the execution speed be improved?

let selector = finder(e.target, {
  className: (name) => {
    return !name.startsWith('is-');
  },
  tagName: (name) => true,
  seedMinLength: 20,
  optimizedMinLength: 15
});

custom penalty

I'd like to customize the penalty so that the selectors this generates primarily use a specific attribute, but if that attribute isn't present on a descendant fallback to using a classname.

What are your thoughts on adding a new option called penalties that could look like this for example

penalties: { tag: 10, attr: 0, id: 8, classname: 10, any: 20 }
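To make the proposal concrete, here is a sketch of how such a table could rank candidate selectors. Everything in it is hypothetical: the penalties option does not exist in the documented configuration, and the kind field, the sample candidates, and the data-action attribute are invented for illustration.

```javascript
// Proposed penalty table from the request above (lower is preferred).
const penalties = { tag: 10, attr: 0, id: 8, classname: 10, any: 20 }

// Sum the penalties of all parts of one candidate selector.
function totalPenalty(parts, table) {
  return parts.reduce((sum, part) => sum + table[part.kind], 0)
}

// Return a new array ordered from lowest to highest total penalty.
function rank(candidates, table) {
  return [...candidates].sort(
    (a, b) => totalPenalty(a, table) - totalPenalty(b, table)
  )
}

const candidates = [
  [{ kind: 'classname', name: '.add-comment' }],
  [{ kind: 'attr', name: '[data-action="add"]' }],
  [{ kind: 'tag', name: 'button' }, { kind: 'classname', name: '.primary' }],
]

const ranked = rank(candidates, penalties)
```

With this table the attribute-based candidate sorts first, and class names only win when no attribute candidate exists.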

Site crashes when trying to find a large number of selectors with attributes to exclude

This occurs on many sites: when there is a large number of selectors to generate (more than 2000), it crashes.

My conclusion is that it is caused by excluding values and names in the attributes.
The new "maxNumberOfTries" property doesn't help.

Example of my code for excluding attributes:

excludeByAtributeName = ["style", "ng", "on", "event"];
excludeAttributeValue = ["@", "?", "$", "#", "~", "&", "%", "^", "*", "alert", "console.log", "confirm"];

attr: (name: string, value: string) =>
  !excludeByAtributeName.some(a => name.includes(a)) &&
  !excludeAttributeValue.some(c => value.includes(c))

ERROR Error: Can't select any node with this selector: .h-12

It worked when I clicked on the same control after refreshing, but when I went back and clicked it again, it threw an error 😢

document.addEventListener('click', event => {
  const element = event.target;
  console.log('element---------------->', element);

  const selector = finder(element);
  console.log('selector---------------->', selector);
})

On success:
element----------------> <img _ngcontent-serverapp-c96="" src="receive.png" class="h-12 w-12">
selector----------------> .relative:nth-child(3) > .h-12

On failure:
element----------------> <img _ngcontent-serverapp-c96="" src="receive.png" class="h-12 w-12">
ERROR Error: Can't select any node with this selector: .h-12
at unique (index.js:188) [angular]
at findUniquePath (index.js:161) [angular]
at _loop_1 (index.js:136) [angular]
at bottomUpSearch (index.js:145) [angular]
at default_1 (index.js:79) [angular]
at HTMLDocument. (all-track-action-listener.ts:38) [angular]
at Object.onInvokeTask (core.js:41762) [angular]
at ZoneTask.invokeTask [as invoke] (zone-evergreen.js:480) []
at invokeTask (zone-evergreen.js:1621) []
at HTMLDocument.globalZoneAwareCallback (zone-evergreen.js:1658) []

Finder vs Optimal-Select Poor performance

I'm importing the default select and finder, then doing a test against various HTML dom. To my surprise, finder was not able to produce valid id and was a lot slower than select.

Optimal-Select:

image

Finder:

image

Surely it produced a little selector, but the cost seems too great (4ms vs 19ms).

I tested it against news.hackernews.com, with this selector,

document.querySelector("#hnmain > tbody > tr:nth-child(3) > td > table > tbody > tr:nth-child(2) > td.subtext > a:nth-child(6)")

This is not noticeable if there are fewer elements,
image

Observation:

It's not same everytime, but here are some more tests on same page, just with more elements inside document.body.

Elements inside document.body: 740
Select: 1.54ms
Finder: 1.84ms

Elements inside document.body: 3496
Select: 4ms
Finder: 16ms

Elements inside document.body: 6941
Select: 6ms
Finder: 31ms

Elements inside document.body: 68951
Select: 60ms
Finder: 506ms

I will try to prepare a reproducible example/benchmark. But it really seems something is not right here.

The benchmark on this repo confirms this. It says it's slower speed but still fast enough for regular usage.

Can someone confirm this issue somehow and how can we address this?

Duplicate IDs will cause Finder to throw an error and not return a selector

Hi there 👋

I've run into a bug where finder will throw an error if two elements on the same page have the same ID. This is the error being thrown:

throw new Error(`Selector was not found.`)

Very small reproducible example:

<h1 id="duplicate_id">This ID is not unique</h1>
<h1 id="duplicate_id" >This ID is not unique either</h1>

Maybe an unoptimized selector could be returned instead here?
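As a possible stopgap, assuming the idName predicate from the README decides whether an ID may appear in a selector, duplicated IDs could be filtered out up front. Whether this avoids the thrown error in every case is untested. makeUniqueIdFilter is a hypothetical helper, and the ID list is hard-coded so the snippet runs without a browser.

```javascript
// Build a predicate that accepts only IDs occurring exactly once.
function makeUniqueIdFilter(allIds) {
  const counts = new Map()
  for (const id of allIds) {
    counts.set(id, (counts.get(id) || 0) + 1)
  }
  return (name) => counts.get(name) === 1
}

// In a browser the list could be collected with something like:
//   Array.from(document.querySelectorAll('[id]'), (el) => el.getAttribute('id'))
const idName = makeUniqueIdFilter(['duplicate_id', 'duplicate_id', 'header'])

// Intended usage (not executed here): finder(element, { idName })
```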

Prefer `[id="123"]` vs. `#\31 23`

It took me quite a while to work out what was going on with ids when they started with a numeral:

#\\34 69bb17329ad2642a43d .jss138

IMO, the above is not great for display purposes.

Could we switch to the following format instead?

[id="469bb17329ad2642a43d"] .jss138
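For context, the backslash form arises because a CSS identifier cannot start with a digit, so the leading 4 is written as its hexadecimal code point (34) followed by a space. The sketch below shows one way the requested format could be produced. idSelector is a hypothetical helper, and its identifier check is deliberately conservative rather than a full implementation of the CSS grammar.

```javascript
// Use "#id" for simple identifiers and an attribute selector otherwise.
function idSelector(id) {
  const isPlainIdent = /^[A-Za-z_][A-Za-z0-9_-]*$/.test(id)
  if (isPlainIdent) return '#' + id
  // Escape double quotes and backslashes inside the quoted value.
  const escaped = id.replace(/["\\]/g, '\\$&')
  return '[id="' + escaped + '"]'
}

const numeric = idSelector('469bb17329ad2642a43d')
const plain = idSelector('module')
```

One trade-off: an attribute selector has lower specificity than an ID selector, which matters if the generated selector is reused in a stylesheet but not for querySelector.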

Selector not found

The script does not work on this website: https://jrdunn.com/
The error is

VM1638:1 Uncaught Error: Selector was not found.
    at finder (<anonymous>:1:724)
    at HTMLDocument.<anonymous> (<anonymous>:2:20)

Supporting attributes

I've been doing quite a bit of background research on this space, and finder seems to be smarter than the competition by quite a bit. I also like the testing on real-world fixtures.

One feature that I'd love to see added is the ability to support HTML attributes. Some attributes like type=password are very strong signals and oftentimes unique single-level selectors.
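The attr predicate listed in the README's configuration section appears to cover this request. A minimal allow-list sketch follows; the chosen attribute names are assumptions for illustration, not library defaults.

```javascript
// Only these attribute names may be used in generated selectors.
const allowedAttrs = new Set(['type', 'name', 'href'])

// Matches the documented (name, value) shape of the attr option.
// Empty values are rejected because they carry little signal.
const attr = (name, value) => allowedAttrs.has(name) && value.length > 0

// Intended usage (not executed here): finder(element, { attr })
```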

Cheers 👍

[Feature request] Ability to prioritize certain attributes over keeping the selectors as short as possible.

The problem

Currently the library tries to generate selectors that are unique and as short as possible. This is great in most cases, but on a changing web page it sometimes results in selectors that are not future-proof.

If a page has one link to start with, the library would probably produce a (the bare tag name) as the selector for that link. While this works for the web page as is, it may not work once another link is added (take blogs or new websites, for example, where the links keep changing). In such a case, the ability to specify which attribute to always keep as part of the selector would help (here, href is a good candidate).

How could the library support this ?

Perhaps some option where one can specify attributes to keep at a tag basis. And when the library sees an element have one of these tags, it includes it in the final selector even if that doesn't necessarily result in a shorter selector.

{
  prioritizeAttributes: {
    tag: [<the attributes>]
  }
} 

Great library btw 😄
Thank you

Finder Timeout

We want to be able to set a timeout on finder. If we are not able to generate CSS selectors in a certain amount of time, we'd like to return early if the timeout is breached.

Is this something that could be possible?

Happy to create a PR and give it a shot as well.

Is it possible to support custom html elements?

Offending page
Link

Offending Element

In the 'Save extra with 4 offers' section, there is a custom HTML element

<dptags:querylogoperation methodname="addCount" metric="SOPP:sellerPromotionBTF">
..content
</dptags:querylogoperation>

Type info declaration file (.d.ts)

I'm trying to use this in a typescript project as a dependency.
After importing I get this error in the TS compiler:

semantic error TS7016 Could not find a declaration file for module '@medv/finder'. '/Users/abc/xyz/node_modules/@medv/finder/dist/index.js' implicitly has an 'any' type.
  Try `npm install @types/medv__finder` if it exists or add a new declaration (.d.ts) file containing `declare module 'medv__finder';

Could you publish the generated typings file? Thanks!

Can I get a prebundled version?

I've been struggling to bundle the source code for the past week. I've tried a lot of ways to bundle it, but nothing seems to work well with my product, so I think it'd be really helpful for me and others if you provided a prebundled version.

[feature request] - generation of 'alternate' selector

So let's say I have generated a lovely selector like

.opened .single-link > span

My problem is that while the opened class is unique on the page at the moment, with my next click, I will open another 'section' of the page, and then there will be two opened classes present, and my original selector is no longer unique.

I guess I could use a MutationObserver to watch for class changes and build up a list of 'volatile' classnames, however this will not work if the HTML has exactly one .opened class at the start.

So I was thinking that it would be nice to have the ability to generate a 'backup' selector, which would be the equivalent of the following

finder(target,{
    className: (e) => e !== 'opened' && e !== 'single-link',
})

Is that something that is worthwhile to elevate to a single config option, which takes into account what would normally be generated, but then operates on alternative basis that whatever is currently unique, may not be in future?
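One way to approximate the 'backup' selector today is to derive the className filter from the first selector instead of writing it by hand. The sketch below assumes simple, unescaped class names, and the helper names classNamesIn and excludeClasses are invented here.

```javascript
// Extract class names from a selector string (unescaped names only).
function classNamesIn(selector) {
  return Array.from(
    selector.matchAll(/\.(-?[_a-zA-Z][\w-]*)/g),
    (match) => match[1]
  )
}

// Build a className predicate that rejects every class already used.
function excludeClasses(selector) {
  const used = new Set(classNamesIn(selector))
  return (name) => !used.has(name)
}

const primary = '.opened .single-link > span'
const className = excludeClasses(primary)

// Intended usage (not executed here): finder(target, { className })
```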

No selector found with root node passed in

I have come across an issue with my specific use case and the "unique" functionality.

I am using this library to find a path to the element from a root node (it's an SVG element). This SVG element happens to be duplicated on the page, so when the library runs the unique check, the selectors it generates from the root passed in are not unique (to the page), so I always get no path.

It doesn't look like there is a way to specify that I only need the path to be unique from the root element passed in as that is what I will do the resulting query in. I don't know if this is too much of an edge case to implement into the library but I don't know any other way of doing it (without writing my own).

I will open a pull request to see if I can make this work but any pointers are appreciated.
I am not familiar with TS and couldn't get it to compile due to some errors already present it seems. Would it be simple to add a way to use the root node as the "rootDocument" for unique checking?
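The check being asked for could look roughly like the sketch below, which counts matches under the given root rather than under the whole document. It is not taken from finder's source. isUniqueWithin is a hypothetical name, and the fake root only imitates querySelectorAll so the snippet runs outside a browser.

```javascript
// A selector is "unique within root" if it matches exactly one
// descendant of root, regardless of matches elsewhere on the page.
function isUniqueWithin(root, selector) {
  return root.querySelectorAll(selector).length === 1
}

// Fake root: pretends that only "g > path" matches one element.
const fakeSvgRoot = {
  querySelectorAll: (selector) => (selector === 'g > path' ? [{}] : []),
}

const uniqueInRoot = isUniqueWithin(fakeSvgRoot, 'g > path')
const notFound = isUniqueWithin(fakeSvgRoot, 'rect')
```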
