yureien / text-fragment-scraper

Scrape highlighted text using text fragments

Home Page: https://www.npmjs.com/package/text-fragment-scraper

TypeScript 97.76% JavaScript 2.24%
puppeteer scraper text-fragment text-fragment-url text-fragments hacktoberfest

Text Fragment Scraper

Obtains the entire highlighted text from URLs that contain text fragments and returns it as an array of strings.

If the full highlighted text is already present in the URL, it is extracted directly without opening the website. Otherwise, the website is scraped and the entire highlighted text is extracted from the page text.

Uses Puppeteer to scrape the website.
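
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: `TextDirective`, `parseTextDirectives` and `needsScraping` are hypothetical names, and the sketch assumes the usual text fragment syntax `#:~:text=[prefix-,]textStart[,textEnd][,-suffix]`. A directive with only a start part carries the whole highlighted text in the URL, while one with an end part only marks a range whose contents have to be read from the page.

```typescript
// Hypothetical helper types and functions, for illustration only.
interface TextDirective {
  prefix?: string;
  textStart: string;
  textEnd?: string;
  suffix?: string;
}

// Parses every `text=` directive found after the `:~:` delimiter of a URL.
// Note: decodeURIComponent throws on malformed percent-encoding.
function parseTextDirectives(url: string): TextDirective[] {
  const marker = ":~:";
  const markerIndex = url.indexOf(marker);
  if (markerIndex === -1) return [];

  const directives: TextDirective[] = [];
  for (const part of url.slice(markerIndex + marker.length).split("&")) {
    if (!part.startsWith("text=")) continue;

    let prefix: string | undefined;
    let suffix: string | undefined;
    const range: string[] = [];
    for (const raw of part.slice("text=".length).split(",")) {
      // A trailing dash marks a prefix, a leading dash marks a suffix.
      if (raw.endsWith("-")) prefix = decodeURIComponent(raw.slice(0, -1));
      else if (raw.startsWith("-")) suffix = decodeURIComponent(raw.slice(1));
      else range.push(decodeURIComponent(raw));
    }

    const [textStart, textEnd] = range;
    if (textStart === undefined) continue;
    const directive: TextDirective = { textStart };
    if (textEnd !== undefined) directive.textEnd = textEnd;
    if (prefix !== undefined) directive.prefix = prefix;
    if (suffix !== undefined) directive.suffix = suffix;
    directives.push(directive);
  }
  return directives;
}

// A directive with an end part only names the boundaries of the highlight,
// so the page text is needed to recover what lies between them.
function needsScraping(directive: TextDirective): boolean {
  return directive.textEnd !== undefined;
}
```

Under these assumptions, a URL whose directives all return `false` from `needsScraping` can be answered from the URL alone, and any other URL requires loading the page.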

Example

scrapeURL("https://web.dev/text-fragments/#:~:text=Text%20Fragments%20let%20you%20specify%20a%20text%20snippet%20in%20the%20URL%20fragment");

// Returns the following
[ 'Text Fragments let you specify a text snippet in the URL fragment' ]
// In the above case, the site is not scraped, since the full text is present in the URL itself.

scrapeURL("https://web.dev/text-fragments/#:~:text=The%20fact%20though,Text%20Fragments%20solve");

// Returns the following
[
  'The fact though that I had to open the Developer Tools to find the id of an element speaks volumes about the probability this particular section of the page was meant to be linked to by the author of the blog post.What if I want to link to something without an id? Say I want to link to the ECMAScript Modules in Web Workers heading. As you can see in the screenshot below, the <h1> in question does not have an id attribute, meaning there is no way I can link to this heading. This is the problem that Text Fragments solve'
]
// In this case, the URL only contains the start and end of the highlighted text, so the site is scraped to recover the full text.
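
As a rough sketch of the second case, once the page text is available the highlighted range can be recovered by locating the start and end parts. `extractRange` is a hypothetical helper and not the package's code. It uses plain, case-sensitive substring search, whereas browsers apply more elaborate matching rules, so treat it as a simplification.

```typescript
// Hypothetical helper: returns the text from the first occurrence of
// `textStart` through the next occurrence of `textEnd`, or null if either
// part cannot be found in that order.
function extractRange(
  pageText: string,
  textStart: string,
  textEnd: string
): string | null {
  const startIndex = pageText.indexOf(textStart);
  if (startIndex === -1) return null;

  // Search for the end part only after the start part has finished.
  const endIndex = pageText.indexOf(textEnd, startIndex + textStart.length);
  if (endIndex === -1) return null;

  return pageText.slice(startIndex, endIndex + textEnd.length);
}
```

In a Puppeteer-based scraper, `pageText` would presumably come from the rendered page, but how the package obtains it is not shown in this README.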
