bharat-1809 / see-link

🔎 Get the preview metadata like title, description, image, video, etc. from a link or a URL extracted from the given text.

Home Page: https://seelink.bharatsharma.me

License: MIT License

Languages: JavaScript 87.03%, HTML 12.75%, Shell 0.21%
Topics: url, preview, link, link-preview, url-preview, javascript, npm-package, puppeteer, nodejs

see-link's Introduction

See-Link

See-a-Link! Get the preview metadata like title, description, image, video, etc. from a link, or from a link extracted from the given text.

See-Link looks through the Open Graph markup, Twitter Cards markup, and other meta tags to get the preview information from the link. It visits the link in a headless browser and scrapes the required information. It can also return the dominant color of the page when a theme color is needed but no theme-color meta info is found.
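
As a rough illustration of the lookup described above, the sketch below picks preview fields from a set of already-scraped meta tags. It is not See-Link's actual implementation: the helper names (`firstAvailable`, `pickPreview`) are hypothetical, and the precedence order (Open Graph, then Twitter Cards, then generic tags) is an assumption made for the example.

```javascript
// Hypothetical helper: returns the first non-empty value among the candidate keys.
function firstAvailable(meta, keys) {
  for (const key of keys) {
    const value = meta[key];
    if (typeof value === 'string' && value.trim() !== '') {
      return value.trim();
    }
  }
  return undefined;
}

// meta is a plain object mapping meta tag names/properties to their content.
// Assumed precedence: Open Graph first, then Twitter Cards, then generic tags.
function pickPreview(meta) {
  return {
    title: firstAvailable(meta, ['og:title', 'twitter:title', 'title']),
    description: firstAvailable(meta, ['og:description', 'twitter:description', 'description']),
    image: firstAvailable(meta, ['og:image', 'twitter:image']),
  };
}

// A page with no Open Graph tags falls back to the other sources.
const preview = pickPreview({
  'twitter:title': 'Example page',
  description: 'A plain meta description',
});
console.log(preview);
```

A field with no matching tag is left `undefined`, which is the point where a fallback such as images from the HTML body could step in.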

To learn more about how See-Link works, check out this wiki.

Check out this article to learn about meta tags, SEO, and why they are needed for generating link previews.

Features

  • Gets the dominant color of the page if required.
  • Can return images present in the HTML body in case meta tags are not available.
  • Can be customized to return only the info you need.
  • Uses stealth measures, and the user agent can be customized, to bypass restrictions that sites set up to prevent bot scraping.

NOTE

  • Browsers block cross-origin requests, so a web app cannot request a different domain directly. (If you do not know how the same-origin policy works, here is a great article.) This library therefore works in Node (back-end environments) and certain mobile runtimes (like React Native).

  • This library loads the website with Puppeteer and parses its HTML, as if a user had visited the page. This means that some websites might redirect you to a sign-up page. You can try changing the userAgent option (by default it uses Facebook's user agent), which may produce a different response; such redirects are the site's behaviour, not a fault in this library.
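
Because of the cross-origin restriction noted above, a web app would typically ask its own back end for the preview. The following is a minimal sketch of such an endpoint, not part of See-Link itself: `createPreviewHandler`, the `url` query parameter, and the status codes are all assumptions for illustration. The preview function is injected so the sketch runs without see-link installed.

```javascript
// Hypothetical request handler factory. getPreview is any async function that
// maps a URL string to preview metadata, e.g. require('see-link') in a real app.
function createPreviewHandler(getPreview) {
  return async (req, res) => {
    // req.url is a path such as "/preview?url=...", so a base is needed to parse it.
    const { searchParams } = new URL(req.url, 'http://localhost');
    const target = searchParams.get('url');

    if (!target) {
      res.writeHead(400, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Missing "url" query parameter' }));
      return;
    }

    try {
      const preview = await getPreview(target);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(preview));
    } catch (err) {
      // The upstream page could not be previewed (no URL found, timeout, etc.).
      res.writeHead(502, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: err.message }));
    }
  };
}

// Possible wiring (assumes see-link is installed; the port is arbitrary):
// require('http').createServer(createPreviewHandler(require('see-link'))).listen(8080);
```

The front end would then call this same-origin endpoint instead of the target site.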

Getting Started

npm install see-link

Usage

It's quite simple:

const seeLink = require('see-link');

(async () => {
    const preview = await seeLink('https://www.bharatsharma.me');

    // You can directly pass a url as above or pass a chunk of text
    // and seeLink will extract the first link from it. Like this:

    const preview_text = await seeLink('This text will be parsed by seeLink https://www.bharatsharma.me');

    console.log(preview);
})();

The above code will result in the following output:

{
  title: 'Bharat Sharma',
  description: 'Personal website of Bharat Sharma',
  image: 'https://bharatsharma.me/assets/images/logo.png',
  domainName: 'bharatsharma.me'
}

API

function seeLink(url: string, options?: seeLink.Options): Promise<SeeLinkRes>

seeLink takes a url and an optional options object. The url string can be any link, or text containing a link. The returned promise rejects with an error if the provided text contains no URL.
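
To picture how a link might be pulled out of free text, here is a minimal sketch. It is an assumption-laden illustration rather than See-Link's own extraction logic: `extractFirstUrl` is a hypothetical helper, the regular expression only recognises http(s) links, and the error message is invented for the example.

```javascript
// Hypothetical helper: returns the first http(s) URL found in the text,
// or throws if the text contains none.
function extractFirstUrl(text) {
  // Match "http://" or "https://" followed by characters up to whitespace or a quote/bracket.
  const match = String(text).match(/https?:\/\/[^\s<>"']+/i);
  if (!match) {
    throw new Error('No URL found in the provided text');
  }
  // Drop punctuation that commonly trails a link at the end of a sentence.
  const candidate = match[0].replace(/[.,;:!?)\]]+$/, '');
  // The built-in URL constructor throws a TypeError if the candidate is not a valid URL.
  return new URL(candidate).href;
}

console.log(extractFirstUrl('This text will be parsed by seeLink https://www.bharatsharma.me'));
```

Note that the built-in `URL` class normalizes its input, so a bare origin comes back with a trailing slash.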

seeLink returns a promise that resolves to the preview metadata of the following type:

SeeLinkRes {
  title: string;
  description: string;
  image: string;
  domainName: string;
  video?: string;
  themeColor?: string;
  favIcon?: string;
  type?: string;
}

Options

Additionally, you can pass an options object to the function to change the default behaviour:

| Option Name | Function | Type |
| --- | --- | --- |
| detailedPreview | Get all the metadata supported by see-link (e.g. video, theme-color, type, favIcon) | boolean |
| getVideo | Get the video metadata along with the default result | boolean |
| getThemeColor | Get the theme-color metadata along with the default result | boolean |
| getDominantThemeColor | When getThemeColor is set to true and no theme-color meta info is found, get the dominant color of the page. Defaults to true | boolean |
| args | Arguments to pass to the puppeteer.launch function | string[] |
| userAgent | User-Agent to use when visiting the website | string |
| timeout | Timeout in milliseconds for the request | number |
| executablePath | Path to the chrome/chromium executable | string |
| headless | Whether to run the browser in headless mode | boolean |
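
As a sketch of how these options might be combined, the example below builds an options object and passes it as the second argument to seeLink. The option names come from the table above; every value (the user-agent string, the 15-second timeout, the `--no-sandbox` Chromium flag) is an illustrative assumption, and `previewWithOptions` is a hypothetical wrapper.

```javascript
// Option names are taken from the table; the values are illustrative only.
const options = {
  getThemeColor: true,          // include theme-color in the result
  getDominantThemeColor: true,  // fall back to the page's dominant color
  userAgent: 'Mozilla/5.0 (compatible; ExamplePreviewBot/1.0)', // assumed UA string
  timeout: 15000,               // milliseconds
  headless: true,
  args: ['--no-sandbox'],       // passed through to puppeteer.launch
};

// Hypothetical wrapper. see-link is required lazily so this file can be loaded
// even where the package (and its browser dependency) is not installed.
async function previewWithOptions(url) {
  const seeLink = require('see-link');
  return seeLink(url, options);
}
```

Calling `previewWithOptions('https://www.bharatsharma.me')` would then resolve to the preview metadata, with `themeColor` included when available.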

Test

The library is tested using Mocha and Chai. You can run the tests by running the following command in the project root:

./test.sh

If the error "bash: ./test.sh: Permission denied" is thrown, you can make the test script executable by running the following command:

chmod +x test.sh

Alternatively, you can run the tests by running the following commands in the project root:

npm run-script test-server

then in another terminal run the following command:

npm test

NOTE: The test server runs on port 3000 by default. You can change the port number in the test_setup/config.js file.

License

See-Link is released under the MIT License.

see-link's People

Contributors

bharat-1809

see-link's Issues

puppeteer-extra-plugin error when running on runkit

When running on runkit or jsfiddle, the following error is thrown:

          A plugin listed 'puppeteer-extra-plugin-user-preferences' as dependency,
          which is currently missing. Please install it:

          yarn add puppeteer-extra-plugin-user-preferences

          Note: You don't need to require the plugin yourself,
          unless you want to modify it's default settings.

Error: EROFS: read-only file system

An EROFS error is thrown when running on Cloud Run. It occurs when Puppeteer takes a screenshot of the page and tries to save it to the file system for use by color-thief.

Possible solution:

  • Set the destination folder for the screenshot to /tmp instead of the current directory.
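
The proposed fix could look something like the sketch below, which builds the screenshot path inside the operating system's temporary directory using Node's built-in `os` and `path` modules. `tempScreenshotPath` and the file-name scheme are hypothetical; how See-Link actually names or stores its screenshot is not stated in this issue.

```javascript
const os = require('os');
const path = require('path');

// Hypothetical helper: builds a unique .png path inside the OS temp directory,
// which is normally /tmp on Linux.
function tempScreenshotPath(prefix = 'see-link') {
  const unique = `${Date.now()}-${Math.random().toString(36).slice(2)}`;
  return path.join(os.tmpdir(), `${prefix}-${unique}.png`);
}

const screenshotPath = tempScreenshotPath();
console.log(screenshotPath);

// With Puppeteer, the path could then be used as:
//   await page.screenshot({ path: screenshotPath });
```

Using `os.tmpdir()` rather than a hard-coded `/tmp` keeps the same code working on platforms where the temporary directory lives elsewhere.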
