danialdezfouli / wappalyzer

This project forked from juliopontes/wappalyzer

Identify technology on websites.

Home Page: https://www.wappalyzer.com

License: MIT License

JavaScript 91.24% CSS 4.05% HTML 4.38% Dockerfile 0.33%

Wappalyzer

Wappalyzer identifies technologies on websites, including content management systems, ecommerce platforms, JavaScript frameworks, analytics tools and much more.

Prerequisites

Quick start

git clone https://github.com/aliasio/wappalyzer
cd wappalyzer
yarn install
yarn run link

Usage

Command line

node src/drivers/npm/cli.js https://example.com

Chrome extension

  • Go to about:extensions
  • Enable 'Developer mode'
  • Click 'Load unpacked'
  • Select src/drivers/webextension

Firefox extension

  • Go to about:debugging#/runtime/this-firefox
  • Click 'Load Temporary Add-on'
  • Select src/drivers/webextension/manifest.json

Specification

A long list of regular expressions is used to identify technologies on web pages. Wappalyzer inspects HTML code, as well as JavaScript variables, response headers and more.
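The sketch below shows roughly how one of these checks can work. It matches the headers patterns of a fingerprint against a set of HTTP response headers. It is a minimal sketch, not Wappalyzer's implementation. The function matchHeaders and the sample fingerprint are hypothetical, and tags (described under Tags) are ignored.

```javascript
// Hypothetical fingerprint fragment in the shape of a technologies.json
// entry (after JSON parsing). Tags such as "\;confidence:50" are ignored here.
const fingerprint = {
  headers: { "X-Powered-By": "^Express$" },
};

// Returns true if any header pattern matches the corresponding response
// header. Header names are looked up case-insensitively, as in HTTP.
function matchHeaders(headerPatterns, responseHeaders) {
  const normalised = {};
  for (const [name, value] of Object.entries(responseHeaders)) {
    normalised[name.toLowerCase()] = value;
  }
  return Object.entries(headerPatterns).some(([name, pattern]) => {
    const value = normalised[name.toLowerCase()];
    // Patterns are treated as case-insensitive, hence the "i" flag.
    return value !== undefined && new RegExp(pattern, "i").test(value);
  });
}

console.log(matchHeaders(fingerprint.headers, { "x-powered-by": "Express" })); // true
console.log(matchHeaders(fingerprint.headers, { Server: "nginx" })); // false
```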

Patterns (regular expressions) are kept in src/technologies.json. The following is an example of an application fingerprint.

Example

"Example": {
  "description": "A short description of the technology.",
  "cats": [
    1
  ],
  "cookies": {
    "cookie_name": "Example"
  },
  "dom": {
    "#example-id": {
      "attributes": {
        "class": "example-class"
      },
      "properties": {
        "example-property": ""
      },
      "text": "Example text content"
    }
  },
  "dns": {
    "MX": [
      "example\\.com"
    ]
  },
  "js": {
    "Example.method": ""
  },
  "excludes": "Example",
  "headers": {
    "X-Powered-By": "Example"
  },
  "html": "<link[^>]*example\\.css",
  "css": "\\.example-class",
  "robots": "Disallow: /unique-path/",
  "implies": "PHP\\;confidence:50",
  "meta": {
    "generator": "(?:Example|Another Example)"
  },
  "scripts": "example-([0-9.]+)\\.js\\;confidence:50\\;version:\\1",
  "url": ".+\\.example\\.com",
  "oss": true,
  "saas": true,
  "pricing": ["low", "freemium", "recurring"],
  "website": "https://example.com"
}

JSON fields

Find the JSON schema at schema.json.

Required properties

  • cats (Array): One or more category IDs. Example: [1, 6]
  • website (String): URL of the application's website. Example: "https://example.com"
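A quick pre-check for the two required properties might look like the following. checkRequired is a hypothetical helper, and schema.json remains the authoritative definition of a valid entry.

```javascript
// Checks the two required properties of a technologies.json entry.
// Returns a list of human-readable problems (empty if both are present).
function checkRequired(name, entry) {
  const problems = [];
  if (!Array.isArray(entry.cats) || entry.cats.length === 0) {
    problems.push(`${name}: "cats" must be a non-empty array of category IDs`);
  }
  if (typeof entry.website !== "string" || entry.website === "") {
    problems.push(`${name}: "website" must be a non-empty string`);
  }
  return problems;
}

console.log(checkRequired("Example", { cats: [1], website: "https://example.com" })); // []
console.log(checkRequired("Broken", { cats: [] })); // two problems reported
```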

Optional properties

  • description (String): A short description of the technology in British English (max. 250 characters). Write in a neutral, factual tone; not like an ad. Example: "A short description."
  • icon (String): Application icon filename. Example: "WordPress.svg"
  • cpe (String): The CPE is a structured naming scheme for applications; see the specification. Example: "cpe:/a:apache:http_server"
  • saas (Boolean): The technology is offered as a Software-as-a-Service (SaaS), i.e. hosted or cloud-based. Example: true
  • oss (Boolean): The technology has an open-source license. Example: true
  • pricing (Array): Cost indicator (based on a typical plan or average monthly price) and available pricing models. For paid products only.

    One of:

      • low: Up to US $100 / mo
      • mid: Up to US $1,000 / mo
      • high: More than US $10,000 / mo

    Plus any of:

      • freemium: Free plan available
      • onetime: One-time payments accepted
      • recurring: Subscriptions available
      • poa: Price on asking
      • payg: Pay as you go (e.g. commissions or usage-based fees)

    Example: ["low", "freemium"]
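The "one of / plus any of" rule for pricing can be checked mechanically. The sketch below reads "One of" as exactly one cost indicator. isValidPricing is a hypothetical helper, and schema.json remains the authoritative definition.

```javascript
// Values taken from the pricing list above.
const COST_LEVELS = ["low", "mid", "high"];
const PRICING_MODELS = ["freemium", "onetime", "recurring", "poa", "payg"];

// Returns true if a "pricing" array has exactly one cost indicator and
// otherwise only known pricing models (assumed reading of the rule).
function isValidPricing(pricing) {
  if (!Array.isArray(pricing)) return false;
  const levels = pricing.filter((p) => COST_LEVELS.includes(p));
  const unknown = pricing.filter(
    (p) => !COST_LEVELS.includes(p) && !PRICING_MODELS.includes(p)
  );
  return levels.length === 1 && unknown.length === 0;
}

console.log(isValidPricing(["low", "freemium"])); // true
console.log(isValidPricing(["low", "high"])); // false: two cost indicators
console.log(isValidPricing(["medium"])); // false: "medium" is not a valid value
```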

Implies and excludes (optional)

  • implies (String | Array): The presence of one application can imply the presence of another, e.g. WordPress means PHP is also in use. Example: "PHP"
  • excludes (String | Array): Opposite of implies. The presence of one application can exclude the presence of another. Example: "Apache"
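Here is one way implies and excludes could be applied to a set of detected technologies. It assumes that implied technologies are added transitively, that exclusions are applied afterwards, and that any tags are stripped. The data, including Nginx excluding Apache, is invented for illustration and is not copied from technologies.json. resolveDetections and toNames are hypothetical helpers.

```javascript
// Hypothetical mini-database in the shape of technologies.json entries.
// implies/excludes may be a string or an array, optionally with tags.
const technologies = {
  WordPress: { implies: ["PHP", "MySQL"] },
  PHP: {},
  MySQL: {},
  Nginx: { excludes: "Apache" },
  Apache: {},
};

// Normalises a String | Array field to a list of names, dropping any tags
// such as "\;confidence:50".
function toNames(field) {
  const list = Array.isArray(field) ? field : field ? [field] : [];
  return list.map((entry) => entry.split("\\;")[0]);
}

// Adds implied technologies (transitively), then removes excluded ones.
function resolveDetections(detected) {
  const result = new Set(detected);
  const queue = [...detected];
  while (queue.length > 0) {
    const name = queue.shift();
    for (const implied of toNames(technologies[name]?.implies)) {
      if (!result.has(implied)) {
        result.add(implied);
        queue.push(implied);
      }
    }
  }
  for (const name of [...result]) {
    for (const excluded of toNames(technologies[name]?.excludes)) {
      result.delete(excluded);
    }
  }
  return [...result].sort();
}

console.log(resolveDetections(["WordPress"])); // [ 'MySQL', 'PHP', 'WordPress' ]
console.log(resolveDetections(["Nginx", "Apache"])); // [ 'Nginx' ]
```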

Patterns (optional)

  • cookies (Object): Cookies. Example: { "cookie_name": "Cookie value" }
  • dom (Object): Uses a query selector to inspect element properties, attributes and text content. Example: { "#example-id": { "properties": { "example-prop": "" } } }
  • dns (Object): DNS records; supports MX, TXT, SOA and NS (NPM driver only). Example: { "MX": "example\\.com" }
  • js (Object): JavaScript properties (case sensitive). Avoid short property names to prevent matching minified code. Example: { "jQuery.fn.jquery": "" }
  • headers (Object): HTTP response headers. Example: { "X-Powered-By": "^WordPress$" }
  • html (String | Array): HTML source code. Patterns must include an HTML opening tag to avoid matching plain text. For performance reasons, avoid html where possible and use dom instead. Example: "<a [^>]*href=\"index\\.html"
  • css (String | Array): CSS rules. Unavailable when a website enforces a same-origin policy. For performance reasons, only a portion of the available CSS rules is used to find matches. Example: "\\.example-class"
  • robots (String | Array): Robots.txt contents. Example: "Disallow: /unique-path/"
  • url (String): Full URL of the page. Example: "^https?://.+\\.wordpress\\.com"
  • meta (Object): HTML meta tags, e.g. generator. Example: { "generator": "^WordPress$" }
  • scripts (String | Array): URLs of JavaScript files included on the page. Example: "jquery\\.js"
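The js field relies on looking up a dotted property path such as "jQuery.fn.jquery". The sketch below runs that lookup against a plain object standing in for the page's window. It assumes that an empty regular expression only requires the property to exist. getProperty, matchJs and fakeWindow are hypothetical names, not part of Wappalyzer.

```javascript
// Looks up a dotted property path such as "jQuery.fn.jquery" on an object.
// Property names are case sensitive; returns undefined if a step is missing.
function getProperty(root, path) {
  return path
    .split(".")
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), root);
}

// Matches a "js" pattern object against a window-like object. An empty
// regular expression only requires the property to exist; otherwise the
// property's value is tested as a string. Tags after "\;" are stripped.
function matchJs(jsPatterns, globals) {
  return Object.entries(jsPatterns).some(([path, pattern]) => {
    const value = getProperty(globals, path);
    if (value === undefined) return false;
    const regex = pattern.split("\\;")[0];
    return regex === "" || new RegExp(regex, "i").test(String(value));
  });
}

// Hypothetical stand-in for a browser window with jQuery 3.6.0 loaded.
const fakeWindow = { jQuery: { fn: { jquery: "3.6.0" } } };

console.log(matchJs({ "jQuery.fn.jquery": "" }, fakeWindow)); // true
console.log(matchJs({ "jQuery.fn.jquery": "^3\\." }, fakeWindow)); // true
console.log(matchJs({ "Example.method": "" }, fakeWindow)); // false
```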

Patterns

Patterns are essentially JavaScript regular expressions written as strings, but with some additions.

Quirks and pitfalls

  • Because of the string format, the escape character itself must be escaped when using special characters such as the dot (\\.). Double quotes must be escaped only once (\"). Slashes do not need to be escaped (/).
  • Flags are not supported. Regular expressions are treated as case-insensitive.
  • Capture groups (()) are used for version detection. In other cases, use non-capturing groups ((?:)).
  • Use start and end of string anchors (^ and $) where possible for optimal performance.
  • Short or generic patterns can cause applications to be identified incorrectly. Try to find unique strings to match.
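The escaping rules above are easiest to see by parsing a pattern twice: first as JSON, then as a case-insensitive RegExp. The sketch below does this with a hypothetical html pattern. The "i" flag reflects the statement that patterns are treated as case-insensitive.

```javascript
// A hypothetical entry as it would be written in technologies.json: the
// backslash before the dot is escaped ("\\."), the double quotes once ("\"").
const json = String.raw`{ "html": "<link[^>]*href=\"[^\"]*example\\.css" }`;

// JSON.parse turns "\\." into "\." and "\"" into a plain double quote.
const { html } = JSON.parse(json);
console.log(html); // <link[^>]*href="[^"]*example\.css

// Flags cannot be written in the pattern; compile it case-insensitively.
const regex = new RegExp(html, "i");

console.log(regex.test('<LINK rel="stylesheet" href="/css/EXAMPLE.css">')); // true
console.log(regex.test('<link href="/css/exampleXcss">')); // false, the dot is literal
```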

Tags

Tags (a non-standard syntax) can be appended to patterns, and to implies and excludes values, to store additional information. Each tag is separated from the pattern and from other tags by \\;.

  • confidence: Indicates a less reliable pattern that may cause false positives. The aim is to achieve a combined confidence of 100%. Defaults to 100% if not specified. Example: "js": { "Mage": "\\;confidence:50" }
  • version: Gets the version number from a pattern match using a special syntax. Example: "scripts": "jquery-([0-9.]+)\\.js\\;version:\\1"
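A pattern with tags can be split into its regular expression and tag values as shown below. The combining step assumes that the confidences of all matching patterns for one technology are added up and capped at 100. That is one reading of "a combined confidence of 100%", not confirmed Wappalyzer behaviour. parsePattern and combinedConfidence are hypothetical helpers.

```javascript
// Splits a pattern into its regular expression and its tags. In
// technologies.json the separator is written "\\;"; after JSON parsing
// the string contains "\;", which is what this function expects.
function parsePattern(pattern) {
  const [regex, ...tags] = pattern.split("\\;");
  const parsed = { regex, confidence: 100, version: "" };
  for (const tag of tags) {
    const colon = tag.indexOf(":");
    if (colon === -1) continue; // Ignore malformed tags.
    const key = tag.slice(0, colon);
    // Everything after the first colon, so "version:\1?a:b" keeps "\1?a:b".
    const value = tag.slice(colon + 1);
    if (key === "confidence") parsed.confidence = Number(value);
    if (key === "version") parsed.version = value;
  }
  return parsed;
}

// Assumption: the confidences of all matching patterns for one technology
// are added together and capped at 100.
function combinedConfidence(matchedPatterns) {
  const total = matchedPatterns.reduce((sum, p) => sum + p.confidence, 0);
  return Math.min(100, total);
}

const jquery = parsePattern("jquery-([0-9.]+)\\.js\\;version:\\1");
console.log(jquery.regex, jquery.version); // jquery-([0-9.]+)\.js \1

const mage = parsePattern("\\;confidence:50");
console.log(combinedConfidence([mage])); // 50
console.log(combinedConfidence([mage, mage])); // 100
```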

Version syntax

Application version information can be obtained from a pattern using a capture group. A condition can be evaluated using the ternary operator (?:).

  • "\\1" returns the first match.
  • "\\1?a:" returns a if the first match contains a value, nothing otherwise.
  • "\\1?a:b" returns a if the first match contains a value, b otherwise.
  • "\\1?:b" returns nothing if the first match contains a value, b otherwise.
  • "foo\\1" returns foo with the first match appended.
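The version syntax above can be resolved with a small amount of string processing. The sketch below handles back-references and the ternary form, assuming that a ternary's b branch runs to the end of the template. resolveVersion is a hypothetical helper, and Wappalyzer's own code may differ in details.

```javascript
// Resolves a version template such as "\1", "\1?a:b" or "foo\1" against
// the array returned by RegExp.prototype.exec (index 0 is the whole match,
// index i is capture group i).
function resolveVersion(template, match) {
  let version = template;
  // Go from the highest group down so that "\1" is never replaced inside a
  // reference to a two-digit group such as "\10".
  for (let i = match.length - 1; i >= 1; i--) {
    const group = match[i] || "";
    // Ternary form "\i?a:b": a if group i captured a value, b otherwise.
    // Either branch may be empty; b runs to the end of the template.
    const ternary = new RegExp(`\\\\${i}\\?([^:]*):(.*)$`).exec(version);
    if (ternary) {
      version = version.replace(ternary[0], () => (group ? ternary[1] : ternary[2]));
    }
    // Plain back-reference "\i": substitute the captured text.
    version = version.split(`\\${i}`).join(group);
  }
  return version.trim();
}

const match = /jquery-([0-9.]+)\.js/i.exec("/js/jquery-3.6.0.js");
console.log(resolveVersion("\\1", match)); // 3.6.0
console.log(resolveVersion("\\1?a:b", match)); // a
console.log(resolveVersion("foo\\1", match)); // foo3.6.0
```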
