This project uses the Playwright library to crawl a specified webpage in Chrome and Firefox, with WebAssembly both enabled and disabled. Downloaded webpage files are saved to the `JSOutput` folder, and screenshots are saved to the `Screenshots` folder.
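The four browser and WebAssembly combinations described above can be sketched as a small scan matrix. This is a minimal illustration, not this project's actual code: the names `ScanVariant` and `buildScanVariants` are hypothetical, and the Chromium flag and Firefox preference used to switch WebAssembly off are assumptions about one way this could be done. The `launchOptions` objects are shaped like a subset of Playwright's launch options (`args`, `firefoxUserPrefs`).

```typescript
// Hypothetical sketch: names and launch flags are assumptions,
// not taken from this project's source.
type BrowserName = "chromium" | "firefox";

interface ScanVariant {
  browser: BrowserName;
  wasmEnabled: boolean;
  // Shaped like a subset of Playwright's launch options.
  launchOptions: {
    args?: string[];
    firefoxUserPrefs?: Record<string, boolean>;
  };
}

// Build one entry per browser and WebAssembly setting (2 x 2 = 4 variants).
function buildScanVariants(): ScanVariant[] {
  const variants: ScanVariant[] = [];
  const browsers: BrowserName[] = ["chromium", "firefox"];
  for (const browser of browsers) {
    for (const wasmEnabled of [true, false]) {
      const launchOptions: ScanVariant["launchOptions"] = {};
      if (!wasmEnabled) {
        if (browser === "chromium") {
          // Assumption: V8 flag that hides the WebAssembly global.
          launchOptions.args = ["--js-flags=--noexpose_wasm"];
        } else {
          // Assumption: Firefox preference that turns WebAssembly off.
          launchOptions.firefoxUserPrefs = { "javascript.options.wasm": false };
        }
      }
      variants.push({ browser, wasmEnabled, launchOptions });
    }
  }
  return variants;
}
```

Each entry could then be passed to the matching Playwright browser type's `launch` call, with one crawl per variant.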
- Node.js
- MySQL
- Run `found_page_schema.sql` under the `Database` folder to set up the schema and table for metadata logging.
- Run `npm install` in the root directory of this project (the directory containing this README).
- Run `npm run build` to compile the TypeScript source files in the `src` folder into JavaScript files in the `build` folder.
- Optionally, modify the scripts under `src` or configure the scan parameters in `config.json` under `src`, then rebuild by running `npm run build` again (Step 3).
- Run `node ./build/index.js --url <url_to_scan>` to scan `<url_to_scan>` and all of its first-level subpages. For example, try `node ./build/index.js --url https://jkumara.github.io/pong-wasm/`, as this site contains WebAssembly.
- To scan a list of URLs, run `node ./build/index.js --file <file_path>` to read the file at `<file_path>`. For example, to use the included file `sites.txt`, run `node ./build/index.js --file sites.txt`.
- By default, both of these commands download only the WebAssembly files they find. To download all files, add the flag `--full true` to the command. For example, to extend the example in Usage 2, run `node ./build/index.js --file sites.txt --full true`.
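The three flags above could be interpreted along the following lines. This is a minimal sketch under stated assumptions, not the project's real argument handling: `CliOptions`, `parseCliArgs`, and `parseUrlList` are hypothetical names, every flag is assumed to take exactly one value, and the URL list file is assumed to hold one URL per line.

```typescript
// Hypothetical sketch: the project's real argument handling may differ.
interface CliOptions {
  url?: string;  // value of --url
  file?: string; // value of --file
  full: boolean; // true only when "--full true" is passed
}

// Parse flag/value pairs such as ["--file", "sites.txt", "--full", "true"].
function parseCliArgs(argv: string[]): CliOptions {
  const options: CliOptions = { full: false };
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (value === undefined) {
      throw new Error("Missing value for " + flag);
    }
    if (flag === "--url") {
      options.url = value;
    } else if (flag === "--file") {
      options.file = value;
    } else if (flag === "--full") {
      options.full = value === "true";
    } else {
      throw new Error("Unknown flag: " + flag);
    }
  }
  if (options.url === undefined && options.file === undefined) {
    throw new Error("Pass either --url <url> or --file <file_path>");
  }
  return options;
}

// Assumption: the list file holds one URL per line; blank lines are skipped.
function parseUrlList(contents: string): string[] {
  return contents
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}
```

In a Node.js entry point, `parseCliArgs` would typically receive `process.argv.slice(2)`, and `parseUrlList` the text read from the path given to `--file`.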