disane87 / docudigger

Website scraper for getting invoices automagically as pdf (useful for taxes or DMS)

Home Page: https://blog.disane.dev

License: MIT License

JavaScript 9.58% Shell 1.50% TypeScript 88.79% Batchfile 0.13%
dms invoices nodejs scraping

docudigger's Introduction

Welcome to docudigger πŸ‘‹

Document scraper for getting invoices automagically as pdf (useful for taxes or DMS)

🏠 Homepage

Prerequisites

  • npm >=9.1.2
  • node >=18.12.1

Configuration

All settings can be changed via CLI arguments or environment variables (even when using Docker).

| Setting | Description | Default value |
| --- | --- | --- |
| AMAZON_USERNAME | Your Amazon username | null |
| AMAZON_PASSWORD | Your Amazon password | null |
| AMAZON_TLD | Amazon top-level domain | de |
| AMAZON_YEAR_FILTER | Only extracts invoices from this year (e.g. 2023) | 2023 |
| AMAZON_PAGE_FILTER | Only extracts invoices from this page (e.g. 2) | null |
| ONLY_NEW | Tracks already scraped documents and starts a new run at the last scraped one | true |
| FILE_DESTINATION_FOLDER | Destination path for all scraped documents | ./documents/ |
| FILE_FALLBACK_EXTENSION | Fallback extension when no extension can be determined | .pdf |
| DEBUG | Debug flag (sets the log level to DEBUG) | false |
| SUBFOLDER_FOR_PAGES | Creates subfolders for every scraped page/plugin | false |
| LOG_PATH | Sets the log path | ./logs/ |
| LOG_LEVEL | Log level (see https://github.com/winstonjs/winston#logging-levels) | info |
| RECURRING | Flag for executing the script periodically. Needs RECURRING_PATTERN to be set. Defaults to true when using the Docker container | false |
| RECURRING_PATTERN | Cron pattern for periodic execution. Needs RECURRING set to true | */30 * * * * |
| TZ | Timezone used for Docker environments | Europe/Berlin |
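As an illustration only (this is a sketch, not docudigger's actual code), the settings above could be read from the environment with their documented defaults like this:

```javascript
// Hypothetical sketch: reading docudigger-style settings from the
// environment, falling back to the defaults listed in the table above.
function loadConfig(env = process.env) {
    return {
        amazonTld: env.AMAZON_TLD ?? 'de',
        amazonYearFilter: env.AMAZON_YEAR_FILTER ? Number(env.AMAZON_YEAR_FILTER) : 2023,
        onlyNew: (env.ONLY_NEW ?? 'true') === 'true',
        fileDestinationFolder: env.FILE_DESTINATION_FOLDER ?? './documents/',
        fileFallbackExtension: env.FILE_FALLBACK_EXTENSION ?? '.pdf',
        recurring: (env.RECURRING ?? 'false') === 'true',
        recurringPattern: env.RECURRING_PATTERN ?? '*/30 * * * *',
        logLevel: env.LOG_LEVEL ?? 'info',
    };
}

const config = loadConfig({ AMAZON_TLD: 'co.uk', RECURRING: 'true' });
console.log(config.amazonTld, config.recurring, config.recurringPattern);
// → co.uk true */30 * * * *
```

Function and property names here are assumptions for illustration; only the variable names and defaults come from the table.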

Install

⚠️ Attention: There is no need to install this locally. Just use npx

Usage

πŸ”¨ Make sure you have an .env file present (with the variables from above) in the work directory or use the appropriate cli arguments.

πŸš‘ If you want to use an .env file, make sure you use env-cmd (https://www.npmjs.com/package/env-cmd)

$ npx docudigger COMMAND
running command...

$ npx docudigger (--version)
@disane-dev/docudigger/2.0.2 linux-x64 node-v18.16.1

$ npx docudigger --help [COMMAND]
USAGE
  $ docudigger COMMAND

docudigger scrape all

Scrapes all websites periodically (default for docker environment)

USAGE
  $ npx docudigger scrape all [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l <value>] [-c <value> -r]

FLAGS
  -c, --recurringCron=<value>  [default: * * * * *] Cron pattern to execute periodically
  -d, --debug
  -l, --logPath=<value>        [default: ./logs/] Log path
  -r, --recurring
  --logLevel=<option>          [default: info] Specify level for logging.
                               <options: trace|debug|info|warn|error>

GLOBAL FLAGS
  --json  Format output as json.

DESCRIPTION
  Scrapes all websites periodically

EXAMPLES
  $ docudigger scrape all

docudigger scrape amazon

Used to get invoices from amazon

USAGE
  $ npx docudigger scrape amazon -u <value> -p <value> [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l
    <value>] [-c <value> -r] [--fileDestinationFolder <value>] [--fileFallbackExentension <value>] [-t <value>]
    [--yearFilter <value>] [--pageFilter <value>] [--onlyNew]

FLAGS
  -c, --recurringCron=<value>        [default: * * * * *] Cron pattern to execute periodically
  -d, --debug
  -l, --logPath=<value>              [default: ./logs/] Log path
  -p, --password=<value>             (required) Password
  -r, --recurring
  -t, --tld=<value>                  [default: de] Amazon top level domain
  -u, --username=<value>             (required) Username
  --fileDestinationFolder=<value>    [default: ./data/] Destination path for all scraped documents
  --fileFallbackExentension=<value>  [default: .pdf] Fallback extension when no extension can be determined
  --logLevel=<option>                [default: info] Specify level for logging.
                                     <options: trace|debug|info|warn|error>
  --onlyNew                          Gets only new invoices
  --pageFilter=<value>               Only extracts invoices from this results page
  --yearFilter=<value>               Only extracts invoices from this year

GLOBAL FLAGS
  --json  Format output as json.

DESCRIPTION
  Used to get invoices from amazon

  Scrapes amazon invoices

EXAMPLES
  $ docudigger scrape amazon

Docker

docker run \
  -e AMAZON_USERNAME='[YOUR MAIL]' \
  -e AMAZON_PASSWORD='[YOUR PW]' \
  -e AMAZON_TLD='de' \
  -e AMAZON_YEAR_FILTER='2020' \
  -e AMAZON_PAGE_FILTER='1' \
  -e LOG_LEVEL='info' \
  -v "C:/temp/docudigger/:/home/node/docudigger" \
  ghcr.io/disane87/docudigger

Dev-Time πŸͺ²

NPM

npm install
# Adjust the created .env file to your needs
npm run start

Author

πŸ‘€ Marco Franke

🀝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.

Show your support

Give a ⭐️ if this project helped you!


This README was generated with ❀️ by readme-md-generator

docudigger's People

Contributors

dependabot[bot], disane87, fwartner, renovate[bot], semantic-release-bot, wehrmannit


docudigger's Issues

[WR] vodafone.de

Which website do you want to crawl?

http://vodafone.de

What services do they provide?

Contracts (with recurring calculation in a defined interval)

In which industry does the company operate?

Telecommunication (like Vodafone)

If you choose other...

No response

Does the webpage have/needs an authentication?

  • Yes, the website needs an authentication

Do they provide a two factor auth?

  • Yes, the website needs a second factor

Would you provide us the credentials (privately)?

  • Yes, I would share my credentials

Are you willing to collaborate to get this scraper up and running?

  • Yes, I would collaborate on this actively.

What color (hex) represents the company?

#E60000

Your recorded code

import url from 'url';
import { createRunner } from '@puppeteer/replay';

export async function run(extension) {
    const runner = await createRunner(extension);

    await runner.runBeforeAllSteps();

    await runner.runStep({
        type: 'setViewport',
        width: 1134,
        height: 1284,
        deviceScaleFactor: 1,
        isMobile: false,
        hasTouch: false,
        isLandscape: false
    });

    await runner.runStep({
        type: 'navigate',
        url: 'https://www.vodafone.de/',
        assertedEvents: [
            {
                type: 'navigation',
                url: 'https://www.vodafone.de/',
                title: ''
            }
        ]
    });
    await runner.runStep({
        type: 'click',
        target: 'main',
        selectors: [
            [
                'aria/MeinVodafone',
                'aria/[role="generic"]'
            ],
            [
                'li.item-myvf span.icon'
            ],
            [
                'xpath///*[@id="top"]/div/header/nav/div/div[2]/div/div/ul[2]/li[2]/a/span[1]'
            ],
            [
                'pierce/li.item-myvf span.icon'
            ]
        ],
        offsetY: 16,
        offsetX: 6.5,
    });
    await runner.runStep({
        type: 'click',
        target: 'main',
        selectors: [
            [
                'aria/Login[role="button"]'
            ],
            [
                '#meinVodafoneOverlay button'
            ],
            [
                'xpath///*[@id="mdd-login-form"]/fieldset/button'
            ],
            [
                'pierce/#meinVodafoneOverlay button'
            ]
        ],
        offsetY: 10,
        offsetX: 27.90625,
        assertedEvents: [
            {
                type: 'navigation',
                url: 'https://www.vodafone.de/meinvodafone/services/',
                title: ''
            }
        ]
    });
    await runner.runStep({
        type: 'click',
        target: 'main',
        selectors: [
            [
                'li:nth-of-type(1) svg.icon-arrow-down-i-xsml'
            ],
            [
                'xpath///*[@id="dashboard:mobile"]/svg[1]'
            ],
            [
                'pierce/li:nth-of-type(1) svg.icon-arrow-down-i-xsml'
            ]
        ],
        offsetY: 7.015625,
        offsetX: 9.5,
    });
    await runner.runStep({
        type: 'click',
        target: 'main',
        selectors: [
            [
                'li:nth-of-type(1) div.tiles > a:nth-of-type(1) svg'
            ],
            [
                'xpath///*[@id="content"]/div[2]/div/div/section/div/div/div/div[3]/div[2]/ul/li[1]/div/div/div[1]/a[1]/div/div[1]/svg'
            ],
            [
                'pierce/li:nth-of-type(1) div.tiles > a:nth-of-type(1) svg'
            ]
        ],
        offsetY: 63.609375,
        offsetX: 22.484375,
        assertedEvents: [
            {
                type: 'navigation',
                url: 'https://www.vodafone.de/meinvodafone/services/ihre-rechnungen/rechnungen',
                title: ''
            }
        ]
    });
    await runner.runStep({
        type: 'click',
        target: 'main',
        selectors: [
            [
                'aria/Mehr anzeigen[role="button"]'
            ],
            [
                '#content button'
            ],
            [
                'xpath///*[@id="billoverviewWrapperId"]/bill-overview-history/bill-history/div/div[2]/div/div/div/div[2]/vf-table-brix/div[2]/div/button'
            ],
            [
                'pierce/#content button'
            ],
            [
                'text/Mehr anzeigen'
            ]
        ],
        offsetY: 10,
        offsetX: 44.375,
    });
    await runner.runStep({
        type: 'click',
        target: 'main',
        selectors: [
            [
                'tr:nth-of-type(1) > td:nth-of-type(4) span:nth-of-type(2) > svg'
            ],
            [
                'xpath///*[@id="billoverviewWrapperId"]/bill-overview-history/bill-history/div/div[2]/div/div/div/div[2]/vf-table-brix/div[2]/table/tbody/tr[1]/td[4]/div/span[2]/svg'
            ],
            [
                'pierce/tr:nth-of-type(1) > td:nth-of-type(4) span:nth-of-type(2) > svg'
            ]
        ],
        offsetY: 13.5,
        offsetX: 22.34375,
    });
    await runner.runStep({
        type: 'click',
        target: 'main',
        selectors: [
            [
                'tr:nth-of-type(1) > td:nth-of-type(5) span:nth-of-type(2) use'
            ],
            [
                'xpath///*[@id="billoverviewWrapperId"]/bill-overview-history/bill-history/div/div[2]/div/div/div/div[2]/vf-table-brix/div[2]/table/tbody/tr[1]/td[5]/div/span[2]/svg/use'
            ],
            [
                'pierce/tr:nth-of-type(1) > td:nth-of-type(5) span:nth-of-type(2) use'
            ]
        ],
        offsetY: 10.5,
        offsetX: 13.45843505859375,
    });

    await runner.runAfterAllSteps();
}

if (process && import.meta.url === url.pathToFileURL(process.argv[1]).href) {
    run()
}

The automated release is failing 🚨

🚨 The automated release from the main branch failed. 🚨

I recommend you give this issue a high priority, so other packages depending on you can benefit from your bug fixes and new features again.

You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can fix this πŸ’ͺ.

Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.

Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the main branch. You can also manually restart the failed CI job that runs semantic-release.

If you are not sure how to resolve this, here are some links that can help you:

If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.


DOCKER_USERNAME and DOCKER_PASSWORD environment variables must be set

Unfortunately this error doesn't have any additional information. Feel free to kindly ask the author of the @overtheairbrew/semantic-release-dockerbuildx plugin to add more helpful information.


Good luck with your project ✨

Your semantic-release bot πŸ“¦πŸš€

Amazon rolls out frontend changes in waves

Versions of the Amazon plugin can therefore be too new for some users and too old for others. It's a bit like Schrödinger's cat, unfortunately.

I can also confirm this error with a DE address from Amazon.

But I must admit that I have docudigger running in Portainer (converted via https://www.composerize.com/ ).
I have made NO adjustments, except for the e-mail, password and log level (set to debug).
Is it possible that you still get an "old" view of Amazon? They like to roll out changes step by step.

[0] [info] [2024-02-03 23-02:30] [scrape:all]: runAll
[0] [warn] [2024-02-03 23-02:30] [scrape:amazon]: process.json not found. Full run needed. OnlyNew deactivated.
[0] [info] [2024-02-03 23-02:34] [scrape:amazon]: Logged in
[0] [info] [2024-02-03 23-02:36] [scrape:amazon]: First possible year: 2020
[0] [info] [2024-02-03 23-02:36] [scrape:amazon]: Last possible year: 2020
[0] [info] [2024-02-03 23-02:36] [scrape:amazon]: Selecting start year 2020
[0] Error: No element found for selector: select[name="orderFilter"]
[0] docudigger scrape all exited with code 1

UrsprΓΌnglich gepostet von @Sky-Dragon in #446 (comment)

Clicking popup for invoices fails several times

[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Order date: 25. Juni 2023
[0] [error] [2023-07-05 10-07:11] [scrape:amazon]:      Couldn't get popover #a-popover-2 within 2000ms. Skipping
[0] [warn] [2023-07-05 10-07:11] [scrape:amazon]:       No invoices found. Order may be undelivered. Check again later.
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Processing "2" orders
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Order date: 15. Juni 2023
[0] [error] [2023-07-05 10-07:11] [scrape:amazon]:      Couldn't get popover #a-popover-3 within 2000ms. Skipping
[0] [warn] [2023-07-05 10-07:11] [scrape:amazon]:       No invoices found. Order may be undelivered. Check again later.
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Processing "3" orders
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Order date: 13. Juni 2023
[0] [error] [2023-07-05 10-07:11] [scrape:amazon]:      Couldn't get popover #a-popover-4 within 2000ms. Skipping
[0] [warn] [2023-07-05 10-07:11] [scrape:amazon]:       No invoices found. Order may be undelivered. Check again later.
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Processing "4" orders
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Order date: 13. Juni 2023
[0] [error] [2023-07-05 10-07:11] [scrape:amazon]:      Couldn't get popover #a-popover-5 within 2000ms. Skipping
[0] [warn] [2023-07-05 10-07:11] [scrape:amazon]:       No invoices found. Order may be undelivered. Check again later.
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Processing "5" orders
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Order date: 13. Juni 2023
[0] [error] [2023-07-05 10-07:11] [scrape:amazon]:      Couldn't get popover #a-popover-6 within 2000ms. Skipping
[0] [warn] [2023-07-05 10-07:11] [scrape:amazon]:       No invoices found. Order may be undelivered. Check again later.
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Processing "6" orders
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Order date: 6. Juni 2023
[0] [error] [2023-07-05 10-07:11] [scrape:amazon]:      Couldn't get popover #a-popover-7 within 2000ms. Skipping
[0] [warn] [2023-07-05 10-07:11] [scrape:amazon]:       No invoices found. Order may be undelivered. Check again later.
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Processing "7" orders
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Order date: 6. Juni 2023
[0] [error] [2023-07-05 10-07:11] [scrape:amazon]:      Couldn't get popover #a-popover-8 within 2000ms. Skipping
[0] [warn] [2023-07-05 10-07:11] [scrape:amazon]:       No invoices found. Order may be undelivered. Check again later.
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Processing "8" orders
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Order date: 5. Juni 2023
[0] [error] [2023-07-05 10-07:11] [scrape:amazon]:      Couldn't get popover #a-popover-9 within 2000ms. Skipping
[0] [warn] [2023-07-05 10-07:11] [scrape:amazon]:       No invoices found. Order may be undelivered. Check again later.
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Processing "9" orders
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Order date: 4. Juni 2023
[0] [error] [2023-07-05 10-07:11] [scrape:amazon]:      Couldn't get popover #a-popover-10 within 2000ms. Skipping
[0] [warn] [2023-07-05 10-07:11] [scrape:amazon]:       No invoices found. Order may be undelivered. Check again later.
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Processing "10" orders
[0] [info] [2023-07-05 10-07:11] [scrape:amazon]:       Page "1" done. Skipping to next page.
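Transient popover timeouts like the ones logged above can sometimes be mitigated by retrying with a growing timeout. A generic sketch of such a retry wrapper (an assumption for illustration, not docudigger's actual code):

```javascript
// Generic retry helper: runs an async action up to `attempts` times,
// passing a timeout that grows linearly with each attempt. If all
// attempts fail, the last error is rethrown.
async function retryWithBackoff(action, { attempts = 3, baseTimeoutMs = 2000 } = {}) {
    let lastError;
    for (let i = 0; i < attempts; i++) {
        const timeoutMs = baseTimeoutMs * (i + 1);
        try {
            return await action(timeoutMs);
        } catch (err) {
            lastError = err; // remember the failure and try again
        }
    }
    throw lastError;
}
```

With Puppeteer, `action` could be something like `(t) => page.waitForSelector('#a-popover-2', { timeout: t })`, so a popover that appears slowly gets 2000 ms, then 4000 ms, then 6000 ms before the order is skipped.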

The automated release is failing 🚨

🚨 The automated release from the dev branch failed. 🚨



Missing package.json file.

A package.json file at the root of your project is required to release on npm.

Please follow the npm guideline to create a valid package.json file.



Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Other Branches

These updates are pending.

  • chore(deps): replace dependency eslint-config-standard-with-typescript with eslint-config-love ^43.0.1

Open

These updates have all been created already.

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated.

Detected dependencies

dockerfile
dockerfile
  • satantime/puppeteer-node 20-slim
dockerfile.debug
  • satantime/puppeteer-node 20-slim
github-actions
.github/workflows/build-and-release.yaml
  • actions/checkout v3
  • actions/setup-node v3
  • actions/checkout v3
  • sigstore/cosign-installer v3.5.0@59acb6260d9c0ba8f4a2f9d9b48431a222b68e20
  • docker/setup-buildx-action a530e948adbeb357dbca95a7f8845d385edf4438
  • docker/login-action 5f4866a30a54f16a52d2ecb4a3898e9e424939cf
  • docker/metadata-action v4
  • docker/build-push-action eb539f44b153603ccbfbd98e2ab9d4d0dcaf23a4
npm
package.json
  • @oclif/core ^3.23.0
  • @oclif/plugin-commands ^3.2.0
  • @oclif/plugin-help ^6.0.17
  • @oclif/plugin-plugins ^4.3.2
  • luxon ^3.4.4
  • node-cron ^3.0.3
  • puppeteer ^22.4.0
  • winston ^3.12.0
  • @commitlint/config-conventional ^18.6.2
  • @oclif/test ^3.2.1
  • @saithodev/semantic-release-gitea ^2.1.0
  • @semantic-release/changelog ^6.0.3
  • @semantic-release/commit-analyzer ^11.1.0
  • @semantic-release/exec ^6.0.3
  • @semantic-release/git ^10.0.1
  • @semantic-release/github ^9.2.6
  • @semantic-release/npm ^11.0.3
  • @semantic-release/release-notes-generator ^12.1.0
  • @types/chai ^4.3.12
  • @types/luxon ^3.4.2
  • @types/mocha ^10.0.6
  • @types/node ^20.11.25
  • @types/node-cron ^3.0.11
  • @types/puppeteer ^7.0.4
  • @types/winston ^2.4.4
  • @typescript-eslint/eslint-plugin ^6.21.0
  • chai ^5.1.0
  • conventional-changelog-conventionalcommits ^7.0.2
  • conventional-changelog-eslint ^5.0.0
  • copyfiles ^2.4.1
  • cross-env ^7.0.3
  • cross-var ^1.1.0
  • env-cmd ^10.1.0
  • envfile ^7.1.0
  • eslint ^8.57.0
  • eslint-config-oclif ^5.0.4
  • eslint-config-oclif-typescript ^3.0.48
  • eslint-config-prettier ^9.1.0
  • eslint-config-standard-with-typescript ^43.0.1
  • eslint-plugin-import ^2.29.1
  • eslint-plugin-n ^16.6.2
  • eslint-plugin-prettier ^5.1.3
  • eslint-plugin-promise ^6.1.1
  • husky ^9.0.11
  • mocha ^10.3.0
  • oclif ^4.5.0
  • semantic-release ^23.0.2
  • semantic-release-github-pullrequest ^1.3.0
  • shx ^0.3.4
  • ts-node ^10.9.2
  • tslib ^2.6.2
  • typescript ^5.4.2
  • node >=12.0.0


Error: No element found for selector: select[name="orderFilter"]

2024-02-19 12:42:07 [0] [info] [2024-02-19 11-02:07] [scrape:all]: runAll
2024-02-19 12:42:08 [0] [warn] [2024-02-19 11-02:08] [scrape:amazon]: process.json not found. Full run needed. OnlyNew deactivated.
2024-02-19 12:42:15 [0] [info] [2024-02-19 11-02:15] [scrape:amazon]: Logged in
2024-02-19 12:42:18 [0] [info] [2024-02-19 11-02:18] [scrape:amazon]: First possible year: 2023
2024-02-19 12:42:18 [0] [info] [2024-02-19 11-02:18] [scrape:amazon]: Last possible year: 2023
2024-02-19 12:42:18 [0] [info] [2024-02-19 11-02:18] [scrape:amazon]: Selecting start year 2023
2024-02-19 12:42:18 [0] Error: No element found for selector: select[name="orderFilter"]
2024-02-19 12:42:18 [0] docudigger scrape all exited with code 1

I have set AMAZON_TLD to de

The automated release is failing 🚨

🚨 The automated release from the main branch failed. 🚨



No npm token specified.

An npm token must be created and set in the NPM_TOKEN environment variable on your CI environment.

Please make sure to create an npm token and to set it in the NPM_TOKEN environment variable on your CI environment. The token must allow to publish to the registry https://registry.npmjs.org/.



Include `./scripts/` in npm package

Remove the post-install.ts script from the package.json in the npm package. Currently this breaks npm install when used without --ignore-scripts.

The automated release is failing 🚨

🚨 The automated release from the dev branch failed. 🚨



Missing package.json file.

A package.json file at the root of your project is required to release on npm.

Please follow the npm guideline to create a valid package.json file.



Remove example hook messages

[0] example hook running scrape:all
[0] [info] [2023-07-05 10-07:06] [scrape:all]: runAll
[0] example hook running scrape:amazon

Login url contains some language specific country codes

It seems the login URL contains some sort of language-specific country code used for OpenID:
openid.assoc_handle=usflex

This link can't be used across other Amazon TLDs. The token may vary between countries and languages, so a mapping wouldn't be a good idea. Instead, we need to go to the base page and navigate to the login page from there.

2FA currently not supported on docker

Currently, 2FA on the scraped pages is not supported. It is detected (e.g. for Amazon), but there is no way to enter the second factor within a Docker container.

Error - Fresh Install - Docker process.json

root@docker:/videos/paperless-ngx_consumption# docker run -e AMAZON_USERNAME='{{REDACTED}}' -e AMAZON_PASSWORD='{{REDACTED}}' -e AMAZON_TLD='co.uk' -e AMAZON_YEAR_FILTER='2020' -e LOG_LEVEL='error' -v "/videos/paperless-ngx_consumption/:/home/node/docudigger" ghcr.io/disane87/docudigger
[0] [info] [2024-01-20 19-01:27] [scrape:all]:  runAll
[0] [warn] [2024-01-20 19-01:27] [scrape:amazon]:       process.json not found. Full run needed. OnlyNew deactivated. 
[0]     Error: No element found for selector: input[type=email]
[0] docudigger scrape all exited with code 1

Can't find much about this, so I'm looking for some guidance - thanks in advance 👍

rosetta error:

Hello there!

I am getting the following error when I try to run docudigger in Docker on my M1 Mac under macOS 14.3.
Has anyone seen this before? 🙃

Thanks a lot in advance, and greetings from Austria!

2024-02-18 21:51:35 [0] [info] [2024-02-18 20-02:35] [scrape:all]: runAll
2024-02-18 21:51:36 [0] Error: Failed to launch the browser process!
2024-02-18 21:51:36 [0] rosetta error: failed to open elf at /lib64/ld-linux-x86-64.so.2
2024-02-18 21:51:36 [0]
2024-02-18 21:51:36 [0]
2024-02-18 21:51:36 [0]
2024-02-18 21:51:36 [0] TROUBLESHOOTING: https://pptr.dev/troubleshooting
2024-02-18 21:51:36 [0]
2024-02-18 21:51:36 [0] docudigger scrape all exited with code 1

Failed to launch the browser process

2023-07-05 12:45:26 [info] [2023-07-05 10-07:26] [scrape:all]:  runAll
2023-07-05 12:45:26     Error: Failed to launch the browser process! undefined
2023-07-05 12:45:26     [31:46:0705/104526.419327:ERROR:bus.cc(399)] Failed to connect to the bus:
2023-07-05 12:45:26      Failed to connect to socket /run/dbus/system_bus_socket: No such file or 
2023-07-05 12:45:26     directory
2023-07-05 12:45:26     [31:31:0705/104526.424899:ERROR:ozone_platform_x11.cc(239)] Missing X 
2023-07-05 12:45:26     server or $DISPLAY
2023-07-05 12:45:26     [31:31:0705/104526.424935:ERROR:env.cc(255)] The platform failed to 
2023-07-05 12:45:26     initialize.  Exiting.
2023-07-05 12:45:26 
2023-07-05 12:45:26 
2023-07-05 12:45:26     TROUBLESHOOTING: https://pptr.dev/troubleshooting

Amazon renamed some selectors

Currently fails with (on US Amazon):

TimeoutError: Waiting for selector `.pagination-full ul.a-pagination li:nth-last-child(2) a` failed: Waiting failed: 30000ms exceeded
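Breakage like this usually comes from a single hard-coded selector. One way to make a scraper more resilient against renamed markup (a sketch under assumptions, not the project's actual approach) is to probe a list of candidate selectors and use the first one that matches:

```javascript
// Try a list of candidate selectors and return the first one the probe
// accepts. With Puppeteer, `probe` would be something like
// (sel) => page.waitForSelector(sel, { timeout: 5000 }).
// Sketch only; the helper name and behaviour are assumptions.
async function firstMatchingSelector(selectors, probe) {
    for (const selector of selectors) {
        try {
            await probe(selector);
            return selector; // this candidate matched
        } catch {
            // candidate did not match within the timeout; try the next one
        }
    }
    throw new Error(`None of the selectors matched: ${selectors.join(', ')}`);
}
```

The old and new Amazon pagination selectors could then both be listed as candidates, so a frontend rollout only breaks the scraper once all candidates are stale.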

Puppeteer error in docker

[0] [info] [2023-10-31 11-10:35] [scrape:all]: runAll
[0] Error: Failed to launch the browser process! undefined
[0] [34:50:1031/114056.228042:ERROR:bus.cc(407)] Failed to connect to the bus:
[0] Failed to connect to socket /run/dbus/system_bus_socket: No such file or
[0] directory
[0] [34:34:1031/114057.464120:ERROR:ozone_platform_x11.cc(239)] Missing X
[0] server or $DISPLAY
[0] [34:34:1031/114057.464132:ERROR:env.cc(255)] The platform failed to
[0] initialize. Exiting.
[0]
[0]
[0] TROUBLESHOOTING: https://pptr.dev/troubleshooting
[0]
[0] docudigger scrape all exited with code 1

[WR] meine.new-energie.de/portal

Which website do you want to crawl?

https://meine.new-energie.de/portal/

What services do they provide?

Contracts (with recurring calculation in a defined interval)

In which industry does the company operate?

Energy (i.e. your local energy company)

If you choose other...

No response

Describe the company

It's a local energy company in Germany which provides services for energy, gas, water, sewage, etc.

Does the webpage have/needs an authentication?

  • Yes, the website needs an authentication

Do they provide a two factor auth?

  • Yes, the website needs a second factor

Would you provide us the credentials (privately)?

  • Yes, I would share my credentials

Are you willing to collaborate to get this scraper up and running?

  • Yes, I would collaborate on this actively.

What color (hex) represents the company?

#9b004b

Your recorded code

import url from 'url';
import { createRunner } from '@puppeteer/replay';

export async function run(extension) {
    const runner = await createRunner(extension);

    await runner.runBeforeAllSteps();

    await runner.runStep({
        type: 'setViewport',
        width: 1134,
        height: 1227,
        deviceScaleFactor: 1,
        isMobile: false,
        hasTouch: false,
        isLandscape: false
    });
    await runner.runStep({
        type: 'navigate',
        url: 'https://www.vodafone.de/',
        assertedEvents: [
            {
                type: 'navigation',
                url: 'https://www.vodafone.de/',
                title: 'Vodafone.de | Mobilfunk, Handys & Internet-Anbieter'
            }
        ]
    });
    await runner.runStep({
        type: 'navigate',
        url: 'https://meine.new-energie.de/portal/dashboard',
        assertedEvents: [
            {
                type: 'navigation',
                url: 'https://meine.new-energie.de/portal/dashboard',
                title: ''
            }
        ]
    });
    await runner.runStep({
        type: 'click',
        target: 'main',
        selectors: [
            [
                'aria/ZUSTIMMEN'
            ],
            [
                '#login-new-dialog-privacy-btn-confirm'
            ],
            [
                'xpath///*[@id="login-new-dialog-privacy-btn-confirm"]'
            ],
            [
                'pierce/#login-new-dialog-privacy-btn-confirm'
            ]
        ],
        offsetY: 26.5,
        offsetX: 159.25,
    });
    await runner.runStep({
        type: 'click',
        target: 'main',
        selectors: [
            [
                'aria/E-Mail'
            ],
            [
                '#input-30'
            ],
            [
                'xpath///*[@id="input-30"]'
            ],
            [
                'pierce/#input-30'
            ]
        ],
        offsetY: 5.234375,
        offsetX: 163.5,
    });
    await runner.runStep({
        type: 'click',
        target: 'main',
        selectors: [
            [
                'aria/ANMELDEN',
                'aria/[role="generic"]'
            ],
            [
                'form > div.row span'
            ],
            [
                'xpath///*[@id="router-view"]/div[4]/div/div/div[2]/div/div[1]/div/form/div[3]/div/button/span'
            ],
            [
                'pierce/form > div.row span'
            ],
            [
                'text/Anmelden'
            ]
        ],
        offsetY: 12.734375,
        offsetX: 161.5,
        assertedEvents: [
            {
                type: 'navigation',
                url: 'https://login.new.de/post/?code=234cffe92b81d0c84d8ca88f2a5b3aff22b46871f233e41a90082dc738642fac-072c06e616474cb999fa&state=bpc-%2Fdashboard&redirect_uri=https%3A%2F%2Fmeine.new-energie.de%2Fportal%2Fapi%2Foidc',
                title: ''
            }
        ]
    });
    await runner.runStep({
        type: 'click',
        target: 'main',
        selectors: [
            [
                'aria/Rechnungen'
            ],
            [
                'li:nth-of-type(3) li:nth-of-type(1) > a'
            ],
            [
                'xpath///*[@id="mainMenu"]/div[1]/ul[2]/li[3]/ul/li[1]/a'
            ],
            [
                'pierce/li:nth-of-type(3) li:nth-of-type(1) > a'
            ]
        ],
        offsetY: 10,
        offsetX: 72.6875,
    });
    await runner.runStep({
        type: 'click',
        target: 'main',
        selectors: [
            [
                '#InviteLayerRealPersonname'
            ],
            [
                'xpath///*[@id="InviteLayerRealPersonname"]'
            ],
            [
                'pierce/#InviteLayerRealPersonname'
            ]
        ],
        offsetY: 38,
        offsetX: 585.5,
    });
    await runner.runStep({
        type: 'click',
        target: 'main',
        selectors: [
            [
                'tr:nth-of-type(1) span > span'
            ],
            [
                'xpath//html/body/div[2]/div/div/div/main/div[3]/div/div/ng-transclude/div/div[1]/div/div/ng-transclude/div/div/div/ng-include/div/div/div/div[1]/div/table/tbody/tr[1]/td[3]/bpc-document-download/bpc-download/a/ng-transclude/span/span'
            ],
            [
                'pierce/tr:nth-of-type(1) span > span'
            ]
        ],
        offsetY: 2.40625,
        offsetX: 6.5,
    });

    await runner.runAfterAllSteps();
}

if (process && import.meta.url === url.pathToFileURL(process.argv[1]).href) {
    run()
}


Amazon changed page classes in Germany

[0] docudigger scrape all exited with code 1
[0] [info] [2023-10-31 09-10:41] [scrape:all]: runAll
[0] [info] [2023-10-31 09-10:14] [scrape:amazon]: Only invoices since order 303-5389627-4477142 will be gathered.
[0] [info] [2023-10-31 09-10:25] [scrape:amazon]: Logged in
[0] [info] [2023-10-31 09-10:26] [scrape:amazon]: First possible year: 2023
[0] [info] [2023-10-31 09-10:26] [scrape:amazon]: Last possible year: 2023
[0] [info] [2023-10-31 09-10:26] [scrape:amazon]: Selecting start year 2023
[0] Error: No element found for selector: select[name="orderFilter"]
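A defensive way to handle this kind of selector churn could be to try a list of candidate selectors before failing. This is a minimal sketch, not docudigger's actual code: `firstMatching` and the alternate selector `select#time-filter` are hypothetical, and `query` stands in for Puppeteer's `page.$`.

```typescript
// Try several candidate selectors in order, since Amazon occasionally
// renames ids/classes. `Query` abstracts over Puppeteer's page.$.
type Query = (selector: string) => Promise<unknown | null>;

async function firstMatching(
    query: Query,
    selectors: string[]
): Promise<string | null> {
    for (const sel of selectors) {
        // Return the first selector that resolves to an element.
        if (await query(sel)) {
            return sel;
        }
    }
    return null; // nothing matched: the page layout changed again
}
```

With Puppeteer this would be called as `firstMatching(sel => page.$(sel), ['select[name="orderFilter"]', 'select#time-filter'])`, emitting a clear error message when `null` comes back instead of crashing on the first selector.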

When using `pageFilter` and `yearFilter` the debug logs are incorrect

[0] [info] [2023-07-06 09-07:00] [scrape:amazon]:       Last page "1" reached. Going to next year.
[0] [info] [2023-07-06 09-07:00] [scrape:amazon]:       Year "2023" drone. Skipping to next year

This should be instead:

[0] [info] [2023-07-06 09-07:00] [scrape:amazon]:       Last page "1" reached. Skipping next pages
[0] [info] [2023-07-06 09-07:00] [scrape:amazon]:       Year "2023" done. Skipping next years
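The intended wording above could be produced by small logging helpers that branch on whether a filter is active. This is a hypothetical sketch with illustrative names, not docudigger's actual code:

```typescript
// When a page or year filter is active, the remaining pages/years are
// skipped, so the log should say so instead of "Going to next year".
function lastPageMessage(page: number, pageFilterActive: boolean): string {
    return pageFilterActive
        ? `Last page "${page}" reached. Skipping next pages`
        : `Last page "${page}" reached. Going to next year.`;
}

function yearDoneMessage(year: number, yearFilterActive: boolean): string {
    return yearFilterActive
        ? `Year "${year}" done. Skipping next years`
        : `Year "${year}" done. Going to next year.`;
}
```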

Checking for already handled orders should not use orders with zero invoices

Currently the resume point for a scrape is determined by the latest order only, but it is possible that the latest order has not shipped yet and therefore has no invoice present.

Instead of checking only the latest order, we should check the latest order that has at least one invoice, and verify that all of its invoices were saved. If any invoices are missing, the run should be repeated.
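This fix could be sketched as follows. The `ScrapedOrder` type and both function names are hypothetical illustrations, not part of docudigger's codebase:

```typescript
// Hypothetical shape of an order after a scrape run.
interface ScrapedOrder {
    orderNumber: string;
    date: string;
    invoiceCount: number; // invoices actually saved for this order
}

// Pick the resume point: the most recent order that has at least one
// saved invoice, instead of blindly taking the most recent order.
function resumeOrder(orders: ScrapedOrder[]): ScrapedOrder | null {
    // Assumes `orders` is sorted newest-first, as on the order overview page.
    return orders.find(o => o.invoiceCount > 0) ?? null;
}

// Orders without any invoice should trigger a re-check on the next run.
function needsRerun(orders: ScrapedOrder[]): boolean {
    return orders.some(o => o.invoiceCount === 0);
}
```

`resumeOrder` would then feed the `ONLY_NEW` tracking, while `needsRerun` marks runs that must be repeated once undelivered orders get their invoices.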

not working anymore?

Would love to use this, but I'm getting some generic errors.

[0] [info] [2024-04-18 11-04:04] [scrape:all]: runAll
[0] [warn] [2024-04-18 11-04:05] [scrape:amazon]: process.json not found. Full run needed. OnlyNew deactivated.
[0] [info] [2024-04-18 11-04:08] [scrape:amazon]: Logged in
[0] [info] [2024-04-18 11-04:13] [scrape:amazon]: First possible year: 2024
[0] [info] [2024-04-18 11-04:13] [scrape:amazon]: Last possible year: 2018
[0] [info] [2024-04-18 11-04:13] [scrape:amazon]: Selecting start year 2024
[0] [info] [2024-04-18 11-04:17] [scrape:amazon]: Page count: 5
[0] [info] [2024-04-18 11-04:17] [scrape:amazon]: Got 10 orders. Processing...
[0] [info] [2024-04-18 11-04:17] [scrape:amazon]: Order date: Be-Mobile NV
[0] [error] [2024-04-18 11-04:19] [scrape:amazon]: Couldn't get popover #a-popover-1 within 2000ms. Skipping
[0] [warn] [2024-04-18 11-04:19] [scrape:amazon]: No invoices found. Order may be undelivered. Check again later.
[0] [info] [2024-04-18 11-04:19] [scrape:amazon]: Processing "1" orders
[0] [info] [2024-04-18 11-04:19] [scrape:amazon]: Order date: Be-Mobile NV
[0] [info] [2024-04-18 11-04:19] [scrape:amazon]: 1 invoices found πŸ“ƒ
[0] [info] [2024-04-18 11-04:19] [scrape:amazon]: Processing "2" orders
[0] [info] [2024-04-18 11-04:19] [scrape:amazon]: Checking if folder exists. If not, create: data
[0] [info] [2024-04-18 11-04:19] [scrape:amazon]: Writing file: /home/node/docudigger/data/null_AMZ_304-2609117-7652323_1.pdf
[0] [info] [2024-04-18 11-04:19] [scrape:amazon]: Order date: Be-Mobile NV
[0] [error] [2024-04-18 11-04:22] [scrape:amazon]: Couldn't get popover #a-popover-3 within 2000ms. Skipping
[0] [warn] [2024-04-18 11-04:22] [scrape:amazon]: No invoices found. Order may be undelivered. Check again later.
[0] [info] [2024-04-18 11-04:22] [scrape:amazon]: Processing "3" orders
[0] [info] [2024-04-18 11-04:22] [scrape:amazon]: Order date: Be-Mobile NV
[0] [info] [2024-04-18 11-04:22] [scrape:amazon]: 2 invoices found πŸ“ƒ
[0] [info] [2024-04-18 11-04:22] [scrape:amazon]: Processing "4" orders
[0] [info] [2024-04-18 11-04:22] [scrape:amazon]: Checking if folder exists. If not, create: data
[0] [info] [2024-04-18 11-04:22] [scrape:amazon]: Writing file: /home/node/docudigger/data/null_AMZ_304-3718149-6909133_1.pdf
[0] [info] [2024-04-18 11-04:23] [scrape:amazon]: Checking if folder exists. If not, create: data
[0] [info] [2024-04-18 11-04:23] [scrape:amazon]: Writing file: /home/node/docudigger/data/null_AMZ_304-3718149-6909133_2.pdf
[0] [info] [2024-04-18 11-04:23] [scrape:amazon]: Order date: Be-Mobile NV
[0] [info] [2024-04-18 11-04:23] [scrape:amazon]: 1 invoices found πŸ“ƒ
[0] [info] [2024-04-18 11-04:23] [scrape:amazon]: Processing "5" orders
[0] [info] [2024-04-18 11-04:23] [scrape:amazon]: Checking if folder exists. If not, create: data
[0] [info] [2024-04-18 11-04:23] [scrape:amazon]: Writing file: /home/node/docudigger/data/null_AMZ_304-6522887-3229134_1.pdf
[0] [info] [2024-04-18 11-04:23] [scrape:amazon]: Order date: Be-Mobile NV
[0] [error] [2024-04-18 11-04:25] [scrape:amazon]: Couldn't get popover #a-popover-6 within 2000ms. Skipping
[0] [warn] [2024-04-18 11-04:25] [scrape:amazon]: No invoices found. Order may be undelivered. Check again later.
[0] [info] [2024-04-18 11-04:25] [scrape:amazon]: Processing "6" orders
[0] [info] [2024-04-18 11-04:26] [scrape:amazon]: Order date: Be-Mobile NV
[0] [info] [2024-04-18 11-04:26] [scrape:amazon]: 1 invoices found πŸ“ƒ
[0] [info] [2024-04-18 11-04:26] [scrape:amazon]: Processing "7" orders
[0] [info] [2024-04-18 11-04:26] [scrape:amazon]: Checking if folder exists. If not, create: data
[0] [info] [2024-04-18 11-04:26] [scrape:amazon]: Writing file: /home/node/docudigger/data/null_AMZ_304-5507482-5813161_1.pdf
[0] [info] [2024-04-18 11-04:26] [scrape:amazon]: Order date: Be-Mobile NV
[0] [error] [2024-04-18 11-04:28] [scrape:amazon]: Couldn't get popover #a-popover-8 within 2000ms. Skipping
[0] [warn] [2024-04-18 11-04:28] [scrape:amazon]: No invoices found. Order may be undelivered. Check again later.
[0] [info] [2024-04-18 11-04:28] [scrape:amazon]: Processing "8" orders
[0] [info] [2024-04-18 11-04:28] [scrape:amazon]: Order date: Be-Mobile NV
[0] [info] [2024-04-18 11-04:28] [scrape:amazon]: 2 invoices found πŸ“ƒ
[0] [info] [2024-04-18 11-04:28] [scrape:amazon]: Processing "9" orders
[0] [info] [2024-04-18 11-04:29] [scrape:amazon]: Checking if folder exists. If not, create: data
[0] [info] [2024-04-18 11-04:29] [scrape:amazon]: Writing file: /home/node/docudigger/data/null_AMZ_304-6272390-9076308_1.pdf
[0] ProtocolError: Protocol error (Target.createTarget): Session with given id
[0] not found.
[0] docudigger scrape all exited with code 1
