jbms / finance-dl
Tools for automatically downloading/scraping personal financial data.
License: GNU General Public License v2.0
I'm trying to download Amazon (amazon.de) orders, but the scraper doesn't seem to want to download anything, and I'm not sure which condition it was waiting on.
I also had to change orderFilter to timeFilter here to get it to scrape the orders at all:
https://github.com/jbms/finance-dl/blob/4b8e28a29b8f0faf5ab3457b5cded2079e73f3fd/finance_dl/amazon.py#L449C24-L449C24
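A minimal sketch of how the scraper could tolerate both dropdown names (`orderFilter` on some Amazon sites, `timeFilter` on others such as amazon.de). The helper name and the `find_elements("name", ...)` call style are illustrative, not finance-dl's actual code; any driver-like object with a `find_elements` method works:

```python
def find_order_filter(driver, names=("orderFilter", "timeFilter")):
    """Return the first order-period <select> found under any known name.

    Hypothetical helper: tries each candidate element name in turn so the
    scraper works on both amazon.com-style and amazon.de-style order pages.
    """
    for name in names:
        elements = driver.find_elements("name", name)
        if elements:
            return elements[0]
    return None
```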
DevTools listening on ws://127.0.0.1:64033/devtools/browser/d936fdb9-bceb-4579-bd14-74d94bfbbda2
--connect=http://localhost:64028 --session-id=c1a93491b891b70821a27455a8cb60f8
2023-10-29 21:31:05,228 amazon.py:277 [INFO] Initiating log in
2023-10-29 21:31:05,972 amazon.py:284 [INFO] You must be already logged in!
2023-10-29 21:31:06,464 amazon.py:464 [INFO] Retrieving order group: 'den letzten 30 Tagen'
2023-10-29 21:31:06,871 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:31:06,899 amazon.py:464 [INFO] Retrieving order group: 'den letzten 3 Monaten'
2023-10-29 21:31:07,272 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:31:07,306 amazon.py:464 [INFO] Retrieving order group: '2023'
[28972:31576:1029/213110.343:ERROR:device_event_log_impl.cc(225)] [21:31:10.343] USB: usb_service_win.cc:415 Could not read device interface GUIDs: Het systeem kan het opgegeven bestand niet vinden. (The system cannot find the file specified.) (0x2)
[28972:31576:1029/213110.344:ERROR:device_event_log_impl.cc(225)] [21:31:10.343] USB: usb_service_win.cc:104 SetupDiGetDeviceProperty({{A45C254E-DF1C-4EFD-8020-67D146A850E0}, 6}) failed: Kan element niet vinden. (Cannot find element.) (0x490)
2023-10-29 21:31:10,385 amazon.py:427 [INFO] Found order '305-6007193-8432350'
2023-10-29 21:31:11,514 amazon.py:427 [INFO] Found order '305-1364969-2322728'
2023-10-29 21:31:12,697 amazon.py:427 [INFO] Found order '305-7507294-1425151'
2023-10-29 21:31:12,701 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:31:12,725 amazon.py:464 [INFO] Retrieving order group: '2022'
2023-10-29 21:31:14,640 amazon.py:427 [INFO] Found order '304-7286585-0240307'
2023-10-29 21:31:16,847 amazon.py:427 [INFO] Found order '304-0532664-5059518'
2023-10-29 21:31:18,000 amazon.py:427 [INFO] Found order '304-9561231-4685903'
2023-10-29 21:31:19,150 amazon.py:427 [INFO] Found order '304-1342413-3851547'
2023-10-29 21:31:20,295 amazon.py:427 [INFO] Found order '304-4127407-3957141'
2023-10-29 21:31:21,428 amazon.py:427 [INFO] Found order '304-9986951-1473951'
2023-10-29 21:31:22,585 amazon.py:427 [INFO] Found order '304-1950788-6290723'
2023-10-29 21:31:23,732 amazon.py:427 [INFO] Found order '304-7040015-3071518'
2023-10-29 21:31:24,874 amazon.py:427 [INFO] Found order '304-6516850-4829932'
2023-10-29 21:31:26,051 amazon.py:427 [INFO] Found order '304-5372088-9945954'
2023-10-29 21:31:26,069 amazon.py:441 [INFO] Next page.
2023-10-29 21:31:28,034 amazon.py:427 [INFO] Found order '304-7060525-9372365'
2023-10-29 21:31:29,158 amazon.py:427 [INFO] Found order '304-4859404-2549141'
2023-10-29 21:31:30,296 amazon.py:427 [INFO] Found order '304-4924192-0299519'
2023-10-29 21:31:31,412 amazon.py:427 [INFO] Found order '305-8808333-1320322'
2023-10-29 21:31:32,559 amazon.py:427 [INFO] Found order '304-0657323-9229902'
2023-10-29 21:31:33,739 amazon.py:427 [INFO] Found order '305-2521019-8087569'
2023-10-29 21:31:34,939 amazon.py:427 [INFO] Found order '306-6714178-4185146'
2023-10-29 21:31:34,945 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:31:34,975 amazon.py:464 [INFO] Retrieving order group: '2021'
2023-10-29 21:31:37,610 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-5183444-7545952'
2023-10-29 21:31:37,674 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-5183444-7545952'
2023-10-29 21:31:37,739 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-5183444-7545952'
2023-10-29 21:31:37,819 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-5183444-7545952'
2023-10-29 21:31:37,882 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-5183444-7545952'
2023-10-29 21:31:37,958 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-5183444-7545952'
2023-10-29 21:31:39,651 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-4252588-7295546'
2023-10-29 21:31:39,709 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-4252588-7295546'
2023-10-29 21:31:39,781 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-4252588-7295546'
2023-10-29 21:31:39,845 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-4252588-7295546'
2023-10-29 21:31:39,869 amazon.py:441 [INFO] Next page.
2023-10-29 21:31:41,930 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-2656410-5349950'
2023-10-29 21:31:42,006 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-2656410-5349950'
2023-10-29 21:31:42,086 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-2656410-5349950'
2023-10-29 21:31:42,151 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-2656410-5349950'
2023-10-29 21:31:42,218 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-2656410-5349950'
2023-10-29 21:31:42,291 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-2656410-5349950'
2023-10-29 21:31:43,989 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-4665213-3963505'
2023-10-29 21:31:44,059 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-4665213-3963505'
2023-10-29 21:31:44,124 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-4665213-3963505'
2023-10-29 21:31:44,194 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-4665213-3963505'
2023-10-29 21:31:44,206 amazon.py:441 [INFO] Next page.
2023-10-29 21:31:46,816 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-6833943-6329939'
2023-10-29 21:31:46,879 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-6833943-6329939'
2023-10-29 21:31:46,953 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-6833943-6329939'
2023-10-29 21:31:47,016 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-6833943-6329939'
2023-10-29 21:31:47,081 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-6833943-6329939'
2023-10-29 21:31:47,146 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-6833943-6329939'
2023-10-29 21:31:48,830 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-4527795-0562718'
2023-10-29 21:31:48,892 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-4527795-0562718'
2023-10-29 21:31:48,970 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-4527795-0562718'
2023-10-29 21:31:49,034 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-4527795-0562718'
2023-10-29 21:31:49,050 amazon.py:441 [INFO] Next page.
2023-10-29 21:31:51,003 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-8824203-0488334'
2023-10-29 21:31:51,059 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-8824203-0488334'
2023-10-29 21:31:51,124 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-8824203-0488334'
2023-10-29 21:31:51,177 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-8824203-0488334'
2023-10-29 21:31:51,241 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-8824203-0488334'
2023-10-29 21:31:51,246 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:31:51,279 amazon.py:464 [INFO] Retrieving order group: '2020'
2023-10-29 21:31:53,301 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-8315787-5112305'
2023-10-29 21:31:53,368 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-8315787-5112305'
2023-10-29 21:31:53,446 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-8315787-5112305'
2023-10-29 21:31:53,510 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-8315787-5112305'
2023-10-29 21:31:53,580 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-8315787-5112305'
2023-10-29 21:31:53,641 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-8315787-5112305'
2023-10-29 21:31:55,319 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-6574931-4521130'
2023-10-29 21:31:55,385 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-6574931-4521130'
2023-10-29 21:31:55,451 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-6574931-4521130'
2023-10-29 21:31:55,510 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-6574931-4521130'
2023-10-29 21:31:55,527 amazon.py:441 [INFO] Next page.
2023-10-29 21:31:57,390 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-3192929-2299507'
2023-10-29 21:31:57,444 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-3192929-2299507'
2023-10-29 21:31:57,499 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-3192929-2299507'
2023-10-29 21:31:57,556 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-3192929-2299507'
2023-10-29 21:31:57,607 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-3192929-2299507'
2023-10-29 21:31:57,659 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-3192929-2299507'
2023-10-29 21:31:57,710 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-3192929-2299507'
2023-10-29 21:31:57,716 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:31:57,744 amazon.py:464 [INFO] Retrieving order group: '2019'
2023-10-29 21:31:59,862 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-6447883-6259537'
2023-10-29 21:31:59,931 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-6447883-6259537'
2023-10-29 21:31:59,993 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-6447883-6259537'
2023-10-29 21:32:00,058 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-6447883-6259537'
2023-10-29 21:32:00,118 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-6447883-6259537'
2023-10-29 21:32:00,181 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-6447883-6259537'
2023-10-29 21:32:01,322 amazon.py:425 [INFO] Skipping already-downloaded invoice: '306-7975075-7649159'
2023-10-29 21:32:01,382 amazon.py:425 [INFO] Skipping already-downloaded invoice: '306-7975075-7649159'
2023-10-29 21:32:01,450 amazon.py:425 [INFO] Skipping already-downloaded invoice: '306-7975075-7649159'
2023-10-29 21:32:01,514 amazon.py:425 [INFO] Skipping already-downloaded invoice: '306-7975075-7649159'
2023-10-29 21:32:01,534 amazon.py:441 [INFO] Next page.
2023-10-29 21:32:03,450 amazon.py:425 [INFO] Skipping already-downloaded invoice: '306-6615968-4057139'
2023-10-29 21:32:03,453 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:32:03,487 amazon.py:464 [INFO] Retrieving order group: '2018'
2023-10-29 21:32:05,546 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-9120740-4793910'
2023-10-29 21:32:05,611 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-9120740-4793910'
2023-10-29 21:32:05,684 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-9120740-4793910'
2023-10-29 21:32:05,749 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-9120740-4793910'
2023-10-29 21:32:05,814 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-9120740-4793910'
2023-10-29 21:32:05,883 amazon.py:425 [INFO] Skipping already-downloaded invoice: '302-9120740-4793910'
2023-10-29 21:32:07,038 amazon.py:425 [INFO] Skipping already-downloaded invoice: '028-4791616-8929953'
2023-10-29 21:32:07,107 amazon.py:425 [INFO] Skipping already-downloaded invoice: '028-4791616-8929953'
2023-10-29 21:32:07,172 amazon.py:425 [INFO] Skipping already-downloaded invoice: '028-4791616-8929953'
2023-10-29 21:32:07,234 amazon.py:425 [INFO] Skipping already-downloaded invoice: '028-4791616-8929953'
2023-10-29 21:32:07,246 amazon.py:441 [INFO] Next page.
2023-10-29 21:32:09,193 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-7201984-1193117'
2023-10-29 21:32:09,244 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-7201984-1193117'
2023-10-29 21:32:09,301 amazon.py:425 [INFO] Skipping already-downloaded invoice: '305-7201984-1193117'
2023-10-29 21:32:09,308 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:32:09,350 amazon.py:464 [INFO] Retrieving order group: '2017'
2023-10-29 21:32:09,759 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:32:09,783 amazon.py:464 [INFO] Retrieving order group: '2016'
2023-10-29 21:32:10,184 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:32:10,211 amazon.py:464 [INFO] Retrieving order group: '2015'
2023-10-29 21:32:10,612 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:32:10,648 amazon.py:464 [INFO] Retrieving order group: 'Archivierte Bestellungen'
2023-10-29 21:32:11,001 amazon.py:436 [INFO] Found no more pages
Traceback (most recent call last):
File "D:\Finance\finance-dl-master\finance_dl\scrape_lib.py", line 403, in retry
return func()
File "D:\Finance\finance-dl-master\finance_dl\scrape_lib.py", line 423, in fetch
scraper.run()
File "D:\Finance\finance-dl-master\finance_dl\amazon.py", line 585, in run
self.get_orders(
File "D:\Finance\finance-dl-master\finance_dl\amazon.py", line 478, in get_orders
retrieve_all_order_groups()
File "D:\Finance\finance-dl-master\finance_dl\amazon.py", line 448, in retrieve_all_order_groups
(order_filter,), = self.wait_and_return(
File "D:\Finance\finance-dl-master\finance_dl\scrape_lib.py", line 239, in wait_and_return
WebDriverWait(self.driver, timeout).until(predicate, message=message)
File "D:\Finance\env-financedl\lib\site-packages\selenium\webdriver\support\wait.py", line 87, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Waiting to match conditions
I can't really provide much more information, since there is no error beyond the timeout.
DevTools listening on ws://127.0.0.1:50082/devtools/browser/f72cd410-a08b-41e8-ab1a-cc6b26aad299
--connect=http://localhost:50079 --session-id=7d8f0a6373be86bc7ee5a26c1d39f98d
2023-10-29 21:55:41,054 amazon.py:277 [INFO] Initiating log in
2023-10-29 21:55:41,608 amazon.py:288 [INFO] Looking for sign-in link
2023-10-29 21:55:41,943 amazon.py:294 [INFO] Looking for username link
2023-10-29 21:55:42,388 amazon.py:304 [INFO] Looking for password link
2023-10-29 21:55:42,435 amazon.py:310 [INFO] Looking for "remember me" checkbox
2023-10-29 21:55:43,689 amazon.py:319 [INFO] Logged in
2023-10-29 21:55:44,225 amazon.py:464 [INFO] Retrieving order group: 'last 30 days'
2023-10-29 21:55:44,611 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:55:44,639 amazon.py:464 [INFO] Retrieving order group: 'past three months'
2023-10-29 21:55:45,041 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:55:45,068 amazon.py:464 [INFO] Retrieving order group: '2023'
2023-10-29 21:55:45,878 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:55:45,910 amazon.py:464 [INFO] Retrieving order group: '2022'
2023-10-29 21:55:46,303 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:55:46,331 amazon.py:464 [INFO] Retrieving order group: '2021'
2023-10-29 21:55:47,726 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:55:47,753 amazon.py:464 [INFO] Retrieving order group: '2020'
2023-10-29 21:55:48,153 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:55:48,178 amazon.py:464 [INFO] Retrieving order group: '2019'
2023-10-29 21:55:48,568 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:55:48,599 amazon.py:464 [INFO] Retrieving order group: '2018'
2023-10-29 21:55:49,003 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:55:49,033 amazon.py:464 [INFO] Retrieving order group: '2017'
2023-10-29 21:55:49,438 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:55:49,464 amazon.py:464 [INFO] Retrieving order group: '2016'
2023-10-29 21:55:49,875 amazon.py:436 [INFO] Found no more pages
2023-10-29 21:55:49,902 amazon.py:464 [INFO] Retrieving order group: '2015'
2023-10-29 21:55:50,319 amazon.py:436 [INFO] Found no more pages
It has been over a year since I last used finance-dl. On my previous PC I had downloaded finance-dl and imported it as a module locally. I think I broke my environment at some point, so I recreated it when I wanted to use the tool again a week ago, and nothing worked (I only use the Amazon and PayPal scrapers). I've now reached the point where I can actually log in to PayPal again, but I'm getting the error below.
Could anyone supply a requirements file? I had a lot of trouble getting this far because newer package versions don't work.
I now have a persistent Google profile that loads PayPal, but it then runs into this error, while the PayPal page is on the activities page.
I also suspect this is an issue with the CSRF token, because I found the following in the PayPal module; might the problem be that PayPal is already on the activities page?
```python
def get_csrf_token(self):
    if self.csrf_token is not None: return self.csrf_token
    logging.info('Getting CSRF token')
    self.driver.get('https://www.paypal.com/myaccount/transactions/')
    # Get CSRF token
    body_element, = self.wait_and_locate((By.XPATH,
                                          '//body[@data-token!=""]'))
    self.csrf_token = body_element.get_attribute('data-token')
    return self.csrf_token
```
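For debugging whether the page still carries the token at all, the lookup can be reproduced outside Selenium's wait machinery. The sketch below is a hypothetical diagnostic helper (not part of finance-dl) that extracts a `data-token` attribute from raw page HTML; running it on `driver.page_source` shows whether the scraper's XPath can ever match:

```python
import re

def extract_data_token(page_source):
    """Return the value of <body ... data-token="..."> or None.

    If this returns None on driver.page_source, PayPal's page no longer
    carries the data-token attribute the scraper waits for.
    """
    match = re.search(r'<body[^>]*\bdata-token="([^"]+)"', page_source)
    return match.group(1) if match else None
```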
Traceback (most recent call last):
File "C:\Users\Dieter\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Dieter\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "d:\finance\import beancount\finance-dl\finance_dl\cli.py", line 91, in <module>
main()
File "d:\finance\import beancount\finance-dl\finance_dl\cli.py", line 87, in main
module.run(**spec)
File "d:\finance\import beancount\finance-dl\finance_dl\paypal.py", line 270, in run
scrape_lib.run_with_scraper(Scraper, **kwargs)
File "d:\finance\import beancount\finance-dl\finance_dl\scrape_lib.py", line 424, in run_with_scraper
retry(fetch)
File "d:\finance\import beancount\finance-dl\finance_dl\scrape_lib.py", line 402, in retry
return func()
File "d:\finance\import beancount\finance-dl\finance_dl\scrape_lib.py", line 422, in fetch
scraper.run()
File "d:\finance\import beancount\finance-dl\finance_dl\paypal.py", line 266, in run
self.save_transactions()
File "d:\finance\import beancount\finance-dl\finance_dl\paypal.py", line 207, in save_transactions
transaction_list = self.get_transaction_list()
File "d:\finance\import beancount\finance-dl\finance_dl\paypal.py", line 196, in get_transaction_list
resp = self.make_json_request(url)
File "d:\finance\import beancount\finance-dl\finance_dl\paypal.py", line 169, in make_json_request
'x-csrf-token': self.get_csrf_token(),
File "d:\finance\import beancount\finance-dl\finance_dl\paypal.py", line 180, in get_csrf_token
body_element, = self.wait_and_locate((By.XPATH,
File "d:\finance\import beancount\finance-dl\finance_dl\scrape_lib.py", line 254, in wait_and_locate
return self.wait_and_return(
File "d:\finance\import beancount\finance-dl\finance_dl\scrape_lib.py", line 238, in wait_and_return
WebDriverWait(self.driver, timeout).until(predicate, message=message)
File "D:\Finance\Import Beancount\env\lib\site-packages\selenium\webdriver\support\wait.py", line 87, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Waiting to locate (('xpath', '//body[@data-token!=""]'),)
Stacktrace:
GetHandleVerifier [0x00007FF61DA08EF2+54786]
(No symbol) [0x00007FF61D975612]
(No symbol) [0x00007FF61D82A64B]
(No symbol) [0x00007FF61D86B79C]
(No symbol) [0x00007FF61D86B91C]
(No symbol) [0x00007FF61D8A6D87]
(No symbol) [0x00007FF61D88BEAF]
(No symbol) [0x00007FF61D8A4D02]
(No symbol) [0x00007FF61D88BC43]
(No symbol) [0x00007FF61D860941]
(No symbol) [0x00007FF61D861B84]
GetHandleVerifier [0x00007FF61DD57F52+3524194]
GetHandleVerifier [0x00007FF61DDAD800+3874576]
GetHandleVerifier [0x00007FF61DDA5D7F+3843215]
GetHandleVerifier [0x00007FF61DAA5086+694166]
(No symbol) [0x00007FF61D980A88]
(No symbol) [0x00007FF61D97CA94]
(No symbol) [0x00007FF61D97CBC2]
(No symbol) [0x00007FF61D96CC83]
BaseThreadInitThunk [0x00007FFC4336257D+29]
RtlUserThreadStart [0x00007FFC4428AA78+40]
I've got both Chase and Amex OFX downloading working. However, the behavior I'm seeing is that the updater downloads the OFX data for min_start_date and then walks backward day by day, never stopping. I end up with an output_dir full of OFX files containing the same data; the only thing that differs between the files is the dtstart/dtend. The actual transactions contained are the same.
For reference, here is how I'm running this:
python -m finance_dl.update --config-module finance_dl_config --log-dir logs update amex
The earliest transaction I can get back from the Amex OFX endpoint is from 2017-09-16, but my output_dir contains files from 19850625-19850625--1556073305.ofx to 19900102-19900102--1556070848.ofx
From what I can discern from reading your code and parsing the OFX data, if you call account.download(days=num_days), the dtstart/dtend returned are based on num_days, not on the actual transactions' DTPOSTED. Since each call returns a valid dtstart/dtend range, the updater thinks it has to keep walking backward to find a range that is invalid, but it never will.
From my testing:
```python
num_days = 5353  # this is what the first value of `mid` is
data = account.download(days=num_days).read()
dtstart = parse_ofx_time(re.findall(r'<DTSTART>([^<]+)', data)[0])
dtend = parse_ofx_time(re.findall(r'<DTEND>([^<]+)', data)[0])
print('#', dtstart.date(), '--', dtend.date())
# 2004-08-27 -- 2019-04-24
txn_dates = [parse_ofx_time(d) for d in re.findall(r'<DTPOSTED>([^<]+)', data)]
txn_dates.sort()
print('#', txn_dates[0], '--', txn_dates[-1])
# 2017-09-16 00:00:00 -- 2019-04-17 00:00:00
```
I've opened captin411/ofxclient#69 about this issue, and while my PR captin411/ofxclient#70 will resolve the issue for ofxclient, it will somehow need to be managed in finance-dl as well. Once (if?) that PR is accepted, how would we update our finance-dl config to supply that parameter to Institution.accounts()? Or should finance-dl automatically fix it if tdbank is in the ofx url?
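A minimal sketch of the direction a fix could take: derive the earliest available data from the transactions' DTPOSTED fields rather than trusting DTSTART. The helper below is hypothetical and only handles the plain `YYYYMMDD...` prefix of OFX timestamps:

```python
import re
from datetime import datetime

def earliest_posted_date(ofx_data):
    """Return the earliest DTPOSTED date in an OFX document, or None if empty.

    OFX timestamps look like 20170916000000[-5:EST]; only the first eight
    characters (the date) are parsed here.
    """
    dates = []
    for raw in re.findall(r'<DTPOSTED>([^<]+)', ofx_data):
        dates.append(datetime.strptime(raw[:8], '%Y%m%d').date())
    return min(dates, default=None)
```

A stopping rule based on this value (instead of the DTSTART/DTEND echo) would let the updater detect that walking further back yields no new transactions.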
The PayPal importer seems to fail when trying to get the CSRF token after logging in. It looks like the whole structure of the web page has changed; I wasn't able to find the CSRF token manually either.
Relevant traceback:
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/cli.py", line 94, in <module>
    main()
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/cli.py", line 90, in main
    module.run(**spec)
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/paypal.py", line 262, in run
    scrape_lib.run_with_scraper(Scraper, **kwargs)
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/scrape_lib.py", line 433, in run_with_scraper
    retry(fetch)
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/scrape_lib.py", line 411, in retry
    return func()
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/scrape_lib.py", line 431, in fetch
    scraper.run()
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/paypal.py", line 258, in run
    self.save_transactions()
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/paypal.py", line 200, in save_transactions
    transaction_list = self.get_transaction_list()
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/paypal.py", line 193, in get_transaction_list
    resp = self.make_json_request(url)
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/paypal.py", line 166, in make_json_request
    'x-csrf-token': self.get_csrf_token(),
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/paypal.py", line 177, in get_csrf_token
    body_element, = self.wait_and_locate((By.ID, "__react_data__"))
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/scrape_lib.py", line 263, in wait_and_locate
    return self.wait_and_return(
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/scrape_lib.py", line 247, in wait_and_return
    WebDriverWait(self.driver, timeout).until(predicate, message=message)
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 87, in until
    time.sleep(self._poll)
The README describes starting the automatic updates process by checking the status:
python -m finance_dl.cli --config-module example_finance_dl_config --log-dir logs status
On my system this fails:
usage: cli.py [-h] [--config-module CONFIG_MODULE]
(--config CONFIG | --spec SPEC) [--interactive] [--visible]
[--log LOG]
cli.py: error: one of the arguments --config/-c --spec/-s is required
If I add my --config google_purchases parameter, then it complains about the --log-dir parameter:
usage: cli.py [-h] [--config-module CONFIG_MODULE]
(--config CONFIG | --spec SPEC) [--interactive] [--visible]
[--log LOG]
cli.py: error: unrecognized arguments: --log-dir logs status
I think the entire Automatic Usage section needs to be updated, because I can't figure out how to accomplish those tasks with the current code.
When I try to use the Amazon scraper, Amazon presents me with just the email text box and a Continue button. Once I click that, the scraper works, as it can then find the password field. It might be nice for the scraper to automatically click the Continue button if there is one.
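A minimal sketch of such a fallback, assuming a Selenium-style driver. The XPath for Amazon's Continue button is an assumption and may need adjusting for the actual sign-in markup; the helper only relies on `find_elements` and `click`, so any object with those methods works:

```python
def click_if_present(driver, xpath='//input[@id="continue"]'):
    """Click the first element matching xpath, if any; return True if clicked.

    Hypothetical helper: called after entering the email address, it skips
    past an intermediate Continue page when Amazon shows one.
    """
    elements = driver.find_elements("xpath", xpath)
    if elements:
        elements[0].click()
        return True
    return False
```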
I used default parameters and ran this command on different days for Fidelity, but it seems to return data only from exactly two years before today, even though ofx.py is searching from 1990-01-01. Is this an issue with finance-dl, ofxclient, or Fidelity?
```python
def CONFIG_fidelity_investments():
    # To determine the correct values for `id`, `org`, and `url` for your
    # financial institution, search on https://www.ofxhome.com/
    ofx_params = {
        'id': '7776',
        'org': 'fidelity.com',
        'url': 'https://ofx.fidelity.com/ftgw/OFX/clients/download',
        'username': '',
        'password': '',
    }
    return dict(
        module='finance_dl.ofx',
        ofx_params=ofx_params,
        output_directory=os.path.join(data_dir, 'fidelity_investments'),
    )
```
$ python -m finance_dl.cli --config-module finance_dl_config --config fidelity_investments
2020-04-14 13:19:49,658 ofx.py:183 [INFO] Binary searching to find earliest data available for account X12345678.
2020-04-14 13:19:49,658 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 2005-02-21.
2020-04-14 13:19:50,156 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 1997-07-28.
2020-04-14 13:19:50,747 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 1993-10-14.
2020-04-14 13:19:51,329 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 1991-11-23.
2020-04-14 13:19:51,786 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 1990-12-12.
2020-04-14 13:19:52,149 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 1990-06-22.
2020-04-14 13:19:52,544 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 1990-03-28.
2020-04-14 13:19:52,919 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 1990-02-13.
2020-04-14 13:19:53,393 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 1990-01-22.
2020-04-14 13:19:53,832 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 1990-01-11.
2020-04-14 13:19:54,242 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 1990-01-06.
2020-04-14 13:19:54,656 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 1990-01-03.
2020-04-14 13:19:55,063 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 1990-01-02.
2020-04-14 13:19:55,474 ofx.py:268 [INFO] Received data 2018-04-15 16:19:55 -- 2018-07-14 16:19:55
2020-04-14 13:19:55,491 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 2018-07-12.
2020-04-14 13:19:56,288 ofx.py:268 [INFO] Received data 2018-07-12 00:00:00 -- 2018-10-10 00:00:00
2020-04-14 13:19:56,304 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 2018-10-08.
2020-04-14 13:19:56,700 ofx.py:268 [INFO] Received data 2018-10-08 00:00:00 -- 2019-01-05 23:00:00
2020-04-14 13:19:56,718 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 2019-01-03.
2020-04-14 13:19:57,517 ofx.py:268 [INFO] Received data 2019-01-03 00:00:00 -- 2019-04-03 01:00:00
2020-04-14 13:19:57,535 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 2019-04-01.
2020-04-14 13:19:58,342 ofx.py:268 [INFO] Received data 2019-04-01 00:00:00 -- 2019-06-30 00:00:00
2020-04-14 13:19:58,360 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 2019-06-28.
2020-04-14 13:19:59,166 ofx.py:268 [INFO] Received data 2019-06-28 00:00:00 -- 2019-09-26 00:00:00
2020-04-14 13:19:59,183 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 2019-09-24.
2020-04-14 13:19:59,567 ofx.py:268 [INFO] Received data 2019-09-24 00:00:00 -- 2019-12-22 23:00:00
2020-04-14 13:19:59,583 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 2019-12-20.
2020-04-14 13:19:59,978 ofx.py:268 [INFO] Received data 2019-12-20 00:00:00 -- 2020-03-19 01:00:00
2020-04-14 13:19:59,993 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 2020-03-17.
2020-04-14 13:20:00,682 ofx.py:268 [INFO] Received data 2020-03-17 00:00:00 -- 2020-04-14 16:20:00
2020-04-14 13:20:00,702 ofx.py:146 [INFO] Trying to retrieve data for X12345678 starting at 2020-04-12.
2020-04-14 13:20:01,101 ofx.py:268 [INFO] Received data 2020-04-12 00:00:00 -- 2020-04-14 16:20:00
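The log above shows ofx.py binary-searching for the earliest start date that still returns data. A simplified sketch of that search, with `fetch(start_date)` standing in for the real OFX request (returning truthy when the server has data for that start date); the names are illustrative, not finance-dl's actual API. The 2-year cutoff reported here means `fetch` effectively becomes truthy only within the server's retention window, so the search converges on today-minus-2-years rather than the true account opening date:

```python
from datetime import date

def find_earliest_start(fetch, min_start, today):
    """Binary-search the earliest start date for which fetch(d) returns data.

    Invariant: fetch(hi) has data (the server always has recent data),
    fetch(lo) does not; the window narrows until hi is the earliest
    truthy start date.
    """
    if fetch(min_start):
        return min_start
    lo, hi = min_start, today
    while (hi - lo).days > 1:
        mid = lo + (hi - lo) // 2
        if fetch(mid):
            hi = mid
        else:
            lo = mid
    return hi
```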
I opened a new account at an institution but haven't funded it yet, so it has no transactions whatsoever. I'm running finance-dl anyway, because I have other accounts at the same institution whose transactions I'd like to fetch. When I run finance-dl to fetch transactions from this institution via OFX, I get an error:
$ python3 -m finance_dl.update --config-module finance_dl_config --log-dir logs update institution --force
Here's the tail end of the traceback I get:
...
[0/1] institution [41s elapsed] File "/usr/lib/python3.7/site-packages/finance_dl/ofx.py", line 187, in get_earliest_data
[0/1] institution [41s elapsed] account.number)
[0/1] institution [41s elapsed] RuntimeError: Failed to retrieve any data for account: 12345678
[1/1] institution [41s elapsed] FAILED with return code 1
Maybe finance-dl could log a warning that there are no transactions for that account and merrily continue onward to the other accounts? I'm open to ideas about what the right behavior here is!
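A minimal sketch of that skip-and-continue behavior, assuming account objects with a `number` attribute and a `download` callable that raises `RuntimeError` when an account has no data (names are illustrative, not finance-dl's actual API):

```python
import logging

def fetch_all_accounts(accounts, download):
    """Download data for each account, skipping ones that yield nothing.

    Instead of aborting the whole institution on the first empty account,
    log a warning and keep going; return whatever was retrieved.
    """
    results = {}
    for account in accounts:
        try:
            results[account.number] = download(account)
        except RuntimeError:
            logging.warning('No data for account %s; skipping.', account.number)
    return results
```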
Platform: Windows 10
Selenium Version: 4.2.0
When running finance_dl.cli, the program crashes with the error below.
(.venv) PS C:\Users\user\source\repos\beancount> python -m finance_dl.cli --config-module finance_dl_config --config paypal -i
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\user\source\repos\beancount\.venv\Lib\site-packages\finance_dl\cli.py", line 94, in <module>
main()
File "C:\Users\user\source\repos\beancount\.venv\Lib\site-packages\finance_dl\cli.py", line 85, in main
with interactive_func(**spec) as ns:
File "C:\Python311\Lib\contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "C:\Users\user\source\repos\beancount\.venv\Lib\site-packages\finance_dl\scrape_lib.py", line 435, in interact_with_scraper
with temp_scraper(scraper_class, **kwargs) as scraper:
File "C:\Python311\Lib\contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "C:\Users\user\source\repos\beancount\.venv\Lib\site-packages\finance_dl\scrape_lib.py", line 390, in temp_scraper
scraper = scraper_type(*args, download_dir=download_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\source\repos\beancount\.venv\Lib\site-packages\finance_dl\paypal.py", line 124, in __init__
super().__init__(use_seleniumrequests=True, **kwargs)
File "C:\Users\user\source\repos\beancount\.venv\Lib\site-packages\finance_dl\scrape_lib.py", line 176, in __init__
self.driver = driver_class(
^^^^^^^^^^^^^
File "C:\Users\user\source\repos\beancount\.venv\Lib\site-packages\seleniumrequests\request.py", line 144, in __init__
super(RequestsSessionMixin, self).__init__(*args, **kwargs)
File "C:\Users\user\source\repos\beancount\.venv\Lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 70, in __init__
super(WebDriver, self).__init__(DesiredCapabilities.CHROME['browserName'], "goog",
File "C:\Users\user\source\repos\beancount\.venv\Lib\site-packages\selenium\webdriver\chromium\webdriver.py", line 89, in __init__
self.service.start()
File "C:\Users\user\source\repos\beancount\.venv\Lib\site-packages\selenium\webdriver\common\service.py", line 105, in start
raise WebDriverException("Can not connect to the Service %s" % self.path)
selenium.common.exceptions.WebDriverException: Message: Can not connect to the Service finance-dl-chromedriver-wrapper
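When Selenium reports "Can not connect to the Service finance-dl-chromedriver-wrapper", one common cause is that the wrapper script is not on PATH or not executable. A quick diagnostic sketch (the wrapper name is taken from the error message above):

```python
import os
import shutil

def diagnose_driver(name="finance-dl-chromedriver-wrapper"):
    """Report whether the driver wrapper can be launched from PATH."""
    path = shutil.which(name)
    if path is None:
        return f"{name} not found on PATH"
    if not os.access(path, os.X_OK):
        return f"{path} is not executable"
    return f"{path} looks runnable"
```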
The exception is:
File "/home/user/.local/lib/python3.8/site-packages/beancount_import/webserver.py", line 493, in _handle_reconciler_loaded
loaded_reconciler = loaded_future.result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
raise self._exception
File "/home/user/.local/lib/python3.8/site-packages/beancount_import/thread_helpers.py", line 13, in wrapper
f.set_result(fn(*args, **kwargs))
File "/home/user/.local/lib/python3.8/site-packages/beancount_import/reconcile.py", line 396, in __init__
all_source_results = self._prepare_sources()
File "/home/user/.local/lib/python3.8/site-packages/beancount_import/reconcile.py", line 515, in _prepare_sources
source.prepare(self.editor, source_results)
File "/home/user/.local/lib/python3.8/site-packages/beancount_import/source/paypal.py", line 624, in prepare
jsonschema.validate(txn, transaction_schema)
File "/usr/lib/python3/dist-packages/jsonschema/validators.py", line 934, in validate
raise error
jsonschema.exceptions.ValidationError: 'isCredit' is a required property
It occurs on the following object:
{'amount': {'feeAmount': '$0.00',
'grossAmount': '$19.99',
'isZeroFee': True,
'netAmount': '$19.99'},
'cameFromResCenter': False,
'cameFromSummary': False,
'counterparty': {'detailsCounterpartyText': 'Company Inc.',
'name': 'Company Inc.'},
'counterpartyAccountNumber': '123456789',
'counterpartyBizName': 'Company Name.',
'flags': {'isBuyer': True,
'isOrder': True,
'shouldUpgradeAccount': False},
'fptiTag': 'orderplaced',
'isNewActivityUIEnabled': True,
'links': {'reportDispute': {'linkUrl': '/resolutioncenter/O-0T34207418947273H',
'target': '_blank'},
'upgradeToBusinessAcct': {'linkUrl': '/US/merchantsignup/router',
'target': ''}},
'merchantLogoUrl': None,
'printDetailsLink': {'linkUrl': '/myaccount/transactions/print-orders/O-0T34207418947273H',
'target': '_blank'},
'statusInfo': ['Your payment method will be charged when Company Inc. completes your order.'],
'transactionId': 'O-0T34207418947273H',
'transactionType': 'Order Placed',
'viewContext': 'tdfullpage'}
I would appreciate some guidance on how to modify the schema; I'll be happy to submit a PR if I get it fixed.
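One way to experiment before submitting a PR is to loosen the schema so that 'isCredit' is no longer required, since pending "Order Placed" entries apparently lack the field. The schema dict below is a hypothetical stand-in for beancount-import's actual transaction_schema:

```python
def make_field_optional(schema, field):
    """Return a copy of a JSON schema with one field dropped
    from its 'required' list."""
    patched = dict(schema)
    patched["required"] = [f for f in schema.get("required", []) if f != field]
    return patched

# Hypothetical stand-in for the real transaction_schema:
transaction_schema = {
    "type": "object",
    "required": ["transactionId", "transactionType", "isCredit"],
}

relaxed = make_field_optional(transaction_schema, "isCredit")
```

Validating against the relaxed schema would let the pending-order object through while still requiring the other fields.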
As pointed out in #69, v1/ticker returns an error if a stablecoin is passed. Deal with this permanently.
ofxclient bug report
So basically discover.com aggressively rate-limits requests.
Do you have any better ideas than adding sleep()?
I added sleep() calls into ofx.py ...
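Instead of a fixed sleep(), an exponential backoff with jitter spreads requests out more politely when the server rate-limits. A sketch (the base and cap values are arbitrary):

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Delay (seconds) before retry number `attempt`: doubles each
    attempt, capped, with jitter so clients don't retry in lockstep."""
    delay = min(cap, base * (2 ** attempt))
    return delay * random.uniform(0.5, 1.0)
```

The caller would do `time.sleep(backoff_delay(attempt))` between OFX requests.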
I love the idea here but I'm having lots of trouble with these importers.
Here's my summary trying to get a few to work:
I'm not throwing shade as we all know how hard it is to maintain scrapers. Just noting this here to potentially save other folks time.
(then again if someone is having success with these, then ignore me :)
When I use ultipro_google, it logs in properly but crashes with:
[0/1] google_payroll [41s elapsed] 2019-09-27 21:17:23,347 ultipro_google.py:144 [INFO] Document datetime.date(2019, 9, 26) : 'UA29063061': Downloading
[0/1] google_payroll [42s elapsed] Traceback (most recent call last):
[0/1] google_payroll [42s elapsed] File "/home/mark/p/finances/.venv/lib/python3.7/site-packages/finance_dl/scrape_lib.py", line 402, in retry
[0/1] google_payroll [42s elapsed] return func()
[0/1] google_payroll [42s elapsed] File "/home/mark/p/finances/.venv/lib/python3.7/site-packages/finance_dl/scrape_lib.py", line 422, in fetch
[0/1] google_payroll [42s elapsed] scraper.run()
[0/1] google_payroll [42s elapsed] File "/home/mark/p/finances/.venv/lib/python3.7/site-packages/finance_dl/ultipro_google.py", line 206, in run
[0/1] google_payroll [42s elapsed] self.download_statements()
[0/1] google_payroll [42s elapsed] File "/home/mark/p/finances/.venv/lib/python3.7/site-packages/finance_dl/ultipro_google.py", line 200, in download_statements
[0/1] google_payroll [42s elapsed] downloaded_statements=downloaded_statements,
[0/1] google_payroll [42s elapsed] File "/home/mark/p/finances/.venv/lib/python3.7/site-packages/finance_dl/ultipro_google.py", line 152, in get_next_statement
[0/1] google_payroll [42s elapsed] download_link.click()
[0/1] google_payroll [42s elapsed] File "/home/mark/p/finances/.venv/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py", line 80, in click
[0/1] google_payroll [42s elapsed] self._execute(Command.CLICK_ELEMENT)
[0/1] google_payroll [42s elapsed] File "/home/mark/p/finances/.venv/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
[0/1] google_payroll [42s elapsed] return self._parent.execute(command, params)
[0/1] google_payroll [42s elapsed] File "/home/mark/p/finances/.venv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
[0/1] google_payroll [42s elapsed] self.error_handler.check_response(response)
[0/1] google_payroll [42s elapsed] File "/home/mark/p/finances/.venv/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
[0/1] google_payroll [42s elapsed] raise exception_class(message, screen, stacktrace)
[0/1] google_payroll [42s elapsed] selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable
[0/1] google_payroll [42s elapsed] (Session info: chrome=77.0.3865.90)
Hi,
When using the mint module to download data, it never opens a non-headless browser so that I can enter my MFA information.
In the log below, it says Retrying login interactively
, but it never opens a browser in which I can select and enter my MFA.
--connect=http://127.0.0.1:49760 --session-id=e915dc4a973f92fe277f10bb69476134
--connect=http://127.0.0.1:49774 --session-id=b718b621b3cdd23b9130ad83d8bf5042
2019-11-29 21:25:04,884 mint.py:119 [INFO] Logging into mint
2019-11-29 21:25:10,855 mint.py:123 [INFO] Waiting to enter username and password
2019-11-29 21:25:10,904 mint.py:127 [INFO] Entering username and password
2019-11-29 21:25:11,134 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:12,206 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:13,212 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:14,218 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:15,227 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:16,234 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:17,242 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:18,251 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:19,259 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:20,267 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:21,276 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:22,290 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:23,297 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:24,305 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:25,310 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:26,319 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:27,328 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:28,335 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:29,342 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:30,348 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:31,354 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:32,360 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:33,365 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:34,372 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:35,378 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:36,382 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:37,390 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:38,395 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:39,401 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:40,407 mint.py:135 [INFO] Waiting for MFA
Traceback (most recent call last):
File "/Users/jkf/accounting/.venv/lib/python3.7/site-packages/finance_dl/mint.py", line 182, in connect
try_login(scraper)
File "/Users/jkf/accounting/.venv/lib/python3.7/site-packages/finance_dl/mint.py", line 173, in try_login
scraper.login()
File "/Users/jkf/accounting/.venv/lib/python3.7/site-packages/finance_dl/mint.py", line 142, in login
raise TimeoutError("Login failed to complete within timeout")
TimeoutError: Login failed to complete within timeout
2019-11-29 21:25:41,427 mint.py:195 [INFO] Retrying login interactively
--connect=http://127.0.0.1:49807 --session-id=c89a622843c182d121d10f26afbc5633
--connect=http://127.0.0.1:49821 --session-id=9432d53cc738ac098854582993c0c5de
2019-11-29 21:25:43,960 mint.py:119 [INFO] Logging into mint
2019-11-29 21:25:49,641 mint.py:123 [INFO] Waiting to enter username and password
2019-11-29 21:25:49,700 mint.py:127 [INFO] Entering username and password
2019-11-29 21:25:50,004 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:51,014 mint.py:135 [INFO] Waiting for MFA
2019-11-29 21:25:52,018 mint.py:135 [INFO] Waiting for MFA
...
I'm getting the following error when I run the command:
python3 -m finance_dl.cli --config-module finance_dl_config --config mint
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 76, in start
stdin=PIPE)
File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: '/usr/lib/python3.6/site-packages/finance_dl/chromedriver_wrapper.py'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/lib/python3.6/site-packages/finance_dl/cli.py", line 91, in <module>
main()
File "/usr/lib/python3.6/site-packages/finance_dl/cli.py", line 87, in main
module.run(**spec)
File "/usr/lib/python3.6/site-packages/finance_dl/mint.py", line 463, in run
balances_output_prefix=balances_output_prefix, **kwargs)
File "/usr/lib/python3.6/site-packages/finance_dl/mint.py", line 435, in fetch_mint_data
with connect(credentials, kwargs) as mint:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/usr/lib/python3.6/site-packages/finance_dl/mint.py", line 169, in connect
**scraper_args) as scraper:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/usr/lib/python3.6/site-packages/finance_dl/scrape_lib.py", line 385, in temp_scraper
headless=headless, **kwargs)
File "/usr/lib/python3.6/site-packages/finance_dl/mint.py", line 114, in __init__
super().__init__(use_seleniumrequests=True, **kwargs)
File "/usr/lib/python3.6/site-packages/finance_dl/scrape_lib.py", line 170, in __init__
service_args=service_args,
File "/usr/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/usr/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 88, in start
os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver_wrapper.py' executable may have wrong permissions. Please see https://sites.google.com/a/chromium.org/chromedriver/home
Any thoughts on how I can fix this?
It would be nice to have the requirements documented, with examples for the most common systems. I'm having trouble getting this to work on my XUbuntu VM after installing it in my beancount virtualenv:
selenium.common.exceptions.WebDriverException: Message: 'chromedriver_wrapper.py' executable may have wrong permissions. Please see https://sites.google.com/a/chromium.org/chromedriver/home
There's no description for how #1 was solved. I tried installing google-chrome-beta
, google-chrome-stable
, as well as the chromium snap, and chromium
from the ubuntu repositories. I also tried installing chromedriver
and its dependencies:
sudo apt-get install chromium-chromedriver
And still, same error. I'm not sure how to get this working.
I am getting
TypeError: Binary Location Must be a String
when trying to run finance_dl.cli with the schwab configuration. It looks up os.getenv("CHROMEDRIVER_CHROME_BINARY") but cannot find it. Could you please tell me how to fix the issue?
Steps to reproduce:
Create an environment file
% cat env_test_finance-dl.yml
name: test_finance-dl
channels:
- defaults
dependencies:
- python=3.12
- pip
- pip:
- git+https://github.com/jbms/finance-dl
Create the environment
% conda env create -f env_test_finance-dl.yml
Activate the environment
% conda activate test_finance-dl
Create a configuration file
% cat finance_dl_config.py
import os
profile_dir = os.path.join(os.getenv('HOME'), '.cache', 'finance_dl')
data_dir = '/home/rajulocal/x/x5'
def CONFIG_schwab():
return dict(
module='finance_dl.schwab',
credentials={
'username': 'XXXXXX',
'password': 'XXXXXX',
},
output_directory=os.path.join(data_dir, 'schwab'),
profile_dir=profile_dir,
headless=False,
min_start_date='2023-11-01'
)
Run
% python -m finance_dl.cli --config-module finance_dl_config --config schwab
Traceback (most recent call last):
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 411, in retry
return func()
^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 430, in fetch
with temp_scraper(scraper_class, **kwargs) as scraper:
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 393, in temp_scraper
scraper = scraper_type(*args, download_dir=download_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/schwab.py", line 87, in __init__
super().__init__(use_seleniumrequests=True, **kwargs)
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 146, in __init__
chrome_options.binary_location = os.getenv("CHROMEDRIVER_CHROME_BINARY")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/selenium/webdriver/chromium/options.py", line 52, in binary_location
raise TypeError(self.BINARY_LOCATION_ERROR)
TypeError: Binary Location Must be a String
Waiting 0 seconds before retrying
Traceback (most recent call last):
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 411, in retry
return func()
^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 430, in fetch
with temp_scraper(scraper_class, **kwargs) as scraper:
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 393, in temp_scraper
scraper = scraper_type(*args, download_dir=download_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/schwab.py", line 87, in __init__
super().__init__(use_seleniumrequests=True, **kwargs)
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 146, in __init__
chrome_options.binary_location = os.getenv("CHROMEDRIVER_CHROME_BINARY")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/selenium/webdriver/chromium/options.py", line 52, in binary_location
raise TypeError(self.BINARY_LOCATION_ERROR)
TypeError: Binary Location Must be a String
Waiting 0 seconds before retrying
Traceback (most recent call last):
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 411, in retry
return func()
^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 430, in fetch
with temp_scraper(scraper_class, **kwargs) as scraper:
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 393, in temp_scraper
scraper = scraper_type(*args, download_dir=download_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/schwab.py", line 87, in __init__
super().__init__(use_seleniumrequests=True, **kwargs)
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 146, in __init__
chrome_options.binary_location = os.getenv("CHROMEDRIVER_CHROME_BINARY")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/selenium/webdriver/chromium/options.py", line 52, in binary_location
raise TypeError(self.BINARY_LOCATION_ERROR)
TypeError: Binary Location Must be a String
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/cli.py", line 94, in <module>
main()
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/cli.py", line 90, in main
module.run(**spec)
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/schwab.py", line 374, in run
scrape_lib.run_with_scraper(SchwabScraper, **kwargs)
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 433, in run_with_scraper
retry(fetch)
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 411, in retry
return func()
^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 430, in fetch
with temp_scraper(scraper_class, **kwargs) as scraper:
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 393, in temp_scraper
scraper = scraper_type(*args, download_dir=download_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/schwab.py", line 87, in __init__
super().__init__(use_seleniumrequests=True, **kwargs)
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/finance_dl/scrape_lib.py", line 146, in __init__
chrome_options.binary_location = os.getenv("CHROMEDRIVER_CHROME_BINARY")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/selenium/webdriver/chromium/options.py", line 52, in binary_location
raise TypeError(self.BINARY_LOCATION_ERROR)
TypeError: Binary Location Must be a String
There is no CHROMEDRIVER_CHROME_BINARY environment variable set on my machine.
% ipython
Python 3.12.0 | packaged by Anaconda, Inc. | (main, Oct 2 2023, 17:29:18) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.19.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
import os
In [2]:
os.getenv("CHROMEDRIVER_CHROME_BINARY") is None
Out[2]:
True
FWIW chromedriver-path shows
% chromedriver-path
/opt/rajulocal/miniconda3/envs/test_finance-dl/lib/python3.12/site-packages/chromedriver_binary
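A guard in scrape_lib.py would avoid assigning None to binary_location (the attribute name comes from the traceback; the Options stand-in below is hypothetical):

```python
import os

def apply_chrome_binary(chrome_options, env_var="CHROMEDRIVER_CHROME_BINARY"):
    """Only set binary_location when the env var is actually defined,
    since selenium rejects a None assignment with a TypeError."""
    binary = os.getenv(env_var)
    if binary is not None:
        chrome_options.binary_location = binary

class FakeOptions:  # hypothetical stand-in for selenium's Options
    binary_location = "default"
```

Alternatively, setting CHROMEDRIVER_CHROME_BINARY to the path of an installed Chrome/Chromium before running may work around it.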
finance-dl formats the url as
https://www.amazon.com/gp/css/summary/print.html?ie=UTF8&orderID=D01-1380792-3469006
which results in an error.
This one, which matches the pattern when I manually visit digital order invoices, works.
https://www.amazon.com/gp/digital/your-account/order-summary.html/ref=ppx_yo_dt_b_dpi_o00?ie=UTF8&orderID=D01-1380792-3469006&print=1
I don't know whether the URL has changed since the code was written or if I'm hitting a unique issue.
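A sketch of the working pattern (observed by manually visiting digital order invoices; Amazon may change it at any time, and the `ref=` segment appears to be optional):

```python
def digital_invoice_url(order_id):
    """Build the printable invoice URL for an Amazon digital order."""
    return ("https://www.amazon.com/gp/digital/your-account/"
            "order-summary.html?ie=UTF8&orderID=%s&print=1" % order_id)
```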
Updated from 1.0.3 to 1.2.0, and am now getting this error:
python -m finance_dl.update --config-module finance_dl_config --log-dir logs status
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/...snip.../venvs/beancount/lib/python3.6/site-packages/finance_dl/update.py", line 187, in <module>
main()
File "/...snip.../venvs/beancount/lib/python3.6/site-packages/finance_dl/update.py", line 153, in main
subparsers = ap.add_subparsers(dest='command', required=True)
File "/usr/lib/python3.6/argparse.py", line 1716, in add_subparsers
action = parsers_class(option_strings=[], **kwargs)
TypeError: __init__() got an unexpected keyword argument 'required'
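The `required=` keyword for `add_subparsers()` was only added in Python 3.7, which is why it fails on 3.6. A backward-compatible sketch sets the attribute after the fact instead:

```python
import argparse

def add_required_subparsers(parser, dest="command"):
    """add_subparsers(required=True) needs Python 3.7+;
    assigning .required afterwards works on 3.6 as well."""
    subparsers = parser.add_subparsers(dest=dest)
    subparsers.required = True
    return subparsers
```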
Steps to produce a Takeout:
Each order extracts to Takeout/Purchases _ Reservations/order_##########.json and contains the extracted ordering information.
Paypal shows a recaptcha after we enter the username/password. However, the scraper navigates away from the security challenge screen before I can click on the recaptcha button.
Here is the log:
[0/1] paypal [0s elapsed] starting
[0/1] paypal [2s elapsed] 2019-06-11 18:44:27,880 paypal.py:136 [INFO] Finding username field
[0/1] paypal [2s elapsed] 2019-06-11 18:44:28,016 paypal.py:139 [INFO] Entering username
[0/1] paypal [3s elapsed] 2019-06-11 18:44:28,154 paypal.py:142 [INFO] Finding password field
[0/1] paypal [3s elapsed] 2019-06-11 18:44:29,016 paypal.py:145 [INFO] Entering password
[0/1] paypal [4s elapsed] 2019-06-11 18:44:30,065 paypal.py:149 [INFO] Logged in
[0/1] paypal [4s elapsed] 2019-06-11 18:44:30,065 paypal.py:175 [INFO] Getting transaction list
[0/1] paypal [4s elapsed] 2019-06-11 18:44:30,065 paypal.py:163 [INFO] Getting CSRF token
[0/1] paypal [38s elapsed] Traceback (most recent call last):
[0/1] paypal [38s elapsed] File "/usr/lib/python3.7/site-packages/finance_dl/scrape_lib.py", line 402, in retry
[0/1] paypal [38s elapsed] return func()
[0/1] paypal [38s elapsed] File "/usr/lib/python3.7/site-packages/finance_dl/scrape_lib.py", line 422, in fetch
[0/1] paypal [38s elapsed] scraper.run()
[0/1] paypal [38s elapsed] File "/usr/lib/python3.7/site-packages/finance_dl/paypal.py", line 246, in run
[0/1] paypal [38s elapsed] self.save_transactions()
[0/1] paypal [38s elapsed] File "/usr/lib/python3.7/site-packages/finance_dl/paypal.py", line 189, in save_transactions
[0/1] paypal [38s elapsed] transaction_list = self.get_transaction_list()
[0/1] paypal [38s elapsed] File "/usr/lib/python3.7/site-packages/finance_dl/paypal.py", line 182, in get_transaction_list
[0/1] paypal [38s elapsed] resp = self.make_json_request(url)
[0/1] paypal [38s elapsed] File "/usr/lib/python3.7/site-packages/finance_dl/paypal.py", line 156, in make_json_request
[0/1] paypal [38s elapsed] 'x-csrf-token': self.get_csrf_token(),
[0/1] paypal [38s elapsed] File "/usr/lib/python3.7/site-packages/finance_dl/paypal.py", line 167, in get_csrf_token
[0/1] paypal [38s elapsed] '//body[@data-token!=""]'))
[0/1] paypal [38s elapsed] File "/usr/lib/python3.7/site-packages/finance_dl/scrape_lib.py", line 256, in wait_and_locate
[0/1] paypal [38s elapsed] message='Waiting to locate %r' % (locators, ))
[0/1] paypal [38s elapsed] File "/usr/lib/python3.7/site-packages/finance_dl/scrape_lib.py", line 238, in wait_and_return
[0/1] paypal [38s elapsed] WebDriverWait(self.driver, timeout).until(predicate, message=message)
[0/1] paypal [38s elapsed] File "/usr/lib/python3.7/site-packages/selenium/webdriver/support/wait.py", line 80, in until
[0/1] paypal [38s elapsed] raise TimeoutException(message, screen, stacktrace)
[0/1] paypal [38s elapsed] selenium.common.exceptions.TimeoutException: Message: Waiting to locate (('xpath', '//body[@data-token!=""]'),)
I recently set up an environment which has "selenium 4.4.3" installed as a direct or indirect dependency of finance-dl. This version of selenium-python recently made the following change, which breaks all find-element API calls:
baijum/selenium-python@3b13c2f
Causes:
File "somepath\.venv\lib\site-packages\finance_dl\scrape_lib.py", line 222, in wait_for_page_load
old_page = self.driver.find_element_by_tag_name('html')
AttributeError: 'WebDriver' object has no attribute 'find_element_by_tag_name'
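Selenium 4 removed the `find_element_by_*` helpers in favor of `find_element(By.TAG_NAME, ...)`. A compatibility shim sketch that works on both Selenium 3 and 4 (using the literal locator string that `By.TAG_NAME` maps to; the fake driver in the test is of course not a real WebDriver):

```python
def find_html_element(driver):
    """Locate the <html> element on either Selenium 3 or 4."""
    if hasattr(driver, "find_element_by_tag_name"):  # Selenium 3 API
        return driver.find_element_by_tag_name("html")
    return driver.find_element("tag name", "html")   # Selenium 4 locator
```

Alternatively, pinning `selenium<4` in the environment avoids the breakage until scrape_lib.py is migrated.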
My amazon CONFIG:
def CONFIG_amazon():
return dict(
module='finance_dl.amazon',
credentials=amazon.credentials,
output_directory=os.path.join(data_dir, 'amazon'),
profile_dir=os.path.join(profile_dir, 'amazon'),
)
After executing the cli:
python -m finance_dl.cli --config-module finance_dl_config --config amazon --log=INFO
The output shows:
--connect=http://127.0.0.1:57275 --session-id=93c41049e6af86c2d7d455703aed1a63
2019-04-27 15:50:33,070 amazon.py:102 [INFO] Initiating log in
2019-04-27 15:50:35,883 amazon.py:109 [INFO] You must be already logged in!
2019-04-27 15:50:38,120 amazon.py:206 [INFO] Retrieving order group: 'last 30 days'
2019-04-27 15:50:40,159 amazon.py:185 [INFO] Found no more pages
2019-04-27 15:50:40,309 amazon.py:206 [INFO] Retrieving order group: 'past 6 months'
2019-04-27 15:50:42,365 amazon.py:185 [INFO] Found no more pages
2019-04-27 15:50:42,536 amazon.py:206 [INFO] Retrieving order group: '2019'
2019-04-27 15:50:44,433 amazon.py:185 [INFO] Found no more pages
2019-04-27 15:50:44,529 amazon.py:206 [INFO] Retrieving order group: '2018'
Traceback (most recent call last):
File "/home/philipsd6/devel/finance-dl/finance_dl/scrape_lib.py", line 416, in retry
return func()
File "/home/philipsd6/devel/finance-dl/finance_dl/scrape_lib.py", line 436, in fetch
scraper.run()
File "/home/philipsd6/devel/finance-dl/finance_dl/amazon.py", line 265, in run
self.get_orders(regular=self.regular, digital=self.digital)
File "/home/philipsd6/devel/finance-dl/finance_dl/amazon.py", line 224, in get_orders
retrieve_all_order_groups()
File "/home/philipsd6/devel/finance-dl/finance_dl/amazon.py", line 209, in retrieve_all_order_groups
get_invoice_urls()
File "/home/philipsd6/devel/finance-dl/finance_dl/amazon.py", line 159, in get_invoice_urls
invoices, = self.wait_and_return(invoice_finder)
File "/home/philipsd6/devel/finance-dl/finance_dl/scrape_lib.py", line 252, in wait_and_return
WebDriverWait(self.driver, timeout).until(predicate, message=message)
File "/home/philipsd6/.local/venvs/beancount/lib/python3.6/site-packages/selenium/webdriver/support/wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Waiting to match conditions
This module used to work for me when I first set it up, but now it's failing on the 2018 order group.
For "last 30 days", "past 6 months", and "2019" I have < 10 orders and no pagination, but when it gets to "2018" I have > 50 orders, and > 5 pages of orders. This module navigates to the second page of orders for "2018" but then it times out.
The call that is polling until it times out is this:
2019-04-27 15:49:40,011 remote_connection.py:388 [DEBUG] POST http://127.0.0.1:37509/session/9a048830f7858ba83482d1a0f641a2bd/elements
{"using": "xpath", "value": "//a[contains(@href, \"summary/print.html\")]", "sessionId": "9a048830f7858ba83482d1a0f641a2bd"}
2019-04-27 15:49:40,053 connectionpool.py:396 [DEBUG] http://127.0.0.1:37509 "POST /session/9a048830f7858ba83482d1a0f641a2bd/elements H
TTP/1.1" 200 70