lavovalampa / patreon-scraper
WIP Patreon attachment download written in TypeScript
License: MIT License
Hello there!
I'm not sure how to describe this issue or what information would be useful. I am a supporter of AmaLee (https://www.patreon.com/LeeandLie/posts), and when I run the scraper with my session ID, I do not get all of the MP3 files.
Ones I noticed missing right off the bat are:
https://www.patreon.com/posts/weight-of-world-32708075
https://www.patreon.com/posts/which-lyric-do-33452070
https://www.patreon.com/posts/preview-might-u-33592038
https://www.patreon.com/posts/aliez-remix-33718259
I believe a file is not downloaded when the post shows the embedded audio player.
I'm not sure whether that is just a coincidence, however.
This is what it looks like when running the scraper: https://gist.github.com/Evernow/b6c72762f07f4898a6031db142975487
When I run this script, no content is downloaded. In the console output, the body section appears to contain the markup for a Cloudflare CAPTCHA page, which leads me to believe that this is the root of the issue.
Example:
`\n
\n<!--[if IE 7]>
<html class="no-js ie7 oldie" lang="en-US">
<![endif]-->\n
<!--[if IE 8]>
<html class="no-js ie8 oldie" lang="en-US">
<![endif]-->\n
<!--[if gt IE 8]>
<!-->
<html class="no-js" lang="en-US">
<!--
<![endif]-->\n
<head>\n
<title>Attention Required! | Cloudflare</title>\n
<meta name="captcha-bypass" id="captcha-bypass" />\n
<meta charset="UTF-8" />\n
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />\n
<meta name="robots" content="noindex, nofollow" />\n
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />\n
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />\n
<!--[if lt IE 9]>
<link rel="stylesheet" id=\'cf_styles-ie-css\' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" />
<![endif]-->\n
<style type="text/css">body{margin:0;padding:0}</style>\n\n\n
<!--[if gte IE 10]>
<!-->
<script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script>
<!--
<![endif]-->\n
<!--[if gte IE 10]>
<!-->
<script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script>
<!--
<![endif]-->\n\n\n\n\n
</head>\n
<body>\n
<div id="cf-wrapper">\n
<div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>\n
<div id="cf-error-details" class="cf-error-details-wrapper">\n
<div class="cf-wrapper cf-header cf-error-overview">\n
<h1 data-translate="challenge_headline">One more step</h1>\n
<h2 class="cf-subheadline">
<span data-translate="complete_sec_check">Please complete the security check to access</span> patreon.com
</h2>\n
</div>
<!-- /.header -->\n \n
<div class="cf-section cf-highlight cf-captcha-container">\n
<div class="cf-wrapper">\n
<div class="cf-columns two">\n
<div class="cf-column">\n \n
<div class="cf-highlight-inverse cf-form-stacked">\n
<form class="challenge-form" id="challenge-form" action="/cdn-cgi/l/chk_captcha" method="get">\n
<input type="hidden" name="s" value="6903d23f35b518d64183514a77da7a1e4080565e-1569455077-1800-AXOxIyPcLIrC1ekaD86GCBANIQ6tvR2dsxDi1Op3XgNC+nMdHTBPnPjig2WUKcdW1YeHIAgbCFko2Pjz4MTZYsMKtDf+imsnyFXsz9HFKWlw8V09/GiYfeyNtj1F9O+3L6EI1/CXKDOujHPfIpMFcF9x7Xs1bqSjoPnunmOUV8FA0M9p2vcX5ZR1at4f0ZwXFw=="></input>\n
<script type="text/javascript" src="/cdn-cgi/scripts/cf.challenge.js" data-type="normal" data-ray="51c0ddfb9cfbce6b" async data-sitekey="6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0"></script>\n
<div class="g-recaptcha"></div>\n
<noscript id="cf-captcha-bookmark" class="cf-captcha-info">\n
<div>
<div style="width: 302px">\n
<div>\n
<iframe src="https://www.google.com/recaptcha/api/fallback?k=6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0" frameborder="0" scrolling="no" style="width: 302px; height:422px; border-style: none;"></iframe>\n
</div>\n
<div style="width: 300px; border-style: none; bottom: 12px; left: 25px; margin: 0px; padding: 0px; right: 25px; background: #f9f9f9; border: 1px solid #c1c1c1; border-radius: 3px;">\n
<textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid #c1c1c1; margin: 10px 25px; padding: 0px; resize: none;"></textarea>\n
<input type="submit" value="Submit"></input>\n
</div>\n
</div>
</div>\n
</noscript>\n
</form>\n\n \n
</div>\n
</div>\n\n
<div class="cf-column">\n
<div class="cf-screenshot-container">\n \n
<span class="cf-no-screenshot"></span>\n \n
</div>\n
</div>\n
</div>
<!-- /.columns -->\n
</div>\n
</div>
<!-- /.captcha-container -->\n\n
<div class="cf-section cf-wrapper">\n
<div class="cf-columns two">\n
<div class="cf-column">\n
<h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>\n \n
<p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>\n
</div>\n\n
<div class="cf-column">\n
<h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>\n \n\n
<p data-translate="resolve_captcha_antivirus">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>\n\n
<p data-translate="resolve_captcha_network">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>\n \n
</div>\n
</div>\n
</div>
<!-- /.section -->\n \n\n
<div class="cf-error-footer cf-wrapper">\n
<p>\n
<span class="cf-footer-item">Cloudflare Ray ID:
<strong>51c0ddfb9cfbce6b</strong>
</span>\n
<span class="cf-footer-separator">•</span>\n
<span class="cf-footer-item">
<span>Your IP</span>: 51.7.125.220
</span>\n
<span class="cf-footer-separator">•</span>\n
<span class="cf-footer-item">
<span>Performance & security by</span>
<a href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">Cloudflare</a>
</span>\n \n
</p>\n
</div>
<!-- /.error-footer -->\n\n\n
</div>
<!-- /#cf-error-details -->\n
</div>
<!-- /#cf-wrapper -->\n\n
<script type="text/javascript">\n window._cf_translation = {};\n \n \n</script>\n\n\n \n
</body>\n
</html>`
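One way to surface this failure earlier is to check a response body for the challenge markers visible in the dump above before treating it as Patreon data. The following is only a sketch: `looksLikeCloudflareChallenge` is a hypothetical helper, not a function from patreon-scraper, and the marker strings are taken from this single example page, so Cloudflare may well serve other variants.

```typescript
// Hypothetical helper (not part of patreon-scraper): reports whether a
// response body looks like the Cloudflare CAPTCHA page shown above.
// The markers are copied from that one example and are an assumption
// about what other challenge pages contain.
const CHALLENGE_MARKERS: readonly string[] = [
  "<title>Attention Required! | Cloudflare</title>",
  'id="challenge-form"',
  "cf-captcha-container",
];

function looksLikeCloudflareChallenge(body: string): boolean {
  // A single matching marker is enough to flag the body.
  return CHALLENGE_MARKERS.some((marker) => body.includes(marker));
}

// Usage: decide what to report instead of silently downloading nothing.
function describeBody(body: string): string {
  return looksLikeCloudflareChallenge(body)
    ? "Blocked by a Cloudflare CAPTCHA page; no content can be downloaded."
    : "Body does not look like a CAPTCHA page.";
}
```

A check like this does not bypass the CAPTCHA; it only turns a confusing "nothing downloaded" run into an explicit message.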
This is most likely me being terrible at using this rather than anything else, but when I run ts-node index.ts -s --session id, it returns 'Invalid/expired session ID', and I don't know why or how to fix it. I'd love to figure this out; this program is exactly what I need!
Are you still active in development?
I'm receiving the following error when trying to run the script, both with the latest 64-bit npm build on Windows and with npm on Ubuntu under WSL:
TSError: ⨯ Unable to compile TypeScript:
src/patreon-stream.ts:93:15 - error TS2739: Type 'ParsedQs' is missing the following properties from type 'FileUrlQS': h, i
93 const result: FileUrlQS | null | undefined = qs.parse(qsPart)
~~~~~~
at createTSError (E:\Software\patreon-scraper\node_modules\ts-node\src\index.ts:434:12)
at reportTSError (E:\Software\patreon-scraper\node_modules\ts-node\src\index.ts:438:19)
at getOutput (E:\Software\patreon-scraper\node_modules\ts-node\src\index.ts:578:36)
at Object.compile (E:\Software\patreon-scraper\node_modules\ts-node\src\index.ts:775:32)
at Module.m._compile (E:\Software\patreon-scraper\node_modules\ts-node\src\index.ts:858:43)
at Module._extensions..js (internal/modules/cjs/loader.js:1092:10)
at Object.require.extensions.<computed> [as .ts] (E:\Software\patreon-scraper\node_modules\ts-node\src\index.ts:861:12)
at Module.load (internal/modules/cjs/loader.js:928:32)
at Function.Module._load (internal/modules/cjs/loader.js:769:14)
at Module.require (internal/modules/cjs/loader.js:952:19)
PS E:\Software\patreon-scraper>
Same error on both systems.
Linux system:
makoto@DESKTOP-1997FKK:/mnt/e/Software/patreon-scraper$ uname -a
Linux DESKTOP-1997FKK 4.19.128-microsoft-standard #1 SMP Tue Jun 23 12:58:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
makoto@DESKTOP-1997FKK:/mnt/e/Software/patreon-scraper$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
makoto@DESKTOP-1997FKK:/mnt/e/Software/patreon-scraper$
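The TS2739 error above appears to come from assigning the result of qs.parse, which is typed as the broad ParsedQs, directly to FileUrlQS, which requires the properties h and i. A possible workaround is to narrow the parsed value with a type guard before using it. This is a sketch under assumptions: the error message only names h and i, so treating both as strings is a guess, and `isFileUrlQS` and `toFileUrlQS` are hypothetical helpers rather than code from the repository.

```typescript
// Assumed shape: the compiler error only tells us that FileUrlQS has
// properties named h and i; typing them as strings is an assumption.
interface FileUrlQS {
  h: string;
  i: string;
}

// Hypothetical type guard: true only when both fields are plain strings.
function isFileUrlQS(value: unknown): value is FileUrlQS {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const record = value as Record<string, unknown>;
  return typeof record.h === "string" && typeof record.i === "string";
}

// Hypothetical wrapper: returns the narrowed value, or null when the
// parsed query string does not have the expected fields.
function toFileUrlQS(parsed: unknown): FileUrlQS | null {
  return isFileUrlQS(parsed) ? parsed : null;
}

// In src/patreon-stream.ts the failing line could then read, roughly:
//   const result = toFileUrlQS(qs.parse(qsPart))
```

Pinning the versions of qs and its type definitions to the ones the project was written against may also avoid the error, although the source does not say which versions those were.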
I have downloaded this and am having trouble. I can't seem to find where the sessionId and outputDir go in index.ts. If you can help me, that would be great!
Right now it attempts to download everything by everyone.
Sorting downloads into sub-directories, or permitting a CLI argument that restricts downloading to a specific author, is a must.
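The two requests above could be sketched roughly as follows. Everything here is hypothetical: the `DownloadItem` shape, the function names, and the idea of an author filter value coming from a CLI argument are illustrations, not part of patreon-scraper's actual code or options.

```typescript
import * as path from "path";

// Hypothetical record for one attachment; the real scraper's data
// model may look quite different.
interface DownloadItem {
  author: string;
  fileName: string;
}

// Replace characters that are unsafe in directory or file names.
function sanitizeSegment(name: string): string {
  return name.replace(/[\\/:*?"<>|]/g, "_").trim();
}

// Keep only items by the requested author (case-insensitive).
// With no author given, everything is kept, as happens today.
function filterByAuthor(items: DownloadItem[], author?: string): DownloadItem[] {
  if (author === undefined) {
    return items;
  }
  const wanted = author.toLowerCase();
  return items.filter((item) => item.author.toLowerCase() === wanted);
}

// Place each file in a sub-directory named after its author.
function targetPath(outputDir: string, item: DownloadItem): string {
  return path.join(outputDir, sanitizeSegment(item.author), sanitizeSegment(item.fileName));
}
```

Filtering before downloading keeps the change small: the download loop itself would only need to use the filtered list and the per-author target path.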