patreon-scraper's People

Contributors

dependabot[bot], dotsk, lavovalampa

patreon-scraper's Issues

Not downloading all MP3s from Patreon

Hello there!

I'm not sure how to describe this issue or what information would be useful. I am a supporter of AmaLee (https://www.patreon.com/LeeandLie/posts), and when I run the scraper with my session ID, not all of the MP3 files are downloaded.

Ones I noticed missing right off the bat are:

https://www.patreon.com/posts/weight-of-world-32708075

https://www.patreon.com/posts/which-lyric-do-33452070

https://www.patreon.com/posts/preview-might-u-33592038

https://www.patreon.com/posts/aliez-remix-33718259

I believe the file is not downloaded whenever the post shows an audio player (three screenshots were attached to illustrate this). This may just be a coincidence, however.
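If the reporter's hunch is right, the scraper may only be collecting a post's attachments and skipping the file behind the embedded player. The sketch below shows the idea of gathering both sources in one pass. It is hypothetical throughout: the `ScrapedPost` shape and the `attachments`/`audioFile` field names are assumptions, not Patreon's documented API or patreon-scraper's real types.

```typescript
// Hypothetical media entry; field names are assumptions.
interface PostMedia {
  url: string;
  name: string;
}

// Hypothetical post shape: regular attachments plus an optional file that is
// rendered as an audio player on the post page (assumed field `audioFile`).
interface ScrapedPost {
  id: string;
  attachments: PostMedia[];
  audioFile?: PostMedia | null;
}

// Collect every downloadable file, including the one behind the embedded player.
function collectDownloads(post: ScrapedPost): PostMedia[] {
  const candidates: PostMedia[] = post.attachments.slice();
  if (post.audioFile) {
    candidates.push(post.audioFile);
  }
  // De-duplicate by URL so a file listed in both places is fetched once.
  const seen: { [url: string]: boolean } = {};
  return candidates.filter((media) => {
    if (seen[media.url]) return false;
    seen[media.url] = true;
    return true;
  });
}
```

Whether Patreon actually exposes player audio separately from attachments would need to be confirmed against real responses for the posts listed above.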

This is what it looks like when running the scraper: https://gist.github.com/Evernow/b6c72762f07f4898a6031db142975487

CAPTCHA prompt seems to prevent any content from downloading

When I run this script, no content is downloaded. Looking at the script output in the console, the body section appears to contain the markup of a CAPTCHA page, which leads me to believe this is the root of the issue.

Example:

`\n

\n
<!--[if IE 7]>
<html class="no-js ie7 oldie" lang="en-US">
	<![endif]-->\n
	<!--[if IE 8]>
	<html class="no-js ie8 oldie" lang="en-US">
		<![endif]-->\n
		<!--[if gt IE 8]>
		<!-->
		<html class="no-js" lang="en-US">
			<!--
			<![endif]-->\n
			<head>\n
				<title>Attention Required! | Cloudflare</title>\n
				<meta name="captcha-bypass" id="captcha-bypass" />\n
				<meta charset="UTF-8" />\n
				<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n
				<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />\n
				<meta name="robots" content="noindex, nofollow" />\n
				<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />\n
				<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />\n
				<!--[if lt IE 9]>
				<link rel="stylesheet" id=\'cf_styles-ie-css\' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" />
				<![endif]-->\n
				<style type="text/css">body{margin:0;padding:0}</style>\n\n\n
				<!--[if gte IE 10]>
				<!-->
				<script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script>
				<!--
				<![endif]-->\n
				<!--[if gte IE 10]>
				<!-->
				<script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script>
				<!--
				<![endif]-->\n\n\n\n\n
			</head>\n
			<body>\n  
				<div id="cf-wrapper">\n    
					<div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>\n    
					<div id="cf-error-details" class="cf-error-details-wrapper">\n      
						<div class="cf-wrapper cf-header cf-error-overview">\n        
							<h1 data-translate="challenge_headline">One more step</h1>\n        
							<h2 class="cf-subheadline">
								<span data-translate="complete_sec_check">Please complete the security check to access</span> patreon.com
							</h2>\n      
						</div>
						<!-- /.header -->\n      \n      
						<div class="cf-section cf-highlight cf-captcha-container">\n        
							<div class="cf-wrapper">\n          
								<div class="cf-columns two">\n            
									<div class="cf-column">\n            \n              
										<div class="cf-highlight-inverse cf-form-stacked">\n                
											<form class="challenge-form" id="challenge-form" action="/cdn-cgi/l/chk_captcha" method="get">\n  
												<input type="hidden" name="s" value="6903d23f35b518d64183514a77da7a1e4080565e-1569455077-1800-AXOxIyPcLIrC1ekaD86GCBANIQ6tvR2dsxDi1Op3XgNC+nMdHTBPnPjig2WUKcdW1YeHIAgbCFko2Pjz4MTZYsMKtDf+imsnyFXsz9HFKWlw8V09/GiYfeyNtj1F9O+3L6EI1/CXKDOujHPfIpMFcF9x7Xs1bqSjoPnunmOUV8FA0M9p2vcX5ZR1at4f0ZwXFw=="></input>\n  
												<script type="text/javascript" src="/cdn-cgi/scripts/cf.challenge.js" data-type="normal"  data-ray="51c0ddfb9cfbce6b" async data-sitekey="6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0"></script>\n  
												<div class="g-recaptcha"></div>\n  
												<noscript id="cf-captcha-bookmark" class="cf-captcha-info">\n    
													<div>
														<div style="width: 302px">\n      
															<div>\n        
																<iframe src="https://www.google.com/recaptcha/api/fallback?k=6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0" frameborder="0" scrolling="no" style="width: 302px; height:422px; border-style: none;"></iframe>\n      
															</div>\n      
															<div style="width: 300px; border-style: none; bottom: 12px; left: 25px; margin: 0px; padding: 0px; right: 25px; background: #f9f9f9; border: 1px solid #c1c1c1; border-radius: 3px;">\n        
																<textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid #c1c1c1; margin: 10px 25px; padding: 0px; resize: none;"></textarea>\n        
																<input type="submit" value="Submit"></input>\n      
															</div>\n    
														</div>
													</div>\n  
												</noscript>\n
											</form>\n\n                \n              
										</div>\n            
									</div>\n\n            
									<div class="cf-column">\n              
										<div class="cf-screenshot-container">\n              \n                
											<span class="cf-no-screenshot"></span>\n              \n              
										</div>\n            
									</div>\n          
								</div>
								<!-- /.columns -->\n        
							</div>\n      
						</div>
						<!-- /.captcha-container -->\n\n      
						<div class="cf-section cf-wrapper">\n        
							<div class="cf-columns two">\n          
								<div class="cf-column">\n            
									<h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>\n            \n            
									<p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>\n          
								</div>\n\n          
								<div class="cf-column">\n            
									<h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>\n            \n\n            
									<p data-translate="resolve_captcha_antivirus">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>\n\n            
									<p data-translate="resolve_captcha_network">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>\n            \n          
								</div>\n        
							</div>\n      
						</div>
						<!-- /.section -->\n      \n\n      
						<div class="cf-error-footer cf-wrapper">\n  
							<p>\n    
								<span class="cf-footer-item">Cloudflare Ray ID: 
									<strong>51c0ddfb9cfbce6b</strong>
								</span>\n    
								<span class="cf-footer-separator">&bull;</span>\n    
								<span class="cf-footer-item">
									<span>Your IP</span>: 51.7.125.220
								</span>\n    
								<span class="cf-footer-separator">&bull;</span>\n    
								<span class="cf-footer-item">
									<span>Performance &amp; security by</span>
									<a href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">Cloudflare</a>
								</span>\n    \n  
							</p>\n
						</div>
						<!-- /.error-footer -->\n\n\n    
					</div>
					<!-- /#cf-error-details -->\n  
				</div>
				<!-- /#cf-wrapper -->\n\n  
				<script type="text/javascript">\n  window._cf_translation = {};\n  \n  \n</script>\n\n\n  \n
			</body>\n
		</html>`
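Detecting this page does not get past the CAPTCHA, but it lets a scraper stop with a clear error instead of silently reporting no downloads. A minimal sketch: the marker strings are copied from the Cloudflare page above, while `isCloudflareChallenge` and `assertNotChallenged` are hypothetical helpers, not part of patreon-scraper.

```typescript
// Markers taken from the Cloudflare challenge page quoted in this issue.
const CHALLENGE_MARKERS: string[] = [
  "<title>Attention Required! | Cloudflare</title>",
  'id="challenge-form"',
  "/cdn-cgi/l/chk_captcha",
];

// True when a response body looks like a Cloudflare CAPTCHA page
// rather than the content the scraper expected.
function isCloudflareChallenge(body: string): boolean {
  return CHALLENGE_MARKERS.some((marker) => body.indexOf(marker) !== -1);
}

// Fail fast with a readable message instead of continuing with an unusable body.
function assertNotChallenged(status: number, body: string): void {
  if (isCloudflareChallenge(body)) {
    throw new Error(
      `Blocked by a Cloudflare CAPTCHA (HTTP ${status}); this response contains no downloadable content.`
    );
  }
}
```

Calling `assertNotChallenged` right after each request turns "nothing was downloaded" into an explicit, searchable error.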

Pipeline not found

I'm kind of stuck here. What do I do about this error, and how do I use the tool you mentioned in the description?

Invalid/expired session ID

This is most likely me being bad at using this rather than anything else, but when I run ts-node index.ts -s --session id, it returns 'Invalid/expired session ID', and I don't know why or how to fix it. I'd love to figure this out somehow; this program is exactly what I need!
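The message does not say what exactly was rejected, but one thing worth ruling out is the pasted value itself: copying a cookie from browser developer tools can bring along whitespace, quotes, the cookie name, or the rest of the Cookie header. The helper below is a hypothetical sketch of that clean-up. It assumes the session is sent as a cookie named `session_id`; neither that name nor these functions come from patreon-scraper, and its real CLI flags are not documented here.

```typescript
// Assumption: the session is sent as a cookie named "session_id".
const SESSION_COOKIE = "session_id";

// Hypothetical helper: clean up a session ID pasted on the command line.
function normalizeSessionId(raw: string): string {
  // Keep only the first cookie pair if a whole Cookie header was pasted.
  let value = raw.split(";")[0].trim();
  // Drop a leading "session_id=" if the name was copied along with the value.
  const prefix = SESSION_COOKIE + "=";
  if (value.indexOf(prefix) === 0) {
    value = value.slice(prefix.length).trim();
  }
  // Strip one pair of matching surrounding quotes.
  const first = value.charAt(0);
  if (value.length >= 2 && (first === '"' || first === "'") && value.charAt(value.length - 1) === first) {
    value = value.slice(1, -1).trim();
  }
  // An empty value, or one starting with "-", is probably a mistake
  // (for example a CLI flag that ended up where the session ID belongs).
  if (value === "" || value.charAt(0) === "-") {
    throw new Error(`"${raw}" does not look like a session ID value`);
  }
  return value;
}

// Header value an HTTP client would send with each request (cookie name assumed).
function sessionCookieHeader(raw: string): string {
  return `${SESSION_COOKIE}=${normalizeSessionId(raw)}`;
}
```

If a cleaned value is still rejected, the session has most likely expired, and logging in again to copy a fresh value is the next thing to try.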

src/patreon-stream.ts:93:15 - error TS2739: Type 'ParsedQs' is missing the following properties from type 'FileUrlQS': h, i

I'm receiving the following error when running the script, both with the latest 64-bit npm build on Windows and with npm on Ubuntu under WSL:

TSError: ⨯ Unable to compile TypeScript:
src/patreon-stream.ts:93:15 - error TS2739: Type 'ParsedQs' is missing the following properties from type 'FileUrlQS': h, i

93         const result: FileUrlQS | null | undefined = qs.parse(qsPart)
                 ~~~~~~

    at createTSError (E:\Software\patreon-scraper\node_modules\ts-node\src\index.ts:434:12)
    at reportTSError (E:\Software\patreon-scraper\node_modules\ts-node\src\index.ts:438:19)
    at getOutput (E:\Software\patreon-scraper\node_modules\ts-node\src\index.ts:578:36)
    at Object.compile (E:\Software\patreon-scraper\node_modules\ts-node\src\index.ts:775:32)
    at Module.m._compile (E:\Software\patreon-scraper\node_modules\ts-node\src\index.ts:858:43)
    at Module._extensions..js (internal/modules/cjs/loader.js:1092:10)
    at Object.require.extensions.<computed> [as .ts] (E:\Software\patreon-scraper\node_modules\ts-node\src\index.ts:861:12)
    at Module.load (internal/modules/cjs/loader.js:928:32)
    at Function.Module._load (internal/modules/cjs/loader.js:769:14)
    at Module.require (internal/modules/cjs/loader.js:952:19)
PS E:\Software\patreon-scraper>

Same error on both systems.

Linux system:

makoto@DESKTOP-1997FKK:/mnt/e/Software/patreon-scraper$ uname -a
Linux DESKTOP-1997FKK 4.19.128-microsoft-standard #1 SMP Tue Jun 23 12:58:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
makoto@DESKTOP-1997FKK:/mnt/e/Software/patreon-scraper$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
makoto@DESKTOP-1997FKK:/mnt/e/Software/patreon-scraper$
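The compiler is right to complain: `qs.parse` is typed as returning `ParsedQs`, whose values may be strings, arrays, nested objects, or missing, so TypeScript cannot prove that `h` and `i` exist. This presumably surfaced after an update to the `qs` type definitions. A sketch of a checked fix using a user-defined type guard follows; treating `h` and `i` as strings is an assumption, since the error only says `FileUrlQS` requires those two properties, and `isFileUrlQS`/`toFileUrlQS` are hypothetical names.

```typescript
// Assumed shape: the error only tells us FileUrlQS requires `h` and `i`.
interface FileUrlQS {
  h: string;
  i: string;
}

// Type guard: verifies at runtime what the compiler cannot infer from ParsedQs,
// so TypeScript can narrow the value to FileUrlQS afterwards.
function isFileUrlQS(value: unknown): value is FileUrlQS {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["h"] === "string" && typeof record["i"] === "string";
}

// Replacement for the failing assignment; returns null when the query string
// does not contain usable `h` and `i` parameters.
function toFileUrlQS(parsed: unknown): FileUrlQS | null {
  return isFileUrlQS(parsed) ? parsed : null;
}
```

At line 93 this would read `const result = toFileUrlQS(qs.parse(qsPart))`. The quicker `qs.parse(qsPart) as unknown as FileUrlQS` also compiles, but it silences the check instead of performing it.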

Doesn't run

I'm using Ubuntu 18.04 LTS with npm 6.13.4 and Node.js v12.14.1.
When I run it with ./index.ts:

When I run it with node index.ts:

When I run it with tsc index.ts:

Filter by Author

Right now it's attempting to download everything by everyone.

Sorting downloads into per-author subdirectories, or a CLI argument to download only from a specific author, is a must.
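Both halves of this request can be sketched in a few lines, assuming each scraped post already carries its creator's name. The `PostRef` type and every function below are hypothetical illustrations, not part of patreon-scraper.

```typescript
// Hypothetical post record; `author` stands for whatever creator name the scraper has.
interface PostRef {
  author: string;
  fileName: string;
}

// Replace characters that are invalid in Windows or Unix file names.
function safeName(name: string): string {
  const cleaned = name.replace(/[<>:"\/\\|?*\x00-\x1f]/g, "_").trim();
  return cleaned === "" ? "unknown-author" : cleaned;
}

// Keep only posts from the requested author (case-insensitive),
// or every post when no author filter was given.
function filterByAuthor(posts: PostRef[], author?: string): PostRef[] {
  if (!author) return posts;
  const wanted = author.toLowerCase();
  return posts.filter((p) => p.author.toLowerCase() === wanted);
}

// Build "<baseDir>/<author>/<file>" so each creator gets its own subdirectory.
function targetPath(baseDir: string, post: PostRef): string {
  return [baseDir, safeName(post.author), safeName(post.fileName)].join("/");
}
```

Forward slashes are used so the sketch needs no imports; Node's file APIs accept them on Windows as well.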
