Comments (6)
Hmmm, while this should work, it's quite a bit of work to debug with the site being down (it doesn't load for me). Can you bring it back up, @tacman?
from phpscraper.
Try now. It's a slow site, at least initially, because it's running on a free Heroku dyno. It can take up to 30 seconds to "wake up" if it's been inactive for a while.
I set up a login for you -- [email protected], password: spekulatius
from phpscraper.
Hey @tacman
can you share some more code on how you add this to PHPScraper?
if ($username) {
    $crawler = $gouteClient->request('GET', $url = $baseUrl . "/login");
    // select the form and fill in some values
    $form = $crawler->selectButton('login-btn')->form();
    $form['_username'] = 'user';
    $form['_password'] = 'pass';
    // submit that form
    $crawler = $gouteClient->submit($form);
    $response = $gouteClient->getResponse();
}
Thanks :)
from phpscraper.
Well, that's kind of the point of this issue -- I don't know how to do that. I only see how to click links with PHPScraper:
https://github.com/spekulatius/PHPScraper/blob/master/src/phpscraper.php#L918
I was hoping there was a way to submit a form, which would keep the cookies for that session. So instead of ->clickLink(), a method like ->submitForm(), where I could send in the credentials and then load a page and follow links that require authentication.
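For illustration, Goutte's client extends Symfony BrowserKit, which provides a `submitForm()` method and keeps a cookie jar per browser instance. The following is a minimal sketch of the requested flow, assuming `symfony/browser-kit`, `symfony/http-client`, and `symfony/dom-crawler` are installed via Composer. The helper name `loginAndFetch`, the button identifier, the field names, and the URLs are all hypothetical, and none of this is part of PHPScraper's API.

```php
<?php
// Sketch only: assumes symfony/browser-kit, symfony/http-client and
// symfony/dom-crawler are installed. loginAndFetch() is a hypothetical helper.

use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\HttpClient\HttpClient;

/**
 * Log in through an HTML form, then request a protected page with the same
 * browser instance so the cookies set during login are sent along.
 */
function loginAndFetch(
    HttpBrowser $browser,
    string $loginUrl,
    string $submitButton,
    array $credentials,
    string $protectedUrl
): Crawler {
    // Load the login page; the browser keeps it as the current page.
    $browser->request('GET', $loginUrl);

    // Find the form that owns the given button, fill in the fields, submit.
    // Cookies from the response are stored in the browser's cookie jar.
    $browser->submitForm($submitButton, $credentials);

    // Later requests on this instance reuse the stored cookies.
    return $browser->request('GET', $protectedUrl);
}

// Hypothetical usage (placeholder URLs and field names):
// $browser = new HttpBrowser(HttpClient::create());
// $page = loginAndFetch(
//     $browser,
//     'https://example.com/login',
//     'login-btn',
//     ['_username' => 'user', '_password' => 'pass'],
//     'https://example.com/admin'
// );
```

The key design point is that the login and the follow-up requests share one browser object; creating a new client per request would discard the session cookie.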
from phpscraper.
Ah okay, now we are getting a bit closer. I've wondered how you did it. Did you get it working with Goutte only?
from phpscraper.
I have a Symfony bundle that crawls a website: https://github.com/survos/SurvosCrawlerBundle
The idea is that if it can create a set of links that are visible (based on different logins), those links can then be used in a simple PHPUnit test. It basically does what almost all testers do in the beginning -- log in, and click blindly on every link. It's amazing how often someone finds a broken page that way.
So I was trying to use PHPScraper to do that. In the end, I couldn't, so I just used the other tools I had available:
public function authenticateClient(?string $username = null, ?string $plainPassword = null): void
{
    // might be worth checking out: https://github.com/liip/LiipTestFixturesBundle/pull/62#issuecomment-622191412
    static $clients = [];
    if (!array_key_exists($username, $clients)) {
        $gouteClient = new Client();
        $gouteClient->setMaxRedirects(0);
        $this->username = $username;
        $baseUrl = $this->baseUrl;
        $clients[$username] = $gouteClient;
        if ($username) {
            $crawler = $gouteClient->request('GET', $url = $baseUrl . trim($this->loginPath, '/'), [
                'proxy' => '127.0.0.1:7080'
            ]);
            // dd($crawler, $url);
            $response = $gouteClient->getResponse();
            assert($response->getStatusCode() === 200, "Invalid route: " . $url);
            // dd(substr($response->getContent(),0, 1024), $url, $baseUrl);
            // select the form and fill in some values
            // $form = $crawler->filter('login_form')->form();
            try {
                $form = $crawler->selectButton($this->submitButtonSelector)->form();
            } catch (\Exception $exception) {
                // keep the original exception as the cause
                throw new \Exception($this->submitButtonSelector . ' does not find a form on ' . $this->loginPath, 0, $exception);
            }
            // assert($form, $this->submitButtonSelector . ' does not find a form on ' . $this->loginPath);
            $form['_username'] = $username;
            $form['_password'] = $plainPassword;
            // submit that form
            $crawler = $gouteClient->submit($form);
            $response = $gouteClient->getResponse();
            assert($response->getStatusCode() == 200, substr($response->getContent(), 0, 512) . "\n\n" . $url);
            // ... (rest of the method omitted in this excerpt)
        }
    }
}
https://github.com/survos/SurvosCrawlerBundle/blob/main/src/Services/CrawlerService.php#L108
I don't love the code, though it's functional. If I could drop it all and replace it with PHPScraper, I would. Of course, if there's anything of value you can grab from my bundle, please do so!
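The crawl-then-test idea described above (collect the links a given user can see, then request each one in a test) can be sketched without any framework. This is a minimal illustration using PHP's built-in DOM extension. `collectInternalLinks` is a hypothetical helper, not code from SurvosCrawlerBundle or PHPScraper, and it deliberately handles only root-relative and same-host absolute URLs.

```php
<?php
/**
 * Return the unique same-site paths linked from an HTML page.
 * Hypothetical helper for illustration; requires ext-dom.
 */
function collectInternalLinks(string $html, string $baseUrl): array
{
    $host = parse_url($baseUrl, PHP_URL_HOST);

    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paths = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // no target, or an in-page fragment
        }
        $parts = parse_url($href);
        if ($parts === false) {
            continue; // malformed URL
        }
        if (isset($parts['scheme']) && !in_array($parts['scheme'], ['http', 'https'], true)) {
            continue; // mailto:, javascript:, etc.
        }
        if (isset($parts['host']) && $parts['host'] !== $host) {
            continue; // external site
        }
        $path = $parts['path'] ?? '/';
        if ($path[0] !== '/') {
            continue; // document-relative links are not resolved in this sketch
        }
        $paths[$path] = true; // array keys de-duplicate the paths
    }

    return array_keys($paths);
}
```

Each returned path could then be fed to a test that requests it with the authenticated client and asserts that the response is not an error page.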
from phpscraper.