Comments (6)

spekulatius commented on May 12, 2024

Hmmm, while this should work, it's quite a bit of work to debug with the site being down (it doesn't load for me). Can you bring it back up, @tacman?

from phpscraper.

tacman commented on May 12, 2024

Try now. It's a slow site, at least initially, because it's running on a free Heroku dyno. It can take up to 30 seconds to "wake up" if it's been inactive for a while.

I set up a login for you -- [email protected], password: spekulatius


spekulatius commented on May 12, 2024

Hey @tacman

can you share some more code on how you add this to PHPScraper?

    if ($username) {
        // load the login page
        $crawler = $gouteClient->request('GET', $url = $baseUrl . "/login");

        // select the form and fill in some values
        $form = $crawler->selectButton('login-btn')->form();
        $form['_username'] = 'user';
        $form['_password'] = 'pass';

        // submit that form
        $crawler = $gouteClient->submit($form);
        $response = $gouteClient->getResponse();
    }

Thanks :)


tacman commented on May 12, 2024

Well, that's kind of the point of this issue -- I don't know how to do that. I only see how to click links with PHPScraper:

https://github.com/spekulatius/PHPScraper/blob/master/src/phpscraper.php#L918

I was hoping there was a way to submit a form that would keep the cookies for that session. So instead of ->clickLink(), a method like ->submitForm(), where I could send in the credentials and then load a page and follow links that require authentication.
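
For illustration, here is a minimal sketch of the behaviour described above: submit credentials once, keep the cookies, and send them with every later request. Everything in it is hypothetical. `SessionSketch`, its `submitForm()` and `go()` methods, the `/login` and `/admin` paths, and the in-memory `$fakeSite` transport are made-up names for this example, not part of PHPScraper or Goutte. A real client would replace the fake transport with actual HTTP calls.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch, not PHPScraper or Goutte API.
 * Remembers cookies between requests so a login carries over to later pages.
 */
final class SessionSketch
{
    /** @var array<string, string> cookie name => value */
    private array $cookies = [];

    /** @var callable transport: (method, url, fields, cookies) => response array */
    private $transport;

    public function __construct(callable $transport)
    {
        $this->transport = $transport;
    }

    /** POST form fields (e.g. credentials) and keep any cookies that come back. */
    public function submitForm(string $url, array $fields): array
    {
        return $this->send('POST', $url, $fields);
    }

    /** Load a page, sending the cookies collected so far. */
    public function go(string $url): array
    {
        return $this->send('GET', $url, []);
    }

    private function send(string $method, string $url, array $fields): array
    {
        $response = ($this->transport)($method, $url, $fields, $this->cookies);

        // Store cookies set by the response so the next request carries them.
        foreach ($response['set_cookies'] ?? [] as $name => $value) {
            $this->cookies[$name] = $value;
        }

        return $response;
    }
}

// Fake in-memory "site" standing in for real HTTP (an assumption for this sketch).
$fakeSite = static function (string $method, string $url, array $fields, array $cookies): array {
    if ($method === 'POST' && $url === '/login') {
        $ok = ($fields['_username'] ?? '') === 'user'
            && ($fields['_password'] ?? '') === 'pass';

        return $ok
            ? ['status' => 200, 'set_cookies' => ['session' => 'abc123']]
            : ['status' => 401];
    }

    // Every other page requires the session cookie.
    return ['status' => ($cookies['session'] ?? null) === 'abc123' ? 200 : 403];
};

$client = new SessionSketch($fakeSite);
$before = $client->go('/admin')['status'];   // requested without a session
$client->submitForm('/login', ['_username' => 'user', '_password' => 'pass']);
$after = $client->go('/admin')['status'];    // requested with the stored cookie
```

The sketch only shows the session-keeping idea. It does not parse HTML forms or follow links, which is what ->clickLink() style helpers would add on top.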


spekulatius commented on May 12, 2024

Ah okay, now we are getting a bit closer. I've wondered how you did it. Did you get it working with Goutte only?


tacman commented on May 12, 2024

I have a Symfony bundle that crawls a website: https://github.com/survos/SurvosCrawlerBundle

The idea is that if it can create a set of links that are visible (based on different logins), those links can then be used in a simple PHPUnit test. It basically does what almost all testers do in the beginning -- log in, and click blindly on every link. It's amazing how often someone finds a broken page that way.

So I was trying to use PHPScraper to do that. In the end, I couldn't, so I just used what other tools I had available:

    public function authenticateClient(?string $username = null, ?string $plainPassword = null): void
    {
        // might be worth checking out: https://github.com/liip/LiipTestFixturesBundle/pull/62#issuecomment-622191412
        static $clients = [];
        if (!array_key_exists($username, $clients)) {
            $gouteClient = new Client();
            $gouteClient
                ->setMaxRedirects(0);
            $this->username = $username;
            $baseUrl = $this->baseUrl;
            $clients[$username] = $gouteClient;
            if ($username) {
                $crawler = $gouteClient->request('GET', $url = $baseUrl . trim($this->loginPath, '/'), [
                    'proxy' => '127.0.0.1:7080'
                ]);

                // dd($crawler, $url);
                $response = $gouteClient->getResponse();
                assert($response->getStatusCode() === 200, "Invalid route: " . $url);
                // dd(substr($response->getContent(), 0, 1024), $url, $baseUrl);

                // select the form and fill in some values
                // $form = $crawler->filter('login_form')->form();
                try {
                    $form = $crawler->selectButton($this->submitButtonSelector)->form();
                } catch (\Exception $exception) {
                    throw new \Exception($this->submitButtonSelector . ' does not find a form on ' . $this->loginPath);
                }
                // assert($form, $this->submitButtonSelector . ' does not find a form on ' . $this->loginPath);
                $form['_username'] = $username;
                $form['_password'] = $plainPassword;

                // submit that form
                $crawler = $gouteClient->submit($form);
                $response = $gouteClient->getResponse();
                assert($response->getStatusCode() === 200, substr($response->getContent(), 0, 512) . "\n\n" . $url);
                // ...
            }
        }
    }

https://github.com/survos/SurvosCrawlerBundle/blob/main/src/Services/CrawlerService.php#L108

I don't love the code, though it's functional. If I could drop it all and replace it with PHPScraper, I would. Of course, if there's anything of value you can grab from my bundle, please do so!
