yjg30737 / pyqt-google-image-crawler

Crawling image files from Google search results with Python and icrawler

License: MIT License

Python 100.00%
beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application

pyqt-google-image-crawler's Introduction

pyqt-google-image-crawler

Requirements

  • PyQt5 >= 5.14 - for GUI support
  • icrawler - the main package used for crawling
  • beautifulsoup4 - required by icrawler

To run it, clone this repo and install all the required packages with

pip install -r requirements.txt

Then run

python main.py

That's it! You should then see a window like the one below.

Explanation

[Screenshot]

As you can see, you can set parameters such as the maximum length, the color (including transparent), and the language.

In case the "color" item of the color parameter is unclear: "color" here means "any color".

In the bottom portion of the window, you can add the topics of the images you want to crawl to the list. Then press the run button, and it will keep crawling until every task is finished!

[Screenshot]

[Screenshot]
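The parameters and topic list described above map fairly naturally onto icrawler's `GoogleImageCrawler`. The sketch below is not taken from `main.py`; it only illustrates how such settings could be passed to icrawler. `CrawlSettings`, `build_crawl_kwargs`, `crawl_all`, and the one-folder-per-topic layout are hypothetical, and treating "maximum length" as icrawler's `max_num` is an assumption. The set of accepted color values is also assumed from recent icrawler releases, so check the `crawl()` signature and color filter of your installed version.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Color filter values assumed to be accepted by icrawler's Google crawler.
# "color" maps to Google's full-color filter, i.e. "any color".
GOOGLE_COLORS = {
    "color", "blackandwhite", "transparent",
    "red", "orange", "yellow", "green", "teal", "blue",
    "purple", "pink", "white", "gray", "black", "brown",
}


@dataclass
class CrawlSettings:
    """Hypothetical container for the values entered in the GUI."""
    max_num: int = 10               # assumed meaning of "maximum length"
    color: str = "color"            # "color" = any color
    language: Optional[str] = None  # e.g. "en"; None leaves it to Google
    topics: List[str] = field(default_factory=list)


def build_crawl_kwargs(settings: CrawlSettings, topic: str) -> Dict[str, Any]:
    """Translate the GUI settings into keyword arguments for crawl()."""
    if settings.max_num <= 0:
        raise ValueError("max_num must be positive")
    if settings.color not in GOOGLE_COLORS:
        raise ValueError(f"unsupported color filter: {settings.color!r}")
    kwargs: Dict[str, Any] = {
        "keyword": topic,
        "max_num": settings.max_num,
        "filters": {"color": settings.color},
    }
    if settings.language:
        kwargs["language"] = settings.language
    return kwargs


def crawl_all(settings: CrawlSettings, root_dir: str = "images") -> None:
    """Crawl every topic in the list, one after another."""
    # Imported here so build_crawl_kwargs() works without icrawler installed.
    from icrawler.builtin import GoogleImageCrawler

    for topic in settings.topics:
        # One crawler per topic, so each topic is saved to its own folder.
        crawler = GoogleImageCrawler(storage={"root_dir": f"{root_dir}/{topic}"})
        crawler.crawl(**build_crawl_kwargs(settings, topic))
```

For example, `crawl_all(CrawlSettings(max_num=20, color="transparent", topics=["cat", "dog"]))` would crawl the two topics in order and store their images under `images/cat` and `images/dog`. Validating the settings before touching the network keeps a typo in the color field from silently producing an empty crawl.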

You can also run this as a background application. Crawling is usually very time-consuming, so I decided to support this feature for anyone who wants to get the window out of the foreground.
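The README doesn't say how background mode is implemented; in a PyQt5 app it is likely built on something like `QThread` or a system-tray icon. As a GUI-free sketch of the underlying idea only, the example below runs each crawl on a worker thread so the caller is never blocked. `run_in_background`, `crawl_one`, and `on_finished` are hypothetical names, and `crawl_one` stands in for whatever actually crawls a single topic.

```python
import threading
from typing import Callable, Iterable, List, Optional


def run_in_background(
    topics: Iterable[str],
    crawl_one: Callable[[str], None],
    on_finished: Optional[Callable[[List[str]], None]] = None,
) -> threading.Thread:
    """Crawl each topic on a worker thread so the caller is never blocked."""
    topic_list = list(topics)  # snapshot, so later edits to the list don't race

    def worker() -> None:
        done: List[str] = []
        for topic in topic_list:
            crawl_one(topic)   # the slow, network-bound part
            done.append(topic)
        if on_finished is not None:
            on_finished(done)  # note: this runs on the worker thread

    # Non-daemon: the interpreter waits for the crawl to finish before exiting.
    thread = threading.Thread(target=worker, name="crawl-worker")
    thread.start()
    return thread
```

In a Qt application, `on_finished` should not touch widgets directly, because Qt widgets may only be used from the GUI thread; emitting a signal from the worker and updating the window in a connected slot is the usual pattern.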

By the way, I'm using icrawler in a very basic way. It's not suited to collecting massive numbers of images, but I'm sure this can give you an idea.

After all, Google Image Search is one of the most accessible image sources on the Internet, even though icrawler has some flaws.

pyqt-google-image-crawler's Issues

TypeError: 'NoneType' object is not iterable

Unfortunately, the code doesn't work for me.
When I run the program, I get this message:

">>
started
2023-09-03 20:31:45,500 - INFO - icrawler.crawler - start crawling...
2023-09-03 20:31:45,509 - INFO - icrawler.crawler - starting 1 feeder threads...
2023-09-03 20:31:45,517 - INFO - feeder - thread feeder-001 exit
2023-09-03 20:31:45,517 - INFO - icrawler.crawler - starting 2 parser threads...
2023-09-03 20:31:45,531 - INFO - icrawler.crawler - starting 4 downloader threads...
C:\Users\user\AppData\Roaming\Python\Python311\site-packages\urllib3\connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.google.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
warnings.warn(
C:\Users\user\AppData\Roaming\Python\Python311\site-packages\urllib3\connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'consent.google.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
warnings.warn(
2023-09-03 20:31:46,425 - INFO - parser - parsing result page https://www.google.com/search?q=10&ijn=0&start=0&tbs=ic%3Acolor&tbm=isch
Exception in thread parser-001:
Traceback (most recent call last):
  File "C:\Program Files\Python311\Lib\threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "C:\Program Files\Python311\Lib\threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\user\AppData\Roaming\Python\Python311\site-packages\icrawler\parser.py", line 94, in worker_exec
    for task in self.parse(response, **kwargs):
TypeError: 'NoneType' object is not iterable
2023-09-03 20:31:47,540 - INFO - parser - no more page urls for thread parser-002 to parse
2023-09-03 20:31:47,540 - INFO - parser - thread parser-002 exit
2023-09-03 20:31:50,536 - INFO - downloader - no more download task for thread downloader-003
2023-09-03 20:31:50,537 - INFO - downloader - no more download task for thread downloader-004
2023-09-03 20:31:50,538 - INFO - downloader - no more download task for thread downloader-001
2023-09-03 20:31:50,539 - INFO - downloader - no more download task for thread downloader-002
2023-09-03 20:31:50,539 - INFO - downloader - thread downloader-003 exit
2023-09-03 20:31:50,540 - INFO - downloader - thread downloader-004 exit
2023-09-03 20:31:50,541 - INFO - downloader - thread downloader-001 exit
2023-09-03 20:31:50,542 - INFO - downloader - thread downloader-002 exit
2023-09-03 20:31:51,542 - INFO - icrawler.crawler - Crawling task done!
finished"

Can someone help?
