gilbertekalea / booking.com_crawler

An advanced booking.com scraper. Collects property name, review score, property price and property address. Downloads property images and the relevant URL.

License: MIT License

Python 99.61% C 0.25% JavaScript 0.01% PowerShell 0.11% Batchfile 0.01%
selenium-webdriver selenium python crawler bot booking-website booking-bot

booking.com_crawler's Introduction

Booking.com_crawler

An advanced web scraper for extracting hotel data from Booking.com. No sign-up or login required. The code is meant to be simple and easy to use and modify. However, a few configuration and setup steps are necessary for the program to work.

Please read the following sections carefully.

Summary

Booking.com is an online travel agency for lodging reservations and other travel products. The booking.com_crawler is a web scraping bot that crawls the booking.com website to extract hotel data and stores the scraped data in a CSV file.

Scraper Features

  • Customizable search filters
  • Switch browser tabs
  • Generate date ranges for checkin and checkout
  • Click and follow links
  • Perform pagination
  • Web automation
  • Data conversion - export in CSV or JSON format
  • Proxy - not yet implemented

Data Features

  • city_name
  • property_name
  • property_description
  • property_images
  • property_url_link
  • property_address
  • location
  • property_price
  • property_type
  • property_score
  • checkin_date
  • checkout_date
  • number_of_adults
  • number_of_rooms
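
As a rough illustration of how records with the fields above could be written out as CSV or JSON, here is a minimal sketch using only the standard library. The helper names `records_to_csv` and `records_to_json` and the sample values are hypothetical and are not taken from the project's code.

```python
import csv
import io
import json

# Field names as listed under "Data Features".
FIELDS = [
    "city_name", "property_name", "property_description", "property_images",
    "property_url_link", "property_address", "location", "property_price",
    "property_type", "property_score", "checkin_date", "checkout_date",
    "number_of_adults", "number_of_rooms",
]


def records_to_csv(records):
    """Serialize a list of dicts to CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def records_to_json(records):
    """Serialize the same records to a JSON array."""
    return json.dumps(records, indent=2, ensure_ascii=False)


# Hypothetical sample record; a real run would fill in every field.
sample = [{
    "city_name": "Lisbon",
    "property_name": "Example Hotel",
    "property_price": "120",
    "property_score": "8.7",
}]
csv_text = records_to_csv(sample)
json_text = records_to_json(sample)
```

Keeping the field list in one place means the CSV header and the JSON keys stay consistent if a field is added later.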

Getting Started

1. Clone the repository

To clone this repository using Git, use

 git clone https://github.com/gilbertekalea/booking.com_crawler.git

2. Installing Dependencies

The official Python package manager for installing dependencies is pip.

If you're new to Python, please check out this article on how to install pip.

Activate Virtual Environment

To activate the virtual environment, run the following script on the command line. Please refer to Python Virtual Environment for how to activate venv on your machine.

For Windows PowerShell:

my_bot\project_folder_dir> venv\Scripts\activate.ps1

Now install the dependencies using the requirements.txt file.

my_bot\project_folder_dir> pip install -r requirements.txt

Selenium

For this project to work on your computer, you need to have Selenium and Python installed. I assume that if you are interested in this project, you already know the basics of Python and have Python installed.

For Windows users: open Windows Terminal and navigate to the project directory.

pip install selenium

Drivers

Selenium requires a driver to interface with the chosen browser. Firefox, for example, requires geckodriver, which needs to be installed before the bot can be run. Make sure it's in your PATH, e.g., place it in /usr/bin or /usr/local/bin.

Read more about WebDrivers in the Selenium Installation guide and the official Selenium documentation.

Downloading WebDriver

This project uses chromedriver. I understand that you may be using a different browser.

Here are download links for the most popular browsers.

Chrome

Firefox

Edge

Safari

Once you have downloaded your preferred driver, you can either save the .exe file in your project folder or save it somewhere else on your computer and provide the path. I recommend saving it in a separate folder within the project folder, or elsewhere on your computer, and using system path methods to access it.
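
One way to locate the driver with "system path methods" is sketched below. This is only an illustration under stated assumptions: the helper names `resolve_driver_path` and `make_chrome` are hypothetical and do not come from the project, and the Selenium part uses the Selenium 4 `Service` class.

```python
import shutil
from pathlib import Path


def resolve_driver_path(explicit_path=None, name="chromedriver"):
    """Return an absolute path to the driver binary, or None if it cannot be found."""
    if explicit_path:
        candidate = Path(explicit_path).expanduser()
        # Accept either the binary itself or a folder that contains it.
        if candidate.is_dir():
            found = shutil.which(name, path=str(candidate))
            return str(Path(found).resolve()) if found else None
        return str(candidate.resolve()) if candidate.is_file() else None
    found = shutil.which(name)  # searches the directories listed in PATH
    return str(Path(found).resolve()) if found else None


def make_chrome(driver_path):
    """Start Chrome with an explicit driver path (Selenium 4 style)."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=driver_path))
```

Checking that `resolve_driver_path(...)` does not return `None` before starting the browser gives a clearer message than a `FileNotFoundError` raised from inside Selenium.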

Quick Guide

client_input folder

The first thing to do is set your variables. These determine what the bot does, such as which city to enter in the booking.com search box and which date ranges to generate. Open the client_input/destination_param.csv file and fill in the following required variables.

  place - Where you want to go, preferably a city name.
  start_month - The month from which the bot starts generating checkin dates; checkout dates are derived from it.
  start_year - The start year.
  duration - How long your stay is. The bot uses the duration to generate date ranges starting from next_month and to set the checkin and checkout dates.
  adults - Number of adults.
  rooms - Number of rooms.
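
How the bot turns these values into checkin and checkout dates is not shown here. The sketch below is one plausible interpretation, assuming that the CSV header uses the variable names above, that the first stay begins on the first day of `start_month`, and that consecutive stays of `duration` nights follow back to back. The function name `generate_date_ranges`, the `count` parameter and the sample row are hypothetical.

```python
import csv
import io
from datetime import date, timedelta

# Assumed layout of client_input/destination_param.csv (illustrative values).
SAMPLE_CSV = (
    "place,start_month,start_year,duration,adults,rooms\n"
    "Lisbon,7,2024,5,2,1\n"
)


def generate_date_ranges(start_month, start_year, duration, count=3):
    """Return `count` (checkin, checkout) pairs, each `duration` nights long."""
    checkin = date(start_year, start_month, 1)
    ranges = []
    for _ in range(count):
        checkout = checkin + timedelta(days=duration)
        ranges.append((checkin.isoformat(), checkout.isoformat()))
        checkin = checkout  # the next stay begins when the previous one ends
    return ranges


row = next(csv.DictReader(io.StringIO(SAMPLE_CSV)))
ranges = generate_date_ranges(
    start_month=int(row["start_month"]),
    start_year=int(row["start_year"]),
    duration=int(row["duration"]),
    count=2,
)
# ranges == [("2024-07-01", "2024-07-06"), ("2024-07-06", "2024-07-11")]
```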

Run the bot

To run the bot, simply type

  python runbot.py

The bot will automatically open booking.com in a Chrome browser window.

booking.com_crawler's People

Contributors

gilbertekalea

booking.com_crawler's Issues

'Booking' object has no attribute 'find_element_by_css_selector'

Hi,
I am using Windows for this project.
I ran python runbot.py
but I am getting this error:
Traceback (most recent call last):
  File "d:\visual code\danang aihub\booking.com_crawler\runbot.py", line 35, in <module>
    bot.change_currency(currency="USD")
  File "d:\visual code\danang aihub\booking.com_crawler\booking\booking.py", line 79, in change_currency
    currency_element = self.find_element_by_css_selector(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Booking' object has no attribute 'find_element_by_css_selector'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\visual code\danang aihub\booking.com_crawler\runbot.py", line 13, in <module>
    with Booking() as bot:
TypeError: Booking.__exit__() takes 1 positional argument but 4 were given

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\visual code\danang aihub\booking.com_crawler\runbot.py", line 55, in <module>
    raise SyntaxError
SyntaxError: None

Am I doing something wrong? Please help.
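
Two separate problems appear in this traceback. The `find_element_by_*` helpers were removed in Selenium 4.3, so such calls need the `find_element(By.CSS_SELECTOR, ...)` form instead. The `TypeError` indicates that `__exit__` is declared without the three exception arguments that Python passes to every context manager. The sketch below illustrates the second point with a stand-in class (not the project's actual `Booking` class) so that it runs without Selenium; the Selenium change is shown only as a comment, with a placeholder selector.

```python
class Booking:
    """Stand-in for the project's class, used only to show the __exit__ signature."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Python always calls __exit__ with these three arguments
        # (all None when the block finished without an exception).
        # A real driver subclass would typically call self.quit() here.
        self.closed = True
        return False  # do not swallow exceptions raised inside the block

    # With Selenium 4.3 or later, inside a class that inherits from a webdriver:
    #   from selenium.webdriver.common.by import By
    #   element = self.find_element(By.CSS_SELECTOR, "<your selector>")
    # replaces
    #   element = self.find_element_by_css_selector("<your selector>")


with Booking() as bot:
    pass  # scraping steps would go here
```

Alternatively, installing a Selenium release older than 4.3 keeps the old helper methods available, at the cost of staying on an outdated version.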

No such file or directory: 'chromedriver'

Hi,
I am using a MacBook and downloaded chromedriver from https://chromedriver.chromium.org/downloads
Then I added the path to the driver_path variable in the booking.py file,
but I am getting this error:
  File "/Users/nguyenthiphuonghao/Desktop/mypython/lib/python3.9/site-packages/selenium/webdriver/common/service.py", line 71, in start
    self.process = subprocess.Popen(cmd, env=self.env,
  File "/opt/homebrew/Cellar/python@3.9/3.9.17_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/homebrew/Cellar/python@3.9/3.9.17_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1837, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'chromedriver'

Am I doing something wrong? Please help.
