
webscrapping_project

Web Scraping

Aim of the Project

  • To extract a list of cottages in France from a popular holiday website using web scraping, and save it in CSV format.
  • The information we extract is as follows:
    • Name of the cottage
    • Price
    • Number of bedrooms
    • Sleeps (how many guests the cottage accommodates)
    • Rating of the cottage
  • Website: https://www.holidayfrancedirect.co.uk/cottages-holidays/index.htm
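The five fields above map naturally onto one record per cottage. As a hypothetical sketch (the Cottage class, the to_number helper and the sample strings are illustrative and not part of the original project), the scraped text can be turned into numbers before saving, so the CSV can later be sorted or filtered by price, bedrooms or rating.

```python
import re
from dataclasses import dataclass, asdict
from typing import Optional


def to_number(text: Optional[str]) -> Optional[float]:
    """Return the first number in scraped text, e.g. '£1,250 per week' -> 1250.0."""
    if not text:
        return None
    # Digits with optional thousands commas and an optional decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


@dataclass
class Cottage:
    name: str
    price: Optional[float]     # currency symbol and thousands separators removed
    bedrooms: Optional[float]
    sleeps: Optional[float]    # how many guests the cottage accommodates
    rating: Optional[float]


# Hypothetical raw strings, as they might be scraped from one listing.
raw = {"name": "Le Petit Moulin", "price": "£1,250 per week",
       "bedrooms": "3 bedrooms", "sleeps": "Sleeps 6", "rating": "4.5 / 5"}

cottage = Cottage(
    name=raw["name"],
    price=to_number(raw["price"]),
    bedrooms=to_number(raw["bedrooms"]),
    sleeps=to_number(raw["sleeps"]),
    rating=to_number(raw["rating"]),
)
print(asdict(cottage))
```

Numbers are kept as floats for simplicity; missing or non-numeric values become None rather than raising an error, so one incomplete listing does not stop the whole scrape.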

What is Web Scraping?

Web scraping is a technique for extracting data from web pages automatically by sending requests to a website. The response is unstructured HTML, and we can use the BeautifulSoup library to parse it and pull out the parts we need. Another way to get data is through an API, when the website provides one. Common use cases of web scraping include price monitoring, news gathering and the data-collection phase of machine learning projects.
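As a small illustration of that cleaning step, the sketch below parses a hard-coded HTML fragment with BeautifulSoup (installed as the bs4 package) and Python's built-in html.parser. The markup and its class names are hypothetical, made up for the example rather than copied from the target website.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment of raw HTML, as a requests.get() call might return it.
raw_html = """
<div class="cottage">
  <h2>  Le Petit <b>Moulin</b>  </h2>
  <span class="price">£850</span>
</div>
"""

soup = BeautifulSoup(raw_html, "html.parser")

# get_text() drops the tags; strip=True trims whitespace around each text piece,
# and the " " separator joins the remaining pieces with single spaces.
name = soup.find("h2").get_text(" ", strip=True)
price = soup.find("span", class_="price").get_text(strip=True)

print(name)
print(price)
```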

Workflow

  • Send a request to the website with the get() function from the requests library.
  • Parse the returned HTML content with an HTML parser (BeautifulSoup with Python's built-in html.parser).
  • Extract the name, price, number of bedrooms, sleeps and rating of every cottage on a single page.
  • Paginate through all of the listing pages and extract the same information from each one.
  • Save the collected data in CSV format using the pandas library.
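The workflow above can be sketched as follows. This is a minimal sketch, not the project's actual code: the CSS class names (cottage, name, price, bedrooms, sleeps, rating), the ?page= query parameter used for pagination and the one-second delay are all assumptions, so inspect the real pages in your browser's developer tools and adjust them before running it.

```python
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.holidayfrancedirect.co.uk/cottages-holidays/index.htm"

# Hypothetical CSS class names for each field; the real site's markup may differ.
FIELDS = ["name", "price", "bedrooms", "sleeps", "rating"]


def parse_cottages(html):
    """Return one dict per cottage found in a single listing page."""
    soup = BeautifulSoup(html, "html.parser")
    cottages = []
    for card in soup.find_all("div", class_="cottage"):  # hypothetical container class
        record = {}
        for field in FIELDS:
            tag = card.find(class_=field)
            # A missing field becomes None instead of raising AttributeError.
            record[field] = tag.get_text(" ", strip=True) if tag else None
        cottages.append(record)
    return cottages


def scrape_all_pages(max_pages=50, delay=1.0):
    """Fetch listing pages one by one until a page yields no cottages."""
    all_cottages = []
    for page in range(1, max_pages + 1):
        # Hypothetical pagination scheme: ?page=1, ?page=2, ...
        response = requests.get(BASE_URL, params={"page": page}, timeout=10)
        response.raise_for_status()
        cottages = parse_cottages(response.text)
        if not cottages:
            break  # an empty page means we have gone past the last page
        all_cottages.extend(cottages)
        time.sleep(delay)  # pause between requests so we do not overload the site
    return all_cottages


def save_to_csv(cottages, path="cottages.csv"):
    """Write the records to CSV with pandas and return the DataFrame."""
    df = pd.DataFrame(cottages, columns=FIELDS)
    df.to_csv(path, index=False)
    return df


# Usage (performs real HTTP requests):
# df = save_to_csv(scrape_all_pages())
```

The loop stops at the first page with no cottages or after max_pages, so a wrong guess about pagination cannot run forever. If the CSV contains the same rows repeated, the site is probably ignoring the page parameter and the pagination scheme needs adjusting.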

Use Cases

* Give holidaymakers quick information about the various cottages available.
* Provide information to new business owners and competitors looking to enter the tourism sector.

Limitations of Web Scraping

* Before scraping a website, we should check whether it allows scraping by reading its /robots.txt file.
For example, to see what Google allows, we would open www.google.com/robots.txt.
* If you send too many requests in bulk, the website may slow down, and its administrator might
block your IP address from accessing the site.
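Python's standard library can perform the robots.txt check with urllib.robotparser. The sketch below parses a made-up robots.txt (the rules, the example.com URLs and the user agent name are all hypothetical) instead of downloading a real one, so it runs offline; for a live site you would call set_url() with the site's robots.txt address and then read(). crawl_delay() returns the Crawl-delay the file asks for, if any, which is a sensible minimum wait between requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at https://<site>/robots.txt.
robots_txt = """
User-agent: *
Disallow: /admin/
Crawl-delay: 5
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_txt)

agent = "my-cottage-scraper"  # hypothetical user agent name

allowed = parser.can_fetch(agent, "https://example.com/cottages/index.htm")
blocked = parser.can_fetch(agent, "https://example.com/admin/login")
delay = parser.crawl_delay(agent)  # seconds to wait between requests, or None

print(allowed, blocked, delay)
```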

Libraries Used

  • requests
  • BeautifulSoup (bs4)
  • pandas


Contributors

  • mhtag
