Small group project made at BeCode. The aim was to scrape real estate data from websites and create a database of more than 10,000 houses for sale. The dataset will be used later in the training. The group's own target: 80k.
Main packages used:
- Selenium
- Pandas
- json
- Requests
- BeautifulSoup
To scrape real estate data from websites and create a dataset.
The group working on this project is composed of:
We split up the sites to scrape as follows:
Site | Scraped by |
---|---|
Immoweb | Alain |
LogicImmo | Julien |
ImmoVlan | Jeff |
We raced day and night toward the record of 50,000 (which we beat).
- We did not manage to get past the ImmoVlan captcha in time, despite trying rotating headers and Selenium.
- We had to reduce our data to the common set of fields found across the websites, hence losing information.
- We could try NLP techniques to get more information and filter typos inside the code.
- Most of the websites we used have since been updated, so the code as is no longer works.
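
Reducing per-site data to a common set of fields, as mentioned in the list above, can be sketched with pandas. This is only an illustration: the helper name `to_common_columns`, the column names and the sample rows are hypothetical and are not the project's actual code or schema.

```python
import pandas as pd


def to_common_columns(frames):
    """Keep only the columns present in every per-site DataFrame, then stack them."""
    # Intersect the column names, preserving the order of the first frame.
    common = [c for c in frames[0].columns if all(c in f.columns for f in frames[1:])]
    # Site-specific columns are dropped here, which is where information is lost.
    return pd.concat([f[common] for f in frames], ignore_index=True)


# Hypothetical samples standing in for two scraped sites.
immoweb = pd.DataFrame({"price": [250000], "rooms": [3], "garden": [True]})
logicimmo = pd.DataFrame({"price": [180000], "rooms": [2], "terrace": [False]})

merged = to_common_columns([immoweb, logicimmo])
print(merged)
```

In this sketch the `garden` and `terrace` columns disappear because each exists on only one site, which illustrates the trade-off described above.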
From Monday 3 May to 6 May 2021
Run Core.database_gen.load_database() to generate the final database (the file is database.csv under Data).
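
Once the file exists, it can be inspected with pandas. The following is a minimal sketch, assuming the file sits at `Data/database.csv` relative to the working directory; the helper `read_database` is hypothetical and is not part of the project's `Core` package.

```python
from pathlib import Path

import pandas as pd


def read_database(csv_path="Data/database.csv"):
    """Load the generated CSV into a DataFrame (the default path is an assumption)."""
    path = Path(csv_path)
    if not path.exists():
        # Fail with a clear message if the database has not been generated yet.
        raise FileNotFoundError(f"{path} not found; generate the database first")
    return pd.read_csv(path)


# Example usage:
# df = read_database()
# print(df.shape)
```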