toby-p / rightmove_webscraper.py

Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object

License: MIT License

Topics: rightmove, webscraper, pandas, pandas-dataframe, csv, python, python3, data-science, data-analysis, data-mining

rightmove_webscraper.py's Introduction

rightmove-webscraper


rightmove.co.uk is one of the UK's largest property listings websites, hosting thousands of listings of properties for sale and to rent.

rightmove_webscraper.py is a simple Python interface to scrape property listings from the website and prepare them in a pandas DataFrame for analysis.

Installation

Version 1.1 is available to install via pip:

pip install -U rightmove-webscraper

Scraping property listings

  1. Go to rightmove.co.uk and search for whatever region, postcode, city, etc. you are interested in. You can also add any additional filters, e.g. property type, price, number of bedrooms, etc.

  2. Run the search on the rightmove website and copy the URL of the first results page.

  3. Create an instance of the class with the URL as the init argument.

from rightmove_webscraper import RightmoveData

url = "https://www.rightmove.co.uk/property-for-sale/find.html?searchType=SALE&locationIdentifier=REGION%5E94346"
rm = RightmoveData(url)

What will be scraped?

When a RightmoveData instance is created it automatically scrapes every page of results available from the search URL. However, please note that rightmove restricts the total number of accessible results pages to 42. Therefore, if you perform a search which could theoretically return many thousands of results (e.g. "all rental properties in London"), in practice you are limited to scraping only the first 1050 listings (42 pages × 25 listings per page). A couple of suggested workarounds to this limitation are:

  • Reduce the search area and perform multiple scrapes, e.g. perform a search for each London borough instead of one search for all of London (see the sketch below).
  • Add a search filter to shorten the timeframe in which listings were posted, e.g. search for all listings posted in the past 24 hours, and schedule the scrape to run daily.
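
For the first workaround, here is a minimal sketch of combining several smaller scrapes into one DataFrame; the search URLs are placeholders to replace with the first-results-page URL of each of your own smaller searches:

import pandas as pd
from rightmove_webscraper import RightmoveData

# Placeholder list of search URLs, e.g. one search per London borough:
search_urls = [
    "https://www.rightmove.co.uk/property-for-sale/find.html?searchType=SALE&locationIdentifier=REGION%5E94346",
    # ...more search URLs...
]

# Scrape each search, then combine and de-duplicate on the listing URL:
frames = [RightmoveData(url).get_results for url in search_urls]
combined = pd.concat(frames, ignore_index=True).drop_duplicates(subset="url")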

Finally, note that not every piece of data listed on the rightmove website is scraped; instead it is just a subset of the most useful features, such as price, address, number of bedrooms, and listing agent. If there are additional data items you think should be scraped, please submit an issue, or better still find the XPath and submit a pull request with the changes.

Accessing data

The following instance methods and properties are available to access the scraped data.

Full results as a pandas DataFrame

rm.get_results.head()
price type address url agent_url postcode full_postcode number_bedrooms search_date
0 3400000.0 2 bedroom apartment for sale Switch House East, Battersea Power Station, SW11 http://www.rightmove.co.uk/properties/121457195#/?channel=RES_BUY http://www.rightmove.co.uk/estate-agents/agent/JLL/London-Residential-Developments-100183.html SW11 NaN 2.0 2022-03-24 09:40:13.769706
1 11080000.0 Property for sale Battersea Power Station, Circus Road East, London http://www.rightmove.co.uk/properties/118473812#/?channel=RES_BUY http://www.rightmove.co.uk/estate-agents/agent/Moveli/London-191324.html NaN NaN NaN 2022-03-24 09:40:13.769706
2 9950000.0 5 bedroom apartment for sale 888 Scott House, Battersea Power Station, SW11 http://www.rightmove.co.uk/properties/89344718#/?channel=RES_BUY http://www.rightmove.co.uk/estate-agents/agent/Prestigious-Property-Ltd/Ruislip-67965.html SW11 NaN 5.0 2022-03-24 09:40:13.769706
3 9200000.0 3 bedroom penthouse for sale Battersea Power Station, Nine Elms, London SW8 http://www.rightmove.co.uk/properties/114236963#/?channel=RES_BUY http://www.rightmove.co.uk/estate-agents/agent/Copperstones/London-82091.html SW8 NaN 3.0 2022-03-24 09:40:13.769706
4 9000000.0 6 bedroom apartment for sale Scott House, Battersea Power Station, SW11 http://www.rightmove.co.uk/properties/107110697#/?channel=RES_BUY http://www.rightmove.co.uk/estate-agents/agent/Dockleys/London-174305.html SW11 NaN 6.0 2022-03-24 09:40:13.769706
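
Because the results are an ordinary pandas DataFrame, they can be written straight to disk with the usual pandas I/O methods, e.g. (the filename is arbitrary):

rm.get_results.to_csv("rightmove_results.csv", index=False)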

Average price of all listings scraped

rm.average_price

1650065.841025641

Total number of listings scraped

rm.results_count

195

Summary statistics

By default it shows the number of listings and average price, grouped by the number of bedrooms:

rm.summary()
number_bedrooms count mean
0 0 39 9.119231e+05
1 1 46 1.012935e+06
2 2 88 1.654237e+06
3 3 15 3.870867e+06
4 4 2 2.968500e+06
5 5 1 9.950000e+06
6 6 1 9.000000e+06

Alternatively, group the results by any other column from the .get_results DataFrame, for example by postcode:

rm.summary(by="postcode")
postcode count mean
0 SW11 76 1.598841e+06
1 SW8 28 2.171357e+06

Legal

@toddy86 has pointed out that, per the terms and conditions here, the use of webscrapers is unauthorised by rightmove. So please don't use this package!

rightmove_webscraper.py's People

Contributors

cottrell, csfyrakis, dependabot[bot], toby-p


rightmove_webscraper.py's Issues

Cannot run latest version - connection

Has this script been shut down, or is it no longer working?
When running the demo script with an identical search I get the following error.

ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

Limited number of rows

Hi

When searching, the DataFrame returned by get_results only has 1050 rows.

Is there any way to increase this to get all results?

More rows returned than exist via the website.

Steps to reproduce:

  1. Use the RightMove website to create your desired URL. At the time of posting, this example returns 51 results:

https://www.rightmove.co.uk/property-for-sale/find.html?locationIdentifier=USERDEFINEDAREA%5E%7B%22polylines%22%3A%22ewwgIn%60~VwehBn%7CFiv%7D%40oyr%40zej%40popEcbbAlq%7DCi__Cuy%7CCuccCmtvKx%7DoEw%7BsQnjmMiljJlk%7DJbkeDjbzGzren%40kxyUcyzCm%60yCup~H%22%7D&minBedrooms=3&maxPrice=399000&propertyTypes=detached&secondaryDisplayPropertyType=detachedshouses&maxDaysSinceAdded=1&mustHave=garden&dontShow=newHome%2Cretirement%2CsharedOwnership&furnishTypes=&keywords=

  2. Use the above URL in your Python code, e.g.:
    results = RightmoveData(url).get_results

  3. Count the rows; in this case the result is 55 (still 51 via the website, no cache). As you can see, there are 4 "extra" rows:
    len(results.index)

I've reproduced this using multiple random URLs, even in the dead of night, and always end up with a handful of "extra" rows compared to the website. Any ideas please?

Cannot install webscraper

Hi

I'm trying to install the webscraper using pip install -U rightmove-webscraper, however I'm getting multiple command errors after "Installing build dependencies".

I'm a little new to Python so I'm trying to understand this, but it seems like a lot of the errors are due to "no module named 'numpy.distutils._msvcompiler' in numpy.distutils; trying from distutils customize MSVCCompiler", and then something about the ATLAS libraries not being found.

Any direction / guidance would be immensely appreciated.

Keywords support

I know there's no bandwidth for this project right now, so just an idea.

It'd be handy to be able to filter results by keywords, as supported by the website itself.

URL params example, using keywords "acre" and "acres": &keywords=acre%2Cacres
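
In the meantime, a possible workaround is to append the same parameter to the search URL before constructing the scraper; a minimal sketch:

from rightmove_webscraper import RightmoveData

base_url = "https://www.rightmove.co.uk/property-for-sale/find.html?searchType=SALE&locationIdentifier=REGION%5E94346"

# Append the website's own keyword filter parameter ("acre" and "acres"):
rm = RightmoveData(base_url + "&keywords=acre%2Cacres")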

Fails on import

Seems there are some syntax errors in __init__.py

Traceback (most recent call last):
  File "scrape.py", line 7, in <module>
    from rightmove_webscraper import rightmove_data
  File "/home/pi/notebooks/HouseData/HouseFinderENV/lib/python3.5/site-packages/rightmove_webscraper/__init__.py", line 65
    raise ValueError(f"Invalid rightmove search URL:\n\n\t{self.url}")

then on fixing that

Traceback (most recent call last):
  File "scrape.py", line 7, in <module>
    import rightmove_webscraper
  File "/home/pi/notebooks/HouseData/HouseFinderENV/lib/python3.5/site-packages/rightmove_webscraper/__init__.py", line 99
    assert by in self.get_results.columns, f"Column not found in `get_results`: {by}"
                                                                                    ^
SyntaxError: invalid syntax

Violation of Terms of Service

Hi,

I just wanted to make everyone aware that the use of a web scraper on the rightmove.co.uk website is now against their terms of service.

It previously was allowed and I too had assumed it was still allowed. However, if you read the later comments on the Stack Overflow thread below (don't just read the accepted answer) and read their ToS, they now explicitly ban the use of any and all web scrapers.

https://stackoverflow.com/questions/36662524/rightmove-api-and-scraping-technical-and-legal
https://www.rightmove.co.uk/this-site/terms-of-use.html

Just a heads up to anyone who might be wanting to use this library.

Todd

sale_object.rent_or_sale

Hi,

I've been trying to get the code up and running, but I can't work out how to resolve this error

sale_object.rent_or_sale
AttributeError: 'rightmove_data' object has no attribute 'rent_or_sale'

I'm probably being stupid, but if you have any advice on it I'd appreciate it.

Cheers

floorplan urls no longer working

Looks like the xpath changed:

xp_floorplan_url = """//*[@id="floorplanTabs"]/div[2]/div[2]/img/@src"""

id floorplanTabs is no longer available on the main page. On inspecting the elements there appears to be an attempt at obfuscation.

This should be able to be fixed by replacing with:
xp_floorplan_url = """//*[contains(@alt, 'Floorplan')]/@src"""
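
For reference, a minimal sketch of the proposed replacement XPath in use, assuming requests and lxml as used elsewhere in the package (the property URL is just an example taken from a search result):

import requests
from lxml import html

# Fetch an individual property page and extract the floorplan image URL by
# matching on the img alt text, which survives the id obfuscation:
r = requests.get("http://www.rightmove.co.uk/properties/121457195")
tree = html.fromstring(r.content)
xp_floorplan_url = """//*[contains(@alt, 'Floorplan')]/@src"""
floorplan_urls = tree.xpath(xp_floorplan_url)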

Commercial property

Hi there,
This is a great package!
I have tried property type as commercial, which doesn't work.
Any pointers?

Additional columns for data. I can see only 8

Hi, first of all, many thanks for this, loving the tool.
I would find it very useful to have access to the columns sold price and year sold for properties for sale.

Is there any way I can do this?
Also, the address doesn't return the house number or the full postcode; is there a way around this?

Many thanks

Not a full list

This is a nice program and I was testing it out. I noticed that only the first 29 rows and then the last 29 rows are returned when doing a .get_results.
Is this an issue, or is it something done by rightmove?

Error - "local variable 'xp_prices' referenced before assignment"

Hello,

The below piece of code produces the following error:

 93         # Create data lists from xpaths:
 94         price_pcm = tree.xpath(xp_prices)
 95         titles = tree.xpath(xp_titles)
 96         addresses = tree.xpath(xp_addresses)

UnboundLocalError: local variable 'xp_prices' referenced before assignment

import pandas as pd
import rightmove_webscraper
import numpy as np

url = "https://www.rightmove.co.uk/new-homes-for-sale/find.html?searchType=SALE&locationIdentifier=REGION%5E87490&insId=1&radius=0.0&minPrice=&maxPrice=&minBedrooms=&maxBedrooms=&displayPropertyType=&maxDaysSinceAdded=&_includeSSTC=on&sortByPriceDescending=&primaryDisplayPropertyType=&secondaryDisplayPropertyType=&oldDisplayPropertyType=&oldPrimaryDisplayPropertyType=&newHome=true&auction=false"

rightmove_data = rightmove_webscraper.rightmove_data(url)

Any clues on what is wrong with this?

Best,
Janush

Floating point page count

In Python 3, result_pages_count returns a floating point number, and that makes get_results explode at line 103. In Python 2 it's fine because both of the division operands are integers.

I suggest the following diff, using explicit floor division. This fixes Python 3 and also works in Python 2.

diff --git a/rightmove_webscraper.py b/rightmove_webscraper.py
index aabb56e..837b582 100644
--- a/rightmove_webscraper.py
+++ b/rightmove_webscraper.py
@@ -37,7 +37,7 @@ class rightmove_data(object):
         There are 24 results on each results page, but note that the
         rightmove website limits results pages to a maximum of 42 pages."""
 
-        page_count = self.results_count() / 24
+        page_count = self.results_count() // 24
         if self.results_count() % 24 > 0:
             page_count += 1
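
For reference, the same calculation can be collapsed into one expression in Python 3, including the 42-page cap described in the docstring (a sketch, not part of the proposed diff):

import math

results_count = 195  # e.g. the value of self.results_count()

# Ceiling division on the 24-results-per-page count, capped at rightmove's
# 42-page maximum:
page_count = min(math.ceil(results_count / 24), 42)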
 

How can I print the results from get_results?

When I do print(rm.get_results()) I get something like this:

     price  ...                search_date
0   230000  ... 2020-10-02 11:46:06.330777
1   715000  ... 2020-10-02 11:46:06.330777

And if I write it to a file:

f = open("output.html", "a")
f.write(str(rm.get_results))
f.close()

I still get the same thing. How can I see all of the columns?
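
Not part of the package itself, but standard pandas display options should help here; a sketch, assuming rm is a RightmoveData instance as in the README:

import pandas as pd

# Show every column (and don't truncate the output width) when printing:
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)
print(rm.get_results)

# Alternatively, write the full table to disk instead of str()-ing it:
rm.get_results.to_csv("output.csv", index=False)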

Memory leak

Running the below on a series of URLs, each containing approx 1000 search results, causes increasing memory usage until it crashes or freezes. It seems very strange for it to happen in Python.

import pandas as pd
from rightmove_webscraper import rightmove_data

def scrape(searches):
    """Takes a dict `searches` of format {'search_name': 'url'} of rightmove
    searches and combines them into one pandas DataFrame."""
    df = pd.DataFrame(columns=['price', 'type', 'address', 'url', 'agent_url',
                               'postcode', 'number_bedrooms', 'search_date', 'search'])

    for search in searches:
        rightmove_object = rightmove_data(searches[search])
        result_df = rightmove_object.get_results
        df = df.append(result_df, ignore_index=True)
        df = df.drop_duplicates().reset_index(drop=True)

    return df

I imagine it would take a lot of work to debug, but putting it here to help any future users.

Full house description

Hi!
With the scraper, would it be possible to add a field which pulls the entire description of the property on the rightmove website? I.e. what the agent has written about the house?

Thanks!

ImportError: No module named rightmove_webscraper

Hi there,

I've installed the web scraper using pip and created a file (rightm.py) with the following code to test the scraper:

from rightmove_webscraper import rightmove_data 
url ="https://www.rightmove.co.uk/property-for-sale/find.html?locationIdentifier=OUTCODE%5E2378&insId=2&numberOfPropertiesPerPage=24&areaSizeUnit=sqft&googleAnalyticsChannel=buying"
rightmove_object = rightmove_data(url)
print rightmove_object.get_results

But I keep getting this error:

Traceback (most recent call last):
  File "rightm.py", line 1, in <module>
    from rightmove_webscraper import rightmove_data 
ImportError: No module named rightmove_webscraper

Any ideas why this may be happening? Thanks
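
Not a definitive diagnosis, but print rightmove_object.get_results is Python 2 syntax, so the script may be running under Python 2 while pip installed the package for a different interpreter. A quick check, as a sketch:

import sys

# If this prints a 2.x version, the interpreter running the script is not
# necessarily the one pip installed rightmove-webscraper into:
print(sys.version)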

Don't seem to be getting all the search results

Any way to tag auction properties?

I am doing some research on different cities' property prices.
But the guide price for auction properties is making the statistics unreasonable.
Is there any way to tag auction properties?

Reduced Prices

Is there a way to capture the 'reduced yesterday' attribute in the search?

I want to track, for a given search, the number of reductions over time. Thanks.

2 Questions from Python / Panda Noob

Hi,

Apologies if the formatting is wrong - I'm new to GitHub but have tried to follow the guidelines

2 potentially stupid questions from a Python noob who's also trying to get to grips with pandas.

Question 1: Getting the floorplan from the tree uses the following:

xp_floorplan_url = """//*[@id="floorplanTabs"]/div[2]/div[2]/img/@src"""
floorplan_url = tree.xpath(xp_floorplan_url)

However, in the page source for a sample search (4 results to keep it small), and then in the individual property page for one of the results, there is no "floorplanTabs".

When I inspect the page in Chrome I can't find "floorplanTabs" either.

Can you explain how this works?

Question 2: What does /div[2]/div[2] mean in the line below?

xp_floorplan_url = """//*[@id="floorplanTabs"]/div[2]/div[2]/img/@src"""

Many thanks for your help.
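
For what it's worth, regarding Question 2: in XPath, div[2] selects the second div child of the current node, so /div[2]/div[2] walks two levels down the element tree. A tiny self-contained illustration with lxml (the HTML snippet is made up):

from lxml import html

snippet = ("<div id='floorplanTabs'><div>a</div><div><div>b</div>"
           "<div><img src='plan.png' alt='Floorplan'/></div></div></div>")
tree = html.fromstring(snippet)

# From the element with id "floorplanTabs": take its 2nd div child, then
# that element's 2nd div child, then the src attribute of the img inside:
print(tree.xpath("""//*[@id="floorplanTabs"]/div[2]/div[2]/img/@src"""))  # ['plan.png']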

Scraper only returns 1050 results

Hi Toby, great scraper you've got here, but it seems to be only returning 1050 results; I may just be using it wrong though...

I have loaded the script in PyCharm with the correct version of Python and have downloaded the dependencies.

I have entered this into the Python console:
from rightmove_webscraper import RightmoveData
url = "https://www.rightmove.co.uk/property-for-sale/find.html?locationIdentifier=REGION%5E87490&propertyTypes=&includeSSTC=false&mustHave=&dontShow=&furnishTypes=&keywords="
rm = RightmoveData(url)
rm.results_count


Even if I change the URL, the results still appear to be capped at 1050. Maybe my approach is wrong?

Changes to add functionality and hold off errors

#!/usr/bin/env python3

# Dependencies
from lxml import html, etree
import requests
import numpy as np
import pandas as pd
import datetime as dt

class _GetDataFromURL(object):
    """This "private" class does all the heavy lifting of fetching data from the
    URL provided, and then returns data to the main `rightmove_data` class
    instance. The reason for this is so that all the validation and web-scraping
    is done when an instance is created, and afterwards the data is accessible
    quickly via methods on the `rightmove_data` instance."""

    def __init__(self, url):
        """Initialize an instance of the scraper by passing a URL from the
        results of a property search on www.rightmove.co.uk."""
        self.url = url
        self.first_page = self.make_request(self.url)
        self.validate_url()
        self.get_results = self.__get_results

    def validate_url(self):
        """Basic validation that the URL at least starts in the right format and
        returns status code 200."""
        real_url = "{}://www.rightmove.co.uk/{}/find.html?"
        protocols = ["http", "https"]
        types = ["property-to-rent", "property-for-sale", "new-homes-for-sale"]
        left_urls = [real_url.format(p, t) for p in protocols for t in types]
        conditions = [self.url.startswith(u) for u in left_urls]
        conditions.append(self.first_page[1] == 200)
        if not any(conditions):
            raise ValueError("Invalid rightmove URL:\n\n\t{}".format(self.url))

    @property
    def rent_or_sale(self):
        """Tag to determine if the search is for properties for rent or sale.
        Required because the XPaths are different for the target elements."""
        if "/property-for-sale/" in self.url \
        or "/new-homes-for-sale/" in self.url:
            return "sale"
        elif "/property-to-rent/" in self.url:
            return "rent"
        else:
            raise ValueError("Invalid rightmove URL:\n\n\t{}".format(self.url))

    @property
    def results_count(self):
        """Returns an integer of the total number of listings as displayed on
        the first page of results. Note that not all listings are available to
        scrape because rightmove limits the number of accessible pages."""
        tree = html.fromstring(self.first_page[0])
        xpath = """//span[@class="searchHeader-resultCount"]/text()"""
        try:
            return int(tree.xpath(xpath)[0].replace(",", ""))
        except:
            print('error extracting the result count header')
            return 1050

    @property
    def page_count(self):
        """Returns the number of result pages returned by the search URL. There
        are 24 results per page. Note that the website limits results to a
        maximum of 42 accessible pages."""
        page_count = self.results_count // 24
        if self.results_count % 24 > 0: page_count += 1
        # Rightmove will return a maximum of 42 results pages, hence:
        if page_count > 42: page_count = 42
        return page_count

    @staticmethod
    def make_request(url):
        r = requests.get(url)
        # Minimise the amount returned to reduce overheads:
        return r.content, r.status_code

    def get_page(self, request_content):
        """Method to scrape data from a single page of search results. Used
        iteratively by the `get_results` method to scrape data from every page
        returned by the search."""
        # Process the html:
        tree = html.fromstring(request_content)

        # Set xpath for price:
        if self.rent_or_sale == "rent":
            xp_prices = """//span[@class="propertyCard-priceValue"]/text()"""
        elif self.rent_or_sale == "sale":
            xp_prices = """//div[@class="propertyCard-priceValue"]/text()"""

        # Set xpaths for listing title, property address, URL, and agent URL:
        xp_titles = """//div[@class="propertyCard-details"]\
        //a[@class="propertyCard-link"]\
        //h2[@class="propertyCard-title"]/text()"""
        xp_addresses = """//address[@class="propertyCard-address"]//span/text()"""
        xp_weblinks = """//div[@class="propertyCard-details"]\
        //a[@class="propertyCard-link"]/@href"""
        xp_agent_urls = """//div[@class="propertyCard-contactsItem"]\
        //div[@class="propertyCard-branchLogo"]\
        //a[@class="propertyCard-branchLogo-link"]/@href"""
        

        # Create data lists from xpaths:
        price_pcm = tree.xpath(xp_prices)
        titles = tree.xpath(xp_titles)
        addresses = tree.xpath(xp_addresses)
        base = "http://www.rightmove.co.uk"
        weblinks = ["{}{}".format(base, tree.xpath(xp_weblinks)[w]) \
                    for w in range(len(tree.xpath(xp_weblinks)))]
        agent_urls = ["{}{}".format(base, tree.xpath(xp_agent_urls)[a]) \
                      for a in range(len(tree.xpath(xp_agent_urls)))]
        
        #get floorplan from property urls
        floorplan_urls = []
        for weblink in weblinks:
            rc = self.make_request(weblink)
            tree = html.fromstring(rc[0])
        
            xp_floorplan_url = """//*[@id="floorplanTabs"]/div[2]/div[2]/img/@src"""
            floorplan_url = tree.xpath(xp_floorplan_url)
            if floorplan_url == []:
                floorplan_urls.append(np.nan)
            else:
                floorplan_urls.append(floorplan_url[0])
       
        # Store the data in a Pandas DataFrame:
        data = [price_pcm, titles, addresses, weblinks, agent_urls, floorplan_urls]
        temp_df = pd.DataFrame(data)
        temp_df = temp_df.transpose()
        temp_df.columns = ["price", "type", "address", "url", "agent_url", "floorplan_url"]
        
        # Drop empty rows which come from placeholders in the html:
        temp_df = temp_df[temp_df["address"].notnull()]
        return temp_df

    @property
    def __get_results(self):
        """Pandas DataFrame with all results returned by the search."""
        # Create DataFrame of the first page (which has already been requested):
        results = self.get_page(self.first_page[0])

        # Iterate through the rest of the pages scraping results:
        if self.page_count > 1:
            for p in range(1, self.page_count):

                # Create the URL of the specific results page:
                p_url = "{}&index={}".format(str(self.url), str((p * 24)))

                # Make the request:
                rc = self.make_request(p_url)

                # Requests to scrape lots of pages eventually get status 400, so:
                if rc[1] != 200: break

                # Create a temporary dataframe of page results:
                temp_df = self.get_page(rc[0])

                # Concatenate the temporary dataframe with the full dataframe:
                frames = [results, temp_df]
                results = pd.concat(frames)

        # Reset the index:
        results.reset_index(inplace=True, drop=True)

        # Convert price column to numeric type:
        results["price"].replace(regex=True, inplace=True, to_replace=r"\D", value=r"")
        results["price"] = pd.to_numeric(results["price"])

        # Extract postcodes to a separate column:
        pat = r"\b([A-Za-z][A-Za-z]?[0-9][0-9]?[A-Za-z]?)\b"
        results["postcode"] = results["address"].astype(str).str.extract(pat, expand=True)

        # Extract number of bedrooms from "type" to a separate column:
        pat = r"\b([\d][\d]?)\b"
        results["number_bedrooms"] = results.type.astype(str).str.extract(pat, expand=True)
        results.loc[results["type"].astype(str).str.contains("studio", case=False), "number_bedrooms"] = 0

        # Clean up annoying white spaces and newlines in "type" column:
        for row in range(len(results)):
            type_str = results.loc[row, "type"]
            clean_str = type_str.strip("\n").strip()
            results.loc[row, "type"] = clean_str

        # Add column with datetime when the search was run (i.e. now):
        now = dt.datetime.today()
        results["search_date"] = now

        return results

class rightmove_data(object):
    """The `rightmove_data` web scraper collects structured data on properties
    returned by a search performed on www.rightmove.co.uk

    An instance of the class created with a rightmove URL provides attributes to
    easily access data from the search results, the most useful being
    `get_results`, which returns all results as a Pandas DataFrame object.
    """
    def __init__(self, url):
        """Initialize the scraper with a URL from the results of a property
        search performed on www.rightmove.co.uk"""
        self.__request_object = _GetDataFromURL(url)
        self.__url = url

    @property
    def url(self):
        return self.__url

    @property
    def get_results(self):
        """Pandas DataFrame of all results returned by the search."""
        return self.__request_object.get_results

    @property
    def results_count(self):
        """Total number of results returned by `get_results`. Note that the
        rightmove website may state a much higher number of results; this is
        because they artificially restrict the number of results pages that can
        be accessed to 42."""
        return len(self.get_results)

    @property
    def average_price(self):
        """Average price of all results returned by `get_results` (ignoring
        results which don't list a price)."""
        total = self.get_results["price"].dropna().sum()
        return int(total / self.results_count)

    def summary(self, by="number_bedrooms"):
        """Pandas DataFrame summarising the results by mean price and count.
        By default grouped by the `number_bedrooms` column but will accept any
        column name from `get_results` as a grouper."""
        df = self.get_results.dropna(axis=0, subset=["price"])
        groupers = {"price":["count", "mean"]}
        df = df.groupby(df[by]).agg(groupers).astype(int)
        df.columns = df.columns.get_level_values(1)
        df.reset_index(inplace=True)
        if "number_bedrooms" in df.columns:
            df["number_bedrooms"] = df["number_bedrooms"].astype(int)
            df.sort_values(by=["number_bedrooms"], inplace=True)
        else:
            df.sort_values(by=["count"], inplace=True, ascending=False)
        return df.reset_index(drop=True)

I have made some changes on a fork. Unfortunately, as I did a lot of renaming, it wouldn't be possible for me to put them in a pull request.

Mostly I butchered in functionality to scrape floor plan URLs and changed some of the pandas processing to cast .astype(str) to prevent some errors I was getting. I wanted to share these changes with you in case they were of use.
