toby-p / rightmove_webscraper.py

Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object

License: MIT License

Topics: rightmove, webscraper, pandas, pandas-dataframe, csv, python, python3, data-science, data-analysis, data-mining

rightmove_webscraper.py's Introduction

rightmove-webscraper


rightmove.co.uk is one of the UK's largest property listings websites, hosting thousands of listings of properties for sale and to rent.

rightmove_webscraper.py is a simple Python interface to scrape property listings from the website and prepare them in a pandas DataFrame for analysis.

Installation

Version 1.1 is available to install via pip:

pip install -U rightmove-webscraper

Scraping property listings

  1. Go to rightmove.co.uk and search for whatever region, postcode, city, etc. you are interested in. You can also add any additional filters, e.g. property type, price, number of bedrooms, etc.

  2. Run the search on the rightmove website and copy the URL of the first results page.

  3. Create an instance of the class with the URL as the init argument.

from rightmove_webscraper import RightmoveData

url = "https://www.rightmove.co.uk/property-for-sale/find.html?searchType=SALE&locationIdentifier=REGION%5E94346"
rm = RightmoveData(url)

What will be scraped?

When a RightmoveData instance is created it automatically scrapes every page of results available from the search URL. However, please note that rightmove restricts the total number of accessible results pages to 42. Therefore, if you perform a search which could theoretically return many thousands of results (e.g. "all rental properties in London"), in practice you are limited to scraping only the first 1050 listings (42 pages × 25 listings per page). A couple of suggested workarounds to this limitation are:

  • Reduce the search area and perform multiple scrapes, e.g. perform a search for each London borough instead of one search for all of London (see the sketch below).
  • Add a search filter to shorten the timeframe in which listings were posted, e.g. search for all listings posted in the past 24 hours, and schedule the scrape to run daily.
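
For the first workaround, here is a minimal sketch of combining several smaller scrapes into one DataFrame; the search URLs are placeholders to replace with the first-results-page URL of each of your own smaller searches:

import pandas as pd
from rightmove_webscraper import RightmoveData

# Placeholder list of search URLs, e.g. one search per London borough:
search_urls = [
    "https://www.rightmove.co.uk/property-for-sale/find.html?searchType=SALE&locationIdentifier=REGION%5E94346",
    # ...more search URLs...
]

# Scrape each search, then combine and de-duplicate on the listing URL:
frames = [RightmoveData(url).get_results for url in search_urls]
combined = pd.concat(frames, ignore_index=True).drop_duplicates(subset="url")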

Finally, note that not every piece of data listed on the rightmove website is scraped; instead it is just a subset of the most useful features, such as price, address, number of bedrooms, and listing agent. If there are additional data items you think should be scraped, please submit an issue, or better still find the XPath and submit a pull request with the changes.

Accessing data

The following instance methods and properties are available to access the scraped data.

Full results as a pandas DataFrame

rm.get_results.head()
price type address url agent_url postcode full_postcode number_bedrooms search_date
0 3400000.0 2 bedroom apartment for sale Switch House East, Battersea Power Station, SW11 http://www.rightmove.co.uk/properties/121457195#/?channel=RES_BUY http://www.rightmove.co.uk/estate-agents/agent/JLL/London-Residential-Developments-100183.html SW11 NaN 2.0 2022-03-24 09:40:13.769706
1 11080000.0 Property for sale Battersea Power Station, Circus Road East, London http://www.rightmove.co.uk/properties/118473812#/?channel=RES_BUY http://www.rightmove.co.uk/estate-agents/agent/Moveli/London-191324.html NaN NaN NaN 2022-03-24 09:40:13.769706
2 9950000.0 5 bedroom apartment for sale 888 Scott House, Battersea Power Station, SW11 http://www.rightmove.co.uk/properties/89344718#/?channel=RES_BUY http://www.rightmove.co.uk/estate-agents/agent/Prestigious-Property-Ltd/Ruislip-67965.html SW11 NaN 5.0 2022-03-24 09:40:13.769706
3 9200000.0 3 bedroom penthouse for sale Battersea Power Station, Nine Elms, London SW8 http://www.rightmove.co.uk/properties/114236963#/?channel=RES_BUY http://www.rightmove.co.uk/estate-agents/agent/Copperstones/London-82091.html SW8 NaN 3.0 2022-03-24 09:40:13.769706
4 9000000.0 6 bedroom apartment for sale Scott House, Battersea Power Station, SW11 http://www.rightmove.co.uk/properties/107110697#/?channel=RES_BUY http://www.rightmove.co.uk/estate-agents/agent/Dockleys/London-174305.html SW11 NaN 6.0 2022-03-24 09:40:13.769706
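
Because the results are an ordinary pandas DataFrame, they can be written straight to disk with the usual pandas I/O methods, e.g. (the filename is arbitrary):

rm.get_results.to_csv("rightmove_results.csv", index=False)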

Average price of all listings scraped

rm.average_price

1650065.841025641

Total number of listings scraped

rm.results_count

195

Summary statistics

By default it shows the number of listings and average price, grouped by the number of bedrooms:

rm.summary()
number_bedrooms count mean
0 0 39 9.119231e+05
1 1 46 1.012935e+06
2 2 88 1.654237e+06
3 3 15 3.870867e+06
4 4 2 2.968500e+06
5 5 1 9.950000e+06
6 6 1 9.000000e+06

Alternatively, group the results by any other column from the .get_results DataFrame, for example by postcode:

rm.summary(by="postcode")
postcode count mean
0 SW11 76 1.598841e+06
1 SW8 28 2.171357e+06

Legal

@toddy86 has pointed out that, per the terms and conditions here, the use of webscrapers is unauthorised by rightmove. So please don't use this package!

rightmove_webscraper.py's People

Contributors

cottrell, csfyrakis, dependabot[bot], toby-p


rightmove_webscraper.py's Issues

Cannot run latest version - connection

Has this script been shut down, or is it no longer working?
When running the demo script with an identical search I get the following error.

ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

Limited number of rows

Hi

When searching, the DataFrame returned by get_results only has 1050 rows.

Is there any way to increase this to get all results?

More rows returned than exist via the website.

Steps to reproduce:

  1. Use the RightMove website to create your desired URL. At the time of posting, this example returns 51 results:

https://www.rightmove.co.uk/property-for-sale/find.html?locationIdentifier=USERDEFINEDAREA%5E%7B%22polylines%22%3A%22ewwgIn%60~VwehBn%7CFiv%7D%40oyr%40zej%40popEcbbAlq%7DCi__Cuy%7CCuccCmtvKx%7DoEw%7BsQnjmMiljJlk%7DJbkeDjbzGzren%40kxyUcyzCm%60yCup~H%22%7D&minBedrooms=3&maxPrice=399000&propertyTypes=detached&secondaryDisplayPropertyType=detachedshouses&maxDaysSinceAdded=1&mustHave=garden&dontShow=newHome%2Cretirement%2CsharedOwnership&furnishTypes=&keywords=

  2. Use the above URL in your Python code, e.g.:
    results = RightmoveData(url).get_results

  3. Count the rows; in this case the result is 55 (still 51 via the website, no cache). As you can see, there are 4 "extra" rows:
    len(results.index)

I've reproduced this using multiple random URLs, even in the dead of night, and always end up with a handful of "extra" rows compared to the website. Any ideas please?

Cannot install webscraper

Hi

I'm trying to install the webscraper using pip install -U rightmove-webscraper, however I'm getting multiple command errors after "Installing build dependencies".

I'm a little new to Python so I'm trying to understand this, but it seems like a lot of the errors are due to "no module named 'numpy.distutils._msvcompiler' in numpy.distutils; trying from distutils customize MSVCCompiler", and then something about the ATLAS libraries not being found.

Any direction / guidance would be immensely appreciated.

Keywords support

I know there's no bandwidth for this project right now, so just an idea.

It'd be handy to be able to filter results by keywords, as supported by the website itself.

URL params example, using keywords "acre" and "acres": &keywords=acre%2Cacres
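
In the meantime, a possible workaround is to append the same parameter to the search URL before constructing the scraper; a minimal sketch:

from rightmove_webscraper import RightmoveData

base_url = "https://www.rightmove.co.uk/property-for-sale/find.html?searchType=SALE&locationIdentifier=REGION%5E94346"

# Append the website's own keyword filter parameter ("acre" and "acres"):
rm = RightmoveData(base_url + "&keywords=acre%2Cacres")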

Fails on import

Seems there are some syntax errors in __init__.py

Traceback (most recent call last):
  File "scrape.py", line 7, in <module>
    from rightmove_webscraper import rightmove_data
  File "/home/pi/notebooks/HouseData/HouseFinderENV/lib/python3.5/site-packages/rightmove_webscraper/__init__.py", line 65
    raise ValueError(f"Invalid rightmove search URL:\n\n\t{self.url}")

then on fixing that

Traceback (most recent call last):
  File "scrape.py", line 7, in <module>
    import rightmove_webscraper
  File "/home/pi/notebooks/HouseData/HouseFinderENV/lib/python3.5/site-packages/rightmove_webscraper/__init__.py", line 99
    assert by in self.get_results.columns, f"Column not found in `get_results`: {by}"
                                                                                    ^
SyntaxError: invalid syntax

Violation of Terms of Service

Hi,

I just wanted to make everyone aware that the use of a web scraper on the rightmove.co.uk website is now against their terms of service.

It previously was allowed and I too had assumed it was still allowed. However, if you read the later comments on the Stack Overflow thread below (don't just read the accepted answer) and read their ToS, they now explicitly ban the use of any and all web scrapers.

https://stackoverflow.com/questions/36662524/rightmove-api-and-scraping-technical-and-legal
https://www.rightmove.co.uk/this-site/terms-of-use.html

Just a heads up to anyone who might be wanting to use this library.

Todd

sale_object.rent_or_sale

Hi,

I've been trying to get the code up and running, but I can't work out how to resolve this error

sale_object.rent_or_sale
AttributeError: 'rightmove_data' object has no attribute 'rent_or_sale'

I'm probably being stupid, but if you have any advice on it I'd appreciate it.

Cheers

floorplan urls no longer working

Looks like the xpath changed:

xp_floorplan_url = """//*[@id="floorplanTabs"]/div[2]/div[2]/img/@src"""

id floorplanTabs is no longer available on the main page. On inspecting the elements there appears to be an attempt at obfuscation.

This should be able to be fixed by replacing with:
xp_floorplan_url = """//*[contains(@alt, 'Floorplan')]/@src"""
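
For reference, a minimal sketch of the proposed replacement XPath in use, assuming requests and lxml as used elsewhere in the package (the property URL is just an example taken from a search result):

import requests
from lxml import html

# Fetch an individual property page and extract the floorplan image URL by
# matching on the img alt text, which survives the id obfuscation:
r = requests.get("http://www.rightmove.co.uk/properties/121457195")
tree = html.fromstring(r.content)
xp_floorplan_url = """//*[contains(@alt, 'Floorplan')]/@src"""
floorplan_urls = tree.xpath(xp_floorplan_url)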

Commercial property

Hi there,
This is a great package!
I have tried property type as commercial, which doesn't work.
Any pointers?

Additional columns for data. I can see only 8

Hi, first of all, many thanks for this, loving the tool.
I would find it very useful to have access to the columns sold price and year sold for properties for sale.

Is there any way I can do this?
Also, the address doesn't return the house number or the full postcode; is there a way around this?

Many thanks

Not a full list

This is a nice program and I was testing it out. I noticed that only the first 29 rows and then the last 29 rows are returned when doing a .get_results.
Is this an issue, or is it something done by rightmove?

Error - "local variable 'xp_prices' referenced before assignment"

Hello,

The below piece of code produces the following error:

 93         # Create data lists from xpaths:
 94         price_pcm = tree.xpath(xp_prices)
 95         titles = tree.xpath(xp_titles)
 96         addresses = tree.xpath(xp_addresses)

UnboundLocalError: local variable 'xp_prices' referenced before assignment

import pandas as pd
import rightmove_webscraper
import numpy as np

url = "https://www.rightmove.co.uk/new-homes-for-sale/find.html?searchType=SALE&locationIdentifier=REGION%5E87490&insId=1&radius=0.0&minPrice=&maxPrice=&minBedrooms=&maxBedrooms=&displayPropertyType=&maxDaysSinceAdded=&_includeSSTC=on&sortByPriceDescending=&primaryDisplayPropertyType=&secondaryDisplayPropertyType=&oldDisplayPropertyType=&oldPrimaryDisplayPropertyType=&newHome=true&auction=false"

rightmove_data = rightmove_webscraper.rightmove_data(url)

Any clues on what is wrong with this?

Best,
Janush

Floating point page count

In Python 3, result_pages_count returns a floating point number, and that makes get_results explode at line 103. In Python 2 it's fine because both of the division operands are integers.

I suggest the following diff, using explicit floor division. This fixes Python 3 and also works in Python 2.

diff --git a/rightmove_webscraper.py b/rightmove_webscraper.py
index aabb56e..837b582 100644
--- a/rightmove_webscraper.py
+++ b/rightmove_webscraper.py
@@ -37,7 +37,7 @@ class rightmove_data(object):
         There are 24 results on each results page, but note that the
         rightmove website limits results pages to a maximum of 42 pages."""
 
-        page_count = self.results_count() / 24
+        page_count = self.results_count() // 24
         if self.results_count() % 24 > 0:
             page_count += 1
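
For reference, the same calculation can be collapsed into one expression in Python 3, including the 42-page cap described in the docstring (a sketch, not part of the proposed diff):

import math

results_count = 195  # e.g. the value of self.results_count()

# Ceiling division on the 24-results-per-page count, capped at rightmove's
# 42-page maximum:
page_count = min(math.ceil(results_count / 24), 42)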
 

How can I print the results from get_results?

When I do print(rm.get_results()) I get something like this:

     price  ...                search_date
0   230000  ... 2020-10-02 11:46:06.330777
1   715000  ... 2020-10-02 11:46:06.330777

And if I write it to a file:

f = open("output.html", "a")
f.write(str(rm.get_results))
f.close()

I still get the same thing. How can I see all of the columns?
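
Not part of the package itself, but standard pandas display options should help here; a sketch, assuming rm is a RightmoveData instance as in the README:

import pandas as pd

# Show every column (and don't truncate the output width) when printing:
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)
print(rm.get_results)

# Alternatively, write the full table to disk instead of str()-ing it:
rm.get_results.to_csv("output.csv", index=False)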

Memory leak

Running the below on a series of URLs, each containing approx 1000 search results, causes increasing memory usage until it crashes or freezes. It seems very strange for it to happen in Python.

import pandas as pd
from rightmove_webscraper import rightmove_data

def scrape(searches):
    """Takes a dict `searches` of format {'search_name': 'url'} of rightmove
    searches and combines them into one pandas DataFrame."""
    df = pd.DataFrame(columns=['price', 'type', 'address', 'url', 'agent_url',
                               'postcode', 'number_bedrooms', 'search_date', 'search'])

    for search in searches:
        rightmove_object = rightmove_data(searches[search])
        result_df = rightmove_object.get_results
        df = df.append(result_df, ignore_index=True)
        df = df.drop_duplicates().reset_index(drop=True)

    return df

I imagine it would take a lot of work to debug, but putting it here to help any future users.

Full house description

Hi!
With the scraper, would it be possible to add a field which pulls the entire description of the property on the rightmove website? I.e. what the agent has written about the house?

Thanks!

ImportError: No module named rightmove_webscraper

Hi there,

I've installed the web scraper using pip and created a file (rightm.py) with the following code to test the scraper:

from rightmove_webscraper import rightmove_data 
url ="https://www.rightmove.co.uk/property-for-sale/find.html?locationIdentifier=OUTCODE%5E2378&insId=2&numberOfPropertiesPerPage=24&areaSizeUnit=sqft&googleAnalyticsChannel=buying"
rightmove_object = rightmove_data(url)
print rightmove_object.get_results

But I keep getting this error:

Traceback (most recent call last):
  File "rightm.py", line 1, in <module>
    from rightmove_webscraper import rightmove_data 
ImportError: No module named rightmove_webscraper

Any ideas why this may be happening? Thanks
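
Not a definitive diagnosis, but print rightmove_object.get_results is Python 2 syntax, so the script may be running under Python 2 while pip installed the package for a different interpreter. A quick check, as a sketch:

import sys

# If this prints a 2.x version, the interpreter running the script is not
# necessarily the one pip installed rightmove-webscraper into:
print(sys.version)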

Don't seem to be getting all the search results

Any way to tag auction properties?

I am doing some research on different cities' property prices.
But the guide price for auction properties is making the statistics unreasonable.
Is there any way to tag auction properties?

Reduced Prices

Is there a way to capture the 'reduced yesterday' attribute in the search?

I want to track, for a given search, the number of reductions over time. Thanks.

2 Questions from Python / Panda Noob

Hi,

Apologies if the formatting is wrong - I'm new to GitHub but have tried to follow the guidelines

2 potentially stupid questions from a Python noob who's also trying to get to grips with pandas.

Question 1: Getting the floorplan from the tree uses the following:

xp_floorplan_url = """//*[@id="floorplanTabs"]/div[2]/div[2]/img/@src"""
floorplan_url = tree.xpath(xp_floorplan_url)

However, in the page source for a sample search (4 results to keep it small), and then in the individual property page for one of the results, there is no "floorplanTabs".

When I inspect the page in Chrome I can't find "floorplanTabs" either.

Can you explain how this works?

Question 2: What does /div[2]/div[2] mean in the line below?

xp_floorplan_url = """//*[@id="floorplanTabs"]/div[2]/div[2]/img/@src"""

Many thanks for your help.
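
For what it's worth, regarding Question 2: in XPath, div[2] selects the second div child of the current node, so /div[2]/div[2] walks two levels down the element tree. A tiny self-contained illustration with lxml (the HTML snippet is made up):

from lxml import html

snippet = ("<div id='floorplanTabs'><div>a</div><div><div>b</div>"
           "<div><img src='plan.png' alt='Floorplan'/></div></div></div>")
tree = html.fromstring(snippet)

# From the element with id "floorplanTabs": take its 2nd div child, then
# that element's 2nd div child, then the src attribute of the img inside:
print(tree.xpath("""//*[@id="floorplanTabs"]/div[2]/div[2]/img/@src"""))  # ['plan.png']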

Scraper only returns 1050 results

Hi Toby, great scraper you've got here, but it seems to be only returning 1050 results; I may just be using it wrong though...

I have loaded the script in PyCharm with the correct version of Python and have downloaded the dependencies.

I have entered this into the Python console:
from rightmove_webscraper import RightmoveData
url = "https://www.rightmove.co.uk/property-for-sale/find.html?locationIdentifier=REGION%5E87490&propertyTypes=&includeSSTC=false&mustHave=&dontShow=&furnishTypes=&keywords="
rm = RightmoveData(url)
rm.results_count


Even if I change the URL, the results still appear to be capped at 1050. Maybe my approach is wrong?

Changes to add functionality and hold off errors

#!/usr/bin/env python3

# Dependencies
from lxml import html, etree
import requests
import numpy as np
import pandas as pd
import datetime as dt

class _GetDataFromURL(object):
    """This "private" class does all the heavy lifting of fetching data from the
    URL provided, and then returns data to the main `rightmove_data` class
    instance. The reason for this is so that all the validation and web-scraping
    is done when an instance is created, and afterwards the data is accessible
    quickly via methods on the `rightmove_data` instance."""

    def __init__(self, url):
        """Initialize an instance of the scraper by passing a URL from the
        results of a property search on www.rightmove.co.uk."""
        self.url = url
        self.first_page = self.make_request(self.url)
        self.validate_url()
        self.get_results = self.__get_results

    def validate_url(self):
        """Basic validation that the URL at least starts in the right format and
        returns status code 200."""
        real_url = "{}://www.rightmove.co.uk/{}/find.html?"
        protocols = ["http", "https"]
        types = ["property-to-rent", "property-for-sale", "new-homes-for-sale"]
        left_urls = [real_url.format(p, t) for p in protocols for t in types]
        conditions = [self.url.startswith(u) for u in left_urls]
        conditions.append(self.first_page[1] == 200)
        if not any(conditions):
            raise ValueError("Invalid rightmove URL:\n\n\t{}".format(self.url))

    @property
    def rent_or_sale(self):
        """Tag to determine if the search is for properties for rent or sale.
        Required because the XPaths are different for the target elements."""
        if "/property-for-sale/" in self.url \
        or "/new-homes-for-sale/" in self.url:
            return "sale"
        elif "/property-to-rent/" in self.url:
            return "rent"
        else:
            raise ValueError("Invalid rightmove URL:\n\n\t{}".format(self.url))

    @property
    def results_count(self):
        """Returns an integer of the total number of listings as displayed on
        the first page of results. Note that not all listings are available to
        scrape because rightmove limits the number of accessible pages."""
        tree = html.fromstring(self.first_page[0])
        xpath = """//span[@class="searchHeader-resultCount"]/text()"""
        try:
            return int(tree.xpath(xpath)[0].replace(",", ""))
        except:
            print('error extracting the result count header')
            return 1050

    @property
    def page_count(self):
        """Returns the number of result pages returned by the search URL. There
        are 24 results per page. Note that the website limits results to a
        maximum of 42 accessible pages."""
        page_count = self.results_count // 24
        if self.results_count % 24 > 0: page_count += 1
        # Rightmove will return a maximum of 42 results pages, hence:
        if page_count > 42: page_count = 42
        return page_count

    @staticmethod
    def make_request(url):
        r = requests.get(url)
        # Minimise the amount returned to reduce overheads:
        return r.content, r.status_code

    def get_page(self, request_content):
        """Method to scrape data from a single page of search results. Used
        iteratively by the `get_results` method to scrape data from every page
        returned by the search."""
        # Process the html:
        tree = html.fromstring(request_content)

        # Set xpath for price:
        if self.rent_or_sale == "rent":
            xp_prices = """//span[@class="propertyCard-priceValue"]/text()"""
        elif self.rent_or_sale == "sale":
            xp_prices = """//div[@class="propertyCard-priceValue"]/text()"""

        # Set xpaths for listing title, property address, URL, and agent URL:
        xp_titles = """//div[@class="propertyCard-details"]\
        //a[@class="propertyCard-link"]\
        //h2[@class="propertyCard-title"]/text()"""
        xp_addresses = """//address[@class="propertyCard-address"]//span/text()"""
        xp_weblinks = """//div[@class="propertyCard-details"]\
        //a[@class="propertyCard-link"]/@href"""
        xp_agent_urls = """//div[@class="propertyCard-contactsItem"]\
        //div[@class="propertyCard-branchLogo"]\
        //a[@class="propertyCard-branchLogo-link"]/@href"""
        

        # Create data lists from xpaths:
        price_pcm = tree.xpath(xp_prices)
        titles = tree.xpath(xp_titles)
        addresses = tree.xpath(xp_addresses)
        base = "http://www.rightmove.co.uk"
        weblinks = ["{}{}".format(base, tree.xpath(xp_weblinks)[w]) \
                    for w in range(len(tree.xpath(xp_weblinks)))]
        agent_urls = ["{}{}".format(base, tree.xpath(xp_agent_urls)[a]) \
                      for a in range(len(tree.xpath(xp_agent_urls)))]
        
        #get floorplan from property urls
        floorplan_urls = []
        for weblink in weblinks:
            rc = self.make_request(weblink)
            tree = html.fromstring(rc[0])
        
            xp_floorplan_url = """//*[@id="floorplanTabs"]/div[2]/div[2]/img/@src"""
            floorplan_url = tree.xpath(xp_floorplan_url)
            if floorplan_url == []:
                floorplan_urls.append(np.nan)
            else:
                floorplan_urls.append(floorplan_url[0])
       
        # Store the data in a Pandas DataFrame:
        data = [price_pcm, titles, addresses, weblinks, agent_urls, floorplan_urls]
        temp_df = pd.DataFrame(data)
        temp_df = temp_df.transpose()
        temp_df.columns = ["price", "type", "address", "url", "agent_url", "floorplan_url"]
        
        # Drop empty rows which come from placeholders in the html:
        temp_df = temp_df[temp_df["address"].notnull()]
        return temp_df

    @property
    def __get_results(self):
        """Pandas DataFrame with all results returned by the search."""
        # Create DataFrame of the first page (which has already been requested):
        results = self.get_page(self.first_page[0])

        # Iterate through the rest of the pages scraping results:
        if self.page_count > 1:
            for p in range(1, self.page_count):

                # Create the URL of the specific results page:
                p_url = "{}&index={}".format(str(self.url), str((p * 24)))

                # Make the request:
                rc = self.make_request(p_url)

                # Requests to scrape lots of pages eventually get status 400, so:
                if rc[1] != 200: break

                # Create a temporary dataframe of page results:
                temp_df = self.get_page(rc[0])

                # Concatenate the temporary dataframe with the full dataframe:
                frames = [results, temp_df]
                results = pd.concat(frames)

        # Reset the index:
        results.reset_index(inplace=True, drop=True)

        # Convert price column to numeric type:
        results["price"].replace(regex=True, inplace=True, to_replace=r"\D", value=r"")
        results["price"] = pd.to_numeric(results["price"])

        # Extract postcodes to a separate column:
        pat = r"\b([A-Za-z][A-Za-z]?[0-9][0-9]?[A-Za-z]?)\b"
        results["postcode"] = results["address"].astype(str).str.extract(pat, expand=True)

        # Extract number of bedrooms from "type" to a separate column:
        pat = r"\b([\d][\d]?)\b"
        results["number_bedrooms"] = results.type.astype(str).str.extract(pat, expand=True)
        results.loc[results["type"].astype(str).str.contains("studio", case=False), "number_bedrooms"] = 0

        # Clean up annoying white spaces and newlines in "type" column:
        for row in range(len(results)):
            type_str = results.loc[row, "type"]
            clean_str = type_str.strip("\n").strip()
            results.loc[row, "type"] = clean_str

        # Add column with datetime when the search was run (i.e. now):
        now = dt.datetime.today()
        results["search_date"] = now

        return results

class rightmove_data(object):
    """The `rightmove_data` web scraper collects structured data on properties
    returned by a search performed on www.rightmove.co.uk

    An instance of the class created with a rightmove URL provides attributes to
    easily access data from the search results, the most useful being
    `get_results`, which returns all results as a Pandas DataFrame object.
    """
    def __init__(self, url):
        """Initialize the scraper with a URL from the results of a property
        search performed on www.rightmove.co.uk"""
        self.__request_object = _GetDataFromURL(url)
        self.__url = url

    @property
    def url(self):
        return self.__url

    @property
    def get_results(self):
        """Pandas DataFrame of all results returned by the search."""
        return self.__request_object.get_results

    @property
    def results_count(self):
        """Total number of results returned by `get_results`. Note that the
        rightmove website may state a much higher number of results; this is
        because they artificially restrict the number of results pages that can
        be accessed to 42."""
        return len(self.get_results)

    @property
    def average_price(self):
        """Average price of all results returned by `get_results` (ignoring
        results which don't list a price)."""
        total = self.get_results["price"].dropna().sum()
        return int(total / self.results_count)

    def summary(self, by="number_bedrooms"):
        """Pandas DataFrame summarising the results by mean price and count.
        By default grouped by the `number_bedrooms` column but will accept any
        column name from `get_results` as a grouper."""
        df = self.get_results.dropna(axis=0, subset=["price"])
        groupers = {"price":["count", "mean"]}
        df = df.groupby(df[by]).agg(groupers).astype(int)
        df.columns = df.columns.get_level_values(1)
        df.reset_index(inplace=True)
        if "number_bedrooms" in df.columns:
            df["number_bedrooms"] = df["number_bedrooms"].astype(int)
            df.sort_values(by=["number_bedrooms"], inplace=True)
        else:
            df.sort_values(by=["count"], inplace=True, ascending=False)
        return df.reset_index(drop=True)

I have made some changes on a fork. Unfortunately, as I did a lot of renaming, it wouldn't be possible for me to put them in a pull request.

Mostly I butchered in functionality to scrape floor plan URLs and changed some of the pandas processing to cast .astype(str) to prevent some errors I was getting. I wanted to share these changes with you in case they were of use.
