Saturday, August 10, 2019

NSW house sale prices data for free

I have a new interest in the prices that houses in my state have sold for recently. Searching for this data turns up many services that provide it, but they generally ask for a fee or even a subscription for access. It seemed to me that this must surely be public information that the government collects for stamp duty or other reasons.

The information is available on a weekly basis from Property NSW on the Bulk property sales information page. The data "is available under open access licensing as part of the NSW Government Open Data Policy and is subject to the Creative Commons Attribution 4.0 Licence". Great stuff!

Each of the blue buttons downloads a zip file of .DAT files containing sales data. The format is a little inconvenient but is well documented here.

To load this data into a spreadsheet, I've written a little Python 3 program that takes a folder of downloaded zip files, reads the .DAT files inside them, parses them, and prints tab-delimited rows that can be saved to a file and opened in a spreadsheet.

# Read house price data files from: https://valuation.property.nsw.gov.au/embed/propertySalesInformation
# B;001;4229165;5;20190805 01:00;;;86;MAURICE RD;POKOLBIN;2320;1.038;H;20190613;20190725;850000;;R;RESIDENCE;;;;0;AP418234;

# Directory that contains the zip files
DOWNLOAD_DIR = "Downloads"
FIELD_NAMES = ["Record Type",
               "District Code",
               "Property Id.",
               "Sale Counter",
               "Download Date / Time",
               "Property Name",
               "Property Unit Number",
               "Property House Number",
               "Property Street Name",
               "Property Locality",
               "Property Post Code",
               "Area",
               "Area Type",
               "Contract Date",
               "Settlement Date",
               "Purchase Price",
               "Zoning",
               "Nature of Property",
               "Primary Purpose",
               "Strata Lot Number",
               "Component code",
               "Sale Code",
               "% Interest of Sale",
               "Dealing Number"]

import os
import zipfile

def main():
    printFieldHeaders()
    files = os.listdir(DOWNLOAD_DIR)
    for azipfile in files:
        zip_file_path = os.path.join(DOWNLOAD_DIR, azipfile)
        # Skip anything in the folder that is not a zip archive
        if not zipfile.is_zipfile(zip_file_path):
            continue
        with zipfile.ZipFile(zip_file_path) as archive:
            for data_file in archive.namelist():
                if data_file.endswith(".DAT"):
                    with archive.open(data_file) as dat:
                        for line in dat:
                            lineStr = line.decode('UTF-8')
                            # Only the "B" records are printed
                            if lineStr.startswith("B"):
                                fields = lineStr.strip().split(";")
                                print("\t".join(fields))


def printFieldHeaders():
    print("\t".join(FIELD_NAMES))

if __name__ == "__main__":
    main()
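
If you would rather work with the data in Python than in a spreadsheet, a single "B" record can be turned into a small dictionary with typed values. The following is only a sketch: the parse_sale function and its output keys are names I made up for illustration, the field positions follow the FIELD_NAMES list above, and the YYYYMMDD date format and whole-dollar price are assumptions inferred from the sample record rather than taken from the official format documentation.

```python
from datetime import date, datetime

# Sample record copied from the comment at the top of the script
SAMPLE = ("B;001;4229165;5;20190805 01:00;;;86;MAURICE RD;POKOLBIN;2320;"
          "1.038;H;20190613;20190725;850000;;R;RESIDENCE;;;;0;AP418234;")


def parse_date(text):
    # Assumption: dates are written as YYYYMMDD, as in the sample record
    return datetime.strptime(text, "%Y%m%d").date() if text else None


def parse_sale(line):
    """Return a dict for a "B" record, or None for any other record type."""
    fields = line.strip().split(";")
    if fields[0] != "B":
        return None
    # Indexes match the positions of the names in FIELD_NAMES
    house_number, street = fields[7], fields[8]
    price = fields[15]
    return {
        "address": " ".join(part for part in (house_number, street) if part),
        "locality": fields[9],
        "post_code": fields[10],
        "contract_date": parse_date(fields[13]),
        "settlement_date": parse_date(fields[14]),
        # Assumption: prices are whole dollars; empty values become None
        "price": int(price) if price else None,
    }


sale = parse_sale(SAMPLE)
print(sale["address"], sale["locality"], sale["price"])
```

With values typed like this it becomes straightforward to filter by locality or sort by contract date before writing anything out.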
