Master's thesis for our Master's in Business Analytics. The project focuses on the city of Madrid, Spain.
Title in Spanish: Predicción del precio de la vivienda mediante el uso de métodos estadísticos espaciales.
Title in English: Prediction of house prices through the use of spatial statistical methods.
DISCLAIMER: educational purposes only. Some data files were excluded from the repository in order to respect certain privacy laws. This repository does not represent the Contributors' or any other person's or institution's political or personal views or opinions.
The information contained in this repository is for general information and educational purposes only. While we – the Contributors – endeavour to keep the information up to date and correct, we make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the content or the websites, services, or related graphics contained on this repository for any purpose. Any reliance you place on such information is therefore strictly at your own risk. In no event will we be liable for any loss or damage including without limitation, indirect or consequential loss or damage, or any loss or damage whatsoever arising from loss of data or profits arising out of, or in connection with, the use of the information stated in this repository.
The housing market offers great opportunities to those who know how to take advantage of them. In this document we propose a spatial statistical study of housing prices with two aims: to obtain an accurate prediction, and to ensure that the prediction rests on a statistical model that is solid and stable across space, so that results remain acceptable even for a new house entirely unrelated to those used in the modeling phase. With these techniques we intend to offer a different point of view, basing our study on how the price of a house is influenced by the value of its neighboring homes, and thus proposing a line of work and research that has been little explored compared with the more traditional statistical models used for this topic.
- Generalized Linear Model (GLM)
- Spatial Generalized Linear Model (Spatial GLM)
- Spatial AutoRegressive (SAR)
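
For reference, a spatial autoregressive (SAR) model is usually written in the textbook spatial-lag form below. This is the standard formulation, not necessarily the exact specification fitted in the thesis. The symbols are notation introduced here for illustration: $y$ is the vector of house prices, $W$ a row-standardized spatial weights matrix encoding which homes are neighbors, $\rho$ the spatial autoregressive coefficient, $X$ the matrix of house characteristics, and $\beta$ their coefficients.

```latex
y = \rho W y + X\beta + \varepsilon,
\qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)

% Reduced form, valid when (I_n - \rho W) is invertible:
y = (I_n - \rho W)^{-1} (X\beta + \varepsilon)
```

The term $\rho W y$ is what lets the price of a house depend on the prices of its neighboring homes. With $\rho = 0$ the model collapses to an ordinary linear regression.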
- Idealista API: allowed us to get 50 property records per GET request, up to 100 requests per month. Access was requested from Idealista Labs in February 2020.
- ALL-JSON-FILES.csv: all JSON data collected from the API, merged into a single file.
- Datos_abiertos_Red_de_Metros.zip: Metro Madrid shapefile. Source: https://data-crtm.opendata.arcgis.com/datasets/m4-accesos?geometry=-3.730%2C40.417%2C-3.669%2C40.429. (Powered by CRTM). Downloaded in June 2020.
- Madrid_Postal_Codes.zip: shapefile of the postal codes of Madrid. Source: https://www.madrid.org/nomecalles/DescargaBDTCorte.icm. (Centro Regional de Información Cartográfica. Comunidad de Madrid). Downloaded in June 2020.
- Salud_Farmacias.zip: shapefile of pharmacies in Madrid. Source: https://www.madrid.org/nomecalles/DescargaBDTCorte.icm. (Centro Regional de Información Cartográfica. Comunidad de Madrid). Downloaded in June 2020.
- Educacion_Centros.zip: shapefile of public education centres in Madrid. Source: https://www.madrid.org/nomecalles/DescargaBDTCorte.icm. (Centro Regional de Información Cartográfica. Comunidad de Madrid). Downloaded in June 2020.
Data were downloaded over time as JSON files through API requests made with Idealista_API.R. All records were then merged and duplicates removed (by propertyCode), and the result was converted to a DataFrame with JSON2DF.ipynb on Google Colab.
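
The merge-and-deduplicate step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code in JSON2DF.ipynb. The helper name `merge_unique` and the sample records are hypothetical; only the `propertyCode` key comes from the description above.

```python
import json


def merge_unique(batches, key="propertyCode"):
    """Merge several lists of records, keeping the first record seen per key."""
    seen = {}
    for batch in batches:
        for record in batch:
            code = record.get(key)
            if code is None:
                continue  # skip records that carry no identifier
            seen.setdefault(code, record)  # later duplicates are ignored
    return list(seen.values())


# Two hypothetical API responses, as they might look after parsing the JSON files.
response_a = json.loads(
    '[{"propertyCode": "1", "price": 250000}, {"propertyCode": "2", "price": 310000}]'
)
response_b = json.loads(
    '[{"propertyCode": "2", "price": 310000}, {"propertyCode": "3", "price": 180000}]'
)

records = merge_unique([response_a, response_b])
print(len(records))  # 3
```

Keeping the first occurrence is an assumption made for this sketch. If listings change between requests, keeping the most recent record may be preferable.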
We verified that every column has an appropriate data type (e.g. size is numeric, district is character).
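
A type check of this kind might look like the sketch below. It is an illustration under assumptions, not the project's code: the helper `coerce_types` and the sample record are hypothetical, and only the `size` and `district` fields are taken from the example above.

```python
def coerce_types(record):
    """Return a copy of the record with size as a number and district as text."""
    out = dict(record)
    out["size"] = float(record["size"])        # numeric column
    out["district"] = str(record["district"])  # character column
    return out


# Hypothetical raw record in which every value arrived as a string.
raw = {"propertyCode": "1", "size": "85", "district": "Chamberí"}
clean = coerce_types(raw)

assert isinstance(clean["size"], float)
assert isinstance(clean["district"], str)
```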
Data sources used specifically for the Shiny dashboard:
- final_df.rds: DataFrame of consolidated data.
- final_df_scaled.rds: scaled/normalized DataFrame of consolidated data.
- final_model.rds: final statistical model.
- leaflet_data.Rda: DataFrame with all visual data for the interactive leaflet.
- cod_postal_analysis.Rda: DataFrame of Madrid postal data with average property price per area.
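
The per-area average stored in cod_postal_analysis.Rda could be computed along the lines of the sketch below. The field names `postalCode` and `price`, the helper `mean_price_by_area`, and the sample listings are assumptions made for illustration. The actual column names in the repository's data may differ.

```python
from collections import defaultdict


def mean_price_by_area(records, area_key="postalCode", price_key="price"):
    """Average the price of all records that share the same area key."""
    totals = defaultdict(lambda: [0.0, 0])  # area -> [sum of prices, count]
    for record in records:
        entry = totals[record[area_key]]
        entry[0] += record[price_key]
        entry[1] += 1
    return {area: total / count for area, (total, count) in totals.items()}


# Hypothetical listings, two of them in the same postal code.
listings = [
    {"postalCode": "28001", "price": 400000},
    {"postalCode": "28001", "price": 600000},
    {"postalCode": "28045", "price": 250000},
]

averages = mean_price_by_area(listings)
print(averages["28001"])  # 500000.0
```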
- Andrés David DELGADO MOSQUERA
- Iván Carlos BARRIO HERREROS
- David Jo Konstantin TOFAN