Towards a Revamped Real Estate Index


Harvard Data Science Capstone Project, Fall 2020

Will Fried
Credit: Pexels
Figure 1: NAR Confidence Index local market report for Boston (2020 Q3)
Figure 2: NAR Confidence Index expected price change (October 2020)
Figure 3: Case-Shiller Home Price Index in Boston metro area since 2000

Modeling Approach


Figure 4: Popularity of search terms in Colorado since March 2016

Forecasting Models

Figure 5: Diagram of forecasting methodology

1. Time Series Model

Figure 6: 3-month forecast of BSTS model

2. Google Trends Model

Figure 7: Feature set construction
Figure 8: Cross-validation MSE of several candidate models
Figure 9: 3-month forecast of final Google Trends model

3. Ensemble Model

Figure 10: 6-month forecast of ensemble model

Census Tract Features

Predictive Model

Figure 11: Distribution of ratios of 15 randomly selected census tracts

Interpretable Models

1. Linear Regression

2. Gradient Boosting

Figure 12: Feature importance plot for XGBoost model
Figure 13: Feature contribution plot for two arbitrary census tracts
Figure 14: Visualization of project methodology

Methodology Validation

