Zenz David, Piven Sofiia, Zischg Johannes, Kraljevska Melanija
For a university project, we tried to predict the prices of Airbnb listings. This task was embedded in a more general business question: how to increase the platform's attractiveness for hosts. We decided to introduce a system that helps hosts understand how much their apartment would cost under given conditions and how to maximize this price.
We used publicly available data for Vienna from the Inside Airbnb project, which contains 11,409 apartment listings and 74 features.
After an extensive data cleaning and feature engineering process (creation of new features using the presented data set in combination with other publicly available data sources), we ended up with a total of 128 features.
After training different models using 10-fold cross-validation, we experimented with combining several variable selection approaches in order to reduce our large feature set before settling on a final model. In the end we used spike-and-slab variable selection, which reduced the set to 27 predictors. We then re-ran all models on the selected features only, additionally excluding features with near-zero variance. The most accurate model, a random forest, was selected as the final prediction model.
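To make two of the steps above more concrete, here is a minimal sketch of a near-zero-variance filter and a 10-fold split. It is written in plain Python for illustration only; the project itself was done in R, and this is not the code we ran. The thresholds (`freq_cut`, `unique_cut`) are assumed defaults modeled on common practice, the function names are hypothetical, and the toy columns are invented rather than taken from the Vienna data.

```python
from collections import Counter


def is_near_zero_variance(values, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a feature whose values are almost all identical.

    Assumed rule: the most common value is more than `freq_cut` times as
    frequent as the second most common one, and fewer than `unique_cut`
    percent of the values are distinct.
    """
    counts = Counter(values).most_common()
    if len(counts) < 2:
        return True  # constant (or empty) column: zero variance
    freq_ratio = counts[0][1] / counts[1][1]
    percent_unique = 100.0 * len(counts) / len(values)
    return freq_ratio > freq_cut and percent_unique < unique_cut


def k_fold_indices(n_rows, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, test_idx
        start += size


# Hypothetical toy columns, not taken from the project data.
listings = {
    "has_sauna": [0] * 99 + [1],     # 99:1 split, flagged as near-zero variance
    "bedrooms": [1, 2, 3, 2] * 25,   # reasonably spread, kept
}
kept = [name for name, col in listings.items() if not is_near_zero_variance(col)]
folds = list(k_fold_indices(len(listings["bedrooms"]), k=10))
```

Here `kept` contains only `bedrooms`, and each of the 10 folds holds out 10 rows for testing. The folds are contiguous for simplicity; in practice the rows would usually be shuffled first.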
Compared to our baseline models, the final model brought a substantial improvement both in terms of error (MAE down from 14.8 to 11.5) and in terms of explained variance (R^2 up from 63% to 84%).
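For readers unfamiliar with the two measures, the following sketch shows how MAE and R^2 are commonly defined. It uses the standard textbook formulas in plain Python; the prices are made-up example values, not results from the project, and the function names are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted prices."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in the actual prices explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented example values for illustration only.
actual_prices = [50.0, 80.0, 120.0, 65.0]
predicted_prices = [55.0, 70.0, 110.0, 65.0]

mae = mean_absolute_error(actual_prices, predicted_prices)  # 6.25
r2 = r_squared(actual_prices, predicted_prices)
```

A lower MAE means the predictions are closer to the actual prices on average, while an R^2 closer to 1 means more of the price variation is captured by the model.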
Finally, we created an R Shiny app that lets users play around with the parameters that feed into the model. This gives a better understanding of how the various parameters influence and shape the predicted price.
Find the app here: https://davidzenz.shinyapps.io/airbnb/