Overview
- Build a tool that predicts the rental price of a property from the property's features.
- A machine learning regression model is used to predict the rental rates.
- The data set contains property rental prices for San Francisco, CA.
Code and Resources Used
Python Version: 3.10.3
Packages: pandas, numpy, sklearn, math
Dataset source: Datacamp case study
Processing Data
After collecting the dataset, I checked and cleaned it to make sure it contained no missing or anomalous values before building a machine learning model. I made the following changes and created the following variables:
- Import the dataset into a pandas DataFrame.
- Fix the data type of every feature.
- Check for and handle missing values.
- Clean and label the object columns.
- Analyze and resolve anomalous values.
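The cleaning steps above could look roughly like the sketch below. It is only an illustration: the column names `price` and `bedrooms`, the median fill, and the 1.5 × IQR rule for anomalies are assumptions, not necessarily what this project used.

```python
import pandas as pd


def clean_rentals(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the cleaning steps; column names are hypothetical."""
    df = df.copy()

    # Fix datatype: turn price strings such as "$1,200.00" into floats.
    df["price"] = pd.to_numeric(
        df["price"].astype(str).str.replace(r"[$,]", "", regex=True),
        errors="coerce",  # anything unparseable becomes NaN
    )

    # Missing values: fill bedroom counts with the median (assumed strategy),
    # and drop rows that have no usable price.
    df["bedrooms"] = df["bedrooms"].fillna(df["bedrooms"].median())
    df = df.dropna(subset=["price"])

    # Anomalies: keep prices within 1.5 * IQR of the quartiles (assumed rule).
    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[in_range].reset_index(drop=True)
```

Calling `clean_rentals(raw_df)` returns a new DataFrame and leaves the raw one untouched, which makes the before/after comparison easy to plot.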
Data Comparison
Before resolving anomalies
After resolving anomalies
Property Location
Data Engineering
From the property location we can calculate the distance from each property to downtown, so I added a new feature, ‘distance’. Here is the heat map of linear correlations:
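One way to derive such a ‘distance’ feature from latitude and longitude is the haversine (great-circle) formula, sketched below with only the `math` package. The downtown reference coordinates and the function name are assumptions for illustration; the project may have used a different reference point or formula.

```python
import math

# Assumed reference point for downtown San Francisco (latitude, longitude).
DOWNTOWN = (37.7749, -122.4194)
EARTH_RADIUS_KM = 6371.0


def distance_to_downtown_km(lat, lon, center=DOWNTOWN):
    """Great-circle distance in kilometres between a property and `center`."""
    phi1 = math.radians(lat)
    phi2 = math.radians(center[0])
    dphi = phi2 - phi1
    dlambda = math.radians(center[1] - lon)

    # Haversine formula.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

With a DataFrame, this could be applied row by row, for example `df.apply(lambda r: distance_to_downtown_km(r["latitude"], r["longitude"]), axis=1)`, where the column names are again hypothetical.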
Model and Result
Rank | Model | Score | MAE | RMSE |
---|---|---|---|---|
1 | RandomForestRegressor | 0.575 | 72.24 | 132.62 |
2 | GradientBoostingRegressor | 0.570 | 73.16 | 133.41 |
3 | LinearRegression | 0.445 | 84.19 | 151.52 |
4 | Ridge | 0.445 | 84.19 | 151.51 |
5 | Lasso | 0.445 | 83.69 | 151.59 |
6 | DecisionTreeRegressor | 0.225 | 91.96 | 179.02 |
Of the models tested, RandomForestRegressor obtained the highest score (0.575, or 57.5%), along with the lowest MAE and RMSE. After optimization, it gives the following results:
Model | Score | MAE | RMSE |
---|---|---|---|
RandomForestRegressor | 0.622 | 69.23 | 125.13 |
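For reference, the three columns in the tables above can be computed as in the sketch below. It assumes that "Score" is the coefficient of determination (R²), which is what the default `score` method of scikit-learn regressors returns; the README does not state this explicitly. The function name is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return score (assumed to be R^2), MAE, and RMSE for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average size of the miss, in price units.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalizes large misses more heavily.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R^2: share of the variance in the actual prices explained by the model.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    score = 1 - ss_res / ss_tot

    return {"score": score, "mae": mae, "rmse": rmse}
```

Because RMSE squares each error before averaging, it is never smaller than MAE; a large gap between the two, as in the tables above, suggests that a few predictions miss by a wide margin.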
EDA
Feature importance
Here we can see that the rental price of a property is strongly influenced by the number of rooms and only slightly influenced by the property location.
More rooms, higher price.
Distance? It doesn’t really matter.
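A feature-importance ranking like this can be read from a fitted random forest through its `feature_importances_` attribute. The sketch below uses synthetic data, not the project's dataset: the prices are generated so that rooms dominate and distance barely matters, purely to show the mechanics. The feature names and coefficients are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic illustration only: price depends mostly on rooms, barely on distance.
rng = np.random.default_rng(0)
n = 500
rooms = rng.integers(1, 6, size=n)        # 1 to 5 rooms
distance = rng.uniform(0, 15, size=n)     # km to downtown
price = 80 * rooms - 2 * distance + rng.normal(0, 10, size=n)

X = np.column_stack([rooms, distance])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, price)

# Impurity-based importances; they sum to 1 across features.
importances = dict(zip(["rooms", "distance"], model.feature_importances_))
```

These importances are impurity-based, so they describe how much each feature helped the trees split the training data rather than a causal effect on price.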
Output Sample
Here is a sample distribution of predicted versus actual rental prices.