Overview
- Build a tool that predicts rental prices for properties based on their features.
- A machine learning regression model is used to predict the rental rates.
- The data set contains property rental prices in San Francisco, CA.
Code and Resources Used
Python Version: 3.10.3
Packages: pandas, numpy, sklearn, math
Dataset source: Datacamp case study
Processing Data
After collecting the dataset, I checked and cleaned it to make sure it contained no missing or anomalous values before building a machine learning model. I made the following changes and created the following variables:
- Import the dataset into a pandas DataFrame.
- Fix the data type of every feature.
- Check for and handle missing values.
- Clean and label the object columns.
- Analyze and resolve anomalous values.
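The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`price`, `bedrooms`, `room_type`), the tiny in-memory frame, the choice to drop rows with a missing price, the label encoding via category codes, and the 1.5 × IQR rule for anomalies are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; every column name here is an assumption.
raw = pd.DataFrame({
    "price": ["100", "150", "200", None, "250", "300", "99999"],
    "bedrooms": ["1", "1", "2", "2", "3", "3", "2"],
    "room_type": [" Private room", "private room", "Entire home/apt",
                  "Entire home/apt", "entire home/apt ", "Entire home/apt",
                  "Private room"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Fix data types: numeric columns that arrived as strings become numbers.
    for col in ["price", "bedrooms"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")

    # Handle missing values: here, rows without a target price are dropped.
    out = out.dropna(subset=["price"])

    # Clean object columns (trim whitespace, unify case), then label them
    # with integer category codes.
    out["room_type"] = out["room_type"].str.strip().str.lower()
    out["room_type_code"] = out["room_type"].astype("category").cat.codes

    # Resolve anomalies: keep prices within 1.5 * IQR of the quartiles.
    q1, q3 = out["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = out["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out[in_range].reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Other choices, such as imputing missing values instead of dropping rows, would fit the same outline.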
Data Comparison
Before resolving anomalies

After resolving anomalies

Property Location

Data Engineering
From the property location we can calculate the distance from each property to downtown, so I added a new feature, ‘distance’. Here is the heat map of linear correlations:
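One way such a distance feature could be computed is the haversine (great-circle) formula, using only the `math` package. The sketch below assumes the location is stored as latitude and longitude, and it picks approximate coordinates near Union Square as "downtown". Both the reference point and the function names are assumptions for illustration.

```python
import math

# Assumed reference point for "downtown" (approximately Union Square, San Francisco).
DOWNTOWN_LAT, DOWNTOWN_LON = 37.7880, -122.4075
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_downtown(lat, lon):
    """Hypothetical helper: distance from a property to the assumed downtown point."""
    return haversine_km(lat, lon, DOWNTOWN_LAT, DOWNTOWN_LON)


# Example with an arbitrary point inside the city.
print(round(distance_to_downtown(37.76, -122.45), 2), "km")
```

With a pandas DataFrame, the helper could be applied row by row to hypothetical `latitude` and `longitude` columns to fill the new column.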

Model and Result
| Rank | Model | Score | MAE | RMSE |
|---|---|---|---|---|
| 1 | RandomForestRegressor | 0.575 | 72.24 | 132.62 |
| 2 | GradientBoostingRegressor | 0.570 | 73.16 | 133.41 |
| 3 | LinearRegression | 0.445 | 84.19 | 151.52 |
| 4 | Ridge | 0.445 | 84.19 | 151.51 |
| 5 | Lasso | 0.445 | 83.69 | 151.59 |
| 6 | DecisionTreeRegressor | 0.225 | 91.96 | 179.02 |
Of the models tested, RandomForestRegressor obtained the highest score, 0.575. After optimization, it achieved the following results:
| Model | Score | MAE | RMSE |
|---|---|---|---|
| RandomForestRegressor | 0.622 | 69.23 | 125.13 |
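For reference, the three columns in these tables can be computed as in the sketch below. It assumes that Score is the coefficient of determination (R²), which is what scikit-learn regressors return from their `score` method; the write-up does not state this explicitly. The sample values are made up purely to exercise the functions.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual variance over total variance."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up example values, not taken from the dataset.
actual = [100, 200, 300]
predicted = [110, 190, 330]
print(mae(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```

RMSE is never smaller than MAE, and a large gap between them, as in the tables above, suggests that a few predictions miss by a wide margin.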
EDA
Feature importance

Here we can see that the rental price is strongly influenced by the number of rooms and only slightly influenced by the property location.
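Feature importances of this kind can be read from a fitted random forest through its `feature_importances_` attribute. The sketch below uses synthetic data in which the price depends mostly on the number of rooms by construction; the data, the feature names, and the hyperparameters are all assumptions, so the numbers it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data for illustration only (NOT the project's dataset).
rng = np.random.default_rng(0)
n = 500
rooms = rng.integers(1, 6, size=n)        # 1 to 5 rooms
distance = rng.uniform(0, 15, size=n)     # km to downtown
price = 80 * rooms - 1.0 * distance + rng.normal(0, 10, size=n)

X = np.column_stack([rooms, distance])
feature_names = ["rooms", "distance"]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, price)

# Impurity-based importances; they are non-negative and sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Impurity-based importances can overstate features with many distinct values, so permutation importance is a common cross-check.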
More rooms, higher price.

Distance? It doesn’t really matter.

Output Sample
Here is a sample distribution of predicted and actual rental prices.
