Calorie Prediction using DecisionTreeRegressor and RandomForestRegressor
-
Requirement
We are running a company which needs to predict the calories burned by athletes based on the duration, pulse and max pulse rates of their workouts.
We need an ML model which takes (Duration, Pulse, Maxpulse) as input and predicts the Calories burned during an athlete's workout.
How it Works
We will write a Decision Tree Regressor and a Random Forest Regressor (both supervised machine learning models) which take (Duration, Pulse, Maxpulse) as input and predict the Calories burned by an athlete, and then compare the Mean Absolute Error (MAE) of both models.
Steps
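1. We are provided with raw data (data.csv, containing Duration, Pulse, Maxpulse and Calories). We will:
   - Clean the raw data, i.e. remove empty rows.
   - Split the data into X (the features: Duration, Pulse, Maxpulse) and y (the value to predict: Calories), and split each of these into two sets:
     - train_X: features used for training (training data 1)
     - val_X: held-out features used for validation (training data 2)
     - train_y: Calories values for train_X
     - val_y: Calories values for val_X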
2. Feed the model with the training features (train_X) and their Calories values (train_y).
3. The model is now ready for predictions: feed it real-world data (here, val_X) and get predictions.
4. Measure how far the model's predictions deviate from the actual values (val_y) using mean absolute error.
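Step 1 (cleaning and splitting) can be sketched with pandas and scikit-learn as below. The CSV rows are hypothetical sample values inlined through `io.StringIO` so the snippet runs without data.csv; in the project you would call `pd.read_csv("data.csv")` instead. The `test_size=0.25` and `random_state=1` values are assumptions, not settings taken from the original project.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical rows standing in for data.csv; one row has an empty Calories cell.
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
60,103,135,340.0
45,109,175,282.4
45,117,148,406.0
60,102,127,300.0
60,110,136,374.0
45,104,134,
30,109,133,195.1
60,98,124,269.0
"""

# Read the CSV into a DataFrame, then clean it by dropping rows with empty cells.
data = pd.read_csv(io.StringIO(csv_text))
data = data.dropna()

# X = features (model input), y = target (value to predict).
features = ["Duration", "Pulse", "Maxpulse"]
X = data[features]
y = data["Calories"]

# Split into training and validation sets (25% held out; an assumed ratio).
train_X, val_X, train_y, val_y = train_test_split(X, y, test_size=0.25, random_state=1)

print(len(data), len(train_X), len(val_X))  # 9 6 3
```

`train_test_split` returns the four sets in the order `train_X, val_X, train_y, val_y`, matching the split described in step 1.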
Building the Models and Making Predictions
1. import pandas: we will convert the data from CSV into a DataFrame using pandas.
2. Import DecisionTreeRegressor and RandomForestRegressor from scikit-learn.
3. Import train_test_split to split the data into two sets.
4. Import mean_absolute_error for calculating the MAE later in the code.
5. read_csv() (from pandas): read the CSV data into a DataFrame.
6. dropna() (from pandas): clean the data by removing empty rows. We only have one set of data, from which we will separate the features and the target.
7. Features (model input): 'Duration', 'Pulse', 'Maxpulse'.
8. Target (value to predict): 'Calories'.
9. Create the DecisionTreeRegressor and RandomForestRegressor models.
10. fit() (from scikit-learn): feed the training features and their Calories values to each model. This is supervised learning.
11. predict() (from scikit-learn): the output of the model. Predict Calories for val_X (training data 2). If you instead predict on the same rows the model was trained on, the decision tree returns (almost) exactly the Calories values it was fed. This is overfitting, which is why the models are validated on held-out data.
12. Validate the models using mean absolute error. Output: the MAE of Random Forest is far lower than that of Decision Tree.
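The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the project's original code: the data is synthetic (random workouts with an invented linear Calories formula plus noise, generated with NumPy so the snippet runs without data.csv), and `random_state=1` with otherwise default hyperparameters is an illustrative choice. The printed MAE values depend on this synthetic data and are not the project's results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the cleaned data.csv. The Calories formula and the
# noise level are assumptions made only so this sketch is self-contained.
rng = np.random.default_rng(0)
n = 300
data = pd.DataFrame({
    "Duration": rng.uniform(15, 120, n),
    "Pulse": rng.uniform(80, 160, n),
    "Maxpulse": rng.uniform(100, 190, n),
})
data["Calories"] = (
    4.0 * data["Duration"] + 1.5 * data["Pulse"] + 0.5 * data["Maxpulse"]
    + rng.normal(0, 25, n)
)

# Features and target, then the train/validation split.
X = data[["Duration", "Pulse", "Maxpulse"]]
y = data["Calories"]
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Create both models and fit them on the training data (supervised learning).
dt_model = DecisionTreeRegressor(random_state=1)
rf_model = RandomForestRegressor(random_state=1)
dt_model.fit(train_X, train_y)
rf_model.fit(train_X, train_y)

# Overfitting check: with distinct feature rows, the fully grown tree
# reproduces its own training targets, so its training MAE is ~0.
dt_train_mae = mean_absolute_error(train_y, dt_model.predict(train_X))

# Validation: predict on held-out rows and compare the MAE of both models.
dt_val_mae = mean_absolute_error(val_y, dt_model.predict(val_X))
rf_val_mae = mean_absolute_error(val_y, rf_model.predict(val_X))

print(f"Decision tree MAE on training rows:   {dt_train_mae:.2f}")
print(f"Decision tree MAE on validation rows: {dt_val_mae:.2f}")
print(f"Random forest MAE on validation rows: {rf_val_mae:.2f}")
```

The first printed value (close to 0) shows the overfitting effect described in step 11; only the two validation MAEs are a fair basis for comparing the models.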
Model Validation (Mean Absolute Error = MAE)
-
Predictive accuracy: what is the quality of the predictions the model
made? How close are the model's predictions to the actual results?
How to measure?
Compare the model's predicted values with the actual values from real-world data. There will be a mix of good and bad predictions.
Looking through a list of 10,000 predicted and actual values would be pointless. We need to summarize this into a single metric.
Mean Absolute Error (MAE): the average of the absolute value of (actual - predicted) over all rows. For example:
| Actual Calories | Predicted Calories |
|---|---|
| 200 | 190 |
| 300 | 310 |
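As a worked example with the two rows in the table, where $y_i$ is the actual value and $\hat{y}_i$ the predicted value:

$$
\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert = \frac{\lvert 200 - 190 \rvert + \lvert 300 - 310 \rvert}{2} = \frac{10 + 10}{2} = 10
$$

The same calculation as a small hand-written Python function (the name `mae` is hypothetical; in the project, scikit-learn's `mean_absolute_error` does this job):

```python
def mae(actual, predicted):
    """Average of |actual - predicted| over all rows."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual_calories = [200, 300]
predicted_calories = [190, 310]
print(mae(actual_calories, predicted_calories))  # 10.0
```

Taking the absolute value keeps the over-prediction (310 vs 300) from cancelling the under-prediction (190 vs 200).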
Random Forest performs better than Decision Tree here, because its MAE is lower.