Calorie prediction using DecisionTreeRegressor Model
-
Scenario
You run a company that needs to predict the calories athletes need, based on the duration and intensity (pulse, max pulse) of their workouts.
You need to develop an ML model that is trained on historical (Duration, Pulse, Maxpulse, Calories) records and predicts the calories burned during an athlete's workout.
Inputs
You are provided with a data.csv file that contains historical data from athletes' workout sessions.
# data.csv
Duration,Pulse,Maxpulse,Calories
60, 110, 130, 409.1
60, 117, 145, 479.0
60, 103, 135, 340.0
45, 109, 175, 282.4
Requirement
You need to write a machine learning model that takes (Duration, Pulse, Maxpulse) as input and predicts the calories burned by the athlete.
This way athletes can plan an appropriate calorie intake before and during their workouts.
Model doing prediction
1. Import pandas. We use it to load the CSV data into a DataFrame.
2. Import DecisionTreeRegressor from scikit-learn.
3. Import mean_absolute_error to calculate the MAE later in the code.
4. read_csv() (from pandas): read the CSV data into a DataFrame.
5. dropna() (from pandas): clean the data by removing rows with missing values.
6. Training data (features): 'Duration', 'Pulse', 'Maxpulse'. We have only one data set, from which we separate the training and prediction data.
7. Prediction data (target): 'Calories'.
8. Initialize the DecisionTreeRegressor model.
9. fit() (from scikit-learn): feed the training and prediction data to the model. This is supervised learning.
10. predict() (from scikit-learn): predict the calories for the training data. The model returns the same calorie values it was trained on. This is overfitting.
11. The model is now ready. Feed it real-world data.
12. predict() (from scikit-learn): the model predicts calories from the real-world data.
13. Validate the model using mean absolute error.
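
The steps above can be sketched roughly as follows. This is a minimal illustration, not the original code: it embeds the four sample rows from data.csv as a string so it runs on its own, and the `new_workouts` values and `random_state=1` are assumptions made for the example.

```python
import io

import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Stand-in for data.csv (the four sample rows shown earlier).
# With the real file, pd.read_csv("data.csv") would be used instead.
SAMPLE_CSV = """Duration,Pulse,Maxpulse,Calories
60, 110, 130, 409.1
60, 117, 145, 479.0
60, 103, 135, 340.0
45, 109, 175, 282.4
"""

# Read the CSV into a DataFrame and drop rows with missing values.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), skipinitialspace=True)
df = df.dropna()

# Separate the input columns from the column to predict.
X = df[["Duration", "Pulse", "Maxpulse"]]
y = df["Calories"]

# Initialize and train the model (supervised learning).
# random_state is fixed only to make the sketch reproducible.
model = DecisionTreeRegressor(random_state=1)
model.fit(X, y)

# Predicting on the training rows reproduces the training targets,
# which is the overfitting described in the steps above.
train_predictions = model.predict(X)
train_mae = mean_absolute_error(y, train_predictions)
print("MAE on training data:", train_mae)

# Hypothetical real-world workout (values made up for illustration).
new_workouts = pd.DataFrame({"Duration": [50], "Pulse": [112], "Maxpulse": [140]})
new_predictions = model.predict(new_workouts)
print("Predicted calories:", new_predictions[0])
```

Because the tree memorizes the rows it was trained on, the MAE measured on the training data says little about real-world accuracy. A meaningful MAE needs workouts whose actual calories are known but were not used in `fit()`.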
Model Validation (MAE)
-
Predictive accuracy: what is the quality of the predictions the model made? How close are the model's predictions to the actual results?
How to measure?
Compare the values the model predicts with the actual values observed in real-world data. There will be a mix of good and bad predictions.
Looking through a list of 10,000 predicted and actual values would be pointless. We need to summarize this into a single metric.
Mean Absolute Error (MAE): the average of the absolute values of (actual - predicted) over all predictions.
| Actual Calories | Predicted Calories |
|---|---|
| 200 | 190 |
| 300 | 310 |
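
As a small worked example, the MAE for the two rows in the table above can be computed by hand and checked against scikit-learn's `mean_absolute_error`. The variable names are chosen for this sketch only.

```python
from sklearn.metrics import mean_absolute_error

# Values taken from the table above.
actual = [200, 300]
predicted = [190, 310]

# By hand: (|200 - 190| + |300 - 310|) / 2 = (10 + 10) / 2
manual_mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# The same metric from scikit-learn.
sklearn_mae = mean_absolute_error(actual, predicted)

print(manual_mae)   # 10.0
print(sklearn_mae)
```

Both computations give an MAE of 10, meaning the predictions are off by 10 calories on average. The two errors have opposite signs, which is why the absolute value is taken before averaging: without it they would cancel to zero.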