During my many years as a delivery driver, I kept a record of how much I made in tips. I decided to use that information to build a model that predicts my daily tips. 🤑
To test the model, click here.

Model Features
For every shift, I logged the date, the day of the week, how many miles I drove, the reimbursement I received for mileage, my daily tips, and the total hours I spent on the road. I judged the mileage and reimbursement columns to be data leakage and left them out of the model. To add outside information, I pulled in historical weather data to get the daily precipitation (the "Prep" column). I also engineered a new feature to capture business demand: I used my average speed in miles per hour to label each day as normal, slow, or busy (the "Demand" column). I split the date into year, month, and day, and together with the day of the week and the hours worked, that gave me seven features for the model.
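The Demand feature can be sketched roughly as follows. This is a minimal illustration, not the exact code behind the model. The post does not give the speed cut-offs, so `SLOW_BELOW_MPH` and `BUSY_ABOVE_MPH` are hypothetical placeholders. The function name `demand_label` is also made up. The sketch assumes that a higher average speed means more deliveries and therefore a busier day.

```python
# Hypothetical cut-offs in miles per hour. The real thresholds are not stated,
# so these numbers are placeholders for illustration only.
SLOW_BELOW_MPH = 8.0
BUSY_ABOVE_MPH = 12.0


def demand_label(miles: float, hours: float) -> str:
    """Label a shift 'slow', 'normal', or 'busy' from its average speed.

    Assumes (hypothetically) that more miles per hour on the road means
    more deliveries, i.e. a busier day.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    mph = miles / hours
    if mph < SLOW_BELOW_MPH:
        return "slow"
    if mph > BUSY_ABOVE_MPH:
        return "busy"
    return "normal"


print(demand_label(miles=40, hours=6))  # about 6.7 mph -> slow
print(demand_label(miles=60, hours=6))  # 10 mph -> normal
print(demand_label(miles=90, hours=6))  # 15 mph -> busy
```

Binning the speed into three labels, instead of passing the raw value, keeps the feature simple to read in the importance results. The trade-off is that the model only sees whatever resolution the chosen thresholds allow.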

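# Prediction target: daily tips in dollars
target = 'Tips'
# Prep = daily precipitation; Demand = engineered slow/normal/busy label
features = ['Day_of_the_week', 'Year', 'Month', 'Day', 'Hours', 'Prep', 'Demand']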
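The sketch below shows one way a single logged shift could be turned into those seven features. It assumes the weekday is stored as an English name and not as a number 0 to 6. The helper name `make_feature_row` and the sample values (date, hours, precipitation) are hypothetical and do not come from the real log.

```python
from datetime import date

# Fixed English day names so the output does not depend on the system locale.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def make_feature_row(day: date, hours: float, prep: float, demand: str) -> dict:
    """Turn one logged shift into the seven model features.

    The date is split into Year, Month, Day and Day_of_the_week; 'Prep' is
    the day's precipitation and 'Demand' is the slow/normal/busy label.
    Storing the weekday as a name is an assumption made for this sketch.
    """
    return {
        "Day_of_the_week": DAY_NAMES[day.weekday()],
        "Year": day.year,
        "Month": day.month,
        "Day": day.day,
        "Hours": hours,
        "Prep": prep,
        "Demand": demand,
    }


features = ['Day_of_the_week', 'Year', 'Month', 'Day', 'Hours', 'Prep', 'Demand']
# Made-up example values, not a real shift.
row = make_feature_row(date(2019, 3, 15), hours=5.5, prep=0.2, demand="busy")
print([row[f] for f in features])
```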

Models
As a baseline, I wanted to see how accurate my predictions would be if I predicted the mean tip value of $66.34 for every day. That model had a mean absolute error (MAE) of $25.85, which gave me a benchmark to compare other models against. I then trained a linear regression model and a random forest model. On the validation data, the linear regression came out ahead, with an MAE of $13.47, roughly half the baseline's error.
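The mean baseline and its MAE are simple enough to write out by hand. This minimal sketch runs on a few made-up tip amounts, not the real delivery data. It takes the mean from the training tips and predicts that mean for every validation day.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all days."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_baseline(train_tips, n_predictions):
    """Predict the training-set mean tip for every day."""
    mean_tip = sum(train_tips) / len(train_tips)
    return [mean_tip] * n_predictions


# Made-up tip amounts for illustration only.
train_tips = [50.0, 70.0, 80.0, 60.0]
val_tips = [40.0, 90.0, 65.0]

preds = mean_baseline(train_tips, len(val_tips))
print(preds[0])                               # 65.0
print(mean_absolute_error(val_tips, preds))   # about 16.67
```

In scikit-learn, the same baseline is available as `DummyRegressor(strategy="mean")` from `sklearn.dummy`, and the metric as `mean_absolute_error` from `sklearn.metrics`.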

Insights
I used the eli5 library to compute permutation importances for the model's features. I was pleased to see that the “Demand” feature I created was extremely useful in predicting tips. Precipitation was not as important in determining tips as I had thought.
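Permutation importance works like this: shuffle one feature's column, re-score the model, and see how much worse it gets. eli5 wraps this idea in `eli5.sklearn.PermutationImportance`. The sketch below is a from-scratch illustration, not eli5's implementation. eli5 measures the drop in the model's score. This version instead measures the rise in MAE. The toy model and data are hypothetical, and `toy_predict` deliberately ignores its second column.

```python
import random


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Importance of each column = average increase in MAE when that column
    is shuffled, which breaks its link to the target but keeps its values."""
    rng = random.Random(seed)
    baseline = mae(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild each row with only this column replaced.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            increases.append(mae(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy model: the target depends only on column 0; column 1 is ignored.
def toy_predict(row):
    return 10 * row[0]


X = [[1, 5], [2, 3], [3, 9], [4, 1], [5, 7]]
y = [10, 20, 30, 40, 50]
print(permutation_importance(toy_predict, X, y))
```

Column 0 gets a positive score and column 1 scores exactly zero, because the model never looks at it. A feature like "Demand" scores high in the same way: scrambling it noticeably hurts the predictions.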


Check out my deployed application here.