guidelines - parallelized 2023

Overview
Getting Started
Understanding the Problem
Task 1: Calculating Overall Customer Satisfaction
Task 2: Calculating Driver Efficiency
Task 3: Analyse Driver Efficiency
Task 4: Calculate Driver Performance
Task 5: Calculate Driver Loyalty
Task 6: Calculate Paycheck for Drivers
Task 7: Calculate Bonus for Drivers
Task 8: Implement ‘insert_analysis_row’
Tying It All Together (5% Bonus Score)
Time to Optimize!
Time Yourself
Submission
Scoring

Results

Parallelized 2023 has ended! See the results!

Overview

You work for a thriving ride-sharing company that connects hundreds of drivers with passengers daily. As part of your role, you need to determine which drivers deserve bonuses based on their performance over the last quarter. To make data-driven decisions regarding the situation, you need to assess driver efficiency and pinpoint reasons for inefficiencies. Additionally, you want to incorporate a way to automate payout calculations that include bonuses for drivers doing well (in terms of efficiency and ratings). Lastly, you’ve been asked to determine the overall customer satisfaction (average overall ratings across all rides) and identify which drivers are negative outliers in this regard (those with very low average ratings).

Getting Started

Please fork this repository: https://github.com/parallelized/parallelized-2023-starter. Here’s a helpful link to work with Git: Git Cheatsheet

Understanding the Problem

Let’s examine our file structure!

parallelized-starter/
│
├── data/
│   ├── 2023-11-02.db         # SQLite database containing driver and ride data
│
├── src/
│   ├── main.py                # Main script for data processing
│   ├── database.py            # Database-related functions for data retrieval and insertion
│   ├── metrics.py             # Functions for calculating efficiency metrics
│
├── results/
│   ├── efficiency_results.csv  # CSV file to store efficiency metrics results

We’ve been given a pretty simple structure - a SQLite database which contains driver data from the last quarter, a main script, and two files for functions needed for the task. The SQLite database has two tables: drivers and rides. Let’s go through a few rows of each.

drivers
rides

Please examine the columns of both tables to understand the parameters we have. For today, our job is to implement 8 main functions. calculate_overall_customer_satisfaction, calculate_driver_effiency, analyse_driver_efficiency, calculate_driver_performance, calculate_driver_loyalty, calculate_paycheck, calculate_bonus and insert_analysis_row Along the way, you can implement any required helper functions as needed (some have been provided already in database.py)

Task 1: Calculating Overall Customer Satisfaction

Implement a function calculate_overall_custom_satisfaction: should get the average ratings overall rides for all drivers (average of ratings in the ‘rides’ table).

Task 2: Calculating Driver Efficiency

Implement a function calculate_driver_effiency: should analyze a driver’s overall efficiency based on their average cost/km. This value will be stored.

Task 3: Analyse Driver Efficiency

Implement a function analyse_driver_efficiency: should analyze why these inefficiencies are occurring in the first place. There are three likely reasons that a driver could be inefficient.

They have a poor average cost/km with a specific vehicle (higher than Rs. 2.7 for 2-wheelers, higher than Rs. 3.6 for 3-wheelers, higher than Rs. 15 for 4-wheelers, and higher than Rs. 8 for other): “HIGHCOST”
They are very slow in terms of average speed across all rides (below 20km/h): “SLOW”
They have faced poor traffic (the majority of their rides are ‘POOR’ or ‘VPOOR’ - consider a 50/50 split poor): “TRAFFIC” You should find which one of these three applies to each driver, and add to a comma-separated string. For eg. if a driver is poor average cost/km and poor traffic then it would be “HIGHCOST,TRAFFIC” This string will eventually be stored.

Task 4: Calculate Driver Performance

Implement a function calculate_driver_performance which should compute the average ratings of a driver. This will be needed eventually to calculate bonuses.

Task 5: Calculate Driver Loyalty

Implement a function calculate_driver_loyalty which should calculate the loyalty of a driver as follows: Loyalty is calculated on a scale of 1 to 5: lowest to highest. It is based on the number of rides.

Less than 200: 1
Between 200-300: 2
Between 300-400: 3
Between 400-500: 4
Above 500: 5

This will eventually be needed to calculate bonuses.

Task 6: Calculate Paycheck for Drivers

Implement a function calculate_paycheck which should calculate the total paycheck for each driver. Drivers are paid with a base fee of Rs. 50 with Rs. 10 per kilometer driven and Rs. 5 per minute driven. Calculate the total paycheck for each driver. This will eventually be stored.

Task 7: Calculate Bonus for Drivers

Implement a function calculate_bonus which should calculate the bonus for each driver. An additional bonus, up to a limit of 65% of their total costs for the quarter, is given. The function should return the bonus value in rupees. It is calculated as per their loyalty, performance, and efficiency.

Loyalty to the percentage of cost added to total bonus → 5: 30%, 4: 20%, 3: 15%, 2: 2.5%, 1: 0% (Eg. a loyalty of 5 with a total cost of Rs. 1200 means that Rs. 360 is added as a bonus)
Performance → >=4.5: 20%, >=3: 10%, Else: None
Efficiency → if ‘SLOW’ or ‘HIGHCOST’ are in their analysis, then there is no bonus given. If no inefficiencies, 10% bonus. If there is only high traffic, a 5% bonus.

Task 8: Implement ‘insert_analysis_row’

Let’s take a look at the below snippet from the table schematic for ‘driver_analyses’, which is the table that you will work with while implementing this function.

CREATE TABLE IF NOT EXISTS driver_analyses (driver_id INTEGER PRIMARY KEY NOT NULL, averageSpeed REAL NOT NULL, inefficiencyReason TEXT, payout REAL NOT NULL, bonus REAL NOT NULL);

Examine the fields in this table. Your function should insert a new row into the SQLite table ‘driver_analyses’. Pretty straightforward: just a bit of tying stuff together!

Tying It All Together (5% Bonus Score)

You’ve created all these functions. Now it’s time to tie it all together! In ‘main.py’, correctly insert a row with all the required fields (you just made a function to help you with this!) for every single driver. Take a look at the ‘drivers’ table and make sure that there is a row for all the drivers (check the row counts).

Time to Optimize!

Optimize what you’ve done so far as well as you can!

Time Yourself

You can time your implementations using the built-in Python library time. Here’s an example that times calculate_bonus for a certain driver.

# Let's assume that your implementation of calculate_bonus takes in a driver_id as its only argument, we can time it as below.

# Importing your calculate_bonus function
from metrics import calculate_bonus

# Importing the required functions from the time library
from time import time, process_time

# Take start time and CPU time
start_time = time()
start_cpu_time = process_time()

# Run the function n times (20 in this case) to account for possible variations in run-by-run time
for i in range(20):
    calculate_bonus(123)

# Get ending time and CPU time
end_time = time()
end_cpu_time = process_time()

print(f"Time taken: {end_time - start_time}")
print(f"CPU Time taken: {end_cpu_time - start_cpu_time}")

Submission

Please send submissions as ZIP files using WeTransfer to varun@parallelized.in

Scoring

Evaluations are based on functionality and speed. Your grade consists of 50% for successful implementations, 50% for speed improvement over a naive staff implementation, and a 5% bonus for integrating everything together!

Table of Contents