Table of contents
Intro
In this project, we will analyze data for a telecom company called Techconnect. At the end of this project, you will have a fair understanding of how to approach data analysis when working with a large database.
Why Use SQL?
Before we dive into the project, let's briefly discuss why we use SQL for data analysis. While Python is a reliable tool for most data analysis tasks, it can encounter inefficiencies when dealing with large datasets containing millions of rows. This is where SQL shines. SQL was designed to analyze massive amounts of data efficiently, allowing us to get insights from large databases in seconds.
The database
We will use Microsoft SQL Server to run all our SQL code for this project. Connect to SQL Server using the following information:
Server Name: dap-projects-database.database.windows.net
Username: hackathon
Password: ---------------------
Database Name: dapDB
You will find a dapDB database with 3 tables named customers, location, and current_offer_balance.
Case Study
Let me introduce you to TechConnect, a mobile services provider offering voice, data, SMS, and various value-added services to its vast customer base. To stay competitive, TechConnect relies on data-driven decision-making and customer-centric strategies. Our task as data analysts is to transform raw data into actionable insights that will shape the company's decisions, product offerings, and marketing campaigns.
Dataset Description
Table: dbo.current_officers_balance
The "current_officers_balance" table was the backbone of TechConnect's data infrastructure. It stored detailed information about each customer's account and usage patterns. The table held records of customer IDs, call durations, data consumption, messaging activities, revenue details, and various other metrics. This data allowed TechConnect to gain insights into customer behavior, identify potential upselling opportunities, and provide personalized services.
Table: dbo.customers
The "customers" table contained a comprehensive list of all TechConnect's customers. It included activation dates, tenure information, and each customer's spending habits. This data helped the company understand the customer lifecycle, and loyalty trends and tailor marketing strategies based on customer preferences and behavior.
Table: dbo.location
The "location" table held vital information about the geographical distribution of customers and the network infrastructure. It stored site IDs, regions, and cell types associated with specific cellular towers and base stations.
This data will enable TechConnect to optimize network coverage, identify potential network expansion areas, and assess regional performance.
TechConnect's data analysts and business intelligence teams (You
) will work diligently to transform raw data into actionable insights. The insights generated from the database will play a pivotal role in shaping the company's decisions, product offerings, and marketing campaigns.
Data Analysis
The company is still in its early stages, so even though we are making money and we have achieved product-market-fit (creating a product or service that people are willing to pay for), we must ask important questions that will ensure that the company stays afloat.
The most important question we need to answer is: Why are people paying for our services? What exactly are they paying for, and who are our customers? To reduce cost and make more money, we must pinpoint exactly what our customers need most. This is the goal of any business analysis. Let's dive into it!
Questions
Customer spending behavior
1.0 How do customers spend money on different services such as voice, data, SMS, and other offerings? Identify the most popular services among customers.
SELECT
'Voice' AS Service,
SUM(VOICE_SPENT) AS TotalSpent
FROM
dbo.customers
UNION ALL
SELECT
'Data' AS Service,
SUM(DATA_SPENT) AS TotalSpent
FROM
dbo.customers
UNION ALL
SELECT
'SMS' AS Service,
SUM(SMS_SPENT) AS TotalSpent
FROM
dbo.customers
UNION ALL
SELECT
'Other Services' AS Service,
SUM(OTHER_SPENT) AS TotalSpent
FROM
dbo.customers
ORDER BY TotalSpent DESC;
1.1 Result
Voice is the most popular service that our customers spend money on. This means our customers mostly make calls, send text messages, and buy data.
Result of the query
You only have read permissions; however, right-click on the results to save the results as a CSV file for your visualizations.
Column chart of spending Vs service
Since voice is the most popular service, lets take a closer look at the customers that make calls.
2.0 What is the device_type of customers that spend money to make calls?
--Voice Spending Variation Based on Customer's Device Type
-- Query to get voice spending for smartphone and non-smartphone users
SELECT
CASE WHEN dbo.location.cell_type = 2 THEN 'Smartphone' --assuming 2 is smartphone
WHEN dbo.location.cell_type = 3 THEN 'Non-Smartphone'
ELSE 'Other-Devices' --null values as other-devices
END AS device_type,
SUM(VOICE_SPENT) AS sum_voice_spent
--AVG(VOICE_SPENT) AS average_voice_spent we can't use mean cuz of outliers
FROM dbo.customers
LEFT JOIN dbo.location
ON dbo.customers.Cell_ID = dbo.location.CELL_ID
GROUP BY dbo.location.cell_type;
2.1 Result
Majority of our voice customers are smartphone users.
Result of the query
Column chart of Voice_spent Vs Device_type
3.0 Top recharge method used by customers
--top recharge method used by customers
SELECT
SUM(RECHARGE_EVD_AMOUNT) AS evd,
SUM(RECHARGE_MOMO_AMOUNT) AS momo,
SUM(RECHARGE_CARD_AMOUNT) AS recharge_card
FROM dbo.customers;
3.1 Result
Most customers use recharge card as their recharge method
4.0 Can existing customers be upsold to higher-value data plans?
SELECT TOP 10
Customer_ID,
SUM(Data_Vol_MB) AS Total_Data_MB,
SUM(TOTAL_RECHARGE_AMOUNT) AS TOTAL_RECHARGE_AMOUNT
FROM dbo.customers
GROUP BY Customer_ID
HAVING SUM(Data_Vol_MB) = 0
ORDER BY TOTAL_RECHARGE_AMOUNT DESC;
4.1 Result
Yes, some customers have high recharge amount but no data at all, so they can be convinced to buy data.
It's good to know what customers want, but where exactly is the money coming from
Now let's look at the MONEY.
Main sources of revenue
1.0 Who are our best customers?
--Top 5 customers by total revenue:
SELECT TOP 5
cob.Customer_ID AS top_customers_by_revenue,
SUM(cob.tot_rev) AS total_revenue,
SUM(cust.VOICE_SPENT + cust.DATA_SPENT + cust.SMS_SPENT + cust.OTHER_SPENT) AS total_spending
FROM dbo.current_offers_balance AS cob
JOIN dbo.customers cust ON cob.Customer_ID = cust.Customer_ID
GROUP BY cob.Customer_ID
ORDER BY total_revenue DESC;
--Top 5 customers by spending
SELECT TOP 5
cob.Customer_ID AS top_customers_by_spending,
SUM(cob.tot_rev) AS total_revenue,
SUM(cust.VOICE_SPENT + cust.DATA_SPENT + cust.SMS_SPENT + cust.OTHER_SPENT) AS total_spending
FROM dbo.current_offers_balance AS cob
JOIN dbo.customers cust ON cob.Customer_ID = cust.Customer_ID
GROUP BY cob.Customer_ID
ORDER BY total_spending DESC;
1.1 Result
From the tables below, our top spenders don’t make us the most money, but customers who make us the most (total_revenue) may not be the best spenders. This goes to show that revenue does not correlate with spending.
top customers by revenue and spending
2.0 What are the main sources of revenue for the company?
--Revenue breakdown by revenue types
SELECT
SUM(cob.rev_data_total) AS data_revenue,
SUM(cob.rev_voice_int) AS voice_revenue,
SUM(cob.rev_other_vas) AS vas_revenue,
SUM(cob.rev_rentals) AS rentals_revenue
FROM dbo.current_offers_balance cob;
2.1 Result
Voice is the major revenue source of the company. However, we are losing a lot of money from rentals. This means our customers borrow data, voice, and value-added services from us; this borrowing of services is causing us to lose potential revenue.
3.0 Do we make more money from incoming calls and texts sent to our customers, or otherwise?
--percentage of total revenue from services coming in Vs out
SELECT
(SUM(tot_rev_in) / SUM(tot_rev)) * 100 AS total_rev_in_percentage,
(SUM(tot_rev_out) / SUM(tot_rev)) * 100 AS total_rev_out_percentage
FROM dbo.current_offers_balance;
3.1 Result
We make more money from outgoing services when our customers send text messages to outside networks or when our customers make calls to other networks.
4.0 How much money have we lost from customers who left (churned).
-- Calculate total revenue for each distinct value of status (pie chart)
SELECT
'ACTIVE' AS status, SUM(tot_rev) AS tot_rev
FROM dbo.current_offers_balance
WHERE status = 'ACTIVE'
UNION ALL
SELECT
'DORMANT & INACTIVE' AS status, SUM(tot_rev) AS tot_rev
FROM dbo.current_offers_balance
WHERE status IN ('DORMANT', 'INACTIVE')
UNION ALL
SELECT
'CHURNED' AS status, SUM(tot_rev) AS tot_rev
FROM dbo.current_offers_balance
WHERE status IN ('CHURNED', 'CHURN');
4.1 Result
From the table below, it appears that we didn’t make much money from customers who left or churned; compared to revenue from our active customers.
5.0 Do we make more money from international services and transactions?
--international vs domestic voice revenue
SELECT
SUM(rev_voice_int) + SUM(rev_voice_roam_incoming) + SUM(rev_voice_roam_outgoing) AS total_international_revenue,
SUM(tot_rev) - (SUM(rev_voice_int) + SUM(rev_voice_roam_incoming) + SUM(rev_voice_roam_outgoing)) AS total_domestic_revenue
FROM dbo.current_offers_balance;
5.1 Result
No, we make more money locally.
6.0 Which handset models bring more revenue?
-- Calculate total revenue for each distinct value of hs_model
SELECT TOP 5 hs_model, SUM(tot_rev) AS total_revenue
FROM dbo.current_offers_balance
GROUP BY hs_model
ORDER BY SUM(tot_rev) DESC;
6.1 Result
We make the most money from android smartphone users.
Revenue and customer engagement performance across different regions.
1.0 Compare customers' spending patterns in different regions and identify which services are more popular in each region.
--Compare the spending patterns of customers in different regions and
-- Identify which services are more popular in each region
SELECT TOP 5
loc.region,
SUM(cust.VOICE_SPENT) AS total_voice_spent,
SUM(cust.DATA_SPENT) AS total_data_spent,
SUM(cust.SMS_SPENT) AS total_sms_spent,
CASE
WHEN SUM(cust.VOICE_SPENT) > SUM(cust.DATA_SPENT)
AND SUM(cust.VOICE_SPENT) > SUM(cust.SMS_SPENT)
AND SUM(cust.VOICE_SPENT) > SUM(cust.OTHER_SPENT) THEN 'Voice'
WHEN SUM(cust.DATA_SPENT) > SUM(cust.VOICE_SPENT)
AND SUM(cust.DATA_SPENT) > SUM(cust.SMS_SPENT)
AND SUM(cust.DATA_SPENT) > SUM(cust.OTHER_SPENT) THEN 'Data'
WHEN SUM(cust.SMS_SPENT) > SUM(cust.VOICE_SPENT)
AND SUM(cust.SMS_SPENT) > SUM(cust.DATA_SPENT)
AND SUM(cust.SMS_SPENT) > SUM(cust.OTHER_SPENT) THEN 'SMS'
ELSE 'Other'
END AS popular_service
FROM
dbo.customers AS cust
JOIN
dbo.location AS loc ON cust.Cell_ID = loc.CELL_ID
GROUP BY
loc.region
ORDER BY
total_voice_spent DESC; --we order by voice since voice is the most popular service across all regions;
1.1 Result
The top regions are the “Greater Accra Region” and the “Ashanti Region” by spending. Voice is the most popular service for all regions.
Spending rate might not determine revenue, meaning we might make more money from regions where customers spent less. So, let us look at regional revenue.
2.0 What are the most profitable regions?
--Compare the total_revenue of customers in different regions
SELECT TOP 5
l.region AS Region,
SUM(cob.tot_rev) AS Total_Revenue
FROM dbo.current_offers_balance cob
JOIN dbo.customers cu ON cu.Customer_ID = cob.Customer_ID
JOIN dbo.location l ON cu.Cell_ID = l.CELL_ID
GROUP BY l.region
ORDER BY Total_Revenue DESC;
2.1 Result
The top regions are the “Greater Accra Region” and the “Western Region” by revenue.
Analyze customer churn
1.0 Which regions have the highest churn rate?
--Actionable insight: Use insights from high revenue regions to create personalized offerings and improve customer experiences in other regions.
--regions with the highest churn rate
--The Volta Region has the highest churn rate but 3rd highest revenue
SELECT TOP 5
l.region,
AVG(c.tenure) AS avg_tenure,
SUM(CASE WHEN co.churn_date IS NOT NULL THEN 1 ELSE 0 END) * 100.0 / COUNT(c.customer_id) AS churn_rate_percentage --if there is a date for churn, then true or 1
FROM dbo.location l
LEFT JOIN dbo.customers c ON l.cell_id = c.cell_id
LEFT JOIN dbo.current_offers_balance co ON c.customer_id = co.customer_id
GROUP BY l.region
ORDER BY churn_rate_percentage DESC;
1.1 Result
The Volta Region has the highest churn rate but brings the 3rd highest revenue.
2.0 Evaluate the effectiveness of the loyalty program. How customers engage with it.
The loyalty program was implemented to encourage our active customers to stay with us and continue to pay for our services.
SELECT DISTINCT loyalty_points_balance
FROM dbo.current_offers_balance;
SELECT DISTINCT loyalty_points_redeemed
FROM dbo.current_offers_balance;
SELECT DISTINCT loyalty_points_earned
FROM dbo.current_offers_balance;
2.1 Result
It's time to try something other than loyalty points because customers have not responded. Either that or it was recently introduced and needs some time to kick off.
Track customers' adoption of different mobile network technologies (2G).
1.0 Compare the revenue from customers on 2G networks to those from others.
SELECT '2G' AS nw_2g_ind, SUM(tot_rev) AS Total_Revenue
FROM dbo.current_offers_balance WHERE nw_2g_ind = -1
UNION ALL
SELECT '3G' AS nw_2g_ind, SUM(tot_rev) AS Total_Revenue
FROM dbo.current_offers_balance WHERE nw_2g_ind = 0
UNION ALL
SELECT '4G' AS nw_2g_ind, SUM(tot_rev) AS Total_Revenue
FROM dbo.current_offers_balance WHERE nw_2g_ind = 1;
1.1 Result
We should advertise to those using 2G network to upgrade to 3G or 4G.
Conclusion
After queries have been written and beautiful dashboards have been built, the single most important thing a data analyst has to give is actionable insights. So from our analysis, here are some things I recommend:
- Focus on Voice Services*: Voice is the most popular service among customers, so it is essential to maintain and improve the voice service quality. Additionally, consider introducing voice-based value-added services to engage customers further.*
- Target Smartphone Users*: the majority of voice customers are smartphone users. This presents an opportunity to create smartphone-specific offers, promotions, and services to cater to this segment.*
- Promote Data Plans to High-Recharge Customers*: Identify customers with high recharge amounts but do not use data. Target these customers with personalized offers to upsell higher-value data plans.*
- Optimize Recharge Methods*: As most customers use recharge cards, consider promoting mobile money and electronic voucher distribution (EVD) methods to increase convenience and accessibility.*
- Address Churn Rate in Volta Region*: The Volta Region has the highest churn rate despite being a high-revenue region. Investigate the reasons behind the churn and implement strategies to improve customer retention in this area.*
- Reduce Losses from Rentals*: Analyze the rental services and identify ways to reduce losses from customers borrowing services. Encourage customers to opt for regular data and voice plans instead of rentals.*
- Upgrade 2G Customers*: Encourage customers on 2G networks to upgrade to 3G or 4G to capitalize on the revenue potential from these advanced services.*