Data Analysis with SQL

Data Analysis with SQL

Intro

In this project, we will analyze data for a telecom company called Techconnect. At the end of this project, you will have a fair understanding of how to approach data analysis when working with a large database.

Why Use SQL?

Before we dive into the project, let's briefly discuss why we use SQL for data analysis. While Python is a reliable tool for most data analysis tasks, it can encounter inefficiencies when dealing with large datasets containing millions of rows. This is where SQL shines. SQL was designed to analyze massive amounts of data efficiently, allowing us to get insights from large databases in seconds.

The database

We will use Microsoft SQL Server to run all our SQL code for this project. Connect to SQL Server using the following information:

You will find a dapDB database with 3 tables named customers, location, and current_offer_balance.

Case Study

Let me introduce you to TechConnect, a mobile services provider offering voice, data, SMS, and various value-added services to its vast customer base. To stay competitive, TechConnect relies on data-driven decision-making and customer-centric strategies. Our task as data analysts is to transform raw data into actionable insights that will shape the company's decisions, product offerings, and marketing campaigns.

Dataset Description

Table: dbo.current_officers_balance

The "current_officers_balance" table was the backbone of TechConnect's data infrastructure. It stored detailed information about each customer's account and usage patterns. The table held records of customer IDs, call durations, data consumption, messaging activities, revenue details, and various other metrics. This data allowed TechConnect to gain insights into customer behavior, identify potential upselling opportunities, and provide personalized services.

Table: dbo.customers

The "customers" table contained a comprehensive list of all TechConnect's customers. It included activation dates, tenure information, and each customer's spending habits. This data helped the company understand the customer lifecycle, and loyalty trends and tailor marketing strategies based on customer preferences and behavior.

Table: dbo.location

The "location" table held vital information about the geographical distribution of customers and the network infrastructure. It stored site IDs, regions, and cell types associated with specific cellular towers and base stations.

This data will enable TechConnect to optimize network coverage, identify potential network expansion areas, and assess regional performance.

TechConnect's data analysts and business intelligence teams (You

winking face

) will work diligently to transform raw data into actionable insights. The insights generated from the database will play a pivotal role in shaping the company's decisions, product offerings, and marketing campaigns.

Data Analysis

The company is still in its early stages, so even though we are making money and we have achieved product-market-fit (creating a product or service that people are willing to pay for), we must ask important questions that will ensure that the company stays afloat.

The most important question we need to answer is: Why are people paying for our services? What exactly are they paying for, and who are our customers? To reduce cost and make more money, we must pinpoint exactly what our customers need most. This is the goal of any business analysis. Let's dive into it!

Questions

Customer spending behavior

1.0 How do customers spend money on different services such as voice, data, SMS, and other offerings? Identify the most popular services among customers.

SELECT

    'Voice' AS Service,

    SUM(VOICE_SPENT) AS TotalSpent

FROM

    dbo.customers

UNION ALL

SELECT

    'Data' AS Service,

    SUM(DATA_SPENT) AS TotalSpent

FROM

    dbo.customers

UNION ALL

SELECT

    'SMS' AS Service,

    SUM(SMS_SPENT) AS TotalSpent

FROM

    dbo.customers

UNION ALL

SELECT

    'Other Services' AS Service,

    SUM(OTHER_SPENT) AS TotalSpent

FROM

    dbo.customers

ORDER BY TotalSpent DESC;

1.1 Result

Voice is the most popular service that our customers spend money on. This means our customers mostly make calls, send text messages, and buy data.

Result of the query

You only have read permissions; however, right-click on the results to save the results as a CSV file for your visualizations.

Column chart of spending Vs service

Since voice is the most popular service, lets take a closer look at the customers that make calls.

2.0 What is the device_type of customers that spend money to make calls?

 --Voice Spending Variation Based on Customer's Device Type


-- Query to get voice spending for smartphone and non-smartphone users

SELECT 

    CASE WHEN dbo.location.cell_type = 2 THEN 'Smartphone' --assuming 2 is smartphone

                 WHEN dbo.location.cell_type = 3 THEN 'Non-Smartphone'

                 ELSE 'Other-Devices' --null values as other-devices

                 END AS device_type,

    SUM(VOICE_SPENT) AS sum_voice_spent

        --AVG(VOICE_SPENT) AS average_voice_spent we can't use mean cuz of outliers

FROM dbo.customers

LEFT JOIN dbo.location 

ON dbo.customers.Cell_ID = dbo.location.CELL_ID

GROUP BY dbo.location.cell_type;

2.1 Result

Majority of our voice customers are smartphone users.

Result of the query

Column chart of Voice_spent Vs Device_type

3.0 Top recharge method used by customers

--top recharge method used by customers

SELECT 

       SUM(RECHARGE_EVD_AMOUNT) AS evd,

       SUM(RECHARGE_MOMO_AMOUNT) AS momo,

       SUM(RECHARGE_CARD_AMOUNT) AS recharge_card

FROM dbo.customers;

3.1 Result

Most customers use recharge card as their recharge method

4.0 Can existing customers be upsold to higher-value data plans?

SELECT TOP 10

    Customer_ID,

    SUM(Data_Vol_MB) AS Total_Data_MB,

    SUM(TOTAL_RECHARGE_AMOUNT) AS TOTAL_RECHARGE_AMOUNT

FROM dbo.customers

GROUP BY Customer_ID

HAVING SUM(Data_Vol_MB) = 0

ORDER BY TOTAL_RECHARGE_AMOUNT DESC;

4.1 Result

Yes, some customers have high recharge amount but no data at all, so they can be convinced to buy data.

It's good to know what customers want, but where exactly is the money coming from

thinking face

Now let's look at the MONEY.

money-mouth face

Main sources of revenue

1.0 Who are our best customers?

--Top 5 customers by total revenue:

SELECT TOP 5

    cob.Customer_ID AS top_customers_by_revenue,

    SUM(cob.tot_rev) AS total_revenue,

        SUM(cust.VOICE_SPENT + cust.DATA_SPENT + cust.SMS_SPENT + cust.OTHER_SPENT) AS total_spending

FROM dbo.current_offers_balance AS cob

JOIN dbo.customers cust ON cob.Customer_ID = cust.Customer_ID

GROUP BY cob.Customer_ID

ORDER BY total_revenue DESC;


--Top 5 customers by spending

SELECT TOP 5

    cob.Customer_ID AS top_customers_by_spending,

    SUM(cob.tot_rev) AS total_revenue,

    SUM(cust.VOICE_SPENT + cust.DATA_SPENT + cust.SMS_SPENT + cust.OTHER_SPENT) AS total_spending

FROM dbo.current_offers_balance AS cob

JOIN dbo.customers cust ON cob.Customer_ID = cust.Customer_ID

GROUP BY cob.Customer_ID

ORDER BY total_spending DESC;

1.1 Result

From the tables below, our top spenders don’t make us the most money, but customers who make us the most (total_revenue) may not be the best spenders. This goes to show that revenue does not correlate with spending.

top customers by revenue and spending

2.0 What are the main sources of revenue for the company?

--Revenue breakdown by revenue types

SELECT

    SUM(cob.rev_data_total) AS data_revenue,

    SUM(cob.rev_voice_int) AS voice_revenue,

    SUM(cob.rev_other_vas) AS vas_revenue,

    SUM(cob.rev_rentals) AS rentals_revenue

FROM dbo.current_offers_balance cob;

2.1 Result

Voice is the major revenue source of the company. However, we are losing a lot of money from rentals. This means our customers borrow data, voice, and value-added services from us; this borrowing of services is causing us to lose potential revenue.

3.0 Do we make more money from incoming calls and texts sent to our customers, or otherwise?

--percentage of total revenue from services coming in Vs out

SELECT

        (SUM(tot_rev_in) / SUM(tot_rev)) * 100 AS total_rev_in_percentage,

        (SUM(tot_rev_out) / SUM(tot_rev)) * 100 AS total_rev_out_percentage

FROM dbo.current_offers_balance;

3.1 Result

We make more money from outgoing services when our customers send text messages to outside networks or when our customers make calls to other networks.

4.0 How much money have we lost from customers who left (churned).

-- Calculate total revenue for each distinct value of status (pie chart)

SELECT

    'ACTIVE' AS status, SUM(tot_rev) AS tot_rev

FROM dbo.current_offers_balance

WHERE status = 'ACTIVE'
UNION ALL
SELECT

    'DORMANT & INACTIVE' AS status, SUM(tot_rev) AS tot_rev

FROM dbo.current_offers_balance

WHERE status IN ('DORMANT', 'INACTIVE')


UNION ALL


SELECT

    'CHURNED' AS status, SUM(tot_rev) AS tot_rev

FROM dbo.current_offers_balance

WHERE status IN ('CHURNED', 'CHURN');

4.1 Result

From the table below, it appears that we didn’t make much money from customers who left or churned; compared to revenue from our active customers.

5.0 Do we make more money from international services and transactions?

--international vs domestic voice revenue

SELECT

    SUM(rev_voice_int) + SUM(rev_voice_roam_incoming) + SUM(rev_voice_roam_outgoing) AS total_international_revenue,

    SUM(tot_rev) - (SUM(rev_voice_int) + SUM(rev_voice_roam_incoming) + SUM(rev_voice_roam_outgoing)) AS total_domestic_revenue

FROM dbo.current_offers_balance;

5.1 Result

No, we make more money locally.

6.0 Which handset models bring more revenue?

-- Calculate total revenue for each distinct value of hs_model

SELECT TOP 5 hs_model, SUM(tot_rev) AS total_revenue

FROM dbo.current_offers_balance

GROUP BY hs_model

ORDER BY SUM(tot_rev) DESC;

6.1 Result

We make the most money from android smartphone users.

Revenue and customer engagement performance across different regions.

1.0 Compare customers' spending patterns in different regions and identify which services are more popular in each region.

--Compare the spending patterns of customers in different regions and

-- Identify which services are more popular in each region

SELECT TOP 5

    loc.region,

    SUM(cust.VOICE_SPENT) AS total_voice_spent,

    SUM(cust.DATA_SPENT) AS total_data_spent,

    SUM(cust.SMS_SPENT) AS total_sms_spent,

    CASE

        WHEN SUM(cust.VOICE_SPENT) > SUM(cust.DATA_SPENT)

            AND SUM(cust.VOICE_SPENT) > SUM(cust.SMS_SPENT)

            AND SUM(cust.VOICE_SPENT) > SUM(cust.OTHER_SPENT) THEN 'Voice'

        WHEN SUM(cust.DATA_SPENT) > SUM(cust.VOICE_SPENT)

            AND SUM(cust.DATA_SPENT) > SUM(cust.SMS_SPENT)

            AND SUM(cust.DATA_SPENT) > SUM(cust.OTHER_SPENT) THEN 'Data'

        WHEN SUM(cust.SMS_SPENT) > SUM(cust.VOICE_SPENT)

            AND SUM(cust.SMS_SPENT) > SUM(cust.DATA_SPENT)

            AND SUM(cust.SMS_SPENT) > SUM(cust.OTHER_SPENT) THEN 'SMS'

        ELSE 'Other'

    END AS popular_service

FROM

    dbo.customers AS cust

JOIN

    dbo.location AS loc ON cust.Cell_ID = loc.CELL_ID

GROUP BY

    loc.region

ORDER BY

        total_voice_spent DESC; --we order by voice since voice is the most popular service across all regions;

1.1 Result

The top regions are the “Greater Accra Region” and the “Ashanti Region” by spending. Voice is the most popular service for all regions.

Spending rate might not determine revenue, meaning we might make more money from regions where customers spent less. So, let us look at regional revenue.

2.0 What are the most profitable regions?

--Compare the total_revenue of customers in different regions

SELECT TOP 5

        l.region AS Region, 

        SUM(cob.tot_rev) AS Total_Revenue

FROM dbo.current_offers_balance cob

JOIN dbo.customers cu ON cu.Customer_ID = cob.Customer_ID

JOIN dbo.location l ON cu.Cell_ID = l.CELL_ID

GROUP BY l.region

ORDER BY Total_Revenue DESC;

2.1 Result

The top regions are the “Greater Accra Region” and the “Western Region” by revenue.

Analyze customer churn

1.0 Which regions have the highest churn rate?

--Actionable insight: Use insights from high revenue regions to create personalized offerings and improve customer experiences in other regions.

--regions with the highest churn rate

--The Volta Region has the highest churn rate but 3rd highest revenue

SELECT TOP 5

           l.region,

       AVG(c.tenure) AS avg_tenure,

       SUM(CASE WHEN co.churn_date IS NOT NULL THEN 1 ELSE 0 END) * 100.0 / COUNT(c.customer_id) AS churn_rate_percentage --if there is a date for churn, then true or 1

FROM dbo.location l

LEFT JOIN dbo.customers c ON l.cell_id = c.cell_id

LEFT JOIN dbo.current_offers_balance co ON c.customer_id = co.customer_id

GROUP BY l.region

ORDER BY churn_rate_percentage DESC;

1.1 Result

The Volta Region has the highest churn rate but brings the 3rd highest revenue.

2.0 Evaluate the effectiveness of the loyalty program. How customers engage with it.

The loyalty program was implemented to encourage our active customers to stay with us and continue to pay for our services.

SELECT DISTINCT loyalty_points_balance

FROM dbo.current_offers_balance;

SELECT DISTINCT loyalty_points_redeemed

FROM dbo.current_offers_balance;

SELECT DISTINCT loyalty_points_earned

FROM dbo.current_offers_balance;

2.1 Result

It's time to try something other than loyalty points because customers have not responded. Either that or it was recently introduced and needs some time to kick off.

Track customers' adoption of different mobile network technologies (2G).

1.0 Compare the revenue from customers on 2G networks to those from others.

SELECT '2G' AS nw_2g_ind, SUM(tot_rev) AS Total_Revenue 

FROM dbo.current_offers_balance WHERE nw_2g_ind = -1

UNION ALL

SELECT '3G' AS nw_2g_ind, SUM(tot_rev) AS Total_Revenue 

FROM dbo.current_offers_balance WHERE nw_2g_ind = 0

UNION ALL

SELECT '4G' AS nw_2g_ind, SUM(tot_rev) AS Total_Revenue 

FROM dbo.current_offers_balance WHERE nw_2g_ind = 1;

1.1 Result

We should advertise to those using 2G network to upgrade to 3G or 4G.

Conclusion

After queries have been written and beautiful dashboards have been built, the single most important thing a data analyst has to give is actionable insights. So from our analysis, here are some things I recommend:

  • Focus on Voice Services*: Voice is the most popular service among customers, so it is essential to maintain and improve the voice service quality. Additionally, consider introducing voice-based value-added services to engage customers further.*
  • Target Smartphone Users*: the majority of voice customers are smartphone users. This presents an opportunity to create smartphone-specific offers, promotions, and services to cater to this segment.*
  • Promote Data Plans to High-Recharge Customers*: Identify customers with high recharge amounts but do not use data. Target these customers with personalized offers to upsell higher-value data plans.*
  • Optimize Recharge Methods*: As most customers use recharge cards, consider promoting mobile money and electronic voucher distribution (EVD) methods to increase convenience and accessibility.*
  • Address Churn Rate in Volta Region*: The Volta Region has the highest churn rate despite being a high-revenue region. Investigate the reasons behind the churn and implement strategies to improve customer retention in this area.*
  • Reduce Losses from Rentals*: Analyze the rental services and identify ways to reduce losses from customers borrowing services. Encourage customers to opt for regular data and voice plans instead of rentals.*
  • Upgrade 2G Customers*: Encourage customers on 2G networks to upgrade to 3G or 4G to capitalize on the revenue potential from these advanced services.*

Dashboard

Resources