Project Overview

A fictional bank was planning to launch a new credit card and needed strong, data-backed insights to identify potential customers. The task was exciting: dig deep into customer, transaction, and credit profile data and tell a story about who should get this card, and why.
As part of the Codebasics Data Analytics Bootcamp, I took on this challenge using a dataset sourced from data.lelo, provided by my mentor. Here’s the story of how I tackled the problem end-to-end.

The main goal was to:

Understand customer behavior using demographics and transaction history.
Identify creditworthy individuals likely to respond well to a new credit card offer.
Recommend a target customer group for the credit card rollout.

Dataset Breakdown

The data came in SQL format, in a database named “e_master_card”. Its main tables are:

1. `customers`

This table contains demographic and financial details of bank customers. It serves as the base dataset for identifying potential credit card holders.

Key Columns:

customer_id: Unique identifier for each customer.
gender: Gender of the customer.
age: Age in years.
marital_status: Marital status (e.g., Married, Single).
occupation: Profession type (e.g., Salaried, Self-Employed).
annual_income: Reported yearly income (cleaned to fix invalid 0 values).
joining_date: Date the customer joined the bank.

2. `credit_profiles`

This table provides credit behavior and scoring details for each customer. It is essential for assessing financial reliability and risk.

Key Columns:

customer_id: Foreign key linked to customers.
credit_score: Numeric score indicating the customer’s creditworthiness.
credit_card_debt: Total current debt on credit cards.
num_credit_cards: Number of active credit cards.
credit_utilization_ratio: Percentage of credit used versus total available credit.

3. `transactions`

This dataset records individual financial transactions made by customers, used to evaluate spending behavior and activity level.

Key Columns:

transaction_id: Unique ID for each transaction.
customer_id: Foreign key linked to customers.
transaction_amount: Monetary value of the transaction.
transaction_type: Type of transaction (e.g., POS, Online).
merchant_category: Category of the purchase (e.g., Travel, Groceries).
transaction_date: Date of the transaction.

Technical Analysis

GitHub Project Repo Link

GitHub README File Link

The Full Story

Income Imputation

One glaring issue was that some customers had an annual income of 0, which is unrealistic. Deleting them would lead to a 5% data loss (50 out of 1000), so I:

Grouped customers by occupation
Replaced the 0 income values with the median income for their occupation

This kept all 1,000 records in the analysis. Using the median rather than the mean also kept the imputed values from being pulled upward by a few very high incomes.
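The imputation steps above can be sketched in pandas roughly as follows. The sample rows are hypothetical; only the column names (occupation, annual_income) come from the customers table, and the actual project code may differ.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the customers table.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "occupation": ["Salaried", "Salaried", "Salaried",
                   "Self-Employed", "Self-Employed", "Self-Employed"],
    "annual_income": [40000.0, 0.0, 60000.0, 90000.0, 110000.0, 0.0],
})

# Median income per occupation, computed from valid (non-zero) incomes only,
# so the zeros themselves do not drag the median down.
valid = customers[customers["annual_income"] > 0]
median_by_occupation = valid.groupby("occupation")["annual_income"].median()

# Replace each zero with the median of that customer's occupation.
is_zero = customers["annual_income"] == 0
customers.loc[is_zero, "annual_income"] = (
    customers.loc[is_zero, "occupation"].map(median_by_occupation)
)

print(customers)
```

Computing the medians from non-zero rows only is a design choice in this sketch: including the zeros would bias each occupation's median toward zero.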

Other Validations

Verified no nulls in key columns
Standardized formatting for date columns and categorical fields
Ensured consistency in transaction types and merchant categories

Data Transformation: Making Data Speak Business

After cleaning, I performed several transformations to prepare for analysis:

Merged all datasets using customer_id
Created new fields:
- Total spend per customer
- Number of credit cards held
- Days since last transaction
- Credit utilization ratio
Grouped customers by age buckets: 18–25, 26–35, 36–45, etc.
Flagged inactive customers (no transaction in the last 90 days)
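A minimal pandas sketch of these transformations, under a few assumptions: the sample rows are made up, the reference date for recency (as_of) is taken as the latest transaction date in the data, and the derived column names (total_spend, days_since_last_txn, inactive) are illustrative rather than the ones used in the project.

```python
import pandas as pd

# Hypothetical rows; column names follow the three tables described earlier.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "age": [22, 31, 47]})
credit_profiles = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "credit_score": [610, 720, 780],
    "num_credit_cards": [0, 2, 3],
})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "transaction_amount": [500.0, 1500.0, 800.0, 300.0],
    "transaction_date": pd.to_datetime(
        ["2023-08-01", "2023-09-20", "2023-05-10", "2023-09-25"]
    ),
})

# Assumption: recency is measured against the latest date in the data.
as_of = transactions["transaction_date"].max()

# One row per customer: total spend and most recent transaction date.
per_customer = (
    transactions.groupby("customer_id")
    .agg(total_spend=("transaction_amount", "sum"),
         last_transaction=("transaction_date", "max"))
    .reset_index()
)
per_customer["days_since_last_txn"] = (
    as_of - per_customer["last_transaction"]
).dt.days

# Left joins keep every customer, even one with no transactions.
merged = (
    customers
    .merge(credit_profiles, on="customer_id", how="left")
    .merge(per_customer, on="customer_id", how="left")
)

# Inactive: no transaction in the last 90 days (or none at all).
merged["inactive"] = (
    merged["days_since_last_txn"].isna() | (merged["days_since_last_txn"] > 90)
)

print(merged[["customer_id", "total_spend", "days_since_last_txn", "inactive"]])
```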

Once I had completed the initial transformations—merging the customer profiles, credit behavior, and transactional activity—I turned my focus toward understanding the story behind the numbers. I wasn’t just looking for surface-level metrics; I was hunting for behavioral signals and under-served customer segments the bank may have overlooked.

At this point, I had engineered several new features that enriched the dataset—such as transaction frequency, total spend per customer, credit utilization, and even categorical preferences based on merchant types. My next step was to slice this information through a demographic lens, starting with age groups.

I used binning to create distinct age ranges like 18–25, 26–35, 36–45, and so on. It was during this age-based grouping that something caught my attention: the 18–25 age group made up nearly 26% of the entire customer base. That’s over a quarter of all customers—yet, everything else about them seemed to sit quietly at the lower end of the charts. That contradiction intrigued me.

To dig deeper, I filtered the dataset for this specific segment and began evaluating their income, credit activity, and spending behavior. The income analysis showed that their average annual income was below Rs. 50,000—expected, given they are likely students or early-career professionals. But it was their credit behavior that stood out: very few of them had a healthy credit history, and most had low credit scores and minimal credit card usage.

Rather than dismissing this group as unqualified, I considered the context: this demographic isn’t necessarily risky—they’re just new to the financial system. They’re in the early stages of their credit journey and likely haven’t had access to personalized credit products. That’s when it clicked: the issue isn’t risk—it’s opportunity.

To validate this, I turned to their transactional behavior. I grouped their spend by merchant categories and saw a very clear trend—Electronics, Fashion & Apparel, and Beauty & Personal Care topped the charts. These aren’t the patterns of passive savers. These are digitally active, trend-driven consumers, making them ideal for a curated credit card product.

Moreover, a deeper look at their payment modes revealed a clear dependency on UPI and debit cards. They weren’t using credit cards—not because they didn’t want to, but because they likely weren’t offered one that felt tailored to them.

All of this built a powerful case: this 18–25 group was young, eager, and ready to engage. They just needed the right entry point.

That’s when the narrative shifted—from analyzing data to recommending action. It wasn’t just about identifying how this group performed; it was about realizing what was missing in their experience—and how the bank could fill that gap. And so, I earmarked them as the untapped market with high future value, perfect for a low-limit, reward-based starter credit card.

This discovery was the result of a careful blend of data storytelling, behavioral insight, and an understanding of market timing. Sometimes, the biggest opportunities aren’t hidden—they’re just not yet visible through a traditional lens.

The Untapped Potential: Age Group 18–25

To identify growth opportunities, I started by segmenting the customer base into age groups using the age column from the customers table. I used pd.cut() to categorize ages (e.g., 18–25, 26–35, etc.).
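As an illustration of the binning step, here is a small pd.cut() sketch. The bin edges beyond 45 and the label strings are assumptions, since only the first few buckets are named here, and the ages are made-up sample values.

```python
import pandas as pd

# Hypothetical ages; the real values come from the customers table.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "age": [18, 25, 26, 40, 52],
})

# Assumed edges and labels: buckets past 36-45 are not spelled out in the post.
bins = [17, 25, 35, 45, 55, 65]
labels = ["18-25", "26-35", "36-45", "46-55", "56-65"]

# pd.cut uses right-closed intervals by default, so the first bin is (17, 25]:
# age 25 lands in "18-25" and age 26 in "26-35".
customers["age_group"] = pd.cut(customers["age"], bins=bins, labels=labels)

# Share of customers in each age group.
share = customers["age_group"].value_counts(normalize=True).sort_index()
print(share)
```

Starting the first edge at 17 rather than 18 matters: with right-closed bins, an edge of 18 would leave 18-year-olds outside every bucket.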

Then, I filtered the dataset to isolate the 18–25 group and performed the following analyses:

Income Check: I calculated the average annual_income for this segment. It came out to less than Rs. 50,000, which was the lowest among all groups.
Credit History: From the credit_profiles table, I checked their credit_score and credit_limit, and found both were consistently on the lower end — indicating thin or new credit histories.
Card Usage Behavior: I examined their transaction_type from the transactions table and found that credit card usage was noticeably lower than other groups, while UPI and debit payments were more frequent.
Spending Habits: I grouped transaction data by merchant_category and aggregated the spend by age group. For 18–25, the top 3 categories were:
- Electronics
- Fashion & Apparel
- Beauty & Personal Care

These behavioral patterns revealed that while this group spends actively, especially on lifestyle-related purchases, they are not yet targeted effectively with credit card products — presenting a valuable business opportunity.
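The spending-habits step could look something like the sketch below. It assumes the transactions have already been joined to an age_group column; the rows and amounts are invented purely to show the mechanics.

```python
import pandas as pd

# Hypothetical merged rows: one transaction per row, with the customer's age group.
df = pd.DataFrame({
    "age_group": ["18-25"] * 5 + ["26-35"] * 2,
    "merchant_category": [
        "Electronics", "Fashion & Apparel", "Electronics",
        "Beauty & Personal Care", "Groceries",
        "Travel", "Groceries",
    ],
    "transaction_amount": [900.0, 700.0, 600.0, 400.0, 150.0, 1200.0, 300.0],
})

# Isolate the 18-25 segment.
young = df[df["age_group"] == "18-25"]

# Total spend per merchant category, keeping the three largest.
top3 = (
    young.groupby("merchant_category")["transaction_amount"]
    .sum()
    .nlargest(3)
)
print(top3)
```

The same groupby on ["age_group", "merchant_category"] would give the comparison across all age groups in one table.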

Future Analysis

Before a full rollout, the bank wanted a trial run of the new credit card. The first step is to work out how many customers an A/B test needs. Customers are split into a control group and a test group, and the size of each group follows from the statistical power and effect size agreed with the business.

The bank planned to launch a new credit card product specifically designed for younger customers (ages 18–25). From the dataset, I identified 246 such customers, and to run a pilot, I decided to use a test and control group structure.

But first, I needed to determine:

How many customers do we need for the experiment to be statistically valid?

Calculating Required Sample Size

Using the statsmodels power analysis function (sms.tt_ind_solve_power), I set:

Significance level (α): 0.05 (5%)
Power: 0.8 (80%)
Effect size: 0.2 (small, by Cohen's convention)

This returned a required sample size of ~393 customers per group.

However, due to budget constraints, the business could only target about 100 customers in the test group. With that sample size, the smallest effect we could reliably detect was about 0.4, which still offered meaningful insight. (This scenario is fictional; the experiment data came from the same data.lelo source.)
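To make the arithmetic behind these numbers visible, the sketch below uses the closed-form normal approximation for a two-sided, two-sample test with equal group sizes. This is not the statsmodels call used in the analysis, only a standard-library approximation of it, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate per-group sample size (normal approximation, two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def detectable_effect(n_per_group, alpha=0.05, power=0.8):
    """Smallest standardized effect detectable with n_per_group per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


n_needed = required_n_per_group(0.2)   # effect size 0.2, alpha 0.05, power 0.8
d_at_100 = detectable_effect(100)      # budget-limited group of 100

print(n_needed, round(d_at_100, 2))
```

Under this approximation the required size comes out at 393 per group and the detectable effect at 100 per group at roughly 0.40, in line with the figures above. The t-based solver in statsmodels gives slightly different decimals because it accounts for degrees of freedom.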

The Experiment Setup

Test Group: 100 customers (aged 18–25) received the new credit card.
- After 2 months, 40 customers actively used it (conversion rate: 40%).
Control Group: 40 customers who did not receive the card.
The experiment ran for 2 months (from 09-Oct-2023 to 11-Dec-2023).
I tracked the average daily transaction amount per group in the avg_transactions_after_campaign table.

The Hypotheses

Null Hypothesis (H₀): There is no difference in average transaction amounts between the control and test groups.
Alternative Hypothesis (H₁): The test group spends more than the control group.

Test Selection: Two-Sample Z-Test

Since each group had 40 customers and the sample size exceeded 30, I opted for a two-sample z-test to compare means.

Descriptive Stats:

Group	Mean (₹)	Std Dev (₹)
Control	248.94	9.14
Test (New Card)	370.54	63.25

Z-Test Results

Z-score: 3.73
Critical Z-value (α = 0.05): 1.64
p-value: 0.00009658

Because the z-score exceeded the critical value and the p-value was below 0.05, I rejected the null hypothesis.
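For reference, the mechanics of a right-tailed two-sample z-test can be sketched with the standard library alone. The first function is the usual z statistic for a difference in means; the second applies the decision rule to a given z-score. Both function names are hypothetical, and the usage line simply feeds in the z-score reported above rather than recomputing it from raw data.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """z statistic for (mean_a - mean_b), allowing unequal variances."""
    standard_error = sqrt(std_a ** 2 / n_a + std_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


def right_tailed_test(z, alpha=0.05):
    """Return (critical value, p-value, reject H0?) for a right-tailed test."""
    critical = NormalDist().inv_cdf(1 - alpha)
    p_value = 1 - NormalDist().cdf(z)
    return critical, p_value, z > critical


# Decision rule applied to the reported z-score.
critical, p_value, reject = right_tailed_test(3.73)
print(round(critical, 3), p_value, reject)
```

The test is right-tailed because the alternative hypothesis is directional (the test group spends more), which is why the critical value is about 1.64 rather than the two-sided 1.96.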

Conclusion:

The average transaction amount for the test group was significantly higher than the control group.
This strongly suggests that the new credit card drove increased spending behavior.

Confidence Interval

I also calculated a 95% Confidence Interval for the average transaction amount in the test group: (354.80, 386.28)

This interval reinforces the observed uplift: customers using the new credit card spend between Rs. 355 and Rs. 386 per day on average, compared to Rs. 249 in the control group.
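A z-based confidence interval for a mean is the sample mean plus or minus z times the standard error. The sketch below shows the calculation with the test-group mean and standard deviation from the table. The number of observations is not stated in this write-up, so n = 62 is purely an assumption (roughly one daily average per day over a two-month window), and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, std, n, confidence=0.95):
    """z-based confidence interval for a mean from summary statistics."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std / sqrt(n)
    return mean - margin, mean + margin


# ASSUMPTION: n=62 daily observations; the post does not state this count.
low, high = mean_confidence_interval(mean=370.54, std=63.25, n=62)
print(round(low, 2), round(high, 2))
```

With that assumed n, the interval lands very close to the one reported above; a different n would widen or narrow it in proportion to 1/sqrt(n).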

What This Means for the Bank

The test validated that the new credit card positively impacted customer spending — especially within the targeted 18–25 age group. These findings provide the evidence needed for a broader rollout, backed by real customer behavior and not just assumptions.

Credit Card Launch Analytics: My Journey with Data for a Banking Client
