Project Overview
This project centers on a fictional bank that planned to launch a new credit card and needed strong, data-backed insights to identify potential customers. The task was exciting: dig deep into customer, transaction, and credit profile data to tell a story about who should get this card, and why.
As part of the Codebasics Data Analytics Bootcamp, I took on this challenge using a dataset sourced from data.lelo, provided by my mentor. Here’s the story of how I tackled the problem end-to-end.
The main goal was to:
- Understand customer behavior using demographics and transaction history.
- Identify creditworthy individuals likely to respond well to a new credit card offer.
- Recommend a target customer group for the credit card rollout.
Dataset Breakdown
The data came in SQL format as the database “e_master_card”, and its main tables are:
1. customers
This table contains demographic and financial details of bank customers. It serves as the base dataset for identifying potential credit card holders.
Key Columns:
- customer_id: Unique identifier for each customer.
- gender: Gender of the customer.
- age: Age in years.
- marital_status: Marital status (e.g., Married, Single).
- occupation: Profession type (e.g., Salaried, Self-Employed).
- annual_income: Reported yearly income (cleaned to fix invalid 0 values).
- joining_date: Date the customer joined the bank.
2. credit_profiles
This table provides credit behavior and scoring details for each customer. It is essential for assessing financial reliability and risk.
Key Columns:
- customer_id: Foreign key linked to customers.
- credit_score: Numeric score indicating the customer’s creditworthiness.
- credit_card_debt: Total current debt on credit cards.
- num_credit_cards: Number of active credit cards.
- credit_utilization_ratio: Percentage of credit used versus total available credit.
3. transactions
This dataset records individual financial transactions made by customers, used to evaluate spending behavior and activity level.
Key Columns:
- transaction_id: Unique ID for each transaction.
- customer_id: Foreign key linked to customers.
- transaction_amount: Monetary value of the transaction.
- transaction_type: Type of transaction (e.g., POS, Online).
- merchant_category: Category of the purchase (e.g., Travel, Groceries).
- transaction_date: Date of the transaction.
Technical Analysis
The Full Story
Income Imputation
One glaring issue was that some customers had an annual income of 0, which is unrealistic. Deleting them would lead to a 5% data loss (50 out of 1000), so I:
- Grouped customers by occupation
- Replaced the 0 income values with the median income for their occupation
This preserved the data’s integrity, because the median is robust to the extreme incomes that would skew a mean-based imputation.
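A minimal pandas sketch of this occupation-wise median imputation, assuming the customers table is loaded into a DataFrame with the occupation and annual_income columns listed above. The function name and the tiny sample frame are illustrative, not taken from the original notebook.

```python
import pandas as pd


def impute_zero_income(customers: pd.DataFrame) -> pd.DataFrame:
    """Replace annual_income == 0 with the median income of the customer's occupation."""
    df = customers.copy()
    # Work in float so the (possibly fractional) medians fit the column dtype.
    df["annual_income"] = df["annual_income"].astype(float)
    # Medians come from valid (non-zero) incomes only, so the zeros don't drag them down.
    valid = df[df["annual_income"] > 0]
    occupation_median = valid.groupby("occupation")["annual_income"].median()
    # Look up each zero-income row's occupation median and write it back.
    zero_mask = df["annual_income"] == 0
    df.loc[zero_mask, "annual_income"] = df.loc[zero_mask, "occupation"].map(occupation_median)
    return df


# Tiny illustrative frame (not the project's data).
sample = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "occupation": ["Salaried", "Salaried", "Salaried", "Self-Employed", "Self-Employed"],
    "annual_income": [40000, 60000, 0, 30000, 0],
})
cleaned = impute_zero_income(sample)
```

Computing the medians from non-zero rows only is a deliberate choice: including the invalid zeros would pull each occupation's median down.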
Other Validations
- Verified no nulls in key columns
- Standardized formatting for date columns and categorical fields
- Ensured consistency in transaction types and merchant categories
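One way these checks might look in pandas is sketched below. The column names match the tables described earlier, but the function name and the specific standardization rule (trimming whitespace and collapsing case variants to the most frequent spelling) are assumptions about what “standardized” meant in practice.

```python
import pandas as pd


def validate_and_standardize(df, key_cols, date_cols, cat_cols):
    """Check key columns for nulls, parse dates, and unify categorical spellings."""
    out = df.copy()
    # 1. Stop early if any key column has missing values.
    null_counts = out[key_cols].isna().sum()
    if null_counts.any():
        raise ValueError(f"Nulls in key columns: {null_counts[null_counts > 0].to_dict()}")
    # 2. Parse date columns into a real datetime dtype.
    for col in date_cols:
        out[col] = pd.to_datetime(out[col])
    # 3. Trim whitespace, then map case variants ("pos", "POS") to the most
    #    frequent spelling in the column (ties resolve to the first in sorted order).
    for col in cat_cols:
        trimmed = out[col].str.strip()
        lowered = trimmed.str.lower()
        canonical = trimmed.groupby(lowered).agg(lambda s: s.mode().iloc[0])
        out[col] = lowered.map(canonical)
    return out


raw = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "transaction_date": ["2023-01-05", "2023-02-10", "2023-03-15"],
    "transaction_type": ["POS", " pos", "Online"],
})
clean = validate_and_standardize(
    raw,
    key_cols=["customer_id"],
    date_cols=["transaction_date"],
    cat_cols=["transaction_type"],
)
```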
Data Transformation: Making Data Speak Business
After cleaning, I performed several transformations to prepare for analysis:
- Merged all datasets on customer_id
- Created new fields:
  - Total spend per customer
  - Number of credit cards held
  - Days since last transaction
  - Credit utilization ratio
- Grouped customers by age buckets: 18–25, 26–35, 36–45, etc.
- Flagged inactive customers (no transaction in the last 90 days)
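The merge and feature steps above could be sketched in pandas as follows. This assumes the three tables are DataFrames keyed on customer_id, that transaction_date is already parsed as a datetime, and that recency is measured from a hypothetical reference date (as_of), such as the last date in the data. Age bucketing is left out because it is a separate pd.cut step.

```python
import pandas as pd


def build_customer_features(customers, credit_profiles, transactions, as_of, inactive_days=90):
    """Return one row per customer combining profile, credit and spend features."""
    # Collapse transactions to one row per customer.
    txn_summary = (
        transactions.groupby("customer_id")
        .agg(
            total_spend=("transaction_amount", "sum"),
            transaction_count=("transaction_id", "count"),
            last_transaction=("transaction_date", "max"),
        )
        .reset_index()
    )
    # Left joins keep customers who have no transactions (or no credit profile).
    # num_credit_cards and credit_utilization_ratio arrive with credit_profiles.
    df = customers.merge(credit_profiles, on="customer_id", how="left").merge(
        txn_summary, on="customer_id", how="left"
    )
    df["total_spend"] = df["total_spend"].fillna(0.0)
    df["transaction_count"] = df["transaction_count"].fillna(0).astype(int)
    # Recency in days relative to the as_of reference date (NaN if no transactions).
    df["days_since_last_txn"] = (pd.Timestamp(as_of) - df["last_transaction"]).dt.days
    # Inactive: never transacted, or last transaction more than inactive_days ago.
    df["is_inactive"] = df["days_since_last_txn"].isna() | (
        df["days_since_last_txn"] > inactive_days
    )
    return df


# Tiny illustrative tables (not the project's data).
customers = pd.DataFrame({"customer_id": [1, 2, 3], "age": [22, 40, 30]})
credit_profiles = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "credit_score": [600, 750, 700],
    "num_credit_cards": [0, 2, 1],
    "credit_utilization_ratio": [0.0, 0.3, 0.5],
})
transactions = pd.DataFrame({
    "transaction_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
    "transaction_amount": [100.0, 50.0, 20.0],
    "transaction_date": pd.to_datetime(["2023-12-01", "2023-12-20", "2023-08-01"]),
})
features = build_customer_features(customers, credit_profiles, transactions, as_of="2023-12-31")
```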
Once I had completed the initial transformations—merging the customer profiles, credit behavior, and transactional activity—I turned my focus toward understanding the story behind the numbers. I wasn’t just looking for surface-level metrics; I was hunting for behavioral signals and under-served customer segments the bank may have overlooked.
At this point, I had engineered several new features that enriched the dataset—such as transaction frequency, total spend per customer, credit utilization, and even categorical preferences based on merchant types. My next step was to slice this information through a demographic lens, starting with age groups.
I used binning to create distinct age ranges like 18–25, 26–35, 36–45, and so on. It was during this age-based grouping that something caught my attention: the 18–25 age group made up nearly 26% of the entire customer base. That’s over a quarter of all customers, yet their income and credit metrics sat quietly at the lower end of the charts. That contradiction intrigued me.
To dig deeper, I filtered the dataset for this specific segment and began evaluating their income, credit activity, and spending behavior. The income analysis showed that their average annual income was below Rs. 50,000—expected, given they are likely students or early-career professionals. But it was their credit behavior that stood out: very few of them had a healthy credit history, and most had low credit scores and minimal credit card usage.
Rather than dismissing this group as unqualified, I considered the context: this demographic isn’t necessarily risky—they’re just new to the financial system. They’re in the early stages of their credit journey and likely haven’t had access to personalized credit products. That’s when it clicked: the issue isn’t risk—it’s opportunity.
To validate this, I turned to their transactional behavior. I grouped their spend by merchant categories and saw a very clear trend—Electronics, Fashion & Apparel, and Beauty & Personal Care topped the charts. These aren’t the patterns of passive savers. These are digitally active, trend-driven consumers, making them ideal for a curated credit card product.
Moreover, a deeper look at their payment modes revealed a clear dependency on UPI and debit cards. They weren’t using credit cards, most likely not for lack of interest but because they hadn’t been offered one that felt tailored to them.
All of this built a powerful case: this 18–25 group was young, eager, and ready to engage. They just needed the right entry point.
That’s when the narrative shifted—from analyzing data to recommending action. It wasn’t just about identifying how this group performed; it was about realizing what was missing in their experience—and how the bank could fill that gap. And so, I earmarked them as the untapped market with high future value, perfect for a low-limit, reward-based starter credit card.
This discovery was the result of a careful blend of data storytelling, behavioral insight, and an understanding of market timing. Sometimes, the biggest opportunities aren’t hidden—they’re just not yet visible through a traditional lens.
The Untapped Potential: Age Group 18–25
To identify growth opportunities, I started by segmenting the customer base into age groups using the age column from the customers table. I used pd.cut() to categorize ages (e.g., 18–25, 26–35, etc.).
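The binning step could look like this. The bin edges past 36–45 are assumptions, since the text only says “etc.”, and the hyphenated labels are illustrative.

```python
import pandas as pd

# Assumed bin edges: pd.cut intervals are right-closed by default, so
# (17, 25] -> "18-25", (25, 35] -> "26-35", ..., (65, 120] -> "66+".
AGE_BINS = [17, 25, 35, 45, 55, 65, 120]
AGE_LABELS = ["18-25", "26-35", "36-45", "46-55", "56-65", "66+"]


def add_age_group(customers: pd.DataFrame) -> pd.DataFrame:
    df = customers.copy()
    df["age_group"] = pd.cut(df["age"], bins=AGE_BINS, labels=AGE_LABELS)
    return df


customers = add_age_group(pd.DataFrame({"customer_id": [1, 2, 3, 4], "age": [18, 25, 26, 40]}))
# Share of customers in each bucket; empty buckets show up with a share of 0.
age_share = customers["age_group"].value_counts(normalize=True)
```

Because the intervals are right-closed, an age of exactly 25 lands in 18–25 and 26 starts the next bucket, which matches the ranges named in the text.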
Then, I filtered the dataset to isolate the 18–25 group and performed the following analyses:
- Income Check: I calculated the average annual_income for this segment. It came out to less than Rs. 50,000, which was the lowest among all groups.
- Credit History: From the credit_profiles table, I checked their credit_score and credit_limit, and found both were consistently on the lower end, indicating thin or new credit histories.
- Card Usage Behavior: I examined their transaction_type from the transactions table and found that credit card usage was noticeably lower than in other groups, while UPI and debit payments were more frequent.
- Spending Habits: I grouped transaction data by merchant_category and aggregated the spend by age group. For 18–25, the top 3 categories were:
  - Electronics
  - Fashion & Apparel
  - Beauty & Personal Care
These behavioral patterns revealed that although this group spends actively, especially on lifestyle-related purchases, it is not yet effectively targeted with credit card products, which presents a valuable business opportunity.
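The four checks above can be sketched as one pandas function, assuming a transaction-level DataFrame that has already been merged with customer and credit data and carries an age_group column. The function name and the sample values (including the payment-mode labels in transaction_type) are hypothetical.

```python
import pandas as pd


def profile_segments(df: pd.DataFrame, top_n: int = 3):
    """Summarize income, credit score, payment mix and top spend categories by age group."""
    # Customer-level metrics: keep one row per customer so frequent spenders
    # are not counted more than once.
    per_customer = df.drop_duplicates(subset="customer_id")
    income_credit = per_customer.groupby("age_group", observed=True)[
        ["annual_income", "credit_score"]
    ].mean()
    # Share of transactions by payment mode within each age group (rows sum to 1).
    payment_mix = pd.crosstab(df["age_group"], df["transaction_type"], normalize="index")
    # Total spend per merchant category, then the top_n categories per age group.
    spend = df.groupby(["age_group", "merchant_category"], observed=True)["transaction_amount"].sum()
    top_categories = (
        spend.sort_values(ascending=False)
        .groupby(level="age_group", observed=True)
        .head(top_n)
    )
    return income_credit, payment_mix, top_categories


# Tiny illustrative transactions (not the project's data).
txns = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3],
    "age_group": ["18-25", "18-25", "18-25", "26-35", "26-35"],
    "annual_income": [30000, 30000, 40000, 90000, 90000],
    "credit_score": [580, 580, 620, 760, 760],
    "transaction_type": ["UPI", "Debit Card", "UPI", "Credit Card", "Credit Card"],
    "merchant_category": ["Electronics", "Fashion & Apparel", "Electronics", "Travel", "Groceries"],
    "transaction_amount": [500.0, 300.0, 200.0, 1000.0, 150.0],
})
income_credit, payment_mix, top_categories = profile_segments(txns, top_n=1)
```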
Future Analysis
The next step was a trial run of the new credit card using A/B testing. First, we needed to figure out how many customers the test would require. We would form a control group and a test group, and the number of customers needed in each depends on the statistical power and effect size agreed on with the business.
The bank planned to launch a new credit card product specifically designed for younger customers (ages 18–25). From the dataset, I identified 246 such customers, and to run a pilot, I decided to use a test and control group structure.
But first, I needed to determine:
How many customers do we need for the experiment to be statistically valid?
Calculating Required Sample Size
Using the statsmodels power analysis function (sms.tt_ind_solve_power), I set:
-
Significance level (α): 0.05 (5%)
-
Power: 0.8 (80%)
-
Effect size: 0.2 (small, by Cohen’s convention)
This returned a required sample size of ~393 customers per group.
However, due to budget constraints, the business could target only about 100 customers in the test group. At that sample size we could detect an effect size of 0.4, which still offered meaningful insight. (This experiment is hypothetical, and its data also comes from data.lelo.)
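A sketch of both power calculations with statsmodels. Leaving effect_size or nobs1 unset tells tt_ind_solve_power to solve for that quantity. The setting alternative="two-sided" (the library default) is an assumption here: it is the option that gives roughly 393 customers per group for an effect size of 0.2, whereas a one-sided test like the one used later would need fewer customers.

```python
import statsmodels.stats.api as sms

ALPHA = 0.05  # significance level
POWER = 0.8   # chance of detecting the effect if it truly exists

# Solve for customers needed per group (nobs1 is left unset, so it is the unknown).
n_per_group = float(
    sms.tt_ind_solve_power(
        effect_size=0.2, alpha=ALPHA, power=POWER, ratio=1, alternative="two-sided"
    )
)

# Solve for the smallest detectable effect size with 100 customers per group.
detectable_effect = float(
    sms.tt_ind_solve_power(
        nobs1=100, alpha=ALPHA, power=POWER, ratio=1, alternative="two-sided"
    )
)

print(f"Customers needed per group for effect size 0.2: {n_per_group:.1f}")
print(f"Detectable effect size with 100 customers per group: {detectable_effect:.2f}")
```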
The Experiment Setup
- Test Group: 100 customers (aged 18–25) received the new credit card.
  - After 2 months, 40 customers actively used it (conversion rate: 40%).
- Control Group: 40 customers who did not receive the card.
- The experiment ran for 2 months (from 09-Oct-2023 to 11-Dec-2023).
- I tracked the average daily transaction amount per group in the avg_transactions_after_campaign table.
The Hypotheses
- Null Hypothesis (H₀): There is no difference in average transaction amounts between the control and test groups.
- Alternative Hypothesis (H₁): The test group spends more than the control group.
Test Selection: Two-Sample Z-Test
Since each group had 40 customers and the sample size exceeded 30, I opted for a two-sample z-test to compare means.
Descriptive Stats:
| Group | Mean (₹) | Std Dev (₹) |
|---|---|---|
| Control | 248.94 | 9.14 |
| Test (New Card) | 370.54 | 63.25 |
- Z-score: 3.73
- Critical Z-value (one-tailed, α = 0.05): 1.64
- p-value: 0.00009658
Because:
- Z-score > critical value
- p-value < 0.05

We rejected the null hypothesis.
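The decision rule above can be sketched without third-party libraries, using Python’s statistics.NormalDist for the standard normal CDF and quantile. The function name, its arguments, and the toy numbers are illustrative; it works from each group’s mean, standard deviation, and number of observations, and tests the one-sided alternative that the test group spends more.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean_test, sd_test, n_test, mean_ctrl, sd_ctrl, n_ctrl, alpha=0.05):
    """One-sided two-sample z-test of H1: mean_test > mean_ctrl."""
    # Standard error of the difference between two independent sample means.
    se = sqrt(sd_test**2 / n_test + sd_ctrl**2 / n_ctrl)
    z = (mean_test - mean_ctrl) / se
    # Right-tail critical value (about 1.64 for alpha = 0.05) and p-value.
    z_critical = NormalDist().inv_cdf(1 - alpha)
    p_value = 1 - NormalDist().cdf(z)
    reject_null = z > z_critical
    return z, z_critical, p_value, reject_null


# Toy inputs: two groups of 50 with equal spread and a 10-unit difference in means.
z, z_crit, p, reject = two_sample_z_test(110, 20, 50, 100, 20, 50)
```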
Conclusion:
The average transaction amount for the test group was significantly higher than the control group.
This strongly suggests that the new credit card drove increased spending behavior.
Confidence Interval
I also calculated a 95% confidence interval for the average transaction amount in the test group: (354.80, 386.28).
This interval reinforces the observed uplift: the average daily transaction amount for customers using the new credit card is estimated at Rs. 355 to Rs. 386, compared with Rs. 249 for the control group.
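The interval uses the normal approximation mean ± z × (standard deviation / √n), with z ≈ 1.96 for 95% confidence. A minimal stdlib sketch follows; the function name and the toy inputs are illustrative, and for the test group you would pass its mean, standard deviation, and number of observations.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


# Toy inputs: mean 100, standard deviation 15, 25 observations.
low, high = mean_confidence_interval(mean=100.0, sd=15.0, n=25)
```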
What This Means for the Bank
The test indicates that the new credit card increased spending within the targeted 18–25 age group. These findings provide the evidence needed for a broader rollout, backed by measured customer behavior rather than assumptions.
