SHEBEEB S

With a strong foundation in software engineering, I discovered my passion for data-driven decision making and intelligent systems. This curiosity led me to transition into Data Science, where I explore the art of data and solve real-world problems through Data Science, Machine Learning, and storytelling.

Project Overview

This project is about a fictional bank planning to launch a new credit card. The bank needed strong, data-backed insights to identify potential customers. The task was exciting: dig deep into customer, transaction, and credit profile data to tell a story about who should get this card, and why.
As part of the Codebasics Data Analytics Bootcamp, I took on this challenge using a dataset sourced from data.lelo, provided by my mentor. Here’s the story of how I tackled the problem end-to-end.

The main goal was to:

  • Understand customer behavior using demographics and transaction history.

  • Identify creditworthy individuals likely to respond well to a new credit card offer.

  • Recommend a target customer group for the credit card rollout.

Dataset Breakdown

The data was provided in SQL format as a database named “e_master_card”. The main tables inside it are:

1. customers

This table contains demographic and financial details of bank customers. It serves as the base dataset for identifying potential credit card holders.

Key Columns:

  • customer_id: Unique identifier for each customer.

  • gender: Gender of the customer.

  • age: Age in years.

  • marital_status: Marital status (e.g., Married, Single).

  • occupation: Profession type (e.g., Salaried, Self-Employed).

  • annual_income: Reported yearly income (cleaned to fix invalid 0 values).

  • joining_date: Date the customer joined the bank.


2. credit_profiles

This table provides credit behavior and scoring details for each customer. It is essential for assessing financial reliability and risk.

Key Columns:

  • customer_id: Foreign key linked to customers.

  • credit_score: Numeric score indicating the customer’s creditworthiness.

  • credit_card_debt: Total current debt on credit cards.

  • num_credit_cards: Number of active credit cards.

  • credit_utilization_ratio: Percentage of credit used versus total available credit.


3. transactions

This dataset records individual financial transactions made by customers, used to evaluate spending behavior and activity level.

Key Columns:

  • transaction_id: Unique ID for each transaction.

  • customer_id: Foreign key linked to customers.

  • transaction_amount: Monetary value of the transaction.

  • transaction_type: Type of transaction (e.g., POS, Online).

  • merchant_category: Category of the purchase (e.g., Travel, Groceries).

  • transaction_date: Date of the transaction.

Technical Analysis

The Full Story

Income Imputation

One glaring issue was that some customers had an annual income of 0, which is unrealistic. Deleting them would lead to a 5% data loss (50 out of 1000), so I:

  • Grouped customers by occupation

  • Replaced the 0 income values with the median income for their occupation

This kept all 1,000 customers in the dataset. I chose the median over the mean because a few very high incomes can pull the mean upward and skew the imputed values.
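The imputation step can be sketched roughly as follows. This is a minimal illustration, not my exact notebook code: the sample rows are made up, the helper name impute_income_by_occupation is hypothetical, and I am assuming the 0 values are excluded when each occupation's median is computed.

```python
import pandas as pd

# Made-up sample standing in for the customers table.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "occupation": ["Salaried", "Salaried", "Salaried",
                   "Self-Employed", "Self-Employed", "Self-Employed"],
    "annual_income": [40000, 0, 60000, 90000, 110000, 0],
})


def impute_income_by_occupation(df: pd.DataFrame) -> pd.DataFrame:
    """Replace 0 incomes with the median non-zero income of the same occupation."""
    out = df.copy()
    # Treat 0 as missing so it does not drag the group median down (assumption).
    valid_income = out["annual_income"].where(out["annual_income"] > 0)
    # Median per occupation, broadcast back to every row of that occupation.
    occupation_median = valid_income.groupby(out["occupation"]).transform("median")
    out["annual_income"] = valid_income.fillna(occupation_median)
    return out


cleaned = impute_income_by_occupation(customers)
print(cleaned)
```

Masking the zeros before taking the median matters: if they stayed in, an occupation with many invalid rows would get a median biased toward 0.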

Other Validations

  • Verified no nulls in key columns

  • Standardized formatting for date columns and categorical fields

  • Ensured consistency in transaction types and merchant categories
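A rough sketch of what these checks could look like in pandas is shown here. The sample rows and the helper name validate_and_standardize are hypothetical, and the specific cleanup rules (trimming whitespace, title-casing categories) are assumptions about what "consistency" means for this data.

```python
import pandas as pd

# Made-up sample standing in for the transactions table.
transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "customer_id": [10, 11, 10],
    "transaction_date": ["2023-01-05", "2023-02-10", "2023-03-15"],
    "merchant_category": [" travel", "Groceries ", "TRAVEL"],
})


def validate_and_standardize(df: pd.DataFrame, key_columns: list) -> pd.DataFrame:
    """Fail fast on nulls in key columns, then normalize dates and categories."""
    out = df.copy()
    null_counts = out[key_columns].isna().sum()
    if null_counts.any():
        bad = null_counts[null_counts > 0].to_dict()
        raise ValueError(f"Nulls found in key columns: {bad}")
    # Parse date strings into a proper datetime column.
    out["transaction_date"] = pd.to_datetime(out["transaction_date"])
    # Collapse spelling variants such as " travel" and "TRAVEL" into "Travel".
    out["merchant_category"] = out["merchant_category"].str.strip().str.title()
    return out


tidy = validate_and_standardize(transactions, ["transaction_id", "customer_id"])
print(tidy["merchant_category"].unique())
```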


Data Transformation: Making Data Speak Business

After cleaning, I performed several transformations to prepare for analysis:

  • Merged all datasets using customer_id

  • Created new fields:

    • Total spend per customer

    • Number of credit cards held

    • Days since last transaction

    • Credit utilization ratio

  • Grouped customers by age buckets: 18–25, 26–35, 36–45, etc.

  • Flagged inactive customers (no transaction in the last 90 days)
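The merge and feature engineering steps above could be sketched like this. The sample rows are made up, the reference date as_of is an assumption (I am not restating the actual date used), and the derived column names are illustrative. The sketch aggregates transactions per customer before merging, so each customer stays on one row.

```python
import pandas as pd

# Made-up samples standing in for the three tables.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "age": [22, 34, 41]})
credit_profiles = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "credit_score": [610, 720, 690],
    "num_credit_cards": [0, 2, 1],
})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "transaction_amount": [100.0, 250.0, 80.0],
    "transaction_date": pd.to_datetime(["2023-08-01", "2023-09-20", "2023-03-01"]),
})

# Assumed reference date for the "last 90 days" window.
as_of = pd.Timestamp("2023-09-30")

# Aggregate first, so the merge does not duplicate customer rows.
spend = (
    transactions.groupby("customer_id")
    .agg(total_spend=("transaction_amount", "sum"),
         last_transaction=("transaction_date", "max"))
    .reset_index()
)

merged = (
    customers.merge(credit_profiles, on="customer_id", how="left")
    .merge(spend, on="customer_id", how="left")
)
merged["total_spend"] = merged["total_spend"].fillna(0.0)
merged["days_since_last_txn"] = (as_of - merged["last_transaction"]).dt.days
# Customers with no transactions at all are also counted as inactive.
merged["inactive"] = (
    merged["days_since_last_txn"].isna() | (merged["days_since_last_txn"] > 90)
)
print(merged)
```

A left join keeps customers who have no transactions, which is exactly the group the inactivity flag needs to catch.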


Once I had completed the initial transformations—merging the customer profiles, credit behavior, and transactional activity—I turned my focus toward understanding the story behind the numbers. I wasn’t just looking for surface-level metrics; I was hunting for behavioral signals and under-served customer segments the bank may have overlooked.

At this point, I had engineered several new features that enriched the dataset—such as transaction frequency, total spend per customer, credit utilization, and even categorical preferences based on merchant types. My next step was to slice this information through a demographic lens, starting with age groups.

I used binning to create distinct age ranges like 18–25, 26–35, 36–45, and so on. During this age-based grouping, something caught my attention: the 18–25 age group made up nearly 26% of the entire customer base. That is over a quarter of all customers, yet on every other measure they sat at the lower end of the charts. That contradiction intrigued me.

To dig deeper, I filtered the dataset for this specific segment and began evaluating their income, credit activity, and spending behavior. The income analysis showed that their average annual income was below Rs. 50,000—expected, given they are likely students or early-career professionals. But it was their credit behavior that stood out: very few of them had a healthy credit history, and most had low credit scores and minimal credit card usage.

Rather than dismissing this group as unqualified, I considered the context: this demographic isn’t necessarily risky—they’re just new to the financial system. They’re in the early stages of their credit journey and likely haven’t had access to personalized credit products. That’s when it clicked: the issue isn’t risk—it’s opportunity.

To validate this, I turned to their transactional behavior. I grouped their spend by merchant categories and saw a very clear trend—Electronics, Fashion & Apparel, and Beauty & Personal Care topped the charts. These aren’t the patterns of passive savers. These are digitally active, trend-driven consumers, making them ideal for a curated credit card product.

Moreover, a deeper look at their payment modes revealed a clear dependency on UPI and debit cards. They weren’t using credit cards—not because they didn’t want to, but because they likely weren’t offered one that felt tailored to them.

All of this built a powerful case: this 18–25 group was young, eager, and ready to engage. They just needed the right entry point.

That’s when the narrative shifted—from analyzing data to recommending action. It wasn’t just about identifying how this group performed; it was about realizing what was missing in their experience—and how the bank could fill that gap. And so, I earmarked them as the untapped market with high future value, perfect for a low-limit, reward-based starter credit card.

This discovery was the result of a careful blend of data storytelling, behavioral insight, and an understanding of market timing. Sometimes, the biggest opportunities aren’t hidden—they’re just not yet visible through a traditional lens.



The Untapped Potential: Age Group 18–25

To identify growth opportunities, I started by segmenting the customer base into age groups using the age column from the customers table. I used pd.cut() to categorize ages (e.g., 18–25, 26–35, etc.).
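A minimal sketch of the binning step with pd.cut() follows. The ages are made up, and the bin edges and the open-ended "46+" label are assumptions, since only the first few ranges are listed here.

```python
import pandas as pd

# Made-up sample standing in for the customers table.
customers = pd.DataFrame({
    "customer_id": range(1, 9),
    "age": [18, 22, 25, 26, 35, 36, 45, 52],
})

# pd.cut uses right-closed intervals by default: (17, 25], (25, 35], ...
# The edges and the "46+" bucket are assumptions for illustration.
bins = [17, 25, 35, 45, 120]
labels = ["18-25", "26-35", "36-45", "46+"]
customers["age_group"] = pd.cut(customers["age"], bins=bins, labels=labels)

# Share of the customer base in each age group, in percent.
share = customers["age_group"].value_counts(normalize=True).sort_index() * 100
print(share)
```

Starting the first edge at 17 puts age 18 inside the first bucket, because the left edge of each interval is excluded.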

Then, I filtered the dataset to isolate the 18–25 group and performed the following analyses:

  • Income Check: I calculated the average annual_income for this segment. It came out to less than Rs. 50,000, which was the lowest among all groups.

  • Credit History: From the credit_profiles table, I checked their credit_score and credit_limit, and found both were consistently on the lower end — indicating thin or new credit histories.

  • Card Usage Behavior: I examined their transaction_type from the transactions table and found that credit card usage was noticeably lower than other groups, while UPI and debit payments were more frequent.

  • Spending Habits: I grouped transaction data by merchant_category and aggregated the spend by age group. For 18–25, the top 3 categories were:

    • Electronics

    • Fashion & Apparel

    • Beauty & Personal Care

These behavioral patterns revealed that while this group spends actively, especially on lifestyle-related purchases, they are not yet targeted effectively with credit card products — presenting a valuable business opportunity.
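The segment checks above can be sketched as a few group-by operations. All rows below are made up purely to show the mechanics, so the numbers are not results from the project. The transaction_type values ("UPI", "Debit Card", "Credit Card") are assumptions about how payment modes are encoded.

```python
import pandas as pd

# Made-up samples; age_group is assumed to be already derived from age.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age_group": ["18-25", "18-25", "26-35", "26-35"],
    "annual_income": [30000, 45000, 80000, 95000],
})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 4, 4],
    "transaction_amount": [500, 200, 300, 100, 400, 250, 150],
    "transaction_type": ["UPI", "Debit Card", "UPI", "Credit Card",
                         "Credit Card", "Credit Card", "UPI"],
    "merchant_category": ["Electronics", "Fashion & Apparel", "Electronics",
                          "Beauty & Personal Care", "Travel", "Groceries", "Travel"],
})

# Attach each transaction to its customer's age group.
txn = transactions.merge(customers[["customer_id", "age_group"]], on="customer_id")

# Income check: average annual income per age group.
avg_income = customers.groupby("age_group")["annual_income"].mean()

# Spending habits: total spend per category, then the top 3 for 18-25.
category_spend = txn.groupby(["age_group", "merchant_category"])["transaction_amount"].sum()
top_young = category_spend.loc["18-25"].nlargest(3)

# Card usage: percentage of transactions by payment type within each age group.
type_share = pd.crosstab(txn["age_group"], txn["transaction_type"], normalize="index") * 100

print(avg_income, top_young, type_share, sep="\n\n")
```

Normalizing the crosstab by row makes the groups comparable even when one age group has far more transactions than another.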


Future Analysis