February 02, 2022

Customer Personality Analysis

The most successful companies today are the ones that know their customers so well that they can anticipate their needs. This is easier to achieve if we segment the customers into groups that reflect the similarities among the customers in each group. The goal of segmentation is to foresee the needs of customers, to get to know their interests, lifestyles and priorities, and to learn their spending habits, so as to maximize the value of customers to the business. Customer segmentation has many advantages for businesses, which include:

  • Price Optimization: Understanding the customers and their financial status helps to keep pace with price optimization.
  • Enhanced Competitiveness: The more customer attention a company attracts, the more revenue it generates, which in turn enhances its competitiveness in the market. A company that segments its market well becomes better known to its customers, and it can introduce new products and optimize existing ones according to the changing preferences of the customers.
  • Acquisition and Retention: A personalized connection with the customers helps a company to win satisfied customers. Better segmentation of the customers leads to better relationships with prospective customers. About 75% of satisfied customers are more likely to stay with a company.
  • Increased Revenues: Fine-tuning the marketing strategies helps generate more revenue because users are more likely to purchase when they are offered exactly what they need. Personalized and segmented emails are more likely to be opened, and the more emails are opened, the more sales are likely to be made. Successful marketing requires knowing not only who your customers are but also where exactly they are in the buying process, and customer segmentation based on such information helps ensure that marketing campaigns are truly effective.

For the given dataset, I will perform exploratory data analysis together with customer segmentation. The segmentation will be carried out with the K-Means algorithm. At the end of the analysis I would like to answer the following questions using insights from the data:

  • What are the statistical characteristics of the customers?
  • What are the spending habits of the customers?
  • Are there some products which need more marketing?
  • How can the marketing be made more effective?

    Data Fields

    People

  • ID — Customer's unique identifier
  • Year_Birth — Customer's birth year.
  • Education — Customer's education level.
  • Marital_Status — Customer's marital status.
  • Income — Customer's yearly household income.
  • Kidhome — Number of children in customer's household.
  • Teenhome — Number of teenagers in customer's household.
  • Dt_Customer — Date of customer's enrollment with the company.
  • Recency — Number of days since customer's last purchase.
  • Complain — 1 if customer complained in the last 2 years, 0 otherwise.

    Products

  • MntWines — Amount spent on wine in last 2 years.
  • MntFruits — Amount spent on fruits in last 2 years.
  • MntMeatProducts — Amount spent on meat in last 2 years.
  • MntFishProducts — Amount spent on fish in last 2 years.
  • MntSweetProducts — Amount spent on sweets in last 2 years.
  • MntGoldProds — Amount spent on gold in last 2 years.

    Promotion

  • NumDealsPurchases — Number of purchases made with a discount.
  • AcceptedCmp1 — 1 if customer accepted the offer in the 1st campaign, 0 otherwise.
  • AcceptedCmp2 — 1 if customer accepted the offer in the 2nd campaign, 0 otherwise.
  • AcceptedCmp3 — 1 if customer accepted the offer in the 3rd campaign, 0 otherwise.
  • AcceptedCmp4 — 1 if customer accepted the offer in the 4th campaign, 0 otherwise.
  • AcceptedCmp5 — 1 if customer accepted the offer in the 5th campaign, 0 otherwise.
  • Response — 1 if customer accepted the offer in the last campaign, 0 otherwise.

    Place

  • NumWebPurchases — Number of purchases made through the company’s web site.
  • NumCatalogPurchases — Number of purchases made using a catalogue.
  • NumStorePurchases — Number of purchases made directly in stores.
  • NumWebVisitsMonth — Number of visits to company’s web site in the last month.

    Import Package and Data

    We start by importing the basic libraries needed throughout the analysis: Pandas and NumPy for data handling and processing, and Matplotlib, Seaborn and Plotly Express for visualization.

    import numpy as np
    import pandas as pd
    from matplotlib import pyplot as plt
    import seaborn as sns
    import os
    
    import plotly.express as px
    
    %matplotlib inline
    

    For this exercise, the data set (.csv format) is downloaded to a local folder, read into the Jupyter notebook and stored in a Pandas DataFrame.

    # Raw string so the backslashes in the Windows path are not treated as escape sequences.
    customer = pd.read_csv(r'C:\My Files\Document\Coding\Datasheet\marketing_campaign.csv', sep='\t')
    

    Data Preparation

    Get statistical information on numerical features.

    Handling Missing Values

    Let's check the predictors for missing values.

    The Income column has some missing data. Let's drop the rows in the data with missing values.
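
    The summary statistics and the missing-value handling described above could look roughly like the sketch below. It runs on a tiny made-up table named customer_demo (a hypothetical stand-in for the real customer DataFrame, with illustrative values only), so the counts it prints say nothing about the actual dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the `customer` DataFrame (values are illustrative only).
customer_demo = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Income": [58138.0, np.nan, 71613.0, 26646.0],
    "Recency": [58, 38, 26, 26],
})

# Statistical summary of the numerical features.
print(customer_demo.describe())

# Number of missing values in each column.
missing_per_column = customer_demo.isnull().sum()
print(missing_per_column)

# Drop every row that contains a missing value and rebuild the index.
customer_clean = customer_demo.dropna().reset_index(drop=True)
print(customer_clean.shape)
```

    On the real data the same three calls would be applied to customer directly.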

    Feature Engineering

    The dataset contains a lot of information about the customers. In some cases we can group columns together to create new features, and in other cases we can derive new columns from the existing ones. This helps to explore the data better and to draw meaningful insights from it.

    Age of Customers

    Let's calculate the age of every customer from the birth year. Since the customers enrolled with the company between 2012 and 2014, we assume for simplicity that the data was collected in January 2015.

    import datetime as dt
    customer['Age'] = 2015 - customer.Year_Birth
    

    Months Since Enrollment

    From the enrollment date, let's calculate the number of months each customer has been affiliated with the company.

    customer['Dt_Customer'] = pd.to_datetime(customer['Dt_Customer'])
    customer['Month_Customer'] = 12.0 * (2015 - customer.Dt_Customer.dt.year ) + (1 - customer.Dt_Customer.dt.month)
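
    As a quick check of the month arithmetic, here is a small worked sketch on three made-up enrollment dates. The day-first string format (DD-MM-YYYY) used here is an assumption about how the dates are stored; if the file uses a different layout, the format argument would need to change.

```python
import pandas as pd

# Hypothetical enrollment dates, assumed to be written day-first (DD-MM-YYYY).
enrolled = pd.to_datetime(
    pd.Series(["04-09-2012", "21-08-2013", "10-02-2014"]),
    format="%d-%m-%Y",
)

# Months between the enrollment month and the assumed collection date, January 2015.
months = 12 * (2015 - enrolled.dt.year) + (1 - enrolled.dt.month)
print(months.tolist())
```

    For example, a customer enrolled in August 2013 gets 12 * 2 + (1 - 8) = 17 months.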
    

    Total Spendings

    The customers' spending is given separately for each product category. Let's sum these amounts to get the total spending of each customer.

    customer['TotalSpendings'] =  customer.MntWines + customer.MntFruits + customer.MntMeatProducts + customer.MntFishProducts + customer.MntSweetProducts + customer.MntGoldProds
    

    Age Groups

    On the basis of Age let's divide the customers into different age groups.

    # Age bands are inclusive; customers aged 60 and above are treated as seniors.
    customer.loc[(customer['Age'] >= 13) & (customer['Age'] <= 19), 'AgeGroup'] = 'Teen'
    customer.loc[(customer['Age'] >= 20) & (customer['Age'] <= 39), 'AgeGroup'] = 'Adult'
    customer.loc[(customer['Age'] >= 40) & (customer['Age'] <= 59), 'AgeGroup'] = 'Middle Age Adult'
    customer.loc[customer['Age'] >= 60, 'AgeGroup'] = 'Senior Adult'
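
    The same banding can also be written with pd.cut, which keeps the bin edges in one place. This is only an alternative sketch on a few made-up ages; it assumes that customers aged exactly 60 belong to the Senior Adult group.

```python
import pandas as pd

# A few illustrative ages, including the band boundaries.
ages = pd.Series([15, 25, 39, 40, 59, 60, 72], name="Age")

# Left-closed bins: [13, 20), [20, 40), [40, 60), [60, inf).
bins = [13, 20, 40, 60, float("inf")]
labels = ["Teen", "Adult", "Middle Age Adult", "Senior Adult"]

age_group = pd.cut(ages, bins=bins, labels=labels, right=False)
print(age_group.tolist())
```

    With integer ages these bins reproduce the 13 to 19, 20 to 39 and 40 to 59 bands used above; ages below 13 are left unassigned (NaN).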
    

    Number of Children

    Information is given separately for kids and teens at home for every customer. Let's sum them up, as they are better represented together as the number of children at home.

    customer['Children'] = customer['Kidhome'] + customer['Teenhome']
    

    Marital Status

    The Marital Status column has different string values: Together, Married, Divorced, Widow, Alone, Absurd, YOLO. Most of them can be grouped, so let's represent the marital status of customers using 2 main categories: Partner and Single.

    customer.Marital_Status = customer.Marital_Status.replace({'Together': 'Partner',
                                                               'Married': 'Partner',
                                                               'Divorced': 'Single',
                                                               'Widow': 'Single', 
                                                               'Alone': 'Single',
                                                               'Absurd': 'Single',
                                                               'YOLO': 'Single'})
    

    Removing Outliers

    There seem to be some outliers in the Age and Income columns. Let's check them.

    plt.figure(figsize=(20,10))
    sns.boxplot(y=customer.Age);
    plt.ylabel('Age', fontsize=20, labelpad=20);
    
    plt.figure(figsize=(20,10))
    sns.boxplot(y=customer.Income);
    plt.ylabel('Income', fontsize=20, labelpad=20);
    

    There are some customers aged above 100. This is unlikely, so let's drop those customers from the data.

    There are some customers who earn more than 120,000, and a few even more than 600,000. They are clear outliers in the data, so we will leave them out.
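
    A minimal sketch of this filtering step, using the two thresholds named in the text (age 100 and income 120,000). The table customer_demo and its values are made up for illustration; on the real data the mask would be applied to customer.

```python
import pandas as pd

# Toy stand-in for the `customer` DataFrame (values are illustrative only).
customer_demo = pd.DataFrame({
    "Age": [38, 61, 115, 45, 122, 29],
    "Income": [58138.0, 46344.0, 36640.0, 666666.0, 60182.0, 157243.0],
})

AGE_LIMIT = 100         # customers older than this are dropped
INCOME_LIMIT = 120_000  # customers earning more than this are dropped

# Keep only the rows that are within both limits.
within_limits = (customer_demo["Age"] <= AGE_LIMIT) & (customer_demo["Income"] <= INCOME_LIMIT)
customer_filtered = customer_demo[within_limits].reset_index(drop=True)
print(customer_filtered)
```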

    Exploratory Data Analysis

    Marital Status

    INSIGHT:

    About two-thirds of the customers live with a partner, while about one-third are single.

    Average Spendings: Marital Status Wise

    INSIGHT:

    Despite being the minority, singles spent more on average than customers with partners.
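
    Numbers like these are typically obtained from a normalized count and a grouped mean. The sketch below shows both on a tiny made-up table (customer_demo is hypothetical and its values are not taken from the dataset); the same pattern applies to the other "Average Spendings" comparisons in this section.

```python
import pandas as pd

# Toy stand-in for the `customer` DataFrame (values are illustrative only).
customer_demo = pd.DataFrame({
    "Marital_Status": ["Partner", "Partner", "Single", "Partner", "Single", "Partner"],
    "TotalSpendings": [400, 600, 900, 500, 700, 500],
})

# Share of customers in each marital-status category.
share = customer_demo["Marital_Status"].value_counts(normalize=True)

# Average total spending per category.
avg_spend = customer_demo.groupby("Marital_Status")["TotalSpendings"].mean()

print(share)
print(avg_spend)
```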

    Education Level

    INSIGHT:

  • About half of the customers are university graduates.
  • More customers hold a PhD than a Master's degree.

    Child Status

    INSIGHT:

  • About 50% of the customers have only one child.
  • 28% of the customers do not have any children at home, while 19% have 2 children.

    Average Spendings: Child Status Wise

    INSIGHT:

  • Customers who don't have any children at home spent more than customers with 1 child.
  • Customers with 1 child spent more than customers with 2 or 3 children.

    Age Distribution of Customers

    INSIGHT:

    The age of the customers is nearly normally distributed, with most customers aged between 40 and 60.

    Relationship: Age vs Spendings

    INSIGHT:

    There doesn't seem to be any clear relationship between age of customers and their spending habits.

    Customers Segmentation: Age Group Wise

    INSIGHT:

  • More than 50% of the customers are Middle Age Adults, aged between 40 and 60.
  • The second most common age category is Adult, aged between 20 and 40.

    Average Spendings: Age Group Wise

    INSIGHT:

    Middle Age Adults spent much more on average than the other age groups.

    Income Distribution of Customers

    INSIGHT:

    The incomes of the customers are approximately normally distributed, with most customers earning between 25,000 and 85,000.

    Relationship: Income vs Spendings

    INSIGHT:

    The relationship is roughly linear: customers with higher incomes spend more.
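
    One way to back a visual impression like this with numbers is the Pearson correlation and a fitted straight line. The sketch below uses synthetic data generated purely for illustration (the slope of 0.02 and the noise level are arbitrary assumptions), so its output says nothing about the real customers.

```python
import numpy as np
import pandas as pd

# Synthetic income and spending with a built-in linear trend (illustrative only).
rng = np.random.default_rng(0)
income = rng.uniform(20_000, 100_000, size=200)
spend = 0.02 * income + rng.normal(0, 200, size=200)
customer_demo = pd.DataFrame({"Income": income, "TotalSpendings": spend})

# Pearson correlation: values close to 1 indicate a strong positive linear relationship.
r = customer_demo["Income"].corr(customer_demo["TotalSpendings"])

# Least-squares line: TotalSpendings ~ slope * Income + intercept.
slope, intercept = np.polyfit(customer_demo["Income"], customer_demo["TotalSpendings"], deg=1)

print(f"correlation = {r:.2f}, slope = {slope:.4f}")
```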

    Most Bought Products

    INSIGHT:

  • Wine and meat products are the most popular products among the customers.
  • Sweets and fruits are purchased less often.

    Machine Learning Model

    Let's find the different segments of customers based on different features of the customer data using K-Means clustering. Let's first drop the unnecessary columns from the data.

    Optimum Clusters Using Elbow Method

    Let's choose the optimum number of clusters based on the Elbow method.

    Based on the above plot we will segment the customers into 4 clusters, as the inertia value does not decrease much after 4 clusters.
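
    A minimal sketch of how the inertia values behind an elbow plot can be computed with scikit-learn. The features used here are synthetic income and spending values drawn around four made-up centers, and standardizing them first is an assumption; the original feature set and preprocessing are not shown in this write-up.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic (income, spending) points around four made-up centers (illustrative only).
rng = np.random.default_rng(42)
centers = np.array([[20_000, 100], [40_000, 500], [65_000, 1_100], [90_000, 1_900]])
features = np.vstack([c + rng.normal(0, [3_000, 60], size=(50, 2)) for c in centers])

# Put both features on the same scale so that income does not dominate the distances.
scaled = StandardScaler().fit_transform(features)

# Fit K-Means for k = 1..8 and record the inertia (within-cluster sum of squares).
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(scaled)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

    Plotting these values against k gives the elbow curve; the elbow is the point after which adding clusters reduces the inertia only slightly.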

    Clusters Identification

    Let's try to identify the 4 modelled clusters from different features of the data.
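
    A rough sketch of this step: fit K-Means with 4 clusters, attach the labels to the table, and compare the per-cluster averages. The table customer_kmeans_demo is a synthetic, hypothetical stand-in, and using only Income and TotalSpendings as features is an assumption made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 50 customers around each of four made-up centers.
rng = np.random.default_rng(7)
centers = [(20_000, 100), (40_000, 500), (65_000, 1_100), (90_000, 1_900)]
customer_kmeans_demo = pd.DataFrame(
    np.vstack([rng.normal(c, (3_000, 60), size=(50, 2)) for c in centers]),
    columns=["Income", "TotalSpendings"],
)

# Standardize the features, fit K-Means with 4 clusters and store the labels.
scaled = StandardScaler().fit_transform(customer_kmeans_demo[["Income", "TotalSpendings"]])
model = KMeans(n_clusters=4, n_init=10, random_state=7)
customer_kmeans_demo["clusters"] = model.fit_predict(scaled)

# Profile each cluster by its size and its average income and spending.
sizes = customer_kmeans_demo["clusters"].value_counts()
profile = (
    customer_kmeans_demo.groupby("clusters")[["Income", "TotalSpendings"]]
    .mean()
    .sort_values("Income")
)
print(sizes)
print(profile)
```

    Sorting the profile by income makes it easy to see which cluster id corresponds to which income and spending level.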

    Clusters Interpretation

    From the above analysis we can segment the customers into 4 groups based on their income and total spendings:

  • Platinum: Customers with the highest income and highest spending.
  • Gold: Customers with high income and high spending.
  • Silver: Customers with low income and low spending.
  • Bronze: Customers with the lowest income and least spending.

    Data Exploration: Clusters Based

    Let's explore the data again based on the modelled clusters to identify the spending habits of the customers.

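    # K-Means label ids are arbitrary; this mapping matches the clusters from this particular run.
    customer_kmeans['clusters'] = customer_kmeans['clusters'].replace({1: 'Bronze',
                                                                       2: 'Platinum',
                                                                       3: 'Silver',
                                                                       0: 'Gold'})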
    
    customer['clusters'] = customer_kmeans.clusters
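
    Because K-Means assigns cluster ids arbitrarily, a hard-coded mapping can break when the model is refitted. As a hedged alternative, the names could be derived by ranking the clusters by mean income, as sketched below on a small made-up table (customer_kmeans_demo and the segment column are hypothetical names used only for this example).

```python
import pandas as pd

# Hypothetical per-customer table with arbitrary cluster ids (illustrative values).
customer_kmeans_demo = pd.DataFrame({
    "Income": [21_000, 23_000, 88_000, 92_000, 41_000, 39_000, 66_000, 64_000],
    "clusters": [1, 1, 2, 2, 3, 3, 0, 0],
})

# Tier names ordered from lowest to highest income.
tiers = ["Bronze", "Silver", "Gold", "Platinum"]

# Rank the clusters by mean income and pair them with the tier names.
mean_income = customer_kmeans_demo.groupby("clusters")["Income"].mean().sort_values()
label_map = dict(zip(mean_income.index, tiers))

customer_kmeans_demo["segment"] = customer_kmeans_demo["clusters"].map(label_map)
print(label_map)
```

    This ranks by income only; ranking by total spending, or by both, would be an equally reasonable choice.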
    

    Customers Distribution

    INSIGHT:

  • Most of the customers lie in the Silver and Gold categories, about 29% and 28% respectively.

    Relationship: Income vs. Spendings

    INSIGHT:

  • The 4 clusters can easily be identified from the above plot.
  • Those earning more are also spending more

    Spending Habits by Clusters

    INSIGHT:

    Customers from all the segments have spent most of their money on Wine and Meat products

    Purchasing Habits by Clusters

    INSIGHT:

  • Platinum and Gold customers are most likely to purchase in stores.
  • Most of the web and catalog purchases are also made by customers from the Platinum and Gold segments.
  • Silver and Gold customers also like to buy from the stores.
  • Deal purchases are common among the Gold and Silver customers.
  • Silver customers made the most web visits, while customers from the Platinum segment made the fewest.

    Promotions Acceptance by Clusters

    INSIGHT:

  • Platinum customers accepted the most offers from the company.
  • Campaigns 1 and 5 and the final campaign seem to be the most successful ones.

    Conclusion

  • Most of the customers are university graduates.
  • Most of the customers are living with partners.
  • Those living alone have spent more than those living with partners.
  • Most of the customers have only one child.
  • Those having no children have spent more.
  • Middle Age Adults, aged between 40 and 60, are the largest age group.
  • Middle Age Adults spend more on average than the other age groups.
  • Most of the customers are earning between 25000 and 85000.
  • Wine and meat products are very popular among the customers.
  • On the basis of income and total spendings, customers are divided into 4 clusters i.e. Platinum, Gold, Silver and Bronze.
  • Most of the customers fall into the Silver and Gold categories.
  • Those who are earning more are also spending more.
  • Most of the customers prefer to buy in stores, followed by online purchases through the website.
  • Platinum customers showed the most acceptance of promotion campaigns, while Bronze customers showed the least interest.

    Answering the Questions

    What are the statistical characteristics of the customers?

    The company's customers mostly live with a partner. The largest age group is Middle Age Adults, aged between 40 and 60, and customers most commonly have one child. About half of the customers hold a bachelor's degree, and their incomes are mostly between 25,000 and 85,000.

    What are the spending habits of the customers?

    Customers have spent the most on wine and meat products. Those without children have spent more than those with children. Singles spend more than customers with partners. Middle Age Adults have spent more than the other age groups. Store shopping is the preferred purchasing channel among the customers. Web and catalog purchasing also have potential.

    Are there some products which need more marketing?

    Sweets and fruits need more effective marketing. The company needs to run promotions for these products in order to increase the revenue from them. Bundles that combine the least-selling products with the best-selling products can be effective.

    How can the marketing be made more effective?

    As a marketing recommendation, give coupons to long-standing, high-spending customers. Market the cheap and on-offer products to the low-income, low-spending customers. Web purchasing has some potential; to unlock it, give special discounts to customers who sign up on the company's website.