March 01, 2022

Pakistan Used Cars

This dataset was collected for research purposes and was uploaded to the Kaggle platform about two years ago. It contains information on the sale of mostly used cars in Pakistan, including their price, release year, model, brand, kilometers driven, registration information, etc. The data was collected from different car-selling websites in Pakistan. I will formulate some questions to gain a deeper understanding of the data and draw some interesting conclusions from it.

  • What is the average price of Suzuki, Toyota and Honda cars, i.e. the 3 most famous car brands in Pakistan?
  • Cars from which release years are the cheapest (on average) in Pakistan, considering release years after 2000?
  • Cars from which brands have covered the most kilometers on the roads?
  • Which fuel type cars are cheapest on average?
  • Which city has the highest number of registered Mercedes cars?

    Import Packages and Data

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    %matplotlib inline
    import plotly.express as px
    

    For this exercise, the dataset (.csv format) was uploaded to my GitHub repository, read into a Jupyter notebook and stored in a pandas DataFrame named cars_df.

    Data Preparation and Cleaning

    As a first step, the data will be cleaned: columns with missing values will be identified and the missing values will be handled. Displaying the dataset shows that the DataFrame has 24973 rows and 9 columns.
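
A minimal sketch of this first inspection step, using a small toy DataFrame in place of the real cars_df (the values are made up for illustration): shape gives the (rows, columns) pair, and isna().sum() counts the missing entries per column.

```python
import numpy as np
import pandas as pd

# Toy stand-in for cars_df; the real DataFrame has 24973 rows and 9 columns.
cars_df = pd.DataFrame({
    "Brand": ["Suzuki", "Toyota", np.nan, "Honda"],
    "Year": [2015.0, np.nan, 2010.0, 2018.0],
    "KMs Driven": [50000.0, 120000.0, np.nan, 30000.0],
    "Price": [900000, 2500000, 700000, 1800000],
})

n_rows, n_cols = cars_df.shape             # (number of rows, number of columns)
missing = cars_df.isna().sum()             # missing entries per column
cols_with_missing = missing[missing > 0].index.tolist()

print(n_rows, n_cols)                      # 4 4
print(cols_with_missing)                   # ['Brand', 'Year', 'KMs Driven']
```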

    Displaying the unique values of the Brand column shows that cars from 23 different brands are driven in Pakistan. The array also contains NaN values, which means that the brand information is missing for some cars. These missing values will be handled later.

    Sorting the unique values of the Year column shows that the dataset contains cars with release years from 1915 to 2020. The release year is missing for some cars, as the array also contains NaN values.
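
The unique-value checks for Brand and Year can be sketched as follows, again on made-up toy data. Note that nunique() skips NaN by default while unique() keeps it, which is how the missing values show up in the array.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only
cars_df = pd.DataFrame({
    "Brand": ["Suzuki", "Toyota", np.nan, "Honda", "Suzuki"],
    "Year": [2015.0, 1915.0, np.nan, 2020.0, 2015.0],
})

brands = cars_df["Brand"].unique()         # keeps nan when a brand is missing
n_brands = cars_df["Brand"].nunique()      # ignores NaN by default
# Drop NaN before sorting so the range of release years is easy to read
years = np.sort(cars_df["Year"].dropna().unique())

print(n_brands)                            # 3
print(years.min(), years.max())
print(cars_df["Year"].isna().any())        # True: some release years are missing
```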

    The non-null counts show that only the Price column is complete; all other columns have missing entries. For a customer buying a car, the most important information is the price, the kilometers driven (for used cars) and the release year. I will therefore replace the missing values in the kilometers driven and year columns with the mean of the respective column.

    The describe method gives the mean values of kilometers driven and year directly.

    Replacing Missing Values

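    # Fill missing mileage and release year with the column means.
    # Assigning the result back avoids calling inplace=True on a single column,
    # which is chained assignment and is deprecated in recent pandas versions.
    cars_df['KMs Driven'] = cars_df['KMs Driven'].fillna(cars_df['KMs Driven'].mean())
    
    # Note: the mean year is generally not a whole number.
    cars_df['Year'] = cars_df['Year'].fillna(cars_df['Year'].mean())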
    

    Using the info method again, we can verify that the kilometers driven and year columns are now complete.

    Exploratory Data Analysis

    I will explore the different columns of the dataset to gain some insight into the data, mostly by drawing graphs. Visualizing the data helps to draw interesting conclusions.

    Popular Models

    The most popular car models are Cultus VXR, Alto, Corolla GLI, Mehran VX, Mehran VXR and Bolan.
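
A sketch of how such a ranking can be obtained with value_counts(), assuming the model names are stored in a column called Model (an assumption; the column name is not shown above) and using toy data:

```python
import pandas as pd

# Toy data; "Model" is an assumed column name
cars_df = pd.DataFrame({
    "Model": ["Cultus VXR", "Alto", "Corolla GLI", "Alto", "Cultus VXR", "Alto", "Bolan"],
})

# value_counts() sorts by frequency (descending), so head(n) gives the n most popular models
top_models = cars_df["Model"].value_counts().head(2)
print(top_models.index.tolist())   # ['Alto', 'Cultus VXR']
```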

    Popularity of Release Years

    Cars from roughly the last 15 years, i.e. with release years from 2004 onward, are the most popular.
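
One way to quantify this, sketched on toy data: count the listings per release year and compute the share of cars released in 2004 or later.

```python
import pandas as pd

# Toy release years for illustration
cars_df = pd.DataFrame({"Year": [1995, 2004, 2010, 2015, 2015, 2018, 2001, 2019]})

year_counts = cars_df["Year"].value_counts().sort_index()   # listings per release year
share_recent = (cars_df["Year"] >= 2004).mean()             # fraction released in 2004 or later
print(round(share_recent, 2))   # 0.75
```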

    Relationship between Car's Release Year and Price

    A better way to study this relationship is to consider the age of the car rather than the year it was released. Let's add another column to the DataFrame for the age of the car, calculated with the help of Python's datetime library.

    Adding Age Column
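
A minimal sketch of the age calculation with the datetime module, assuming age is simply the current calendar year minus the release year; Age is a hypothetical column name and the rows are toy data:

```python
from datetime import date

import pandas as pd

# Toy data for illustration
cars_df = pd.DataFrame({"Year": [2004, 2015, 2020], "Price": [450000, 1500000, 2600000]})

current_year = date.today().year
# Hypothetical "Age" column: whole years between release and the year the analysis is run
cars_df["Age"] = current_year - cars_df["Year"]
print(cars_df)
```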

    The relationship is inverse: the price decreases as the age of the car increases. Because car prices vary over a very large range, it is better to plot the price on a logarithmic scale.

    Adding Log Price Column
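
A sketch of the log-price column, assuming a base-10 logarithm (the base used in the original analysis is not stated; np.log would work equally well for plotting) and a hypothetical column name Log Price:

```python
import numpy as np
import pandas as pd

# Toy prices spanning three orders of magnitude
cars_df = pd.DataFrame({"Price": [100000, 1000000, 10000000]})

# log10 maps prices spanning several orders of magnitude onto a compact scale
cars_df["Log Price"] = np.log10(cars_df["Price"])
print(cars_df["Log Price"].tolist())
```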

    On the logarithmic scale, the visualization becomes much clearer and the inverse relationship is more obvious.

    Relationship between Mileage and Price

    The price decreases exponentially with mileage. Cars with a mileage of less than 1,000 kilometers are considered new, and cars with a mileage of more than 1 million kilometers are very rare, as they are not considered to be in good condition. Let's drop these rows for a better visualization.
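
The row filter described above can be sketched as a boolean mask. Whether the original analysis used strict or inclusive bounds at 1,000 and 1,000,000 km is an assumption here, and the data are toy values:

```python
import pandas as pd

# Toy data for illustration
cars_df = pd.DataFrame({
    "KMs Driven": [500, 45000, 180000, 1500000, 999999],
    "Price": [2500000, 1800000, 900000, 300000, 400000],
})

# Keep used cars with a plausible mileage: more than 1,000 km
# (otherwise effectively new) and less than 1,000,000 km (assumed strict bounds)
mask = (cars_df["KMs Driven"] > 1000) & (cars_df["KMs Driven"] < 1_000_000)
used_df = cars_df[mask]
print(len(used_df))  # 3
```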

    After dropping these rows, the exponential decrease is still clear: the price of a car falls sharply with increasing mileage.

    European Vs. Non-european Brands

    Let's divide the brands into European and non-European brands and study the same relationship.

    European car brands are very rare in Pakistan: they make up less than 1% of all cars in the dataset, so there is not enough data to draw any clear inferences.
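
A sketch of the European/non-European split and the share calculation. The set of European brand names below is hypothetical (built from brands mentioned in this post) and should be adjusted to the names actually present in the Brand column; Origin is a hypothetical column name, and the rows are toy data.

```python
import pandas as pd

# Hypothetical set of European brand labels; adjust to the labels in the real data
EUROPEAN_BRANDS = {"Audi", "BMW", "Mercedes", "Porsche", "Range Rover"}

cars_df = pd.DataFrame({"Brand": ["Suzuki", "Toyota", "Honda", "Audi", "Suzuki",
                                  "Toyota", "Honda", "Suzuki", "Daihatsu", "Suzuki"]})

# Label each row; missing brands (NaN) end up as Non-European
is_european = cars_df["Brand"].isin(EUROPEAN_BRANDS)
cars_df["Origin"] = is_european.map({True: "European", False: "Non-European"})

european_share = is_european.mean()   # fraction of rows from European brands
print(european_share)  # 0.1
```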

    Popularity based on Fuel

    The bar graph of fuel types shows that people in Pakistan mostly prefer to drive Petrol cars. LPG cars are the least common, as this fuel is not widely available in Pakistan.
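
The counts behind such a bar graph can be sketched with value_counts(), assuming the fuel type is stored in a column named Fuel (an assumption) and using toy data; calling .plot(kind="bar") on the result would draw the graph.

```python
import pandas as pd

# Toy data; "Fuel" is an assumed column name
cars_df = pd.DataFrame({"Fuel": ["Petrol", "Petrol", "CNG", "Diesel", "Petrol",
                                 "LPG", "Petrol", "CNG", "Diesel"]})

fuel_counts = cars_df["Fuel"].value_counts()   # number of cars per fuel type
print(fuel_counts.idxmax(), fuel_counts.idxmin())  # Petrol LPG
```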

    Sales of Top 10 Brands

    The bar plot shows that people in Pakistan prefer Japanese brands: the best-selling cars among the top 10 brands are from Japanese brands. This is expected, as most Japanese car companies have manufacturing plants in Pakistan.
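
A sketch of the top-10 brand count on toy data; value_counts() already sorts by frequency, so head(10) returns the ten best-selling brands (here fewer, since the toy data has only five).

```python
import pandas as pd

# Toy data for illustration
brands = ["Suzuki"] * 5 + ["Toyota"] * 4 + ["Honda"] * 3 + ["Daihatsu"] * 2 + ["Hyundai"]
cars_df = pd.DataFrame({"Brand": brands})

top10 = cars_df["Brand"].value_counts().head(10)
print(top10.index.tolist())  # ['Suzuki', 'Toyota', 'Honda', 'Daihatsu', 'Hyundai']
```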

    Correlation between Release Year and Kilometers Driven

    The scatterplot shows that Pakistani people mostly drive newer models, i.e. cars released between 1980 and 2020. For models from 1980 to 2000, the preferred fuel was CNG, as it is the cheapest fuel available in Pakistan. For models from 2000 to 2020, people have shifted toward Petrol cars due to the recent shortage of CNG in Pakistan.
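
To check the fuel shift numerically rather than visually, one option (not necessarily what the original analysis did) is to split cars into two release periods and compare the fuel mix with pd.crosstab. The Fuel column name, the Period column and the assignment of the year 2000 to the later period are assumptions; the data are toy values.

```python
import numpy as np
import pandas as pd

# Toy data; "Fuel" is an assumed column name
cars_df = pd.DataFrame({
    "Year": [1985, 1992, 1998, 1999, 2005, 2012, 2016, 2019],
    "Fuel": ["CNG", "CNG", "Petrol", "CNG", "Petrol", "Petrol", "CNG", "Petrol"],
})

# Hypothetical "Period" column: releases before 2000 vs. 2000 and later
cars_df["Period"] = np.where(cars_df["Year"] < 2000, "1980-1999", "2000-2020")

# Share of each fuel type within each period (each row sums to 1)
fuel_share = pd.crosstab(cars_df["Period"], cars_df["Fuel"], normalize="index")
print(fuel_share)
```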

    Cities with Highest Registration of Cars

    Most of the cars sold online are registered in Karachi and Lahore, the two biggest cities of Pakistan by population. Islamabad, the capital of Pakistan, is in third place, as many corporate professionals live there. The next two positions for car registrations are occupied by Rawalpindi and Multan, which are also among the 10 most populous cities of Pakistan. So it can be concluded that most of the car business is done in the most populous cities of Pakistan.
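
A sketch of the city ranking, assuming the registration city is stored in a column named Registered City (an assumption) and using toy data:

```python
import pandas as pd

# Toy data; "Registered City" is an assumed column name
cities = ["Karachi"] * 5 + ["Lahore"] * 4 + ["Islamabad"] * 3 + ["Rawalpindi"] * 2 + ["Multan"]
cars_df = pd.DataFrame({"Registered City": cities})

top_cities = cars_df["Registered City"].value_counts().head(5)
print(top_cities.index.tolist())  # ['Karachi', 'Lahore', 'Islamabad', 'Rawalpindi', 'Multan']
```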

    Most Expensive Car Brands

    The figure shows that the most expensive brands are European brands such as Range Rover, Audi, Porsche and BMW, since cars from these brands are imported into Pakistan; these brands don't have manufacturing plants in Pakistan.
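
Ranking brands by mean price can be sketched with groupby; the prices below are toy values, not figures from the dataset.

```python
import pandas as pd

# Toy data for illustration
cars_df = pd.DataFrame({
    "Brand": ["Suzuki", "Suzuki", "Toyota", "Audi", "Porsche", "BMW"],
    "Price": [600000, 800000, 2000000, 6500000, 15000000, 9000000],
})

# Mean price per brand, most expensive first
avg_price = cars_df.groupby("Brand")["Price"].mean().sort_values(ascending=False)
print(avg_price.head(3).index.tolist())  # ['Porsche', 'BMW', 'Audi']
```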

    Distribution of Kilometers Driven

    Used cars that have been driven less than a million kilometers are still considered to be in good condition in a country like Pakistan. So let's visualize the distribution of kilometers driven for these cars.

    The histogram shows that the used cars put up for sale online have mostly been driven between 50 thousand and 2 lakh (200,000) kilometers, while very few cars driven more than 2 lakh kilometers are available online. This suggests that people prefer to buy cars that have been driven less than 2 lakh kilometers.
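
The bin counts behind such a histogram can be sketched with np.histogram. The bin edges (0, 50 thousand, 2 lakh and 1 million km) are chosen here to mirror the ranges discussed above, and the mileages are toy values.

```python
import numpy as np
import pandas as pd

# Toy mileages in km
kms = pd.Series([20000, 60000, 85000, 120000, 150000, 190000, 250000, 400000])

# Bins: [0, 50k), [50k, 200k), [200k, 1M]; np.histogram closes only the last bin on the right
bins = [0, 50_000, 200_000, 1_000_000]
counts, edges = np.histogram(kms, bins=bins)
print(counts.tolist())  # [1, 5, 2]
```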

    Distribution of Price

    Let's explore the distribution of car prices in the range between 1 lakh (100,000) and 1 million rupees.

    The cars are spread fairly evenly across the explored price range, although a larger number of cars lie below half a million. This suggests that people in Pakistan prefer to buy cars that cost less than half a million.
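
A sketch of restricting prices to the explored range and measuring how many fall below half a million, on toy data:

```python
import pandas as pd

# Toy prices in PKR
prices = pd.Series([150000, 250000, 320000, 480000, 550000, 700000, 900000, 1500000, 90000])

# Restrict to the explored range: 1 lakh (100,000) to 1 million PKR, both ends inclusive
in_range = prices[prices.between(100_000, 1_000_000)]
share_below_half_million = (in_range < 500_000).mean()
print(len(in_range))  # 7
print(share_below_half_million)
```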

    Price Distribution of Top 3 Brands

    Among the top three brands in Pakistan, people again prefer to buy cars priced below half a million. The dominant brand in this price range is Suzuki, the most famous brand in Pakistan. The distribution also shows that Toyota and Honda cars are relatively more expensive than Suzuki cars, as the sample density of Toyota and Honda cars tends to increase above half a million.
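
One way to compare the three brands numerically is the share of each brand's cars priced below half a million; this is a sketch on toy data, not the plot used above.

```python
import pandas as pd

# Toy data for illustration
cars_df = pd.DataFrame({
    "Brand": ["Suzuki"] * 4 + ["Toyota"] * 3 + ["Honda"] * 3,
    "Price": [350000, 420000, 480000, 650000,
              450000, 1200000, 1800000,
              700000, 950000, 1300000],
})

top3 = cars_df[cars_df["Brand"].isin(["Suzuki", "Toyota", "Honda"])]
# Fraction of each brand's cars priced below 500,000 PKR
share_below = (top3["Price"] < 500_000).groupby(top3["Brand"]).mean()
print(share_below)
```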

    Answering Some Basic Questions

    What is the average price of Suzuki, Toyota and Honda cars, i.e. the 3 most famous car brands in Pakistan?

    The average prices of Suzuki, Toyota and Honda cars are 0.56, 1.65 and 1.14 million Pakistani rupees, respectively.
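
A sketch of the computation behind this answer, on toy data (so the printed values differ from the figures above): filter the three brands, average the price per brand and convert to millions.

```python
import pandas as pd

# Toy data for illustration
cars_df = pd.DataFrame({
    "Brand": ["Suzuki", "Suzuki", "Toyota", "Toyota", "Honda", "Honda", "Daihatsu"],
    "Price": [500000, 700000, 1500000, 2100000, 1000000, 1400000, 700000],
})

top3 = ["Suzuki", "Toyota", "Honda"]
# Mean price per brand, expressed in millions of PKR
avg_millions = cars_df[cars_df["Brand"].isin(top3)].groupby("Brand")["Price"].mean() / 1e6
print(avg_millions.round(2))
```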

    Cars from which release years are the cheapest (on average) in Pakistan, considering release years after 2000?

    Cars from release year 2001 are the cheapest among the cars released after 2000.
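
A sketch of the computation, assuming "after 2000" means release years strictly greater than 2000, on toy data:

```python
import pandas as pd

# Toy data for illustration
cars_df = pd.DataFrame({
    "Year": [1998, 2001, 2001, 2005, 2010, 2015, 2015],
    "Price": [300000, 450000, 500000, 700000, 1100000, 1600000, 1800000],
})

# Assumed reading: release years strictly after 2000
post_2000 = cars_df[cars_df["Year"] > 2000]
avg_by_year = post_2000.groupby("Year")["Price"].mean()
cheapest_year = avg_by_year.idxmin()   # year with the lowest mean price
print(cheapest_year)  # 2001
```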

    Cars from which brands have covered the most kilometers on the roads?

    Suzuki, Toyota and Honda cars, i.e. cars from Japanese brands, have been driven the most in Pakistan.
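
A sketch assuming "most kilometers" means the total KMs Driven summed per brand (an assumption; it could also be read as the average per car), on toy data:

```python
import pandas as pd

# Toy data for illustration
cars_df = pd.DataFrame({
    "Brand": ["Suzuki", "Toyota", "Honda", "Suzuki", "Audi", "Toyota"],
    "KMs Driven": [90000, 120000, 80000, 110000, 30000, 60000],
})

# Assumed metric: total kilometers per brand, largest first
total_km = cars_df.groupby("Brand")["KMs Driven"].sum().sort_values(ascending=False)
print(total_km.head(3).index.tolist())  # ['Suzuki', 'Toyota', 'Honda']
```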

    Which fuel type cars are cheapest on average?

    Although LPG is not widely available in Pakistan, LPG cars are still the cheapest on average.
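
A sketch of the average-price-per-fuel comparison, assuming a Fuel column (an assumed name) and toy data:

```python
import pandas as pd

# Toy data; "Fuel" is an assumed column name
cars_df = pd.DataFrame({
    "Fuel": ["Petrol", "Petrol", "CNG", "CNG", "Diesel", "LPG", "LPG"],
    "Price": [1500000, 900000, 600000, 700000, 2500000, 400000, 500000],
})

avg_by_fuel = cars_df.groupby("Fuel")["Price"].mean()
cheapest_fuel = avg_by_fuel.idxmin()   # fuel type with the lowest mean price
print(cheapest_fuel)  # LPG
```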

    Which city has the highest number of registered Mercedes cars?

    Mercedes cars are registered most often in Karachi, the biggest city of Pakistan by population.
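
A sketch of this lookup, assuming the brand label is Mercedes and the city column is named Registered City (both assumptions), on toy data:

```python
import pandas as pd

# Toy data; the "Mercedes" label and "Registered City" column name are assumptions
cars_df = pd.DataFrame({
    "Brand": ["Mercedes", "Mercedes", "Toyota", "Mercedes", "Mercedes", "Suzuki"],
    "Registered City": ["Karachi", "Lahore", "Karachi", "Karachi", "Islamabad", "Multan"],
})

# Count registration cities among Mercedes rows only
mercedes_cities = cars_df.loc[cars_df["Brand"] == "Mercedes", "Registered City"].value_counts()
top_city = mercedes_cities.idxmax()
print(top_city)  # Karachi
```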

    Conclusion

    The following inferences and conclusions can be drawn from the exploratory analysis of the data:

  • Most cars sold are from Japanese brands, as these have manufacturing setups in Pakistan.
  • Most of the cars in Pakistan are Petrol cars.
  • In the late 1990s, CNG cars became popular in Pakistan, as CNG is the cheapest fuel available there, but after 2000 the trend is shifting back to Petrol cars.
  • The most expensive cars in Pakistan are from European brands, as cars from these brands are imported.
  • Cities with the highest population have the most registered cars.
  • Elite cars from European brands are mostly registered in the biggest cities of Pakistan.
  • Pakistani people mostly like to buy used cars.
  • Most cars being driven in Pakistan are from the last 15 years.