
Data Cleaning

Python 3.8

Overview

Data almost always comes in messy, and data cleaning is one of the most crucial stages of data analytics. Here, up to 95% of the data is cleaned using Python 3.8. The tool used is Jupyter Notebook from Anaconda. The data, an Excel file, is loaded into the notebook for cleaning. Two main Python libraries are used: Pandas and NumPy.

The whole code is posted on GitHub: https://github.com/smzahir/Portfolio-Project

Code

The coding was done in Jupyter Notebook and is shown as screenshots or images below.

In the above code we uploaded the data to the notebook, read it with pandas and converted it into a DataFrame.
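A minimal sketch of this loading step, assuming the data sits in a single Excel sheet. The function name `load_sales` and the file name in the commented call are hypothetical and are not taken from the original notebook.

```python
import pandas as pd


def load_sales(path):
    """Read the first sheet of an Excel workbook into a DataFrame."""
    # pd.read_excel needs an Excel engine (for example openpyxl) for .xlsx files.
    return pd.read_excel(path)


# Example call with a hypothetical file name:
# df = load_sales("purchase_data.xlsx")
# df.head()
```

Calling `df.head()` afterwards is a quick way to confirm that the columns were read as expected.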

In the above code we used the null function to count the empty (null) values in each column, which matters for our visualization, and we can see that many columns have null values. The data has 11200 rows in total, so columns whose null count is close to this number are eliminated.
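The null check described above can be sketched as follows. The data is a toy table, the column names are hypothetical, and the 90% cut-off for "close to the row count" is an assumption rather than a value from the notebook.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real sheet; column names are hypothetical.
df = pd.DataFrame({
    "Item": ["Tyre A", "Tyre B", "Tube C", "Tyre D"],
    "Billed Quantity": [10, 5, np.nan, 8],
    "Remarks": [np.nan, np.nan, np.nan, np.nan],
})

# Count the missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Flag columns whose null count is close to the total number of rows.
# The 90% threshold is an assumption chosen for illustration.
threshold = 0.9 * len(df)
mostly_empty = null_counts[null_counts >= threshold].index.tolist()
print(mostly_empty)
```

Columns listed in `mostly_empty` are the candidates to drop in the next step.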

In the above code we used slicing to get rid of the null columns and kept only the columns that are useful for our visualization. There is one more column that we will need later: the total purchase, which is calculated from the total billed quantity and the unit purchase rate. So we added an extra column for it, named total.
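A hedged sketch of the column selection and the extra total column. It selects columns by name, whereas the notebook may have used positional slicing such as `iloc`. The column names are hypothetical, and the sketch assumes the total is quantity multiplied by rate, which is the usual meaning of a purchase total; adjust the expression if the notebook combines the two columns differently.

```python
import pandas as pd

# Toy data; column names are hypothetical stand-ins for the real sheet.
df = pd.DataFrame({
    "Item": ["Tyre A", "Tyre B"],
    "Group": ["Car", "Bike"],
    "Billed Quantity": [10, 4],
    "Unit Purchase Rate": [2500.0, 900.0],
    "Remarks": [None, None],
})

# Keep only the columns needed for the visualization.
keep = ["Item", "Group", "Billed Quantity", "Unit Purchase Rate"]
df = df[keep].copy()

# Assumption: total purchase = billed quantity * unit purchase rate.
df["Total"] = df["Billed Quantity"] * df["Unit Purchase Rate"]
print(df)
```

Taking a `.copy()` after the selection avoids pandas warnings about assigning to a view when the new column is added.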

In the above code we list the unique items under the group column so that they can be separated into five new groups.
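Listing the unique values might look like the sketch below. The column name `Group` and the item names are hypothetical.

```python
import pandas as pd

# Toy data; the real column holds many more item names.
df = pd.DataFrame({
    "Group": ["Car Radial", "Bike Tyre", "Car Radial", "Tube", "Flap"],
})

# unique() returns each distinct value once, in order of first appearance.
unique_items = df["Group"].unique()
print(unique_items)
```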

In the above code we group the items into five categories: Car, Bike, Scooter, Auto and Commercial. The items are manually selected from the unique array produced by the previous code.
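One way to express this manual grouping is a dictionary of hand-picked item lists that is turned into a lookup table. This is only a sketch: the item names and the `Category` column are hypothetical, and the notebook may have built the groups differently.

```python
import pandas as pd

# Toy data; item names are hypothetical.
df = pd.DataFrame({
    "Group": ["Car Radial", "Bike Tyre", "Scooter Tyre",
              "Auto Tyre", "Truck Tyre", "Tube"],
})

# Items picked by hand from the unique array, one list per category.
categories = {
    "Car": ["Car Radial"],
    "Bike": ["Bike Tyre"],
    "Scooter": ["Scooter Tyre"],
    "Auto": ["Auto Tyre"],
    "Commercial": ["Truck Tyre"],
}

# Invert the dictionary into an item -> category lookup.
item_to_category = {
    item: category
    for category, items in categories.items()
    for item in items
}

# Items that belong to no category (such as "Tube") become missing values.
df["Category"] = df["Group"].map(item_to_category)
print(df)
```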

In the above data, after filtering, we can see some items like tubes and flaps. After studying the data, we figured out that these items have no impact on our data, so we should get rid of those rows, which will be done in the next steps.

So logically we need to make a list of the items we want and select the rows whose item matches the list. After filtering we get only the items that fall under the five groups.
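The list-based filter can be sketched with `isin`. The item names, the `Group` and `Total` column names, and the values are hypothetical toy data.

```python
import pandas as pd

# Toy data containing wanted items plus tubes and flaps.
df = pd.DataFrame({
    "Group": ["Car Radial", "Tube", "Bike Tyre", "Flap", "Truck Tyre"],
    "Total": [25000.0, 300.0, 3600.0, 150.0, 48000.0],
})

# Items that belong to one of the five groups.
keep_items = ["Car Radial", "Bike Tyre", "Truck Tyre"]

# Keep only the rows whose item appears in the list, then renumber the rows.
filtered = df[df["Group"].isin(keep_items)].reset_index(drop=True)
print(filtered)
```

Rows for tubes and flaps are dropped because their names are not in `keep_items`.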

Finally, after all this cleaning the data still contains some null values, which we will sort out in the next session. The data now contains 8062 rows, down from 11200 earlier. Next we will write the data out and download it using the Excel function, and then load and transform it in Power BI. That is the end of the cleaning session.
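A small sketch of the final check and export, under the assumption that the cleaned DataFrame is written with `DataFrame.to_excel`. The helper name `export_clean`, the output file name and the toy data are hypothetical.

```python
import pandas as pd


def export_clean(df, path):
    """Write the cleaned DataFrame to an Excel file without the index column."""
    # to_excel needs an Excel writer engine (for example openpyxl).
    df.to_excel(path, index=False)


# Toy data standing in for the cleaned table.
df = pd.DataFrame({
    "Item": ["Car Radial", "Bike Tyre"],
    "Total": [25000.0, None],
})

# Check how many nulls and rows remain before exporting.
remaining_nulls = int(df.isnull().sum().sum())
row_count = len(df)
print(remaining_nulls, row_count)

# Example call with a hypothetical file name:
# export_clean(df, "cleaned_data.xlsx")
```

The exported file can then be loaded into Power BI as a regular Excel source.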
