
Normalization vs. Standardization

Stop using them interchangeably

If you are getting started on your data science journey, or have spent some time working with datasets, then you definitely understand the struggle of keeping up with data pre-processing terminology.

Real-world data is often incomplete, inconsistent, lacking in certain behaviors or trends, and likely to contain many errors.

Data preprocessing is a proven method of resolving such issues.

Data preprocessing covers tasks such as data cleaning, data integration, data transformation, data dimensionality reduction, and data discretization.

So, today we’ll focus on just two major data preprocessing techniques, which will also make the dimensionality reduction techniques easier to understand later.

Data Normalization and Data Standardization come under the data transformation task. And these are among the most important techniques to understand before implementing a classical Machine Learning model.

The Iris dataset has 4 features: SepalLength, SepalWidth, PetalLength, and PetalWidth.

The ‘Species’ column contains the class labels, i.e. Setosa, Versicolor, and Virginica.

Here is a sample of the dataset. We’ll see how we can perform Normalization and Standardization by using this sample dataset.

Data Normalization usually means rescaling a variable so that its values lie between 0 and 1, and we can achieve that by using the following formula:

x_new = (x − min(x)) / (max(x) − min(x))

Let’s take one feature of the data set “Sepal width”.

Sepal Width = [3.5, 3.0, 3.2, 3.1, 3.6, 3.7, 3.4, 3.4, 2.9, 2.7]

From this we need to normalize the Sepal Width column.

By using the normalization formula we can compute our new Sepal Width as follows

max(Sepal Width) = 3.7

min(Sepal Width) = 2.7

(Sepal Width)new = [0.8, 0.3, 0.5, 0.4, 0.9, 1.0, 0.7, 0.7, 0.2, 0.0]

Now you can see all our Sepal Width values are transformed such that they lie between 0 and 1. We can do the same thing for the rest of the data features: SepalLength, PetalLength, and PetalWidth.
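The calculation above can be sketched in a few lines of Python with NumPy (the code is mine, not part of the original walkthrough):

```python
import numpy as np

# Sepal Width sample used in this post
sepal_width = np.array([3.5, 3.0, 3.2, 3.1, 3.6, 3.7, 3.4, 3.4, 2.9, 2.7])

# Min-max normalization: (x - min) / (max - min)
normalized = (sepal_width - sepal_width.min()) / (sepal_width.max() - sepal_width.min())

# Rounded to one decimal place this matches the hand-computed values:
# 0.8, 0.3, 0.5, 0.4, 0.9, 1.0, 0.7, 0.7, 0.2, 0.0
print(normalized.round(1))
```

Because min and max map to 0 and 1 exactly, every other value lands strictly inside that interval.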

Because we normalize the data column by column, this is often called Column Normalization.

Now you may ask me a question: why should we learn this concept?

Well, the answer is very simple: no matter how the data was collected, column normalization maps it into a new space where every feature value lies between 0 and 1.

Standardization transforms data to have a mean of zero and a standard deviation of 1. Data points can be standardized with the following formula:

x_new = (x − x̄) / s

where x̄ is the mean and s is the standard deviation of the feature.

Let me tell you, Data Standardization is used more often in practice than data normalization.

Now let’s take our case to understand,

xi = Sepal Width = [3.5, 3.0, 3.2, 3.1, 3.6, 3.7, 3.4, 3.4, 2.9, 2.7]

x̄ = mean(Sepal Width) = 3.25

S = standard deviation(Sepal Width) = 0.32403703492039304

By using the standardization formula we can compute our new Sepal Width as follows:

(Sepal Width)new = [0.77, -0.77, -0.15, -0.46, 1.08, 1.39, 0.46, 0.46, -1.08, -1.70]

We can do the same thing for the rest of the data features like SepalLength, PetalLength, and PetalWidth.
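Here is the same computation as a short NumPy sketch (again my own code, mirroring the numbers above). Note that `ddof=1` requests the sample standard deviation, which is what the value 0.32403… above corresponds to:

```python
import numpy as np

sepal_width = np.array([3.5, 3.0, 3.2, 3.1, 3.6, 3.7, 3.4, 3.4, 2.9, 2.7])

mean = sepal_width.mean()       # 3.25
std = sepal_width.std(ddof=1)   # sample standard deviation, ~0.32404

# z-scores: (x - mean) / std
standardized = (sepal_width - mean) / std

# Rounded to two decimals: 0.77, -0.77, -0.15, -0.46, 1.08, 1.39, 0.46, 0.46, -1.08, -1.70
print(standardized.round(2))
```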

Let’s plot it. Here we can see the standardized values are centered around zero, all lying within roughly 1.7 standard deviations of the mean.

Column Standardization is also called Mean Centering, and the standardized values themselves are known as z-scores.
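In practice, both transformations are usually done with scikit-learn’s built-in scalers rather than by hand. A minimal sketch (my addition, not from the original post):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# scikit-learn expects shape (n_samples, n_features), so reshape the column
X = np.array([3.5, 3.0, 3.2, 3.1, 3.6, 3.7, 3.4, 3.4, 2.9, 2.7]).reshape(-1, 1)

X_norm = MinMaxScaler().fit_transform(X)   # values rescaled into [0, 1]
X_std = StandardScaler().fit_transform(X)  # mean 0, unit variance
```

One caveat: `StandardScaler` divides by the population standard deviation (ddof=0), so its output differs slightly from the hand calculation above, which used the sample standard deviation (ddof=1).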

Congrats, you have reached the end of the blog. I hope you are convinced now.

If you’ve got something on your mind you think this article is missing, leave a response below.

Thanks for reading so far.

See you in my next post!
