python tip

Checking if Two Columns Are Identical

Since we've practiced joining and splitting columns, you might have noticed that we now have two columns with the first name (first_name and f_name) and two columns with the last name (last_name and l_name). Let's quickly check if these columns are identical. First, note that you can use equals() to check the equality of columns or even entire datasets:

# Checking if two columns are identical with .equals()
clients['first_name'].equals(clients['f_name'])

True

You'll get a True or False answer. But what if you get False and want to know how many entries don't match? Here's a simple way to get this information:

# Checking how many entries in the initial column match the entries in the new column
(clients['first_name'] == clients['f_name']).sum()

500

We've started with getting the number of entries that do match. Here, we again utilize the fact that True is considered as 1 in our calculations. We see that 500 entries from the first_name column match the entries in the f_name column. You may recall that 500 is the total number of rows in our dataset, so this means all entries match. However, you may not always remember (or know) the total number of entries in your dataset. So, for our second example, we get the number of entries that do not match by subtracting the number of matching entries from the total number of entries:

# Checking how many entries in the initial column DO NOT match the entries in the new column
clients['last_name'].count() - (clients['last_name'] == clients['l_name']).sum()

0