How To Find Duplicates in a Python List


Finding duplicates in a Python list and removing them are common tasks for Python programmers, because a list accepts repeated values without complaint and tends to collect them over time.

Fortunately, it is relatively easy to check for duplicates in Python. Once you spot them, you can do several things:

  • List Duplicate Values only
  • Remove duplicate values and create a new list without any duplicates
  • Change the current list by removing only the duplicates, essentially deduplicating the existing list.
  • Just evaluate the list for duplicates, and report if there are duplicates in this list.
  • Count the duplicates in the list.

But before we delve deeper into each of these tasks, it helps to quickly understand what lists are, and why duplicates can exist in Python lists.

I also want you to know about the Set data type in the Python programming language. Once you know their unique points and their differences, you will better appreciate the methods used to identify and remove duplicates from a Python list.

What is a List in Python

A list in Python is like an array. It is a collection of objects, stored in a single variable. A list is changeable. You can add or remove elements from Python lists. A list can be sorted too. But by default, a list is not sorted.

A Python list can also contain duplicates, and it can also contain multiple elements of different data types. This way, you can store integers, floating point numbers, positive or negatives, strings, and even boolean values in a list.

Python lists can also contain other lists, and can grow as large as memory allows. Some of the methods below scan the whole list once for every element, which is fine for small lists but slow for large ones. So which method suits you best depends largely on the list size.

You define a list by enclosing the elements in square brackets. Each element is separated by commas within the list.
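As a small illustration of the points above, here is a minimal sketch; the variable names numbers and mixed are made up for this example.

```python
# A list is written with square brackets; duplicates and mixed types are allowed.
numbers = [5, 3, 5, 2]
mixed = [1, 2.5, "text", True, [10, 20]]

numbers.append(3)      # a list can grow after it is created
print(numbers)         # [5, 3, 5, 2, 3]
print(len(mixed))      # 5
```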

What is a Set in Python?

A Set is another data type available in Python. You can store multiple items in a Set too, but a Set differs from a Python list in that it cannot contain duplicates.

You can define a Set with curly braces, as compared to a list, which is defined by using square brackets.

A Set in Python is not ordered or indexed. You cannot access an element by its position at all, and the order in which elements come back when you loop over a Set is not guaranteed.

Once you have created a Set in Python, you can add elements to it or remove them, but you can’t change an existing element in place.
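The following sketch shows these Set properties in action. The variable names are chosen only for illustration. Note one detail not mentioned above: empty curly braces create a dictionary, so an empty Set has to be created with set().

```python
fruits = {"apple", "banana", "apple"}   # the repeated "apple" is dropped
print(len(fruits))                      # 2

empty = set()               # {} would create an empty dict, not a set
fruits.add("cherry")        # adding an element
fruits.discard("banana")    # removing an element
print("cherry" in fruits)   # True
```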

Now that you have a basic understanding of lists and the Set data type in Python, we will explore the identification and removal of duplicates in Python lists.

Multiple Ways To Check if duplicates exist in a Python list

  • Compare the length of the list with the length of a Set built from it
  • Check each element against the values already collected: if it is there, it is a duplicate; if not, append it
  • Check list.count() for each element

We will be using Python 3. So as long as you have any version of the Python 3 interpreter, you are good to go.

Method 1: Using the length of a list to identify if it contains duplicate elements.

Let’s write the Python program to check this.

# this input list contains duplicates
mylist = [5, 3, 5, 2, 1, 6, 6, 4] # 5 & 6 are duplicate numbers.

# find the length of the list
print(len(mylist))

# create a set from the list
myset = set(mylist)

# find the length of the Python set variable myset
print(len(myset))

Output:

8
6

As you can see, the length of the mylist variable is 8, and the myset length is 6.


Here’s the final Python program – the full code can be copied and pasted into a Python program and used to check if identical items exist in a list or not.

# this input list contains duplicates
mylist = [5, 3, 5, 2, 1, 6, 6, 4] # 5 & 6 are duplicate numbers.

# find the length of the list
print(len(mylist))

# create a set from the list
myset = set(mylist)

# find the length of the Python set variable myset
print(len(myset))

# compare the length and print if the list contains duplicates
if len(mylist) != len(myset):
    print("duplicates found in the list")
else:
    print("No duplicates found in the list")

Output:

8
6
duplicates found in the list

Alternatively, we can create a function that will check if duplicate items exist, and will return a True or a False to alert us of duplicates.

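Here is the complete function to check if duplicates exist in a Python list:

def is_duplicate(anylist):
    # type() returns a class, never the string 'list', so use isinstance()
    if not isinstance(anylist, list):
        raise TypeError("Passed parameter is not a list")
    # a set drops repeated values, so a shorter set means duplicates exist
    return len(anylist) != len(set(anylist))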

mylist = [5, 3, 5, 2, 1, 6, 6, 4] # you can see some repeated number in the list.
if is_duplicate(mylist):
    print("duplicates found in list")
else:
    print("no duplicates found in list")

The output of this Python code is:

duplicates found in list
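Comparing lengths always builds a Set from the entire list. If you only need a yes/no answer, a loop can stop at the first repeated value instead. The following is a minimal sketch of that idea; the function name has_duplicates is hypothetical, and it assumes the elements are hashable.

```python
def has_duplicates(items):
    """Return True as soon as a repeated value is seen (hypothetical helper)."""
    seen = set()
    for item in items:
        if item in seen:
            return True      # stop early: no need to look at the rest
        seen.add(item)
    return False

print(has_duplicates([5, 3, 5, 2, 1, 6, 6, 4]))  # True
print(has_duplicates([1, 2, 3]))                 # False
```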

Method 2: Listing Duplicates in a List & Listing Unique Values – Sorted

In this method, we will build two lists: one to hold the repeated values, and one to hold the unique values. A few lines of code can do magic in a Python program.

# the given list contains duplicates
mylist = [5, 3, 5, 2, 1, 6, 6, 4] # the original list of integers with duplicates

newlist = [] # empty list to hold unique elements from the list
duplist = [] # empty list to hold the duplicate elements from the list
for i in mylist:
    if i not in newlist:
        newlist.append(i)
    else:
        duplist.append(i) # every repeat after the first occurrence is appended, so a value seen 3 times is listed twice

# The next step is to print the duplicate entries, and the unique entries
print("List of duplicates", duplist)
print("Unique Item List", newlist) # prints the final list of unique items

Output:

List of duplicates [5, 6]
Unique Item List [5, 3, 2, 1, 6, 4]

And if you want to sort the list items after removing the duplicates, you can use the built-in list method sort(), which sorts the list in place.

# sorting the list
newlist.sort() # the sort method sorts all the values
print("The sorted list", newlist) # this prints the sorted list

Output:

The sorted list [1, 2, 3, 4, 5, 6]
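One thing to keep in mind with Method 2 is that each "not in newlist" check scans the list, which gets slow as the list grows. A possible variant, sketched here under the assumption that the elements are hashable, tracks the values already seen in a Set. It also records each repeated value only once, however many times it occurs. The names seen and reported are made up for this example.

```python
mylist = [5, 3, 5, 2, 1, 6, 6, 4, 5]  # 5 appears three times here

seen = set()       # values met so far (fast membership test)
reported = set()   # duplicates already added to duplist
newlist = []       # unique values, in first-seen order
duplist = []       # each repeated value, listed once

for item in mylist:
    if item not in seen:
        seen.add(item)
        newlist.append(item)
    elif item not in reported:
        reported.add(item)
        duplist.append(item)

print("List of duplicates", duplist)  # List of duplicates [5, 6]
print("Unique Item List", newlist)    # Unique Item List [5, 3, 2, 1, 6, 4]
```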

Method 3: Listing only Duplicate values with the Count Method

This method iterates over each element of the entire list, and checks if the count of that element is greater than 1. If yes, that item is added to a Set. If you remember, a Set cannot contain any duplicates, by design, so each repeated element appears in the result only once. Note that count() scans the whole list on every call, so this method is best kept for small lists.

# the mylist variable represents a duplicate list.
mylist = [5, 3, 5, 2, 1, 6, 6, 4] # the original input list with repeated elements.

dup = {x for x in mylist if mylist.count(x) > 1}
print(dup)

#To count the number of list elements that were duplicated, you can run
print(len(dup))

Output:

{5, 6}
2

Keep in mind that the listed duplicate values might have existed once, or eve
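To see exactly how many times each duplicate appears, one option is collections.Counter from the standard library, which tallies the whole list in a single pass. This is a sketch rather than part of the method above; the name dup_counts is made up for this example.

```python
from collections import Counter

mylist = [5, 3, 5, 2, 1, 6, 6, 4]

counts = Counter(mylist)   # maps each value to its number of occurrences
dup_counts = {value: n for value, n in counts.items() if n > 1}

print(dup_counts)          # {5: 2, 6: 2}
print(len(dup_counts))     # 2
```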

The fastest way to Remove Duplicates From Python Lists

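One of the fastest ways to remove duplicates is to create a Set from the list variable. All this can be done in just a single Python statement, and the Set is built in a single pass over the list, so this approach is well suited to large lists. Keep in mind that the result is a Set, not a list, so the original order of the elements is not kept.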

Here’s the final code in Python – probably the best way…

# this list contains duplicate number 5 & 6
mylist = [5, 3, 5, 2, 1, 6, 6, 4]
myunique = set(mylist) # a set holding each value once; the original order is not kept
print(myunique)

Output:

{1, 2, 3, 4, 5, 6}
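If you need a list back rather than a Set, a few common options are sketched below. They assume Python 3.7 or later (where a dict keeps insertion order) and hashable elements; the variable names are illustrative only.

```python
mylist = [5, 3, 5, 2, 1, 6, 6, 4]

# Keep the first occurrence of each value, in the original order.
unique_in_order = list(dict.fromkeys(mylist))
print(unique_in_order)   # [5, 3, 2, 1, 6, 4]

# Get the unique values as a sorted list.
unique_sorted = sorted(set(mylist))
print(unique_sorted)     # [1, 2, 3, 4, 5, 6]

# Deduplicate the existing list object in place via slice assignment.
mylist[:] = unique_in_order
print(mylist)            # [5, 3, 2, 1, 6, 4]
```

The slice assignment in the last step changes the contents of the same list object, so any other variable that refers to that list sees the deduplicated values too.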

How to Avoid Duplicates in a Python List

The first thing you must think of is – Why am I using a list in Python?

A list can collect duplicates. If you are absolutely clear that duplicates must not exist in whatever you are collecting or storing, then don’t use a list. A better way is to use a Set, which is built to reject duplicates. It is worth exploring Sets a bit more to understand them better, as they can be a real time saver.

If you don’t care about the order then just using set(mylist) will do the job of removing any duplicates. This is what I use, even in the worst case scenario where the incoming entire list is a dirty list of multiple duplicate elements.

Alternatively, if you really must use a list because of the things you can do with a list data type, then do a simple check before you add any element.

For example, a list keeps its elements in order and can be sorted in place, while a Set cannot. (Calling sorted() on a Set gives you a new list.)

So before you add any new element in a list, just do a quick check for the existence of the value. If the element exists, then don’t store it. Simple!
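A minimal sketch of that check follows. The helper name add_unique and the sample values are hypothetical. The "not in" test scans the list, so for very large lists you may prefer to keep a companion Set for the lookups.

```python
def add_unique(items, value):
    """Append value only if it is not already in items (hypothetical helper)."""
    if value not in items:
        items.append(value)

readings = []
for value in [5, 3, 5, 2, 3]:
    add_unique(readings, value)

print(readings)  # [5, 3, 2]
```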

The methods discussed above work on lists of strings, integers, floating point numbers, and other hashable objects. The methods that build a Set will raise a TypeError for unhashable elements such as nested lists; the loop in Method 2, which uses only lists, still works in that case.

I hope these different ways to find duplicates, list them, and finally remove duplicate elements altogether from any Python list come in handy in your own programs.

Vinai Prakash


Vinai is the Founder & Master Trainer at Intellisoft Training. He writes about technology, skills upgrading and loves to share his tips and tricks to improve everyday productivity, and get more done. Intellisoft provides ICDL Certifications, Adobe CC, Microsoft Office training in Singapore. We are an ATO of SSG & an authorized ICDL testing center.
