Can you sort a list of strings in Python?

It seems its been awhile since Ive written a Python article, but the series has been fairly successful. So, I figured I dive back in with an article on how to sort a list of strings in Python. Lets get to it!

Table of Contents

Problem Introduction

Recently, I discovered a bug in my Sample Programs Wiki Generator

code which caused the output wiki to occasionally display a list of strings in the wrong order. The expected list looked something like:

[A, B, C, ...., X, Y, Z]

For whatever reason, the list was instead scrambled:

[H, G, A, ..., Q, B, C]

As I dug into the code a bit, I discovered the following line of code:

alphabetical_list = os.listdir[self.repo.source_dir]

As we can see, were banking on the OS library to produce a list of directories in alphabetical order. I guess thats not always the case. To be sure, I took a peek at theos.listdirdocumentation

, and it did not disappoint:

Return a list containing the names of the entries in the directory given bypath. The list is in arbitrary order, and does not include the special entries'.'and'..'even if they are present in the directory.

Naturally, I decided I wanted to sort this list to avoid any future issues. In this article, well take a look at a few ways to sort a list of strings in Python.

Solutions

When it comes to sorting, theres no shortage of solutions. In this section, well cover three of my favorite ways to sort a list of strings in Python.

Sort a List of Strings in Python by Brute Force

As always, we can try to implement our own sorting method. For simplicity, well leverage selection sort:

my_list = [7, 10, -3, 5] size = len[my_list] for i in range[size]: min_index = i for j in range[i + 1, size]: if my_list[j] < my_list[min_index]: min_index = j temp = my_list[i] my_list[i] = my_list[min_index] my_list[min_index] = temp print[my_list]

It works by comparing the characters of each string directly from their ASCII values in Python 2 or their Unicode values in Python 3. Dont believe me? Try it out yourself:

"hello" > "the" # returns false "the" > "hello" # returns true

The boolean operators work on strings directly in Python, so we dont have to worry about writing our own loops to perform the comparison.

Naturally, this solution has its drawbacks. For instance, sorting is almost meaningless for Non-English character sets. In addition, with this method, wed be performing a case-sensitive sort, so a list like["abs", "Apple", "apple"]will look something like['Apple', 'abs', 'apple']after sorting.

Notice how two of the words are exactly the same but separated in the list. Wed need to use something like thecasefold function for better results.

Sort a List of Strings in Python Using the Sort Function

Why sort by hand when we can leverage the high-level power of python? Naturally, python has a built in sort functionality that works by accepting a list and sorting it in place. Lets see what it does for a list of strings:

my_list=["leaf","cherry","Fish"] my_list.sort[] print[my_list]#prints["Fish", "cherry","leaf"]

As we can see, using the predefined sort function, we get the same case-sensitive sorting issue as before. If thats not problem, feel free to use this solution.

Luckily, sort has a special parameter called key which we can use to specify the ordering:

my_list=["leaf","cherry","Fish"] my_list.sort[key=str.casefold] print[my_list]#prints["cherry","Fish","leaf"]

In the next section, well discuss this key parameter more deeply.

Sort a List of Strings in Python Using the Sorted Function

While lists have their own sort functionality, Python exposes the sort functionality with a separate function called sorted which accepts an iterable. In other words, this new function allows us to sort any collection for which we can obtain an iterablenot just lists. The only difference is that the sorted functionality doesnt perform a sort in place, so well have to save the result back into our variable. Lets try it:

my_list=["leaf","cherry","Fish"] my_list = sorted[my_list] print[my_list]#prints["Fish", "cherry","leaf"]

Here we can see that we get the same problem as the previous two implementations. So, how do we fix it? Well, fortunately, were able to pass a key to the sorted function which defines how to sort the iterable. Take a look:

my_list=["leaf","cherry","Fish"] my_list = sorted[my_list, key=str.casefold] print[my_list]#prints["cherry","Fish","leaf"]

Here weve defined a key which leverages the casefold function from earlier. Feel free to read up on Pythons documentation to learn more about how it works. But to summarize, its basically a more aggressive lowercase function which can handle many different character sets.

Of course, there are other keys we can leverage such ascmp_to_key[locale.strcoll]which works for the current locale. If you have any keys youd recommend, let us know in the comments. As it turns out, manipulating strings isnt always easy. I learned that the hard way when I started the Reverse a String in Every Language series.

Sort a List of Strings in Python in Descending Order

At this point, were able to sort properly, but lets take things a step further. Lets sort the list backwards. In other words, the word that normally comes last alphabetically will come first:

my_list=["leaf","cherry","fish"] my_list = sorted[my_list, key=str.casefold, reverse=True] print[my_list]#prints["leaf", "fish", "cherry"]

Fortunately, the python developers thought ahead and added this functionality right into the sorted method. Using the reverse keyword, we can specify which direction sorting should occur.

And with that, we have everything we need to know to begin sorting.

Performance

To test the performance of each solution, well want to set them up in strings:

setup = """ import locale from functools import cmp_to_key my_list = ["leaf", "cherry", "fish"] """ brute_force = """ size = len[my_list] for i in range[size]: for j in range[size]: if my_list[i] < my_list[j]: temp = my_list[i] my_list[i] = my_list[j] my_list[j] = temp """ generic_sort = """ my_list.sort[] """ case_fold_sort = """ my_list.sort[key=str.casefold] """ generic_sorted = """ my_list = sorted[my_list] """ case_fold_sorted = """ my_list = sorted[my_list,key=str.casefold] """ locale_sorted = """ my_list = sorted[my_list,key=cmp_to_key[locale.strcoll]] """ reverse_case_fold_sorted = """ my_list = sorted[my_list,key=str.casefold,reverse=True] """

Next, we can test each solution using the timeit library:

>>> import timeit >>> min[timeit.repeat[stmt=brute_force, setup=setup, repeat=10]] 2.4897978000003604 >>> min[timeit.repeat[stmt=generic_sort, setup=setup, repeat=10]] 0.08845160000009855 >>> min[timeit.repeat[stmt=case_fold_sort, setup=setup, repeat=10]] 0.40834640000002764 >>> min[timeit.repeat[stmt=generic_sorted, setup=setup, repeat=10]] 0.1804069999998319 >>> min[timeit.repeat[stmt=case_fold_sorted, setup=setup, repeat=10]] 0.5034002000002147 >>> min[timeit.repeat[stmt=locale_sorted, setup=setup, repeat=10]] 1.0272592000001168 >>> min[timeit.repeat[stmt=reverse_case_fold_sorted, setup=setup, repeat=10]] 0.5373070999999072

And, there we have it! Apparently, the generic sort method is quite fast. If youre comfortable with the natural ordering of strings, thats definitely the way to go.

Of course, dont try to write your own sorting algorithm! Look how slow our brute force implementation is in comparison to all other solutions. Were talking two orders of magnitude slower than the built in sort method. Now, thats slow.

A Little Recap

At this point, weve covered several ways to sort a list of strings. Lets take another look:

my_list = ["leaf", "cherry", "fish"] # Brute force method using bubble sort my_list = ["leaf", "cherry", "fish"] size = len[my_list] for i in range[size]: for j in range[size]: if my_list[i] < my_list[j]: temp = my_list[i] my_list[i] = my_list[j] my_list[j] = temp # Generic list sort *fastest* my_list.sort[] # Casefold list sort my_list.sort[key=str.casefold] # Generic list sorted my_list = sorted[my_list] # Custom list sort using casefold [>= Python 3.3] my_list = sorted[my_list,key=str.casefold] # Custom list sort using current locale import locale from functools import cmp_to_key my_list = sorted[my_list,key=cmp_to_key[locale.strcoll]] # Custom reverse list sort using casefold [>=Python3.3] my_list = sorted[my_list,key=str.casefold,reverse=True]

And, thats it! I hope you enjoyed this article, and perhaps you even found it useful. If so, why not become a member? That way, youll always be up to date with the latest The Renegade Coder content.

Once again, you can also support the site by making purchases on Amazon through the following affiliate links:

  • Python Crash Course: A Hands-On, Project-Based Introduction to Programming
    by Eric Matthes
  • Python Programming: A Smart Approach For Absolute Beginners
    by Steve Manson

While I havent personally used these resources, I can say that I put quite a bit of research into finding products that I believe will benefit you.

While youre here, check out some of these other Python articles:

  • Rock Paper Scissors Using Modular Arithmetic
  • How to Write a List Comprehension in Python

As always, thanks for taking time to support the site. Also, special thanks to all my patrons who continue to support my work. See ya next time!

How to Python [41 Articles]Series Navigation

The How to Python tutorial series strays from the usual in-depth coding articles by exploring byte-sized problems in Python. In this series, students will dive into unique topics such as How to Invert a Dictionary, How to Sum Elements of Two Lists, and How to Check if a File Exists.

Each problem is explored from the naive approach to the ideal solution. Occasionally, therell be some just-for-fun solutions too. At the end of every article, youll find a recap full of code snippets for your own use. Dont be afraid to take what you need!

If youre not sure where to start, I recommend checking out our list of Python Code Snippets for Everyday Problems. In addition, you can find some of the snippets in a Jupyter notebook format on GitHub

,

If you have a problem of your own, feel free to ask. Someone else probably has the same problem. Enjoy How to Python!

Previous Post: [#9] [#11]: Next Post
Tweet

Video liên quan

Chủ Đề