Can you sort a list of strings in Python?
It seems its been awhile since Ive written a Python article, but the series has been fairly successful. So, I figured I dive back in with an article on how to sort a list of strings in Python. Lets get to it! Show Table of ContentsProblem IntroductionRecently, I discovered a bug in my Sample Programs Wiki Generator For whatever reason, the list was instead scrambled: [H, G, A, ..., Q, B, C]As I dug into the code a bit, I discovered the following line of code: alphabetical_list = os.listdir(self.repo.source_dir)As we can see, were banking on the OS library to produce a list of directories in alphabetical order. I guess thats not always the case. To be sure, I took a peek at theos.listdirdocumentation
Naturally, I decided I wanted to sort this list to avoid any future issues. In this article, well take a look at a few ways to sort a list of strings in Python. SolutionsWhen it comes to sorting, theres no shortage of solutions. In this section, well cover three of my favorite ways to sort a list of strings in Python. Sort a List of Strings in Python by Brute ForceAs always, we can try to implement our own sorting method. For simplicity, well leverage selection sort: It works by comparing the characters of each string directly from their ASCII values in Python 2 or their Unicode values in Python 3. Dont believe me? Try it out yourself: "hello" > "the" # returns false "the" > "hello" # returns trueThe boolean operators work on strings directly in Python, so we dont have to worry about writing our own loops to perform the comparison. Naturally, this solution has its drawbacks. For instance, sorting is almost meaningless for Non-English character sets. In addition, with this method, wed be performing a case-sensitive sort, so a list like["abs", "Apple", "apple"]will look something like['Apple', 'abs', 'apple']after sorting. Notice how two of the words are exactly the same but separated in the list. Wed need to use something like thecasefold function for better results. Sort a List of Strings in Python Using the Sort FunctionWhy sort by hand when we can leverage the high-level power of python? Naturally, python has a built in sort functionality that works by accepting a list and sorting it in place. Lets see what it does for a list of strings: my_list=["leaf","cherry","Fish"] my_list.sort() print(my_list)#prints["Fish", "cherry","leaf"]As we can see, using the predefined sort function, we get the same case-sensitive sorting issue as before. If thats not problem, feel free to use this solution. Luckily, sort has a special parameter called key which we can use to specify the ordering: my_list=["leaf","cherry","Fish"] my_list.sort(key=str.casefold) print(my_list)#prints["cherry","Fish","leaf"]In the next section, well discuss this key parameter more deeply. Sort a List of Strings in Python Using the Sorted FunctionWhile lists have their own sort functionality, Python exposes the sort functionality with a separate function called sorted which accepts an iterable. In other words, this new function allows us to sort any collection for which we can obtain an iterablenot just lists. The only difference is that the sorted functionality doesnt perform a sort in place, so well have to save the result back into our variable. Lets try it: Here we can see that we get the same problem as the previous two implementations. So, how do we fix it? Well, fortunately, were able to pass a key to the sorted function which defines how to sort the iterable. Take a look: my_list=["leaf","cherry","Fish"] my_list = sorted(my_list, key=str.casefold) print(my_list)#prints["cherry","Fish","leaf"]Here weve defined a key which leverages the casefold function from earlier. Feel free to read up on Pythons documentation to learn more about how it works. But to summarize, its basically a more aggressive lowercase function which can handle many different character sets. Of course, there are other keys we can leverage such ascmp_to_key(locale.strcoll)which works for the current locale. If you have any keys youd recommend, let us know in the comments. As it turns out, manipulating strings isnt always easy. I learned that the hard way when I started the Reverse a String in Every Language series. Sort a List of Strings in Python in Descending OrderAt this point, were able to sort properly, but lets take things a step further. Lets sort the list backwards. In other words, the word that normally comes last alphabetically will come first: my_list=["leaf","cherry","fish"] my_list = sorted(my_list, key=str.casefold, reverse=True) print(my_list)#prints["leaf", "fish", "cherry"]Fortunately, the python developers thought ahead and added this functionality right into the sorted method. Using the reverse keyword, we can specify which direction sorting should occur. And with that, we have everything we need to know to begin sorting. PerformanceTo test the performance of each solution, well want to set them up in strings: setup = """ import locale from functools import cmp_to_key my_list = ["leaf", "cherry", "fish"] """ brute_force = """ size = len(my_list) for i in range(size): for j in range(size): if my_list[i] < my_list[j]: temp = my_list[i] my_list[i] = my_list[j] my_list[j] = temp """ generic_sort = """ my_list.sort() """ case_fold_sort = """ my_list.sort(key=str.casefold) """ generic_sorted = """ my_list = sorted(my_list) """ case_fold_sorted = """ my_list = sorted(my_list,key=str.casefold) """ locale_sorted = """ my_list = sorted(my_list,key=cmp_to_key(locale.strcoll)) """ reverse_case_fold_sorted = """ my_list = sorted(my_list,key=str.casefold,reverse=True) """Next, we can test each solution using the timeit library: >>> import timeit >>> min(timeit.repeat(stmt=brute_force, setup=setup, repeat=10)) 2.4897978000003604 >>> min(timeit.repeat(stmt=generic_sort, setup=setup, repeat=10)) 0.08845160000009855 >>> min(timeit.repeat(stmt=case_fold_sort, setup=setup, repeat=10)) 0.40834640000002764 >>> min(timeit.repeat(stmt=generic_sorted, setup=setup, repeat=10)) 0.1804069999998319 >>> min(timeit.repeat(stmt=case_fold_sorted, setup=setup, repeat=10)) 0.5034002000002147 >>> min(timeit.repeat(stmt=locale_sorted, setup=setup, repeat=10)) 1.0272592000001168 >>> min(timeit.repeat(stmt=reverse_case_fold_sorted, setup=setup, repeat=10)) 0.5373070999999072And, there we have it! Apparently, the generic sort method is quite fast. If youre comfortable with the natural ordering of strings, thats definitely the way to go. Of course, dont try to write your own sorting algorithm! Look how slow our brute force implementation is in comparison to all other solutions. Were talking two orders of magnitude slower than the built in sort method. Now, thats slow. A Little RecapAt this point, weve covered several ways to sort a list of strings. Lets take another look: my_list = ["leaf", "cherry", "fish"] # Brute force method using bubble sort my_list = ["leaf", "cherry", "fish"] size = len(my_list) for i in range(size): for j in range(size): if my_list[i] < my_list[j]: temp = my_list[i] my_list[i] = my_list[j] my_list[j] = temp # Generic list sort *fastest* my_list.sort() # Casefold list sort my_list.sort(key=str.casefold) # Generic list sorted my_list = sorted(my_list) # Custom list sort using casefold (>= Python 3.3) my_list = sorted(my_list,key=str.casefold) # Custom list sort using current locale import locale from functools import cmp_to_key my_list = sorted(my_list,key=cmp_to_key(locale.strcoll)) # Custom reverse list sort using casefold (>=Python3.3) my_list = sorted(my_list,key=str.casefold,reverse=True)And, thats it! I hope you enjoyed this article, and perhaps you even found it useful. If so, why not become a member? That way, youll always be up to date with the latest The Renegade Coder content. Once again, you can also support the site by making purchases on Amazon through the following affiliate links:
While I havent personally used these resources, I can say that I put quite a bit of research into finding products that I believe will benefit you. While youre here, check out some of these other Python articles:
As always, thanks for taking time to support the site. Also, special thanks to all my patrons who continue to support my work. See ya next time! How to Python (41 Articles)Series NavigationThe How to Python tutorial series strays from the usual in-depth coding articles by exploring byte-sized problems in Python. In this series, students will dive into unique topics such as How to Invert a Dictionary, How to Sum Elements of Two Lists, and How to Check if a File Exists. Each problem is explored from the naive approach to the ideal solution. Occasionally, therell be some just-for-fun solutions too. At the end of every article, youll find a recap full of code snippets for your own use. Dont be afraid to take what you need! If youre not sure where to start, I recommend checking out our list of Python Code Snippets for Everyday Problems. In addition, you can find some of the snippets in a Jupyter notebook format on GitHub If you have a problem of your own, feel free to ask. Someone else probably has the same problem. Enjoy How to Python! Previous Post: [#9] [#11]: Next Post Tweet |