Top List List

Is list comprehension a good practice?

Many simple for loops in Python can be replaced with list comprehensions. You can often hear that list comprehension is more Pythonic [almost as if there was a scale for comparing how Pythonic something is, compared to something else ]. In this article, I will compare their performance and discuss when a list comprehension is a good idea, and when its not.

Filter a list with a for loop

Lets use a simple scenario for a loop operation - we have a list of numbers, and we want to remove the odd ones. One important thing to keep in mind is that we cant remove items from a list as we iterate over it. Instead, we have to create a new one containing only the even numbers:

# filter_list.py MILLION_NUMBERS = list[range[1_000_000]] def for_loop[]: output = [] for element in MILLION_NUMBERS: if not element % 2: output.append[element] return output

if not element % 2 is equivalent to if element % 2 == 0, but its slightly faster. I will write a separate article about comparing boolean values soon.

Lets measure the execution time of this function. Im using Python 3.8 for benchmarks [you can read about the whole setup in the Introduction article]:

$ python -m timeit -s "from filter_list import for_loop" "for_loop[]" 5 loops, best of 5: 65.4 msec per loop

It takes 65 milliseconds to filter a list of one million elements. How fast will a list comprehension deal with the same task?

Filter a list with list comprehension

# filter_list.py MILLION_NUMBERS = list[range[1_000_000]] def list_comprehension[]: return [number for number in MILLION_NUMBERS if not number % 2]

$ python -m timeit -s "from filter_list import list_comprehension" "list_comprehension[]" 5 loops, best of 5: 44.5 msec per loop

For loop is around 50% slower than a list comprehension [65.4/44.51.47]. And we just reduced five lines of code to one line! Cleaner and faster code? Great!

Can we make it better?

Filter a list with the filter function

Python has a built-in filter function for filtering collections of elements. This sounds like a perfect use case for our problem, so lets see how fast it will be.

# filter_list.py MILLION_NUMBERS = list[range[1_000_000]] def filter_function[]: return filter[lambda x: not x % 2, MILLION_NUMBERS]

$ python -m timeit -s "from filter_list import filter_function" "filter_function[]" 1000000 loops, best of 5: 284 nsec per loop

284 nanoseconds?! Thats suspiciously fast! It turns out that the filter function returns an iterator. It doesnt immediately go over one million elements, but it will return the next value when we ask for it. To get all the results at once, we can convert this iterator to a list.

# filter_list.py MILLION_NUMBERS = list[range[1_000_000]] def filter_return_list[]: return list[filter[lambda x: not x % 2, MILLION_NUMBERS]]

$ python -m timeit -s "from filter_list import filter_return_list" "filter_return_list[]" 2 loops, best of 5: 104 msec per loop

Now, its performance is not so great anymore. Its 133% slower than the list comprehension [104/44.52.337] and 60% slower than the for loop [104/65.41.590].

While, in this case, its not the best solution, an iterator is an excellent alternative to a list comprehension when we dont need to have all the results at once. If it turns out that we only need to get a few elements from the filtered list, an iterator will be a few orders of magnitude faster than other non-lazy solutions.

filterfalse[]

We could use the filterfalse[] function from the itertools library to simplify the filtering condition. filterfalse returns the opposite elements than filter. It picks those elements that evaluate to False. Unfortunately, it doesn't make any difference when it comes to performance:

from itertools import filterfalse def filterfalse_list[]: return list[filterfalse[lambda x: x % 2, MILLION_NUMBERS]]$ python -m timeit -s "from filter_list import filterfalse_list" "filterfalse_list[]" 2 loops, best of 5: 103 msec per loop

More than one operation in the loop

List comprehensions are often faster and easier to read, but they have one significant limitation. What happens if you want to execute more than one simple instruction? List comprehension cant accept multiple statements [without sacrificing readability]. But in many cases, you can wrap those multiple statements in a function.

Lets use a slightly modified version of the famous Fizz Buzz program as an example. We want to iterate over a list of elements and for each of them return:

fizzbuzz if the number can be divided by 3 and 5
fizz if the number can be divided by 3
buzz if the number can be divided by 5
the number itself, if it cant be divided by 3 or 5

Here is a simple solution:

# filter_list.py def fizz_buzz[]: output = [] for number in MILLION_NUMBERS: if number % 3 == 0 and number % 5 == 0: output.append['fizzbuzz'] elif number % 3 == 0: output.append['fizz'] elif number % 5 == 0: output.append['buzz'] else: output.append[number] return output

Here is the list comprehension equivalent of the fizz_buzz[]:

['fizzbuzz' if x % 3 == 0 and x % 5 == 0 else 'fizz' if x % 3 == 0 else 'buzz' if x % 5 == 0 else x for x in MILLION_NUMBERS]

Its not easy to read - at least for me. It gets better if we split it into multiple lines:

[ "fizzbuzz" if x % 3 == 0 and x % 5 == 0 else "fizz" if x % 3 == 0 else "buzz" if x % 5 == 0 else x for x in MILLION_NUMBERS ]

But if I see a list comprehension that spans multiple lines, I try to refactor it. We can extract the if statements into a separate function:

# filter_list.py def transform[number]: if number % 3 == 0 and number % 5 == 0: return 'fizzbuzz' elif number % 3 == 0: return 'fizz' elif number % 5 == 0: return 'buzz' return number def fizz_buzz2[]: output = [] for number in MILLION_NUMBERS: output.append[transform[number]] return output

Now its trivial to turn it into a list comprehension. And we get the additional benefit of a nice separation of logic into a function that does the fizz buzz check and a function that actually iterates over a list of numbers and applies the fizz buzz transformation.

Here is the improved list comprehension:

def fizz_buzz2_comprehension[]: return [transform[number] for number in MILLION_NUMBERS]

Lets compare all three versions:

$ python -m timeit -s "from filter_list import fizz_buzz" "fizz_buzz[]" 2 loops, best of 5: 191 msec per loop $ python -m timeit -s "from filter_list import fizz_buzz2" "fizz_buzz2[]" 1 loop, best of 5: 285 msec per loop $ python -m timeit -s "from filter_list import fizz_buzz2_comprehension" "fizz_buzz2_comprehension[]" 1 loop, best of 5: 224 msec per loop

Extracting a separate function adds some overhead. List comprehension with a separate transform[] function is around 17% slower than the initial for loop-based version [224/1911.173]. But its much more readable, so I prefer it over the other solutions.

And, if you are curious, the one-line list comprehension mentioned before is the fastest solution:

def fizz_buzz_comprehension[]: return [ "fizzbuzz" if x % 3 == 0 and x % 5 == 0 else "fizz" if x % 3 == 0 else "buzz" if x % 5 == 0 else x for x in MILLION_NUMBERS ]

$ python -m timeit -s "from filter_list import fizz_buzz_comprehension" "fizz_buzz_comprehension[]" 2 loops, best of 5: 147 msec per loop

Fastest, but also harder to read. If you run this code through a code formatter like black [which is a common practice in many projects], it will further obfuscate this function:

[ "fizzbuzz" if x % 3 == 0 and x % 5 == 0 else "fizz" if x % 3 == 0 else "buzz" if x % 5 == 0 else x for x in MILLION_NUMBERS ]

There is nothing wrong with black here - we are simply putting too much logic inside the list comprehension. If I had to say what the above code does, it would take me much longer to figure it out than if I had two separate functions. Saving a few hundred milliseconds of execution time and adding a few seconds of reading time doesnt sound like a good trade-off .

Clever one-liners can impress some recruiters during code interviews. But in real life, separating logic into different functions makes it much easier to read and document your code. And, statistically, we read more code than we write.

Conclusions

List comprehensions are often not only more readable but also faster than using for loops. They can simplify your code, but if you put too much logic inside, they will instead become harder to read and understand.

Even though list comprehensions are popular in Python, they have a specific use case: when you want to perform some operations on a list and return another list. And they have limitations - you cant break out of a list comprehension or put comments inside. In many cases, for loops will be your only choice.

I only scratched the surface of how useful list comprehension [or any other type of comprehension in Python] can be. If you want to learn more, Trey Hunner has many excellent articles and talks on this subject [for example, this one for beginners].

Filter a list with a for loop

Filter a list with list comprehension

Filter a list with the filter function

More than one operation in the loop

Conclusions

Video liên quan

Bài Viết Liên Quan

Toplist mới

Bài mới nhất

Chủ Đề