Iterables, Iterators and Generators: part 1

Ian Ward · February 12, 2013

This is part one of a talk I gave January 24, 2013 at the Ottawa Python Authors Group

Part Two is now also available.

Both parts of this presentation are also available as a single IPython Notebook which you can download and run locally, or view with nbviewer.ipython.org. The complete source is available at https://github.com/wardi/iterables-iterators-generators

A Gentle Introduction

The first few examples are from Ned Bachelder’s Python Iteration Talk http://bit.ly/pyiter

Coming from another language you might find it natural to create a counter and increment it to iterate over a list in Python

my_list = [17, 23, 47, 51, 101, 173, 999, 1001]

i = 0
while i < len(my_list):
    v = my_list[i]
    print(v)
    i += 1
17
23
47
51
101
173
999
1001

You might also have been told about range() which almost lets you write a C-style for loop

for i in range(len(my_list)):
    v = my_list[i]
    print(v)
17
23
47
51
101
173
999
1001

But neither of the above are natural ways of iterating in python. We do this instead:

for v in my_list:
    print(v)
17
23
47
51
101
173
999
1001

Many types of objects may be iterated over this way. Iterating over strings produces single characters:

for v in "Hello":
    print(v)
H
e
l
l
o

Iterating over a dict produces its keys (in no particular order):

d = {
    'a': 1,
    'b': 2,
    'c': 3,
    }

for v in d:
    print(v)
# Note the strange order!
a
c
b

Iterating over a file object produces lines from that file, including the line termination:

f = open("suzuki.txt")
for line in f:
    print(">", line)
> On education

> "Education has failed in a very serious way to convey the most important lesson science can teach: skepticism."

> "An educational system isn't worth a great deal if it teaches young people how to make a living but doesn't teach them how to make a life."

Objects that can be iterated over in python are called “Iterables”, and a for loop isn’t the only thing that accepts iterables.

The list constructor takes any iterable. We can use this to make a list of the keys in a dict:

list(d)
['a', 'c', 'b']

Or the characters in a string:

list("Hello")
['H', 'e', 'l', 'l', 'o']

List comprehensions take iterables.

ascii = [ord(x) for x in "Hello"]
ascii
[72, 101, 108, 108, 111]

The sum() function takes any iterable that produces numbers.

sum(ascii)
500

The str.join() method takes any iterable that produces strings.

"-".join(d)
'a-c-b'

Iterables can produce any python object. The re.finditer() function returns an iterable that produces re.match objects.

import re
suzuki = open("suzuki.txt").read()
for match in re.finditer(r'\bs\w+', suzuki):
    print(match.group(0))
serious
science
skepticism
system

A “classic” iterable class

Very old versions of Python supported iteration through the __getitem__() method, and this is still supported.

class Lucky(object):
    def __getitem__(self, index):
        if index > 3:
            raise IndexError
        return 7

Lucky is a class that will return 7 for any index less than or equal to 3:

lucky = Lucky()

lucky[0]
7

And raise an IndexError for larger indexes:

lucky[6]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/home/ian/git/iterables-iterators-generators/<ipython-input-15-5c0a87559915> in <module>()
----> 1 lucky[6]

/home/ian/git/iterables-iterators-generators/<ipython-input-13-bd578dbfbead> in __getitem__(self, index)
      2     def __getitem__(self, index):
      3         if index > 3:
----> 4             raise IndexError
      5         return 7

IndexError:

This is a perfectly well-behaved python iterable. We can loop over it:

for number in lucky:
    print(number)
7
7
7
7

Or pass it to functions that take iterables:

list(lucky)
[7, 7, 7, 7]

Even the in operator works with it:

7 in lucky
True

But writing this sort of class it difficult. You need to be able to return a value for any index passed to __getitem__() but most of the time you really only want to produce items in order from first to last.

Enter “Iterators”.

Iterators

The naming is confusingly similar here, but it’s important to understand the difference between iterables and iterators.

Iterators are iterables with some kind of ‘position’ state and a next() method. The next() method may be called to produce the next item and update the internal state.

Iterables are objects that produce an iterator when they are passed to the iter() builtin.

Iterators are Iterables with next()

Calling iter() on our “classic” iterable object produces a plain iterator instance

i = iter(lucky)
i
<iterator at 0x288cc10>

This plain iterator has a counter and the original object as its internal state. Calling next() advances the counter and calls our “classic” iterable’s .__getitem__() method.

print(next(i))
print(next(i))
print(next(i))
print(next(i))
7
7
7
7

When we get to the end however, our IndexError exception is turned into a StopIteration exception. This is the iterator protocol: when next() raises StopIteration there are no more items to be produced.

print(next(i)) # raises StopIteration, *not* IndexError
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
/home/ian/git/iterables-iterators-generators/<ipython-input-21-b3e04e043095> in <module>()
----> 1 print(next(i)) # raises StopIteration, *not* IndexError

StopIteration:

Remember that an iterator is an iterable, so it can be passed to anything that takes an iterable.

Be careful, though. Iterators may only be iterated over once, since they are updating their internal state as they go. If we try to iterate twice, the second time will produce no more items:

i = iter(lucky)
print(list(i))
print(list(i))
[7, 7, 7, 7]
[]

Calling on iter() on an iterable will produce a different iterator object each time.

i is iter(lucky)
False

Also, like other iterables, calling iter() on an iterator works, but it behaves differently!

Calling iter() on an iterator typically returns the exact same iterator. If you think about it, that’s all that can be done because you can’t rewind or duplicate an iterator in the general case.

i is iter(i)
True

Iterators come in all shapes and sizes.

xrange() has a rangeiterator:

iter(xrange(20))
<rangeiterator at 0x2896900>

dict has a dictionary-keyiterator:

iter({'a': 1, 'b': 2})
<dictionary-keyiterator at 0x2894aa0>

list doesn’t even use the plain iterator type, and instead uses its own more efficient listiterator:

iter([4, 5, 6])
<listiterator at 0x28973d0>

And some have names that provide no clue what they iterate over:

re.finditer(r'\bs\w+', "some text with swords")
<callable-iterator at 0x28971d0>

A better iterable class

You can choose the iterator that will be returned by iter() by defining your own .__iter__() method:

class Countdown(object):
    def __iter__(self): # must return an iterator!
        return iter([5, 4, 3, 2, 1, 'launch'])

The for loop and other places that take iterables internally use iter(), which calls our new .__iter__() method to create an iterator:

for n in Countdown():
    print(n)
5
4
3
2
1
launch

Iterators the hard way

The example above is fine if we want to reuse an existing iterator (like the listiterator above), but what if we want to write a new iterator?

We know the protocol, so one approach is to just implement it:

class CountdownIterator(object):
    def __init__(self):
        self._remaining = [5, 4, 3, 2, 1, 'launch']

    def __iter__(self):
        return self

    def __next__(self):
        if not self._remaining:
            raise StopIteration
        return self._remaining.pop(0)

Our internal ‘position’ state is a list of items we pop one at a time when .next() is called. We implement .__iter__() in the normal way for an iterator: "return self". We raise StopIteration when we have nothing left to produce.

This works as expected, but it’s rather a lot of code for a simple result.

for n in CountdownIterator():
    print(n)
5
4
3
2
1
launch

Generators

A generator function is a simpler way to create an iterator.

Generator functions let you use local variables and the position of the program counter as state for a generator object. A new generator object is created and returned each time you call a generator function. The generator object is an iterator.

Generators are Iterators created with a generator function or expression

Here is a generator function that only uses the program counter for state:

def countdown_generator():
    yield 5
    yield 4
    yield 3
    yield 2
    yield 1
    yield 'launch'

When we call the generator function it does nothing except create a new generator object. None of the code in the generator function has been executed yet.

countdown_generator()
<generator object countdown_generator at 0x289d0f0>

As the generator object is iterated over execution starts, following the generator function definition until the next yield statement.

When it reaches the yield statement execution is paused (the program counter is stored) and the value on the right of the yield statement is produced as a value from the generator object. Execution is resumed from the stored program counter position when iteration continues.

When the generator function reaches the end, the generator raises a StopIteration exception just like a normal iterator. And it behaves just like a normal iterator:

for n in countdown_generator():
    print(n)
5
4
3
2
1
launch

Now we have a much more concise way of defining our own iterator for an iterable class. The .__iter__() method of our class can be written as a generator function:

class Countdown(object):
    def __iter__(self):
        for n in [5, 4, 3, 2, 1, 'launch']:
            yield n
for n in Countdown():
    print(n)
5
4
3
2
1
launch

But, enough about classes. Let’s dig further into how these generators work.

Recall that execution of the code in a generator function does not proceed until the generator object returned is iterated over. That lets us put things in a generator that might be expensive, knowing that we will only have to pay that cost when we actually ask it to produce the next item.

This generator causes a for loop to slow down between iterations. First waiting 5 seconds, then counting down from “5” to “1” with 1 seconds intervals in between:

import time

def slow_generator():
    time.sleep(5)
    yield 5
    time.sleep(1)
    yield 4
    time.sleep(1)
    yield 3
    time.sleep(1)
    yield 2
    time.sleep(1)
    yield 1
    time.sleep(1)

print("starting")
for n in slow_generator():
    print n
print("done")
starting
5
4
3
2
1
done

Another way of writing this code is to turn the generator inside-out.

Instead of sleeping inside the generator we can yield the amount of time we want to sleep. And instead of yield-ing the countdown we can use a function passed in to display values to the user.

def countdown_generator(fn):
    yield 5
    fn(5)
    yield 1
    fn(4)
    yield 1
    fn(3)
    yield 1
    fn(2)
    yield 1
    fn(1)
    yield 1

A show() function takes the place of the print inside the loop, and the time.sleep() call is done by the code iterating over the generator. This puts the code driving the generator in charge of how (or if) it sleeps for the given time.

def show(n):
    print(n)

print("starting")
for s in countdown_generator(show):
    time.sleep(s)
print("done")
starting
5
4
3
2
1
done

Generators as coroutines

While a generator object is an iterator, it can also be used for much more.

When paused at a yield statement generator objects can receive data by using .send() instead of next().

When we use yield as an expression or assign it to a variable, the value passed to .send() is available inside the generator.

def knock_knock():
    name = yield "Who's there?"
    yield "%s who?" % name
    yield "That's not funny at all"

We have to switch to manually calling next() on our generator object, because a for loop or function that takes an iterable won’t be able to call .send() when we need to.

k = knock_knock()
next(k)
"Who's there?"

At this point execution is paused at the first yield. The assignment to the variable name hasn’t happened yet. But when we .send() a value execution continues:

k.send("David")
'David who?'

And in the generator object we are at the second yield with "David" assigned to name.

If we send something to a yield that isn’t being used as an expression, the value we send will be ignored:

k.send("David the environmentalist")
"That's not funny at all"

But execution continues the same as if we called next().

This is the end of part 1.

In part 2 we build a simple interactive network game with 90% of the code written as generators. I will show how breaking down asynchronous code into generators can make it easy to test, easy to reuse, and (with some practice) easy to understand.