This is part one of a talk I gave January 24, 2013 at the Ottawa Python Authors Group
Part Two is now also available.
Both parts of this presentation are also available as a single IPython Notebook which you can download and run locally, or view with nbviewer.ipython.org. The complete source is available at https://github.com/wardi/iterables-iterators-generators
A Gentle Introduction
The first few examples are from Ned Bachelder’s Python Iteration Talk http://bit.ly/pyiter
Coming from another language you might find it natural to create a counter and increment it to iterate over a list
in Python
my_list = [17, 23, 47, 51, 101, 173, 999, 1001]
i = 0
while i < len(my_list):
v = my_list[i]
print(v)
i += 1
17
23
47
51
101
173
999
1001
You might also have been told about range()
which almost lets you write a C-style for loop
for i in range(len(my_list)):
v = my_list[i]
print(v)
17
23
47
51
101
173
999
1001
But neither of the above are natural ways of iterating in python. We do this instead:
for v in my_list:
print(v)
17
23
47
51
101
173
999
1001
Many types of objects may be iterated over this way. Iterating over strings produces single characters:
for v in "Hello":
print(v)
H
e
l
l
o
Iterating over a dict
produces its keys (in no particular order):
d = {
'a': 1,
'b': 2,
'c': 3,
}
for v in d:
print(v)
# Note the strange order!
a
c
b
Iterating over a file object produces lines from that file, including the line termination:
f = open("suzuki.txt")
for line in f:
print(">", line)
> On education
> "Education has failed in a very serious way to convey the most important lesson science can teach: skepticism."
> "An educational system isn't worth a great deal if it teaches young people how to make a living but doesn't teach them how to make a life."
Objects that can be iterated over in python are called “Iterables”, and a for loop isn’t the only thing that accepts iterables.
The list constructor takes any iterable. We can use this to make a list of the keys in a dict
:
list(d)
['a', 'c', 'b']
Or the characters in a string:
list("Hello")
['H', 'e', 'l', 'l', 'o']
List comprehensions take iterables.
ascii = [ord(x) for x in "Hello"]
ascii
[72, 101, 108, 108, 111]
The sum()
function takes any iterable that produces numbers.
sum(ascii)
500
The str.join()
method takes any iterable that produces strings.
"-".join(d)
'a-c-b'
Iterables can produce any python object. The re.finditer()
function returns an iterable that produces re.match
objects.
import re
suzuki = open("suzuki.txt").read()
for match in re.finditer(r'\bs\w+', suzuki):
print(match.group(0))
serious
science
skepticism
system
A “classic” iterable class
Very old versions of Python supported iteration through the __getitem__()
method, and this is still supported.
class Lucky(object):
def __getitem__(self, index):
if index > 3:
raise IndexError
return 7
Lucky
is a class that will return 7
for any index less than or equal to 3
:
lucky = Lucky()
lucky[0]
7
And raise an IndexError
for larger indexes:
lucky[6]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/home/ian/git/iterables-iterators-generators/<ipython-input-15-5c0a87559915> in <module>()
----> 1 lucky[6]
/home/ian/git/iterables-iterators-generators/<ipython-input-13-bd578dbfbead> in __getitem__(self, index)
2 def __getitem__(self, index):
3 if index > 3:
----> 4 raise IndexError
5 return 7
IndexError:
This is a perfectly well-behaved python iterable. We can loop over it:
for number in lucky:
print(number)
7
7
7
7
Or pass it to functions that take iterables:
list(lucky)
[7, 7, 7, 7]
Even the in
operator works with it:
7 in lucky
True
But writing this sort of class it difficult. You need to be able to return a value for any index passed to __getitem__()
but most of the time you really only want to produce items in order from first to last.
Enter “Iterators”.
Iterators
The naming is confusingly similar here, but it’s important to understand the difference between iterables and iterators.
Iterators are iterables with some kind of ‘position’ state and a next()
method. The next()
method may be called to produce the next item and update the internal state.
Iterables are objects that produce an iterator when they are passed to the iter()
builtin.
Calling iter()
on our “classic” iterable object produces a plain iterator
instance
i = iter(lucky)
i
<iterator at 0x288cc10>
This plain iterator
has a counter and the original object as its internal state. Calling next()
advances the counter and calls our “classic” iterable’s .__getitem__()
method.
print(next(i))
print(next(i))
print(next(i))
print(next(i))
7
7
7
7
When we get to the end however, our IndexError
exception is turned into a StopIteration
exception. This is the iterator protocol: when next()
raises StopIteration
there are no more items to be produced.
print(next(i)) # raises StopIteration, *not* IndexError
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
/home/ian/git/iterables-iterators-generators/<ipython-input-21-b3e04e043095> in <module>()
----> 1 print(next(i)) # raises StopIteration, *not* IndexError
StopIteration:
Remember that an iterator is an iterable, so it can be passed to anything that takes an iterable.
Be careful, though. Iterators may only be iterated over once, since they are updating their internal state as they go. If we try to iterate twice, the second time will produce no more items:
i = iter(lucky)
print(list(i))
print(list(i))
[7, 7, 7, 7]
[]
Calling on iter()
on an iterable will produce a different iterator object each time.
i is iter(lucky)
False
Also, like other iterables, calling iter()
on an iterator works, but it behaves differently!
Calling iter()
on an iterator typically returns the exact same iterator. If you think about it, that’s all that can be done because you can’t rewind or duplicate an iterator in the general case.
i is iter(i)
True
Iterators come in all shapes and sizes.
xrange()
has a rangeiterator
:
iter(xrange(20))
<rangeiterator at 0x2896900>
dict
has a dictionary-keyiterator
:
iter({'a': 1, 'b': 2})
<dictionary-keyiterator at 0x2894aa0>
list
doesn’t even use the plain iterator
type, and instead uses its own more efficient listiterator
:
iter([4, 5, 6])
<listiterator at 0x28973d0>
And some have names that provide no clue what they iterate over:
re.finditer(r'\bs\w+', "some text with swords")
<callable-iterator at 0x28971d0>
A better iterable class
You can choose the iterator that will be returned by iter()
by defining your own .__iter__()
method:
class Countdown(object):
def __iter__(self): # must return an iterator!
return iter([5, 4, 3, 2, 1, 'launch'])
The for
loop and other places that take iterables internally use iter()
, which calls our new .__iter__()
method to create an iterator:
for n in Countdown():
print(n)
5
4
3
2
1
launch
Iterators the hard way
The example above is fine if we want to reuse an existing iterator (like the listiterator
above), but what if we want to write a new iterator?
We know the protocol, so one approach is to just implement it:
class CountdownIterator(object):
def __init__(self):
self._remaining = [5, 4, 3, 2, 1, 'launch']
def __iter__(self):
return self
def __next__(self):
if not self._remaining:
raise StopIteration
return self._remaining.pop(0)
Our internal ‘position’ state is a list of items we pop one at a time when .next()
is called. We implement .__iter__()
in the normal way for an iterator: "return self"
. We raise StopIteration
when we have nothing left to produce.
This works as expected, but it’s rather a lot of code for a simple result.
for n in CountdownIterator():
print(n)
5
4
3
2
1
launch
Generators
A generator function is a simpler way to create an iterator.
Generator functions let you use local variables and the position of the program counter as state for a generator object. A new generator object is created and returned each time you call a generator function. The generator object is an iterator.
Here is a generator function that only uses the program counter for state:
def countdown_generator():
yield 5
yield 4
yield 3
yield 2
yield 1
yield 'launch'
When we call the generator function it does nothing except create a new generator object. None of the code in the generator function has been executed yet.
countdown_generator()
<generator object countdown_generator at 0x289d0f0>
As the generator object is iterated over execution starts, following the generator function definition until the next yield
statement.
When it reaches the yield
statement execution is paused (the program counter is stored) and the value on the right of the yield
statement is produced as a value from the generator object. Execution is resumed from the stored program counter position when iteration continues.
When the generator function reaches the end, the generator raises a StopIteration
exception just like a normal iterator. And it behaves just like a normal iterator:
for n in countdown_generator():
print(n)
5
4
3
2
1
launch
Now we have a much more concise way of defining our own iterator for an iterable class. The .__iter__()
method of our class can be written as a generator function:
class Countdown(object):
def __iter__(self):
for n in [5, 4, 3, 2, 1, 'launch']:
yield n
for n in Countdown():
print(n)
5
4
3
2
1
launch
But, enough about classes. Let’s dig further into how these generators work.
Recall that execution of the code in a generator function does not proceed until the generator object returned is iterated over. That lets us put things in a generator that might be expensive, knowing that we will only have to pay that cost when we actually ask it to produce the next item.
This generator causes a for loop to slow down between iterations. First waiting 5 seconds, then counting down from “5” to “1” with 1 seconds intervals in between:
import time
def slow_generator():
time.sleep(5)
yield 5
time.sleep(1)
yield 4
time.sleep(1)
yield 3
time.sleep(1)
yield 2
time.sleep(1)
yield 1
time.sleep(1)
print("starting")
for n in slow_generator():
print n
print("done")
starting
5
4
3
2
1
done
Another way of writing this code is to turn the generator inside-out.
Instead of sleeping inside the generator we can yield
the amount of time we want to sleep. And instead of yield
-ing the countdown we can use a function passed in to display values to the user.
def countdown_generator(fn):
yield 5
fn(5)
yield 1
fn(4)
yield 1
fn(3)
yield 1
fn(2)
yield 1
fn(1)
yield 1
A show()
function takes the place of the print inside the loop, and the time.sleep()
call is done by the code iterating over the generator. This puts the code driving the generator in charge of how (or if) it sleeps for the given time.
def show(n):
print(n)
print("starting")
for s in countdown_generator(show):
time.sleep(s)
print("done")
starting
5
4
3
2
1
done
Generators as coroutines
While a generator object is an iterator, it can also be used for much more.
When paused at a yield
statement generator objects can receive data by using .send()
instead of next()
.
When we use yield
as an expression or assign it to a variable, the value passed to .send()
is available inside the generator.
def knock_knock():
name = yield "Who's there?"
yield "%s who?" % name
yield "That's not funny at all"
We have to switch to manually calling next()
on our generator object, because a for
loop or function that takes an iterable won’t be able to call .send()
when we need to.
k = knock_knock()
next(k)
"Who's there?"
At this point execution is paused at the first yield
. The assignment to the variable name
hasn’t happened yet. But when we .send()
a value execution continues:
k.send("David")
'David who?'
And in the generator object we are at the second yield
with "David"
assigned to name
.
If we send something to a yield
that isn’t being used as an expression, the value we send will be ignored:
k.send("David the environmentalist")
"That's not funny at all"
But execution continues the same as if we called next()
.
This is the end of part 1.
In part 2 we build a simple interactive network game with 90% of the code written as generators. I will show how breaking down asynchronous code into generators can make it easy to test, easy to reuse, and (with some practice) easy to understand.