excess.org

Ian Ward

Software
CKAN contributor/tech lead
Urwid author
PyRF contributor
Speedometer author

Presentations
Contributing to Open Source
IASA E-Summit, 2014-05-16
Urwid Applications
2012-11-14
Urwid Intro
2012-01-22
Unfortunate Python
2011-12-19
Django 1.1
2009-05-16

Writing
Moving to Python 3
2011-02-17
Article Tags

Home

Ian Ward's email:
first name at this domain

wardi on OFTC, freenode and github

Locations of visitors to this page

The Light and Dark sides of Python name binding

Posted on 2014-04-04, last modified 2014-04-09.

What does bar.foo do? - An exploration of Python assignment, objects, attributes and descriptors

This is a talk I gave March 27, 2014 at the Ottawa Python Authors Group meetup.

Follow along with the IPython notebook version if you would like to play with the code examples.

1. Names and values

First, what does foo do?

Better question: how can we make foo do something we want?

1.1. Simple assignment

Before assigment accessing a name produces a NameError.

foo
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/home/ian/git/bar_foo/<ipython-input-1-d3b07384d113> in <module>()
----> 1 foo

NameError: name 'foo' is not defined

Let's fix the error with a simple assignment statement.

foo = 'tesla'
foo
'tesla'

As expected the value we assigned is now available when using the name. To return to our original state we can unbind the name using del.

del foo
foo
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/home/ian/git/bar_foo/<ipython-input-137-16af1079f68c> in <module>()
      1 del foo
----> 2 foo

NameError: name 'foo' is not defined

1.2. def

Python includes many assignment statements. The next most common is used when defining functions.

def foo():
    return 42

foo
<function __main__.foo>

A function object was created and assigned to a name with the def keyword. This function object is just like any other value in Python. It can be assigned to another name with a simple assignment statement. This new name now works identically to the original.

foo2 = foo
foo2()
42

We can verify that the two values are the exact same object.

foo2 is foo
True

1.3. class

Class creation in python assigns a class to a name.

class Foo(object):
    pass

Foo
__main__.Foo

We used a capitalized name because that is the standard way to name classes in Python code. There is nothing significant about the capitalization but following the standard makes our code easier to understand.

1.4. for..in

For loops bind names that continue to be available outside the for block. This is often useful when searching for the first object that passes a test. It is also easy to accidentally override a name defined earlier, so be careful.

for foo in [1, 2, 3]:
    pass

foo
3

1.5. import

Importing modules or values from modules binds those modules or values to names.

import sys as foo

foo
<module 'sys' (built-in)>

1.6. try..except

Exception variables are available outside the exception block. This is rarely useful, but be careful about accidentally overriding existing names.

try:
    1/0
except ZeroDivisionError as foo:
    pass

foo
ZeroDivisionError('integer division or modulo by zero')

1.7. with

Context manager variables are available outside their blocks as well.

with open('/tmp/things', 'w') as foo:
    pass

foo
<closed file '/tmp/things', mode 'w' at 0x2ca39c0>

1.8. Decorators

Function and class decorators replace the value assigned by def and class blocks.

This (unusual) example replaces a function with a string.

def serious_decorator(f):
    return str(f) + ' is much too silly'

@serious_decorator
def foo():
    return 42

foo
'<function foo at 0x2d68758> is much too silly'

1.9. Evil ways to bind names

There are ways of binding names that should never be used in normal Python programs.

1.9.1. Evil use of vars, locals, globals

Local and global variables may be accessed and assigned through dictionaries returned by vars, locals or globals. Except for debugging, don't ever do this.

vars()['foo'] = 'tesla'

foo
'tesla'

1.9.2. Evil use of list comprehensions (does not work in Python 3)

List comprehensions in Python 2 bind names like for statements. Be careful about the variable names you choose. Don't write code that relies on this behavior.

[None for foo in [1,2,3]]
[None, None, None]
foo
3

1.9.3. Evil stack frame manipulation

There are ways of accessing local variables from other stack frames. Never do this.

def im_in_ur_frame():
    import sys
    frame = sys._getframe(1)
    frame.f_locals['foo'] = 'writin ur vars'

im_in_ur_frame()

foo
'writin ur vars'

1.9.4. Evil use of import

Possibly worst of all: never import all the names from one module into another. This can make debugging and even just understanding code extremely difficult. It can also cause a new name added to that other module to break the module importing it.

from os.path import *

ismount
<function posixpath.ismount>

1.10. Name binding recap

We have eight non-evil ways to bind values to names in Python.

  1. assignment with =
  2. def
  3. class
  4. for..in
  5. import
  6. try..except
  7. with
  8. function/class decorators
  9. evil

2. Attribute access

Now, how can we make bar.foo do something we want?

2.1. Builtin objects with attributes

Some standard python objects have attributes we can access. Unfortunately, none are named foo.

bar = 2 + 3j

bar.imag
3.0
bar = open("/tmp/things", "w")

bar.closed
False
bar.close()

2.2. collections.namedtuple

The simplest choice for an object with a custom attribute is namedtuple. These objects are immutable, so they make it easier to reason about code. You can always pass them safely to other functions without worrying about them being modified. They also work well in with asynchronous and threaded programming.

from collections import namedtuple
Hotel = namedtuple('Hotel', 'foo baz')

bar = Hotel(42, 'shoe')

bar.foo
42
bar
Hotel(foo=42, baz='shoe')
bar[1]
'shoe'
bar.foo = 'tesla'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/ian/git/bar_foo/<ipython-input-42-463460b3ac05> in <module>()
----> 1 bar.foo = 'tesla'

AttributeError: can't set attribute

2.3. Custom object attributes

If you can't use an immutable object like namedtuple, use a normal Python object instead.

2.3.1. Assigning object attributes (slightly unusual)

Simple assignment works with normal python objects. This example is unusual because it is more common to assign object attributes within a method of the class.

class Hotel(object):
    pass

bar = Hotel()

bar
<__main__.Hotel at 0x2d8d5d0>
bar.foo = 'tesla'

bar.foo
'tesla'

2.3.2. Assign attributes within a method

Assigning attributes during the initialization of an object is a common pattern.

class Motel(object):
    def __init__(self):
        self.foo = 'vespa'

bar = Motel()

bar
<__main__.Motel at 0x2d9e2d0>
bar.foo
'vespa'

2.3.3. Evil attribute assignment with setattr

Somewhat like accessing the dictionary of local variables, you can assign to attributes with names given as strings.

This is rarely a good idea. If you need an object with arbitrary keys it's better to use a dictionary.

class Martini(object):
    pass

bar = Martini()

setattr(bar, 'foo', 'twenty bucks')

bar.foo
'twenty bucks'

2.3.4. Evil attribute assignment via __dict__

Accessing __dict__ to set attributes is a bad idea for the same reasons.

class Hangout(object):
    pass

bar = Hangout()

bar.__dict__['foo'] = 'hipster'

bar.foo
'hipster'
bar.__dict__
{'foo': 'hipster'}

2.4. Class attributes

Python classes are objects too

2.4.1. Assigning class attributes (monkey patch)

Simple assignment to python classes can be used to change the behavior of a class and all its instances. This should be done with extreme care as it can lead to problems that are very hard to debug.

class Bar(object):
    pass

Bar
__main__.Bar
Bar.foo = 'Blanche de Chambly'

Bar.foo
'Blanche de Chambly'

2.4.2. Assign within the class definition

Simple assignment to names within the class definition is one of the normal ways to create class attributes.

class Bar(object):
    foo = 'Porter Baltique'

Bar
__main__.Bar
Bar.foo
'Porter Baltique'

2.4.3. Object attributes > class attributes

When accessing attributes of an object the object attribute hides the class attribute.

class Club(object):
    pass

bar = Club()

bar.foo = 'techno'
Club.foo = 'chocololate'

bar.foo
'techno'
del bar.foo

bar.foo
'chocololate'

2.4.4. Evil class assignment with setattr

We can use setattr to assign class attributes, but this will make your program much harder to understand.

class Dive(object):
    pass

setattr(Dive, 'foo', 'rusty nail')

Dive.foo
'rusty nail'

2.4.5. Assignment of class attributes, the wacky ways

All the other methods of binding a name work within a class definition as well. This is very strange Python code, but it does show that the language is consistent, at least.

class Bestiary(object):
    import sys as mod
    class Klass(object):
        pass
    for looper in [1, 2, 3]:
        pass
    try:
        1/0
    except ZeroDivisionError as exc:
        pass
    with open('/tmp/wat', 'w') as closedfile:
        pass

Bestiary.mod
<module 'sys' (built-in)>
Bestiary.Klass
__main__.Klass
Bestiary.looper
3
Bestiary.exc
ZeroDivisionError('integer division or modulo by zero')
Bestiary.closedfile
<closed file '/tmp/wat', mode 'w' at 0x2ca3a50>

2.5. Catch-all __getattr__ method

If we want to provide a value when an attribute name has not been defined we can use __getattr__. This can be useful for implementing proxies of objects or services. We don't need to list methods and attributes explicitly, but we do need to handle genuinely missing attributes by raising AttributeError.

class Lounge(object):
    def __getattr__(self, name):
        if name == 'foo':
            return 'lizard'
        raise AttributeError("%r object has no attribute %r"
            % (self.__class__.__name__, name))

bar = Lounge()

bar.foo
'lizard'
hasattr(bar, 'baz')
False

2.5.1. Object attribute > Class attribute > __getattr__

When accessing attributes both existing object and class attributes prevent calls to __getattr__.

Lounge.foo = 'larry'

bar.foo
'larry'
del Lounge.foo

bar.foo
'lizard'

2.6. MRO: Method (attribute) resolution order

Python classes support multiple inheritance. When searching for class attributes the method resolution order determines which attribute will be used.

class Structure(object):
    pass
class Dancing(Structure):
    pass
class Drinking(Structure):
    pass
class Bar(Dancing, Drinking):
    pass

Bar.__mro__
(__main__.Bar, __main__.Dancing, __main__.Drinking, __main__.Structure, object)
Structure.foo = 'bricks'
Drinking.foo = 'drinks'

Bar.foo
'drinks'

2.7. Order so far

  1. object attributes
  2. class attributes and non-data descriptors in type's MRO
  3. __getattr__ in type's MRO

3. Run code in response

We can run code in response to attribute access with more precision than a __getattr__ method.

3.1. Methods

What about Python methods? There is something going on here:

class Bar(object):
    def foo(self):
        pass

Bar.foo
<unbound method Bar.foo>
bar = Bar()

bar.foo
<bound method Bar.foo of <__main__.Bar object at 0x2d9e550>>

3.1.1. Methods, a closer look (very unusual code)

We can get a better idea of what is happening by creating our method outside of the class.

class Bar(object):
    pass

def foo(thing):
    pass

Bar.foo = foo

foo
<function __main__.foo>
Bar.foo
<unbound method Bar.foo>
Bar().foo
<bound method Bar.foo of <__main__.Bar object at 0x2d9eed0>>
foo is Bar.__dict__['foo']
True

This function is not being modified when it is assigned to the class, but when we access it as an attribute of an object or class we get a different value.

3.2. Non-data descriptors (won't be on the test)

The descriptor protocol is what makes this happen. We creating a non-data descriptor (a class with a __get__ method) and assign an instance of this class as an attribute of a second class. Now our non-data descriptor can decide what is returned when the attribute of the second class is accessed.

Don't worry if this seems complicated, it's extremely rare to need this feature of Python in normal programs. This feature exists just to avoid having to make bound methods and the staticmethod and classmethod decorators a core part of the Python language.

class AboutFoo(object):
    def __get__(self, instance, owner):
        return instance, owner

class Bar(object):
    foo = AboutFoo()

Bar.foo
(None, __main__.Bar)
bar = Bar()
bar.foo
(<__main__.Bar at 0x2e05110>, __main__.Bar)
bar.foo = 42
bar.foo
42

3.3. property

The standard way to execute code in response to an attribute access is to use a property.

class ReadOnly(object):
    def _get_foo(self):
        print("accessed foo!")
        return 42

    foo = property(_get_foo)

ReadOnly.foo
<property at 0x2d72f70>
bar = ReadOnly()
bar.foo
accessed foo!
42
bar.foo = 9
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/ian/git/bar_foo/<ipython-input-253-df4128335d67> in <module>()
----> 1 bar.foo = 9

AttributeError: can't set attribute

3.3.1. Setter

Properties can define setters as well, so we can implement read/write attribute access with custom function calls.

class ReadWrite(object):
    def _get_foo(self):
        return self._foo

    def _set_foo(self, value):
        print("just set foo to %r!" % value)
        self._foo = value

    foo = property(_get_foo, _set_foo)

bar = ReadWrite()
bar.foo
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/ian/git/bar_foo/<ipython-input-257-d0aa0af9ef79> in <module>()
     10
     11 bar = ReadWrite()
---> 12 bar.foo

/home/ian/git/bar_foo/<ipython-input-257-d0aa0af9ef79> in _get_foo(self)
      1 class ReadWrite(object):
      2     def _get_foo(self):
----> 3         return self._foo
      4
      5     def _set_foo(self, value):

AttributeError: 'ReadWrite' object has no attribute '_foo'
bar.foo = 9
just set foo to 9!
bar.foo
9

3.3.2. Prettier?

If you prefer, getters and setters also may be written as function decorators.

class Propertify(object):
    @property
    def foo(self):
        return self._foo + 1

    @foo.setter
    def foo(self, value):
        self._foo = value

bar = Propertify()
bar.foo = 2
bar.foo
3

3.3.2. Overriding del (for completeness, I guess)

If that wasn't enough you can also customize unbinding of attributes.

It's very strange to create an interface that involves unbinding an object's attributes as part of normal operation, so I suggest you avoid this one.

class Oddball(object):
    excuses = ["No", "I refuse", "Go away"]

    def _del_foo(self):
        print(self.excuses.pop(0))

    foo = property(fdel=_del_foo)

bar = Oddball()
del bar.foo
No
del bar.foo
I refuse
del bar.foo
Go away

3.4. Data descriptors (also not on the test)

Properties fall under the descriptor protocol as "data descriptors". Data descriptors are instances of classes that have __get__ and __set__ methods, assigned as class attributes of another class.

class ThisIsMyFoo(object):
    def __get__(self, instance, owner):
        return instance, owner
    def __set__(self, instance, value):
        print("tried to set %r value to %r" % (instance, value))

class Bar(object):
    foo = ThisIsMyFoo()

bar = Bar()
bar.foo
(<__main__.Bar at 0x2d9efd0>, __main__.Bar)
bar.foo = 42
tried to set <__main__.Bar object at 0x2d9efd0> value to 42

property is implemented as a data descriptor.

3.5. Order recap

Non-data descriptors are accessed just like class attributes. Data descriptors take a new position in the priority just before object attributes, although it is tricky to create an object that has both (we will see how this can be done later).

  1. data descriptors in type's MRO
  2. object attributes
  3. class attributes and non-data descriptors in type's MRO
  4. __getattr__ in type's MRO

4. Complete control

Occasionally it's useful to have a little more control.

4.1. Preemptive __getattribute__ method

Most of the attribute access logic, including all the special descriptor behavior is implemented in object.__getattribute__ and type.__getattribute__ (yes, confusingly similar to __getattr__).

If we override this method we get complete control of what happens when attributes are accessed. There is no similar method for complete control of assignment and unbinding, though.

Extra care must be taken in __getattribute__ implementation to fall back to the default implementation if you want your object to behave somewhat normally.

class Casino(object):
    def __getattribute__(self, name):
        if name == 'foo':
            return 'rat'
        return object.__getattribute__(self, name)

bar = Casino()
bar.foo
'rat'
bar.foo = 'mouse'
bar.foo
'rat'

4.2. PyPy's transparent proxy

In PyPy we can go even further. We can catch and modify every read/write/delete to an object attribute with a transparent proxy.

from tputil import make_proxy

history = []
def recorder(operation):
    history.append(operation)
    return operation.delegate()

o = make_proxy(recorder, obj=[])

This proxy appears as the exact same type as the original object but every operation may be intercepted and potentially modified.

4.3. Final order

  1. PyPy transparent proxy
  2. __getattribute__ in type's MRO
  3. data descriptors in type's MRO
  4. object attributes
  5. class attributes and non-data descriptors in type's MRO
  6. __getattr__ in type's MRO

4.4. All the foo

We can test the attribute access order with a custom class, and one parent to stand in for the method resolution order.

class Parent(object):
    def __getattribute__(self, name):
        if name == 'foo': return 'Second'
        return object.__getattribute__(self, name)
    foo = property(lambda self:'Fourth')
    def __getattr__(self, name):
        if name == 'foo': return 'Ninth'
        raise AttributeError("%r object has no attribute %r"
            % (self.__class__.__name__, name))

class Child(Parent):
    def __getattribute__(self, name):
        if name == 'foo': return 'First'
        return object.__getattribute__(self, name)
    foo = property(lambda self:'Third')
    def __getattr__(self, name):
        if name == 'foo': return 'Eighth'
        raise AttributeError("%r object has no attribute %r"
            % (self.__class__.__name__, name))

We can create an object attribute with the same name as a data descriptor by sneaking past to the object's __dict__.

The first and second values come from __getattribute__ methods in the child and parent classes.

bar = Child()
bar.__dict__['foo'] = 'Fifth'

bar.foo
'First'
del Child.__getattribute__
bar.foo
'Second'

The third and fourth values come from the child and parent class properties (data descriptors).

del Parent.__getattribute__
bar.foo
'Third'
del Child.foo
bar.foo
'Fourth'

The fifth value is the object attribute.

del Parent.foo
bar.foo
'Fifth'

We can now create class attributes with the same names as the class properties, but these do not override the object attribute value.

Child.foo = 'Sixth'
Parent.foo = 'Seventh'
bar.foo
'Fifth'

The sixth and seventh values are the class attributes in the child and parent classes.

del bar.foo
bar.foo
'Sixth'
del Child.foo
bar.foo
'Seventh'

Finally we reach the values returned from the child and parent __getattr__ methods.

del Parent.foo
bar.foo
'Eighth'
del Child.__getattr__
bar.foo
'Ninth'

5. Terrible ideas

There are some... other... ways of setting bar.foo.

5.1. Meta-Evil

Metaclasses can modify our class definitions and constructors in arbitrary ways.

This is powerful, but can be very surprising to people reading and trying to debug code. Make sure the the benefit of adding a metaclass is worth the cost.

class YouGetAFooEveryoneGetsAFoo(type):
    def __call__(cls):
        obj = type.__call__(cls)
        obj.foo = 'under your seat'
        return obj

class Bar(object):
    __metaclass__ = YouGetAFooEveryoneGetsAFoo

Bar().foo
'under your seat'

5.2. gc Evil

For objects that are reference counted (does not include small integers and strings) we can use the gc module to find and modify references, changing values globally.

Possibly useful for debugging. Never use in production code.

target = ('not a string',)

class Thing(object):
    pass

bar = Thing()
baz = Thing()
bar.foo = target
baz.foo = target

import gc
for d in gc.get_referrers(target):
    for k, v in d.items():
        if v == target:
            d[k] = 86

bar.foo
86
baz.foo
86
target
86

5.3. Bytecode Evil

It is difficult but possible to modify python code objects themselves after they have been created.

This is a terrible idea, of course.

class Bar(object):
    foo = 'steak'
    baz = 'pogo'

def i_think_you_meant_baz(f):
    import marshal
    code = marshal.dumps(f.func_code)
    f.func_code = marshal.loads(code.replace('foo', 'baz'))
    return f

@i_think_you_meant_baz
def dinner():
    print(Bar.foo)

dinner()
pogo

6. Coping strategies

We have seen that in Python attribute access can become very complicated. Here are some suggestions to reduce the complexity of your code.

  1. Don't over-use classes (namedtuple is quite nice!)
  2. When you use classes, don't use inheritance
  3. When you use inheritance, don't nest deeply
  4. Use __getattr__ only if it removes lots of duplication
  5. Don't use __getattribute__, you probably don't need it
  6. Use metaclasses only when the result is awesome
  7. Use gc only for debugging
  8. Never mess with bytecode

Tags: Ottawa Software Python OPAG