What does bar.foo do? - An exploration of Python assignment, objects, attributes and descriptors
This is a talk I gave March 27, 2014 at the Ottawa Python Authors Group meetup.
Follow along with the IPython notebook version if you would like to play with the code examples.
Overview
- Names and values
- Attribute access
- Selectively run code
- Complete control
- Terrible ideas
- Coping strategies
1. Names and values
First, what does foo do?
Better question: how can we make foo do something we want?
1.1. Simple assignment
Before assignment, accessing a name produces a NameError.
foo
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/home/ian/git/bar_foo/<ipython-input-1-d3b07384d113> in <module>()
----> 1 foo
NameError: name 'foo' is not defined
Let’s fix the error with a simple assignment statement.
foo = 'tesla'
foo
'tesla'
As expected the value we assigned is now available when using the name. To return to our original state we can unbind the name using del.
del foo
foo
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/home/ian/git/bar_foo/<ipython-input-137-16af1079f68c> in <module>()
1 del foo
----> 2 foo
NameError: name 'foo' is not defined
1.2. def
Python includes many assignment statements. The next most common is used when defining functions.
def foo():
    return 42
foo
A function object was created and assigned to a name with the def keyword. This function object is just like any other value in Python. It can be assigned to another name with a simple assignment statement. This new name now works identically to the original.
foo2 = foo
foo2()
42
We can verify that the two values are the exact same object.
foo2 is foo
True
1.3. class
Class creation in Python assigns a class to a name.
class Foo(object):
    pass
Foo
__main__.Foo
We used a capitalized name because that is the standard way to name classes in Python code. There is nothing significant about the capitalization but following the standard makes our code easier to understand.
1.4. for..in
For loops bind names that continue to be available outside the for block. This is often useful when searching for the first object that passes a test. It is also easy to accidentally override a name defined earlier, so be careful.
for foo in [1, 2, 3]:
    pass
foo
3
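The search pattern mentioned above might look like the following sketch. The list `candidates` and the name `first_even` are made up for illustration; the `else` clause on the loop runs only when the loop ends without a `break`.

```python
# Hypothetical data, chosen only for illustration.
candidates = [7, 9, 12, 15, 18]

for number in candidates:
    if number % 2 == 0:
        break  # `number` stays bound to the first even value
else:
    number = None  # the loop finished without finding a match

first_even = number
print(first_even)  # 12
```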
1.5. import
Importing modules or values from modules binds those modules or values to names.
import sys as foo
foo
<module 'sys' (built-in)>
1.6. try..except
Exception variables are available outside the exception block in Python 2 (Python 3 unbinds the name when the block ends). This is rarely useful, but be careful about accidentally overriding existing names.
try:
    1/0
except ZeroDivisionError as foo:
    pass
foo
ZeroDivisionError('integer division or modulo by zero')
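The output above comes from Python 2. Python 3 unbinds the exception name when the except block ends, so code that needs the exception afterwards has to bind it to a second name. A minimal sketch, assuming Python 3 (the names `saved` and `leaked` are made up for illustration):

```python
saved = None
try:
    1 / 0
except ZeroDivisionError as exc:
    saved = exc  # keep a reference under a different name

try:
    exc
except NameError:
    leaked = False  # Python 3: `exc` was unbound when the except block ended
else:
    leaked = True   # Python 2: `exc` is still bound

print(type(saved).__name__)
print(leaked)
```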
1.7. with
Context manager variables are available outside their blocks as well.
with open('/tmp/things', 'w') as foo:
    pass
foo
<closed file '/tmp/things', mode 'w' at 0x2ca39c0>
1.8. Decorators
Function and class decorators replace the value assigned by def and class blocks.
This (unusual) example replaces a function with a string.
def serious_decorator(f):
    return str(f) + ' is much too silly'
@serious_decorator
def foo():
    return 42
foo
'<function foo at 0x2d68758> is much too silly'
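Most decorators return another callable rather than a string. Here is a sketch of that more typical shape. The decorator `shout` is made up for illustration; `functools.wraps` copies the original function's name and docstring onto the wrapper.

```python
import functools

def shout(f):
    """Hypothetical decorator: upper-case whatever the wrapped function returns."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return str(f(*args, **kwargs)).upper()
    return wrapper

@shout
def foo():
    return 'tesla'

print(foo())         # TESLA
print(foo.__name__)  # foo, thanks to functools.wraps
```

The name foo is still bound to a callable, but it is the wrapper, not the function written in the def block.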
1.9. Evil ways to bind names
There are ways of binding names that should never be used in normal Python programs.
1.9.1. Evil use of vars, locals, globals
Local and global variables may be accessed and assigned through dictionaries returned by vars, locals or globals. Except for debugging, don’t ever do this.
vars()['foo'] = 'tesla'
foo
'tesla'
1.9.2. Evil use of list comprehensions (does not work in Python 3)
List comprehensions in Python 2 bind names like for statements. Be careful about the variable names you choose. Don’t write code that relies on this behavior.
[None for foo in [1,2,3]]
[None, None, None]
foo
3
1.9.3. Evil stack frame manipulation
There are ways of accessing local variables from other stack frames. Never do this.
def im_in_ur_frame():
    import sys
    frame = sys._getframe(1)
    frame.f_locals['foo'] = 'writin ur vars'
im_in_ur_frame()
foo
'writin ur vars'
1.9.4. Evil use of import
Possibly worst of all: never import all the names from one module into another. This can make debugging and even just understanding code extremely difficult. It can also cause a new name added to that other module to break the module importing it.
from os.path import *
ismount
<function posixpath.ismount>
1.10. Name binding recap
We have eight non-evil ways to bind values to names in Python.
- assignment with =
- def
- class
- for..in
- import
- try..except
- with
- function/class decorators
- evil
2. Attribute access
Now, how can we make bar.foo do something we want?
2.1. Builtin objects with attributes
Some standard Python objects have attributes we can access. Unfortunately, none are named foo.
bar = 2 + 3j
bar.imag
3.0
bar = open("/tmp/things", "w")
bar.closed
False
bar.close()
2.2. collections.namedtuple
The simplest choice for an object with a custom attribute is namedtuple. These objects are immutable, so they make it easier to reason about code. You can always pass them safely to other functions without worrying about them being modified. They also work well with asynchronous and threaded programming.
from collections import namedtuple
Hotel = namedtuple('Hotel', 'foo baz')
bar = Hotel(42, 'shoe')
bar.foo
42
bar
Hotel(foo=42, baz='shoe')
bar[1]
'shoe'
bar.foo = 'tesla'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/ian/git/bar_foo/<ipython-input-42-463460b3ac05> in <module>()
----> 1 bar.foo = 'tesla'
AttributeError: can't set attribute
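When a changed value is needed, a namedtuple can produce a modified copy instead of being mutated. A short sketch using the `_replace` method that namedtuple classes provide (the name `updated` is made up for illustration):

```python
from collections import namedtuple

Hotel = namedtuple('Hotel', 'foo baz')
bar = Hotel(42, 'shoe')

# _replace returns a new tuple; the original is left untouched.
updated = bar._replace(foo='tesla')

print(bar)      # Hotel(foo=42, baz='shoe')
print(updated)  # Hotel(foo='tesla', baz='shoe')
```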
2.3. Custom object attributes
If you can’t use an immutable object like namedtuple, use a normal Python object instead.
2.3.1. Assigning object attributes (slightly unusual)
Simple assignment works with normal Python objects. This example is unusual because it is more common to assign object attributes within a method of the class.
class Hotel(object):
    pass
bar = Hotel()
bar
<__main__.Hotel at 0x2d8d5d0>
bar.foo = 'tesla'
bar.foo
'tesla'
2.3.2. Assign attributes within a method
Assigning attributes during the initialization of an object is a common pattern.
class Motel(object):
    def __init__(self):
        self.foo = 'vespa'
bar = Motel()
bar
<__main__.Motel at 0x2d9e2d0>
bar.foo
'vespa'
2.3.3. Evil attribute assignment with setattr
Somewhat like accessing the dictionary of local variables, you can assign to attributes with names given as strings.
This is rarely a good idea. If you need an object with arbitrary keys it’s better to use a dictionary.
class Martini(object):
    pass
bar = Martini()
setattr(bar, 'foo', 'twenty bucks')
bar.foo
'twenty bucks'
2.3.4. Evil attribute assignment via dict
Accessing __dict__ to set attributes is a bad idea for the same reasons.
class Hangout(object):
    pass
bar = Hangout()
bar.__dict__['foo'] = 'hipster'
bar.foo
'hipster'
bar.__dict__
{'foo': 'hipster'}
2.4. Class attributes
Python classes are objects too.
2.4.1. Assigning class attributes (monkey patch)
Simple assignment to Python classes can be used to change the behavior of a class and all its instances. This should be done with extreme care as it can lead to problems that are very hard to debug.
class Bar(object):
    pass
Bar
__main__.Bar
Bar.foo = 'Blanche de Chambly'
Bar.foo
'Blanche de Chambly'
2.4.2. Assign within the class definition
Simple assignment to names within the class definition is one of the normal ways to create class attributes.
class Bar(object):
    foo = 'Porter Baltique'
Bar
__main__.Bar
Bar.foo
'Porter Baltique'
2.4.3. Object attributes > class attributes
When accessing attributes of an object the object attribute hides the class attribute.
class Club(object):
    pass
bar = Club()
bar.foo = 'techno'
Club.foo = 'chocolate'
bar.foo
'techno'
del bar.foo
bar.foo
'chocolate'
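One way to see why the object attribute wins is to look at where each value is stored. The sketch below repeats the example and inspects both namespaces with the built-in `vars`:

```python
class Club(object):
    pass

bar = Club()
bar.foo = 'techno'
Club.foo = 'chocolate'

print(vars(bar))            # {'foo': 'techno'}
print('foo' in vars(Club))  # True
print(bar.foo)              # techno

del bar.foo                 # removes only the object attribute
print(vars(bar))            # {}
print(bar.foo)              # chocolate
```

The class attribute was never replaced; it was only hidden while the object had an entry of its own.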
2.4.4. Evil class assignment with setattr
We can use setattr to assign class attributes, but this will make your program much harder to understand.
class Dive(object):
    pass
setattr(Dive, 'foo', 'rusty nail')
Dive.foo
'rusty nail'
2.4.5. Assignment of class attributes, the wacky ways
All the other methods of binding a name work within a class definition as well. This is very strange Python code, but it does show that the language is consistent, at least.
class Bestiary(object):
    import sys as mod
    class Klass(object):
        pass
    for looper in [1, 2, 3]:
        pass
    try:
        1/0
    except ZeroDivisionError as exc:
        pass
    with open('/tmp/wat', 'w') as closedfile:
        pass
Bestiary.mod
<module 'sys' (built-in)>
Bestiary.Klass
__main__.Klass
Bestiary.looper
3
Bestiary.exc
ZeroDivisionError('integer division or modulo by zero')
Bestiary.closedfile
<closed file '/tmp/wat', mode 'w' at 0x2ca3a50>
2.5. Catch-all __getattr__ method
If we want to provide a value when an attribute name has not been defined we can use __getattr__. This can be useful for implementing proxies of objects or services. We don’t need to list methods and attributes explicitly, but we do need to handle genuinely missing attributes by raising AttributeError.
class Lounge(object):
    def __getattr__(self, name):
        if name == 'foo':
            return 'lizard'
        raise AttributeError("%r object has no attribute %r"
                             % (self.__class__.__name__, name))
bar = Lounge()
bar.foo
'lizard'
hasattr(bar, 'baz')
False
2.5.1. Object attribute > Class attribute > __getattr__
When accessing attributes, both existing object and class attributes prevent calls to __getattr__.
Lounge.foo = 'larry'
bar.foo
'larry'
del Lounge.foo
bar.foo
'lizard'
2.6. MRO: Method (attribute) resolution order
Python classes support multiple inheritance. When searching for class attributes the method resolution order determines which attribute will be used.
class Structure(object):
    pass
class Dancing(Structure):
    pass
class Drinking(Structure):
    pass
class Bar(Dancing, Drinking):
    pass
Bar.__mro__
(__main__.Bar, __main__.Dancing, __main__.Drinking, __main__.Structure, object)
Structure.foo = 'bricks'
Drinking.foo = 'drinks'
Bar.foo
'drinks'
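The search can be imitated by walking `__mro__` and checking each class's own namespace. In this sketch `defining_class` is a made-up helper, not part of Python, and the attributes are set inside the class bodies only to keep the example short.

```python
class Structure(object):
    foo = 'bricks'

class Dancing(Structure):
    pass

class Drinking(Structure):
    foo = 'drinks'

class Bar(Dancing, Drinking):
    pass

def defining_class(cls, name):
    """Hypothetical helper: return the first class in the MRO that defines name."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    raise AttributeError(name)

print(defining_class(Bar, 'foo').__name__)  # Drinking
print(Bar.foo)                              # drinks
```

Drinking comes before Structure in the MRO, so its value is found first even though Dancing is listed first among Bar's bases.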
2.7. Order so far
- object attributes
- class attributes and non-data descriptors in type’s MRO
- __getattr__ in type’s MRO
3. Run code in response
We can run code in response to attribute access with more precision than a __getattr__ method.
3.1. Methods
What about Python methods? There is something going on here:
class Bar(object):
    def foo(self):
        pass
Bar.foo
<unbound method Bar.foo>
bar = Bar()
bar.foo
<bound method Bar.foo of <__main__.Bar object at 0x2d9e550>>
3.1.1. Methods, a closer look (very unusual code)
We can get a better idea of what is happening by creating our method outside of the class.
class Bar(object):
    pass
def foo(thing):
    pass
Bar.foo = foo
foo
<function __main__.foo>
Bar.foo
<unbound method Bar.foo>
Bar().foo
<bound method Bar.foo of <__main__.Bar object at 0x2d9eed0>>
foo is Bar.__dict__['foo']
True
This function is not being modified when it is assigned to the class, but when we access it as an attribute of an object or class we get a different value.
3.2. Non-data descriptors (won’t be on the test)
The descriptor protocol is what makes this happen. We create a non-data descriptor (a class with a __get__ method) and assign an instance of this class as an attribute of a second class. Now our non-data descriptor can decide what is returned when the attribute of the second class is accessed.
Don’t worry if this seems complicated; it’s extremely rare to need this feature of Python in normal programs. This feature exists just to avoid having to make bound methods and the staticmethod and classmethod decorators a core part of the Python language.
class AboutFoo(object):
    def __get__(self, instance, owner):
        return instance, owner
class Bar(object):
    foo = AboutFoo()
Bar.foo
(None, __main__.Bar)
bar = Bar()
bar.foo
(<__main__.Bar at 0x...>, __main__.Bar)
bar.foo = 42
bar.foo
42
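Plain functions are themselves non-data descriptors, which is how methods get bound. The sketch below calls a function's `__get__` by hand to produce roughly what attribute access on an instance would return (the name `bound` is made up for illustration):

```python
class Bar(object):
    pass

def foo(thing):
    return thing

bar = Bar()

# Calling the function's own __get__ is roughly what bar.foo does
# when foo is stored on the class.
bound = foo.__get__(bar, Bar)

print(bound() is bar)           # True: bar was passed as the first argument
print(hasattr(foo, '__set__'))  # False: no __set__, so a non-data descriptor
```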
3.3. property
The standard way to execute code in response to an attribute access is to use a property.
class ReadOnly(object):
    def _get_foo(self):
        print("accessed foo!")
        return 42
    foo = property(_get_foo)
ReadOnly.foo
<property at 0x2d72f70>
bar = ReadOnly()
bar.foo
accessed foo!
42
bar.foo = 9
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/ian/git/bar_foo/<ipython-input-253-df4128335d67> in <module>()
----> 1 bar.foo = 9
AttributeError: can't set attribute
3.3.1. Setter
Properties can define setters as well, so we can implement read/write attribute access with custom function calls.
class ReadWrite(object):
    def _get_foo(self):
        return self._foo
    def _set_foo(self, value):
        print("just set foo to %r!" % value)
        self._foo = value
    foo = property(_get_foo, _set_foo)
bar = ReadWrite()
bar.foo
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/ian/git/bar_foo/<ipython-input-257-d0aa0af9ef79> in <module>()
10
11 bar = ReadWrite()
---> 12 bar.foo
/home/ian/git/bar_foo/<ipython-input-257-d0aa0af9ef79> in _get_foo(self)
1 class ReadWrite(object):
2 def _get_foo(self):
----> 3 return self._foo
4
5 def _set_foo(self, value):
AttributeError: 'ReadWrite' object has no attribute '_foo'
bar.foo = 9
just set foo to 9!
bar.foo
9
3.3.2. Prettier?
If you prefer, getters and setters also may be written as function decorators.
class Propertify(object):
    @property
    def foo(self):
        return self._foo + 1
    @foo.setter
    def foo(self, value):
        self._foo = value
bar = Propertify()
bar.foo = 2
bar.foo
3
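A common reason to reach for a setter is validation. The following sketch uses a made-up `Thermostat` class and an arbitrary rule (no negative values) purely for illustration:

```python
class Thermostat(object):
    """Hypothetical class used only to illustrate a validating setter."""

    def __init__(self, foo):
        self.foo = foo  # goes through the setter below

    @property
    def foo(self):
        return self._foo

    @foo.setter
    def foo(self, value):
        if value < 0:
            raise ValueError("foo must not be negative, got %r" % (value,))
        self._foo = value

bar = Thermostat(20)
bar.foo = 25
print(bar.foo)  # 25

try:
    bar.foo = -1
except ValueError as error:
    print(error)  # foo must not be negative, got -1
```

The rejected assignment leaves the previous value in place, because the setter raises before touching _foo.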
3.3.3. Overriding del (for completeness, I guess)
If that wasn’t enough you can also customize unbinding of attributes.
It’s very strange to create an interface that involves unbinding an object’s attributes as part of normal operation, so I suggest you avoid this one.
class Oddball(object):
    excuses = ["No", "I refuse", "Go away"]
    def _del_foo(self):
        print(self.excuses.pop(0))
    foo = property(fdel=_del_foo)
bar = Oddball()
del bar.foo
No
del bar.foo
I refuse
del bar.foo
Go away
3.4. Data descriptors (also not on the test)
Properties fall under the descriptor protocol as “data descriptors”. Data descriptors are instances of classes that have __get__ and __set__ methods, assigned as class attributes of another class.
class ThisIsMyFoo(object):
    def __get__(self, instance, owner):
        return instance, owner
    def __set__(self, instance, value):
        print("tried to set %r value to %r" % (instance, value))
class Bar(object):
    foo = ThisIsMyFoo()
bar = Bar()
bar.foo
(<__main__.Bar at 0x2d9efd0>, __main__.Bar)
bar.foo = 42
tried to set <__main__.Bar at 0x2d9efd0> value to 42
property is implemented as a data descriptor.
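A data descriptor becomes useful when the same behavior is needed for several attributes. In this sketch the `Positive` class and its validation rule are made up for illustration; each instance of it stores values in the owning object's `__dict__` under the name it was given.

```python
class Positive(object):
    """Hypothetical data descriptor that rejects values below zero."""

    def __init__(self, name):
        self.name = name  # key used in each instance's __dict__

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class itself
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError("%s must not be negative" % self.name)
        instance.__dict__[self.name] = value

class Bar(object):
    foo = Positive('foo')
    baz = Positive('baz')

bar = Bar()
bar.foo = 3
bar.baz = 4
print(bar.foo + bar.baz)           # 7
print(sorted(vars(bar).items()))   # [('baz', 4), ('foo', 3)]
```

Because Positive defines __set__, the descriptor is consulted before the object's own __dict__, even though that is where the values end up.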
3.5. Order recap
Non-data descriptors are accessed just like class attributes. Data descriptors take a new position in the priority just before object attributes, although it is tricky to create an object that has both (we will see how this can be done later).
- data descriptors in type’s MRO
- object attributes
- class attributes and non-data descriptors in type’s MRO
- __getattr__ in type’s MRO
4. Complete control
Occasionally it’s useful to have a little more control.
4.1. Preemptive __getattribute__ method
Most of the attribute access logic, including all the special descriptor behavior, is implemented in object.__getattribute__ and type.__getattribute__ (yes, confusingly similar to __getattr__).
If we override this method we get complete control of what happens when attributes are accessed. Assignment and unbinding have their own hooks, __setattr__ and __delattr__, which are called on every attempt to set or delete an attribute.
Extra care must be taken in a __getattribute__ implementation to fall back to the default implementation if you want your object to behave somewhat normally.
class Casino(object):
    def __getattribute__(self, name):
        if name == 'foo':
            return 'rat'
        return object.__getattribute__(self, name)
bar = Casino()
bar.foo
'rat'
bar.foo = 'mouse'
bar.foo
'rat'
4.2. PyPy’s transparent proxy
In PyPy we can go even further. We can catch and modify every read/write/delete to an object attribute with a transparent proxy.
from tputil import make_proxy
history = []
def recorder(operation):
    history.append(operation)
    return operation.delegate()
o = make_proxy(recorder, obj=[])
This proxy appears as the exact same type as the original object but every operation may be intercepted and potentially modified.
4.3. Final order
- PyPy transparent proxy
- __getattribute__ in type’s MRO
- data descriptors in type’s MRO
- object attributes
- class attributes and non-data descriptors in type’s MRO
- __getattr__ in type’s MRO
4.4. All the foo
We can test the attribute access order with a custom class, and one parent to stand in for the method resolution order.
class Parent(object):
    def __getattribute__(self, name):
        if name == 'foo': return 'Second'
        return object.__getattribute__(self, name)
    foo = property(lambda self:'Fourth')
    def __getattr__(self, name):
        if name == 'foo': return 'Ninth'
        raise AttributeError("%r object has no attribute %r"
                             % (self.__class__.__name__, name))
class Child(Parent):
    def __getattribute__(self, name):
        if name == 'foo': return 'First'
        return object.__getattribute__(self, name)
    foo = property(lambda self:'Third')
    def __getattr__(self, name):
        if name == 'foo': return 'Eighth'
        raise AttributeError("%r object has no attribute %r"
                             % (self.__class__.__name__, name))
We can create an object attribute with the same name as a data descriptor by sneaking past to the object’s __dict__.
The first and second values come from __getattribute__ methods in the child and parent classes.
bar = Child()
bar.__dict__['foo'] = 'Fifth'
bar.foo
'First'
del Child.__getattribute__
bar.foo
'Second'
The third and fourth values come from the child and parent class properties (data descriptors).
del Parent.__getattribute__
bar.foo
'Third'
del Child.foo
bar.foo
'Fourth'
The fifth value is the object attribute.
del Parent.foo
bar.foo
'Fifth'
We can now create class attributes with the same names as the class properties, but these do not override the object attribute value.
Child.foo = 'Sixth'
Parent.foo = 'Seventh'
bar.foo
'Fifth'
The sixth and seventh values are the class attributes in the child and parent classes.
del bar.foo
bar.foo
'Sixth'
del Child.foo
bar.foo
'Seventh'
Finally we reach the values returned from the child and parent __getattr__ methods.
del Parent.foo
bar.foo
'Eighth'
del Child.__getattr__
bar.foo
'Ninth'
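The order demonstrated above can be summarized as a rough pure-Python model of what the default lookup does for an instance. This is a simplified sketch, not CPython's actual code: the function `lookup` and the sentinel `_MISSING` are made up, and the model ignores `__getattribute__` overrides and other details such as `__slots__`.

```python
_MISSING = object()

def lookup(obj, name):
    """Rough model of instance attribute lookup; not CPython's actual code."""
    cls = type(obj)

    # Find the first class in the MRO whose own namespace has the name.
    class_attr = _MISSING
    for klass in cls.__mro__:
        if name in vars(klass):
            class_attr = vars(klass)[name]
            break

    attr_type = type(class_attr)
    has_get = class_attr is not _MISSING and hasattr(attr_type, '__get__')
    is_data = has_get and (hasattr(attr_type, '__set__')
                           or hasattr(attr_type, '__delete__'))

    if is_data:                               # data descriptors
        return attr_type.__get__(class_attr, obj, cls)
    if name in getattr(obj, '__dict__', {}):  # object attributes
        return obj.__dict__[name]
    if has_get:                               # non-data descriptors
        return attr_type.__get__(class_attr, obj, cls)
    if class_attr is not _MISSING:            # plain class attributes
        return class_attr
    if hasattr(cls, '__getattr__'):           # __getattr__ fallback
        return cls.__getattr__(obj, name)
    raise AttributeError(name)


class Bar(object):
    foo = property(lambda self: 'from property')
    baz = 'from class'

    def __getattr__(self, name):
        return 'from __getattr__'

bar = Bar()
bar.__dict__['foo'] = 'from object'
bar.__dict__['baz'] = 'from object'

print(lookup(bar, 'foo'))   # from property
print(lookup(bar, 'baz'))   # from object
print(lookup(bar, 'nope'))  # from __getattr__
```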
5. Terrible ideas
There are some… other… ways of setting bar.foo.
5.1. Meta-Evil
Metaclasses can modify our class definitions and constructors in arbitrary ways.
This is powerful, but can be very surprising to people reading and trying to debug code. Make sure the benefit of adding a metaclass is worth the cost.
class YouGetAFooEveryoneGetsAFoo(type):
    def __call__(cls):
        obj = type.__call__(cls)
        obj.foo = 'under your seat'
        return obj
class Bar(object):
    __metaclass__ = YouGetAFooEveryoneGetsAFoo
Bar().foo
'under your seat'
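The `__metaclass__` attribute used above is Python 2 syntax. Python 3 ignores it and expects a `metaclass` keyword in the class statement instead. A sketch of the same example for Python 3 follows; the `*args, **kwargs` pass-through is an addition, so that constructor arguments would still reach `__init__`.

```python
class YouGetAFooEveryoneGetsAFoo(type):
    def __call__(cls, *args, **kwargs):
        # type.__call__ runs __new__ and __init__ as usual.
        obj = super().__call__(*args, **kwargs)
        obj.foo = 'under your seat'
        return obj

class Bar(metaclass=YouGetAFooEveryoneGetsAFoo):
    pass

print(Bar().foo)  # under your seat
```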
5.2. gc Evil
For objects that are reference counted (does not include small integers and strings) we can use the gc module to find and modify references, changing values globally.
Possibly useful for debugging. Never use in production code.
target = ('not a string',)
class Thing(object):
    pass
bar = Thing()
baz = Thing()
bar.foo = target
baz.foo = target
import gc
for d in gc.get_referrers(target):
    for k, v in d.items():
        if v == target:
            d[k] = 86
bar.foo
86
baz.foo
86
target
86
5.3. Bytecode Evil
It is difficult but possible to modify Python code objects themselves after they have been created.
This is a terrible idea, of course.
class Bar(object):
    foo = 'steak'
    baz = 'pogo'
def i_think_you_meant_baz(f):
    import marshal
    code = marshal.dumps(f.func_code)
    f.func_code = marshal.loads(code.replace('foo', 'baz'))
    return f
@i_think_you_meant_baz
def dinner():
    print(Bar.foo)
dinner()
pogo
6. Coping strategies
We have seen that in Python attribute access can become very complicated. Here are some suggestions to reduce the complexity of your code.
- Don’t over-use classes (namedtuple is quite nice!)
- When you use classes, don’t use inheritance
- When you use inheritance, don’t nest deeply
- Use __getattr__ only if it removes lots of duplication
- Don’t use __getattribute__, you probably don’t need it
- Use metaclasses only when the result is awesome
- Use gc only for debugging
- Never mess with bytecode