Background
Of late, I have been exploring Python again after a break, looking at how classes and instances keep state. Let us start with a simple example of a Python class.
class Person:
    def __init__(self, name, age, gender):
        self.name = name
        self.age = age
        self.gender = gender
Let me try creating an instance of this at the prompt.
p1 = Person('Anand', 49, 'M')
So how much memory does this instance take? Python provides a function named getsizeof in the sys module for this.
Let us try and use this.
>>> import sys
>>> p1 = Person('Anand', 49, 'M')
>>> sys.getsizeof(p1)
56
Okay... that does seem a bit small for the object, doesn't it?
Let me try adding an attribute to the instance dynamically.
>>> p1.job = 'Engineer'
>>> sys.getsizeof(p1)
56
The size remains the same. How is this possible? Surely the addition of a new string attribute should increase the size, correct?
Let us check the help for sys.getsizeof.
>>> help(sys.getsizeof)
Help on built-in function getsizeof in module sys:

getsizeof(...)
    getsizeof(object [, default]) -> int

    Return the size of object in bytes.
(END)
That doesn't help much.
However, the module documentation of the function does. I recommend you read it in full, but here is an important detail I want to reproduce.
... Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to...
So sys.getsizeof only fetches the memory footprint of the object itself, not of the objects it contains or refers to, i.e., its "referents". In other words, sys.getsizeof returns the flat size of the object.
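To see this flat-size behaviour in isolation, try a list holding large strings. The reported size covers only the list object and its array of element pointers, not the strings themselves. (The number below is from a recent 64-bit CPython build and may differ on yours.)
>>> big = ['x' * 1000, 'y' * 1000]
>>> sys.getsizeof(big)
72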
Measuring nested memory footprint
Surprisingly, there is no function or module in the Python standard library which gives you a better idea of the detailed memory footprint of a Python object. One can write such a function recursively, as sketched below, but a library is available which already does this for us.
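Here is a minimal sketch of such a recursive sizer, assuming we only need to handle common containers and instance __dict__ attributes; a real library covers many more cases.

import sys

def deep_getsizeof(obj, seen=None):
    """Recursively add up flat sizes, counting each object only once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        # Already counted, or we are in a reference cycle
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    if hasattr(obj, '__dict__'):
        # Follow instance attributes
        size += deep_getsizeof(obj.__dict__, seen)
    return size

Calling deep_getsizeof(p1) would walk the instance's __dict__ along with its string and int values, roughly in the spirit of what the library below reports.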
The library is named pympler and you can install it using pip.
pip install pympler
pympler provides the asizeof module which has a few methods of interest.
The first one is named asizeof itself.
>>> from pympler import asizeof
>>> p1 = Person('Anand', 49, 'M')
>>> asizeof.asizeof(p1)
648
That makes better sense. Let us see if this changes when I add an attribute dynamically.
>>> p1.job = 'Engineer'
>>> asizeof.asizeof(p1)
768
Well it does.
Let us investigate further. Let us start a new Python session.
>>> p2 = Person('Anand', 49, 'M')
>>> asizeof.asizeof(p2)
664
Why does it report a different size now? The reason is that there is no definite answer for the exact size of a Python object, since a Python object is made up of a few core fields plus a large number of referents - the objects it refers to.
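The standard library's gc module can list these direct referents for us. For our instance, they are its attribute dictionary and its class (the exact order and contents may vary across Python versions).
>>> import gc
>>> gc.get_referents(p2)
[{'name': 'Anand', 'age': 49, 'gender': 'M'}, <class 'person.Person'>]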
Why does the object size vary?
There is no single true value for an object's size in Python. What pympler's asizeof method does is traverse the referring objects up to a certain recursion limit and add up their flat sizes to arrive at an overall size. It makes sure that any object referred to multiple times is included only once in the calculation.
Pympler uses an internal memo to cache objects it has already seen and to avoid infinite recursion in case of cycles in the object graph. When the interpreter is restarted, the state of its internal caches, and pympler's memo, is reset. So one may see slightly different numbers reported for the same object across different Python sessions.
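Here is a small sketch of that deduplication in action: the same string referenced twice is sized only once, while two distinct but equal strings are sized separately. (The names and values here are purely illustrative.)

from pympler import asizeof

a = 'x' * 100
b = ''.join(['x'] * 100)   # equal to a, but a distinct object

# a appears twice yet is sized once; the second total is larger by
# roughly the flat size of one 100-character string
print(asizeof.asizeof([a, a]))
print(asizeof.asizeof([a, b]))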
The default value for the recursion limit is 100. The larger the value, the more complete the calculation becomes. One can also try a smaller limit, which usually reports a smaller value, since fewer referents are included.
To see the details of what is being calculated, one can pass the stats parameter with varying integer values. The higher the value, the more detail is reported.
>>> p2 = Person('Anand', 49, 'M')
>>> asizeof.asizeof(p2, stats=2)
asizeof((<person.Person object at 0x7fad4a634a90>,), stats=2) ...
656 bytes
8 byte aligned
8 byte sizeof(void*)
1 object given
9 objects sized
9 objects seen
0 objects missed
0 duplicates
2 deepest recursion
4 profiles: total (% of grand total), average, and largest flat size: largest object
1 class dict object: 288 (44%), 288, 288: {'name': 'Anand', 'age': 49, 'gender': 'M'} leng 0
5 class str objects: 280 (43%), 56, 56: 'name' leng 5!
1 class person.Person object: 56 (9%), 56, 56: <person.Person object at 0x7fad4a634a90>
1 class int object: 32 (5%), 32, 32: 49 leng 1!
656
Adding up the various numbers gives us,
>>> 288 + 280 + 56 + 32
656
So, in the case of a class instance, pympler adds up the flat size of the instance (56) plus the total size of the instance's __dict__ and its members (600 in this case).
Let us try it with a smaller value of limit.
>>> asizeof.asizeof(p2, limit=1, stats=2)
asizeof((<person.Person object at 0x7fad4a634a90>,), limit=1, stats=2) ...
352 bytes
8 byte aligned
8 byte sizeof(void*)
1 object given
3 objects sized
3 objects seen
0 objects missed
0 duplicates
1 deepest recursion
2 profiles: total (% of grand total), average, and largest flat size: largest object
1 class dict object: 296 (84%), 296, 296: {'name': 'Anand', 'age': 49, 'gender': 'M'} leng 0
1 class person.Person object: 56 (16%), 56, 56: <person.Person object at 0x7f0f37e7a090>
352
It reports a smaller number since it skips the sizes of the individual elements of the instance, adding just the flat size of the instance plus that of its __dict__. Note that for the dict itself it adds 296 bytes. Let us confirm this with sys.getsizeof, since it is a flat size anyway.
>>> sys.getsizeof(p2.__dict__)
296
How do we check this against asizeof itself? Since we are looking for the flat size, pass a limit value of 0.
>>> asizeof.asizeof(p2.__dict__, limit=0, stats=2)
asizeof(({'name': 'Anand', 'age': 49, 'gender': 'M'},), limit=0, stats=2) ...
296 bytes
8 byte aligned
8 byte sizeof(void*)
1 object given
1 object sized
1 object seen
0 objects missed
0 duplicates
1 profile: total (% of grand total), average, and largest flat size: largest object
1 class dict object: 296 (100%), 296, 296: {'name': 'Anand', 'age': 49, 'gender': 'M'} leng 0
296
Let us look at another example where the size varies because of interpreter caching. Let us do this in a new interpreter session. (Assume the imports are done.)
>>> p1 = Person('Anand', 49, 'M')
>>> asizeof.asizeof(p1)
664
>>> p2 = Person('Anand', 49, 'M')
>>> asizeof.asizeof(p2)
656
So why is asizeof reporting p2 as 8 bytes smaller than p1? Let us check the details.
>>> p1 = Person('Anand', 49, 'M')
>>> asizeof.asizeof(p1, stats=2)
asizeof((<person.Person object at 0x7f9242b4a590>,), stats=2) ...
664 bytes
8 byte aligned
8 byte sizeof(void*)
1 object given
9 objects sized
9 objects seen
0 objects missed
0 duplicates
2 deepest recursion
4 profiles: total (% of grand total), average, and largest flat size: largest object
1 class dict object: 296 (45%), 296, 296: {'name': 'Anand', 'age': 49, 'gender': 'M'} leng 0
5 class str objects: 280 (42%), 56, 56: 'name' leng 5!
1 class person.Person object: 56 (8%), 56, 56: <person.Person object at 0x7f9242b4a590>
1 class int object: 32 (5%), 32, 32: 49 leng 1!
664
>>> p2 = Person('Anand', 49, 'M')
>>> asizeof.asizeof(p2, stats=2)
asizeof((<person.Person object at 0x7f9242dfc490>,), stats=2) ...
656 bytes
8 byte aligned
8 byte sizeof(void*)
1 object given
9 objects sized
9 objects seen
0 objects missed
0 duplicates
2 deepest recursion
4 profiles: total (% of grand total), average, and largest flat size: largest object
1 class dict object: 288 (44%), 288, 288: {'name': 'Anand', 'age': 49, 'gender': 'M'} leng 0
5 class str objects: 280 (43%), 56, 56: 'name' leng 5!
1 class person.Person object: 56 (9%), 56, 56: <person.Person object at 0x7f9242dfc490>
1 class int object: 32 (5%), 32, 32: 49 leng 1!
656
You can see that the difference of 8 bytes comes from the instance __dict__ object (288 vs 296). This happens due to a bit of magic in Python dictionaries: an optimization for instance dictionaries called "key-sharing dictionaries".
When we created p1, it was a single instance and allocated its own __dict__ with a key table for the attributes. The second instance (p2) reuses a shared keys object for the attribute names. That reduces memory, since only the values array needs to be unique per instance.
Every new instance allows a bit more sharing, so as we add more instances, the optimization gets more aggressive and, instead of increasing, the overall instance __dict__ size reduces!
Now if you try to check this for p1 again,
>>> asizeof.asizeof(p1, stats=2)
asizeof((<person.Person object at 0x7f9242b4a590>,), stats=2) ...
656 bytes
8 byte aligned
8 byte sizeof(void*)
1 object given
9 objects sized
9 objects seen
0 objects missed
0 duplicates
2 deepest recursion
4 profiles: total (% of grand total), average, and largest flat size: largest object
1 class dict object: 288 (44%), 288, 288: {'name': 'Anand', 'age': 49, 'gender': 'M'} leng 0
5 class str objects: 280 (43%), 56, 56: 'name' leng 5!
1 class person.Person object: 56 (9%), 56, 56: <person.Person object at 0x7f9242b4a590>
1 class int object: 32 (5%), 32, 32: 49 leng 1!
656
It now reports the same value, due to the key sharing.
In fact, one can see further "size reductions" in the instance dictionary as one creates more objects. Here is another example with a third object.
>>> p3 = Person('Anand', 49, 'M')
>>> asizeof.asizeof(p1, stats=2)
asizeof((<person.Person object at 0x7f46b56d2090>,), stats=2) ...
648 bytes
8 byte aligned
8 byte sizeof(void*)
1 object given
9 objects sized
9 objects seen
0 objects missed
0 duplicates
2 deepest recursion
4 profiles: total (% of grand total), average, and largest flat size: largest object
5 class str objects: 280 (43%), 56, 56: 'name' leng 5!
1 class dict object: 280 (43%), 280, 280: {'name': 'Anand', 'age': 49, 'gender': 'M'} leng 0
1 class person.Person object: 56 (9%), 56, 56: <person.Person object at 0x7f46b56d2090>
1 class int object: 32 (5%), 32, 32: 49 leng 1!
648
As you can see, adding a third object reduced the size of the instance __dict__ further from 288 to 280 bytes, dropping another 8 bytes from the computed size. This now holds true for all three objects.
>>> asizeof.asizeof(p1)
648
>>> asizeof.asizeof(p2)
648
>>> asizeof.asizeof(p3)
648
Wait, is this an anomaly in the pympler implementation?
To verify, we can always rely on sys.getsizeof, which accurately reports the flat size.
>>> sys.getsizeof(p1.__dict__)
280
>>> sys.getsizeof(p2.__dict__)
280
>>> sys.getsizeof(p3.__dict__)
280
Well, there you are. Clearly, calculating the sizes of Python objects dynamically is a tricky business, and the result is rarely stable for objects holding a dynamic dictionary (a __dict__) reference, which are ... a lot of objects.
You can read more about key-sharing dictionaries in PEP 412.
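As a final illustration of key sharing - a hedged sketch, since exact byte counts depend on the CPython version - adding an attribute that the shared key table does not contain forces an instance's __dict__ to stop sharing keys, and its flat size jumps.

import sys

p4 = Person('Anand', 49, 'M')
print(sys.getsizeof(p4.__dict__))   # small: the key table is shared

p4.job = 'Engineer'                 # a key the shared table does not have
print(sys.getsizeof(p4.__dict__))   # larger: p4 now owns its key table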
Let me conclude this article with a brief reference to the asized method.
Using the "asized" method
The asizeof module provides a handy asized method which is more useful for getting a quick glimpse into the gory details of the memory footprint than the very verbose asizeof(object, stats=n).
Just call it with the detail parameter, then call the .format() method on the returned object. It gives a string representation which one can print to the console.
>>> print(asizeof.asized(p1, detail=2).format())
<person.Person object at 0x7f46b56d2090> size=648 flat=56
    __dict__ size=592 flat=280
        [K] name size=56 flat=56
        [V] name: 'Anand' size=56 flat=56
        [K] age size=56 flat=56
        [K] gender size=56 flat=56
        [V] gender: 'M' size=56 flat=56
        [V] age: 49 size=32 flat=32
    __class__ size=0 flat=0
Again one can add up the individual sizes and verify it matches.
>>> 280 + 56 + 56 + 56 + 56 + 56 + 56 + 32
648
Or, it is easier to add the instance flat size (56) to the size of the instance dictionary plus its referents (592 in this case).
>>> 592 + 56
648
Thank you for reading. I will conclude this article here, but since this is an interesting topic, I will try to delve deeper into the memory footprints of Python objects in a future article, with a discussion of how __dict__ works, touching upon related topics such as weak references and __slots__.