Make stuff to do things

Subscribe (?) Subscribe to RSS

Posts Tagged ‘memory’

Memory Profiling in Twisted Code with Heapy

Published on September 1st, 2009 in Comments

I was prototyping an HTTP server using the Python framework Twisted. After cobbling my code together, I was demoing to Igor and Dan. Igor, decided to flood the HTTP server with requests to see how it would hold over time. Turns out we were leaking memory.

A short Google later and a nice StackOverflow thread, we decided to profile the HTTP server with Heapy. Heapy is part of package called Guppy. Simple install is done like this

1
$> pip install guppy

The code below is an example of a simplified HTTP server. I blatantly put in a memory leak for the purposes of demonstrating on how I found it.

1
2
from twisted.web import resource, server
from twisted.internet import reactor
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
memory_leak = []

class RootResource(resource.Resource):
def render_GET(self, request):
# Obvious Memory Leak
memory_leak.append(request)
return "OK"

def getChild(self, name, request):
return self

if __name__ == '__main__':
port = 8080
site = server.Site(RootResource())
reactor.listenTCP(port, site)

# Dubugging Memory - outputs file to heap_profile.txt from cwd
from guppy import hpy
h = hpy()
def memory_profile(h, reactor):
f = open('heap_profile.txt', 'a')
f.write('Memory Statistics: \n%s' % h.heap())
f.write('\n\n')
f.write('Traversing Heap: \n%s' % h.heap().get_rp(40))
f.write('\n\n')
f.close()

# Print profile every 5m after execution for 1h
for increment in xrange(60, 3600, 300):
reactor.callLater(increment, memory_profile, h, reactor)
# Kill reactor a minute later to give heap time to write.
reactor.callLater(3660, reactor.stop)

# Now open the page repeatively
from twisted.web.client import getPage
from twisted.internet.task import LoopingCall
gp = LoopingCall(getPage, 'http://localhost:8080')
reactor.callLater(1, gp.start, 0.1)
1
 reactor.run()

If you look through the comments you’ll be able to see that I create a function called memory_profile(), which just spits out two profiles from heapy to a file. A few lines past this, you can see I use the reactor’s callLater() to call at 300s increments. The last thing I do is kill the reactor with reactor.callLater(3660, reactor.stop). I give heapy 60 seconds to write the last profile to the file (which is more than plenty of time).

The following output is two profiles, one at 1 minute, another 5 minutes after.

Memory Statistics:
Partition of a set of 104996 objects. Total size = 9162436 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  43217  41  2949300  32   2949300  32 str
     1    590   1   988840  11   3938140  43 dict of twisted.web.server.Request
     2   4369   4   848588   9   4786728  52 dict (no owner)
     3  18370  17   757704   8   5544432  61 tuple
     4   8059   8   470316   5   6014748  66 list
     5   4724   4   321232   4   6335980  69 types.CodeType
     6    590   1   309160   3   6645140  73 dict of twisted.internet.tcp.Server
     7    204   0   307344   3   6952484  76 dict of module
     8   4754   5   285240   3   7237724  79 function
     9    467   0   209740   2   7447464  81 type
<222 more rows. Type e.g. '_.more' to view.>

Traversing Heap:
Reference Pattern by <[dict of] class>.
 0: _ --- [-] 105004 (Queue.Queue | __builtin__.PyCObject | __builtin__.cell ...
 1: a      [-] 590 dict of twisted.web.server.Request: 0x15e18a0..., 0x15e1b98
 2: aa ---- [-] 590 twisted.web.server.Request: 0x1564b98, 0x15d8850...
 3: a3       [-] 1 list: 0x12634b8*590
 4: a4 ------ [S] 1 dict of module: __main__
 5: b      [-] 18370 tuple: 0x4a76f0*1, 0x4a7730*1, 0x4a77f0*1, 0x4a7810*1...
 6: ba ---- [S] 4724 types.CodeType: site.py:82:makepath...
 7: bb      [-] 590 dict of twisted.internet.tcp.Server: 0x15e3710..., 0x162f310
 8: bba ---- [-] 590 twisted.internet.tcp.Server: 0x157ef90, 0x15e3150...
 9: bbaa      [^ 1] 590 dict of twisted.web.server.Request: 0x15e1b98...

Memory Statistics:
Partition of a set of 261479 objects. Total size = 25045732 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   3543   1  5938068  24   5938068  24 dict of twisted.web.server.Request
     1  87519  33  4638728  19  10576796  42 str
     2  19145   7  2917228  12  13494024  54 dict (no owner)
     3  46449  18  2620132  10  16114156  64 list
     4   3543   1  1856532   7  17970688  72 dict of twisted.internet.tcp.Server
     5   7086   3   992040   4  18962728  76 dict of twisted.internet.address.IPv4Address
     6   7086   3   992040   4  19954768  80 dict of twisted.web.http_headers.Headers
     7   7086   3   992040   4  20946808  84 dict of twisted.web.http_headers._DictHeaders
     8  21326   8   864116   3  21810924  87 tuple
     9   4724   2   321232   1  22132156  88 types.CodeType
<226 more rows. Type e.g. '_.more' to view.>

Traversing Heap:
Reference Pattern by <[dict of] class>.
 0: _ --- [-] 261479 (Queue.Queue | __builtin__.PyCObject | __builtin__.cell ...
 1: a      [-] 3543 dict of twisted.web.server.Request: 0x15e18a0..., 0x15e1b98
 2: aa ---- [-] 3543 twisted.web.server.Request: 0x1564b98, 0x15d8850...
 3: a3       [-] 1 list: 0x12634b8*3543
 4: a4 ------ [S] 1 dict of module: __main__
 5: b      [-] 3543 dict of twisted.internet.tcp.Server: 0x15e3710..., 0x162f310
 6: ba ---- [-] 3543 twisted.internet.tcp.Server: 0x157ef90, 0x15e3150...
 7: baa      [^ 1] 3543 dict of twisted.web.server.Request: 0x15e1b98...
 8: c ---- [-] 25140 list: 0x4a8850*1, 0x4a8878*2, 0x4a88a0*2, 0x4a8ee0*202...
 9: ca      [^ 1] 3543 dict of twisted.web.server.Request: 0x15e1b98...

I am looking for a pattern here. Specifically what I am interested in is:

Partition of a set of 104996 objects. Total size = 9162436 bytes.

Then again in the second profile:

Partition of a set of 261479 objects. Total size = 25045732 bytes.

I can see that my number of objects is rapidly increasing and I can see that we went from ~9MB to ~25MB in very short order.

Now to figure out where it is coming from, I want to “Traverse” the heap. This will help me go in the right direction. It’s pretty clear from as I look at the change from the “Traversing Heap” from the first set to the second set, that the Request object is becoming increasing an issue.

Twisted is extremely well written and does not have memory leaks “out of the box”, so the issue is in my code. Thankfully I do not have complex code and it’s very easy to see what’s going on with the Request object. Appending the Request object to a global list is pretty stupid… thank god it’s only an example :P

The major lesson learned from my experience is to profile memory often and to set up automated tests to do so. Had our development gone much longer without examining our code, we would have likely pulled our hair out. Also critical for us to find the source of our leak, was to let the test run for a length of time and try to find a pattern.

Switch to our mobile site