I was prototyping an HTTP server using the Python framework Twisted. After cobbling my code together, I was demoing to Igor and Dan. Igor, decided to flood the HTTP server with requests to see how it would hold over time. Turns out we were leaking memory.
A short Google later and a nice StackOverflow thread, we decided to profile the HTTP server with Heapy. Heapy is part of package called Guppy. Simple install is done like this
1 | $> pip install guppy |
The code below is an example of a simplified HTTP server. I blatantly put in a memory leak for the purposes of demonstrating on how I found it.
1 2 | from twisted.web import resource, server from twisted.internet import reactor |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 | memory_leak = [] class RootResource(resource.Resource): def render_GET(self, request): # Obvious Memory Leak memory_leak.append(request) return "OK" def getChild(self, name, request): return self if __name__ == '__main__': port = 8080 site = server.Site(RootResource()) reactor.listenTCP(port, site) # Dubugging Memory - outputs file to heap_profile.txt from cwd from guppy import hpy h = hpy() def memory_profile(h, reactor): f = open('heap_profile.txt', 'a') f.write('Memory Statistics: \n%s' % h.heap()) f.write('\n\n') f.write('Traversing Heap: \n%s' % h.heap().get_rp(40)) f.write('\n\n') f.close() # Print profile every 5m after execution for 1h for increment in xrange(60, 3600, 300): reactor.callLater(increment, memory_profile, h, reactor) # Kill reactor a minute later to give heap time to write. reactor.callLater(3660, reactor.stop) # Now open the page repeatively from twisted.web.client import getPage from twisted.internet.task import LoopingCall gp = LoopingCall(getPage, 'http://localhost:8080') reactor.callLater(1, gp.start, 0.1) |
1 | reactor.run() |
If you look through the comments you’ll be able to see that I create a function called memory_profile(), which just spits out two profiles from heapy to a file. A few lines past this, you can see I use the reactor’s callLater() to call at 300s increments. The last thing I do is kill the reactor with reactor.callLater(3660, reactor.stop). I give heapy 60 seconds to write the last profile to the file (which is more than plenty of time).
The following output is two profiles, one at 1 minute, another 5 minutes after.
Memory Statistics:
Partition of a set of 104996 objects. Total size = 9162436 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 43217 41 2949300 32 2949300 32 str
1 590 1 988840 11 3938140 43 dict of twisted.web.server.Request
2 4369 4 848588 9 4786728 52 dict (no owner)
3 18370 17 757704 8 5544432 61 tuple
4 8059 8 470316 5 6014748 66 list
5 4724 4 321232 4 6335980 69 types.CodeType
6 590 1 309160 3 6645140 73 dict of twisted.internet.tcp.Server
7 204 0 307344 3 6952484 76 dict of module
8 4754 5 285240 3 7237724 79 function
9 467 0 209740 2 7447464 81 type
<222 more rows. Type e.g. '_.more' to view.>
Traversing Heap:
Reference Pattern by <[dict of] class>.
0: _ --- [-] 105004 (Queue.Queue | __builtin__.PyCObject | __builtin__.cell ...
1: a [-] 590 dict of twisted.web.server.Request: 0x15e18a0..., 0x15e1b98
2: aa ---- [-] 590 twisted.web.server.Request: 0x1564b98, 0x15d8850...
3: a3 [-] 1 list: 0x12634b8*590
4: a4 ------ [S] 1 dict of module: __main__
5: b [-] 18370 tuple: 0x4a76f0*1, 0x4a7730*1, 0x4a77f0*1, 0x4a7810*1...
6: ba ---- [S] 4724 types.CodeType: site.py:82:makepath...
7: bb [-] 590 dict of twisted.internet.tcp.Server: 0x15e3710..., 0x162f310
8: bba ---- [-] 590 twisted.internet.tcp.Server: 0x157ef90, 0x15e3150...
9: bbaa [^ 1] 590 dict of twisted.web.server.Request: 0x15e1b98...
Memory Statistics:
Partition of a set of 261479 objects. Total size = 25045732 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 3543 1 5938068 24 5938068 24 dict of twisted.web.server.Request
1 87519 33 4638728 19 10576796 42 str
2 19145 7 2917228 12 13494024 54 dict (no owner)
3 46449 18 2620132 10 16114156 64 list
4 3543 1 1856532 7 17970688 72 dict of twisted.internet.tcp.Server
5 7086 3 992040 4 18962728 76 dict of twisted.internet.address.IPv4Address
6 7086 3 992040 4 19954768 80 dict of twisted.web.http_headers.Headers
7 7086 3 992040 4 20946808 84 dict of twisted.web.http_headers._DictHeaders
8 21326 8 864116 3 21810924 87 tuple
9 4724 2 321232 1 22132156 88 types.CodeType
<226 more rows. Type e.g. '_.more' to view.>
Traversing Heap:
Reference Pattern by <[dict of] class>.
0: _ --- [-] 261479 (Queue.Queue | __builtin__.PyCObject | __builtin__.cell ...
1: a [-] 3543 dict of twisted.web.server.Request: 0x15e18a0..., 0x15e1b98
2: aa ---- [-] 3543 twisted.web.server.Request: 0x1564b98, 0x15d8850...
3: a3 [-] 1 list: 0x12634b8*3543
4: a4 ------ [S] 1 dict of module: __main__
5: b [-] 3543 dict of twisted.internet.tcp.Server: 0x15e3710..., 0x162f310
6: ba ---- [-] 3543 twisted.internet.tcp.Server: 0x157ef90, 0x15e3150...
7: baa [^ 1] 3543 dict of twisted.web.server.Request: 0x15e1b98...
8: c ---- [-] 25140 list: 0x4a8850*1, 0x4a8878*2, 0x4a88a0*2, 0x4a8ee0*202...
9: ca [^ 1] 3543 dict of twisted.web.server.Request: 0x15e1b98...
I am looking for a pattern here. Specifically what I am interested in is:
Partition of a set of 104996 objects. Total size = 9162436 bytes.
Then again in the second profile:
Partition of a set of 261479 objects. Total size = 25045732 bytes.
I can see that my number of objects is rapidly increasing and I can see that we went from ~9MB to ~25MB in very short order.
Now to figure out where it is coming from, I want to “Traverse” the heap. This will help me go in the right direction. It’s pretty clear from as I look at the change from the “Traversing Heap” from the first set to the second set, that the Request object is becoming increasing an issue.
Twisted is extremely well written and does not have memory leaks “out of the box”, so the issue is in my code. Thankfully I do not have complex code and it’s very easy to see what’s going on with the Request object. Appending the Request object to a global list is pretty stupid… thank god it’s only an example :P
The major lesson learned from my experience is to profile memory often and to set up automated tests to do so. Had our development gone much longer without examining our code, we would have likely pulled our hair out. Also critical for us to find the source of our leak, was to let the test run for a length of time and try to find a pattern.