Python: Memory Management and Debugging
Memory Management in Python: Important Points
- Dynamic allocation (heap memory): objects and values.
- Stack memory: function/method calls and the variable names (references) local to them.
- Everything in Python is an object.
- Unlike C, Python always stores values such as ints and strings as objects on the heap.
- Python keeps a reference count for each object, incremented whenever another variable points to the same object. Weak references (the weakref module) do not increment the count.
- Once an object's reference count reaches zero, CPython frees it immediately. The count can be inspected with sys.getrefcount(obj), which reports one extra reference for its own argument.
- Reference count updates are not thread-safe. This is the main reason the GIL comes into play.
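- In a class, we can define __slots__ = ('x', 'y') to restrict instances to the listed attributes. This does not make the class immutable: existing attributes can still be reassigned, but instances get no per-instance __dict__, so new attributes cannot be added, and each instance uses less memory.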
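- In complex data structures such as doubly linked lists, deques and trees, cyclic references can keep reference counts from ever reaching zero. For those cases CPython periodically runs a cycle-detecting garbage collector (the gc module) that finds groups of objects reachable only from each other and frees them, comparable in purpose to the mark-and-sweep collectors used by Java. The collector is generational: objects are grouped into generations by how many collections they have survived, and younger generations are scanned more often.
- We can force collection of such cycles immediately by calling gc.collect(). Overriding the magic method __del__ does not help: in Python 2.x, cycles containing objects with __del__ were not collected at all and ended up in gc.garbage; since Python 3.4 they are collected normally.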
- Because Python stores the value, reference count and type for every object, it uses more memory than statically allocated primitives, e.g. a C int with a fixed 2 or 4 bytes.
- Use sys.getsizeof(obj) to get the size of an object. It reports only the object itself, not the objects it refers to, so account for nested elements when dealing with complex objects such as dictionaries and class instances. The best approach is to write a function that recursively traverses the object and sums the sizes of its elements.
- There are various ways to measure the memory usage of a Python program or its objects. Here are a few:
- guppy.hpy().heap() => Gives the aggregate memory usage of the program's objects, grouped by type.
- memory_profiler.profile => Normally used as a decorator on a function. It reports memory usage from the OS/external point of view.
- resource.getrusage(resource.RUSAGE_SELF).ru_maxrss => Gives the peak resident set size of the process (kilobytes on Linux, bytes on macOS). Sampling it at different stages is a good way to track down where a memory surge occurs.
- Apart from that, we can debug a running process with Unix tools such as gdb -p <pid> or htop.
- Use the psutil library in Python: psutil.Process(os.getpid()).as_dict()['memory_info'].rss
- Also feel free to use the interactive debugger: import pdb; pdb.set_trace()
- If you are using an IDE such as PyCharm, you can make use of its PyDev-based debugger and breakpoints.
- To use a memory allocator other than the system malloc, e.g. jemalloc, preload it when starting Python: LD_PRELOAD=/usr/lib/libjemalloc.so.1 python <*.py>
- Use Valgrind for memory leak detection.
- To see how objects are referenced in memory, use objgraph, for example:
objgraph.show_most_common_types()
objgraph.show_backrefs(random.choice(objgraph.by_type()),
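
Several of the points above (reference counts, weak references, __slots__, cyclic garbage and recursive size measurement) can be tried out with the standard library alone. The following is a minimal sketch, assuming CPython 3; Node, total_size, refcount_demo, cycle_demo and slots_demo are hypothetical names made up for illustration and are not part of any library.

```python
import gc
import sys
import weakref


class Node:
    """Hypothetical doubly linked list node; __slots__ fixes its attribute set."""
    # "__weakref__" is listed so that instances can be weakly referenced.
    __slots__ = ("value", "prev", "next", "__weakref__")

    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


def refcount_demo():
    """Return (delta after alias, delta after weakref, freed after del)."""
    node = Node(1)
    base = sys.getrefcount(node)
    alias = node                          # strong reference: count goes up
    after_alias = sys.getrefcount(node)
    ref = weakref.ref(node)               # weak reference: count unchanged
    after_weak = sys.getrefcount(node)
    del alias, node                       # last strong references are gone
    return after_alias - base, after_weak - after_alias, ref() is None


def cycle_demo():
    """Return (alive before gc.collect(), alive after gc.collect())."""
    gc.disable()                          # avoid an automatic run mid-demo
    try:
        a, b = Node("a"), Node("b")
        a.next, b.prev = b, a             # a <-> b reference cycle
        probe = weakref.ref(a)
        del a, b                          # counts stay above zero (cycle)
        alive_before = probe() is not None
        gc.collect()                      # force the cycle collector to run
        alive_after = probe() is not None
    finally:
        gc.enable()
    return alive_before, alive_after


def slots_demo():
    """Return True if adding an attribute not in __slots__ is rejected."""
    try:
        Node(1).colour = "red"
    except AttributeError:
        return True
    return False


def total_size(obj, seen=None):
    """Recursively sum sys.getsizeof over obj and the objects it holds."""
    seen = set() if seen is None else seen
    if id(obj) in seen:                   # count shared/cyclic objects once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    if hasattr(obj, "__dict__"):          # ordinary class instances
        size += total_size(vars(obj), seen)
    # Assumes __slots__, if present, is a tuple of names on the object's class.
    for name in getattr(type(obj), "__slots__", ()):
        if name != "__weakref__" and hasattr(obj, name):
            size += total_size(getattr(obj, name), seen)
    return size


print(refcount_demo())
print(cycle_demo())
print(slots_demo())
nested = {"a": [1, 2, 3], "b": {"c": "text"}}
print(sys.getsizeof(nested), total_size(nested))
```

The last line prints the shallow size next to the recursive one, which shows how much sys.getsizeof alone leaves out for nested containers. The exact byte counts depend on the Python version and platform, and the immediate freeing in refcount_demo is CPython behaviour that other interpreters may not share.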