Django ORM: What to do when the memory requirement goes high?

As the title suggests, this post is about the memory requirements of the Django ORM.
A few days back I faced this problem myself and wrote “Why Django ORM Sucks: It takes a hell lot of memory in processing“. From various comments and a thread started on Google Groups, I came up with a few points one should check while operating on large amounts of data.

To illustrate these points, I’ll use the following example:

from django.core.cache import cache

# Build, for every movie, a cached list of (user_id, rating) tuples (TTL: one day).
for r in UserRating.objects.all():
	ratingList = cache.get(r.movie.id)
	if ratingList is None:
		cache.set(r.movie.id, [(r.user.id, r.rating)], 86400)
	else:
		ratingList.append((r.user.id, r.rating))
		cache.set(r.movie.id, ratingList, 86400)

So here are the mistakes in the previous code. Make sure you are not making the same mistakes in your own code.

Reduce DB Interaction

Use of r.movie.id and r.user.id: here “.id” is used just to get the id of the movie and the user. But accessing r.movie or r.user triggers an extra query to fetch the whole related object, even though the foreign key value is already stored on the UserRating row itself. Use r.user_id and r.movie_id instead, as shown below; they read the local column without touching the DB.
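
For illustration, here is a minimal sketch of the difference, assuming the UserRating model from the example above:

# r.user.id first fetches the related User row, then reads its id:
uid = r.user.id    # one extra SELECT per rating
# r.user_id reads the foreign key column already loaded on the rating row:
uid = r.user_id    # no extra query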

Always use paginator

While handling large amounts of data you should always use a Paginator. This keeps the memory footprint bounded to one page at a time.
Here is sample code for using a Paginator:

from django.core.paginator import Paginator

# Order the queryset so the pages are stable across queries.
p = Paginator(UserRating.objects.order_by('id'), 50000)
for i in p.page_range:
	page = p.page(i)
	for r in page.object_list:
		...

So here the memory requirement is limited to 50000 rows at a time.

Use iterator()

If your dataset is not really huge and you don’t want to use a Paginator, use iterator().
QuerySet.__iter__(): this is the default way of iterating over a queryset.
So iterating over UserRating.objects.all() means calling UserRating.objects.all().__iter__().
QuerySet.iterator(): instead, we can call UserRating.objects.all().iterator().
Both of these iteration methods do chunked reads from the DB. However, __iter__() also caches the results, so if you iterate again you don’t issue a second DB query, whereas iterator() doesn’t cache them.
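
A small sketch of this caching difference, using the same UserRating model:

qs = UserRating.objects.all()
for r in qs:	# first pass hits the DB and fills the result cache
	print(r.rating)
for r in qs:	# second pass is served from the cache, no new query
	print(r.rating)

for r in UserRating.objects.all().iterator():
	print(r.rating)	# rows are read and discarded, nothing is cached for reuse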

As iterator() doesn’t cache the results, memory consumption stays low.
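
Putting these points together (using iterator() as the alternative to a Paginator here), the opening example could be rewritten roughly like this; the caching logic itself is unchanged:

from django.core.cache import cache

for r in UserRating.objects.all().iterator():	# no result cache kept in memory
	ratingList = cache.get(r.movie_id)	# r.movie_id avoids fetching the Movie row
	if ratingList is None:
		cache.set(r.movie_id, [(r.user_id, r.rating)], 86400)
	else:
		ratingList.append((r.user_id, r.rating))
		cache.set(r.movie_id, ratingList, 86400)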

Hope these points will help you as well.
Keep Coding … and keep Djangoing
