I ran into an interesting performance issue with my Google App Engine application, which I'm currently developing feverishly. It seems that if you use indexed, array-style access on an unfetched Query object, performance is atrocious; I'm fairly sure it does a separate fetch for each access. It will also hit the 'DB' if you call count() on the Query object. Far superior is to explicitly fetch() from the Query to get a list, then use that. It's all fairly obvious really, but it took me a while to realise what was going on, perhaps because you can innocuously use the Query object as an iterator without this problem, which lulls you into a false sense of security.

So to give a contrived (and untested) example, I would expect this to perform terribly:

from datetime import date, timedelta

start_date = date.today()
special_days = SpecialDay.all().filter('date >=', start_date).order('date')
day = start_date
end_date = day + timedelta(21)
sd_index = 0
while day <= end_date:
  # This count() call and the indexed access to special_days each cause a DB hit!
  if special_days.count() > sd_index and day == special_days[sd_index].date:
    print 'special day'
    sd_index += 1
  else:
    print 'ordinary day'
  day += timedelta(1)

It's easily fixed by adding fetch(1000) to the end of the query, and using len(special_days) instead of special_days.count(). Also, having moved to working with a list, you can remove sd_index entirely and pop() items off the front of special_days until it's empty instead. It's a pain that there isn't a less skanky way to fetch 'all of them' without using that nasty 1000. I suppose at the very least I should create a MAX_FETCH_LIMIT constant for it in my own code, so I can centrally modify it when Google modify their max limit.
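For completeness, here's how the fixed version of the contrived loop above might look (still untested; it assumes the same SpecialDay model with a date property):

from datetime import date, timedelta

MAX_FETCH_LIMIT = 1000  # Google's current maximum results per fetch()

start_date = date.today()
end_date = start_date + timedelta(21)
# One explicit fetch() gives us a plain list, so no further DB hits.
special_days = SpecialDay.all().filter('date >=', start_date).order('date').fetch(MAX_FETCH_LIMIT)

day = start_date
while day <= end_date:
  # len() on a list is free, and we pop() matched items off the front.
  if len(special_days) > 0 and day == special_days[0].date:
    print 'special day'
    special_days.pop(0)
  else:
    print 'ordinary day'
  day += timedelta(1)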

Why can't I just use an iterator anyway, you may ask? Because my loop, which is similar in structure to the example above, iterates over the days between two dates, and on each pass picks the item off the front of my list of data entities (which were fetched in date order) if it matches the current day. It's all working very nicely now thanks, and I've added caching too, to save hitting the DB at all in many cases.
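The caching is just a thin wrapper around the fetch. A minimal sketch of the idea, using App Engine's memcache API; the key format, the helper name and the 10-minute expiry are arbitrary choices of mine, not anything canonical:

from google.appengine.api import memcache

def get_special_days(start_date, limit=MAX_FETCH_LIMIT):
  # Key format and 10-minute expiry are just placeholders for illustration.
  key = 'special_days:%s' % start_date.isoformat()
  special_days = memcache.get(key)
  if special_days is None:
    special_days = SpecialDay.all().filter('date >=', start_date).order('date').fetch(limit)
    memcache.set(key, special_days, 600)
  return special_days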
