
Google App Engine – Query[] and Query.count()

11. February 2009 · Categories: Programming, Web/Tech

I ran into an interesting performance issue with my Google App Engine application, which I'm currently developing feverishly. It seems that if you use indexed array style access to an unfetched Query object, performance is atrocious. I'm fairly sure that it's doing a fetch for each access. It will also hit the 'DB' if you call count() on the Query object. Far superior is to explicitly fetch() from the Query to get a list, then use that. It's all fairly obvious really, but it took me a while to realise what was going on, perhaps because you can innocuously use the Query object as an iterator without this problem, which lulls you into a false sense of security.

So to give a contrived (and untested) example, I would expect this to be terrible performance-wise:

    from datetime import date, timedelta

    # Assumes a SpecialDay model with a 'date' property, and a start_date no earlier than today.
    special_days = SpecialDay.all().filter('date >=', start_date).order('date')
    day = date.today()
    end_date = day + timedelta(21)
    sd_index = 0
    while day <= end_date:
        # This count() call and the indexed access to special_days each cause a DB hit!
        if special_days.count() > sd_index and day == special_days[sd_index].date:
            print('special day')
            sd_index += 1
        else:
            print('ordinary day')
        day += timedelta(1)

It's easily fixed by simply adding fetch(1000) to the end of the line that builds the query, and using len(special_days) instead of special_days.count(). Also, having moved to working with a list, you can remove sd_index entirely and pop(0) items off the front of special_days until it's empty instead. It's a pain that there isn't a less skanky way to fetch 'all of them' without using that nasty 1000. I suppose at the very least I should create a MAX_FETCH_LIMIT constant for it in my own code, so I can centrally modify it when Google modify their max limit.
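
As a rough sketch of the fix described above, here is the same loop working on a plain list. To keep it runnable outside App Engine, a list of dates stands in for the fetched entities (in the real app the list would come from the query with fetch() on the end). The function name classify_days is made up for illustration, and the sketch uses Python 3 syntax.

```python
from datetime import date, timedelta


def classify_days(special_dates, start, end):
    """Label each day from start to end (inclusive) as special or ordinary.

    special_dates stands in for the fetched entities: datetime.date values
    already sorted ascending, none earlier than start (an assumption).
    """
    pending = list(special_dates)  # work on a copy so the caller's list survives
    labels = []
    day = start
    while day <= end:
        # len() and indexing on a list are in-memory operations: no DB hits.
        if pending and pending[0] == day:
            pending.pop(0)  # take the matched item off the front
            labels.append((day, 'special day'))
        else:
            labels.append((day, 'ordinary day'))
        day += timedelta(1)
    return labels


if __name__ == '__main__':
    start = date(2009, 2, 11)
    specials = [date(2009, 2, 12), date(2009, 2, 14)]
    for day, label in classify_days(specials, start, start + timedelta(3)):
        print(day, label)
```

For a list this short pop(0) is fine; with thousands of entries a collections.deque and popleft() would avoid shuffling the list on every pop.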

Why can't I just use an iterator anyway, you may ask? Because my loop, which is similar in structure to the example above, iterates over the days between two dates, and each time round it picks the item off the front of my list of data entities (which were fetched in date order) if it matches the day. It's all working very nicely now, thanks, and I've added caching too, to save hitting the DB at all in many cases.

Google App Engine: Data Duplicity

05. February 2009 · 4 comments · Categories: Programming, Web/Tech

I've stumbled across a real irritation with the way the Google App Engine data model works. One among many frankly, but I'll restrict myself to just this one for now.

The issue stems from the fact that db.run_in_transaction(func) insists that func is a function with no side effects, since it may be run repeatedly in an attempt to get the transaction to go through (if optimistic locking fails). Fair enough, but that means it has to freshly fetch any model objects that it wants to modify, otherwise it would have side effects on objects outside its scope. But consider this situation, in which we have an increment() function on our model object that must use a transaction because it also modifies other related objects at the same time and requires atomic behaviour:

    from google.appengine.ext import db

    class Person(db.Model):
        count = db.IntegerProperty(default=0, required=True)

        def increment(self):
            def tx():
                # Mess with some other related objects in data store.
                # Must fetch a separate copy of self to avoid side effects.
                person = db.get(self.key())
                person.count += 1
                person.put()
            db.run_in_transaction(tx)

The problem here is that self hasn't actually been modified at all and is now out of date with respect to the data store (where the count is one bigger, assuming the transaction succeeded). This is a pain for the caller who had a Person object and called increment() on it and naturally expects their object's count to be one higher. But their object hasn't been modified at all – though the data store has, via the freshly fetched person. In case it's not obvious, we can't simply change the code above to use self instead of getting the new person object, since db.run_in_transaction(tx) may run our tx() function multiple times until it completes without an optimistic locking failure. If it did have to run multiple times, self's count would increment by one for each failed attempt, so the final successful attempt could end up with more than one added to the count. Or if the transaction eventually failed outright, self's count would still have been modified even though the data store had not been touched.

So the only solutions I can see are:

Put code after the run_in_transaction() call, that synchronises self with the data store. There isn't a sync() or refresh() method on Model objects, so you have to do this painstakingly by getting another fresh person with db.get(self.key()) and then copying across just the fields you know might have changed.
Insist that the caller is aware that certain methods on the model objects won't modify the object itself so they need to get a fresh one. This completely wrecks the idea of an object model and encapsulation though. You might as well just have a purely functional interface to the data store.
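
The retry problem, and the first workaround in the list above, can be simulated in a few lines. None of this is App Engine code: FAKE_STORE is a dict standing in for the data store, and run_with_retries is a made-up stand-in for db.run_in_transaction that pretends the first attempt loses an optimistic-locking race and is rerun. Treat it as a sketch of the behaviour described, in Python 3 syntax, not as how the real library works internally.

```python
class Conflict(Exception):
    """Stand-in for a transaction that never got through."""


# Hypothetical stand-in for the data store: key -> stored count.
FAKE_STORE = {'person-1': 0}


def run_with_retries(func, failures=1, attempts=3):
    """Pretend transaction runner: the first `failures` attempts are rolled back."""
    for attempt in range(attempts):
        snapshot = dict(FAKE_STORE)
        func()
        if attempt < failures:
            # Roll back the store only; in-memory objects keep their changes.
            FAKE_STORE.clear()
            FAKE_STORE.update(snapshot)
            continue
        return
    raise Conflict('gave up after %d attempts' % attempts)


class Person:
    def __init__(self, key):
        self.key = key
        self.count = FAKE_STORE[key]

    def increment_mutating_self(self):
        def tx():
            self.count += 1  # side effect that survives a rollback
            FAKE_STORE[self.key] = self.count
        run_with_retries(tx)

    def increment_with_fresh_copy(self):
        def tx():
            # Only touch freshly read data inside the transaction.
            FAKE_STORE[self.key] = FAKE_STORE[self.key] + 1
        run_with_retries(tx)
        # Resynchronise self once, after the transaction has gone through.
        self.count = FAKE_STORE[self.key]


if __name__ == '__main__':
    bad = Person('person-1')
    bad.increment_mutating_self()
    print(bad.count, FAKE_STORE['person-1'])    # 2 2: one retry, two increments

    FAKE_STORE['person-1'] = 0
    good = Person('person-1')
    good.increment_with_fresh_copy()
    print(good.count, FAKE_STORE['person-1'])   # 1 1
```

The second method is the "copy the changed fields back afterwards" option: tedious, but the caller's object ends up agreeing with the store.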

It all seems like madness to me, and it defeats the point of trying to have a neat, simple data storage object model. As usual, I can only hope that I've missed some crucial point and that in fact the problem is easily and elegantly solved. I shall look out for that solution, unless some kind reader can enlighten me!

Python Newbie Mistake

26. January 2009 · Categories: Programming, Web/Tech

I've been banging away at my Google App Engine application. It'll be a little while yet, but in the meantime, a particular observation about Python. I keep getting tripped up when I refer to a member function without the parentheses. For instance:

    >>> list = [1,2,3,4,5]
    >>> print list.count
    <built-in method count of list object at 0x238fa8>

This doesn't print '5' because list.count is a reference to the count method itself, which (like everything else in Python) is an object, so that is what gets printed. You need those important parentheses to actually call a method. (As it happens, list.count(x) counts the occurrences of x in the list, so the 5 I was after really comes from len(list), but the mistake is the same with any method.) I keep getting tripped up by this and taking a while to debug the problem every time. Some things like __class__ don't need parentheses because they are plain attributes rather than methods, which I think is part of the confusion. Furthermore, when you use dir() to look up the public features of a class, it doesn't do anything to show you which ones are functions:

    >>> dir([])
    ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__setslice__', '__str__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
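
One way to see which of those names are callable is to test each with the built-in callable(). The helper below is only a sketch (split_attributes is a name made up here) and uses Python 3 syntax, so the exact names reported will differ from the Python 2 listing above.

```python
def split_attributes(obj):
    """Split the public names from dir(obj) into (methods, plain_attributes)."""
    methods, others = [], []
    for name in dir(obj):
        if name.startswith('_'):
            continue  # skip the double-underscore machinery
        if callable(getattr(obj, name)):
            methods.append(name)
        else:
            others.append(name)
    return methods, others


numbers = [1, 2, 3, 4, 5]
print(numbers.count)      # the bound method object itself, not a number
print(numbers.count(3))   # 1: how many times the value 3 occurs
print(len(numbers))       # 5: the number of items

methods, others = split_attributes(1 + 2j)
print(methods)            # callables, need parentheses
print(others)             # plain attributes, no parentheses
```

A complex number is used for the last part because it has both kinds of public name: conjugate is a method, while real and imag are plain attributes.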

I rather suspect I've missed a subtle but important bit of Python understanding, having only scratched the surface so far. If someone can point that out to me I'd be very grateful.

Python Thoughts

15. January 2009 · Categories: Programming, Ruby, Web/Tech

I've spent a bit more time learning Python, because I've been dabbling with Google App Engine (which I'm going off rapidly) but mainly because it's an interesting language. What's particularly intriguing to me is its similarity to Ruby, and hence where it differs in syntax or approach makes for a notable point of comparison. Of course it's probably more correct to say that Ruby is similar to Python than the other way around. From my still very small exposure to Python…

Nice things about Python:

Less confusion with the way classes work. Or maybe I just haven't stumbled into Python's equivalent of meta-classes and class vs instance variables.
@classmethod is a much neater way of stating what's a class method rather than an instance method, compared to all that self gubbins, or the dreaded <<.
Optional named arguments for functions, allowing more flexibility for optional arguments and greater clarity when calling. [As an aside I like Objective-C's way of building argument labels into the method signature, but not its square brackety syntax: [obj message:foo]. I'd much rather do obj.message(foo) and in fact with properties in Objective-C 2.0 we see more of this style.]
I think I probably prefer explicit return statements rather than the Ruby way of returning the last evaluated thing in any expression.
List comprehensions. To start with it just seems like a syntactic difference – Python: [x*2 for x in my_list] Ruby: my_list.map {|x| x*2}. But Python's party trick is excluding elements as it goes: [x*2 for x in my_list if x != 3].
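
Two of the points in the list above, optional named arguments and filtering list comprehensions, are easy to try out. This is just a small sketch in Python 3 syntax; greet and its parameters are invented for the example.

```python
def greet(name, greeting='Hello', punctuation='!'):
    """Optional named arguments: callers override only what they need."""
    return '%s, %s%s' % (greeting, name, punctuation)


my_list = [1, 2, 3, 4]

# Plain transformation, much like Ruby's my_list.map {|x| x*2}
doubled = [x * 2 for x in my_list]

# The same, but skipping elements along the way
doubled_without_3 = [x * 2 for x in my_list if x != 3]

print(doubled)                         # [2, 4, 6, 8]
print(doubled_without_3)               # [2, 4, 8]
print(greet('Sam'))                    # Hello, Sam!
print(greet('Sam', punctuation='?'))   # Hello, Sam?
```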

Nasty things about Python:

Seriously – indentation to demarcate blocks? Apparently I'll get used to it.
Double underscores. __init__ is a complete pain to type. Why not just a single underscore at the start or something?
Fiddly module system. Why the need for an __init__.py file in a directory, just to make it a package? Why the need to explicitly define an __all__ list just to be able to import everything from it in one go? I want to be able to create a "model/" directory, put all the .py files for my DB classes in there, then import the whole lot from my other classes with ease. When I add a new model class, I shouldn't have to go and modify the __init__.py file. It seems to be a real pain to split code up into multiple files sensibly in Python. If I've missed a trick here – please somebody show me the light!
The string interpolation is OK I suppose, giving the full power of C style formatting, but most of the time you're just doing simple interpolation and Ruby's syntax is far more pleasant and readable. Python: "Hello %s, from %s." % (person, greeter) Ruby: "Hello #{person}, from #{greeter}." Ruby also has a full-on formatting system for the few times when you need it.
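
On the string interpolation point, Python does have a few ways to name the values being interpolated, which gets closer to the Ruby style. A short sketch, assuming Python 3.6 or later for the last form (str.format arrived in Python 2.6, f-strings in 3.6); the variable values are made up.

```python
person = 'Alice'
greeter = 'Bob'

# Positional, C-style formatting as in the bullet above
positional = 'Hello %s, from %s.' % (person, greeter)

# The same operator with named placeholders and a dict
named = 'Hello %(person)s, from %(greeter)s.' % {'person': person, 'greeter': greeter}

# str.format with keyword arguments
with_format = 'Hello {person}, from {greeter}.'.format(person=person, greeter=greeter)

# f-string: names are looked up in the surrounding scope, as in Ruby
with_fstring = f'Hello {person}, from {greeter}.'

print(positional)
print(named == with_format == with_fstring == positional)  # True
```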

Google App Engine: First Thoughts

10. January 2009 · Categories: Programming, Web/Tech

Having ploughed through much of the documentation, done the tutorial and started writing my own little web app, I have some half-formed thoughts about Google App Engine to throw out to the world.

As far as I can tell, any sort of data aggregation functionality (counting, averaging etc.) just won't be possible as the Datastore APIs don't allow for it. I've tried to think of ways to fake it but even my most elaborate machinations come up against the buffers. The only way to manage it at all is to do counting and averaging piecemeal, manually keeping the aggregate values you need up to date with each individual entity modification. Unfortunately, that means that you can't introduce new functionality requiring new aggregate values after you've already got a million users, because you've missed the chance to record those aggregates along the way.
Python's OK, but I don't like using indentation as the sole way to define blocks. But I'm sure I can get used to that.
I really don't like having to put an empty __init__.py file in any subdirectories of my Python code. If I don't do that it seems I can't import foo.bar.Thingy. Breaking up code into multiple files in a sensible directory structure is surely a fairly common thing to do, so I'm amazed that Python makes it strangely difficult. I hope I've simply missed something and it's actually easier than that.
In fact all those double underscores look horrid and are a pain to type. Surely a single underscore would have been quite adequate?
The overall experience for learning GAE is very sorted. Smooth and well integrated – all you need to supply is your own decent text editor. I'm trying out TextMate, the darling of Mac OS X code editors, but I'm worried to see that it doesn't seem to have been updated for over a year.
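
The piecemeal aggregation described in the first point above boils down to keeping a count and a running total next to the data, and adjusting both on every add, change or delete. Here is a minimal in-memory sketch of that bookkeeping in Python 3 syntax. RunningStats is a hypothetical class, not part of the Datastore API; in a real app these values would presumably live in their own entity and be updated alongside each modification.

```python
class RunningStats:
    """Keeps a count, total and mean up to date as individual values change."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        """Call when a new entity is stored."""
        self.count += 1
        self.total += value

    def update(self, old_value, new_value):
        """Call when an existing entity's value changes."""
        self.total += new_value - old_value

    def remove(self, value):
        """Call when an entity is deleted."""
        self.count -= 1
        self.total -= value

    @property
    def mean(self):
        return self.total / self.count if self.count else None


stats = RunningStats()
for value in (10, 20, 30):
    stats.add(value)
print(stats.count, stats.mean)   # 3 20.0

stats.update(30, 60)             # one entity changed from 30 to 60
stats.remove(10)                 # another was deleted
print(stats.count, stats.mean)   # 2 40.0
```

It also shows the catch mentioned above: the aggregate only knows about changes made after it started being maintained.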

Google App Engine Experiments

07. January 2009 · Categories: Me, Programming, Ruby, Web/Tech, Weblogs

It seems to be the done thing these days to learn how to use Google App Engine (and thus Python) within a couple of hours and then hack out a simple web application to prove how easy it is.

So with the missus out of town tonight, I'm staying out of trouble by doing just that. I've run through the getting started tutorial, which is a delight I have to say, especially because it's so darned simple to get going on Mac OS X. Python is already installed and there's a neat app to download from Google which installs the app engine SDK bits and puts a nice GUI front end on it for you to fire up the local test environment and various other handy things, as well as installing the command line tools if you prefer them. Start up TextMate or any other editor of your choice and you're away!

You'll have to take my word for it though, because I'm tired and won't get onto writing my own app tonight.

Footnote: Python seems OK, but so far I prefer Ruby, though they seem to share a lot in common.

Installing Ruby 1.9 on Mac OS X

15. September 2008 · 3 comments · Categories: Mac, Programming, Ruby, Web/Tech

I’ve been happily beavering away with the stock install of 1.8.6 on Mac OS X 10.5, but it seemed like everyone was moving on to Ruby 1.9 so I thought I’d make the leap. Was it a good idea? More on that later, but first here’s how I successfully installed it on my MacBook, based on instructions from http://hivelogic.com/articles/2007/02/ruby-rails-mongrel-mysql-osx. That page has a good detailed commentary of what’s going on, but didn’t actually work correctly for me, perhaps because of recent changes to the code in question. My version is short and sweet too, for those that either don’t care for the whys and wherefores, or find the command lines self-documenting enough.

Ruby 1.9 Install Instructions

Note: following these instructions will install new ruby and irb binaries in /usr/local/bin. The old system versions will still exist in /usr/bin/, but the new ones will take precedence by virtue of the PATH setup noted below.

Ensure the following is at the end of your ~/.bash_login:
export PATH="/usr/local/bin:/usr/local/sbin:/usr/local/mysql/bin:$PATH"
Reload that modified file so your environment picks it up (or just open a new terminal):
> source ~/.bash_login
Download and install libreadline.dylib as follows:
> curl -O ftp://ftp.gnu.org/gnu/readline/readline-5.1.tar.gz
> tar xzvf readline-5.1.tar.gz
> cd readline-5.1
> ./configure --prefix=/usr/local
> make CPPFLAGS=-DNEED_EXTERN_PC SHOBJ_LDFLAGS=-dynamiclib
> sudo make install
Download and install latest Ruby 1.9 (or instead of living on the bleeding edge, download a tar of a stable source tree from http://www.ruby-lang.org/en/downloads/ in place of the first step below):
> svn co http://svn.ruby-lang.org/repos/ruby/trunk ruby_trunk
> cd ruby_trunk
> autoconf
> ./configure --prefix=/usr/local --enable-pthread --with-readline-dir=/usr/local --enable-shared
> make
> sudo make install

That’s it! Now assuming your PATH is set up as above, ruby --version should report 1.9. If not, check that ‘which ruby’ reports /usr/local/bin/ruby.

So What Next?

What next indeed! I thought I’d try and run a simple ruby app of mine to see if it worked. It didn’t, but that’s because the new ruby install keeps its gems in a different location and its cupboard was bare. I set out to reinstall the gems I needed, but I ran into some problems. Some gems just won’t install, because they have C extensions that are incompatible with Ruby 1.9 so they fail to compile during gem install. Others installed fine but then failed at runtime due to Ruby compatibility issues. There’s a great page with the major porting tips at http://boga.wordpress.com/2008/04/15/ruby-19-porting-notes/ which helped me fix most of the Ruby issues. Here’s a breakdown of my struggles:

Ramaze worked great as it’s been 1.9 compatible for ages.
The mysql gem only installed successfully by downloading the 2.8pre4 code from http://tmtm.org/downloads/mysql/ruby/ and following the install instructions.
The tens of gems that constitute DataMapper (my current ORM of choice) struggled. I was able to make simple Ruby changes to fix many bits, but I ran into real issues getting the database adapter gems do_mysql and do_sqlite3 to install as their C extensions are incompatible.
I don’t think my own code ever really got to run, so I have no idea if it’s good or not. I suspect it’s fine or requires minimal Ruby mods.

So I ran out of time and gave up as I’ve never gotten into the C extension side of things before and didn’t fancy starting. Perhaps I’d be able to sort it out if I put the time and effort in.

Overall it’s shown me that Ruby 1.9 is only worth the effort if all the gems you require are already compatible. I kind of already knew that before I embarked on this mission, but hoped that I’d be pleasantly surprised, or be able to do the porting myself with minimal effort. I was a bit disappointed to find things lagging behind as I assumed the Ruby community were eager technologists always at the forefront of each new thing! Maybe people are too busy coding solutions to their real world problems to bother with 1.9 until it’s declared official (late 2008 at last mention). I also get the impression they’ve become jaded with the extremely long gestation period for this release. Maybe I’ll try 1.8.7. One step at a time.

Programming languages: Ruby – why would I bother?

13. September 2008 · Categories: Programming, Web/Tech

Because Ruby is capable, quick to write, quick to get running and a joy to use!

I’ve got a reasonably long and varied history of programming in many languages, and not just mainstream ones, but I’m mostly looking at Java and its ilk as my point of comparison here. So, here’s why Ruby is often a superior choice for me:

At least one .rb file and a Ruby interpreter is all you need. Edit code then run immediately. No compilation into class files then building into a jar/war/ear. But no biggie you say, my IDE hides all that building and I just edit then run with my Java code. Sure, but once your application gets big that build step is still a non-zero hit: for a big Java web app it might take a minute or more to build the eventual ear.
And then there’s the two minutes waiting for your Java web app to deploy as JBoss (or Java EE app server of your choice) grinds into action. Finally, three minutes after you edited the code you’re able to see the results in the browser! I use Ramaze for my Ruby web apps, which starts up in seconds, even for a big app. But more importantly, it reloads modified files at runtime so I don’t have to re-deploy to see the results of my actions. Zero wait between edit and test. The difference this makes when you’re hammering away at a problem is colossal.
Ruby is neat as a language, generally requiring less code to solve any given problem whilst still being readable. For a start, an endless procession of Java getters and setters can be replaced with attr_accessor :first_name, :last_name, :title. And that’s just one trivial example of Ruby goodness.
Ruby is a joy to use. This may be the single most important point here. If you’ve got to use it all day, you’ll be far more productive if you’re happy about it. When you lose track of time because you’re engrossed in what you’re doing, and making speedy progress too, you’re onto a winner.

Ruby’s not perfect (if you think you’ve found the perfect programming language I’ve got a bridge you might want to buy) but it’s a relatively young language and improving quickly right now. For me it’s way up there as the best available compromise for many types of projects. If nothing else, it’s been the single most interesting and enjoyable language to learn, having dealt with many others before. If you’re interested in learning Ruby, take a look at my review of The Ruby Programming Language – a very decent book to explain it all to you.

Book Review: The Ruby Programming Language

13. September 2008 · Categories: Books, Programming, Web/Tech

This definitive tome came out at just the right time for me, as I was looking to buy a Ruby reference, and this was bang up to date for 1.8.6 and 1.9 and co-written by Matz himself, along with David Flanagan. Yukihiro ‘Matz’ Matsumoto is the main man behind Ruby, so he ought to know what’s what.

I was initially surprised to find that the book didn’t conform to the ‘..in a Nutshell’ style of programming language reference book with which I’m most familiar. It doesn’t comprise endless chapters of API reference laid out in a stiffly templated fashion – it is much more of a prose work, using the vast majority of its pages to explain Ruby’s syntax and workings in a narrative manner with lashings of ad hoc examples. On reflection this makes sense as Ruby is a multi-faceted beast with much intrigue lying in its extensive syntax and dynamic nature. Hence a lot of attention is paid to the manner in which it is loaded and interpreted at run time, a full understanding of which is vital to being a great Ruby programmer.

There are some chapters at the end which deal with some of the basic API stuff (String, Array, Hash etc.) but these take the form of commentary and neat examples to demonstrate each method available in a compact but complete manner.

Overall I found it to be well written and a joy to read. I didn’t skip the bits I thought I knew and came away much the wiser for it. I’ve pretty much read it cover to cover, as well as using it to look things up when I wasn’t sure – and I’d recommend anyone else do the same.

The only slight let-down is the much hyped chapter head drawings by why the lucky stiff, the unusually named Ruby guru, celebrity and wacky artist. I can’t help feeling that they were included so as to be able to put his name on the cover. I also suspect that the publishers have let why down rather badly with their handling of his artwork – reproducing his pencil drawings extremely unsympathetically. If that’s all I can find to complain about though, it must be a pretty decent book, and it certainly is. Buy one today!

JustTheSam