DZUG Conference: Sebastian Wehrmann on Efficient Ad-Hoc Queries to the ZODB

Sebastian Wehrmann of gocept talked about how to run efficient ad-hoc queries against the ZODB, which is the topic of his master's thesis.

His first idea about the ZODB was: Hey, that’s easy, I derive from Persistent, create an object graph, and that’s it.

But: How do I get those objects out again? How can I search for them? Usually you can only access them via the object graph and traversal.
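To illustrate the problem, here is a minimal sketch of what searching by traversal amounts to. Plain Python classes stand in for Persistent subclasses, and the names Node, children and find_by_class are made up for this example. The point is only that every object in the graph has to be visited to answer the question.

```python
# Plain classes stand in for persistent.Persistent subclasses (assumption).
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class Book(Node):
    pass


def find_by_class(root, cls):
    """Walk the whole graph from root and collect all instances of cls."""
    found, stack, seen = [], [root], set()
    while stack:
        obj = stack.pop()
        if id(obj) in seen:  # guard against cycles in the object graph
            continue
        seen.add(id(obj))
        if isinstance(obj, cls):
            found.append(obj)
        stack.extend(obj.children)
    return found


root = Node("root", [Node("shelf", [Book("A"), Book("B")]), Book("C")])
titles = sorted(book.name for book in find_by_class(root, Book))
```

With a real ZODB, each visited object would also have to be loaded from storage, which is what an index is meant to avoid.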

He wrote his master's thesis about this, and out came a product which implements a query and indexing engine for the ZODB. It has a query language, is standalone (meaning it is independent of Zope, so you can query any ZODB you have), and is a Python egg and thus buildout compatible.

What can it do?

  • It can automatically index a ZODB. So far this covers only the initial indexing; automatic updates of the index do not work yet and will hopefully be finished in October.
  • It can search for objects, for example by state (class and such) or by attributes.
  • The query language is XPath-like.

How does it work?

There is a query processor which creates a query tree, and an ObjectCollection component which performs the query. It builds upon QuerySupport (which has the join algorithms etc.) and IndexSupport (which holds all the indexes, e.g. on class names).
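As a rough sketch of the idea (not the actual gocept.objectquery code), an index layer could map class names to object ids and record parent/child edges, and a join could then combine the two without loading any object. ToyIndex and child_join are hypothetical names chosen for this example.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical index layer: class name -> object ids, parent -> children."""

    def __init__(self):
        self.by_class = defaultdict(set)
        self.children = defaultdict(set)

    def add(self, oid, class_name, parent=None):
        self.by_class[class_name].add(oid)
        if parent is not None:
            self.children[parent].add(oid)


def child_join(index, parent_ids, class_name):
    """Return the children of parent_ids whose class is class_name."""
    wanted = index.by_class[class_name]
    result = set()
    for pid in parent_ids:
        result |= index.children[pid] & wanted
    return result


index = ToyIndex()
index.add(1, "Library")
index.add(2, "Book", parent=1)
index.add(3, "Book", parent=1)
index.add(4, "Person", parent=1)

# Roughly what a query like /Library/Book would need: only ids are touched.
books = child_join(index, index.by_class["Library"], "Book")
```

The join works purely on sets of ids, which matches the claim from the talk that answering a query does not require waking up objects.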

Demo

He created a demo database with three different types: the classes Library, Book and Person. As you might imagine, a Library contains Books and Books can be lent to Persons. They are all derived from persistent.Persistent.

He then added some example data as you would imagine.
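The demo model might look roughly like the following sketch. Only the class names and the attributes location, title and available appear in the talk; the attributes books, name and lent_to are assumptions made for illustration. The fallback to object is there only so the sketch also runs without the persistent package installed.

```python
try:
    from persistent import Persistent
except ImportError:  # fallback so the sketch runs without ZODB installed
    Persistent = object


class Person(Persistent):
    def __init__(self, name):
        self.name = name  # hypothetical attribute


class Book(Persistent):
    def __init__(self, title, available=1):
        self.title = title
        self.available = available
        self.lent_to = None  # hypothetical: the Person borrowing the book


class Library(Persistent):
    def __init__(self, location):
        self.location = location
        # A plain list is enough for this in-memory sketch. In a real ZODB
        # application, changes to it would not be detected automatically;
        # persistent.list.PersistentList is the usual choice there.
        self.books = []


library = Library("Halle")
library.books.append(Book("Plone-Benutzerhandbuch", available=2))
reader = Person("Alice")
library.books[0].lent_to = reader
```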

Now you can do queries like

parser.parse('/foo/bar')

which returns a query path. You can also search for attributes:

parser.parse('/foo[@title="foo"]/bar')
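To give an idea of what parsing such a path involves, here is a toy parser. It is not the real grammar of the product: parse_path is a hypothetical function that only understands plain step names, the _* wildcard and a single [@attr="value"] predicate per step, and it would break on values containing a slash.

```python
import re

# One path step: a name or the _* wildcard, optionally followed by
# a single [@attr="value"] predicate.
STEP = re.compile(r'^(_\*|\w+)(?:\[@(\w+)="([^"]*)"\])?$')


def parse_path(path):
    """Turn a query path into a list of (name, attribute, value) steps."""
    steps = []
    for part in path.strip("/").split("/"):
        match = STEP.match(part)
        if match is None:
            raise ValueError("cannot parse step: %r" % part)
        steps.append(match.groups())
    return steps


steps = parse_path('/foo[@title="foo"]/bar')
```

Steps without a predicate carry None for the attribute and the value.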

Queries can be used like this:

query('/Library')

which returns a list of Python objects.

We can also search for a location (which is an attribute of Library):

query('/Library[@location="Halle"]')

which only returns one object.

Or to get a list of all books in that library:

query('/Library[@location="Halle"]/Book')

To search for the Plone book we do

query('/Library[@location="Halle"]/Book[@title="Plone-Benutzerhandbuch"]')[0].title

This will only find those Library objects which are in the ZODB root. To find e.g. all Books we can do:

query('/_*/Book')

where _* is the wildcard operator.

Other examples:

query('/_*/Book[@available>0]')
query('/_*/Book[@available=0]')
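How such a numeric predicate could be evaluated can be sketched in a few lines. This is an illustration under assumptions, not the product's implementation: make_filter is a hypothetical helper, it supports only the two operators shown above, and it reads the attribute from the object itself, whereas the real engine answers queries from its indexes.

```python
import operator
import re

# Only the two operators from the examples above are supported.
PREDICATE = re.compile(r"^@(\w+)(=|>)(\d+)$")
OPS = {"=": operator.eq, ">": operator.gt}


def make_filter(predicate):
    """Build a test function from a predicate such as '@available>0'."""
    match = PREDICATE.match(predicate)
    if match is None:
        raise ValueError("unsupported predicate: %r" % predicate)
    attr, compare, number = match.group(1), OPS[match.group(2)], int(match.group(3))
    return lambda obj: compare(getattr(obj, attr), number)


class Book:
    def __init__(self, title, available):
        self.title = title
        self.available = available


books = [Book("A", 2), Book("B", 0)]
in_stock = [b.title for b in books if make_filter("@available>0")(b)]
lent_out = [b.title for b in books if make_filter("@available=0")(b)]
```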

You can also join queries with the | operator, which combines their results:

query('(/Library[@location="Berlin"])|(/_*/Person)')

(All libraries in Berlin and all Persons)
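Assuming the joined query behaves like a union that returns each object only once (the talk did not spell this out), combining two result lists could be sketched like this. The function union and the stand-in objects are hypothetical.

```python
def union(*result_lists):
    """Concatenate result lists, keeping the first occurrence of each object."""
    seen, merged = set(), []
    for results in result_lists:
        for obj in results:
            if id(obj) not in seen:  # compare by identity, not by equality
                seen.add(id(obj))
                merged.append(obj)
    return merged


# Stand-ins for the objects two separate queries might return.
berlin, alice, bob = object(), object(), object()
libraries_in_berlin = [berlin]
persons = [alice, bob, alice]

combined = union(libraries_in_berlin, persons)
```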

This will be released on PyPI in mid to late October 2008, and it can already be checked out at http://svn.gocept.com/repos/gocept/gocept.objectquery

PyPI URL: http://pypi.python.org/pypi/gocept.objectquery

Master's thesis (German): http://archiv.tu-chemnitz.de/pub/2008/0081/index.html

Questions

Q: How fast is it?
A: Hard to say, but it’s all done via the index and no objects need to be woken up.

Q: Right now the query returns a list and not an iterator. Isn’t this a problem for large result sets?
A: This can be implemented of course.
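One way such an iterator could look is a generator that resolves object ids one at a time. This is only a sketch of the general technique: iter_results and load are hypothetical names, and load stands in for whatever fetches an object from the database.

```python
from itertools import islice


def iter_results(oids, load):
    """Yield objects one by one instead of building the full list up front."""
    for oid in oids:
        yield load(oid)


loaded = []


def load(oid):
    # Stand-in for loading an object from the database.
    loaded.append(oid)
    return "object-%d" % oid


# Only the two requested results are loaded, not all 1000.
first_two = list(islice(iter_results(range(1000), load), 2))
```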

Q: How is the update mechanism going to work?
A: The plan is to hook it up to the transaction mechanism.
