Christian Theune is co-founder of gocept, one of the biggest Zope consulting companies in the German-speaking area.
Today he talks about ZEORaid, a reliable storage layer for the ZODB.
The ZODB is a native object database for Python objects (no Zope is needed to use it). It makes it easy to persist your Python objects without worrying much about the details.
What we want is a separation of the logical and the physical layer, and the issue with most database systems is that they don't provide it. The ZODB does: storages can easily be plugged in (FileStorage, the network-based storage ZEO, DemoStorage, etc.).
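The separation can be illustrated with a toy storage API (the class and method names here are purely illustrative; the real ZODB storage interface is considerably larger):

```python
# Toy sketch of the pluggable-storage idea: the application talks to a
# logical load/store API, and the physical behaviour is swappable.

class InMemoryStorage:
    """Physical layer: a dict stands in for real persistence."""
    def __init__(self):
        self._data = {}

    def store(self, oid, record):
        self._data[oid] = record

    def load(self, oid):
        return self._data[oid]

class CountingStorage(InMemoryStorage):
    """Same API, different physical behaviour: counts every store."""
    def __init__(self):
        super().__init__()
        self.stores = 0

    def store(self, oid, record):
        self.stores += 1
        super().store(oid, record)

def application(storage):
    # The application never cares which physical storage is underneath.
    storage.store('greeting', b'hello')
    return storage.load('greeting')
```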
ZEO is just a proxy storage that connects to a remote storage (typically a FileStorage). You can have multiple clients in front of one ZEO server; the advantage is that if one client fails you don't have a problem, provided you do load balancing properly. The downside is that the ZEO server is a single point of failure. It's also not scalable: beyond a certain number of clients the ZEO server slows down.
There are two options to solve this problem right now:
- ZRS from Zope Corporation (it exists and is functional), but it's proprietary, has a limiting license, and is quite expensive.
- DRBD (Distributed Replicated Block Device) works on the system level, but it only supports FileStorage and incurs downtime if you have to switch over to a slave.
The idea of ZEORaid
About 1.5 years ago Christian had the idea to create a RAID-like storage for ZEO. The basic idea is to have multiple ZEO servers and connect them together in a RAID-like structure. This is done by putting a ZEO-like server in front of them which distributes accesses to the backend servers. For the application nothing changes: it looks like a normal ZEO server.
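The idea can be sketched in a few lines (a conceptual toy, not the actual gocept.zeoraid code; all names here are made up, and plain dicts stand in for the backend ZEO servers):

```python
# RAID-1-style frontend: mirror every write to all healthy backends,
# read from the first backend that still works.

class RaidStorage:
    def __init__(self, backends):
        self.backends = list(backends)   # e.g. ZEO client storages
        self.degraded = set()            # indices of failed backends

    def store(self, oid, data):
        # Mirror the write to every backend that is still optimal.
        for i, backend in enumerate(self.backends):
            if i in self.degraded:
                continue
            try:
                backend[oid] = data
            except Exception:
                self.degraded.add(i)     # keep serving from the others

    def load(self, oid):
        for i, backend in enumerate(self.backends):
            if i not in self.degraded:
                return backend[oid]
        raise RuntimeError('all backends failed')

# Two dicts stand in for two ZEO servers:
b1, b2 = {}, {}
raid = RaidStorage([b1, b2])
raid.store('obj-1', b'state')
assert b1['obj-1'] == b2['obj-1'] == b'state'
```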
Design goals were:
- use existing physical storages
- be API compatible
- system should not be down for switchover
- easy installation (ZRS seems complicated, we hear; DRBD as well)
- compatibility with older applications
- don’t introduce a new point of failure
ZEORaid implements the normal storage API and has a simple three-step installation.
He then showed a little demo of how you use ZEORaid:
Configuration is done via buildout. Here is an example ZEO configuration; 8100 and 8101 are normal ZEO servers:
```
%import gocept.zeoraid

<zeo>
  address 127.0.0.1:8200
</zeo>

<raidstorage main>
  <zeoclient 1>
    server localhost:8100
    storage 1
  </zeoclient>
  <zeoclient 2>
    server localhost:8101
    storage 1
  </zeoclient>
</raidstorage>
```
He then had a simple demo application running which takes objects out of the ZODB. Before running that script he started the 2 ZEO servers and the ZEORaid server.
The ZEORaid server also comes with a utility for controlling and monitoring the server. Commands are "status" for a general status and "details" for more information (e.g. why the RAID is degraded). You can then run "recover 1" to sync the RAID again, after which the status is OK again.
If we now stop the first ZEO server, the script still runs and the "details" command reports that server 1 is degraded. You can even bring up an empty database as server 1 and ZEORaid will copy all the data over to it: running "recover" syncs the RAID again.
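The recovery step from the demo boils down to copying records from a healthy backend into the degraded (possibly empty) one. A toy sketch of that idea (illustrative names, not the real zeoraid utility):

```python
# Copy every record from a healthy backend into the degraded one;
# afterwards the RAID can be marked optimal again.

def recover(healthy, degraded):
    for oid, data in healthy.items():
        degraded[oid] = data

healthy = {'obj-1': b'a', 'obj-2': b'b'}
fresh = {}                  # an empty database brought up as the new server
recover(healthy, fresh)
assert fresh == healthy     # backends are in sync again
```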
Theoretically you should be able to do the same with server 2, but in the demo it failed (of course ;-) ). Christian said the reason might be the recent switch from their custom branch to the ZODB trunk.
- The whole thing is beta. All the functionality is there, and (until a minute ago ;-) ) no critical problems had been reported. They also have it in production use on some sites.
- It's available from svn.zope.org under the ZPL license.
Really new (not stable):
- The SPOF that ZEORaid itself introduces is solved (you can run multiple ZEORaid servers)
- Parallel write accesses on backends (GSOC project)
- Random distribution of read access (GSOC project)
- Online reconfiguration of the backends (GSOC project)
- Guaranteed recovery rates (theoretically an application can write to the ZEORaid faster than the recovery mechanism can sync the RAID; this can be solved by slowing down the write rate, e.g. recovering two items for each item written)
- Remote/asynchronous/slow backends: don't wait for all backends anymore. This means you can also send backups to a secondary server at the other end of the world.
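The recovery-rate guarantee mentioned above could look roughly like this (a toy sketch based on the "recover two per write" remark; the ratio parameter and all names are assumptions, not the planned implementation):

```python
# Throttle application writes so recovery always outpaces new data:
# for every write accepted, copy `ratio` outstanding records to the
# recovering backend.

def write_with_recovery(source, target, pending, oid, data, ratio=2):
    source[oid] = data
    pending.append(oid)          # the new write must reach the target too
    for _ in range(ratio):       # recover `ratio` items per write
        if pending:
            recovered = pending.pop(0)
            target[recovered] = source[recovered]

source = {'old-1': b'x', 'old-2': b'y'}
target = {}
pending = list(source)           # records the target is still missing
write_with_recovery(source, target, pending, 'new-1', b'z')
```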
Contributors and Sponsors
Sponsors: Enfold Systems (USA), CIDC (USA), Headnet (DK), Proteon Vital Internet Services (NL), Chris Withers
Q: When can a stable release be expected?
A: The ZODB trunk is the problem, and the tests indicate that as well. They need feedback from the outside; a buildout is available for download. Everything from Zope 2.8 upwards should work.
Q: What are the limiting factors? RAM bound, CPU bound?
A: Probably network latency, but he hasn't measured it.
Q: How does the packing work?
A: You pack the frontend and it takes care of packing the backends. But: never pack and recover at the same time!