Elizabeth Leddy is talking about scalability issues with Plone and how to approach scalability: Tools, Tips and techniques for making Plone scale.
This talk is not just about installing something like CacheFu; it is more about specialized situations where you cannot cache, have SSL, are constantly logged in, and so on.
„A responsive, scalable Plone setup has little to do with optimizing code“
We also need to know that Plone is not Drupal, so we do not get the same requests per second (RPS). That means that we need talent and hardware to get Plone to scale. She also says that Plone people are kinda special compared to PHP people.
Measure!
The first important thing is to measure what really is slow, so set up your tools for measuring.
Tools you can use:
Whale Watchers
You can also use a whale watcher, which is triggered when something abnormal happens and records all the statistics it can get.
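The idea of a whale watcher can be sketched roughly as follows. This is only an illustration, not the tool shown in the talk: `watch_once`, `threshold_seconds` and the `collectors` mapping are hypothetical names, and which statistics you collect is up to you.

```python
import time


def watch_once(response_seconds, threshold_seconds, collectors):
    """Return a snapshot of all statistics if a response was abnormally slow.

    `collectors` maps a name to a zero-argument callable returning a value.
    Returns None when the response time is within the threshold.
    """
    if response_seconds <= threshold_seconds:
        return None
    snapshot = {
        "triggered_at": time.time(),
        "response_seconds": response_seconds,
    }
    for name, collect in collectors.items():
        try:
            snapshot[name] = collect()
        except Exception as exc:
            # A failing collector must not hide the other statistics.
            snapshot[name] = "error: %r" % (exc,)
    return snapshot
```

In practice the collectors would read things like load, memory usage or open connections, and the snapshot would be written to a log for later inspection.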
Simple(r) error reporting
It’s good to have some sort of escalation: for example, escalate by email, and use Google Analytics to track timeouts and 4xx and 5xx pages.
4 stages of system grief
Denial – Anger – Bargaining – Acceptance.
She recommends sharding your ZEO as much as you can, which means splitting the data into smaller databases. This makes moving databases around easier. You can also pack faster and back up faster. It also helps reduce the single point of failure you have with ZEO.
How much hardware
First see what components you need, how they need to be distributed across different boxes, and what these boxes need to look like. Also check the tradeoff between RAM, CPU and disk space.
With Zope and ZEO on different computers you also have network latency. With a sharded ZEO it also makes sense to put the ZEO on the same computer as the Zope. But not everybody can do this.
She then walked through a Munin graph of memory usage to show how bad swapping is. You want the memory shared between cache and apps. With that graph you can also find out how many Zope instances you can run on one box.
How many Zopes should I have?
- 50% avg. util
- 2 zopes/CPU
- api/async instances
(she looked at Mongrel and what they found out).
You add more hardware when, at any point in time, the average utilization is more than 50%. 2 Zopes/CPU is a good way to start. Question: how many threads per Zope instance? Answer: it depends. She uses 1, as the OS is way better at context switching than Zope is. But be careful with asynchronous requests.
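The two rules of thumb above can be written down as simple arithmetic. This is a minimal sketch; the function names and the idea of feeding in a list of utilization samples (as fractions between 0 and 1) are assumptions for illustration, not something from the talk.

```python
def initial_zope_count(cpu_count, zopes_per_cpu=2):
    """Starting point from the talk: 2 Zope instances per CPU."""
    return cpu_count * zopes_per_cpu


def needs_more_hardware(utilization_samples, limit=0.5):
    """True when the average utilization over the samples exceeds the limit."""
    if not utilization_samples:
        raise ValueError("need at least one utilization sample")
    average = sum(utilization_samples) / len(utilization_samples)
    return average > limit
```

For a box with 4 CPUs this suggests starting with 8 instances, and samples such as 0.4, 0.7 and 0.6 average above 50%, so that box would be a candidate for more hardware.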
What does „asynchronous“ mean?
A lot of requests in Zope take a very long time (like 2 minutes). If you can move some of that work out of the request, do that. For example, if you have to collect data from somewhere else, do it later: pickle it up and process it separately. (Pickle, Drop, Pickup, Process, Callback).
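The Pickle, Drop, Pickup, Process, Callback sequence could look roughly like this with a spool directory on disk. This is a sketch under assumptions: the spool directory, the `.job` file suffix and the function names `drop` and `pickup_and_process` are hypothetical, and the talk did not show a concrete implementation.

```python
import os
import pickle
import uuid


def drop(spool_dir, job):
    """Pickle a job description and drop it into the spool directory."""
    path = os.path.join(spool_dir, uuid.uuid4().hex + ".job")
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as handle:
        pickle.dump(job, handle)
    # Rename last so a worker never picks up a half-written file.
    os.replace(tmp_path, path)
    return path


def pickup_and_process(spool_dir, process, callback):
    """Pick up every dropped job, process it, and call back with the result."""
    handled = 0
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".job"):
            continue
        path = os.path.join(spool_dir, name)
        with open(path, "rb") as handle:
            # Only unpickle files from a spool directory you control.
            job = pickle.load(handle)
        callback(job, process(job))
        os.remove(path)
        handled += 1
    return handled
```

The request handler only calls `drop` and returns immediately; a separate worker process runs `pickup_and_process` outside the Zope request.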
use httpd
Serve static content from a static web server to offload Plone, provide faster response times to initial requests, and enable simpler caching strategies. Especially for non-existing images, Plone takes far longer than Apache to answer (with a 404, that is).
(you might want to rewrite the access to /portal_skins/…)
CSS and sprites
Use CSS sprites! Less bandwidth, lower response time and only 1 request.
HAProxy
handles graceful reloads, backend health, distribution algorithms, warmup time, preserve keepalive, web based stats.
HAProxy esp. knows if a request is finished on one instance and then gives it the next request instead of going round-robin.
It also knows about Zope startup time and does not send it requests during that.
Find it here.
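To illustrate the difference between plain round-robin and handing the next request to an instance that has finished its work, here is a small Python model. It is not HAProxy code or configuration; `round_robin`, `pick_least_busy` and the backend names are made up for the example.

```python
import itertools


def round_robin(backends):
    """Yield backends in a fixed rotation, ignoring whether they are busy."""
    return itertools.cycle(backends)


def pick_least_busy(active_requests, warming_up=()):
    """Pick the backend with the fewest in-flight requests.

    `active_requests` maps backend name to its number of unfinished requests.
    Backends listed in `warming_up` (still starting) are skipped.
    Returns None if no backend is available.
    """
    candidates = {
        name: count
        for name, count in active_requests.items()
        if name not in warming_up
    }
    if not candidates:
        return None
    # Sorting first makes ties deterministic (alphabetical by name).
    return min(sorted(candidates), key=candidates.get)
```

With round-robin, a request can land behind a slow one on a busy instance while another instance sits idle; picking the least busy instance avoids that.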
ZEO
It’s perfect for OO access but has problems with other data access paradigms. For this use something else.
If Python < 2.6: socket.settimeout(2)
If you know you have a long-running request, set it higher and then set it back afterwards.
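A sketch of "set it higher and set it back" follows. It assumes the note refers to Python's module-level default socket timeout (`socket.setdefaulttimeout`), which applies to sockets created afterwards; `default_timeout` is a hypothetical helper name.

```python
import contextlib
import socket


@contextlib.contextmanager
def default_timeout(seconds):
    """Temporarily change the default socket timeout, then restore it."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Restore the old value even if the long-running call fails.
        socket.setdefaulttimeout(previous)


# Global default of 2 seconds, as in the note above.
socket.setdefaulttimeout(2)
```

A long-running call would then be wrapped as `with default_timeout(120): ...`, and the 2 second default is back in place once the block exits.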
Restarting
Restarting is not a sin. Let Zope eat as much as it wants but restart it regularly. Also pack your ZEOs frequently. Do it for your system, not for you.
Backups
First: Get good disks and your ZEO will fly.
- Use repozo for backups (and look out for timing, don’t do it during busy times).
- Use chunked rsync for backing up regular files (like your Plone install).
- Watch the disk
Summary
- Always set up system and error monitoring first
- Choose the right software for YOUR hardware
- Use as many Zopes as possible but no more
- Do it like a Flickr engineer (async)
- Don’t make Zope handle unnecessary requests
- Accept that ZEO usually isn’t the only DB in a scaled solution
- Never underestimate the importance of a proper disk-RAM partnership.