The migration of the wik.is cluster to Amazon EC2 (using RightScale) has vastly improved the architecture of wik.is. Here are some of the technical challenges we faced and how we addressed them using the RightScale/EC2 platform.
Here is a diagram of our overall infrastructure (full-size)
In order to support multiple Apache and Deki API servers, we decided to use HAProxy. HAProxy is a high performance software load balancer. We configured HAProxy to do round-robin load balancing with checks to ensure the backend apache servers are alive and well.
One problem we ran into with multiple frontends for Deki was that Deki uses PHP Sessions. We don't store much in PHP sessions but one common use case within Deki is an HTTP POST, store success or failure message in session, and send 302 redirect. This flow keeps the back button from breaking, but it is a problem when using multiple servers.
Our first (and probably the simplest) way to handle this was to pin sessions to servers. HAProxy supported this scenario by injecting a cookie into each request to the backend apache server. This configuration worked well but wasn't ideal since it doesn't provide the best distribution of requests across servers.
Two other options that we investigated but decided against were storing PHP sessions in a MySQL database or an NFS filesystem. Both would work but we didn't want the overhead of using MySQL or the reliability and speed concerns of using NFS.
The approach we settled on was using memcached to store session information. More details on memcached sessions below. Using memcached to store sessions allows us to use HAProxy without pinning sessions.
HAProxy sends requests to the backend apache servers on a regular basis to check their health. If the backend server is down it is removed from the cluster until it becomes available again. The "check url" is configurable and we decided to point it at: /@api/host/test. This is a nice url because it checks both apache and the deki API (since /@api is actually a proxy to http://localhost:8081). So we kill two birds with one stone here. If either apache or the dekiwiki mono process dies, haproxy does the same thing. The frequency of the health checks can also be specified. Since requesting /@api/host/test is a little more expensive than requesting a static file on the apache server, we changed the check interval to 10 seconds (instead of the default of 2 seconds). This means we're not bombarding the deki API with requests, but we may not detect a failure as quickly either.
Here is a sample of our haproxy.cfg
global
log 127.0.0.1 local2 info
maxconn 4096
ulimit-n 8250
chroot /home/haproxy
user haproxy
group haproxy
daemon
quiet
pidfile /home/haproxy/haproxy.pid
defaults
log global
mode http
option httplog
option dontlognull
retries 3
redispatch
maxconn 2000
contimeout 5000
clitimeout 60000
srvtimeout 60000
# Configuration for one application:
# Example: listen myapp 0.0.0.0:80
listen www 0.0.0.0:80
mode http
balance roundrobin
option forwardfor
# Haproxy status page
stats uri /haproxy-status
stats auth username:password
# When internal servers support a status page
option httpchk GET /@api/host/test
server webA 10.250.1.1:8080 check inter 10000
When another apache backend is added to the cluster, we simple add another server line to the file and reload the haproxy configuration.
Two other things to notice in this conifguraiton:
The apache configuration is the same as a normal Deki apache config except that we listen on port 8080 and use a custom log (to log the X-Forwarded-For header)
LogFormat "%{HOST}i %{X-Forwarded-For}i %l %u %t \"%r\" %>s %b" vhost
CustomLog /var/log/httpd/wik.is-access.log vhost
Since our Apache/API servers handle all the application logic for Deki, they are mostly CPU bound. This makes the choice of EC2 instance type very important. To test performance numbers, we used apachebench. Here are our results:
ab -n 500 -c 10 http://wiki/
1 small instance ($.10/hr): 4.72 req/s
2 small instances ($.10/hr X 2): 8.44 req/s
1 High-CPU medium instance ($.20/hr): 16.09 req/s
As you can clearly see, the High-CPU medium instance offered nearly double the performance of 2 small instances for the same price. Also, this should scale linearly by adding multiple servers to the load balancer.
Our MySQL database is configured using a master/slave setup. We are using large EC2 instances for master and slave. If for some reason the master dies, we have scripts to promote the slave to master then launch a new slave.
We chose to store the mysql data on an XFS filesystem on top of EBS. Using XFS and EBS, we can easily snapshot our database to create a backup. When we want to take a backup, we run a script which does:
mysql -e "FLUSH TABLES WITH READ LOCK;" # Get the current position (for slave init) mysql -Nse "SHOW MASTER STATUS" > /mnt/mysql/Snapshot_Position # Lock filesystem xfs_freeze -f /mnt ec2-create-snapshot $VOLUME xfs_freeze -u /mnt mysql -e "UNLOCK TABLES;"
When we need to initialize a slave server, we simple create a new volume from the latest snapshot and tell the slave to start replication at the position where the snapshot was taken. This greatly simplifies the time required to initialize a slave.
One important thing about running on top of an EBS volume is that you pay for the number of I/O transactions ($.10/million transactions). Therefore, it's very important to make sure all your queries are hitting indexes and not doing full table scans. Since Deki has such a large number of SQL queries, we weren't able to test each one individually. We did however, do a pretty thorough test by setting our long_query_time to 1 second and running our nunit tests against a test wiki.
Configuring Deki in a clustered environment required some tweaks.
When configuring Deki for a multi-tenant installation, there are two options.
Option #1: using Deki's LocalInstanceManager
The LocalInstanceManager gets the configuration information for each wiki instance from the mindtouch.deki.startup.xml file. For example:
<wikis> <config id="wiki1"> <host>wiki1.example.org</host> <db-server>db1</db-server> <db-catalog>wiki1</db-catalog> <db-user>wikiuser</db-user> <db-password hidden="true">mypassword</db-password> </config> </wikis>
The LocalInstanceManager works great for a small number of wikis. However, it requires a restart of Deki each time a new wiki is added.
Option #2: using Deki's RemoteInstanceManager
The RemoteInstanceManager gets the config info for each wiki by sending an HTTP GET to a web service. This service is responsible for sending back an XML document very similar to the XML above. Note, the directory service is not an open source component at this time, but is fairly simple to implement using PHP or writing your own Dream REST service.
For example, when a request for http://wiki1.example.org comes into Deki, it looks at the Host header in the request and does a GET to the remote service like so:
GET http://directory.example.org/wikis/=wiki1.example.org
This would return xml like this:
<site id="1" href="http://directory.example.org/wikis/1">
<config id="1">
<host>wiki1.example.org</host>
<db-catalog hidden="true">wiki1</db-catalog>
<db-server hidden="true">db1</db-server>
<db-user hidden="true">wikiuser</db-user>
<db-password hidden="true">mypassword</db-password>
<db-options hidden="true">pooling=true; Connection Timeout=5; Protocol=socket; Min Pool Size=10; Max Pool Size=50; Connection Reset=false;character set=utf8;ProcedureCacheSize=0;Use Procedure Bodies=true;Connection Lifetime=20</db-options>
</config>
</site>
Finally, in order for Deki to use the RemoteInstanceManager, we need to point to the service in our mindtouch.deki.startup.xml like so:
<wikis src="http://directory.example.org/wikis?apikey=somekey"> <!-- note, our service requires an apikey since it contains sensitive information -->
Our directory service also has a feature which allows us to send an HTTP POST to create a new wiki. This feature simply writes a record to a database and invokes the createdb.sh shell script (in Deki's /maintenance directory) to create the wiki database and stored procedures.
As I mentioned earlier, Deki uses PHP sessions to pass some information between requests. Memcached was a good fit for the job. Installing memcached is pretty straight forward so I won't cover that here.
We added support for memcached sessions in Deki in the 8.08 release. Deki requires that you have the memcache PECL module installed (with session support enabled).
Once installed, telling Deki to store sessions in memcache is simple. Just add the following to your LocalSettings.php:
$wgMemCachedServers = array('tcp://memcache1:11211', 'tcp://memcache2:11211');
You can also add more advanced options by using a connection string like:
$wgMemCachedServers = array('tcp://memcache1:11211?persistent=1&weight=2&timeout=2&retry_interval=10');
To verify that sessions are being saved, you may want to run memcached with the -vv flag, then remove the -vv flag once you're verified everything is ok. It's very important that you verify that memcached sessions are working. If it's not working you won't see a failure or an error message, but requests will be very slow since PHP is waiting for the memcache server to time out (default 2 seconds). Another way of verifying this is to use a tool like Fiddler2 and checking your X-TTFB (time to first byte) value. If it's greater that 2000 (2 seconds), you should double-check your memcache setup.
Since the lucene index is stored on the filesystem, it must be configured to run on a single "master" server. See this page for more info on how to configure the Lucene service
How do I...Configure Lucene in a Clustered Environment?
We also store the Lucene index on an EBS filesystem. We can easily rebuild the index, but it's a very expensive and time consuming operation. Taking a daily snapshot of our EBS volume saves us a lot of rebuilding time in case the server crashes. The backup and restore process is fully scripted so if the server crashes, we simply launch a new server in it's place. The latest snapshot is then used to create a new EBS volume.
One of the great features of MindTouch Deki is the ability to embed content into your wiki pages using extensions. Because these extensions are simple REST services, they can be run anywhere in the cloud. In a standard Deki installation, these services would run locally. However, in a multi-tenant configuration where the same services are used on all sites, it's more efficient and maintainable to run the services as remote services.
For wik.is, we have a dedicated server for hosting the extension services. For example, in our data.sql file (which createdb.sh uses to create a new wiki database), we do something like this:
INSERT INTO `services` (`service_type`, `service_sid`, `service_uri`, `service_description`, `service_enabled`, `service_local`) VALUES
('ext', NULL, 'http://extensions.wik.is/@api/dekiext/feed', 'Atom/RSS Feeds', 1, 0);
This tells the wiki to find the feed service at: http://extensions.wik.is/@api/dekiext/feed.
Since extension services usually don't require a lot of horsepower, we chose to use this server to host our Lucene index as well. Here is an example of our mindtouch.deki.startup.xml file for this server
<script>
<!-- register Deki services -->
<action verb="POST" path="/host/load?name=mindtouch.deki.services" />
<action verb="POST" path="/host/load?name=mindtouch.indexservice" />
<!-- start extension services -->
<action verb="POST" path="/host/services">
<config>
<path>dekiext/digg</path>
<sid>http://services.mindtouch.com/deki/draft/2007/12/dekiscript</sid>
<manifest>http://scripts.mindtouch.com/digg.xml</manifest>
</config>
</action>
<action verb="POST" path="/host/services">
<config>
<path>dekiext/feed</path>
<sid>http://services.mindtouch.com/deki/draft/2007/06/feed</sid>
</config>
</action>
<action verb="POST" path="/host/services">
<config>
<path>deki/luceneindex</path>
<sid>sid://mindtouch.com/2007/06/luceneindex</sid>
<path.store>/var/www/dekiwiki/bin/cache/luceneindex/$1</path.store>
<apikey>lucene_service_api_key</apikey>
</config>
</action>
</script>
One of the great features of RightScale is the ability to define "elastic server arrays". By using a Server Array, we can automatically launch and decommission new servers based on some kind of alert (high load for example). We use this on our frontend servers to add another apache/API instance to the load balancer when load spikes. The new instance, once booted, registers itself with our HAProxy load balancer and starts handling requests. When load drops below a specified threshold, the new instance will de-register itself from HAProxy and shut down.
The scaling process is pretty simple and all based on voting. When a frontend instance load gets over the defined alert threshold for some time, it votes for "growth". If at some point the majority of the voting instances are voting for "growth" then a new frontend is launched to help them out.
Scaling down works exactly the same way, the instances voting for "shrink" when the load stays under the alert threshold.
Among the configurable options for you arrays, you can define :
- "Default min count" minimum number of instances launched in the array.
- "Default max count" maximum number of instances launched in the array.
- "Resize by" defines how many instances are launched or shut down on a single scaling action.
- "Resize calm time" defines how long you want to wait between 2 scaling (Setting this value too low will result in an inefficient array and too high will cost you more than needed).
You can get more details about this amazing feature on the RightScale wiki.
If you're using EC2, you might have experienced issues sending emails from your instances, the Amazon EC2 IP range being denied most of the time. If you're using Elastic (static) IPs, you can remove your address from the public lists and declare it as a valid mail server. On our EC2 wik.is infrastructure however, not all of the instances have static addresses.
A workaround is to use a SMTP relay. There are several providers for this service (like AuthSMTP) and that's a really easy task using Postfix. Here is a sample "/etc/postfix/main.cf" for local delivery and SMTP relaying with authentication/TLS :
myhostname = server.domain.com
mydomain = domain.com
myorigin = $mydomain
smtpd_banner = $myhostname ESMTP $mail_name
biff = no
append_dot_mydomain = no
alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
mydestination = localdomain, localhost, localhost.localdomain, localhost, $myhostname
mynetworks = 127.0.0.0/8
mailbox_size_limit = 0
recipient_delimiter = +
# SECURITY NOTE: Listening on all interfaces. Make sure your firewall is
# configured correctly
inet_interfaces = all
relayhost = smtp-relay.provider.com
smtp_connection_cache_destinations =smtp-relay.provider.com
smtp_use_tls = yes
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = static:username:password
smtp_sasl_security_options = noanonymous
default_destination_concurrency_limit = 4
soft_bounce = yes
This way your outgoing emails will be relayed by smtp-relay.provider.com, which hopefully is not blacklisted :)