So, back in 2009, we decided to try out the whole ZFS route for our storage needs (we have around 500 VMs on a mix of VMware, Virtuozzo and now OnApp platforms). Since then, quite a bit of experience has accumulated, and I’ll try to write some of it down over a few posts in the coming weeks.
Let’s start at the end. What we have running now is:
Primary systems:
Controller nodes
Supermicro 2U (don’t have the model number here)
Dual E5645 2.4GHz SixCore CPUs
48GB RAM
2 LSI SAS 9200-8e (dual port 6G SAS HBAs)
Intel 310 series SSDs for caching
Intel quad 1G networking (will be going for 10G “real soon now”)
Running Nexenta 3.05 at present, most likely moving to OpenIndiana (OI) or Solaris soon
Storage nodes (1 per controller at present)
Supermicro SC847E26-RJBOD1
42 Seagate Constellation ES 1TB SAS drives, set up in mirrored vdevs (20 mirrored pairs, 2 spares)
Intel 311 series SSDs for logs
Secondary systems:
Supermicro SC848A
Dual E5645 2.4GHz SixCore CPUs
48GB RAM
3 LSI 6G controllers (no SAS expander on backplane, so we’re using fanout cables instead)
22 Seagate Constellation ES 3TB drives (10 mirrored pairs, 2 spares)
Space left over for two cache SSDs from the main nodes if needed
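To make the pool layout above a bit more concrete, here is a minimal Python sketch of how a pool of two-way mirrors plus hot spares could be assembled into a `zpool create` command, along with the capacity arithmetic. It is not our actual provisioning script: the function names, the pool name and the device names are all hypothetical, and the capacity figure is nominal (before ZFS metadata and reservation overhead).

```python
from typing import List, Optional


def mirrored_pool_command(pool: str, disks: List[str], spares: int,
                          logs: Optional[List[str]] = None,
                          caches: Optional[List[str]] = None) -> str:
    """Build a `zpool create` command: two-way mirrors, then spares, logs, caches.

    The last `spares` entries of `disks` become hot spares; the rest are
    paired up in order. Device names are supplied by the caller (hypothetical).
    """
    split = len(disks) - spares
    data, spare_disks = disks[:split], disks[split:]
    if len(data) % 2 != 0:
        raise ValueError("two-way mirrors need an even number of data disks")

    parts = ["zpool", "create", pool]
    for i in range(0, len(data), 2):
        parts += ["mirror", data[i], data[i + 1]]
    if spare_disks:
        parts += ["spare"] + spare_disks
    if logs:
        parts += ["log"] + logs      # separate intent log (SLOG) devices
    if caches:
        parts += ["cache"] + caches  # L2ARC read-cache devices
    return " ".join(parts)


def usable_tb(total_disks: int, spares: int, disk_tb: float) -> float:
    """Nominal usable capacity of a pool of two-way mirrors, in TB."""
    return (total_disks - spares) / 2 * disk_tb
```

With the drive counts listed above, `usable_tb(42, 2, 1)` gives 20.0 TB for a primary storage node and `usable_tb(22, 2, 3)` gives 30.0 TB for a secondary system, so the DR boxes have room for the replicated data plus its snapshot history.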
After toying around with HA setups quite a bit, we’ve actually decided to go for standalone boxes instead. We can schedule maintenance windows with our customers if needed, and we actually had far more problems with the HA software than we’ve ever had with the hardware. We therefore decided to go the KISS route and simply bet on more than one horse, backed by good replication and spare-part policies. We will be going for SC848A chassis for the next primary nodes as well (to keep it nice and simple).
The primary nodes each export a number of iSCSI volumes (and a few NFS shares) to the client systems, and take snapshots every hour. The snapshots are then sent to the secondary systems, which are hosted in a second datacenter and act as DR nodes. In the event of a catastrophic failure on a primary node, we can replicate its iSCSI setup on the secondary and get back up and running quickly.
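The snapshot-and-send cycle can be sketched roughly as below. This is an illustration only, not the scripts we actually run: it just builds the command lines for `zfs snapshot` and an incremental `zfs send -i ... | ssh ... zfs receive -F ...` pipeline, and the dataset, snapshot and host names in the usage note are made up.

```python
import shlex
from typing import List, Optional


def _join(args: List[str]) -> str:
    """Quote each argument so the result is safe to paste into a shell."""
    return " ".join(shlex.quote(a) for a in args)


def snapshot_command(dataset: str, snap: str) -> str:
    """Command that takes a snapshot named `snap` of `dataset`."""
    return _join(["zfs", "snapshot", f"{dataset}@{snap}"])


def replication_pipeline(dataset: str, new_snap: str, prev_snap: Optional[str],
                         remote_host: str, remote_dataset: str) -> str:
    """Shell pipeline that ships `new_snap` to a remote pool over ssh.

    With `prev_snap` set, only the delta between the two snapshots is sent
    (`zfs send -i`); with None, a full stream is sent (first-time seeding).
    """
    send = ["zfs", "send"]
    if prev_snap is not None:
        send += ["-i", f"{dataset}@{prev_snap}"]
    send.append(f"{dataset}@{new_snap}")
    # -F rolls the target back to its latest snapshot before receiving.
    recv = ["ssh", remote_host, "zfs", "receive", "-F", remote_dataset]
    return _join(send) + " | " + _join(recv)
```

For example, `replication_pipeline("tank/vm01", "hourly-02", "hourly-01", "dr1", "backup/vm01")` yields `zfs send -i tank/vm01@hourly-01 tank/vm01@hourly-02 | ssh dr1 zfs receive -F backup/vm01`. Sending increments keeps the hourly transfer proportional to what changed rather than to the volume size.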
We do snapshots on hourly (72), daily (14), weekly (12) and monthly (12) schedules. I’ll post some scripts later.
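Until the real scripts are posted, here is a rough Python sketch of the pruning side of such a schedule. It assumes the numbers in parentheses are how many snapshots of each kind are kept, and it assumes a hypothetical naming scheme of `<schedule>-<ISO timestamp>` so that sorting names also sorts them by age. Neither assumption is taken from our production setup.

```python
from typing import Dict, List

# Snapshots to keep per schedule (assumed meaning of the counts above).
RETENTION: Dict[str, int] = {"hourly": 72, "daily": 14, "weekly": 12, "monthly": 12}


def snapshots_to_destroy(snapshots: List[str],
                         retention: Dict[str, int] = RETENTION) -> List[str]:
    """Return the snapshot names that fall outside the retention window.

    Names are expected to look like 'hourly-2011-05-01T13:00' (hypothetical
    scheme); within a schedule, lexicographic order equals chronological order.
    """
    expired: List[str] = []
    for schedule, keep in retention.items():
        names = sorted(s for s in snapshots if s.startswith(schedule + "-"))
        # Everything except the newest `keep` entries is expired.
        expired += names[:max(len(names) - keep, 0)]
    return expired
```

Each returned name would then be handed to `zfs destroy <dataset>@<name>`. With these counts, a dataset settles at no more than 72 + 14 + 12 + 12 = 110 snapshots.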
We’ve actually got a tertiary system as well, at the other end of the country, running an SC848 chassis with RAIDZ2’ed disks.