NetApp: NetApp Filers

I don’t claim to be an amazing know-it-all professional NetApp consultant. I don’t know the second thing about metro clustering, and I only recently found out about “deswizzling” (not as fun or as cool as it sounds). But I have written a little intro to NetApp filers that I hope most will find insightful, some will find laughable, and no doubt someone somewhere will find offensive (if you do, then rather than trying to provide an insight into NetApp filers I was specifically trying to annoy you, if that makes you feel any better). I welcome insight, input and constructive criticism; we are all learning, and may we never stop.

First off, filers have their own file system:

For typical use, filers create RAID groups and then format them with their own file system, WAFL. This is the foundation that allows NetApp to implement all the funky features a filer can provide, such as snapshots and FlexVols. The upside is the file-level management that is native to a filer, making NFS and CIFS child’s play; the downside is that if you provision a LUN on this file system, it isn’t true block-level storage and is actually a file pretending to be one (good and bad points there: you don’t get immediate access to the underlying blocks, but you can use some of NetApp’s file-level features, because the LUN is just a big file). Filers can provision raw blocks too, but this eats into the number of disks left to use with aggregates and the WAFL file system.
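To make the “a LUN is really just a big file” point concrete, here is a minimal Python sketch of the analogy (this is not how WAFL actually implements LUNs, just an illustration; the path and block size are example values): a sparse file on any ordinary file system can be read and written at fixed-size block offsets, which is conceptually what the filer is doing when it presents a file on a volume as block storage.

```python
BLOCK_SIZE = 4096  # illustrative block size (WAFL also happens to use 4KB blocks)

class FileBackedLun:
    """Toy 'LUN' backed by an ordinary sparse file - an analogy, not WAFL."""

    def __init__(self, path, size_bytes):
        self.path = path
        # Create a sparse file of the requested size; no blocks are allocated yet.
        with open(path, "wb") as f:
            f.truncate(size_bytes)

    def write_block(self, block_no, data):
        assert len(data) == BLOCK_SIZE
        with open(self.path, "r+b") as f:
            f.seek(block_no * BLOCK_SIZE)
            f.write(data)

    def read_block(self, block_no):
        with open(self.path, "rb") as f:
            f.seek(block_no * BLOCK_SIZE)
            return f.read(BLOCK_SIZE)

# A 1GiB "LUN" that is really just a file the host treats as raw blocks.
lun = FileBackedLun("/tmp/demo_lun.img", 1024 ** 3)
lun.write_block(42, b"\xab" * BLOCK_SIZE)
print(lun.read_block(42)[:4])
```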

So how is this data laid out on disk?

You know how your standard RAID set stripes across disks and has some funky algorithms that write recovery information to parity areas of your disks, reducing the space available for your working data. How does NetApp do this differently? Quite simply, they don’t; why fix what isn’t broken? RAID works just fine as it is. They do, however, tweak it a little for their own purposes: NetApp uses RAID-DP (a double-parity take on RAID 4), which completely separates the parity disks from the data disks, and that should make writes faster, as whatever is going to parity can be cached and written separately (or whatever trick they use).

On top of the RAID sets sits the WAFL file system, which gives us our NetApp goodness and lets the filer choose where it places its data (N.B. like any other file system, filling it up means fewer places to write data to; it takes longer to find free space and slows things down). This lets us create giant aggregates of multiple RAID sets to store data. The point to note is that RAID-DP takes up two disks per RAID set, not just two per aggregate. One data disk in an aggregate? That’ll be three disks. Six data disks? Eight disks total. Not to worry though; I believe the RAID set size has recently been extended to something silly like 20 disks.

These aggregate disks can be on any shelf in the filer, and there are a number of best practices and recommendations about how you spread the RAID sets across disks. As per usual, it just depends. It’s a bit like laptops: no one laptop is superior to another unless everyone wants to get the same thing out of them. The Mac you use for video editing may very well be amazing, but your nephew may not think it is the best for playing video games, and gran, who only looks at email and is just learning that there’s an internet out there, doesn’t need anything nearly as powerful as either, just a nice big high-contrast screen. But I digress. RAID sets can be striped across shelves or kept sequential on a shelf; it all depends on the I/O profiles, the data itself and how it’s used.

On top of our very large aggregates we can create volumes, which are do-it-all containers that can hold CIFS, NFS and LUNs all at once (I wouldn’t recommend that), and within our volumes we can create Qtrees. I come from a Wintel background and only have a basic understanding of Qtrees: a Qtree is a folder within a volume, a bit like a mount point, for storing different data types. Qtrees can be targets/containers for things like LUNs, SnapMirror or NDMP backups.
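A quick back-of-the-envelope Python sketch of that parity maths (illustrative only; the RAID group sizes are example values): every RAID-DP group in an aggregate gives up two disks to parity, so the overhead depends on how many groups your disk count splits into.

```python
import math

def raid_dp_layout(total_disks, raid_group_size=16):
    """Split an aggregate's disks into RAID-DP groups and count data vs parity disks.

    raid_group_size is the maximum disks per RAID group (data + 2 parity).
    """
    groups = math.ceil(total_disks / raid_group_size)
    parity = groups * 2                 # RAID-DP: two dedicated parity disks per group
    data = total_disks - parity
    return groups, data, parity

# The examples from the text: 1 data disk costs 3 disks, 6 data disks cost 8.
for disks in (3, 8, 16, 32):
    groups, data, parity = raid_dp_layout(disks)
    print(f"{disks} disks -> {groups} group(s): {data} data + {parity} parity")
```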

Protocols?

Filers are multiprotocol, so you can have NFS, CIFS, iSCSI and Fibre Channel on the same storage controller. Under the bonnet filers are BSD-based (FreeBSD), which helps with the ol’ NFS, which works as you would expect. CIFS runs as a service and registers a computer account in the domain for authentication; it will show up as a Windows 2000 server, although it is no doubt the only “Windows 2000” box in your AD that supports SMB2 (which seriously improves performance if your client operating systems support it). Shares and permissions can be managed from the filer, or, if you can’t let go of the GUI, connecting the Windows Computer Management MMC to the filer works as normal for permissions and session/open-file info. As LUNs are just files to a filer (on FlexVols), it doesn’t care what protocol it serves them over; Fibre Channel and iSCSI use initiator groups as you would expect, and it looks like you can add Fibre Channel and iSCSI initiators to the same LUN (I don’t know if it’s supported; it can’t be done via the GUI at present). All the protocols perform as you would expect, as long as you expect LUNs to be abstracted rather than true block-level storage.
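If initiator groups are new to you, here is a toy Python model of the mapping just described (purely conceptual, not an ONTAP API; the igroup names, WWPN and IQN below are made up): an igroup collects host identifiers, and a LUN, which to the filer is just a file path on a volume, gets mapped to one or more igroups at a LUN ID.

```python
# Toy model of igroups and LUN mapping - conceptual only, not an ONTAP API.
from dataclasses import dataclass, field

@dataclass
class InitiatorGroup:
    name: str
    protocol: str                                    # "fcp" or "iscsi"
    initiators: list = field(default_factory=list)   # WWPNs or IQNs

@dataclass
class Lun:
    path: str                                        # e.g. "/vol/vol1/lun0", just a file on a volume
    size_gb: int
    maps: dict = field(default_factory=dict)         # igroup name -> LUN ID

    def map_to(self, igroup: InitiatorGroup, lun_id: int):
        self.maps[igroup.name] = lun_id

# A hypothetical LUN presented to a Fibre Channel host and an iSCSI host at once.
fc_hosts = InitiatorGroup("esx_fc", "fcp", ["50:0a:09:81:86:a7:bc:01"])
iscsi_hosts = InitiatorGroup("esx_iscsi", "iscsi", ["iqn.1998-01.com.vmware:esx01"])

lun = Lun("/vol/vol1/lun0", 500)
lun.map_to(fc_hosts, 0)
lun.map_to(iscsi_hosts, 0)
print(lun.maps)
```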

Redundancy

A handy feature: when a filer believes a disk may fail, rather than waiting for the failure it will proactively clone the suspect disk’s data onto a spare and swap it out, before giving the old disk a good interrogation and deciding whether it is actually bad. With this feature I’d say degraded RAID sets are almost a thing of the past. The controllers in a filer are active/active, so you can carve up your shelves and share them between the controllers, spreading the load (personally, I’ve always wondered about the benefit of an already loaded controller acting as the hot standby and trying to assume the identity of both in a failover situation). Nearly all the hardware is N+1, and I hear even the IOMs on the disk shelves themselves will be active/active soon. Ethernet interfaces can be configured in interface groups for failover and load balancing.
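Just to illustrate the proactive-copy idea (a toy Python sketch with a made-up error threshold, not Data ONTAP’s actual logic): copy the suspect disk to a spare while it is still readable, swap the spare in, and only then interrogate the old disk, so the RAID group never has to run degraded or rebuild from parity.

```python
# Toy illustration of proactive disk replacement - not Data ONTAP's actual logic.
ERROR_THRESHOLD = 5   # hypothetical media-error count that marks a disk as suspect

class Disk:
    def __init__(self, disk_id):
        self.disk_id = disk_id
        self.errors = 0
        self.blocks = {}          # pretend contents

def check_raid_group(members, spares):
    """If any member looks like it is about to fail, copy it to a spare first."""
    for disk in list(members):
        if disk.errors >= ERROR_THRESHOLD and spares:
            spare = spares.pop()
            spare.blocks = dict(disk.blocks)       # disk-to-disk copy while still readable
            members[members.index(disk)] = spare   # swap the spare in; no degraded rebuild
            print(f"{disk.disk_id}: copied to {spare.disk_id}, sent for interrogation")

members = [Disk(f"0a.{i}") for i in range(14)]     # a 14-disk RAID-DP group
spares = [Disk("0a.20")]
members[3].errors = 7                              # one disk starts throwing media errors
check_raid_group(members, spares)
```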

Space utilisation

This one is a biggy for me; I start sweating when asking for money for more shelves. I can see why it happens, but filers eat disks. Let me illustrate my viewpoint for argument’s sake: say you have a 6-shelf, 2-controller filer with 1TB SATA disks and you want to spread everything out evenly. 24 disks to a shelf means you have 144TB total, right? Nope; let’s think rationally. We are going to lose disks to a number of factors:

  1. Root aggregates and volumes for the controllers (they need to put their configs and logs somewhere, right?). Each root aggregate only needs one data disk, but given RAID-DP that’s three disks per controller, so 6 disks gone bye-bye already; acceptable, as it means we essentially have double backups of our critical root volumes (it is possible to have your root volume share space with your normal aggregates, but it’s not really recommended, as it’s a big headache when things go wrong).
  2. Hot spares; we now have 69 disks available per controller, not too bad. Let’s take a disk per shelf for spares, as we are really paranoid about Steve down in accounts, who says the NetApp cabinet is in the perfect place for datacentre football goalposts. 66 disks per controller now.
  3. RAID; we can pretty much say that’s 66 disks for our data per controller, or is it? Let’s RAID these bad boys up: to keep the RAID sets even we can split them into 5 sets of 13 per controller, with the 66th disk joining the spares. That’s 65TB of raw disk, but only 55 data disks once the 10 parity disks are taken out, easy to work out with NetApp (you’d lose similar raw space with other RAID types too). On top of that, a NetApp “1TB” disk right-sizes to roughly 828.6GB usable, which brings you down to about 44.5TB per controller (everybody does this, not just NetApp, but the usable capacity per disk varies between vendors).
  4. Snapshots; now, let’s also assume that all of your data is CIFS/NFS and, after analysing your rate of change, you need 10% of each volume reserved for snapshots to meet your recovery objectives. And let’s assume you don’t know about aggregate snapshots, which are reserving another 5%. That’s 15%; an acceptable loss.
  5. File system; on top of this we need to remember that WAFL is a file system and needs around 15% of your data space to just plain not be used in order to keep performance high (I’m being conservative here; I recently had a NetApp PS engineer tell me it should be 20%).

For those of you following along at home, congratulations on realising that 30% of your usable space got written off before your data even got onto the filer (we can forgive the snapshot space, everybody loves snapshots). So take 30% away from your 44.5TB and that’s 31.15TB from 72 1TB disks per controller, quite frightening if you ask me. But this is a worst-case scenario: snapshot reserves can be 0% if you want, and with different RAID set sizes you can have fewer parity disks. This isn’t a hard and fast rule, just an example. I wholeheartedly await someone working out how much space you can squeeze (pre-deduplication) out of 72 1TB disks across two controllers as a best-case estimate. Did I mention deduplication? NetApp throw this in right out of the box and, depending on your data, you can get some scarily good reductions using it. Obviously sequential reads may suffer, but that’s the price you pay for storage efficiency. I have personally seen an average 30% saving using dedupe on CIFS volumes, so there’s a possibility of reclaiming some or all of that lost space.
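Here is the same worked example as a short Python script, so you can plug in your own shelf counts and reserves (the right-sized disk capacity, the 5 x 13 RAID group layout and the reserve percentages are the assumptions used above, not universal constants):

```python
# Worst-case usable space for one controller of the 6-shelf example above.
SHELVES = 6
DISKS_PER_SHELF = 24
RIGHT_SIZED_GB = 828.6        # usable GB of a "1TB" SATA disk after right-sizing

disks = SHELVES * DISKS_PER_SHELF // 2       # 72 disks per controller
disks -= 3                                   # root aggregate: 1 data + 2 RAID-DP parity
disks -= 3                                   # one hot spare per shelf (per controller)

raid_groups, group_size = 5, 13              # 5 x 13 RAID groups; the leftover disk joins the spares
assert raid_groups * group_size <= disks
data_disks = raid_groups * (group_size - 2)  # RAID-DP: 2 parity disks per group -> 55

raw_tb = data_disks * RIGHT_SIZED_GB / 1024       # ~44.5 TB
usable_tb = raw_tb * (1 - 0.10 - 0.05 - 0.15)     # volume snapshots, aggregate snapshots, WAFL headroom

print(f"data disks: {data_disks}, raw: {raw_tb:.2f} TB, usable: {usable_tb:.2f} TB")
# -> data disks: 55, raw: 44.50 TB, usable: 31.15 TB
```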

Some of the nice features

Right, I’ve been half bashing, half praising NetApp throughout this post, and I think I’ll settle on some positive points (no doubt I’ll miss one or two; let me know):

  • Snapshots: the usual, keep files after they are deleted or changed
  • SnapMirror: after dumping a full volume (i.e. a baseline snapshot) to a destination filer, move only the more recent snapshots across and use them to update the destination volume, keeping the two in sync
  • Synchronous SnapMirror: entirely different to SnapMirror; imagine two filers running in lockstep, writing data only once both have it in cache
  • Compression: does what it says on the tin; can be licensed for volumes (on top of dedupe) or used for free in SnapMirror transfers (27:1 compression, anyone?)
  • Multi-protocol: get everyone on your filer, no matter what protocol they use
  • Vfilers: (I didn’t talk about this) create multiple identities on a filer, assign them different IP addresses and volumes and use them as separate filers, separate traffic, replace aging file servers with a vfiler of the same name, go wild!
  • Aggregates: want a file server that throws out the backup nightmare of 12TB of CIFS data in a single volume? NetApp can do it
  • Flexvols: expand and shrink volumes on the fly
  • Qtrees: desperately want a giant 12TB volume split into manageable chunks? Create Qtrees in the volume (they show up as folders over CIFS) and back up the Qtrees instead
  • NDMP: If you still use tape, get your backups in a snapshot leveraged NetApp only format
  • Backup to tape: throw those same NDMP backups to tape directly from the filer (over fibre)
  • SnapVault: move snapshots away from your active volumes and keep them safe, or just keep them around long term
  • Connectivity: hasn’t really been discussed, but obviously a multiprotocol filer has to be able to use different media to distribute the bits and bytes: Fibre Channel, Ethernet, Ethernet over fibre, etc. Interfaces can be grouped for load balancing and failover
  • Snapdrive/Snapmanager for XXX: interesting tools for application/platform specific snapshots
  • Dedupe: for free, out of the box, block level

Well, if you actually managed to read through this in its entirety, well done. I have no doubt missed out many great features and caveats, and welcome your input and/or questions.