Friday, June 10, 2011

How to mount a Mac OS X HFS+ partition (rw) in Linux

SkyHi @ Friday, June 10, 2011
Update: Posted a follow-up article on mounting HFS+ dd images
The first time I tried to mount a Mac OS X partition in Linux, I ran into several issues. Here were the problems I had and the resolution.

HFS+ partition structure
Many computer forensic jobs involve Windows or Linux computers, which use the DOS partition structure. So the first time you come across an Apple partition, things look a bit different. The following are the partitions, as seen by mmls and parted, on an HFS+ USB drive:

mmls output
wintermute:/mnt# mmls /dev/sdc
MAC Partition Map
Offset Sector: 0
Units are in 512-byte sectors

     Slot    Start        End          Length       Description
00:  -----   0000000000   0000000000   0000000001   Unallocated
01:  00      0000000001   0000000063   0000000063   Apple_partition_map
02:  Meta    0000000001   0000000004   0000000004   Table
03:  01      0000000064   0000262207   0000262144   Apple_Free
04:  02      0000262208   0160086511   0159824304   Apple_HFS
05:  03      0160086512   0160086527   0000000016   Apple_Free

parted output
wintermute:/mnt# parted /dev/sdc
GNU Parted 1.7.1
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print

Disk /dev/sdc: 82.0GB
Sector size (logical/physical): 512B/512B
Partition Table: mac

Number  Start   End     Size    File system  Name                  Flags
1      0.51kB  32.8kB  32.3kB               Apple
3      134MB   82.0GB  81.8GB  hfs+         Apple_HFS_Untitled_1

HFS+ partition and journaling
To mount the HFS+ partition, you issue the following:
wintermute:/mnt# mount -t hfsplus /dev/sdc3 /mnt/sdc/
and while everything looks good from mount:
wintermute:/mnt# mount
/dev/sdc3 on /mnt/sdc type hfsplus
the journaled filesystem has actually caused some issues. You notice this if you try to write to the drive or if you look at dmesg:
sdc1: rw=0, want=262211, limit=63
hfs: unable to find HFS+ superblock
hfs: write access to a journaled filesystem is not supported, use the force option at your own risk, mounting read-only.
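For what it's worth, the kernel's hint can be followed literally: the hfsplus driver accepts a force option, so remounting read-write (at your own risk, exactly as the message warns) would look like this:

```
wintermute:/mnt# mount -t hfsplus -o force,rw /dev/sdc3 /mnt/sdc/
```

The safer route is to disable journaling from OS X first, as described below.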

Disabling journaling on HFS+ file system
If you need to write data to this drive (helpful as I move files between Linux, Windows and OS X forensic workstations), you can remove the journaling from the file system. This is not recommended for the system partition on a Mac but is fine for a data drive. Of course, understanding the benefits of a journaled file system for data recovery is important. Anyway, to remove the journal I issued the following command on my OS X workstation per Apple's support article:
neuromancer:/Volumes root# /usr/sbin/diskutil disableJournal /Volumes/mac-backup

On your Mac:

Open Disk Utility under Applications -> Utilities
Select the volume to disable journaling on.
Choose Disable Journaling from the File menu.

Write away
Placing the drive back in Linux and mounting with the same command as above resulted in a writable HFS+ partition on my Linux workstation:
wintermute:/mnt# mount
/dev/sdc3 on /mnt/sdc type hfsplus (rw)


Wednesday, June 8, 2011

Mounting external USB drive formatted NTFS to my FreeNAS server.

SkyHi @ Wednesday, June 08, 2011
This is a quick post on how to mount an external USB NTFS formatted hard drive (or memory stick) to FreeNAS. All that’s needed is to load the fuse driver, but it had me stumped for a bit until I found it in a forum post.
First things first, let’s plug in our drive and find out the device ID FreeNAS assigns it.
# dmesg
umass1: <ASMedia AS2105, class 0/0, rev 2.10/1.00, addr 3> on uhub4
da1 at umass-sim1 bus 1 target 0 lun 0
da1: <ST950042 0AS 0002> Fixed Direct Access SCSI-0 device 
da1: 40.000MB/s transfers
da1: 476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)

Now we create a mount point for the drive and try to mount it like we normally would.
# mkdir /mnt/usb
# mount -t ntfs /dev/da1s1 /mnt/usb/
You’ll receive the following error:
fuse: failed to open fuse device: No such file or directory
Now lets load the fuse driver and try again.
# kldload fuse
# mount -t ntfs /dev/da1s1 /mnt/usb/
# ls -l /mnt/usb
total 4589868
drwxrwxrwx  1 root  wheel        4096 Sep 25 17:48 $RECYCLE.BIN
drwxrwxrwx  1 root  wheel           0 Nov 27 14:25 .Trashes
-rwxrwxrwx  1 root  wheel        4096 Nov 22 18:15 ._.Trashes
It’s probably easier to load the driver on boot instead of loading it manually like I have above. I haven’t tried it, but it should work if you call it from loader.conf. I can’t reboot my NAS at the moment, but if someone wants to give it a shot and post to the comments, that would be great. You can follow the instructions on how to edit loader.conf in a previous post [1] of mine.
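For reference, the boot-time equivalent would presumably be a one-line addition to /boot/loader.conf (untested, per the caveat above; on recent FreeBSD releases the module is named fusefs, so the line may need adjusting):

```
fuse_load="YES"
```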


Tuesday, June 7, 2011

SQL Injection Attacks by Example

SkyHi @ Tuesday, June 07, 2011

SQL Injection Attacks by Example

[SQL Injection logo] A customer asked that we check out his intranet site, which was used by the company's employees and customers. This was part of a larger security review, and though we'd not actually used SQL injection to penetrate a network before, we were pretty familiar with the general concepts. We were completely successful in this engagement, and wanted to recount the steps taken as an illustration.
"SQL Injection" is a subset of the unverified/unsanitized user input vulnerability ("buffer overflows" are a different subset), and the idea is to convince the application to run SQL code that was not intended. If the application is creating SQL strings naively on the fly and then running them, it's straightforward to create some real surprises.
We'll note that this was a somewhat winding road with more than one wrong turn, and others with more experience will certainly have different -- and better -- approaches. But the fact that we were successful does suggest that we were not entirely misguided.
There have been other papers on SQL injection, including some that are much more detailed, but this one shows the rationale of discovery as much as the process of exploitation.

The Target Intranet

This appeared to be an entirely custom application, and we had no prior knowledge of the application nor access to the source code: this was a "blind" attack. A bit of poking showed that this server ran Microsoft's IIS 6 along with ASP.NET, and this suggested that the database was Microsoft's SQL server: we believe that these techniques can apply to nearly any web application backed by any SQL server.
The login page had a traditional username-and-password form, but also an email-me-my-password link; the latter proved to be the downfall of the whole system.
When entering an email address, the system presumably looked in the user database for that email address, and mailed something to that address. Since my email address was not found, it wasn't going to send me anything.
So the first test in any SQL-ish form is to enter a single quote as part of the data: the intention is to see if they construct an SQL string literally without sanitizing. When submitting the form with a quote in the email address, we get a 500 error (server failure), and this suggests that the "broken" input is actually being parsed literally. Bingo.
We speculate that the underlying SQL code looks something like this:
SELECT fieldlist
  FROM table
 WHERE field = '$EMAIL';
Here, $EMAIL is the address submitted on the form by the user, and the larger query provides the quotation marks that set it off as a literal string. We don't know the specific names of the fields or table involved, but we do know their nature, and we'll make some good guesses later.
When we enter just a single quote - note the unbalanced quote mark - this yields the constructed SQL:
SELECT fieldlist
  FROM table
 WHERE field = ''';
when this is executed, the SQL parser finds the extra quote mark and aborts with a syntax error. How this manifests itself to the user depends on the application's internal error-recovery procedures, but it's usually different from "email address is unknown". This error response is a dead giveaway that user input is not being sanitized properly and that the application is ripe for exploitation.
Since the data we're filling in appears to be in the WHERE clause, let's change the nature of that clause in an SQL legal way and see what happens. By entering anything' OR 'x'='x, the resulting SQL is:
SELECT fieldlist
  FROM table
 WHERE field = 'anything' OR 'x'='x';
Because the application is not really thinking about the query - merely constructing a string - our use of quotes has turned a single-component WHERE clause into a two-component one, and the 'x'='x' clause is guaranteed to be true no matter what the first clause is (there is a better approach for this "always true" part that we'll touch on later).
But unlike the "real" query, which should return only a single item each time, this version will essentially return every item in the members database. The only way to find out what the application will do in this circumstance is to try it. Doing so, we were greeted with:

Your login information has been mailed to
Our best guess is that it's the first record returned by the query, effectively an entry taken at random. This person really did get his forgotten-password link via email, which will probably come as a surprise to him and may raise warning flags somewhere.
We now know that we're able to manipulate the query to our own ends, though we still don't know much about the parts of it we cannot see. But we have observed three different responses to our various inputs:
  • "Your login information has been mailed to email"
  • "We don't recognize your email address"
  • Server error
The first two are responses to well-formed SQL, while the latter is for bad SQL: this distinction will be very useful when trying to guess the structure of the query.
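The effect of the always-true clause can be reproduced offline with an in-memory database - a sketch using Python's sqlite3, where the table, columns, and data are invented, and naive string interpolation stands in for the application's query construction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (email TEXT)")
conn.executemany("INSERT INTO members VALUES (?)",
                 [("alice@example.com",), ("bob@example.com",)])

user_input = "anything' OR 'x'='x"
# Naive construction, as the vulnerable application presumably does it:
query = "SELECT email FROM members WHERE email = '%s'" % user_input
rows = conn.execute(query).fetchall()
print(rows)   # both rows come back, not just one
```

The constructed WHERE clause matches every record, so a query meant to select one member selects them all.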

Schema field mapping

The first steps are to guess some field names: we're reasonably sure that the query includes "email address" and "password", and there may be things like "US Mail address" or "userid" or "phone number". We'd dearly love to perform a SHOW TABLE, but in addition to not knowing the name of the table, there is no obvious vehicle to get the output of this command routed to us.
So we'll do it in steps. In each case, we'll show the whole query as we know it, with our own snippets shown specially. We know that the tail end of the query is a comparison with the email address, so let's guess email as the name of the field:
SELECT fieldlist
  FROM table
 WHERE field = 'x' AND email IS NULL; --';
The intent is to use a proposed field name (email) in the constructed query and find out if the SQL is valid or not. We don't care about matching the email address (which is why we use a dummy 'x'), and the -- marks the start of an SQL comment. This is an effective way to "consume" the final quote provided by the application and not worry about matching it.
If we get a server error, it means our SQL is malformed and a syntax error was thrown: it's most likely due to a bad field name. If we get any kind of valid response, we guessed the name correctly. This is the case whether we get the "email unknown" or "password was sent" response.
Note, however, that we use the AND conjunction instead of OR: this is intentional. In the SQL schema mapping phase, we're not really concerned with guessing any particular email addresses, and we do not want random users inundated with "here is your password" emails from the application - this will surely raise suspicions to no good purpose. By using the AND conjunction with an email address that couldn't ever be valid, we're sure that the query will always return zero rows and never generate a password-reminder email.
Submitting the above snippet indeed gave us the "email address unknown" response, so now we know that the email address is stored in a field email. If this hadn't worked, we'd have tried email_address or mail or the like. This process will involve quite a lot of guessing.
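The guessing lends itself to a loop; a sketch that only builds the probe strings (the HTTP submission and the classification of the three responses are application-specific, so they are left out, and the candidate names are just illustrative guesses):

```python
# Candidate column names to probe one at a time; per the text, a server
# error means a bad guess, while either "email unknown" or "password
# was sent" means the column exists.
CANDIDATES = ["email", "email_address", "mail", "passwd", "password",
              "userid", "login_id", "full_name"]

def probe_payload(field):
    # Submitted as the "email address"; the application's own quotes
    # turn it into: WHERE field = 'x' AND <field> IS NULL; --'
    return "x' AND %s IS NULL; --" % field

payloads = [probe_payload(f) for f in CANDIDATES]
print(payloads[0])   # x' AND email IS NULL; --
```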
Next we'll guess some other obvious names: password, user ID, name, and the like. These are all done one at a time, and anything other than "server failure" means we guessed the name correctly.
SELECT fieldlist
  FROM table
 WHERE email = 'x' AND userid IS NULL; --';
As a result of this process, we found several valid field names:
  • email
  • passwd
  • login_id
  • full_name
There are certainly more (and a good source of clues is the names of the fields on forms), but a bit of digging did not discover any. But we still don't know the name of the table that these fields are found in - how to find out?

Finding the table name

The application's built-in query already has the table name built into it, but we don't know what that name is: there are several approaches for finding that (and other) table names. The one we took was to rely on a subselect.
A standalone query of
SELECT COUNT(*) FROM tabname
returns the number of records in that table, and of course fails if the table name is unknown. We can build this into our string to probe for the table name:
SELECT email, passwd, login_id, full_name
  FROM table
 WHERE email = 'x' AND 1=(SELECT COUNT(*) FROM tabname); --';
We don't care how many records are there, of course, only whether the table name is valid or not. By iterating over several guesses, we eventually determined that members was a valid table in the database. But is it the table used in this query? For that we need yet another test using table.field notation: it only works for tables that are actually part of this query, not merely that the table exists.
SELECT email, passwd, login_id, full_name
  FROM members
 WHERE email = 'x' AND members.email IS NULL; --';
When this returned "Email unknown", it confirmed that our SQL was well formed and that we had properly guessed the table name. This will be important later, but we instead took a different approach in the interim.

Finding some users

At this point we have a partial idea of the structure of the members table, but we only know of one username: the random member who got our initial "Here is your password" email. Recall that we never received the message itself, only the address it was sent to. We'd like to get some more names to work with, preferably those likely to have access to more data.
The first place to start, of course, is the company's website to find who is who: the "About us" or "Contact" pages often list who's running the place. Many of these contain email addresses, but even those that don't list them can give us some clues which allow us to find them with our tool.
The idea is to submit a query that uses the LIKE clause, allowing us to do partial matches of names or email addresses in the database, each time triggering the "We sent your password" message and email. Warning: though this reveals an email address each time we run it, it also actually sends that email, which may raise suspicions. This suggests that we take it easy.
We can do the query on email name or full name (or presumably other information), each time putting in the % wildcards that LIKE supports:
SELECT email, passwd, login_id, full_name
  FROM members
 WHERE email = 'x' OR full_name LIKE '%Bob%';
Keep in mind that even though there may be more than one "Bob", we only get to see one of them: this suggests refining our LIKE clause narrowly.
Ultimately, we may only need one valid email address to leverage our way in.

Brute-force password guessing

One can certainly attempt brute-force guessing of passwords at the main login page, but many systems make an effort to detect or even prevent this. There could be logfiles, account lockouts, or other devices that would substantially impede our efforts, but because of the non-sanitized inputs, we have another avenue that is much less likely to be so protected.
We'll instead do actual password testing in our snippet by including the email name and password directly. In our example, we'll use our victim, and try multiple passwords.
SELECT email, passwd, login_id, full_name
  FROM members
 WHERE email = '' AND passwd = 'hello123';
This is clearly well-formed SQL, so we don't expect to see any server errors, and we'll know we found the password when we receive the "your password has been mailed to you" message. Our mark has now been tipped off, but we do have his password.
This procedure can be automated with scripting in perl, and though we were in the process of creating this script, we ended up going down another road before actually trying it.
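The script alluded to above (ours was to be in Perl; a Python sketch is shown here, with a placeholder victim address) would just iterate a wordlist, building one probe per guess:

```python
def password_probe(email, guess):
    # Submitted as the "email address"; the application's quotes turn
    # it into: WHERE email = '<email>' AND passwd = '<guess>'; --'
    return "%s' AND passwd = '%s'; --" % (email, guess)

wordlist = ["hello123", "password", "letmein"]
probes = [password_probe("victim@example.com", w) for w in wordlist]
print(probes[0])   # victim@example.com' AND passwd = 'hello123'; --
```

A "your password has been mailed to you" response to any one probe would mean the guess matched.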

The database isn't readonly

So far, we have done nothing but query the database, and even though a SELECT is readonly, that doesn't mean that SQL is. SQL uses the semicolon for statement termination, and if the input is not sanitized properly, there may be nothing that prevents us from stringing our own unrelated command at the end of the query.
The most drastic example is:
SELECT email, passwd, login_id, full_name
  FROM members
 WHERE email = 'x'; DROP TABLE members; --';  -- Boom!
The first part provides a dummy email address -- 'x' -- and we don't care what this query returns: we're just getting it out of the way so we can introduce an unrelated SQL command. This one attempts to drop (delete) the entire members table, which really doesn't seem too sporting.
This shows that not only can we run separate SQL commands, but we can also modify the database. This is promising.
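Whether stacked statements actually run depends on the database interface: Python's sqlite3, for instance, refuses them in execute(), but a contrived application that feeds concatenated input to executescript() loses the table (invented schema, for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (email TEXT)")

user_input = "x'; DROP TABLE members; --"
script = "SELECT email FROM members WHERE email = '%s';" % user_input

# executescript() runs every statement in the string, including the
# injected DROP; the trailing --'; is swallowed as an SQL comment.
conn.executescript(script)

remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE name = 'members'"
).fetchall()
print(remaining)   # [] - the members table is gone
```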

Adding a new member

Given that we know the partial structure of the members table, it seems like a plausible approach to attempt adding a new record to that table: if this works, we'll simply be able to login directly with our newly-inserted credentials.
This, not surprisingly, takes a bit more SQL, and we've wrapped it over several lines for ease of presentation, but our part is still one contiguous string:
SELECT email, passwd, login_id, full_name
  FROM members
 WHERE email = 'x';
        INSERT INTO members ('email','passwd','login_id','full_name') 
        VALUES ('','hello','steve','Steve Friedl');--';
Even if we have actually gotten our field and table names right, several things could get in our way of a successful attack:
  1. We might not have enough room in the web form to enter this much text directly (though this can be worked around via scripting, it's much less convenient).
  2. The web application user might not have INSERT permission on the members table.
  3. There are undoubtedly other fields in the members table, and some may require initial values, causing the INSERT to fail.
  4. Even if we manage to insert a new record, the application itself might not behave well due to the auto-inserted NULL fields that we didn't provide values for.
  5. A valid "member" might require not only a record in the members table, but associated information in other tables (say, "accessrights"), so adding to one table alone might not be sufficient.
In the case at hand, we hit a roadblock on either #4 or #5 - we can't really be sure -- because when going to the main login page and entering in the above username + password, a server error was returned. This suggests that fields we did not populate were vital, but nevertheless not handled properly.
A possible approach here is attempting to guess the other fields, but this promises to be a long and laborious process: though we may be able to guess other "obvious" fields, it's very hard to imagine the bigger-picture organization of this application.
We ended up going down a different road.

Mail me a password

We then realized that though we are not able to add a new record to the members database, we can modify an existing one, and this proved to be the approach that gained us entry.
From a previous step, we knew of a member who had an account on the system, and we used our SQL injection to update his database record with our email address:
SELECT email, passwd, login_id, full_name
  FROM members
 WHERE email = 'x';
      UPDATE members
      SET email = ''
      WHERE email = '';
After running this, we of course received the "we don't recognize your email address" response, but this was expected due to the dummy email address provided. The UPDATE wouldn't have registered with the application, so it executed quietly.
We then used the regular "I lost my password" link - with the updated email address - and a minute later received this email:
It was now just a matter of following the standard login process to access the system as a high-ranked MIS staffer, and this was far superior to the perhaps-limited user that we might have created with our INSERT approach.
We found the intranet site to be quite comprehensive, and it included - among other things - a list of all the users. It's a fair bet that many Intranet sites also have accounts on the corporate Windows network, and perhaps some of them have used the same password in both places. Since it's clear that we have an easy way to retrieve any Intranet password, and since we had located an open PPTP VPN port on the corporate firewall, it should be straightforward to attempt this kind of access.
We had done a spot check on a few accounts without success, and we can't really know whether it's "bad password" or "the Intranet account name differs from the Windows account name". But we think that automated tools could make some of this easier.

Other Approaches

In this particular engagement, we obtained enough access that we did not feel the need to do much more, but other steps could have been taken. We'll touch on the ones that we can think of now, though we are quite certain that this is not comprehensive.
We are also aware that not all approaches work with all databases, and we can touch on some of them here.
Use xp_cmdshell
Microsoft's SQL Server supports a stored procedure xp_cmdshell that permits what amounts to arbitrary command execution, and if this is permitted to the web user, complete compromise of the webserver is inevitable.
What we had done so far was limited to the web application and the underlying database, but if we can run commands, the webserver itself cannot help but be compromised. Access to xp_cmdshell is usually limited to administrative accounts, but it's possible to grant it to lesser users.
Map out more database structure
Though this particular application provided such a rich post-login environment that it didn't really seem necessary to dig further, in other more limited environments this may not have been sufficient.
Being able to systematically map out the available schema, including tables and their field structure, can't help but provide more avenues for compromise of the application.
One could probably gather more hints about the structure from other aspects of the website (e.g., is there a "leave a comment" page? Are there "support forums"?). Clearly, this is highly dependent on the application and it relies very much on making good guesses.


We believe that web application developers often simply do not think about "surprise inputs", but security people do (including the bad guys), so there are several broad approaches that can be applied here.
Sanitize the input
It's absolutely vital to sanitize user inputs to ensure that they do not contain dangerous code, whether to the SQL server or to HTML itself. One's first idea is to strip out "bad stuff", such as quotes or semicolons or escapes, but this is a misguided attempt. Though it's easy to point out some dangerous characters, it's harder to point to all of them.
The language of the web is full of special characters and strange markup (including alternate ways of representing the same characters), and efforts to authoritatively identify all "bad stuff" are unlikely to be successful.
Instead, rather than "remove known bad data", it's better to "remove everything but known good data": this distinction is crucial. Since - in our example - an email address can contain only a limited set of characters (letters, digits, and a handful of punctuation marks such as @ . _ - +), there is really no benefit in allowing characters that could not be valid, and rejecting them early - presumably with an error message - not only helps forestall SQL Injection, but also catches mere typos early rather than storing them in the database.
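A sketch of the whitelist idea in Python - the allowed character class here is an illustrative policy choice, not a full RFC 2822 validator (see the sidebar below):

```python
import re

# Accept only characters from a conservative whitelist; quotes,
# semicolons, and spaces never get near the SQL. Note the hyphen is
# allowed because it legitimately occurs in email addresses.
ALLOWED = re.compile(r"^[A-Za-z0-9@._+-]+$")

def looks_like_safe_email(s):
    return bool(ALLOWED.match(s))

print(looks_like_safe_email("bob@example.com"))   # True
print(looks_like_safe_email("x' OR 'x'='x"))      # False
```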
Sidebar on email addresses
It's important to note here that email addresses in particular are troublesome to validate programmatically, because everybody seems to have his own idea about what makes one "valid", and it's a shame to exclude a good email address because it contains a character you didn't think about. The only real authority is RFC 2822 (which encompasses the more familiar RFC822), and it includes a fairly expansive definition of what's allowed. The truly pedantic may well wish to accept email addresses with ampersands and asterisks (among other things) as valid, but others - including this author - are satisfied with a reasonable subset that includes "most" email addresses. Those taking a more restrictive approach ought to be fully aware of the consequences of excluding these addresses, especially considering that better techniques (prepare/execute, stored procedures) obviate the security concerns which those "odd" characters present.
Be aware that "sanitizing the input" doesn't mean merely "remove the quotes", because even "regular" characters can be troublesome. In an example where an integer ID value is being compared against the user input (say, a numeric PIN):
SELECT fieldlist
  FROM table
 WHERE id = 23 OR 1=1;  -- Boom! Always matches!
In practice, however, this approach is highly limited because there are so few fields for which it's possible to outright exclude many of the dangerous characters. For "dates" or "email addresses" or "integers" it may have merit, but for any kind of real application, one simply cannot avoid the other mitigations.
Escape/Quotesafe the input
Even if one might be able to sanitize a phone number or email address, one cannot take this approach with a "name" field lest one wishes to exclude the likes of Bill O'Reilly from one's application: a quote is simply a valid character for this field.
One includes an actual single quote in an SQL string by putting two of them together, so this suggests the obvious - but wrong! - technique of preprocessing every string to replicate the single quotes:
SELECT fieldlist
  FROM customers
 WHERE name = 'Bill O''Reilly';  -- works OK
However, this naïve approach can be beaten because most databases support other string escape mechanisms. MySQL, for instance, also permits \' to escape a quote, so after input of \'; DROP TABLE users; -- is "protected" by doubling the quotes, we get:
SELECT fieldlist
  FROM customers
 WHERE name = '\''; DROP TABLE users; --';  -- Boom!
The expression '\'' is a complete string (containing just one single quote), and the usual SQL shenanigans follow. It doesn't stop with backslashes either: there is Unicode, other encodings, and parsing oddities all hiding in the weeds to trip up the application designer.
Getting quotes right is notoriously difficult, which is why many database interface languages provide a function that does it for you. When the same internal code is used for "string quoting" and "string parsing", it's much more likely that the process will be done properly and safely.
Some examples are the MySQL function mysql_real_escape_string() and the Perl DBI method $dbh->quote($value).
These methods must be used.
Use bound parameters (the PREPARE statement)
Though quotesafing is a good mechanism, we're still in the area of "considering user input as SQL", and a much better approach exists: bound parameters, which are supported by essentially all database programming interfaces. In this technique, an SQL statement string is created with placeholders - a question mark for each parameter - and it's compiled ("prepared", in SQL parlance) into an internal form.
Later, this prepared query is "executed" with a list of parameters:
Example in perl
$sth = $dbh->prepare("SELECT email, userid FROM members WHERE email = ?;");
$sth->execute($email);

Thanks to Stefan Wagner, this demonstrates bound parameters in Java:
Insecure version
Statement s = connection.createStatement();
ResultSet rs = s.executeQuery("SELECT email FROM member WHERE name = "
                             + formField); // *boom*
Secure version
PreparedStatement ps = connection.prepareStatement(
    "SELECT email FROM member WHERE name = ?");
ps.setString(1, formField);
ResultSet rs = ps.executeQuery();
Here, $email is the data obtained from the user's form, and it is passed as positional parameter #1 (the first question mark); at no point do the contents of this variable have anything to do with SQL statement parsing. Quotes, semicolons, backslashes, SQL comment notation - none of this has any impact, because it's "just data". There simply is nothing to subvert, so the application is largely immune to SQL injection attacks.
There also may be some performance benefits if this prepared query is reused multiple times (it only has to be parsed once), but this is minor compared to the enormous security benefits. This is probably the single most important step one can take to secure a web application.
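The same idea in Python's sqlite3 (a self-contained sketch with an in-memory database and invented data) shows the hostile input binding as plain data and matching nothing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (email TEXT, passwd TEXT)")
conn.execute("INSERT INTO members VALUES ('bob@example.com', 'secret')")

hostile = "anything' OR 'x'='x"

# The ? placeholder binds the value after the statement is parsed: the
# quotes inside the input are just characters in a string, not syntax.
rows = conn.execute(
    "SELECT email FROM members WHERE email = ?", (hostile,)
).fetchall()
print(rows)   # [] - no match, the injection is inert
```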
Limit database permissions and segregate users
In the case at hand, we observed just two interactions that are made not in the context of a logged-in user: "log in" and "send me password". The web application ought to use a database connection with the most limited rights possible: query-only access to the members table, and no access to any other table.
The effect here is that even a "successful" SQL injection attack is going to have much more limited success. Here, we'd not have been able to do the UPDATE request that ultimately granted us access, so we'd have had to resort to other avenues.
Once the web application determined that a set of valid credentials had been passed via the login form, it would then switch that session to a database connection with more rights.
It should go almost without saying that sa rights should never be used for any web-based application.
Use stored procedures for database access
When the database server supports them, use stored procedures for performing access on the application's behalf, which can eliminate SQL entirely (assuming the stored procedures themselves are written properly).
By encapsulating the rules for a certain action - query, update, delete, etc. - into a single procedure, it can be tested and documented on a standalone basis and business rules enforced (for instance, the "add new order" procedure might reject that order if the customer were over his credit limit).
For simple queries this might be only a minor benefit, but as the operations become more complicated (or are used in more than one place), having a single definition for the operation means it's going to be more robust and easier to maintain.
Note: it's always possible to write a stored procedure that itself constructs a query dynamically: this provides no protection against SQL Injection - it's only proper binding with prepare/execute or direct SQL statements with bound variables that provide this protection.
Isolate the webserver
Even having taken all these mitigation steps, it's nevertheless still possible to miss something and leave the server open to compromise. One ought to design the network infrastructure to assume that the bad guy will have full administrator access to the machine, and then attempt to limit how that can be leveraged to compromise other things.
For instance, putting the machine in a DMZ with extremely limited pinholes "inside" the network means that even getting complete control of the webserver doesn't automatically grant full access to everything else. This won't stop everything, of course, but it makes it a lot harder.
Configure error reporting
The default error reporting for some frameworks includes developer debugging information, and this must not be shown to outside users. Imagine how much easier a time it makes for an attacker if the full query is shown, pointing to the syntax error involved.
This information is useful to developers, but it should be restricted - if possible - to just internal users.
Note that not all databases are configured the same way, and not all even support the same dialect of SQL (the "S" stands for "Structured", not "Standard"). For instance, most versions of MySQL do not support subselects, nor do they usually allow multiple statements: these are substantially complicating factors when attempting to penetrate a network.

We'd like to emphasize that though we chose the "Forgotten password" link to attack in this particular case, it wasn't really because this particular web application feature is dangerous. It was simply one of several available features that might have been vulnerable, and it would be a mistake to focus on the "Forgotten password" aspect of the presentation.
This Tech Tip is not intended to provide comprehensive coverage of SQL injection, or even a tutorial: it merely documents the process that evolved over several hours during a contracted engagement. We've seen other papers on SQL injection discuss the technical background, but still only provide the "money shot" that ultimately gained them access.
But that final statement required background knowledge to pull off, and the process of gathering that information has merit too. One doesn't always have access to source code for an application, and the ability to attack a custom application blindly has some value.
Thanks to David Litchfield and Randal Schwartz for their technical input to this paper, and to the great Chris Mospaw for graphic design (© 2005 by Chris Mospaw, used with permission).


Last modified: Wed Oct 10 06:28:06 PDT 2007


Serving static files: a comparison between Apache, Nginx, Varnish and G-WAN

SkyHi @ Tuesday, June 07, 2011
Update 1 (Mar 16, 2011): Apache MPM-Event benchmark added
Update 2 (Mar 16, 2011): Second run of Varnish benchmark added
Update 3 (Mar 16, 2011): Cherokee benchmark added
Update 4 (Mar 25, 2011): New benchmark with the optimized settings is available


Apache is the de facto web server on Unix systems. Nginx is nowadays a popular and performant web server for serving static files (i.e. static HTML pages, CSS files, JavaScript files, pictures, …). On the other hand, Varnish Cache is increasingly used to make websites “fly” by caching static content in memory. Recently, I came across a new application server called G-WAN. I’m only interested here in serving static content, even though G-WAN is also able to serve dynamic content using ANSI C scripting. Finally, I also included Cherokee in the benchmark.


The following versions of the software are used for this benchmark:
  • Apache MPM-worker: 2.2.16-1ubuntu3.1 (64 bit)
  • Apache MPM-event: 2.2.16-1ubuntu3.1 (64 bit)
  • Nginx: 0.7.67-3ubuntu1 (64 bit)
  • Varnish:  2.1.3-7ubuntu0.1 (64 bit)
  • G-WAN: 2.1.20 (32 bit)
  • Cherokee: 1.2.1-1~maverick~ppa1 (64 bit)
All tests are performed on an ASUS U30JC (Intel Core i3 – 370M @ 2.4 Ghz, Hard drive 5400 rpm, Memory: 4GB DDR3 1066MHz) running Ubuntu 10.10 64 bit (kernel 2.6.35).
Benchmark setup
  • HTTP Keep-Alives: enabled
  • TCP/IP settings: OS default
  • Server settings: default
  • Concurrency: from 0 to 1’000, step 10
  • Requests: 1’000’000
The following file of 100 bytes is used as static content: /var/www/100.html
Doing a correct benchmark is clearly not an easy task. There are many walls (TCP/IP stack, OS settings, the client, …) that may corrupt the results, and there is always the risk of comparing apples with oranges (e.g. benchmarking the TCP/IP stack instead of the server itself).
In this benchmark, every server is tested using its default settings. The same applies to the OS. Of course, in a production environment, each setting would be optimized. This has been done in a second benchmark. If you have comments, improvements, or ideas, please feel free to contact me; I’m always open to improving myself and learning new things.


The client (available here) relies on ApacheBench (ab). Both the client and the tested web server are hosted on the same computer.
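The client script itself isn't reproduced here, but a single ApacheBench run at one concurrency level looks roughly like this (the URL and the concurrency value are illustrative; the client sweeps concurrency from 0 to 1'000 in steps of 10):

```shell
# 1,000,000 keep-alive requests for the 100-byte file at concurrency 100
ab -k -n 1000000 -c 100 http://localhost/100.html
```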

Apache (MPM-worker)


Relevant part of file /etc/apache2/apache2.conf
StartServers          2
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadLimit          64
    ThreadsPerChild      25
    MaxClients          150
    MaxRequestsPerChild   0

Benchmark results

The benchmark took 1174 seconds in total.

Apache (MPM-event)


Relevant part of file /etc/apache2/apache2.conf
StartServers          2
    MaxClients          150
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadLimit          64
    ThreadsPerChild      25
    MaxRequestsPerChild   0

Benchmark results

The benchmark took 1904 seconds in total.



Nginx

File /etc/nginx/nginx.conf
user www-data;
worker_processes  1;
error_log  /var/log/nginx/error.log;
pid        /var/run/;
events {
    worker_connections  1024;
    # multi_accept on;
}
http {
    include       /etc/nginx/mime.types;
    access_log  /var/log/nginx/access.log;
    sendfile        on;
    #tcp_nopush     on;
    #keepalive_timeout  0;
    keepalive_timeout  65;
    tcp_nodelay        on;
    gzip  on;
    gzip_disable "MSIE [1-6]\.(?!.*SV1)";
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
File /etc/nginx/sites-enabled/default
server {
        listen   80; ## listen for ipv4
        server_name  localhost;
        access_log  /var/log/nginx/localhost.access.log;
        location / {
                root   /var/www;
                index  index.html index.htm;
        }
}

Benchmark results

The benchmark took 1048 seconds in total.


Varnish

Varnish uses Nginx as its backend. However, only one request every 2 minutes hits Nginx; all other requests are served directly by Varnish.


File /etc/varnish/default.vcl
backend default {
   .host = "";
   .port = "80";
}
File /etc/default/varnish
INSTANCE=$(uname -n)
DAEMON_OPTS="-a :6081 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,1G"
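A quick way to confirm that objects are actually being served from the cache (assuming the default listen port 6081 configured above) is to look at the Age header Varnish adds to responses:

```shell
# The Age header grows between hits on the same URL,
# showing the object is coming from Varnish's cache, not the backend.
curl -sI http://localhost:6081/100.html | grep -i '^Age:'
```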

Benchmark results

Run: 1
The benchmark took 1297 seconds in total.

Run: 2
The benchmark took 1313 seconds in total.

As some people requested more details regarding the benchmark of Varnish, here is the output of varnishstat -1:
client_conn            504664       281.31 Client connections accepted
client_drop                 0         0.00 Connection dropped, no sess/wrk
client_req           20245482     11285.11 Client requests received
cache_hit            20245471     11285.10 Cache hits
cache_hitpass               0         0.00 Cache hits for pass
cache_miss                 11         0.01 Cache misses
backend_conn               11         0.01 Backend conn. success
backend_unhealthy            0         0.00 Backend conn. not attempted
backend_busy                0         0.00 Backend conn. too many
backend_fail                0         0.00 Backend conn. failures
backend_reuse               0         0.00 Backend conn. reuses
backend_toolate            10         0.01 Backend conn. was closed
backend_recycle            11         0.01 Backend conn. recycles
backend_unused              0         0.00 Backend conn. unused
fetch_head                  0         0.00 Fetch head
fetch_length                0         0.00 Fetch with Length
fetch_chunked              11         0.01 Fetch chunked
fetch_eof                   0         0.00 Fetch EOF
fetch_bad                   0         0.00 Fetch had bad headers
fetch_close                 0         0.00 Fetch wanted close
fetch_oldhttp               0         0.00 Fetch pre HTTP/1.1 closed
fetch_zero                  0         0.00 Fetch zero len
fetch_failed                0         0.00 Fetch failed
n_sess_mem               2963          .   N struct sess_mem
n_sess                   1980          .   N struct sess
n_object                    0          .   N struct object
n_vampireobject             0          .   N unresurrected objects
n_objectcore              393          .   N struct objectcore
n_objecthead              393          .   N struct objecthead
n_smf                       2          .   N struct smf
n_smf_frag                  0          .   N small free smf
n_smf_large                 2          .   N large free smf
n_vbe_conn                  1          .   N struct vbe_conn
n_wrk                     396          .   N worker threads
n_wrk_create              500         0.28 N worker threads created
n_wrk_failed                0         0.00 N worker threads not created
n_wrk_max              118979        66.32 N worker threads limited
n_wrk_queue                 0         0.00 N queued work requests
n_wrk_overflow         133755        74.56 N overflowed work requests
n_wrk_drop                  0         0.00 N dropped work requests
n_backend                   1          .   N backends
n_expired                  11          .   N expired objects
n_lru_nuked                 0          .   N LRU nuked objects
n_lru_saved                 0          .   N LRU saved objects
n_lru_moved               557          .   N LRU moved objects
n_deathrow                  0          .   N objects on deathrow
losthdr                  7470         4.16 HTTP header overflows
n_objsendfile               0         0.00 Objects sent with sendfile
n_objwrite           20215571     11268.43 Objects sent with write
n_objoverflow               0         0.00 Objects overflowing workspace
s_sess                 504664       281.31 Total Sessions
s_req                20245482     11285.11 Total Requests
s_pipe                      0         0.00 Total pipe
s_pass                      0         0.00 Total pass
s_fetch                    11         0.01 Total fetch
s_hdrbytes         5913383706   3296200.51 Total header bytes
s_bodybytes         526382532    293412.78 Total body bytes
sess_closed            382711       213.33 Session Closed
sess_pipeline               0         0.00 Session Pipeline
sess_readahead              0         0.00 Session Read Ahead
sess_linger          20245482     11285.11 Session Linger
sess_herd              124222        69.24 Session herd
shm_records         689986796    384608.02 SHM records
shm_writes           21885539     12199.30 SHM writes
shm_flushes                 0         0.00 SHM flushes due to overflow
shm_cont               282730       157.60 SHM MTX contention
shm_cycles                200         0.11 SHM cycles through buffer
sm_nreq                    22         0.01 allocator requests
sm_nobj                     0          .   outstanding allocations
sm_balloc                   0          .   bytes allocated
sm_bfree           1073741824          .   bytes free
sma_nreq                    0         0.00 SMA allocator requests
sma_nobj                    0          .   SMA outstanding allocations
sma_nbytes                  0          .   SMA outstanding bytes
sma_balloc                  0          .   SMA bytes allocated
sma_bfree                   0          .   SMA bytes free
sms_nreq                    0         0.00 SMS allocator requests
sms_nobj                    0          .   SMS outstanding allocations
sms_nbytes                  0          .   SMS outstanding bytes
sms_balloc                  0          .   SMS bytes allocated
sms_bfree                   0          .   SMS bytes freed
backend_req                11         0.01 Backend requests made
n_vcl                       1         0.00 N vcl total
n_vcl_avail                 1         0.00 N vcl available
n_vcl_discard               0         0.00 N vcl discarded
n_purge                     1          .   N total active purges
n_purge_add                 1         0.00 N new purges added
n_purge_retire              0         0.00 N old purges deleted
n_purge_obj_test            0         0.00 N objects tested
n_purge_re_test             0         0.00 N regexps tested against
n_purge_dups                0         0.00 N duplicate purges removed
hcb_nolock           20219699     11270.74 HCB Lookups without lock
hcb_lock                    1         0.00 HCB Lookups with lock
hcb_insert                  1         0.00 HCB Inserts
esi_parse                   0         0.00 Objects ESI parsed (unlock)
esi_errors                  0         0.00 ESI parse errors (unlock)
accept_fail                 0         0.00 Accept failures
client_drop_late            0         0.00 Connection dropped late
uptime                   1794         1.00 Client uptime



G-WAN

The configuration of G-WAN is done through the file hierarchy. Therefore, unzipping the G-WAN archive was enough to have a fully working server.

Benchmark results

The benchmark took 607 seconds in total.



Cherokee

Relevant part of file /etc/cherokee/cherokee.conf
# Server
server!bind!1!port = 80
server!timeout = 15
server!keepalive = 1
server!keepalive_max_requests = 500
server!server_tokens = full
server!panic_action = /usr/share/cherokee/cherokee-panic
server!pid_file = /var/run/
server!user = www-data
server!group = www-data

# Default virtual server
vserver!1!nick = default
vserver!1!document_root = /var/www
vserver!1!directory_index = index.html

Benchmark results

The benchmark took 1068 seconds in total.


Let’s now compare the minimum, average, and maximum requests-per-second rate of each server.

Minimum RPS

Average RPS

Maximum RPS


G-WAN is the clear winner of this benchmark, while Nginx and Varnish have similar average performance. It’s not a real surprise to see Apache in last position.
  • G-WAN can serve 2.25 times more requests per second on average than Cherokee, from 4.25 to 6.5 times more than Nginx and Varnish, and from 9 to 13.5 times more than Apache.
  • Nginx / Varnish can serve 2.1 times more requests per second on average than Apache.
  • Nginx needs 1.73 times as long as G-WAN to serve the same number of requests.
  • Varnish needs 2.14 times as long as G-WAN to serve the same number of requests.
  • Apache needs 1.93 times as long as G-WAN to serve a similar number of requests (Apache sometimes replied with a 503 error and therefore didn’t serve exactly the same number of requests).
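The relative-time figures above follow directly from the total run times reported per server (G-WAN 607 s, Nginx 1048 s, Apache 1174 s, Varnish 1297 s); a quick sanity check:

```shell
# Total wall-clock times (seconds) taken from the per-server results above
gwan=607
for pair in "Nginx 1048" "Varnish 1297" "Apache 1174"; do
  set -- $pair            # $1 = server name, $2 = total time in seconds
  awk -v n="$1" -v t="$2" -v g="$gwan" \
      'BEGIN { printf "%s: %.2fx the G-WAN time\n", n, t / g }'
done
```

This reproduces the 1.73, 2.14 and 1.93 factors quoted in the list.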
Again, keep in mind that this benchmark compares only the servers with their out of the box settings locally (no networking is involved), and therefore the results might be misleading.


Monday, June 6, 2011

Why does du(1) report different file sizes for ZFS and UFS? Why doesn't the space consumption that is reported by the df command and the zfs list command match?

SkyHi @ Monday, June 06, 2011

ZFS Frequently Asked Questions (FAQ)

For information about dedup, see the ZFS Dedup FAQ. 
  1. ZFS Product Release Questions
  2. ZFS Technical Questions
  3. ZFS/UFS Comparison Questions
  4. ZFS Administration Questions
  5. ZFS and Other Product Interaction Questions

ZFS Product Release Questions

  1. How can I get ZFS?
  2. When will ZFS be available for
  3. What does ZFS stand for?

How can I get ZFS?

ZFS is available in the following releases:

When will ZFS be available for

Projects are under way to port ZFS to FreeBSD and to Linux (using FUSE). For more information on CDDL, see the licensing FAQ.

What does ZFS stand for?

Originally, ZFS was an acronym for "Zettabyte File System." The largest SI prefix we liked was 'zetta' ('yotta' was out of the question). Since ZFS is a 128-bit file system, the name was a reference to the fact that ZFS can store 256 quadrillion zettabytes (where each ZB is 2^70 bytes). Over time, ZFS gained a lot more features besides 128-bit capacity, such as rock-solid data integrity, easy administration, and a simplified model for managing your data.

ZFS Technical Questions

  1. Why does ZFS have 128-bit capacity?
  2. What limits does ZFS have?

Why does ZFS have 128-bit capacity?

File systems have proven to have a much longer lifetime than most traditional pieces of software, due in part to the fact that the on-disk format is extremely difficult to change. Given the fact that UFS has lasted in its current form (mostly) for nearly 20 years, it's not unreasonable to expect ZFS to last at least 30 years into the future. At this point, Moore's law starts to kick in for storage, and we start to predict that we'll be storing more than 64 bits of data in a single filesystem. For a more thorough description of this topic, and why 128 bits is enough, see Jeff's blog entry.

What limits does ZFS have?

The limitations of ZFS are designed to be so large that they will never be encountered in any practical operation. ZFS can store 16 Exabytes in each storage pool, file system, file, or file attribute. ZFS can store billions of names: files or directories in a directory, file systems in a file system, or snapshots of a file system. ZFS can store trillions of items: files in a file system, file systems, volumes, or snapshots in a pool.

ZFS/UFS Comparison Questions

  1. Why doesn't ZFS have an fsck-like utility?
  2. Why does du(1) report different file sizes for ZFS and UFS? Why doesn't the space consumption that is reported by the df command and the zfs list command match?
  3. Can I set quotas on ZFS file systems? 

Why doesn't ZFS have an fsck-like utility?

There are two basic reasons to have an fsck-like utility:
  • Verify file system integrity - Many times, administrators simply want to make sure that there is no on-disk corruption within their file systems. With most file systems, this involves running fsck while the file system is offline. This can be time consuming and expensive. Instead, ZFS provides the ability to 'scrub' all data within a pool while the system is live, finding and repairing any bad data in the process. There are future plans to enhance this to enable background scrubbing.
  • Repair on-disk state - If a machine crashes, the on-disk state of some file systems will be inconsistent. The addition of journalling has solved some of these problems, but failure to roll the log may still result in a file system that needs to be repaired. In this case, there are well known pathologies of errors, such as creating a directory entry before updating the parent link, which can be reliably repaired. ZFS does not suffer from this problem because data is always consistent on disk.
    A more insidious problem occurs with faulty hardware or software. Even file systems or volume managers that have per-block checksums are vulnerable to a variety of other pathologies that result in valid but corrupt data. In this case, the failure mode is essentially random, and most file systems will panic (if it was metadata) or silently return bad data to the application. In either case, an fsck utility will be of little benefit. Since the corruption matches no known pathology, it will likely be unrepairable. With ZFS, these errors will be (statistically) nonexistent in a redundant configuration. In a non-redundant config, these errors are correctly detected, but will result in an I/O error when trying to read the block. It is theoretically possible to write a tool to repair such corruption, though any such attempt would likely be a one-off special tool. Of course, ZFS is equally vulnerable to software bugs, but the bugs would have to result in a consistent pattern of corruption to be repaired by a generic tool. During the 5 years of ZFS development, no such pattern has been seen.

Why does du(1) report different file sizes for ZFS and UFS? Why doesn't the space consumption that is reported by the df command and the zfs list command match?

On UFS, the du command reports the size of the data blocks within the file. On ZFS, du reports the actual size of the file as stored on disk. This size includes metadata and reflects any compression. This reporting really helps answer the question of "how much more space will I get if I remove this file?" So, even when compression is off, you will still see different results between ZFS and UFS.
When you compare the space consumption that is reported by the df command with the zfs list command, consider that df is reporting the pool size and not just file system sizes. In addition, df doesn't understand descendent datasets or whether snapshots exist. If any ZFS properties, such as compression and quotas, are set on file systems, reconciling the space consumption that is reported by df might be difficult.
Consider the following scenarios that might also impact reported space consumption:
  • For files that are larger than recordsize, the last block of the file is generally about 1/2 full. With the default recordsize set to 128 KB, approximately 64 KB is wasted per file, which might be a large impact. The integration of RFE 6812608 would resolve this scenario. You can work around this by enabling compression. Even if your data is already compressed, the unused portion of the last block will be zero-filled, and compresses very well.
  • On a RAIDZ-2 pool, every block consumes at least 2 sectors (512-byte chunks) of parity information. The space consumed by the parity information is not reported, but because it can vary, and be a much larger percentage for small blocks, an impact to space reporting might be seen. The impact is more extreme for a recordsize set to 512 bytes, where each 512-byte logical block consumes 1.5 KB (3 times the space).
    Regardless of the data being stored, if space efficiency is your primary concern, you should leave the recordsize at the default (128 KB), and enable compression (to the default of lzjb).
  • The df command is not aware of deduplicated file data.
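The RAIDZ-2 parity arithmetic above can be checked directly: one 512-byte logical block plus at least two 512-byte parity sectors comes to 1.5 KB, three times the logical size.

```shell
block=512            # logical block size (recordsize=512)
sector=512           # sector size in bytes
parity_sectors=2     # RAIDZ-2 stores at least 2 parity sectors per block
total=$((block + parity_sectors * sector))
echo "$total bytes on disk per $block-byte block ($((total / block))x)"
```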

Can I set quotas on ZFS file systems? 

Yes, ZFS provides several different quota features:
  • File system quotas (quota property) - ZFS file systems can be used as logical administrative control points, which allow you to view usage, manage properties, perform backups, take snapshots, and so on. For home directory servers, the ZFS model enables you to easily set up one file system per user. ZFS quotas are intentionally not associated with a particular user because file systems are points of administrative control. ZFS quotas can be set on file systems that could represent users, projects, groups, and so on, as well as on entire portions of a file system hierarchy. This allows quotas to be combined in ways that traditional per-user quotas cannot. Per-user quotas were introduced because multiple users had to share the same file system. ZFS file system quotas are flexible and easy to set up. A quota can be applied when the file system is created. For example:
# zfs create -o quota=20g tank/home/users
User file systems created in this file system automatically inherit the 20-Gbyte quota set on the parent file system. For example: 
# zfs create tank/home/users/user1
# zfs create tank/home/users/user2
# zfs list -r tank/home/users
tank/home/users        76.5K  20.0G  27.5K  /tank/home/users
tank/home/users/user1  24.5K  20.0G  24.5K  /tank/home/users/user1
tank/home/users/user2  24.5K  20.0G  24.5K  /tank/home/users/user2
ZFS quotas can be increased when the disk space in the ZFS storage pools is increased while the file systems are active, without having any down time.
  • Reference file system quotas (refquota property) - File system quota that does not limit space used by descendents, including file systems and snapshots
  • User and group quotas (userquota and groupquota properties) - Limits the amount of space that is consumed by the specified user or group. The userquota or groupquota space calculation does not include space that is used by descendent datasets, such as snapshots and clones, similar to the refquota property.
In general, file system quotas are appropriate for most environments, but user/group quotas are needed in some environments, such as universities that must manage many student user accounts. RFE 6501037 was integrated into Nevada build 114 and the Solaris 10 10/09 release.
An alternative to user-based quotas for containing disk space used for mail, is using mail server software that includes a quota feature, such as the Sun Java System Messaging Server. This software provides user mail quotas, quota warning messages, and expiration and purge features.
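For example (the pool, file system, and user names here are illustrative), a per-user limit is a one-line property setting:

```shell
# Limit user1 to 10 GB across the tank/home file system,
# no matter which directories the files land in.
zfs set userquota@user1=10G tank/home
zfs get userquota@user1 tank/home
```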

ZFS Administration Questions

  1. Why doesn't the space that is reported by the zpool list command and the zfs list command match?
  2. What can I do if ZFS file system panics on every boot?
  3. Does ZFS support hot spares?
  4. Can devices be removed from a ZFS pool?
  5. Can I use ZFS as my root file system? What about for zones?
  6. Can I split a mirrored ZFS configuration?
  7. Why has the zpool command changed?

Why doesn't the space that is reported by the zpool list command and the zfs list command match?

The SIZE value that is reported by the zpool list command is generally the amount of physical disk space in the pool, but varies depending on the pool's redundancy level. See the examples below. The zfs list command lists the usable space that is available to file systems, which is disk space minus ZFS pool redundancy metadata overhead, if any.
  • A non-redundant storage pool created with one 136-GB disk reports SIZE and initial FREE values as 136 GB. The initial AVAIL space reported by the zfs list command is 134 GB, due to a small amount of pool metadata overhead.
# zpool create tank c0t6d0
# zpool list tank
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank   136G  95.5K   136G     0%  1.00x  ONLINE  -
# zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank    72K   134G    21K  /tank
  • A mirrored storage pool created with two 136-GB disks reports SIZE as 136 GB and initial FREE values as 136 GB. This reporting is referred to as the deflated space value. The initial AVAIL space reported by the zfs list command is 134 GB, due to a small amount of pool metadata overhead.
# zpool create tank mirror c0t6d0 c0t7d0
# zpool list tank
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank   136G  95.5K   136G     0%  1.00x  ONLINE  -
# zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank    72K   134G    21K  /tank
  • A RAIDZ-2 storage pool created with three 136-GB disks reports SIZE as 408 GB and initial FREE values as 408 GB. This reporting is referred to as the inflated disk space value, which includes redundancy overhead, such as parity information. The initial AVAIL space reported by the zfs list command is 133 GB, due to the pool redundancy overhead.
# zpool create tank raidz2 c0t6d0 c0t7d0 c0t8d0
# zpool list tank
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank   408G   286K   408G     0%  1.00x  ONLINE  -
# zfs list tank
NAME    USED  AVAIL  REFER  MOUNTPOINT
tank  73.2K   133G  20.9K  /tank
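The RAIDZ-2 numbers can be sanity-checked with simple arithmetic: three 136-GB disks give the inflated SIZE, and subtracting two disks' worth of parity leaves roughly one disk of usable space (reported as 133 GB once metadata overhead is deducted).

```shell
disks=3; per_disk_gb=136; parity_disks=2
echo "zpool list SIZE (inflated): $((disks * per_disk_gb)) GB"
echo "usable before metadata overhead: $(((disks - parity_disks) * per_disk_gb)) GB"
```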

What can I do if ZFS file system panics on every boot?

ZFS is designed to survive arbitrary hardware failures through the use of redundancy (mirroring or RAID-Z). Unfortunately, certain failures in non-replicated configurations can cause ZFS to panic when trying to load the pool. This is a bug, and will be fixed in the near future (along with several other nifty features, such as background scrubbing). In the meantime, if you find yourself in the situation where you cannot boot due to a corrupt pool, do the following:

# mount -o remount /
# rm /etc/zfs/zpool.cache
# reboot
This will remove all knowledge of pools from your system. You will have to re-create your pool and restore from backup.
If a ZFS root file system panics, then you must boot from alternate media, import the root pool, resolve the issue that is causing the failure, export the root pool, and reboot the system. For more information, see the ZFS Troubleshooting Guide.

Does ZFS support hot spares?

Yes, the ZFS hot spares feature is available in the Solaris Express Community Release, build 42, the Solaris Express July 2006 release, and the Solaris 10 11/06 release. For more information about hot spares, see the ZFS Administration Guide.

Can devices be removed from a ZFS pool?

Removal of a top-level vdev, such as an entire RAID-Z group or a disk in an unmirrored configuration, is not currently supported. This feature is planned for a future release and can be tracked with CR 4852783.
You can remove a device from a mirrored ZFS configuration by using the zpool detach command.
You can replace a device with a device of equivalent size in both a mirrored or RAID-Z configuration by using the zpool replace command.
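For example (the pool and device names here are illustrative):

```shell
# Detach one side of a mirror, leaving the remaining device in the pool
zpool detach tank c0t7d0
# Swap a device for another of at least the same size
zpool replace tank c0t6d0 c0t8d0
```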

Can I use ZFS as my root file system? What about for zones?

You can install and boot a ZFS root file system starting in the SXCE build 90 release and starting in the Solaris 10 10/08 release. For more information, see ZFS Boot.
ZFS can be used as a zone root path in the Solaris 10 10/08 release, but configurations that can be patched and upgraded are limited. Additional ZFS zone root configurations that can be patched and upgraded are supported starting in the Solaris 10 5/09 release.
For more information, see the ZFS Admin Guide.
In addition, you cannot create a cachefs cache on a ZFS file system.

Can I split a mirrored ZFS configuration?

Yes, you can split a mirrored ZFS configuration for cloning or backup purposes in the SXCE, build 131 release. The best method for cloning and backups is to use ZFS clone and snapshot features. For information about using ZFS clone and snapshot features, see the ZFS Admin Guide. See RFE 6421958 to recursively send snapshots that will improve the replication process across systems.
In addition to ZFS clone and snapshot features, remote replication of ZFS file systems is provided by the Sun StorageTek Availability Suite product. AVS/ZFS demonstrations are available here.
Keep the following cautions in mind if you attempt to split a mirrored ZFS configuration for cloning or backup purposes:
  • Support for splitting a mirrored ZFS configuration integrated with RFE 5097228.
  • You cannot remove a disk from a mirrored ZFS configuration, back up the data on the disk, and then use this data to create a cloned pool.
  • If you want to use a hardware-level backup or snapshot feature instead of the ZFS snapshot feature, then you will need to do the following steps:
    • zpool export pool-name
    • Hardware-level snapshot steps
    • zpool import pool-name
  • Any attempt to split a mirrored ZFS storage pool by removing disks or changing the hardware that is part of a live pool could cause data corruption.

Why has the zpool command changed?

Changes to the zpool command in Nevada, builds 125-129, are as follows:
  1. Device name changes due to integration of 6574286 (removing a slog doesn't work) - This change adds a top-level virtual device name to support device removal operations. The top-level device name is constructed by using the logical name (mirror, raidz2, and so on) and appends a unique numeric number. The zpool status and zpool import commands now display configuration information using this new naming convention. For example, the following configuration contains two top-level virtual devices named mirror-0 and mirror-1.
# zpool status
   pool: export
 state: ONLINE
 scrub: none requested

        NAME        STATE     READ WRITE CKSUM
        export      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
In this release, you could potentially remove the mirrored log device (mirror-1) as follows:
# zpool remove export mirror-1
# zpool status export
  pool: export
 state: ONLINE
 scrub: none requested

        NAME        STATE     READ WRITE CKSUM
        export      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
Currently, only cache, log, and spare devices can be removed from a pool.
2. New dedup ratio property due to integration of 6677093 (zfs should have dedup capability) - The zpool list command includes dedupratio for each pool. You can also display the value of this read-only property by using the zpool get command. For example:
# zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
export   928G  47.5G   881G     5%  1.77x  ONLINE  -
rpool    928G  25.7G   902G     2%  1.40x  ONLINE  -
# zpool get dedup rpool
NAME   PROPERTY    VALUE  SOURCE
rpool  dedupratio  1.40x  -
3. The zpool list output has changed due to integration of 6897693 (deduplication can only go so far) - In previous releases, the zpool list command reported used and available physical block space, while zfs list reports used and available space at the file system level. The previous zpool list used and available columns have been changed to report allocated and free physical blocks. These changes should help clarify the accounting difference reported by the zpool list and zfs list commands.
# zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
export   928G  47.5G   881G     5%  1.77x  ONLINE  -
rpool    928G  25.7G   902G     2%  1.40x  ONLINE  -
Any scripts that relied on the old used and available zpool properties should be updated to use the new allocated and free property names.
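As a hedged sketch of such a script update (the pool name tank is a placeholder), the old property name is simply replaced with the renamed one:

```shell
# Old form -- fails on builds 125 and later, where the
# property was renamed:
#   zpool get used tank
# New form, using the renamed properties:
zpool get allocated tank
zpool get free tank
# In scripts, -H -o value prints the bare value with no
# header line, which is easier to parse:
#   alloc=$(zpool list -H -o allocated tank)
```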

ZFS and Other Product Interaction Questions

  1. Is ZFS supported in a clustered environment?
  2. Which third party backup products support ZFS?
  3. Does ZFS work with SAN-attached devices?

Is ZFS supported in a clustered environment?

Solaris Cluster 3.2 supports a local ZFS file system as highly available (HA) in the Solaris 10 11/06 release. This support allows for failover between systems, with the storage pool automatically imported on the node that takes over.
If you use Solaris Cluster 3.2 to configure a local ZFS file system as highly available, review the following caution:
Do not add a configured quorum device to a ZFS storage pool. When a configured quorum device is added to a storage pool, the disk is relabeled and the quorum configuration information is lost, which means the disk no longer provides a quorum vote to the cluster. Instead, add the disk to the storage pool first, and then configure it as a quorum device. Or, if the disk is already a quorum device, unconfigure it, add it to the storage pool, and then reconfigure it as a quorum device.
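A minimal sketch of the safe ordering, assuming the Solaris Cluster 3.2 clquorum command and using placeholder names (quorum device d4, disk c1t4d0, pool tank):

```shell
# Hedged sketch; d4 and c1t4d0 are placeholder names.
# 1. If the disk is already a quorum device, unconfigure it first:
clquorum remove d4
# 2. Now add the underlying disk to the ZFS storage pool
#    (this relabels the disk, which is why step 1 must come first):
zpool add tank c1t4d0
# 3. Reconfigure the disk as a quorum device:
clquorum add d4
```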
Solaris Cluster 3.2 is not supported on the OpenSolaris or Nevada releases. For information about using the open-source Solaris Cluster version, go to the Open High-Availability Cluster community page.
ZFS is not a native cluster, distributed, or parallel file system and cannot provide concurrent access from multiple hosts. However, ZFS works well when its file systems are shared out over NFS in a distributed environment.
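Sharing a ZFS file system over NFS is built into the zfs command itself. A brief sketch (the dataset name export/home is a placeholder):

```shell
# Share a ZFS file system over NFS with default options
# (dataset name is a placeholder):
zfs set sharenfs=on export/home
# Check which datasets are currently shared:
zfs get sharenfs export/home
# Turn sharing off again:
zfs set sharenfs=off export/home
```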
In the long term, we plan on investigating ZFS as a native cluster file system to allow concurrent access. This work has not yet been scoped.

Which third party backup products support ZFS?

  • EMC Networker 7.3.2 backs up and restores ZFS file systems, including ZFS ACLs.
  • Veritas Netbackup 6.5 backs up and restores ZFS file systems, including ZFS ACLs.
  • IBM Tivoli Storage Manager client software backs up and restores ZFS file systems with both the CLI and the GUI. ZFS ACLs are also preserved.
  • Computer Associates' BrightStor ARCserve product backs up and restores ZFS file systems, but ZFS ACLs are not preserved.

Does ZFS work with SAN-attached devices?

Yes, ZFS works with either direct-attached devices or SAN-attached devices. However, if your storage pool contains no mirror or RAID-Z top-level devices, ZFS can only report checksum errors but cannot correct them. If your storage pool consists of mirror or RAID-Z devices built using storage from SAN-attached devices, ZFS can report and correct checksum errors.
For example, consider a SAN-attached hardware-RAID array, set up to present LUNs to the SAN fabric that are based on its internally mirrored disks. If you use a single LUN from this array to build a single-disk pool, the pool contains no duplicate data that ZFS needs to correct detected errors. In this case, ZFS could not correct an error introduced by the array.
If you use two LUNs from this array to construct a mirrored storage pool, or three LUNs to create a RAID-Z storage pool, ZFS then would have duplicate data available to correct detected errors. In this case, ZFS could typically correct errors introduced by the array.
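The alternative pool layouts described above can be sketched as follows (the LUN device names are hypothetical, and each zpool create line is an alternative layout, one per pool):

```shell
# Hypothetical SAN LUN device names; one layout per pool.
# Single-LUN pool: ZFS can detect errors but cannot repair them.
zpool create sanpool c4t0d0
# Mirrored pool from two LUNs: ZFS can detect and repair errors.
zpool create sanpool mirror c4t0d0 c4t1d0
# RAID-Z pool from three LUNs: also self-healing.
zpool create sanpool raidz c4t0d0 c4t1d0 c4t2d0
```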
In all cases where ZFS storage pools lack mirror or RAID-Z top-level virtual devices, pool viability depends entirely on the reliability of the underlying storage devices.
If your ZFS storage pool only contains a single device, whether from SAN-attached or direct-attached storage, you cannot take advantage of features such as RAID-Z, dynamic striping, I/O load balancing, and so on.
ZFS always detects silent data corruption. Some storage arrays can detect checksum errors, but might not be able to detect the following classes of errors:
  • Accidental overwrites or phantom writes
  • Mis-directed reads and writes
  • Data path errors
Keep the following points in mind when using ZFS with SAN devices:
  • Overall, ZFS functions as designed with SAN-attached devices, as long as all the drives are only accessed from a single host at any given time. You cannot share SAN disks between pools on the same system or different systems. This limitation includes sharing SAN disks as shared hot spares between pools on different systems.
  • If you expose simpler devices to ZFS, you can better leverage all available features.
In summary, if you use ZFS with SAN-attached devices, you can take advantage of the self-healing features of ZFS by configuring redundancy in your ZFS storage pools, even though redundancy is also available at a lower hardware level.