Tuesday, November 3, 2009

Linux – disk usage (du) human readable AND sorted by size

SkyHi @ Tuesday, November 03, 2009
This is a quick tip to fix a problem that has always bugged me: when showing disk usage in a human readable form (KB, MB, GB) for each subdirectory using "du -sh *", how can you properly sort it into size order?

If you just want the solution, here it is…

alias duf='du -sk * | sort -n | perl -ne '\''($s,$f)=split(m{\t});for (qw(K M G)) {if($s<1024) {printf("%.1f",$s);print "$_\t$f"; last};$s=$s/1024}'\'

Put it into ~/.bashrc to make it permanent.

But if you can spare a minute or two you might get some ideas about how to write those programmatic aliases, in this case using perl.

When using the Linux "du" command I like to make the file size human readable, so 8709100 (kilobytes) becomes 8.4G. This is achieved by doing this:

du -sh *

Now, the main problem with the K, M and G file size suffixes is that you can't sort them. If you try to pipe that through sort by using

du -sh * | sort

you'll get something like this

12K keys
12M Pictures
2.6G Documents
536K scripts
8.4G Desktop

or if we sort numerically

du -sh * | sort -n

you'll get something like this

2.6G Documents
8.4G Desktop
12K keys
12M Pictures
536K scripts

Obviously neither of these commands works as we intend, because the K(ilo), M(ega) and G(iga) suffixes mess up "sort": plain sort compares the lines as text, and sort -n only looks at the leading number and ignores the suffix.

The solution is a one-liner wrapped up into an alias 'duf' for 'disk usage formatted'

alias duf='du -sk * | sort -n | perl -ne '\''($s,$f)=split(m{\t});for (qw(K M G)) {if($s<1024) {printf("%.1f",$s);print "$_\t$f"; last};$s=$s/1024}'\'

When expanded out, formatted and commented, the code looks like this

du -sk * | sort -n |            # get usage in KBytes and sort numerically
perl -ne '                      # we use perl to reformat the size in K, M and G
  ($s,$f)=split(m{\t});         # split the size/filename pair
  for (qw(K M G)) {             # loop over each size suffix
    if($s<1024) {               # if s<1024 we have found the correct suffix
      printf("%.1f",$s);        # display the size
      print "$_\t$f";           # display the suffix and the filename
      last                      # line completed
    };
    $s=$s/1024                  # otherwise divide by 1024 and try the next suffix
  }'


du -h --max-depth=1


The duf alias produces the output we intended, like this.

12.0K keys
536.0K scripts
11.7M Pictures
2.5G Documents
8.3G Desktop
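The suffix-picking loop from the perl one-liner can also be sketched as a small standalone shell function. This is only an illustration: the name humanize_kb is made up for this example, it uses awk for the floating point division, and unlike the alias it falls back to the largest suffix (T) instead of printing nothing for very large sizes.

```shell
# Hypothetical helper (not from the original post): convert a size given in
# KB into a human readable string, mirroring the loop in the perl one-liner.
humanize_kb() {
  awk -v s="$1" 'BEGIN {
    n = split("K M G T", unit, " ")
    for (i = 1; i <= n; i++) {
      # Stop at the first suffix where the value drops below 1024,
      # or at the last suffix so that huge values are still printed.
      if (s < 1024 || i == n) { printf "%.1f%s\n", s, unit[i]; exit }
      s /= 1024
    }
  }'
}

humanize_kb 12        # prints 12.0K
humanize_kb 536       # prints 536.0K
humanize_kb 8709100   # prints 8.3G
```

Note that 8709100 comes out as 8.3G here while du -sh reports 8.4G for the same directory, because du rounds up and printf rounds to nearest.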

Here are some useful additions that are worth adding as an edit to my original post:

1) Purely as a shell script, without the perl overhead - source 'inataysia' reddit
du -sk * | sort -n | while read size fname; do for unit in k M G T P E Z Y; do if [ $size -lt 1024 ]; then echo -e "${size}${unit}\t${fname}"; break; fi; size=$((size/1024)); done; done

2) As a function, instead of an alias - which allows you to pass parameters to du - source 'fire'
function duf {
du -sk "$@" | sort -n | perl -ne '($s,$f)=split(/\t/,$_,2);for(qw(K M G T)){if($s<1024){$x=($s<10?"%.1f":"%3d");printf("$x$_\t%s",$s,$f);last};$s/=1024}'
}

Combining these together would probably make the best solution so far.

function duf {
du -sk "$@" | sort -n | while read size fname; do for unit in k M G T P E Z Y; do if [ $size -lt 1024 ]; then echo -e "${size}${unit}\t${fname}"; break; fi; size=$((size/1024)); done; done
}

May 13th, 2009 | Tags: Command Line, Computing, Linux, Linux tips | Category: Computing, Linux Command Line Tips

15 comments to Linux – disk usage (du) human readable AND sorted by size

*
Casper
May 14th, 2009 at 8:16 am

Thanks for this cool script. One caveat: watch out when using this in some top-level folder, it can take a very long time to finish. (Wish we had file systems that maintained directory size somehow.)

*
Curtis
May 15th, 2009 at 2:38 am

du -s * | sort -n | sed -Ee 's/^[0-9]+./"/' -e 's/$/"/' | xargs du -sh

Perl-less implementation; a little extra effort for filenames with spaces. (Yours doesn't have to worry about that, obviously.)

*
Michael Speer
May 15th, 2009 at 3:11 am

http://www.nabble.com/Human-readable-sort-td23223205.html

Never discount simply fixing the underlying problem.

*
Alex Shinn
May 15th, 2009 at 5:04 am

That's always bugged me as well! I've made a few changes, though, so that it produces the same formatted output as du -sh. Also, as a function it can take arguments:

function duf {
du -sk "$@" | sort -n | perl -ne '($s,$f)=split(/\t/,$_,2);for(qw(K M G T)){if($s<1024){$x=($s<10?"%.1f":"%3d");printf("$x$_\t%s",$s,$f);last};$s/=1024}'
}

*
Chris
May 15th, 2009 at 5:50 am

@casper I don't. I prefer not to pay an additional cost on every write, to speed up this far-less-frequent case.
*
Leonid Volnitsky
May 15th, 2009 at 6:08 am

Latest version of sort (part of coreutils) supports -h (correct sorting of M,k,G suffixes).

*
fire
May 15th, 2009 at 7:02 am

I use the following… you see it uses 'du' two times, but this is not really slower, 'cause the operating system caches.

# sorted du -hsc
function duhs() {
du -s $* | sort -n | cut -f 2- | while read a; do du -sh $a; done
}

*
DVoita
May 15th, 2009 at 7:09 am

If you modify du -sk * to du -sk * .??* you can see hidden dot files as well.

*
chris
May 15th, 2009 at 8:04 am

Thanks to inataysia on reddit for a bash only version

du -sk * | sort -n | while read size fname; do for unit in k M G T P E Z Y; do if [ $size -lt 1024 ]; then echo -e "${size}${unit}\t${fname}"; break; fi; size=$((size/1024)); done; done

*
Gordon Mohr
May 18th, 2009 at 12:11 am

Why not promote the 'human-readability' step to a standalone utility? Let's call it 'hu' for 'human units'. Hypothetically, it would convert any whitespace-delimited numbers found on stdin to human-readable units when echoing to stdout. (Optional arguments could limit this conversion to just certain fields or to alternate unit systems.) Then the solution would be:

du -sb * | sort -n | hu

*
Michael Speer
May 18th, 2009 at 3:37 pm

Sat Jan 20 06:00:09 1996 Jim Meyering (------@na-net.ornl.gov)
---snip---
* du.c (main): New options --human-readable (-h) and --megabytes (-m).
(human_readable): New function. From Larry McVoy (------@sgi.com).

Ever since this patch was included in fileutils, system administrators have been frustrated by finding that while they could `du -h` they could not then `sort -h` the output. -h is not posix but is now solidly a part of the gnu coreutils du and ls commands. Including a switch for sort that respects the switch for du was not my invention. It has been argued a number of times on the developers mailing list. Mine was simply the straw which broke the camel's back.

The additional switch is consistent with the other tools, and merely augments the purpose of sort without creating a differing utility to it. Something of the functionality of `hu` may have been the appropriate fix in '96, but since the '96 -h switch is long set, adding a corresponding switch to sort seems only too appropriate. To `promote' -h out of du, df and ls into a separate utility would break scripts of users that depend on it.

*
Josh
May 18th, 2009 at 6:45 pm

You can also set the BLOCK_SIZE environment variable to the value human-readable and all the GNU coreutils that report sizes will respect it.

*
Jason Sares
May 19th, 2009 at 6:54 am

my solution

du -s * 2>/dev/null | sort -n | cut -f2 | xargs du -sh 2>/dev/null
*
Gordon Mohr
May 19th, 2009 at 10:46 pm

I like ‘-h’ too; it doesn’t have to go away for ‘hu’ to also exist and be useful in other contexts, or when people need a sort to precision hidden by ‘-h’ rounding.
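
Following up on the sort -h suggestion in the comments: where the installed GNU sort supports -h, the whole problem reduces to a plain pipe. The sketch below assumes such a sort is available and feeds it the sample sizes from the post through a made-up helper called sample, so it can be tried without touching the disk.

```shell
# Hypothetical helper: emits lines in the same "size<TAB>name" shape that
# du -sh prints, using the example directories from the post.
sample() {
  printf '8.4G\tDesktop\n2.6G\tDocuments\n12K\tkeys\n12M\tPictures\n536K\tscripts\n'
}

# sort -h (human numeric sort) takes the K/M/G suffix into account.
sample | sort -h

# On a real directory the equivalent would be:
#   du -sh * | sort -h
```

With the sample data this lists keys, scripts, Pictures, Documents and Desktop in that order, smallest first.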