What’s GNU in Old Utilities, Part One

Utility programs like ls have new features that you may not have seen. This is the first in a series about some of the handiest enhancements.

Old-timers (such as some Linux Magazine columnists) have been using Unix and Linux since the days of... well, it’s been a while. If you’re a long-time user too, you may think you know all of the command-line options and features of common utilities like ls. A quick glance through the ls man or info pages, though, shows that hackers have been busy adding capabilities to ls and many other longstanding utilities.

Even if you aren’t as old as the hills (and you’ve never heard of the FORTRAN carriage-control utility asa), you may still be surprised at some of the things common utilities can do. That’s especially true if you’ve just come to Linux from another system that doesn’t have GNU versions of standard Unix utilities.

This month, let’s take a tour of new features of ls that could have you brushing the cobwebs off of your documentation. Along the way, we’ll also see how “block size” is calculated in GNU utilities. Let’s dig in.

Time Styles

Long-time shell programmers have had to hack around the changing date outputs of ls -l. For example, to get and parse a file’s last-modification date, code must handle files modified less than six months ago (where ls outputs the month, date, and clock time) and files modified more than six months ago (where you’d get the month, date and year).

But in GNU ls, the default time format has changed, and it’s now consistent. (You can get the old format with the option --time-style=locale. To find the setting of LC_TIME, run the locale utility.)

Listing One shows examples of the various ls -l time formats for two files: one modified within the past month and another modified six years ago.

Listing One: GNU ls time formats

$ ls -l
total 16
-rw-r--r-- 1 jpeek users 1645 2005-05-04 07:52 newfile
-rw-r--r-- 1 jpeek users 8362 1999-06-07 12:34 oldfile
$ ls -l --time-style=locale
total 16
-rw-r--r-- 1 jpeek users 1645 May 4 07:52 newfile
-rw-r--r-- 1 jpeek users 8362 Jun 7 1999 oldfile
$ ls -l --time-style=iso
total 16
-rw-r--r-- 1 jpeek users 1645 05-04 07:52 newfile
-rw-r--r-- 1 jpeek users 8362 1999-06-07 oldfile
$ ls -l --time-style=long-iso
total 16
-rw-r--r-- 1 jpeek users 1645 2005-05-04 07:52 newfile
-rw-r--r-- 1 jpeek users 8362 1999-06-07 12:34 oldfile
$ ls -l --time-style=full-iso
total 16
-rw-r--r-- 1 jpeek users 1645 2005-05-04 07:52:34.000000000 -0700 newfile
-rw-r--r-- 1 jpeek users 8362 1999-06-07 12:34:00.000000000 -0700 oldfile
$ ls -l --time-style='+%Y%m%d%H%M%S'
total 16
-rw-r--r-- 1 jpeek users 1645 20050504075234 newfile
-rw-r--r-- 1 jpeek users 8362 19990607123400 oldfile
$ ls -l --time-style='+OLD FILE: %Y%m%d
> NEW FILE: %m%d%H%M%S'
total 16
-rw-r--r-- 1 jpeek users 1645 NEW FILE: 0504075234 newfile
-rw-r--r-- 1 jpeek users 8362 OLD FILE: 19990607 oldfile

Starting the --time-style value with a + lets you control the date and time using the same format specifiers as the date command. If you use a two-line argument, you can give separate formats for “old” and “recent” files: the first line applies to old files and the second line to recent ones. (The argument has an embedded newline, so you’ll need to quote it. The shell prints its secondary prompt > while you enter the second line.)
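Here is one way a script might use a custom time style to pull out a file’s modification time as a single, sortable field. It’s only a sketch: the scratch directory and the variable names tmpdir and stamp are made up for illustration, and it assumes GNU ls, GNU touch (for the -d option), mktemp, and awk. The -n option keeps the owner and group numeric, so the timestamp should reliably be the sixth field.

```shell
# Sketch: read a file's mtime as one sortable field (assumes GNU ls and GNU touch).
tmpdir=$(mktemp -d)                                # hypothetical scratch directory
touch -d '2005-05-04 07:52:34' "$tmpdir/newfile"   # give the file a known mtime (local time)

# With a "+FORMAT" time style that contains no spaces, the timestamp
# is a single field: number 6 in the long listing of one file.
stamp=$(ls -ln --time-style='+%Y%m%d%H%M%S' "$tmpdir/newfile" | awk '{print $6}')
echo "$stamp"

rm -r "$tmpdir"                                    # clean up the scratch directory
```

Because the format is the same for old and recent files, the script doesn’t need the six-month special case described earlier.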

Ignoring Files in ls Listings

The option -I tells ls to ignore files matching a shell wildcard pattern. (Be sure to quote the pattern so the shell won’t interpret it.)

For example, ls -I'[0-9]*' omits filenames starting with a digit, and ls -I'*[0-9]*' skips filenames containing a digit.

Watch out when using the ls “all” options, -A and -a, which show “hidden” filenames starting with a dot. Just as with a shell, by default, the wildcard patterns with -I don’t match filenames starting with a dot!

Luckily, though, you can use multiple -I options. So, to exclude all filenames containing a digit (whether the name starts with a dot or not), you could use the command ls -A -I'*[0-9]*' -I'.*[0-9]*'.

Some programs, like the Emacs editor, create “backup” filenames ending with a tilde (~). You can exclude these with -I'*~' or with the special option -B.
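To try the ignore options without touching your real files, you could run something like the following in a scratch directory. This is a sketch that assumes GNU ls and mktemp; the filenames and the variable names tmpdir and visible are invented for the example.

```shell
# Sketch: combine several -I patterns with -B (assumes GNU ls).
tmpdir=$(mktemp -d)                                 # hypothetical scratch directory
touch "$tmpdir/notes" "$tmpdir/draft2" \
      "$tmpdir/.cache9" "$tmpdir/notes~"

# -A shows dot files, the two -I patterns drop names containing a digit
# (with and without a leading dot), and -B drops the name ending in "~".
visible=$(cd "$tmpdir" && ls -A -I'*[0-9]*' -I'.*[0-9]*' -B)
echo "$visible"

rm -r "$tmpdir"
```

Of the four names, only notes should survive all three filters.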

ls Default Sorting Order

With GNU ls, when you sort by filename, the default sort order depends on the locale — which you can control through the LC_COLLATE environment variable, if the system default isn’t what you want. (Again, to see the system defaults, run locale.)

Listing Two shows a directory listing with GNU ls and the en_US locale; Listing Three shows the original output of ls listing the same files.

Locale can be important when a script is reading through the output of ls and needs a consistent format. To get the same order used on systems that ignore the locale, set LC_COLLATE to C or POSIX. Bourne-type shells such as bash let you set an environment variable temporarily, just for one command, like this:

$ LC_COLLATE=POSIX ls -A1
.1NAME
.1name
.Apple
...

Listing Two: GNU ls with en_US collation

$ ls -a1
.
..
1name
.1name
1NAME
.1NAME
apple
.apple
Apple
.Apple

Listing Three: Old ls collation

$ ls -a1
.
..
.1NAME
.1name
.Apple
.apple
1NAME
1name
Apple
apple
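If a script depends on the old ordering, it may be safest to pin the locale right on the ls command line. The sketch below uses LC_ALL=C rather than LC_COLLATE because LC_ALL, when it is set, takes precedence over LC_COLLATE; that substitution is a choice made for this example, not something the listings above require. The filenames and variable names are hypothetical, and a case-sensitive filesystem is assumed.

```shell
# Sketch: force byte-order collation for one ls command.
tmpdir=$(mktemp -d)                                 # hypothetical scratch directory
touch "$tmpdir/apple" "$tmpdir/Apple" "$tmpdir/1name" "$tmpdir/.apple"

# LC_ALL overrides LC_COLLATE, so setting it here wins even if the
# calling environment already exports LC_ALL.
sorted=$(cd "$tmpdir" && LC_ALL=C ls -A1 | paste -sd' ' -)
echo "$sorted"

rm -r "$tmpdir"
```

In the C locale the names sort by character code: the dot first, then digits, then uppercase, then lowercase.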

More ls Sorting Controls

Traditional versions of ls let you sort listings by filename (the default), by last-modification time (the -t option), last-access time (-u), last inode change time (-c), and unsorted (the order that entries appear in the directory file, with -f, which unfortunately disabled -l). But seeing listings in any other order required terrible hacks, thanks especially to the varying date formats mentioned above.

The GNU version of ls has more choices:

* -S sorts the files by size, largest-first. (You don’t have to use -l to see the sizes or for the sorting to take effect.) Add -r to sort in reverse (smallest first). -S -r is handy for cleaning big files out of directories.

* -X sorts by “extension.” For example, foo.a sorts before bar.b.

$ ls
a a.foo a.foo.gz a.gz a.tar a.tar.gz b b.foo
$ ls -X
a b a.foo b.foo a.foo.gz a.gz a.tar.gz a.tar

(The Linux filesystem doesn’t actually have extensions. To Linux, foo.tar.gz is not a file with name foo.tar and extension .gz. The kernel treats a string like .xyz as just a dot and some characters. Individual applications can do otherwise.)

* -v sorts by version name and number, lowest first. For example:

$ ls -v foo-1.gz foo-100.gz bar-1.gz foo-50.gz
bar-1.gz foo-1.gz foo-50.gz foo-100.gz

The info page for ls explains the details. You can read it with:

$ info coreutils "More details about version sort"

* -U gives unsorted listings (in the order files appear in a directory). Unlike the old -f option, -U doesn’t override -l or other formatting options.
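The sorting options above are easy to compare side by side in a scratch directory. The following sketch assumes GNU ls and mktemp; the file sizes and the variable names tmpdir, by_version, and smallest_first are made up for the demonstration.

```shell
# Sketch: compare version sort (-v) with reverse size sort (-S -r).
tmpdir=$(mktemp -d)                     # hypothetical scratch directory
(
  cd "$tmpdir" || exit 1
  printf 'aaaaaaaaaa' > foo-1.gz        # 10 bytes
  printf 'aaa'        > foo-100.gz      #  3 bytes
  printf 'aaaaa'      > foo-50.gz       #  5 bytes
  printf 'a'          > bar-1.gz        #  1 byte
)

by_version=$(cd "$tmpdir" && ls -v | paste -sd' ' -)
smallest_first=$(cd "$tmpdir" && ls -S -r | paste -sd' ' -)

echo "by version:     $by_version"      # bar-1.gz foo-1.gz foo-50.gz foo-100.gz
echo "smallest first: $smallest_first"  # bar-1.gz foo-100.gz foo-50.gz foo-1.gz

rm -r "$tmpdir"
```

With -S -r the biggest files land at the end of the listing, closest to your next prompt.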

Color Listings

You’ve probably seen ls output in color: directory names in blue, executable files in green, sockets in pink, and so on. The colors are configurable.

The option --color has three optional “when” choices: --color=always, --color=auto, and --color=none. (Plain --color is equivalent to --color=always.) To understand these, you first need to understand how ls makes the color.

ls outputs ANSI escape sequences to “turn on” color before it outputs each name and to “turn off” color (go back to the default color) afterward. (You can read about escape sequences in the September 2003 column “(Not So) Stupid Shell Tricks” and its supplemental page Escape Sequences: Useful Text that You Can’t See.)

Setting --color=auto outputs the color-making escape sequences if the standard output of ls is a terminal; otherwise, it doesn’t. This “automatic” choice isn’t always right, though. If you’re piping the ls output to a program that doesn’t handle ANSI escape sequences, like the less pager, this is fine: ls sees the pipe on stdout and does not output escape sequences. But if you’re doing something else (such as piping ls output to sed for editing, then showing sed output on the terminal), you’ll have to force ls to output the escape sequences by using --color=always.

To see the escape sequences in action, let’s pipe ls output to cat -v, which shows non-visible characters. The results are shown in Listing Four.

Listing Four: ls output without and with color

$ ls | cat -v
ADIR
AFILE
APROG
$ ls --color=always | cat -v
^[[0m^[[01;34mADIR^[[0m
^[[0mAFILE^[[0m
^[[01;32mAPROG^[[0m

The name ADIR appears in bright blue. (Uppercase is being used to help you see the name between escape sequences.) The sequence ^[[0m cancels any current attributes, and ^[[01;34m turns on bright blue; ^[ represents an ASCII ESC character. The final ^[[0m cancels the bright blue.

The name AFILE appears as standard text; the surrounding escape sequences make sure of that. The name APROG appears in bright green.
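You can check the difference between auto and always from a script, too. In this sketch (which assumes GNU ls and mktemp), LS_COLORS is set explicitly to a single directory color so the result doesn’t depend on your terminal type or existing color setup; that setting and the variable names are assumptions made for the example.

```shell
# Sketch: compare --color=auto and --color=always when stdout is a pipe.
tmpdir=$(mktemp -d)                     # hypothetical scratch directory
mkdir "$tmpdir/ADIR"

# auto: stdout is a pipe, so ls should emit the bare name.
auto=$(cd "$tmpdir" && LS_COLORS='di=01;34' ls --color=auto | cat -v)
# always: ls emits escape sequences even though stdout is a pipe.
always=$(cd "$tmpdir" && LS_COLORS='di=01;34' ls --color=always | cat -v)

echo "auto:   $auto"
echo "always: $always"

rm -r "$tmpdir"
```

The second line of output should show the directory name wrapped in the same kind of ^[[...m sequences as Listing Four.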

The color settings come from the dircolors utility, which reads them from a precompiled database. You can see the database with dircolors --print-database; the comments in it tell a lot about what ls is doing.

To override the color settings, dircolors can read a database file and output a command to set the LS_COLORS environment variable. Bourne and C shells set environment variables differently, so dircolors has two shell-specific options:

csh% eval `dircolors -c databasefile`

sh$ eval $(dircolors -b databasefile)
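To see what that eval is actually running, you could feed dircolors a tiny database of your own. The one-line database below (directories in bold magenta) and the name dbfile are invented for this sketch; a real database would normally start from the output of dircolors --print-database.

```shell
# Sketch: build a one-entry color database and load it (Bourne-type shell).
dbfile=$(mktemp)                        # hypothetical database file
printf 'DIR 01;35\n' > "$dbfile"        # one entry: directories in bold magenta

dircolors -b "$dbfile"                  # print the shell command dircolors generates
eval "$(dircolors -b "$dbfile")"        # run that command to set and export LS_COLORS
echo "$LS_COLORS"

rm "$dbfile"
```

Running this in your login shell replaces your current LS_COLORS, so you may want to try it in a subshell first.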

Other ls Features

Here are more changes in the GNU ls:

* -n makes a long-format (-l) type of output, but with numeric UID and GID instead of user and group names.

* -h, together with -l, shows file sizes in “human-readable” kilobytes and megabytes. For instance, instead of 3005304, file size is shown as 2.9M (or 2,9M in many non-US locales).

This also affects the total nnn line that’s printed first in a directory listing, where nnn is the total number of blocks used by the directory. (For sticklers, each hard link contributes to the total block count, so a file with more than one hard link in the same directory will have a misleading disk-usage count.)

The default block size is 1k (2^10 or 1024 bytes), but you can change this by setting an environment variable (see the section “Block Sizes,” next) or by setting the --block-size option. As a special case, using --block-size=si makes K equal to decimal 1,000 (10^3) and M equal to one million (10^6), so a file with 3,005,304 bytes would be displayed as 3.1M.
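The two results for the 3,005,304-byte file can be reproduced with a scratch file. This sketch assumes GNU ls, GNU truncate, mktemp, and awk; LC_ALL=C is set so the decimal point is a period, and the variable names are hypothetical. ls rounds these figures up, which is why 3005304 / 1048576 (about 2.87) appears as 2.9M and 3005304 / 1000000 (about 3.005) appears as 3.1M.

```shell
# Sketch: binary (-h) versus decimal (--block-size=si) human-readable sizes.
tmpfile=$(mktemp)                       # hypothetical scratch file
truncate -s 3005304 "$tmpfile"          # apparent size: 3,005,304 bytes

# -n keeps owner and group numeric, so the size is reliably field 5.
binary=$(LC_ALL=C ls -lnh "$tmpfile" | awk '{print $5}')
decimal=$(LC_ALL=C ls -ln --block-size=si "$tmpfile" | awk '{print $5}')

echo "powers of 1024: $binary"          # 2.9M
echo "powers of 1000: $decimal"         # 3.1M

rm "$tmpfile"
```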

Block Sizes

One old Unix bugaboo was different “block size” numbers on different machines. For instance, the du command on an AT&T Unix machine might assume that a data block has 512 bytes, but a BSD Unix machine would assume blocks of 1,024 bytes. Or maybe not.

What’s a hacker to do? GNU utilities — at least df, du, and ls — look for three environment variables that specify the way you want blocks to be calculated, independent of the actual system block size. The variables are checked in the following order, and the first one found is used:

* DF_BLOCK_SIZE sets the default block size for df; DU_BLOCK_SIZE does the same for du; and LS_BLOCK_SIZE is for ls.

* BLOCK_SIZE sets the default for all commands.

* POSIXLY_CORRECT sets the block size to 512.

You can also use the --block-size command-line option on (at least) df, du, and ls. If none of those are set, the default block size is (currently) 1,024 bytes.
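The lookup order can be checked with ls -l, whose file sizes are scaled by these settings. The sketch below assumes GNU ls, GNU truncate, mktemp, and awk, and that LS_BLOCK_SIZE isn’t already set in your environment; the file and variable names are hypothetical.

```shell
# Sketch: command-specific variable beats BLOCK_SIZE; the option beats both.
tmpfile=$(mktemp)                       # hypothetical scratch file
truncate -s 4096 "$tmpfile"             # apparent size: exactly 4,096 bytes

# Only the general variable is set: size is counted in 1K units.
in_1k=$(BLOCK_SIZE=1K ls -ln "$tmpfile" | awk '{print $5}')
# LS_BLOCK_SIZE is found first, so the size is counted in 512-byte units.
in_512=$(BLOCK_SIZE=1K LS_BLOCK_SIZE=512 ls -ln "$tmpfile" | awk '{print $5}')
# The command-line option overrides both variables.
in_bytes=$(BLOCK_SIZE=1K LS_BLOCK_SIZE=512 ls -ln --block-size=1 "$tmpfile" | awk '{print $5}')

echo "$in_1k $in_512 $in_bytes"

rm "$tmpfile"
```

For a 4,096-byte file, the three sizes should come out as 4, 8, and 4096.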

The block size choices are complex, and we can’t cover them all in this short article. For more information, read the block size node in the coreutils info page: type info coreutils "block size".

As a quick example, though, if you’re working with huge files you might set the block size to petabytes with the format PB, as in export BLOCK_SIZE=PB or ls -l --block-size=PB. (In GNU coreutils, the PB suffix means 10^15 bytes; use P or PiB for 2^50 bytes.) You could also force a uniform one-kilobyte size by setting the format 1K within your shell script.

Jerry Peek is a freelance writer and instructor who has used Unix and Linux for over 20 years. He’s happy to hear from readers; see https://www.jpeek.com/contact.html.
