On the T(r)ail of Open Files

Learn how Linux manages open files and explore a number of utilities to watch files as they grow.

If you’re more of a Linux user than a programmer, you may not have given much thought to how the operating system handles files. As a user, you simply give a filename to a program (from the “Open” command on a menu, from the command line, or however) and the file is (hopefully) accessed however it’s supposed to be.

However, programmers who perform low-level access of files — for instance, to seek to a particular point in a file or rewind the file to its start — have to understand more about Linux file handling. That’s where we’re headed this month: seeing how Linux handles files that have been opened by a program, and learning how you can take advantage of that knowledge even if you don’t usually write programs that handle files.

Along the way, we’ll look at tail -f, MultiTail, and less +F. These three programs can show you what’s happening to a file as it grows (as data is added to the end of the file). They’re handy for viewing log files and monitoring a long-running process.

Open Files

Let’s start with a quote from the Linux man page for open(), the low-level system call that programmers can use to open a file:

The open() system call is used to convert a pathname into a file descriptor (a small, non-negative integer for use in subsequent I/O as with read, write, etc.). When the call is successful, the file descriptor returned will be the lowest file descriptor not currently open for the process. This call creates a new open file, not shared with any other process. (But shared open files may arise via the fork() system call.) The new file descriptor is set to remain open across exec functions (see fcntl()). The file offset is set to the beginning of the file.

As in most man pages, there’s a lot of information packed into that paragraph!

When you give a file’s pathname (like /a/b/afile, ../afile, or simply afile) to a program, the program opens the file to access its contents. A file can be opened for reading, for writing, or (in some cases) for both. When the Linux kernel opens a file, it returns a file descriptor — one of the numbers 3, 4, 5, and so on — which your program uses to refer to that file. The file stays open until the program closes it or until the process ends. (This rule — that a file stays open until it’s closed or the process ends — can be taken to extremes. See the sidebar “Removing Open Files?”)

Every process starts with at least three file descriptors assigned to it: the standard input (stdin) is file descriptor (fd) 0, the standard output (stdout) is fd 1, and the standard error (stderr) is fd 2. So, if a program issues an open() call on file foo, and no other files have been opened so far, the contents of foo are accessible through fd 3.
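
You can watch this from a shell prompt. Here’s a minimal sketch, assuming bash, a Linux /proc filesystem, and a readable example file named foo:

ls -l /proc/$$/fd     # the shell's open descriptors: 0, 1, and 2 (plus any it already has)
exec 3< foo           # open foo for reading; the lowest free descriptor, 3, is used
ls -l /proc/$$/fd     # now descriptor 3 appears, pointing at foo
exec 3<&-             # close descriptor 3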

After a file is opened, the kernel also tracks its file offset. This is the point in the file where the process is currently reading or writing. It’s kind of like putting a bookmark in a (paper) book to hold your place. Each time you read more and then stop reading for a while, you move the bookmark along toward the end of the book. Later, when you want to read some more, the bookmark has held your place. The file offset works the same way. You can move the file offset ahead by reading or writing data, or you can move it more furtively with the lseek() system call.
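
To make the offset visible, here’s a sketch in bash, assuming a multi-line example file named notes.txt. The second read continues where the first left off, and dd’s skip= operand jumps the offset ahead much the way a program would with lseek():

exec 3< notes.txt
read -u 3 line; echo "first:  $line"     # reads line one; the offset moves past it
read -u 3 line; echo "second: $line"     # picks up at the offset: line two
exec 3<&-

dd if=notes.txt bs=1 skip=100 count=20 2>/dev/null   # 20 bytes starting at offset 100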

Bourne-type shells — bash, for instance — let you open files and access them by their fd. We’ve seen this in the June 2004 column “Execution and Redirection.”
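
As a quick refresher, here’s a sketch of the idea in bash (log.out is just an example name):

exec 4> log.out      # open log.out for writing on descriptor 4
echo "started" >&4   # write to the file through its descriptor
date >&4             # each write continues at the current offset
exec 4>&-            # close descriptor 4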

File This Under Linux

In Linux, basically all input and output (I/O) is done via “files” — streams of characters, accessed through a file descriptor with a file offset pointer — although many of those “files” are actually pipes, disk drives, and other character sources or sinks.

With that in mind, let’s look again at a redirected-input while loop from the May 2004 column, “Great Command-line Combinations”:

find /proj -type d -print |
while read dir
do
    ...commands...
done

Figure One: Reading data from an open file

Figure One shows what’s happening at the input side of the loop. The find process is writing a string of characters (actually a series of pathnames and newlines) onto its standard output, which feeds a pipe. On the other side of the pipe, those characters are available to read in the order they were written.

(This isn’t an exact picture. The real story depends on how the shell implements the redirected-input loop. For instance, the loop is often run in a subprocess, which gets its standard input from the pipe. The effect, though, is the same.) Each read command takes text from stdin (fd 0) until it reads a newline. Then it stops reading, and the file offset stays where it is until the next read comes along (on the next pass of the loop).
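
You can watch that hand-off with a simpler sketch in bash; /etc/hosts is just a convenient multi-line file:

{
    read first      # takes line one from the shared stdin
    read second     # starts at the offset left by the first read: line two
    echo "line 1: $first"
    echo "line 2: $second"
} < /etc/hosts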

In all processes, the file offset tracks where the next read (or write) will take place. An application like split can read data from an open file megabyte by megabyte. A pager like more can read data in chunks that precisely fill your screen, waiting to read the next chunk until you press the spacebar.
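
Here’s a rough sketch of chunk-at-a-time reading from one open descriptor, assuming GNU dd and an example file named big.dat; both dd commands inherit the same offset through descriptor 3:

exec 3< big.dat
dd bs=1M count=1 <&3 > chunk1 2>/dev/null   # the first megabyte; the offset is now 1 MB in
dd bs=1M count=1 <&3 > chunk2 2>/dev/null   # the next megabyte, continuing from the offset
exec 3<&-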

When the file offset reaches the end of the file, a program can see that and quit reading. (read, for instance, returns a nonzero exit status when that happens.)
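
A quick way to see that status, assuming bash (reading /dev/null hits end-of-file immediately):

read line < /dev/null || echo "read failed at end-of-file (exit status $?)"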

Another interesting choice is to keep trying to read, waiting for the file to grow.

Watching a File Grow

Several Linux utilities don’t quit when they reach the end of a file. As was mentioned before and in previous columns, those utilities are very handy when you’re viewing a log file, like /var/log/messages, that grows over time.

One of these utilities is the “Swiss Army knife” of file-viewing programs: less. With the command-line option +F (or ++F, which applies to every file when you name more than one), less shows a file as it grows. Other commands are ignored until you interrupt less (typically with CTRL-C); then you can mark and return to places in the file, jump between multiple files, and more. To resume viewing new lines, type the command F while you’re in less.
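
For example, with the system log (the second filename is just for illustration):

less +F /var/log/messages                   # start out following the file as it grows
less ++F /var/log/messages /var/log/secure  # apply the F command to every file listed
# Inside less: CTRL-C stops following; F starts following again.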

Another utility, MultiTail, can split your terminal into multiple windows, each following a different file. This is similar to opening multiple terminal windows with a less process running in each, but it has several advantages. One is that the displayed text can be color-coded and filtered so that only some lines are shown — eliminating “noise” lines from a log file, for instance. MultiTail can also run a program like netstat or w over and over, highlighting and filtering the program’s output the same way it does for files. MultiTail can be downloaded from its home page at http://vanheusden.com/multitail/.
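
A couple of starting points, sketched with the caveat that option details vary by version (the -l option, which runs a command instead of reading a file, is my assumption here, so check multitail --help on your system):

multitail /var/log/messages /var/log/syslog   # one window per file, both followed as they grow
multitail -l "netstat -t"                     # re-run a command and follow its output
                                              # (verify -l against your version's man page)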

Let’s dig into the granddaddy of these programs, tail. By default, tail shows the last ten lines of a file and quits. Adding the -f option tells tail to keep checking about once per second, outputting anything added to the end of the file.
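
For example:

tail /var/log/messages          # show the last ten lines, then exit
tail -n 25 /var/log/messages    # show the last 25 lines instead
tail -f /var/log/messages       # keep watching; press CTRL-C when you're done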

The original tail (on old BSD systems, at least, where this author had the most experience) could only “watch” one file. Another problem was what happened if a file was replaced while tail was waiting for it to grow. For instance, at midnight each day, a cron job could rename /var/log/messages to messages.0 and create a new empty messages to hold the new day’s log messages. If you were running tail -f /var/log/messages, tail would show no new lines after midnight.
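
You can reproduce that midnight switch by hand. Here’s a sketch using a throwaway file named watch.log as a stand-in for /var/log/messages:

echo "an old message" > watch.log    # create the "log file"
tail -f watch.log &                  # start watching it in the background
mv watch.log watch.log.0             # "rotate" the log, as the cron job does
echo "a new message" > watch.log     # start the new day's log...
                                     # ...but plain tail -f never shows this line
kill %1                              # stop the background tail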

The GNU version of tail solved that problem by adding two options: --follow=name and --retry. Armed with our knowledge of open files and file descriptors, let’s see why the problem happens in the first place and how those options cure it.
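
In practice, that means invocations like these:

tail --follow=name --retry /var/log/messages   # watch the name, re-opening the file if it's replaced
tail -F /var/log/messages                      # GNU shorthand for the same two options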

We’ve seen that, in general, a program opens a file by its pathname and the kernel returns a file descriptor. From then on, the program doesn’t need to know or care about the file’s pathname because it has direct access to the file’s contents. (In fact, as explained in the sidebar “Removing Open Files?”, an opened file can even be “removed” and the process will still have access to it.) So, in the log-rotation example above, when the cron job executed mv messages messages.0, it changed the file’s pathname, but the file was still open. A plain tail -f keeps watching that same open file — which, because it’s been renamed, no longer receives input from the system logging daemon. With --follow=name, tail watches the pathname instead, re-opening the file whenever a new one appears under that name; --retry keeps it trying even while the name is temporarily missing.
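
To see the cure in action, repeat the earlier watch.log experiment with the new options:

echo "an old message" > watch.log
tail --follow=name --retry watch.log &   # or simply: tail -F watch.log &
mv watch.log watch.log.0
echo "a new message" > watch.log         # this time tail notices the new file at that
                                         # pathname, re-opens it, and shows the line
kill %1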