Continuing our series on how to take full advantage of your filesystem with tips and tricks for the newbie and old pro alike.
The first article in this series showed ways to use the names of files and directories as a simple database: to organize collections of data and find them quickly from a GUI menu, a program, or the command line. Although you don’t need to read that article to understand this one, it’s a good idea to review the four points in the filesystem introduction, especially the point about pathnames.
This month we’ll look at ways to build those database-like systems. Some of these techniques are so specific that they’re intended more as examples (an idea of what’s possible) than as specific solutions to everyday problems.
If you won’t be using the filesystem as a database, you might still be interested in the ways we stretch the shells and utilities. You’ll see curly-brace and arithmetic evaluation operators, along with printf(1), to build whole directory trees with just a few commands. We’ll also use the almost-forgotten (but useful!) bc and jot utilities.
When you’re planning a new project, or if you need to organize a mess of files, you’ll probably sit down and think of a logical system. The principles in the previous column can help. For instance, sorting filenames with shell wildcards may be easier if names have the same length. (So, if the names have digits, add leading zeroes as needed to give numbers the same number of digits in all filenames.)
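This short sketch shows why leading zeroes matter. It sorts the same made-up names with and without padding. It assumes bash and sets LC_ALL=C so that sort compares the names character by character.

```shell
#!/usr/bin/env bash
# Unpadded numbers: "file10" sorts before "file2" because '1' < '2'.
unpadded=$(printf '%s\n' file1 file2 file10 | LC_ALL=C sort)
# Zero-padded to the same width: text order now matches numeric order.
padded=$(printf 'file%02d\n' 1 2 10 | LC_ALL=C sort)
echo "$unpadded"
echo "$padded"
```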
An obvious way to make a series of directories is with a loop in the shell (or another scripting language). Build the directory names from one or more variables. For instance, to make directories named foo1 through foo9, bar1 through bar9, and so on, try the nested for loops below. (You can type these directly at a shell prompt, by the way; you don’t need to make a script file. And indentation isn’t required.)
for name in foo bar baz ...
do
  for num in 1 2 3 4 5 6 7 8 9
  do
    mkdir "$name$num"
  done
done
Most shells have curly-brace operators that create a series of space-separated words. (The bash manpage explains these in the section “Brace Expansion”.) Braces are a great time-saver when you’re creating a series of names. You could write the previous example as:
mkdir {foo,bar,baz}{1,2,3,4,5,6,7,8,9}
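To preview a brace expansion before running anything, put echo in front of the command. Each brace group varies faster than the group to its left, so the names come out grouped by prefix. Bash (and zsh) also accept a sequence form, {1..9}, which expands to the same words as the explicit list. That form isn't in POSIX sh, so this sketch assumes bash.

```shell
#!/usr/bin/env bash
# Dry run: echo shows the words mkdir would receive, without creating anything.
echo mkdir {foo,bar}{1,2,3}
# The sequence form expands to the same list as the explicit one.
explicit=$(echo {foo,bar,baz}{1,2,3,4,5,6,7,8,9})
sequence=$(echo {foo,bar,baz}{1..9})
[ "$explicit" = "$sequence" ] && echo "same names either way"
```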
Q: Wouldn’t the shell wildcard operator [1-9] be simpler in the previous example than the curly-brace operators {1,2,3,4,5,6,7,8,9}?
A: Wildcard operators only match existing filenames. The curly-brace operators create strings, not filenames, so you can use them to create filenames that don’t exist yet. (The shell doesn’t “know” it’s creating directories; it simply outputs space-separated names that are passed to mkdir.)
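You can see the difference in an empty scratch directory. With bash's default settings, a wildcard that matches nothing is passed along unchanged. So mkdir foo[1-3] would create a single directory with the literal name foo[1-3]. Braces produce their words whether or not anything exists. The mktemp scratch directory is part of this sketch only.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1      # empty scratch directory
glob_words=$(echo foo[1-3])      # nothing matches, so the pattern is left as-is
brace_words=$(echo foo{1,2,3})   # braces generate the words regardless
echo "wildcard: $glob_words"
echo "braces:   $brace_words"
```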
You can build trees (more than one level) with the mkdir option -p, which creates intermediate directory levels as needed. For instance, here’s one way to build the whole structure shown in Figure One of last month’s article:
for top in archive browsing current
do
  let i=0
  while [[ i -le 99 ]]
  do
    printf -v level2 '%02d' "$i"
    mkdir -p $top/$level2/{0,1,2,3,4,5,6,7,8,9}00
    let i++
  done
done
The printf -v option stores the formatted two-character directory name in $level2 instead of printing it. (The format %02d adds a leading zero if $i is less than 10.) This gives all directory names the same length, which, as mentioned earlier, makes sorting easier. So, when $top is archive and $i is 0, mkdir creates archive/00/000, archive/00/100, and so on, through archive/00/900.
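You may want to check the result before building the real tree. One way is to run the same loop in a scratch directory and count what it creates. The tree has three top-level directories, 100 second-level directories under each, and ten leaf directories under each of those. That should give 3 + 300 + 3000 = 3303 directories. The scratch directory and the find/wc count are additions for this sketch. The loop also uses (( )) and i=$((i + 1)) in place of [[ ]] and let.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1          # scratch directory, so nothing real is touched
for top in archive browsing current
do
  i=0
  while (( i <= 99 ))
  do
    printf -v level2 '%02d' "$i"
    mkdir -p "$top/$level2"/{0,1,2,3,4,5,6,7,8,9}00
    i=$((i + 1))
  done
done
# Count every directory below the scratch directory
count=$(find . -mindepth 1 -type d | wc -l)
echo "$count directories"
ls archive | head -n 3
```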
You could actually do the same thing by replacing the inner loop with curly-brace operators, as the next example shows.
Tip: Copy the brace pattern {0,1,2,3,4,5,6,7,8,9} with your mouse or your editor, then paste it as many times as needed.
for top in archive browsing current
do mkdir -p $top/{0,1,2,3,4,5,6,7,8,9}{0,1,2,3,4,5,6,7,8,9}/{0,1,2,3,4,5,6,7,8,9}00
done
To see what mkdir commands the shell will run, add echo before mkdir -p. If you’re feeling really maniacal ;-), you don’t need the loop at all. Just replace $top with {archive,browsing,current}, and you’ll create the whole tree with a single mkdir. (Don’t carry this “too far,” though. Every pathname becomes a separate argument, and on some systems a big enough expansion will eventually fail with an “Argument list too long” error.)
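This sketch shows one way to check the size of that single-mkdir expansion. Bash expands the pattern into the positional parameters without running mkdir at all. The script then compares the total size with getconf ARG_MAX, which reports the system's limit (in bytes) on the arguments plus environment passed to a new program. The limit varies from system to system.

```shell
#!/usr/bin/env bash
# Expand the whole tree into $1, $2, ... without running mkdir.
set -- {archive,browsing,current}/{0,1,2,3,4,5,6,7,8,9}{0,1,2,3,4,5,6,7,8,9}/{0,1,2,3,4,5,6,7,8,9}00
echo "mkdir would get $# pathnames"
echo "first: $1  last: ${!#}"
# Total bytes of those pathnames, counting each one's terminating NUL byte
bytes=$(printf '%s\0' "$@" | wc -c)
echo "about $bytes bytes of arguments; limit: $(getconf ARG_MAX) bytes"
```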
The jot utility is great for creating series of numbers, names, and much more. It’s often used to generate a list of for-loop parameters. For example, to make empty files named aa/data00.html through aa/data62.html, bb/data00.html through bb/data62.html, and so on, up to zz/data00.html through zz/data62.html, try:
for d in $(jot -w '%c' 26 a)
do
  mkdir $d$d
  touch $d$d/data{0,1,2,3,4,5}{0,1,2,3,4,5,6,7,8,9}.html
  touch $d$d/data6{0,1,2}.html
done
(We used two separate touch commands to help the example fit the page. You could use just one.) jot outputs the 26 letters a through z. The argument to its -w option uses printf-style formatting. The 26 specifies the number of repetitions and the a is the starting value; jot increments by default. Instead of jot, you could use the curly-brace operators {aa,bb,cc,...,zz}.
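jot comes from BSD and isn't installed by default on many Linux distributions. In bash, a letter sequence in braces, {a..z}, can stand in for jot here, and {0..5}{0..9} is shorthand for the explicit digit lists. This sketch runs in a scratch directory; the bash-only sequence syntax is the assumption.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
for d in {a..z}
do
  mkdir "$d$d"
  # data00 through data59, then data60 through data62: 63 files per directory
  touch "$d$d"/data{0..5}{0..9}.html "$d$d"/data6{0..2}.html
done
ls aa | head -n 3
```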
Finally, here’s a way to add default text to each of those files — so you could later edit them with an HTML editor (or do a global edit with a script). Make a template file; let’s call it template.html:
<html>
<head>
<title>Data File NNN</title>
</head>
<body>
<h1>Data File NNN</h1>
<p>Data set NNN will be added soon.</p>
</body>
</html>
Then use sed to read that file and output it, with the current file number in place of each NNN string, into each new HTML file:
for d in $(jot -w '%c' 26 a)
do
  mkdir $d$d
  for n in {0,1,2,3,4,5}{0,1,2,3,4,5,6,7,8,9} 6{0,1,2}
  do
    sed "s/NNN/$n/" template.html > $d$d/data$n.html
  done
done
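Before generating all 1,638 files, you can check the substitution on one directory. This sketch uses a cut-down, hypothetical two-line template in a scratch directory. It also adds sed's g flag, which replaces every NNN on a line rather than just the first. The template above has at most one NNN per line, so the original command works as written. The g flag is a cheap safeguard in case you later put two on one line.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
# Hypothetical cut-down template for the test
cat > template.html <<'EOF'
<title>Data File NNN</title>
<h1>Data File NNN</h1>
EOF
mkdir -p aa
for n in 07 62
do
  sed "s/NNN/$n/g" template.html > "aa/data$n.html"
done
cat aa/data07.html
```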
Renaming existing files
Here’s an example for Bourne-type shells like bash. You have a directory full of files with random names. You’d like to add a three-digit prefix to each filename so they’ll sort in a predefined order. (Of course, if ls and shell wildcards already output the filenames in the order you want, you don’t need techniques like this.)
$ ls > /tmp/files
$ vi !$
vi /tmp/files
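The history operator !$ gets the last argument of the previous command line: here, the filename /tmp/files. In the editor, rearrange the lines into the order you want the files to sort. We’re using vi, but any plain-text editor will do. If you’re on a shared system, you might want a more secure location than /tmp.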
$ i=0
$ while read -r oldfile
> do
> printf -v prefix '%03d' $((i++))
> echo "mv -i '$oldfile' '${prefix}_$oldfile'"
> done </tmp/files >/tmp/renamer
This uses the bash operator $((i++)) to increment $i after getting its value. If anything else isn’t familiar, there are details in the first part of Details of the file-renaming loops.
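The echo line above has one caveat. It wraps each name in single quotes, so a filename that itself contains a single quote (such as Bob's notes) produces a broken mv command. In bash, printf's %q format quotes a string so that the shell can read it back. The variation below uses it, plus -- so that names starting with - aren't taken as options. This is a sketch with made-up sample files. It writes the list and the generated script to mktemp files instead of /tmp/files and /tmp/renamer. Because %q output is meant for bash, it runs the result with bash rather than sh.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
touch 'plain' "Bob's notes"              # hypothetical sample files
list=$(mktemp)                           # stands in for /tmp/files
renamer=$(mktemp)                        # stands in for /tmp/renamer
printf '%s\n' 'plain' "Bob's notes" > "$list"

i=0
while IFS= read -r oldfile               # IFS= keeps leading/trailing spaces
do
  printf -v prefix '%03d' $((i++))
  # %q escapes quotes, spaces, and other special characters
  printf 'mv -i -- %q %q\n' "$oldfile" "${prefix}_$oldfile"
done < "$list" > "$renamer"

cat "$renamer"
bash -ve "$renamer"
```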
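If you’d like the prefixes to be as short as possible, here’s a variation that uses hexadecimal numbers with only as many digits as the number of files requires:
file_count=$(wc -l < /tmp/files)
max_hex=$(echo -e "obase=16\n${file_count}-1" | bc -q)
prefix_width=${#max_hex}
for prefix in $(jot -w "%0${prefix_width}x" "$file_count" 0)
do
  read -r oldfile
  echo "mv -i '$oldfile' '${prefix}_$oldfile'"
done </tmp/files >/tmp/renamer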
For instance, if there are 255 files, the prefixes will need to have two hex digits. The first file will be renamed to 00_somefile and the last will be fe_somefile (fe hex is 254 decimal).
There’s a lot of shell hackery here. (If it seems too obscure, you can always use another language.) You’ll find details in the second half of Details of the file-renaming loops.
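If bc or jot isn't handy, bash's printf builtin can do the hex conversion and the padding by itself. This sketch hard-codes the file count to the 255 from the example above. In real use, the count would come from wc -l < /tmp/files.

```shell
#!/usr/bin/env bash
file_count=255                       # hypothetical; normally $(wc -l < /tmp/files)
printf -v max_hex '%x' $((file_count - 1))
prefix_width=${#max_hex}             # number of hex digits the largest prefix needs
first=$(printf "%0${prefix_width}x" 0)
last=$(printf "%0${prefix_width}x" $((file_count - 1)))
echo "width $prefix_width: $first ... $last"
```

Inside a renaming loop, printf -v prefix "%0${prefix_width}x" $((i++)) would take the place of the %03d format.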
$ sh -ve /tmp/renamer
mv -i 'foo' '000_foo'
...
mv -i 'a file' '124_a file'
$
The sh option -v shows each command line before running it, which makes it easy to keep track of where you are and to know which command produced any error message. The -e option makes the shell exit immediately if any of the commands returns a non-zero status; for instance, if one of the files doesn’t exist.
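Here's a small demonstration of -e stopping a renamer script at the first failure. It uses a hypothetical three-line script in a scratch directory. The second file doesn't exist, so that mv fails, and the third mv never runs.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
touch one three                           # 'missing' deliberately doesn't exist
cat > renamer <<'EOF'
mv -i 'one' '000_one'
mv -i 'missing' '001_missing'
mv -i 'three' '002_three'
EOF
status=0
sh -ve renamer || status=$?
echo "sh stopped with exit status $status"
ls
```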
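If you’d rather have the while loop run the mv commands itself instead of writing them to /tmp/renamer, add || break after the mv command. That terminates the while loop if an mv command returns a non-zero status, as the -e option did for the script file. That’s a good safety measure.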
Also, in this case, don’t use the mv option -i. If mv prompts you “overwrite somefile?”, you won’t be able to answer, because stdin has been redirected from /tmp/files; mv would read its reply from that file instead. (This “shouldn’t be a problem” after you’ve gotten some experience. The -i “ask me first” options to mv, cp, and rm are for wimps, anyway.) :)
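This sketch runs mv directly inside the read loop instead of writing /tmp/renamer first, using made-up files in a scratch directory. Because mv -i can't be used, the loop checks for an existing target itself before each mv. That check is an addition for this sketch, not part of the original technique. The || break stops the loop at the first failed mv.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
touch 'zebra' 'apple' 'a file'            # hypothetical sample files
list=$(mktemp)                            # stands in for the edited /tmp/files
printf '%s\n' 'zebra' 'a file' 'apple' > "$list"

i=0
while IFS= read -r oldfile
do
  printf -v prefix '%03d' $((i++))
  newfile="${prefix}_$oldfile"
  # No mv -i here: stdin is the list file, so check for a clash by hand.
  if [ -e "$newfile" ]; then
    echo "not overwriting $newfile" >&2
    break
  fi
  mv -- "$oldfile" "$newfile" || break    # stop at the first failure
done < "$list"
ls
```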
Making complex filenames
The end of the first article in this series showed complex filenames like 0012345_04_2568x3915_q75.jpg. That name identifies photo 12345, version 4, and shows that the photo is 2568x3915 pixels and was saved as a JPEG file at 75% quality. As we’ll see next time, complex filenames can help you find or identify files quickly, without reading the file (or another database) for often-needed meta-information.
That complex filename came from a nawk(1) script that reads a directory full of digital camera files with names like DSC_0001.JPG and generates mv commands — basically like this:
photo_info=.../photo_info.nawk
for oldfile in DSC_????.JPG
do
  # ... set $basenum
  nawk_out=`nawk -f $photo_info "$oldfile"`
  mv -i "$oldfile" "${basenum}_01_${nawk_out}.jpg"
done
The nawk script parses the output of the ImageMagick identify utility to get meta-information from $oldfile.
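The nawk script itself isn't shown here. As a rough illustration of the last step, printf can assemble a name in the same layout from separate fields: a seven-digit photo number, a two-digit version, the pixel dimensions, and the JPEG quality. The values below are hypothetical, copied from the example filename. In real use, they would come from your numbering scheme and the camera file's metadata.

```shell
#!/usr/bin/env bash
# Hypothetical values copied from the example filename above
basenum=12345; version=4; width=2568; height=3915; quality=75
printf -v newname '%07d_%02d_%dx%d_q%d.jpg' \
  "$basenum" "$version" "$width" "$height" "$quality"
echo "$newname"
```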
Here’s one more idea: use the line-numbering utility nl(1) to add a four-digit number before each filename from /tmp/files. Use a shell while loop with a read command to read the number into $num and the filename into $name. The mv option -v shows what’s happening:
nl -n rz -w 4 < /tmp/files |
while read -r num name
do
  mv -v "$name" "${num}_${name}"
done
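To preview the numbering without renaming anything, swap the mv for a printf. With -n rz -w 4, nl writes each line number zero-padded to four digits, followed by a TAB and the original line. read then puts the first field in num and the rest of the line, spaces included, in name. The sample names are hypothetical.

```shell
#!/usr/bin/env bash
# Dry run: print the new names instead of running mv.
result=$(printf '%s\n' 'foo' 'a file' | nl -n rz -w 4 |
  while read -r num name
  do
    printf '%s\n' "${num}_${name}"
  done)
echo "$result"
```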
One last note: Linux extended file attributes can store extra data about files, which can help you avoid overly long filenames. However, not all utilities (or GUI applications!) support them. The Z shell does, though; see Extended File Attributes and ZSH for details.
To summarize: This article shows some techniques to create systems of files and directories with meta-organization to help you find the data you want quickly. Of course, there are a lot of ways to organize data; these are just a few examples. Although most examples use the shell, you may pick another scripting language to do the job better.
The third column in this series will show ways to use shells and utilities to find the data you want. Once you’ve set up a system, though, you don’t have to access it with a shell or a utility. For instance, you can open the directories and files from a GUI menu in an application such as the GIMP photo editor.
Please note: For an extended discussion of the file-renaming loops noted above, read Details of the File-Renaming Loops.
Jerry Peek is a freelance writer and instructor who has used Unix and Linux for more than 25 years. He's happy to hear from readers; see https://www.jpeek.com/contact.html.
[Read Jerry’s other Linux Magazine articles]