I find find(1) to be useful

I recently shared Tom Limoncelli’s excellent critique of the BSD find(1) man page in the documentation channel at work. One of my coworkers responded with “that’s why I just use mlocate”, and that made me very sad. Sure, mlocate is a great tool if you know there’s a file somewhere that has a particular name (assuming it was created before the last time updatedb was run), but that’s about the best you can do.

There are plenty of examples on how to use find out there, but I haven’t written a “here’s a basic thing about Linux” post in a while, so I’ll add to the pile. find takes, at a minimum, a path to find things in. For example:

find /

will find (and print) every file on the system. Probably not all that useful. You can change the path argument to narrow things down a bit, but that’s still probably not all that useful to you. So let’s throw in some additional arguments to constrain it. Maybe you want to find all the JPEG files in your home directory?

find ~ -name '*jpg'

But wait! What if some of them have an uppercase extension?

find ~ -iname '*jpg'

Aw, but I bet some of the pictures have an extension of .jpeg because 8.3 is so 1985. Well, we can combine them in a slightly ugly fashion:

find ~ \( -iname '*jpeg' -o -iname '*jpg' \)

Oh, but you have some directories that end in jpg? (Why you named a directory “bucketofjpg” instead of “pictures” is beyond me.) We can modify it to only look for files!

find ~ \( -iname '*jpeg' -o -iname '*jpg' \) -type f

Or maybe you’d just like to find those directories so you can rename them later:

find ~ \( -iname '*jpeg' -o -iname '*jpg' \) -type d
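Those parentheses are doing real work. find joins adjacent tests with an implicit “and”, which binds more tightly than -o, so without the grouping a trailing -type f only applies to the last -iname. Here is a minimal sketch of the difference, run against a throwaway directory (the file and directory names are made up for the demo):

```shell
# Scratch directory with two pictures and one badly named directory.
demo=$(mktemp -d)
mkdir "$demo/pics" "$demo/pics/bucketofjpg"
touch "$demo/pics/cat.jpeg" "$demo/pics/dog.JPG"

# Ungrouped: parsed as  -iname '*jpg'  OR  ( -iname '*jpeg' AND -type f ),
# so the directory slips through on the first test.
ungrouped=$(find "$demo/pics" -iname '*jpg' -o -iname '*jpeg' -type f | wc -l | tr -d ' ')

# Grouped: -type f applies to whatever either -iname matched.
grouped=$(find "$demo/pics" \( -iname '*jpg' -o -iname '*jpeg' \) -type f | wc -l | tr -d ' ')

echo "ungrouped: $ungrouped, grouped: $grouped"
rm -r "$demo"
```

The ungrouped version counts the directory along with the two files; the grouped version counts only the files.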

It turns out you’ve been taking a lot of pictures lately, so let’s narrow this down to ones whose status has changed in the last week.

find ~ \( -iname '*jpeg' -o -iname '*jpg' \) -type f -ctime -7

You can do time filters based on file status change time (ctime), modification time (mtime), or access time (atime). These are in days, so if you want finer-grained control, you can express it in minutes instead (cmin, mmin, and amin, respectively). Unless you know exactly the time you want, you’ll probably prefix the number with a + (more than) or - (less than). The time arguments are probably the ones I use most often.
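To see the + and - prefixes in action, here is a small sketch using a scratch directory. It uses modification time rather than status change time because touch can backdate mtime but not ctime. The file names are hypothetical, and -mmin is an extension that I'm assuming your find has (GNU and BSD find both do):

```shell
# Scratch directory so the example does not depend on your own files.
demo=$(mktemp -d)
touch "$demo/new.jpg"
# Backdate the second file's modification time to 1 January 2020.
touch -t 202001010000 "$demo/old.jpg"

# Modified less than 30 minutes ago.
recent=$(find "$demo" -type f -mmin -30)
# Modified more than 7 days ago.
stale=$(find "$demo" -type f -mtime +7)

echo "recent: $recent"
echo "stale:  $stale"
rm -r "$demo"
```

A bare number with no prefix (say, -mtime 7) would match only files whose age rounds to exactly that many days, which is rarely what you want.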

Maybe you’re running out of disk space, so you want to find all of the gigantic (let’s define that as greater than 1 gigabyte) files in the log directory:

find /var/log -size +1G
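One thing to watch with -size is rounding: find rounds each file's size up to a whole number of the unit you give it before comparing. The sketch below shows that on a small scale with kibibytes and bytes (the file names are made up, and I'm assuming a find that accepts the k suffix, as GNU and BSD find do):

```shell
# Scratch directory with a 4-byte file and a 4 KiB file.
demo=$(mktemp -d)
printf 'tiny' > "$demo/small.log"
dd if=/dev/zero of="$demo/large.log" bs=1024 count=4 2>/dev/null

# small.log rounds up to 1 KiB, which is not more than 1k; large.log is 4 KiB.
big=$(find "$demo" -type f -size +1k)
# The c suffix counts bytes, so there is no rounding to a larger unit.
exact=$(find "$demo" -type f -size 4c)

echo "bigger than 1k:  $big"
echo "exactly 4 bytes: $exact"
rm -r "$demo"
```

The same rounding applies to G, so -size +1G means “more than one whole gibibyte after rounding up”.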

Or maybe you want to find all the files owned by bcotton in /data:

find /data -user bcotton

You can also look for files based on permissions. Perhaps you want to find all of the world-readable files in your home directory to make sure you’re not oversharing.

find ~ -perm -o=r
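The leading - in front of the mode means “at least these bits are set”, so other permission bits on the file don't matter. A quick sketch with two hypothetical files shows the symbolic form and what I'd expect to be the equivalent octal form:

```shell
# Scratch directory with one world-readable file and one private file.
demo=$(mktemp -d)
touch "$demo/public.txt" "$demo/private.txt"
chmod 644 "$demo/public.txt"
chmod 600 "$demo/private.txt"

# Symbolic mode: the "other" read bit must be set.
symbolic=$(find "$demo" -type f -perm -o=r)
# Octal mode: 004 is that same "other" read bit.
octal=$(find "$demo" -type f -perm -004)

echo "symbolic: $symbolic"
echo "octal:    $octal"
rm -r "$demo"
```

Both forms should list only public.txt.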

So far, all we’ve done is print the file paths, which is useful, but sometimes you want to do more. find has a few built-in actions (like -delete), but its true power comes in giving input for other commands to act on. In the simplest case, you can pipe the output to something like xargs. There’s also the -exec action, which allows you to execute more complicated actions against the output. For example, if you wanted to get the md5sum of all of your Python scripts:

find ~ -type f -name '*.py' -exec md5sum {} \;

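(Yes, you could pipe to xargs here, too, but that’s not the point.) Note the \; at the end. That’s very important: the semicolon tells find where the -exec command ends, and the backslash keeps your shell from treating it as a command separator first. The {} is replaced with the path of each file found.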
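With \; the command runs once for every file found. -exec can also be terminated with a + instead, in which case find packs as many paths as it can into each run, much like xargs would. Here is a rough sketch that counts the runs, using echo as a stand-in for md5sum and a few hypothetical files:

```shell
# Scratch directory with two Python files and one file that should not match.
demo=$(mktemp -d)
touch "$demo/a.py" "$demo/b.py" "$demo/c.txt"

# \; form: echo runs once per match, so one output line per file.
per_file=$(find "$demo" -type f -name '*.py' -exec echo {} \; | wc -l | tr -d ' ')

# + form: both paths fit into one echo, so a single output line.
batched=$(find "$demo" -type f -name '*.py' -exec echo {} + | wc -l | tr -d ' ')

echo "runs with the semicolon form: $per_file"
echo "runs with the plus form:      $batched"
rm -r "$demo"
```

For a cheap command the difference is invisible, but across thousands of files the + form saves a lot of process startups.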

Warning! You can really cause a world of hurt if you’re not careful with the output of find. Files that contain spaces or other special characters might cause unexpected behavior when passed to another command. Be very careful. One way to mitigate your risk is to use -ok instead of -exec. This prompts you before executing each command (but it might get tedious if you have a lot of files to process). In GNU find, at least, the -ls action escapes unusual characters in file names, so that might be useful when piping to another program.
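To make the warning above concrete, here is a sketch of what a single space in a file name does to a plain pipe into xargs, next to one common workaround: separating the paths with NUL bytes. I'm assuming your find has -print0 and your xargs has -0 (GNU and BSD versions do), and the file names are invented for the demo:

```shell
# Scratch directory with one file name containing a space.
demo=$(mktemp -d)
touch "$demo/holiday photo.jpg" "$demo/plain.jpg"

# Plain pipe: xargs splits on whitespace, so the spaced name becomes two arguments.
unsafe=$(find "$demo" -type f -name '*.jpg' | xargs -n 1 echo | wc -l | tr -d ' ')

# NUL-separated: each path arrives as exactly one argument.
safe=$(find "$demo" -type f -name '*.jpg' -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "arguments seen via plain pipe:    $unsafe"
echo "arguments seen via -print0 / -0:  $safe"
rm -r "$demo"
```

Two files turn into three arguments in the plain pipe. Swap echo for rm and you can see where the world of hurt comes from.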

This post only begins to scratch the surface of what find can do. Combining tests with boolean logic can give you incredible flexibility to find exactly the files you’re looking for. Have any favorite find expressions? Share them in the comments!

3 thoughts on “I find find(1) to be useful”

  1. Honestly, I most often do find . | grep whatever. I know there’s much more power there, but never felt like I had the time to learn it when the above would do what I wanted. Thanks for breaking it down.

  2. What’s a JPED file?

    ctime is NOT the creation time.

    Perhaps a more interesting discussion is why is there a find(1) at all? Seems very contrary to the Unix philosophy — why not just have an ls -lR, grep and xargs?

  3. Greg,

    JPED is the “Jeez, Please Enter Dacharachterscorrectly” file format. 🙂 I’ve fixed that glitch.

    Shame on me for not reading the man page more closely, I always thought ctime was creation time, but it clearly is not. I’ve updated the post to correct that. Thanks (as always) for the education!

    To the main point, I agree that find is not very Unixy in the sense that it does many things. You could certainly do a lot of stuff with a chain of ls, grep, and xargs, but more complicated searches can get pretty funky. Can you imagine how awful my expression to grep the output of ls for all executable files owned by bcotton modified in the last 30 minutes but not in the last 10 minutes would be? There’s a lot to be said for the Unix philosophy, but I think this is definitely a case to treat it more like the Pirates’ Code.
