Capturing Flags: find and file

Sep 29, 2020

Welcome back to the third installment of this blog series. In this series we’re growing our cybersecurity knowledge from the very basics, using the overthewire.org challenges as a guide. First, I’d like to thank everyone for their feedback on the last post! I’ll do my best to implement it and, as always, more feedback is welcome. Now let’s start!

We last left off with bandit3’s password so let’s go ahead and log in with our trusty ssh command. The hint for this level says that the password is in a directory called “inhere” and inside a hidden file. So let’s start by verifying that the “inhere” directory actually exists:

Cool, so let’s check inside of it now (we’ll go ahead and use ls -al since we already know the file is hidden):

There it is, our hidden file is called “.hidden” and is using the “.” to hide itself in the directory. Let’s see what’s inside:
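If you want to try this without logging in to the server, here is a minimal sketch that rebuilds the situation in a scratch directory. The layout and the file contents are stand-ins made up for illustration, not the real level data:

```shell
# Build a throwaway copy of the layout (the contents are a placeholder, not the real password).
scratch=$(mktemp -d)
mkdir "$scratch/inhere"
printf 'not-the-real-password\n' > "$scratch/inhere/.hidden"
cd "$scratch/inhere" || exit 1

ls            # a plain ls skips names that start with "."
ls -al        # -a includes hidden entries, -l adds the long listing
cat .hidden   # reading a hidden file works like reading any other file

# Keep the results in variables so they are easy to inspect afterwards.
visible=$(ls)
hidden_contents=$(cat .hidden)
```

The only thing that makes the file “hidden” is the leading dot in its name; nothing about its contents or permissions changes.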

That was nice and easy, especially armed with knowledge about hidden files, directories, and how to find and read them. On to the next.

Once again, the password for the next level is inside the directory “inhere”. There’s the ominous hint of using reset if the terminal gets “messed up”. Let’s log in and check out the contents of the directory:

We find 10 files and all of them start with the “-” that we encountered in an earlier level. As expected, trying to cat any of the files by its bare name doesn’t work as intended, because cat treats the leading “-” as the start of an option. If we sidestep that with cat ./-file00 we get a weird output. Let’s take a look at the level hint again. It tells us that only one of the 10 files is actually “human-readable” and that it’s the one with the password. We can also see that one of the commands included in the hints is file.

file is a command that examines the contents of files and outputs the type of data that’s written inside. Looking at the man page of file, we can see that all we have to do is pass a file name to the command and it’ll just tell us what’s inside:
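Here is a small sketch of both ideas in a scratch directory: getting past the leading “-” and asking file what a file holds. The file names and contents below are placeholders, and the exact wording that file prints can vary between systems:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf 'placeholder text\n' > ./-file07                      # stand-in for the readable file
dd if=/dev/urandom of=./-file00 bs=64 count=1 2>/dev/null    # stand-in for a binary file

# "cat -file07" would be parsed as options; a path prefix or "--" avoids that.
cat ./-file07
cat -- -file07

# file reports the kind of data in each file (skipped if file is not installed).
if command -v file >/dev/null 2>&1; then
  file ./-file00 ./-file07
fi

readable=$(cat ./-file07)
```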

If we try to read “-file00”, the output is not particularly legible because it’s a data file. Data can be anything stored in or on anything else. Even gibberish written in the sand can be considered data. For computers, however, data refers to information that the computer itself can process directly. Raw computer data is often not “human-readable” and needs translation into a “human friendly” form. In this case, we’re dealing with binary data: 0’s and 1’s. This is the simplest form of data and is essentially the only form that computers truly understand. Any data, no matter the original format, is eventually converted into binary so the processor can use and act on it. Although binary data is very simple, it’s not “screen space” efficient. For example, the word “word” is “01110111 01101111 01110010 01100100” in binary (one 8-bit ASCII code per letter). Imagine reading this post in binary!
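To check that example yourself, here is a rough sketch of a shell function that prints each byte of a string as eight bits. The name to_binary is made up for this post, and the function assumes plain ASCII input and the standard od utility:

```shell
# to_binary: print every byte of the first argument as an 8-bit binary number.
to_binary() {
  out=""
  # od -An -tu1 prints the bytes of its input as unsigned decimal numbers.
  for code in $(printf '%s' "$1" | od -An -tu1); do
    bits=""
    bit=7
    while [ "$bit" -ge 0 ]; do
      # Shift the wanted bit into the lowest position and mask it off.
      bits="$bits$(( (code >> bit) & 1 ))"
      bit=$((bit - 1))
    done
    out="${out:+$out }$bits"
  done
  printf '%s\n' "$out"
}

to_binary "word"   # prints 01110111 01101111 01110010 01100100
```

Four letters turn into 32 digits, which is exactly the “screen space” problem described above.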

Now that we know how to identify the type of data inside a file, we could simply check each file until we find the one “human-readable” file that contains our password. But that’s tedious, so let’s talk a little about scripting. Scripting means writing a (often very simple and short) program that automates a specific job. Since such jobs are often simple but repetitive, a script saves us from doing them by hand over and over.

Lucky for us, the linux shell supports simple pattern matching on file names, often called “globbing”. These wildcard patterns are a simpler cousin of “regex” (short for “regular expressions”), which is a more general way to describe types of text so that a computer can understand them. Let’s take our situation as an example: all of our files share the same part of the name, “-file”, followed by two digits. So we can pass a pattern to the command file; the shell expands it into every matching file name, and file examines them all for us. We can do this with the wildcard “*”, which basically means “anything”. For example:

By replacing the specific number with the “*” at the end of the file name, we told the shell to hand file everything whose name starts with “-file” (as with cat, the pattern needs a leading “./” so the names aren’t mistaken for options). We could have been more specific and passed “./-file0[0-9]” to say that only the last digit changes in the filenames we want, but the wildcard works just as well in this case.
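To see what the shell actually does with these wildcard patterns, here is a sketch that creates ten placeholder files in a scratch directory and expands a few patterns against them. The file contents are invented for the example; echo stands in for file so the expansion itself is visible:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
for n in 0 1 2 3 4 5 6 7 8 9; do
  printf 'placeholder %s\n' "$n" > "./-file0$n"
done

echo ./-file*        # "*" matches any run of characters
echo ./-file0[0-9]   # "[0-9]" matches exactly one digit
echo ./-file0?       # "?" matches exactly one character of any kind

# The command only ever sees the expanded list of names.
if command -v file >/dev/null 2>&1; then
  file ./-file*
fi

all=$(echo ./-file*)
digits=$(echo ./-file0[0-9])
set -- ./-file*
count=$#
```

All three patterns expand to the same ten names here, which is why the plain “*” is good enough for this level.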

Now that we know which file we want we can go ahead and invoke cat on it:

Great, let’s move on to level 5.

The hints for this level are very similar to the previous one but with more “qualities” or attributes to the file we’re looking for. Let’s see what we have to work with:

Yikes! That’s a lot of directories. Good thing we know a little bit about wildcards. Let’s try leveraging our newly minted knowledge of file:

Not quite what we were looking for: we already know that these are directories. Let’s check inside one of them, say “maybehere00”:

We have different names and colored files here. Good thing we know wildcards, right?

Using the wildcard before and after the word “file” let the command file know that we want all the files that have “file” in their name. We could also have just passed a solitary “*” and we’d get the same result:
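Here is a sketch of those two patterns in a scratch directory. The file names are hypothetical stand-ins (the real directory holds its own set), and the “./” prefix is there because some names start with “-”. One detail worth seeing for yourself: by default neither pattern picks up hidden files.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
# Hypothetical names, including one with a space and one hidden file.
touch ./-file1 ./-file2 "./spaces file1" ./.file1

printf '%s\n' ./*file*   # every non-hidden name containing "file"
printf '%s\n' ./*        # every non-hidden name

# Count the matches for each pattern.
set -- ./*file*; with_file=$#
set -- ./*;      everything=$#
set -- ./.file*; hidden=$#   # hidden files need the dot spelled out
```

In this sketch both wildcard patterns match the same three names, and the hidden “.file1” only shows up once the pattern itself starts with a dot.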

Once again, the wildcard tells file to match everything (strictly speaking, everything except hidden files, which a bare “*” skips by default). In any case, none of these files seem to be the one we’re looking for. As we can see, some of them have the ASCII text designation that we identified as “human-readable” in our last challenge. One of the hints specified that our desired file is exactly 1033 bytes in size. So let’s check the sizes of these:

Well, it’s none of these. Notice the “h” option we passed to ls? That’s the “human friendly” flag, which prints sizes in a format that’s easier for us to read. Compare that result to this one without the “h”:

In this case, leaving out the “h” option actually works to our benefit, since we’re looking for a file that is exactly 1033 bytes and the non-“human friendly” output gives exact sizes in bytes. Still, none of these files are what we’re looking for: the only file in this directory that comes close is “-file1” and it’s green. A green file name in linux usually signifies an executable file, which we can confirm from the x’s in the first column.
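The difference between the two listings is easy to reproduce. This sketch creates a file of exactly 1033 bytes in a scratch directory (exact.bin is a made-up name) and lists it both ways; how the rounded size is written depends on your version of ls:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
# Write exactly 1033 zero bytes into a placeholder file.
dd if=/dev/zero of=exact.bin bs=1033 count=1 2>/dev/null

ls -lh exact.bin   # size column is rounded for humans (GNU ls shows 1.1K)
ls -l exact.bin    # size column is the exact byte count

# In the long listing the size is the fifth column.
size=$(ls -l exact.bin | awk '{print $5}')
echo "$size"
```

A rounded size could describe many different files, which is why the plain listing is the better tool when the hint gives an exact byte count.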

So far we’ve only examined one directory and there are 19 more to sift through to find the file we’re looking for. Oh! Let’s look at the man page of find. We see that we can specify different attributes of the file we’re looking for. To review, we’re looking for a file that is: human-readable, 1033 bytes in size, and not executable. Unfortunately, find can’t search for files based on them being “human-readable”, but we can at least specify that we’re looking for a regular file and not a directory. There is just one other problem: it can look for executables, but how do we look for files that are not executable?

Enter operators: operators are special characters or words that the computer treats as an instruction rather than as plain text. The ones we need here are boolean, meaning the result is binary: true or false. In this case, if we tell find to look for an executable file, it’ll find files where the “executable?” question returns true. If we use the not operator, we can flip the “executable?” into a “not executable?”. Operators work much the same way across programming and scripting languages, but their notation may change. In find, as in bash (the programming language of most linux terminals), the not operator is “!”. So our command is going to be find ./ -type f -size 1033c ! -executable (the “c” after the size is simply how the man page for find tells us to designate bytes). Make sure you’re in the “inhere” directory before running that command. Can you guess why?
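To experiment with that command away from the server, here is a sketch that builds a tiny imitation of the directory tree and runs the same tests. The sizes, permissions and most file names are invented for the example, and -executable is a GNU find test that may be missing from other versions of find. The sketch starts from “.” which means the same thing as “./”:

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/inhere/maybehere00" "$scratch/inhere/maybehere07"
cd "$scratch/inhere" || exit 1

# Right size, but executable: should be rejected by "! -executable".
dd if=/dev/zero of=maybehere00/-file1 bs=1033 count=1 2>/dev/null
chmod +x maybehere00/-file1
# Not executable, but the wrong size: should be rejected by "-size 1033c".
dd if=/dev/zero of=maybehere00/-file2 bs=2048 count=1 2>/dev/null
# Right size and not executable: the only file that passes every test.
dd if=/dev/zero of=maybehere07/.file2 bs=1033 count=1 2>/dev/null

# -type f: regular files only; -size 1033c: exactly 1033 bytes;
# "!" negates the test that follows it.
found=$(find . -type f -size 1033c ! -executable)
printf '%s\n' "$found"   # prints ./maybehere07/.file2
```

Dropping the “!” flips the answer to the executable file in “maybehere00”, which is a quick way to convince yourself of what the not operator does.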

So according to find, our file should be in “maybehere07” and is called “.file2”, a hidden file. Let’s invoke cat to check what’s inside:

Awesome, we have our flag!

We’ve lightly touched on data types, wildcards, and boolean operators in this post and we’ll cover them in more depth and detail later. For now, this light introduction is sufficient.

Capture the Flag or CTF challenges are essentially puzzles in which a “flag” (like a password) is stored somewhere on a computer or device for the challengers to find. In fact, all the challenges we’ve attempted so far have been CTFs! CTFs are a great way to practice your ethical hacking skills since essentially all hacking, on both ends of the spectrum, is (in a way) a CTF challenge: there is some data of some kind on some machine that the hacker is trying to access. In our case, our “flags” have been the passwords for the next levels, which belong to authorized users of these machines.

See how easy it was to find our flag in this haystack of directories and files? This was all due to the fact that we knew exactly what attributes we were looking for. Without this knowledge, or intelligence, of the file’s attributes, it would have taken a more significant amount of effort to find the flag we’re looking for. This is another way security through obscurity really helps mitigate threats to our data. Now imagine storing sensitive data in a file called “sensitive data” or your passwords in a file called “passwords”.