File Formats

A file format is the standardized way that information is encoded for storage in a file. Without knowing the format, we could not easily decode the file (at least not without a lot of guesswork), and its contents would be a meaningless jumble of data. Determining the file format is therefore often the first step when investigating a file. It also keeps bad actors from slipping evidence past us as investigators by disguising what a file really is.

Are file extensions trustworthy?

We can look at a file's extension to make a guess about its format. If it is a .jpg file, then it is probably an image in JPEG format. However, file extensions can outright lie to us. Nothing stops us, or a bad actor, from renaming a text file document.txt to document.jpg. Renaming changes only the name, not the contents. So, what other ways are there to detect file formats?
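To make the point concrete, here is a minimal Python sketch of the renaming trick. The helper name rename_and_compare and the sample text are made up for illustration. It writes a plain text file, renames it to document.jpg, and then shows that a name-based guess (Python's standard mimetypes module, which looks only at the extension) reports an image type even though the bytes are unchanged.

```python
import mimetypes
import os
import tempfile


def rename_and_compare(text: bytes = b"just plain text\n"):
    """Rename document.txt to document.jpg and report what changed.

    Hypothetical helper for illustration only. Returns the file's
    contents after the rename and the MIME type guessed from its name.
    """
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "document.txt")
        renamed = os.path.join(tmp, "document.jpg")

        # Create an ordinary text file.
        with open(original, "wb") as f:
            f.write(text)

        # Renaming touches only the name, never the stored bytes.
        os.rename(original, renamed)

        with open(renamed, "rb") as f:
            content = f.read()

        # mimetypes guesses from the extension alone; it never opens the file.
        guessed_type, _ = mimetypes.guess_type(renamed)
        return content, guessed_type
```

The returned content is still the original text, while the guessed type is an image type, which is exactly why the extension alone cannot be trusted.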

Magic numbers

A different approach is to look at the binary representation of a file. Many formats begin with a section called the file header, and the first few bytes of that header are often the file's magic number, a fixed byte sequence that identifies the format. For example, a GIF file starts with a magic number that decodes to GIF87a or GIF89a in ASCII. Magic numbers give a more reliable guess than the extension, because changing them means editing the file's bytes, which takes more technical skill than renaming. They can still be forged, so treat them as strong hints rather than proof.
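A magic-number check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SIGNATURES table is a tiny hand-picked sample of well-known signatures, and the function names guess_from_header and guess_format are hypothetical. Real tools recognize far more formats and look beyond the first bytes.

```python
from typing import Optional

# A deliberately small, hand-picked sample of well-known magic numbers.
# This table is an assumption for illustration; it is nowhere near complete.
SIGNATURES = [
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
]


def guess_from_header(header: bytes) -> Optional[str]:
    """Return a format name if the header starts with a known magic number."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None  # Unknown signature: fall back to other methods.


def guess_format(path: str) -> Optional[str]:
    """Read the first few bytes of a file and compare them to the table."""
    with open(path, "rb") as f:
        return guess_from_header(f.read(16))
```

Calling guess_format("document.jpg") on a renamed text file would return None, because the first bytes do not match any image signature, no matter what the extension claims.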

Take a guess

Finally, if the previous methods fail, it's time to make a best guess. You can guess manually based on the file's size, contents, and other metadata, or let a program do it for you. If you are a UNIX user, you can use the file command in the terminal to check the file format. For example, run file mysteryFile to see what it makes of mysteryFile. The command is available by default on most, if not all, UNIX and UNIX-like operating systems.
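If you want to call the file command from a script instead of typing it by hand, the following Python sketch shows one way to do it. The function name describe_with_file is hypothetical, and the sketch assumes the common file implementation, whose -b option prints the description without repeating the file name. It returns None when the command is not installed.

```python
import shutil
import subprocess
from typing import Optional


def describe_with_file(path: str) -> Optional[str]:
    """Ask the system's file command what it thinks a file is.

    Hypothetical wrapper for illustration. Returns None if the
    file command is not available on this machine.
    """
    if shutil.which("file") is None:
        return None

    # -b (brief) omits the leading "path:" prefix from the output.
    result = subprocess.run(
        ["file", "-b", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Example use: describe_with_file("mysteryFile")
```

The description that comes back is the tool's best guess, so it deserves the same caution as any other single indicator.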

Tools