Inside FAT: File Search
In 2013, there are plenty of file systems around: FAT, NTFS, HFS and many others, used by many different operating systems. And yet one of the oldest and simplest of them all is still going strong. The FAT system is aged, and has strict limits on maximum volume size and on the size of a single file.
This file system is rather simplistic by today’s standards. It does not offer any kind of permission management, nor built-in transaction roll-back and recovery mechanisms. There is no built-in compression or encryption either. And yet it is very popular for many applications. The FAT system is so simple to implement, requires so few resources and imposes such a small overhead that it becomes irreplaceable for a wide range of mobile applications.
FAT is used in most digital cameras. The majority of memory cards (for example miniSD and microSD) used in media players, smartphones and tablets are formatted with FAT. Even Android devices take memory cards formatted with the FAT system. In other words, despite its age, FAT is alive and kicking.
Recovering Information from FAT Volumes
If the FAT system is so popular, there must be a need for data recovery tools supporting that file system. In this article we’ll be sharing experience gained during the development of a data recovery tool, Hetman Partition Recovery.
Before we talk about the internals of the file system, let’s have a brief look at why data recovery is possible at all. As a matter of fact, the operating system (Windows, Android, or whatever system is used in a digital camera or media player) does not actually wipe or destroy information once a file gets deleted. Instead, the system marks a record in the file system to advertise the disk space previously occupied by the file as available. The record itself is marked as deleted. This approach is much faster than actually wiping disk content. It also reduces wear.
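On FAT volumes specifically, the "deleted" mark is a single byte: the first byte of the file's 32-byte directory entry is overwritten with 0xE5, while the size and first-cluster fields stay in place. The following is a minimal Python sketch of reading such an entry. The function name parse_dir_entry and the layout of the returned dictionary are illustrative assumptions, not part of any particular tool, and long file names are ignored.

```python
import struct

DELETED_MARKER = 0xE5  # first byte of a deleted FAT directory entry
ENTRY_SIZE = 32        # every short (8.3) directory entry is 32 bytes


def parse_dir_entry(entry: bytes) -> dict:
    """Decode the basic fields of one 32-byte short directory entry."""
    if len(entry) != ENTRY_SIZE:
        raise ValueError("a FAT directory entry is exactly 32 bytes")

    deleted = entry[0] == DELETED_MARKER
    base, ext = entry[0:8], entry[8:11]
    if deleted:
        # The first character of the name was overwritten by the marker.
        base = b"?" + base[1:]
    base_text = base.decode("ascii", "replace").rstrip()
    ext_text = ext.decode("ascii", "replace").rstrip()
    name = base_text + "." + ext_text if ext_text else base_text

    # High word of the first cluster (offset 20) is used on FAT32 and is
    # normally zero on FAT12/FAT16; low word is at offset 26, size at 28.
    (cluster_hi,) = struct.unpack_from("<H", entry, 20)
    cluster_lo, size = struct.unpack_from("<HI", entry, 26)

    return {
        "deleted": deleted,
        "name": name,
        "first_cluster": (cluster_hi << 16) | cluster_lo,
        "size": size,
    }
```

Even for a deleted entry, the sketch still returns the file size and the first cluster, which is the starting point for everything discussed below.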
As you can see, the actual content of a file remains available somewhere on the disk. This is what allows data recovery tools to work. The question now is how to identify which sectors on the disk contain information belonging to a particular file. In order to do that, a data recovery tool could either analyze the file system or scan the content area on the disk looking for deleted files by matching the raw content against a database of pre-defined persistent signatures.
This second method is often called “signature search” or “content-aware analysis”. In forensic applications, this same approach is called “carving”. Whatever the name, the algorithms are very similar. They read the entire disk surface looking for characteristic signatures identifying files of certain supported formats. Once a known signature is encountered, the algorithm will perform a secondary check, then read and parse what appears to be the file’s header. By analyzing the header, the algorithm can determine the exact length of the file. By reading disk sectors following the beginning of the file, the algorithm recovers what it assumes to be the content of a deleted file.
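The first stage of such a scan can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm used by any specific product: the signature table holds only two well-known formats, the function name find_signatures is hypothetical, and the sketch assumes that files start on sector boundaries. The secondary check and header parsing described above are left out.

```python
SECTOR_SIZE = 512  # assumed sector size for this sketch

# Two well-known file signatures ("magic numbers"); a real tool
# would carry a much larger database.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def find_signatures(image: bytes, sector_size: int = SECTOR_SIZE):
    """Return (offset, kind) pairs for sectors that start with a known signature."""
    hits = []
    for offset in range(0, len(image), sector_size):
        for kind, magic in SIGNATURES.items():
            # Only the beginning of each sector is inspected.
            if image.startswith(magic, offset):
                hits.append((offset, kind))
    return hits
```

Each hit is only a candidate: the bytes may just as well be random data inside another file, which is why a secondary check is needed before anything is recovered.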
If you’re following carefully, you may have already noticed several issues with this approach. It is extremely slow, and it can only identify a finite number of known (supported) file formats. Most importantly, this approach assumes that the disk sectors following the file’s header belong to that particular file, which is not always true. Files are not always stored contiguously. Instead, the operating system can write chunks into the first available clusters on the disk. As a result, the file can be fragmented into multiple pieces. Recovering fragmented files with signature search is hit or miss: short, unfragmented files are usually recoverable without breaking a sweat, while long, fragmented ones may not be recovered at all or may come out damaged.
In practice, signature search does work pretty well. Most files that are of any importance to the user are documents, pictures, and other similarly small files. Granted, a lengthy video may not be recovered, but a typical document or JPEG image is usually small enough to be stored in one piece and recovers pretty well.
If, however, one needs to recover fragmented files, the tool must combine information obtained from the file system and gathered during the disk scan. This, for example, allows excluding clusters that are already occupied by other files, which, as we’ll see in the next chapter, greatly improves the chance of successful recovery.
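One way to picture the idea of excluding occupied clusters is the heuristic below. It is a deliberately naive Python sketch under stated assumptions: the set of occupied clusters has already been obtained from the file system, the deleted file's pieces were written in ascending order, and the function name candidate_clusters is hypothetical.

```python
def candidate_clusters(start, needed, occupied, total_clusters):
    """Pick `needed` clusters from `start` onward, skipping occupied ones.

    start          -- first cluster of the deleted file
    needed         -- number of clusters the file should span
    occupied       -- set of clusters known to belong to existing files
    total_clusters -- number of clusters on the volume
    """
    picked = []
    cluster = start
    while cluster < total_clusters and len(picked) < needed:
        if cluster not in occupied:
            picked.append(cluster)
        cluster += 1
    return picked
```

For example, with a start cluster of 10, four clusters needed and clusters 11 and 12 occupied by another file, the sketch selects clusters 10, 13, 14 and 15 instead of blindly reading 10 through 13.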
Using Information from the File System
As we have seen, signature search alone works great if there is no file system left on the disk, or if the file system is so badly damaged that it becomes unusable. In all other cases, information obtained from the file system can greatly improve the quality of the recovery.
Let’s take a large file we need to recover. Suppose the file was fragmented (as is typical for larger files). Simply using signature search will result in only the first fragment of the file being recovered; the other fragments will not be recovered correctly. It is therefore essential to determine which sectors on the disk belong to that particular file.
Windows and other operating systems determine which sectors belong to which file by enumerating records in the file system. File system records contain information about which sectors belong to which file.
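In FAT, this information takes the form of a chain: the directory entry names the first cluster, and the file allocation table stores, for each cluster, the number of the next one. The Python sketch below follows such a chain on a FAT16 table. The helper names read_fat16 and cluster_chain are illustrative assumptions, and FAT12 and FAT32 (which use different entry widths and end-of-chain values) are not handled.

```python
import struct

FAT16_EOC = 0xFFF8  # values at or above this mark the end of a chain
FAT16_BAD = 0xFFF7  # marks a bad cluster
FAT16_FREE = 0x0000  # marks a free cluster


def read_fat16(raw: bytes):
    """Decode a raw FAT16 table into a list of 16-bit little-endian entries."""
    count = len(raw) // 2
    return list(struct.unpack_from(f"<{count}H", raw))


def cluster_chain(fat, first_cluster):
    """Follow the chain starting at first_cluster and return the clusters visited."""
    chain = []
    seen = set()
    cluster = first_cluster
    # Data clusters are numbered from 2; the `seen` set guards against
    # loops in a damaged table.
    while 2 <= cluster < len(fat) and cluster not in seen:
        chain.append(cluster)
        seen.add(cluster)
        nxt = fat[cluster]
        if nxt >= FAT16_EOC or nxt == FAT16_BAD or nxt == FAT16_FREE:
            break
        cluster = nxt
    return chain
```

For a live file this yields the complete list of clusters. For a deleted file the chain entries are normally reset to "free" at deletion time, so the sketch returns only the first cluster; the rest has to be inferred from other clues.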
Searching for a File System
Before analyzing the file system, we must identify and locate one first. But before we start looking for a file system, let’s look at how Windows handles partitions.
In Windows, disks are described with a partition table containing one or more records. Each record describes a single partition. The record contains the partition’s initial address as well as its length. The partition type is also specified.
1. The hard drive is divided into three partitions with corresponding volume labels.
2. This table contains information about the type, beginning and end of each partition.
In order to locate the file system, the data recovery tool must analyze the partition table, if one is still available. But what if there is no partition table left, or what if the disk has been repartitioned, and the new partition table no longer contains information about the deleted volume? If this is the case, the tool will scan the disk in order to identify all available file systems.
Table 1. Partition table.
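For the classic MBR layout, reading the partition table can be sketched as follows. This Python example assumes a 512-byte first sector with four 16-byte records starting at offset 446; GPT disks and extended partitions are not covered, and the function name parse_partition_table is hypothetical.

```python
import struct

PARTITION_TABLE_OFFSET = 446  # start of the four partition records in the MBR
PARTITION_ENTRY_SIZE = 16
MBR_SIGNATURE = b"\x55\xaa"   # last two bytes of the sector


def parse_partition_table(sector0: bytes):
    """Return the non-empty partition records found in an MBR sector."""
    if len(sector0) < 512 or sector0[510:512] != MBR_SIGNATURE:
        raise ValueError("no valid MBR signature found")

    partitions = []
    for index in range(4):
        offset = PARTITION_TABLE_OFFSET + index * PARTITION_ENTRY_SIZE
        ptype = sector0[offset + 4]  # partition type byte
        # Starting sector (LBA) and length in sectors, both little-endian.
        first_lba, sectors = struct.unpack_from("<II", sector0, offset + 8)
        if ptype == 0:
            continue  # type 0 means the record is unused
        partitions.append({
            "index": index,
            "type": ptype,
            "first_lba": first_lba,
            "sectors": sectors,
        })
    return partitions
```

Each returned record gives the tool a sector address at which to look for the beginning of a file system.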
When looking for a file system, the algorithm assumes that each logical volume contains a file system. Most file systems can be identified by looking for a certain persistent signature. For instance, the FAT file system is identified by the values recorded at byte offsets 510 and 511 of the volume’s initial sector. If the values recorded at those addresses are 0x55 and 0xAA, the tool will start performing a secondary check.
The secondary check allows the tool to ensure that an actual file system has been found, as opposed to a random match. The secondary check validates certain values used by the file system. For example, one of the fields in the FAT boot sector holds the number of sectors per cluster. This value is always a power of two: it can be 1, 2, 4, 8, 16, 32, 64 or 128. If any other value is stored at that address, the structure is not a FAT file system.
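The two checks described above can be combined into a short Python sketch. The field offsets (13 for sectors per cluster, 11 for bytes per sector) follow the standard FAT boot sector layout. The bytes-per-sector test is an extra check added here for illustration and is not mentioned in the text above, and the function name looks_like_fat_boot_sector is hypothetical.

```python
import struct

VALID_SECTORS_PER_CLUSTER = {1, 2, 4, 8, 16, 32, 64, 128}
VALID_BYTES_PER_SECTOR = {512, 1024, 2048, 4096}


def looks_like_fat_boot_sector(sector: bytes) -> bool:
    """Apply the signature check and a basic secondary check to one sector."""
    if len(sector) < 512:
        return False

    # Primary check: signature bytes at offsets 510 and 511.
    if sector[510] != 0x55 or sector[511] != 0xAA:
        return False

    # Secondary check: sectors per cluster must be a power of two up to 128.
    sectors_per_cluster = sector[13]
    if sectors_per_cluster not in VALID_SECTORS_PER_CLUSTER:
        return False

    # Extra check (an addition in this sketch): plausible sector size.
    (bytes_per_sector,) = struct.unpack_from("<H", sector, 11)
    return bytes_per_sector in VALID_BYTES_PER_SECTOR
```

A sector that passes is still only a likely candidate; a production tool would presumably validate more fields before trusting the structure.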
The article “Algorithm of file recovery from a FAT disk” explains the process of searching for the contents of a deleted file, and provides some examples.