Inside FAT: File Recovery
Read about the process of data recovery from a FAT disk and the algorithm used for file recovery. Now when we found the file system, we can start analyzing its records. Our goal is identifying addresses of the physical sectors on the disk that contain data belonging to a deleted file. In order to do that, a data recovery algorithm will scan the file system and enumerate its records.
Search algorithm and the structure of the FAT partition table
In the first part of the article, we took a look at the search algorithm and the structure of the FAT partition table. In the FAT system, each file and directory has a corresponding record in the file system, a so-called directory entry. Directory entries contain information about the file including its name, attributes, initial address and length.
The content of a file or directory is stored in data blocks of equal length. These data blocks are called clusters. Each cluster contains a certain number of disk sectors. This number is a fixed value for each FAT volume. It’s recorded in the corresponding file system structure.
The tricky part is when a file or directory contains more than a single cluster. Subsequent clusters are identified with data structures called FAT (File Allocation Table). These structures are used to identify subsequent clusters that belong to a certain file, and to identify if a particular cluster is occupied or available.
Before analyzing the file system, it is essential to identify the three system areas.
- The first area is reserved; it contains essential information about the file system. In FAT12 and FAT16, this area is one sector long. FAT32 can use more than one sector. The size of this area is specified in the boot sector.
- The second area belongs to the FAT system, and contains primary and secondary structures of the file system. This area immediately follows the reserved area. Its size is defined by the size and number of FAT structures.
- Finally, the last area contains the actual data. The content of files and directories is stored in this particular area.
When analyzing the file system, the FAT area will be of principal interest. It is this area that contains information on files’ physical addresses on the disk.
Figure 1. Physical structure of the FAT file system.
When analyzing the file system, it is essential co correctly determine the three system areas. The reserved area always begins at the very beginning of the file system (sector number 0). The size of this area is specified in the boot sector. In FAT12 and FAT16 the size of this area is exactly one sector. In FAT32, this area may occupy several sectors.
The FAT area immediately follows the reserved area. The FAT area contains one or more FAT structures. The size of this area is calculated by multiplying the number of FAT structures by the size of each structure. These values are also stored in the boot sector.
We’re finally close to recovering our first file. Let’s assume the file has been just recently deleted, and no part of the file was overwritten with other data. This means that all clusters previously used by this file are now marked as available.
It is important to note that the system also can erases the corresponding FAT records. This means that we can get information about the file’s initial address, its attributes and size, but have no way to obtain data on any subsequent clusters.
At this point, we cannot recover the entire list of clusters that belong to the deleted file. However, we can still try to recover the file’s content by reading the first cluster. If the file is reasonably small and fits into a single cluster, great! We’ve just recovered the file. If, however, the file is larger than the size of a single cluster, we have to develop an algorithm to recover the rest of the file.
The FAT system offers no easy way to determine which clusters belong to a deleted file, so this task is always a bit of a guessing game. The simplest way is just reading the clusters following the initial one, ignoring whether or not these clusters are occupied by other files. However silly it may sound, this is the only method available if no file system is available or if the file system is empty (e.g. after formatting the disk).
The other method is more sophisticated, only reading information from clusters that aren’t occupied with data belonging to other files. This method takes into account information on clusters occupied by other files as specified in the file system.
It is logical to assume that the second method yields better results compared to the first one (assuming that the file system is available and not empty). The second method can even recover some fragmented files.
Figure 2 shows three different scenarios of recovering a file occupying 6 clusters of the file system. The file size is 7094 bytes; cluster size is 2048 bytes. This means that the deleted file initially occupied 4 clusters. In addition, we know the address of the initial cluster (cluster 56). For demonstration purposes, clusters that used to belong to the deleted files are colored blue. Red color marks clusters occupied with other data, while empty clusters are filled white.
Figure 2. Three file recovery scenarios.
- In scenario 2.A, the file occupies 4 subsequent clusters (that is, the file is not fragmented). In this case, the file can be recovered correctly by either algorithm. Both algorithms will correctly read clusters 56 through 59.
- In scenario 2.B, the file was fragmented and stored in 3 fragments. Clusters 57 and 60 are used by other file. In this scenario, the first algorithm will recover clusters 56 through 59, which will return a corrupted file. The second method will correctly recover clusters 56, 58, 59 and 61.
- In the final scenario 2.C, the deleted file was also fragmented (same clusters as in 2.B). However, clusters 57 and 60 are not used by any other file. In this scenario, both algorithms will recover clusters 56 through 59, both returning a corrupted file.
As we can see, neither method is perfect, but the second algorithm offers a higher chance of successful recovery compared to the first one. This method is used in Hetman Partition Recovery.
In our simple scenario we assumed that all parts of the file are still available and not overwritten with other data. In real life, this is not always the case. If some parts of a file are taken by other files, no algorithm will be able to recover the file completely.
As we could see, the FAT file system is pretty simple. This is exactly the reason it is so widely used by manufacturers of CompactFlash, Memory Stick and xD-Picture Card. However, when it comes to data recovery, FAT is not the best file system around. With a certain level of fragmentation, some files may or may not be recovered regardless of which approach is chosen.
Fortunately, Windows no longer uses the FAT system to format internal hard drives. The use of a much newer and more sophisticated file system, the NTFS, is enforced when formatting a system disk, and highly recommended when formatting other partitions. The NTFS is a much more complicated file system. NTFS internals will be discussed in the upcoming article.