Recovering Information with Signature Search

Signature search is a major data recovery technology that revolutionized the entire recovery market. Products using signature search can do things that seem almost magical. Tools employing signature search can undelete files deleted a long time ago, recover data from formatted and repartitioned hard drives, and extract information from disks with damaged or missing file systems.


Those who read our previous article “Why Deleted Files Can Be Recovered” will undoubtedly question these seemingly magical capabilities. Indeed, classic data recovery tools work by scanning the file system, detecting records that point to deleted files or folders, and determining the exact location of the deleted files by analyzing the existing file system records. Note that the file system record for a file must still exist on the disk in order to be detected, located and analyzed.

But what if there is no such record? What if the file was deleted a long time ago and the corresponding record was overwritten with another one? Or what if the entire file system was erased (by formatting the disk), corrupted or destroyed after a system failure? If this is the case, classic file recovery tools will fail to locate any meaningful information on the disk.

Signature Search

Signature search was developed and released at about the same time by several different companies, so the technology goes by multiple trade names: “Power Search”, “Content-Aware Analysis”, “Smart Scan”, and no doubt others. Under the hood, however, all these algorithms use the same underlying principle.

How Signature Search Works

Detecting Files

Signature search borrowed its main operating principle from anti-virus programs. Anti-virus tools read an entire file, scanning its content for known signatures in order to identify viruses. Similarly, signature search reads the entire contents of the hard disk, volume or partition, analyzing every sector for characteristic signatures that could belong to known file types. There’s no miracle here, as many file formats do have very characteristic signatures that make them easy to identify. In addition, most such signatures are located at or near the very beginning of the file, and because a file normally starts on a sector boundary, the tool only has to check the first few bytes of each sector, making the analysis even faster and more reliable. Examples of such signatures are “JFIF” for JPEG, “PK” followed by certain binary information for ZIP archives, “%PDF-” for Adobe PDF files, and so on.
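To make the idea concrete, here is a minimal sketch in Python of a sector-by-sector signature scan. It is an illustration under stated assumptions, not the algorithm used by any particular product: the 512-byte sector size, the tiny signature table and the function name scan_sectors are all chosen for this example only.

```python
SECTOR_SIZE = 512  # assumed sector size; real disks may use 4096

# (offset inside the sector, signature bytes, label).
# A tiny illustrative table; real tools know hundreds of formats.
SIGNATURES = [
    (6, b"JFIF", "jpeg"),       # JFIF marker follows the 6-byte JPEG prefix
    (0, b"PK\x03\x04", "zip"),  # "PK" followed by binary data
    (0, b"%PDF-", "pdf"),
]


def scan_sectors(data, sector_size=SECTOR_SIZE):
    """Return (sector_index, label) for every sector that carries a known signature."""
    hits = []
    for sector_start in range(0, len(data), sector_size):
        for sig_offset, magic, label in SIGNATURES:
            pos = sector_start + sig_offset
            if data[pos:pos + len(magic)] == magic:
                hits.append((sector_start // sector_size, label))
                break  # one label per sector is enough for this sketch
    return hits
```

Each hit marks only a candidate start of a file. Working out where the file ends is a separate step.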

Some files do not have a characteristic signature, but can still be located. These include text and HTML files, which use only a limited subset of the ASCII character table.
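A simple way to picture this kind of detection is a check on how much of a sector consists of printable ASCII characters. The sketch below is a rough heuristic for illustration only; the 95% threshold and the name looks_like_text are assumptions, not values taken from any real recovery tool.

```python
# Printable ASCII plus tab, line feed and carriage return.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def looks_like_text(sector, threshold=0.95):
    """Return True if at least `threshold` of the bytes are plain-text ASCII."""
    if not sector:
        return False
    text_count = sum(1 for b in sector if b in TEXT_BYTES)
    return text_count / len(sector) >= threshold
```

A sector filled with HTML markup passes this check, while a sector of compressed or binary data almost never does.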

Determining File Length

After locating the beginning of a file, the signature search algorithm performs further analysis in order to calculate the file’s length. The length of *.zip, *.jpeg, *.avi, *.psd, *.pst, *.rar and *.tiff files can often be derived from the file’s header. Sometimes the tool has to continue reading subsequent disk sectors in order to locate a marker defining the end of the file, and sometimes (as is the case with text and HTML files) the end of the file is determined by the appearance of a certain number of non-ASCII symbols.
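The last two strategies can be sketched in a few lines of Python. Both functions are simplified illustrations with hypothetical names. The first looks for the JPEG end-of-image marker (bytes FF D9), and a naive search like this can stop too early when a JPEG embeds a thumbnail. The second assumes that a text file ends at a run of four consecutive non-ASCII bytes; the run length of four is an assumption made for this example.

```python
from typing import Optional

# Printable ASCII plus tab, line feed and carriage return.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def jpeg_length(data: bytes, start: int) -> Optional[int]:
    """Length of a JPEG beginning at `start`, up to and including the first FF D9."""
    end = data.find(b"\xff\xd9", start)
    return None if end == -1 else end + 2 - start


def text_length(data: bytes, start: int, max_bad: int = 4) -> int:
    """Length of a text run from `start` until `max_bad` consecutive non-text bytes."""
    bad_run = 0
    for pos in range(start, len(data)):
        if data[pos] in TEXT_BYTES:
            bad_run = 0
        else:
            bad_run += 1
            if bad_run >= max_bad:
                # Exclude the run of non-text bytes from the result.
                return pos - bad_run + 1 - start
    # Reached the end of the data; drop any trailing non-text bytes.
    return len(data) - start - bad_run
```

Formats that store their size in the header need no such scanning: the tool reads the size field and takes that many bytes.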

After detecting the beginning and length of a file, signature search can attempt the recovery.

Limitations of Signature Search Algorithms

Signature search is not magic. There are cases where even the best algorithm can’t recover a file. Obviously, if the disk space occupied by the original file has been partially overwritten with other data, the file will be only partially recoverable at best. However, disk fragmentation plays a more important role in limiting the usefulness of signature search. Larger files (e.g. movie clips) are often stored on the disk in multiple fragments scattered around the disk surface. This often occurs when there is not enough contiguous free space available on the disk to store the file in one chunk. Because signature search assumes that a file occupies consecutive sectors starting at its signature, such files can’t be recovered correctly with signature search alone.
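The effect of fragmentation is easy to reproduce on a toy disk image. In the sketch below, the 8-byte cluster size, the disk layout and all function names are made up for illustration. A 16-byte file is stored in clusters 1 and 4, while cluster 2 belongs to another file.

```python
CLUSTER = 8  # unrealistically small cluster size, for illustration only


def build_demo_disk():
    """Build a 6-cluster disk holding one file split across clusters 1 and 4."""
    disk = bytearray(CLUSTER * 6)
    disk[1 * CLUSTER:2 * CLUSTER] = b"AAAAAAAA"  # first fragment of the file
    disk[2 * CLUSTER:3 * CLUSTER] = b"XXXXXXXX"  # data from an unrelated file
    disk[4 * CLUSTER:5 * CLUSTER] = b"BBBBBBBB"  # second fragment of the file
    return bytes(disk)


def carve_contiguous(disk, start_cluster, length):
    """What signature search does: read `length` bytes in one contiguous run."""
    begin = start_cluster * CLUSTER
    return disk[begin:begin + length]


def read_fragments(disk, clusters, length):
    """What a file system record allows: follow the actual list of clusters."""
    data = b"".join(disk[c * CLUSTER:(c + 1) * CLUSTER] for c in clusters)
    return data[:length]
```

Here carve_contiguous(disk, 1, 16) returns the first fragment followed by the unrelated data, while read_fragments(disk, [1, 4], 16) returns the original file. Only the second call is possible when the cluster list survives in a file system record.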

Hybrid Algorithms

Some of the limitations of signature search can be lifted by applying a hybrid analysis approach. Hybrid algorithms analyze the file system and then scan the disk surface, getting the best of both worlds and achieving the best possible recovery rates. Most data recovery tools on the market today use the hybrid approach.

Author: Michael Miroshnichenko