Inside NTFS: Files in the NTFS System
In this article, we’ll explain how file recovery works on NTFS-formatted disks. In our previous article we discussed how data recovery tools are able to recover information, and covered FAT, one of the two major file systems used on Windows computers. Today, we’ll discuss the other file system: NTFS. The authors of this article drew on their experience developing Hetman Partition Recovery, a universal data recovery tool that works with both FAT and NTFS formatted devices.
What is NTFS?
Microsoft developed the New Technology File System to replace the already aging FAT. Instead of attempting to maintain compatibility with the older systems, Microsoft decided to develop the new file system from scratch. As a result, the NTFS was an all-new design that dropped the legacy of the file allocation table and implemented a modular approach, making the new file system more logical and straightforward than its predecessor.
Compared to FAT, the new file system was made extremely robust and feature rich. To this day, the NTFS remains among the best file systems, serving reliably on millions of computers.
Everything is a File
In FAT, no single entity could be called a “file”. The NTFS took the definition of a file to a whole new level, introducing a completely reworked concept of storing information. In the new file system all types of data, down to system structures, are uniformly represented as files. Moreover, in NTFS the file system itself is stored in individual files!
NTFS stores all system and administration data of the file system in files. This is the same information that other file systems keep in hidden areas, normally located at the beginning of the disk at fixed physical addresses. In NTFS there is no need to reserve any specific physical addresses on the disk for any specific type of data such as file allocation tables, partition tables or transaction logs. This information is stored as ordinary files that can be physically located anywhere on the NTFS volume. If required, these files can be resized (usually grown; the file tables grow fast when the number of files stored on the volume increases). When resizing these files, the file system uses exactly the same mechanisms it applies to all other files such as pictures and documents. Moreover, if there is no contiguous chunk of free space available on the volume, the file system will simply fragment the file across the available chunks of free space.
This concept signifies a major difference between the NTFS and most other file systems. Unlike other file systems, the NTFS has no fixed structure tied to certain physical addresses on the HDD. Unlike FAT, it does not have specific areas dedicated to system structures, file tables or data. In NTFS, the entire file system is considered a data area, so any file can be stored in any part of the volume. The only unavoidable exception is the boot sector and boot code located in the first several sectors of the volume.
Master File Table (MFT)
NTFS stores information about the files and directories in the Master File Table (MFT). This file table contains information about every file and directory listed in the file system. Each file or directory has at least one record in MFT.
The format of the MFT records is extremely simple. Each record is exactly 1 KB in size. The first 42 bytes in the header have a fixed structure, while the rest of the record is used to store attributes such as the file name or system attributes. The number of attributes as well as the size of each attribute can vary.
A notable feature of NTFS is the ability to store small files right on the spot. The entire content of a small file can be stored as an attribute in an MFT record, greatly improving reading performance and decreasing wasted disk space (“slack” space).
Fig. 1. An MFT record including the header and three attributes.
MFT Record Format
According to specifications, MFT record size is determined by the value of a variable in the boot sector. In practical terms, all current versions of Microsoft Windows use 1024-byte records. The first 42 bytes store the header, which contains 12 fields. The remaining 982 bytes do not have a fixed structure and are used to keep attributes.
MFT record format is simple and well laid out, ensuring fast file operations for normal work while also providing means for locating deleted files.
You may think of MFT records as labeled deposit boxes. The label (the first 42 bytes) identifies and describes the box, while the space inside the box (982 bytes) can hold a variety of things (attributes). Their number and size are limited only by the available space.
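The “label and box” layout can be sketched in a few lines of Python. This is only an illustration under the sizes quoted above (1024-byte records, 42-byte header); the function name is hypothetical, the record is synthetic, and the 12 header fields are not decoded because their layout is not given here.

```python
RECORD_SIZE = 1024   # record size quoted in this article
HEADER_SIZE = 42     # fixed-structure header size quoted in this article


def split_mft_record(record: bytes) -> tuple[bytes, bytes]:
    """Return (header, attribute_area) for one raw MFT record.

    Hypothetical helper: it only slices the record, it does not decode it.
    """
    if len(record) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(record)}")
    return record[:HEADER_SIZE], record[HEADER_SIZE:]


# Synthetic, all-zero record used purely for illustration.
header, attribute_area = split_mft_record(bytes(RECORD_SIZE))
print(len(header), len(attribute_area))  # 42 982
```

Keeping the two sizes as named constants makes it easy to adapt the sketch if the boot sector specifies a different record size.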
Addressing MFT Records
MFT records are addressed in a 48-bit system. The first record has the address of zero, and the address of the last record changes as the MFT grows. Dividing the size of the $MFT file by the size of each record gives the number of records; because addressing starts at zero, the address of the last record is that number minus one. Since each record is exactly 1 KB in current versions of Windows, the calculation is trivial.
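The arithmetic is simple enough to show directly. In the sketch below, dividing the $MFT size by the record size yields the number of records, and since addresses start at zero the last address is one less. The function names and the 256 KB file size are hypothetical, chosen only for the example.

```python
RECORD_SIZE = 1024  # bytes per MFT record, as quoted in this article


def record_count(mft_size_bytes: int, record_size: int = RECORD_SIZE) -> int:
    """Number of whole records that fit in an $MFT file of the given size."""
    return mft_size_bytes // record_size


def last_record_address(mft_size_bytes: int, record_size: int = RECORD_SIZE) -> int:
    """Address of the last record; addresses start at zero."""
    count = record_count(mft_size_bytes, record_size)
    if count == 0:
        raise ValueError("the $MFT file is smaller than one record")
    return count - 1


def record_offset(address: int, record_size: int = RECORD_SIZE) -> int:
    """Byte offset of a record inside the $MFT file (not on the disk)."""
    return address * record_size


mft_size = 256 * 1024  # hypothetical 256 KB $MFT file
print(record_count(mft_size))         # 256
print(last_record_address(mft_size))  # 255
print(record_offset(313))             # 320512
```

Note that `record_offset` is an offset within the $MFT file itself. Because $MFT is an ordinary file that may be fragmented, translating that offset to a physical disk location is a separate step.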
Each MFT record also carries a 16-bit index number called the MFT record number (Microsoft documentation refers to this value as the sequence number). This number increases every time the record is reused for a new file.
Let us take, for example, MFT record 313 with an index number of 1. If we delete the file that occupies that record and the record is then allocated to a different file, the record will receive an index value of 2.
The file address is formed in the following way. The 48-bit address of an MFT record is joined with the 16-bit MFT record number, which occupies the high 16 bits. This way, the system creates a unique 64-bit base file address.
Fig. 2. Base file address made by joining the MFT record address with its number.
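The joining of the two values can be illustrated with a little bit arithmetic. The following is a minimal sketch, not code from any recovery tool: the function names are hypothetical, and the only layout assumed is the one described above (48-bit record address in the low bits, 16-bit MFT record number in the high 16 bits).

```python
ADDRESS_BITS = 48
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1  # low 48 bits
NUMBER_MAX = 0xFFFF                     # largest 16-bit value


def make_file_address(record_address: int, record_number: int) -> int:
    """Join a 48-bit record address with a 16-bit MFT record number."""
    if not 0 <= record_address <= ADDRESS_MASK:
        raise ValueError("record address does not fit in 48 bits")
    if not 0 <= record_number <= NUMBER_MAX:
        raise ValueError("record number does not fit in 16 bits")
    return (record_number << ADDRESS_BITS) | record_address


def split_file_address(file_address: int) -> tuple[int, int]:
    """Return (record_address, record_number) from a 64-bit file address."""
    return file_address & ADDRESS_MASK, file_address >> ADDRESS_BITS


# The example from the text: record 313 after it has been reused once.
address = make_file_address(313, 2)
print(f"{address:#018x}")           # 0x0002000000000139
print(split_file_address(address))  # (313, 2)
```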
Whenever the NTFS refers to an MFT record, the reference includes the MFT record number. The use of this number offers an extra convenience when it comes to detecting and fixing damage in the file system. For example, if an error occurs at the time a data structure is being allocated to a new file, the system can determine by the MFT record number whether the record belongs to the new file or to its predecessor. For us, this means that the MFT record number can be used to recover information from NTFS volumes.
As discussed above, NTFS is a unique file system. Unlike FAT, the NTFS does not have a fixed record structure. Each MFT record bears minimal structuring: it has a header and space for storing a variety of attributes. In NTFS, anything can be an attribute, up to and including the actual content of a file.
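The check described here can be sketched as a comparison between the number stored in a reference and the number currently held by the record. This is a simplified illustration, assuming the 64-bit address layout described earlier (record number in the high 16 bits); the function name is hypothetical and the values are the record 313 example from the text.

```python
ADDRESS_BITS = 48  # the low 48 bits hold the record address


def reference_is_current(file_address: int, current_record_number: int) -> bool:
    """True if a stored 64-bit file address still matches the record.

    A mismatch means the record was reused for another file after the
    reference was written, so the reference points at a predecessor.
    """
    referenced_number = file_address >> ADDRESS_BITS
    return referenced_number == current_record_number


# A reference written while record 313 carried the number 1.
old_reference = (1 << ADDRESS_BITS) | 313

print(reference_is_current(old_reference, 1))  # True: record not reused yet
print(reference_is_current(old_reference, 2))  # False: record now holds a newer file
```

For recovery purposes, a mismatch of this kind is a hint that the data found in the record belongs to a different file than the one the reference originally described.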
Attributes can hold many types of information. Obviously, different types of data can be stored in a variety of formats and occupy more or less space in the MFT record.
Fig. 3. An MFT record with a header, two attributes and unused space.
As we have seen, attributes may contain any kind of data. However, every attribute has a header. The header format is the same for all attributes, while the content may vary greatly.
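Because every attribute begins with a header in a common format, the attribute space of a record can be walked one attribute at a time. The sketch below is an illustration on synthetic data only. It assumes, based on commonly published descriptions of NTFS rather than anything stated in this article, that each attribute starts with a 4-byte little-endian type code followed by a 4-byte total length, and that the type value 0xFFFFFFFF marks the end of the list. The helper names and type codes are hypothetical, and a real attribute header contains more fields than the two read here.

```python
import struct

END_MARKER = 0xFFFFFFFF  # assumed end-of-attributes marker
PREFIX_SIZE = 8          # assumed: 4-byte type code + 4-byte total length


def make_attribute(type_code: int, body: bytes) -> bytes:
    """Build a synthetic attribute: type, total length, then the body."""
    return struct.pack("<II", type_code, PREFIX_SIZE + len(body)) + body


def walk_attributes(area: bytes):
    """Yield (type_code, body) for each attribute in an attribute area."""
    offset = 0
    while offset + 4 <= len(area):
        (type_code,) = struct.unpack_from("<I", area, offset)
        if type_code == END_MARKER:
            break  # the rest of the area is unused space
        if offset + PREFIX_SIZE > len(area):
            raise ValueError(f"truncated attribute at offset {offset}")
        (length,) = struct.unpack_from("<I", area, offset + 4)
        if length < PREFIX_SIZE or offset + length > len(area):
            raise ValueError(f"bad attribute length at offset {offset}")
        # In a real record the body would still begin with the remaining
        # header fields; this sketch does not decode them.
        yield type_code, area[offset + PREFIX_SIZE:offset + length]
        offset += length


# Two synthetic attributes with arbitrary type codes, an end marker,
# and zero padding up to the 982 bytes available after the header.
area = (
    make_attribute(0x10, b"name")
    + make_attribute(0x20, b"file content")
    + struct.pack("<I", END_MARKER)
).ljust(982, b"\x00")

attributes = list(walk_attributes(area))
print(attributes)
```

Reading the length from each header, rather than assuming a fixed size, is what allows attributes of different sizes to sit side by side in the same record.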
In the article “Inside NTFS: File Recovery Algorithm”, we will describe the process of searching for and recovering a deleted file.