ReFS file system structure and data recovery algorithm

ReFS, or Resilient File System, is a new file system based on NTFS code. Like any file system, it has both advantages and disadvantages, but the essential fact is that ReFS is meant to address the major issues NTFS suffers from: it is more resilient to data damage, handles heavy workloads better, and scales easily to very large file systems.


Introduction

The new file system, ReFS, is the product of further development of its predecessor, NTFS. It supports reparse points, a technology previously available only in NTFS, which is used to implement symbolic links and mount points in Windows.

Main functions:

  • Metadata integrity with checksums.
  • Integrity streams: the method of writing data to disk for additional protection of information in cases when a part of the disk gets damaged.
  • Allocate on write transactional model (also known as copy on write).
  • Higher size limits for partitions (volumes), files and directories.
  • Storage pooling and virtualization for easier creation of volumes and file system management.
  • Segmentation of serial data known as data striping for better performance, redundancy for fault tolerance.
  • Support of background disk cleaning known as disk scrubbing for protection against latent disk errors.
  • Data salvage around the damaged area of the disk.
  • Shared storage pools across machines for additional failure tolerance and load balancing.
  • Compatible with the widely used features of NTFS.
  • Data verification and auto-correction.
  • High scalability.
  • The volume never needs to be taken offline: bad sectors (bad blocks) are isolated while the file system stays online.
  • Flexible architecture using the Storage Spaces feature which was designed and implemented specifically for ReFS.

In addition, ReFS inherits many features from NTFS, including BitLocker encryption, access control lists (ACL), the USN journal, change notifications, symbolic links, junction points, mount points, reparse points, volume snapshots, file IDs and oplocks.

Of course, the data from ReFS will be available for clients through the same APIs currently used in all operating systems to access partitions formatted in NTFS.

Peculiarities


Peculiarities of ReFS file system:

The file system uses checksums for metadata, and it can also use checksums for file data. When reading or writing a file, the system examines the checksum to make sure it is correct. In this way, data distortion can be monitored in real-time mode.

If the file system detects damaged data that has no alternative copy for recovery, ReFS will remove such data from the disk immediately. In that case, you don’t need to restart the computer or disconnect the media, as you would with NTFS.

You no longer need to use the chkdsk utility as the file system is corrected automatically the moment an error appears. The new system is also resilient to other cases when data becomes corrupt.

Better reliability for data storage. ReFS uses B+ trees for all on-disk structures, including metadata and file data. The file size, number of files in a folder, total volume size and number of folders in a volume are limited only by 64-bit numbers. File name and path name length is limited to 32,768 Unicode characters.

The new file system is also more resilient to damage that can be caused to your data in other ways. When you update file metadata, such as a file name, NTFS edits the metadata in place. If your computer breaks down, crashes or loses power in the middle of the process, the data could get damaged. By contrast, when you update file metadata in ReFS, the file system creates a new copy of the metadata and points the file at it only after all the new information has been written. This way, the file metadata cannot become corrupt mid-update. This approach is known as copy-on-write.

ReFS is integrated with the virtualization technology known as Storage Spaces, which enables mirroring and combining several physical storage devices within one computer or network.

However, this file system doesn’t support named streams, short names, compression and encryption at the file level, Encrypting File System, as well as NTFS transactions, hard links, extended attributes, and disk quotas.

How it differs from NTFS

ReFS is newer and supports larger volumes and longer file names than NTFS. In the long term, these are very important developments.

In NTFS, file names are limited to 255 characters. Meanwhile, ReFS supports over 30 thousand characters (32,768) in a file name.

NTFS has a theoretical maximum capacity of 16 exabytes, while ReFS boasts an incredible 262,144 exabytes. Most of the time this doesn’t change much in practice, but it is a good reserve for the future.

In ReFS you won’t find some of NTFS functions such as data compression, encrypting file system, hard links, extended attributes, data deduplication and disk quotas. Nevertheless, ReFS is compatible with various features. For example, if you can’t encrypt certain data at the file system level, ReFS still supports BitLocker encryption.

Windows 10 won’t let you format any partition into ReFS, and at the moment, ReFS can be used only for storage spaces where its features help protect your data from any damage. In Windows Server 2016, you can format volumes with ReFS instead of NTFS. However, you can’t use ReFS for a boot volume, as Windows can only boot from an NTFS disk.

These days, ReFS is only used on server versions of Windows and on Windows Enterprise (also known as LTSC).

File system architecture

Although ReFS and NTFS are often described as similar, what they actually share is compatibility at the level of some metadata structures. The way the ReFS disk structure is implemented differs completely from other Microsoft file systems.

The main structural elements of the new file system are B+ trees. All elements of the file system structure can be single-level (leaves) or multi-level (trees). This approach allows for great scalability of almost any element of the file system. Together with true 64-bit addressing of all system elements, it excludes possible bottlenecks if the file system is scaled any further.


The root record of a B+ tree, like every other metadata block, is 16 KB in size. Intermediate (address) nodes are small (about 60 bytes), so even very large structures usually need only a few tree levels, which certainly improves overall system performance.

The main structural element of the file system is the Directory, presented as a B+ tree keyed by folder object number. Unlike other similar file systems, a file in ReFS is not a separate key element of the Directory; it exists only as a record in the folder which contains it. Perhaps this architectural feature explains why ReFS doesn’t support hard links.

Directory leaves are typed records. For a folder object, there are three main types of records: a directory descriptor, an index record and a nested object descriptor. All such records are packaged as a separate B+ tree with a folder identifier; the root of this tree is a leaf of the Directory B+ tree, which allows packing almost any number of records. At the lowest level, in the leaves of this B+ tree, there is first a directory descriptor record containing basic information about the directory, such as its name, standard information, the file name attribute, etc.

Further in the directory are the so-called index entries: short structures with data about the directory's elements. Compared with NTFS, these records are considerably shorter, which means the volume has to store less metadata. The last elements are directory item records. For folders, these contain the folder name, the folder identifier in the Directory, and the standard-information structure. For files, the identifier is missing; instead, the structure contains all the basic data about the file, including the root of the file's fragment tree. Hence, a file can consist of almost any number of fragments (chunks).

File data on disk is allocated in 64 KB blocks, which are addressed in exactly the same way as metadata blocks (16 KB clusters). Resident file data is not supported in ReFS, so a 1-byte file still occupies a whole 64 KB block, which results in significant storage overhead for small files. On the other hand, this simplifies free space management, and new file allocation is much faster.

The metadata size of an empty file system is about 0.1% of the size of the file system itself (i.e., about 2 GB on a 2 TB volume). Some basic metadata is duplicated which improves failure resilience.

ReFS file system structure

You can identify a file system as ReFS by the following signature at the beginning of the partition:

00 00 00 52  65 46 53 00  00 00 00 00  00 00 00 00 ...ReFS.........
46 53 52 53  XX XX XX XX  XX XX XX XX  XX XX XX XX FSRS

ReFS pages are 0x4000 bytes in length.
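Based on the dump above, a volume can be identified as ReFS with a simple check of the first sector: the file-system name "ReFS" starts at byte offset 3 and the "FSRS" magic at offset 0x10. The sketch below builds a synthetic sector from that signature; real volumes carry additional data in the remaining bytes.

```python
def is_refs_volume(sector0: bytes) -> bool:
    """Heuristic check for a ReFS volume boot sector.

    Per the dump above, the file-system name "ReFS" appears at
    byte offset 3 and the "FSRS" magic at offset 0x10.
    """
    return sector0[3:7] == b"ReFS" and sector0[0x10:0x14] == b"FSRS"

# Synthetic sector built from the signature shown above; padding is zeros.
sector = bytes(3) + b"ReFS" + bytes(9) + b"FSRS" + bytes(0x1EC)
print(is_refs_volume(sector))   # True
```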


On all inspected systems, the first page number is 0x1e (0x78000 bytes after the start of the partition containing the file system). This is in line with Microsoft documentation which states that the first metadata directory is at a fixed offset on the disk.

Other pages contain various system, directory, and volume structures and tables as well as journaled versions of each page.

The first bytes of each page encode its page number.

The first 0x30 bytes of every metadata page form the Page Header which looks as follows:

byte  0: XX XX 00 00  00 00 00 00  YY 00 00 00  00 00 00 00
byte 16: 00 00 00 00  00 00 00 00  ZZ ZZ 00 00  00 00 00 00
byte 32: 01 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00

dword 0 (XX XX) is the page number, which is sequential and corresponds to the page's 0x4000-byte offset;

dword 2 (YY) is the journal number or sequence number;

dword 6 (ZZ ZZ) is the Virtual Page Number, which is non-sequential.
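The three fields above can be pulled out of a page header with fixed-offset reads. In this sketch the concrete values (page 0x1E, sequence 5, virtual page 2) are illustrative fillers for the XX/YY/ZZ placeholders; only the offsets come from the layout above.

```python
import struct

def parse_page_header(page: bytes) -> dict:
    """Parse the 0x30-byte ReFS metadata page header described above:
    dword 0 = page number, dword 2 = journal/sequence number,
    dword 6 = virtual page number."""
    page_number, = struct.unpack_from("<I", page, 0x00)
    sequence, = struct.unpack_from("<I", page, 0x08)
    virtual_page, = struct.unpack_from("<I", page, 0x18)
    return {"page": page_number, "sequence": sequence, "virtual_page": virtual_page}

# Header with illustrative values: XX XX = 0x001E, YY = 0x05, ZZ ZZ = 0x0002.
hdr = bytes.fromhex(
    "1e000000 00000000 05000000 00000000"
    "00000000 00000000 02000000 00000000"
    "01000000 00000000 00000000 00000000"
)
print(parse_page_header(hdr))  # {'page': 30, 'sequence': 5, 'virtual_page': 2}
```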

The Object Table, virtual page number 0x02, associates object identifiers with the pages on which they reside. Here we can see an AttributeList consisting of Key / Value pair records.

We can use them to look up the object ID of the root directory and retrieve the page where it resides:

50 00 00 00 10 00 10 00 00 00 20 00 30 00 00 00 – total length / key and value borders
00 00 00 00 00 00 00 00 00 06 00 00 00 00 00 00 – object identifier
F4 0A 00 00 00 00 00 00 00 00 02 08 08 00 00 00 – page identifier / flags
CE 0F 85 14 83 01 DC 39 00 00 00 00 00 00 00 00 – checksum
08 00 00 00 08 00 00 00 04 00 00 00 00 00 00 00

The object table entry for the root directory, containing its page (0xAF4)
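Following the annotations in the dump, the entry can be decoded as a key/value envelope: a dword total length, then words giving the key offset, key length, flags, value offset and value length. The field interpretation below matches the comments above; real entries may carry additional fields.

```python
import struct

# The Object Table entry for the root directory, copied from the dump above.
ENTRY = bytes.fromhex(
    "50000000 10001000 00002000 30000000"
    "00000000 00000000 00060000 00000000"
    "F40A0000 00000000 00000208 08000000"
    "CE0F8514 8301DC39 00000000 00000000"
    "08000000 08000000 04000000 00000000"
)

def parse_table_entry(buf: bytes) -> dict:
    """Parse the key/value envelope of an Object Table entry."""
    total, key_off, key_len, _flags, val_off, val_len = \
        struct.unpack_from("<IHHHHH", buf, 0)
    key = buf[key_off:key_off + key_len]
    value = buf[val_off:val_off + val_len]
    object_id, = struct.unpack_from("<Q", key, 8)   # second qword of the key
    page_id, = struct.unpack_from("<Q", value, 0)   # first qword of the value
    return {"total": total, "object_id": object_id, "page_id": page_id}

print(parse_table_entry(ENTRY))
# object_id = 0x600, page_id = 0xAF4 -- the root directory page shown above
```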

When retrieving pages by ID or virtual page number, look for the ones with the highest sequence number, as those are the latest copies written by the shadow-write mechanism.

Directories, from the root directory down, follow a consistent pattern. They consist of sequential lists of data structures (attributes and attribute lists) whose length is given by the first dword of each structure.

Lists are often prefixed with a header attribute that defines the total length of the attributes that follow and make up the list.

In either case, attributes may be parsed by iterating over the bytes after the directory page header, reading the first dword of each structure to determine how many bytes to read next.

Various attributes take on different semantics including references to subdirectories and files as well as branches to additional pages containing more directory contents.
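The length-prefixed iteration just described can be sketched as a small generator. The two-attribute blob below is synthetic, used only to show the walk; a real parser would feed it bytes that follow a directory page header.

```python
import struct

def iter_attributes(data: bytes, offset: int = 0):
    """Walk a sequential attribute list: each attribute starts with a
    dword giving its own total length, so read it and skip ahead."""
    while offset + 4 <= len(data):
        length, = struct.unpack_from("<I", data, offset)
        if length == 0:          # a zero length terminates the walk
            break
        yield data[offset:offset + length]
        offset += length

# Two back-to-back attributes of lengths 0x10 and 0x0C (synthetic data):
blob = struct.pack("<I", 0x10) + bytes(12) + struct.pack("<I", 0x0C) + bytes(8)
print([len(a) for a in iter_attributes(blob)])  # [16, 12]
```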

The structures in a directory listing have one of the following formats:

Base Attribute

The simplest attribute is a block whose length is given in its first dword.

Below, there is an example of a typical attribute:

a8 00 00 00  28 00 01 00  00 00 00 00  10 01 00 00
10 01 00 00  02 00 00 00  00 00 00 00  00 00 00 00
00 00 00 00  00 00 00 00  a9 d3 a4 c3  27 dd d2 01
5f a0 58 f3  27 dd d2 01  5f a0 58 f3  27 dd d2 01
a9 d3 a4 c3  27 dd d2 01  20 00 00 00  00 00 00 00
00 06 00 00  00 00 00 00  03 00 00 00  00 00 00 00
5c 9a 07 ac  01 00 00 00  19 00 00 00  00 00 00 00
00 00 01 00  00 00 00 00  00 00 00 00  00 00 00 00
00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
00 00 00 00  00 00 00 00  01 00 00 00  00 00 00 00
00 00 00 00  00 00 00 00

This is a structure 0xA8 bytes long containing, among other fields, four file timestamps:

a9 d3 a4 c3  27 dd d2 01 - 2017-06-04 07:43:20
5f a0 58 f3  27 dd d2 01 - 2017-06-04 07:44:40
5f a0 58 f3  27 dd d2 01 - 2017-06-04 07:44:40
a9 d3 a4 c3  27 dd d2 01 - 2017-06-04 07:43:20
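These qwords are standard Windows FILETIME values: little-endian counts of 100-nanosecond intervals since 1601-01-01 UTC. A minimal decoder is sketched below; note it yields the UTC time (2017-06-04 11:43:20 for the first stamp), while the times listed above appear to be rendered in a local time zone.

```python
import struct
from datetime import datetime, timedelta, timezone

WINDOWS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(raw: bytes) -> datetime:
    """Decode a little-endian 64-bit Windows FILETIME
    (100-ns intervals since 1601-01-01 UTC)."""
    ticks, = struct.unpack("<Q", raw)
    return WINDOWS_EPOCH + timedelta(microseconds=ticks // 10)

stamp = bytes.fromhex("a9d3a4c327ddd201")   # first timestamp from the dump
print(filetime_to_datetime(stamp).strftime("%Y-%m-%d %H:%M:%S"))
# 2017-06-04 11:43:20  (UTC)
```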

It is safe to assume one of the following:

  • one of the first fields in any given attribute contains an identifier that details how the attribute should be parsed;
  • the context is given by the attribute’s position in the list;
  • attributes with a given meaning are referenced by address or identifier.

Records

Key / Value pairs, whose boundaries are given in the first 0x20 bytes of the attribute. These are used to associate metadata sections with files: the file name is recorded in the key and the contents in the value.

Below, find a typical Record example:

40 04 00 00  10 00 1A 00  08 00 30 00  10 04 00 00  @.........0.....
30 00 01 00  6D 00 6F 00  66 00 69 00  6C 00 65 00  0...m.o.f.i.l.e.
31 00 2E 00  74 00 78 00  74 00 00 00  00 00 00 00  1...t.x.t.......
A8 00 00 00  28 00 01 00  00 00 00 00  10 01 00 00  ¨...(...........
10 01 00 00  02 00 00 00  00 00 00 00  00 00 00 00  ................
00 00 00 00  00 00 00 00  A9 D3 A4 C3  27 DD D2 01  ........©Ó¤Ã'ÝÒ.
5F A0 58 F3  27 DD D2 01  5F A0 58 F3  27 DD D2 01  _ Xó'ÝÒ._ Xó'ÝÒ.
A9 D3 A4 C3  27 DD D2 01  20 00 00 00  00 00 00 00  ©Ó¤Ã'ÝÒ. .......
00 06 00 00  00 00 00 00  03 00 00 00  00 00 00 00  ................
5C 9A 07 AC  01 00 00 00  19 00 00 00  00 00 00 00  ..¬............
00 00 01 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ................
00 00 00 00  00 00 00 00  01 00 00 00  00 00 00 00  ................
00 00 00 00  00 00 00 00  20 00 00 00  A0 01 00 00  ........ ... ...
D4 00 00 00  00 02 00 00  74 02 00 00  01 00 00 00  Ô.......t.......
78 02 00 00  00 00 00 00  ...(cutoff)               x.......

Here we see the Record parameters set by the first row:

  • total length - 4 bytes = 0x440
  • key offset - 2 bytes = 0x10
  • key length - 2 bytes = 0x1A
  • flags / identifier - 2 bytes = 0x08
  • value offset - 2 bytes = 0x30
  • value length - 2 bytes = 0x410

The record finishes after the value, 0x410 bytes after the value start at 0x30, or 0x440 bytes after the start of the record (which lines up with the total length).
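Using the field layout listed above, the record header and key can be decoded directly. The sketch below parses the first three rows of the dump (header plus key) and recovers the UTF-16LE file name; the key's leading dword is the 0x10030 flag that marks a file entry.

```python
import struct

# First three rows of the record dump above (header + key).
RECORD = bytes.fromhex(
    "40040000 10001A00 08003000 10040000"
    "30000100 6D006F00 66006900 6C006500"
    "31002E00 74007800 74000000 00000000"
)

def parse_record_header(buf: bytes) -> dict:
    """Decode the 16-byte record header: total length (dword), then
    key offset / key length / flags / value offset / value length (words)."""
    total, key_off, key_len, flags, val_off, val_len = \
        struct.unpack_from("<IHHHHH", buf, 0)
    key = buf[key_off:key_off + key_len]
    key_flag, = struct.unpack_from("<I", key, 0)   # 0x10030 marks a file entry
    name = key[4:].decode("utf-16-le")             # UTF-16LE file name
    return {"total": total, "flags": flags, "value_off": val_off,
            "value_len": val_len, "key_flag": key_flag, "name": name}

print(parse_record_header(RECORD)["name"])   # mofile1.txt
```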

The record corresponds to a file created on the disk.

Here the first attribute in the record value is the simple attribute we discussed above, containing the file timestamps. It is followed by a Reference Attribute List header.

When scanning for such records, we look for those whose flags field is 0 or 8. The value 4 occurs often; it indicates a historical record, i.e. one that has since been replaced by another.

Since records are prefixed with their total length, they can be thought of as a subclass of Attribute.

An AttributeList, or list header, introduces a block of attributes.

At first glance it looks like a simple attribute of length 0x20, but on closer inspection it consistently holds the length of a larger block of attributes. After parsing an AttributeList header, we should read the remaining bytes of the list before moving on to the next attribute.

20 00 00 00  A0 01 00 00  D4 00 00 00  00 02 00 00 - list header specifying total length (0x1A0) and padding (0xD4)
74 02 00 00  01 00 00 00  78 02 00 00  00 00 00 00
80 01 00 00  10 00 0E 00  08 00 20 00  60 01 00 00
60 01 00 00  00 00 00 00  80 00 00 00  00 00 00 00
88 00 00 00  ... (cutoff)

Directory Tree Branches

Directory Tree Branches are Attribute Lists where each Attribute corresponds to a record whose value references a page which contains more directory contents.

When encountering an AttributeList header with flag value 0x301, we should:

  • iterate over the attributes in the list,
  • parse them as records,
  • use the dword in each value as the page number and repeat the directory traversal process recursively.

Additional files and subdirectories found on the referenced pages should be appended to the list of current directory contents.
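The recursive traversal of branch pages can be sketched with a toy page store; page reading and record parsing are modeled with plain Python structures here, while on a real volume they would be backed by the parsers shown earlier. The page numbers and file names below are purely illustrative.

```python
# Toy page store: page id -> list of entries. An entry is either a file
# name or a ("BRANCH", page_id) tuple pointing at another directory page.
PAGES = {
    0xAF4: ["a.txt", ("BRANCH", 0xB00), "b.txt"],
    0xB00: ["c.txt", ("BRANCH", 0xB01)],
    0xB01: ["d.txt"],
}

def list_directory(page_id: int, pages: dict) -> list:
    """Collect directory contents, recursing into branch pages and
    appending their files to the current directory's listing."""
    contents = []
    for entry in pages[page_id]:
        if isinstance(entry, tuple) and entry[0] == "BRANCH":
            contents.extend(list_directory(entry[1], pages))   # recurse
        else:
            contents.append(entry)
    return contents

print(list_directory(0xAF4, PAGES))  # ['a.txt', 'c.txt', 'd.txt', 'b.txt']
```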

SubDirectories

SubDirectories are records in the directory's Attribute List whose key contains the Directory Metadata flag (0x20030) as well as the subdirectory name.

The value of this record is the corresponding object identifier, which can be used to look up the page containing the subdirectory in the Object Table.

A typical subdirectory Record:

70 00 00 00  10 00 12 00  00 00 28 00  48 00 00 00
30 00 02 00  73 00 75 00  62 00 64 00  69 00 72 00 - here we see the key containing the flag (30 00 02 00) followed by the directory name ("subdir2")
32 00 00 00  00 00 00 00  03 07 00 00  00 00 00 00 - here we see the object identifier as the first qword in the value (0x703)
00 00 00 00  00 00 00 00  14 69 60 05  28 dd d2 01 - here we see the directory timestamps
cc 87 ce 52  28 dd d2 01  cc 87 ce 52  28 dd d2 01
cc 87 ce 52  28 dd d2 01  00 00 00 00  00 00 00 00
00 00 00 00  00 00 00 00  00 00 00 10  00 00 00 00
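The same record-header layout applies here, so the subdirectory name and object identifier fall out of a short parse. The sketch below decodes the first three rows of the dump; note the value's first qword reads 0x703 when interpreted little-endian.

```python
import struct

# First three rows of the subdirectory record dump above.
SUBDIR = bytes.fromhex(
    "70000000 10001200 00002800 48000000"
    "30000200 73007500 62006400 69007200"
    "32000000 00000000 03070000 00000000"
)

def parse_subdir_record(buf: bytes) -> dict:
    """Extract the subdirectory name and object identifier: the key starts
    with the Directory Metadata flag (0x20030) followed by the UTF-16LE
    name; the value starts with the object identifier."""
    _total, key_off, key_len, _flags, val_off, _val_len = \
        struct.unpack_from("<IHHHHH", buf, 0)
    key = buf[key_off:key_off + key_len]
    flag, = struct.unpack_from("<I", key, 0)
    name = key[4:].decode("utf-16-le")
    object_id, = struct.unpack_from("<Q", buf, val_off)
    return {"flag": flag, "name": name, "object_id": object_id}

print(parse_subdir_record(SUBDIR))
# flag = 0x20030, name = 'subdir2', object_id = 0x703
```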

Files

File entries are records whose key contains a flag (0x10030) followed by the filename.

The value is far more complicated: although we identified some basic attributes that let us pull timestamps and content from the file system, much about the semantics of this record's value remains to be deduced.

The File Record value consists of multiple attributes, though they simply appear one after another, without a list header. We can still parse them sequentially, given that all attributes are individually prefixed with their lengths and the file record's value length gives us the total size of the block.

The first attribute contains 4 file timestamps at an offset given by the fifth byte of the attribute (though this position may be coincidental as the timestamps could just reside at a fixed location in this attribute).

The second attribute appears to be the header of an attribute list containing the File Reference.

In this single attribute list, the first attribute contains the length of the file, while the second is the header for yet another list. This attribute also contains a record whose value contains a reference to the page where the file contents actually reside.

----------------------------------------
| ...                                  |
----------------------------------------
| File Entry Record                    |
| Key: 0x10030 [FileName]              |
| Value:                               |
| Attribute1: Timestamps               |
| Attribute2:                          |
|   File Reference List Header         |
|   File Reference List Body(Record)   |
|     Record Key: ?                    |
|     Record Value:                    |
|       File Length Attribute          |
|       File Content List Header       |
|       File Content Record(s)         |
| Padding                              |
----------------------------------------
| ...                                  |
----------------------------------------

In spite of being complicated, each level can be parsed in the same manner as all other attributes and records, taking care to assign attributes to their correct levels and structures.

As for the actual values, the file length is always seen at a fixed offset within its attribute (0x3c), and the content pointer resides in the second qword of the record's value. This pointer is simply a page reference; the file contents can be read from that page verbatim.

Data recovery

Although ReFS is characterized by improved security and efficient data storage features, it can’t entirely protect important information from accidental deletion, virus attacks, or other causes of data loss. You have to consider the probability of such issues and keep a reliable utility at hand that can recover deleted files.

The search algorithm used by a popular data recovery tool, Hetman Partition Recovery

The best solution to quickly solve such problems should be a specialized data recovery tool.


Hetman Partition Recovery can analyze disk storage managed by the ReFS file system with its signature analysis algorithm. Analyzing the storage device sector by sector, the program finds known byte sequences and presents the results to the user. Recovering data from a ReFS storage space is no different from doing it on an NTFS file system.

The tool recovers data from any devices, regardless of the cause of data loss.

During a fast scan, the program looks for the Volume Header, which is located in sector 0 (with a copy in the last sector). The header contains the information the program needs for further analysis: the number of bytes per sector and the number of sectors per cluster. With this data collected, the program finds the Superblock, which is stored in block 30. The superblock has two copies: one in the second block from the end of the disk, and one in the third block from the end. From the superblock, the program extracts links to the checkpoints: there are two of them, found at the addresses specified in the superblock. Following the two addresses, the program reads the virtual allocated clock and uses it to determine which of the two checkpoints is current. As we know, Windows first modifies the first checkpoint, and only if the operation is successful does it copy the data to the second checkpoint.

The checkpoint references the general tables. From there, the program reads the Page Header and the block containing data; this block provides pointers to every table (that is, links to all general tables).

To translate virtual addresses into physical ones, the program needs to find the Container Table. The virtual address is then used to find the Object ID Table, which in turn yields all the other tables.

After that, the program searches for information page by page, identifying each page's level. If a page is a level-0 leaf, the data we are looking for is read from it. If not, the program follows the links down another level until it finally reaches the level-0 node where the required data is located.
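This level-by-level descent can be sketched with a toy page model; the dict-based pages and their contents below are purely illustrative stand-ins for parsed page headers, and the skip-on-broken-link behavior mirrors how the full analysis routes around damaged elements.

```python
# Toy pages: page id -> {"level": n, "children": [...]} or, at level 0,
# {"level": 0, "data": ...}. A real implementation would parse page headers.
PAGES = {
    10: {"level": 2, "children": [11, 99]},   # 99 is a deliberately broken link
    11: {"level": 1, "children": [12]},
    12: {"level": 0, "data": b"required data"},
}

def read_leaves(page_id: int, pages: dict) -> list:
    page = pages.get(page_id)
    if page is None:                 # broken or missing link: skip it
        return []
    if page["level"] == 0:           # level-0 leaf: read the data
        return [page["data"]]
    leaves = []
    for child in page["children"]:   # otherwise descend a level
        leaves.extend(read_leaves(child, pages))
    return leaves

print(read_leaves(10, PAGES))  # [b'required data']
```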

Even if one of the elements in this file system structure is damaged or corrupt, the algorithm used for full analysis lets our program exclude those broken links and reach the required information which should be recovered.

The future of this new file system is quite uncertain. Microsoft may refine ReFS to replace the outdated NTFS in all versions of Windows. At the moment, however, ReFS cannot be used everywhere, and is applied for certain tasks only.

Author: Vladimir Artiukh, Technical Writer

Vladimir Artiukh is a technical writer for Hetman Software, as well as the voice and face of their English-speaking YouTube channel, Hetman Software: Data Recovery for Windows. He handles tutorials, how-tos, and detailed reviews on how the company’s tools work with all kinds of data storage devices.

Editor: Oleg Afonin, Technical Writer

Oleg Afonin is an expert in mobile forensics, data recovery and computer systems. He often attends large data security conferences, and writes several blogs for such resources as xaker.ru, Elcomsoft and Habr. In addition to his online activities, Oleg’s articles are also published in professional magazines. Also, Oleg Afonin is the co-author of a well-known book, Mobile Forensics - Advanced Investigative Strategies.
