Is there a reliable way to check the identity between files not based on their content?

This is not a question about a specific programming language. I know that Crystal has same_file?, Ruby has identical?, and other languages have similar APIs. But at least on Windows, the implementations I know of are unreliable. I would like to know what the Crystal developers think about this.

Btw my view is: it’s impossible. I wonder if that’s true?

Have you tried same_file? If that’s unreliable, then I would think it’s a bug, or at least the edge cases should be documented.

Looks like on Windows it compares the dwVolumeSerialNumber, nFileIndexHigh, and nFileIndexLow of each file. Based on BY_HANDLE_FILE_INFORMATION (fileapi.h) - Win32 apps | Microsoft Docs, it sounds like that should be fairly robust.

@Blacksmoke16 Just to clarify, by unreliable I don’t mean a bug in the API; it’s more of a filesystem driver feature.

AFAIK almost all implementations use GetFileInformationByHandle(Ex) to retrieve the file info you mentioned. If the file is on a network filesystem (e.g. CIFS), the filesystem driver can return an arbitrary serial number and file index. From a security point of view, this is not very reliable when all the application code knows is two file paths and it makes security decisions based on these APIs.

Many people may already know about this. It’s not a security bug, just a deep dive into a standard library implementation pattern. I’ll use Ruby as an example. Crystal has a similar pattern, but I didn’t research it.

The File.identical? method is used to check whether two file or IO objects are the same. For files, the comparison is not based on file content, but on file system metadata. Specifically, two files are considered identical if both the volume serial number and the file id are the same. On Windows, the pseudocode for the core logic of the method looks like this:

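/* Pseudocode: error handling and handle cleanup are omitted. */
bool rb_w32_file_identical_p(VALUE fname1, VALUE fname2) {
    w32_io_info_t st1, st2;
    HANDLE f1 = 0, f2 = 0;

    /* Open each file and fill in the metadata reported by the filesystem driver. */
    f1 = w32_io_info(&fname1, &st1);
    f2 = w32_io_info(&fname2, &st2);

    /* Identity is decided by metadata only; file content is never read.
       fileIndex stands for the nFileIndexHigh/nFileIndexLow pair. */
    return st1.dwVolumeSerialNumber == st2.dwVolumeSerialNumber &&
           st1.fileIndex == st2.fileIndex;
}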

The file metadata stored in the w32_io_info_t structure is returned by GetFileInformationByHandle(Ex). These functions take a file handle and return file information; for now we focus only on the volume serial number (VSN) and the file id. A VSN is a number assigned to a volume by the file system, and the file id is maintained by the file system.
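
To make the idea concrete outside of Ruby, here is a minimal Python sketch of the same metadata-only check. It assumes that os.stat exposes the volume or device identifier as st_dev and the file index or inode as st_ino; the helper names file_identity, same_file, and demo are made up for this illustration and are not part of any library.

```python
import os
import shutil
import tempfile


def file_identity(path):
    """Return the (device, file index) pair the OS reports for path."""
    st = os.stat(path)
    # Both values come from the filesystem driver; this code has no way
    # to verify them independently.
    return (st.st_dev, st.st_ino)


def same_file(path1, path2):
    """Metadata-only identity check; file content is never read."""
    return file_identity(path1) == file_identity(path2)


def demo():
    """Compare a file with a hard link to it and with a byte-identical copy."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "f1.txt")
        hard_link = os.path.join(tmp, "f1_link.txt")
        copy = os.path.join(tmp, "f1_copy.txt")

        with open(original, "w") as fh:
            fh.write("hello")
        os.link(original, hard_link)     # second name for the same file
        shutil.copyfile(original, copy)  # same bytes, different file

        return same_file(original, hard_link), same_file(original, copy)


if __name__ == "__main__":
    print(demo())
```

On an honest local filesystem the hard link should compare as the same file and the copy should not, even though the copy has the same content. The check is only as trustworthy as the driver that fills in those two numbers.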

If two files have the same VSN and file id, File.identical? considers them the same, even though they may be completely different in content. Is this possible under reasonable conditions? It is simple to make the VSN of f1 and f2 the same: put them on the same drive. Whether file ids are unique within one file system depends on the file system itself, but in general two different files on the same file system have different ids, so that alone does not help. What if the two files are on different file systems? To reach the goal, you need to control the metadata returned by at least one of them. On Windows that’s not too hard: GetFileInformationByHandle(Ex) supports files located on the network, such as files accessed via SMB, and if you control the SMB server, it’s easy to control the file info it reports. That is our trick.
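
The reasoning above can be modelled in a few lines of Python. This is a toy simulation only, with no real SMB or Win32 calls involved: FileInfo, LocalDriver, RogueDriver, and identical are hypothetical names invented for the sketch, and the paths and serial number are placeholder values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """The two fields the identity check looks at (hypothetical model)."""
    volume_serial: int
    file_index: int


class LocalDriver:
    """Stand-in for an honest local file system: one serial, distinct indexes."""

    def __init__(self, volume_serial):
        self.volume_serial = volume_serial
        self._indexes = {}

    def query(self, path):
        # Each new path gets the next unused index; repeated queries are stable.
        index = self._indexes.setdefault(path, len(self._indexes) + 1)
        return FileInfo(self.volume_serial, index)


class RogueDriver:
    """Stand-in for an attacker-controlled server: it answers whatever it likes."""

    def __init__(self, spoofed):
        self.spoofed = spoofed

    def query(self, path):
        return self.spoofed


def identical(info1, info2):
    """Same comparison as the pseudocode: serial and index, nothing else."""
    return (info1.volume_serial == info2.volume_serial
            and info1.file_index == info2.file_index)


local = LocalDriver(volume_serial=0x1234ABCD)
f1 = local.query(r"c:\f1.txt")
other = local.query(r"c:\other.txt")

# The rogue server simply echoes the metadata of the local file.
rogue = RogueDriver(spoofed=f1)
f2 = rogue.query(r"\\attacker\share\f2.txt")

print(identical(f1, other))  # two local files
print(identical(f1, f2))     # local file vs. spoofed remote file
```

In this model the two local files never compare as identical, while the remote file does, purely because the second driver chose to report the same pair of numbers.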

POC

Suppose the file f1 exists in the local file system, and its path is c:\f1.txt

  1. Download impacket: GitHub - SecureAuthCorp/impacket
  2. Apply the patch 0001-update.patch
  3. Run this cmd command to get the VSN of the volume that holds f1.txt: vol
  4. Run this cmd command to get the file id: fsutil file queryFileId c:\f1.txt
  5. Run the following commands to install impacket and start the rogue SMB server:
    python3 -m pip install .
    smbserver.py -smb2support -vsn [VSN] -fih [FileIndexHigh] -fil [FileIndexLow] -debug [ShareName] [Physical path]
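
Steps 3 and 4 produce text that still has to be turned into the three numbers used in step 5. Here is a small Python sketch of that conversion. It assumes that vol prints the serial in a form like 1234-ABCD and that fsutil prints the file id as one long hexadecimal number whose low 64 bits hold the file index; the sample values and the helper names are made up, and the format the patched smbserver.py expects for its options is not known from this post.

```python
def parse_volume_serial(text):
    """Turn a serial such as '1234-ABCD' into a 32-bit integer."""
    return int(text.replace("-", ""), 16)


def split_file_id(hex_id):
    """Split a hex file id into (FileIndexHigh, FileIndexLow).

    Only the low 64 bits are used; each half is 32 bits wide, matching
    nFileIndexHigh and nFileIndexLow.
    """
    index = int(hex_id, 16) & 0xFFFFFFFFFFFFFFFF
    return (index >> 32) & 0xFFFFFFFF, index & 0xFFFFFFFF


# Made-up sample values, not taken from a real machine.
sample_serial = "1234-ABCD"
sample_file_id = "0x" + "0" * 16 + "00050000" + "0001E240"

vsn = parse_volume_serial(sample_serial)
high, low = split_file_id(sample_file_id)
print(hex(vsn), hex(high), hex(low))
```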

expected result

File.identical? should return false

actual result

File.identical? returns true
