Friday, October 30, 2009

Enumerating Files in a Directory from C#

Enumerating files in a directory with C# seems straightforward: Directory.GetFiles returns file paths as strings, and DirectoryInfo.GetFiles returns FileInfo objects. However, both methods return arrays, so the call cannot return until every entry has been read. When a directory contains a very large number of files, this can be an expensive call.

In .NET 4.0 Microsoft is addressing this issue with methods like DirectoryInfo.EnumerateFiles, which return an IEnumerable rather than an array.
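As a rough illustration of the .NET 4.0 approach, the sketch below uses Directory.EnumerateFiles to collect only the first few matches and then stop. The class name EnumerateFilesDemo, the method FirstFiles and its maxCount parameter are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class EnumerateFilesDemo
{
    // Hypothetical helper: returns at most maxCount matching paths and
    // stops enumerating as soon as enough have been found.
    public static List<string> FirstFiles(string directory, string pattern, int maxCount)
    {
        List<string> result = new List<string>();
        if (maxCount <= 0)
        {
            return result;
        }

        // EnumerateFiles hands back paths one at a time instead of
        // building the complete array up front.
        foreach (string path in Directory.EnumerateFiles(directory, pattern))
        {
            result.Add(path);
            if (result.Count >= maxCount)
            {
                break;
            }
        }
        return result;
    }
}
```

With Directory.GetFiles the same method would have to wait for the full array before looking at the first path.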

We can get this functionality in .NET 2.0/3.x by making native calls to the functions that GetFiles uses, namely FindFirstFile, FindNextFile and FindClose. Pinvoke.net is an excellent resource for using native Windows API calls from .NET code, and its entry on FindFirstFile gives a good example of computing the total size of a directory.

The sample code shows how native handles should be wrapped in classes deriving from SafeHandle, which makes sure unmanaged resources get cleaned up, in this case by calling FindClose.
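For reference, here is a minimal sketch of what such declarations typically look like. It is written from the documented Win32 signatures rather than copied from Pinvoke.net, so the NativeMethods class name, the access modifiers and the choice of ComTypes.FILETIME are assumptions and may differ from the sample there. The calls only work on Windows.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

// Wraps the search handle so that FindClose is called when it is disposed.
sealed class SafeFindHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    // The interop marshaller creates instances through this constructor.
    public SafeFindHandle() : base(true) { }

    protected override bool ReleaseHandle()
    {
        return FindClose(handle);
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FindClose(IntPtr hFindFile);
}

// Managed layout of the native WIN32_FIND_DATA structure.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
struct WIN32_FIND_DATA
{
    public FileAttributes dwFileAttributes;
    public FILETIME ftCreationTime;
    public FILETIME ftLastAccessTime;
    public FILETIME ftLastWriteTime;
    public uint nFileSizeHigh;
    public uint nFileSizeLow;
    public uint dwReserved0;
    public uint dwReserved1;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
    public string cFileName;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
    public string cAlternateFileName;
}

// Hypothetical container class for the imports.
static class NativeMethods
{
    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    internal static extern SafeFindHandle FindFirstFile(
        string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    internal static extern bool FindNextFile(
        SafeFindHandle hFindFile, out WIN32_FIND_DATA lpFindFileData);
}
```

Because FindFirstFile returns a SafeFindHandle rather than a raw IntPtr, a using statement around the call is enough to guarantee that the search handle is closed.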

Using the DllImports and associated structures provided there we can then easily write a method that uses the yield keyword to enumerate the files in a directory, without having to add them all to a collection and without the consuming code knowing about native structures like WIN32_FIND_DATA:
public static IEnumerable<string> EnumerateFiles(string directory, string filePattern)
{
    string pattern = directory + @"\" + filePattern;
    WIN32_FIND_DATA findData;
    using (SafeFindHandle findHandle = FindFirstFile(pattern, out findData))
    {
        if (!findHandle.IsInvalid) // was the input valid, e.g. valid directory passed
        {
            do
            {
                if ((findData.dwFileAttributes & FileAttributes.Directory) == 0) // if not a directory
                {
                    yield return Path.Combine(directory, findData.cFileName);
                }
            }
            while (FindNextFile(findHandle, out findData));
        }
    }
}

Using the yield keyword means that at no point are all the strings loaded into memory. If the consuming code stops enumerating early, the code after the yield statement is not executed, so in this case no further calls to FindNextFile are made. Finally blocks are still executed when the enumeration is abandoned, as long as the enumerator is disposed (which foreach does automatically). This means the SafeFindHandle gets disposed (a using statement implicitly uses a finally block), releasing the find handle.
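This behaviour is easy to check without touching the file system. The sketch below uses a made-up iterator, YieldDemo.Numbers, as a stand-in for EnumerateFiles, and records what happens in a log when a foreach loop breaks out early.

```csharp
using System;
using System.Collections.Generic;

static class YieldDemo
{
    public static readonly List<string> Log = new List<string>();

    // Hypothetical stand-in for EnumerateFiles: yields 1, 2, 3 and
    // records when its finally block runs.
    public static IEnumerable<int> Numbers()
    {
        try
        {
            for (int i = 1; i <= 3; i++)
            {
                Log.Add("producing " + i);
                yield return i;
            }
        }
        finally
        {
            Log.Add("cleanup");
        }
    }

    public static void Run()
    {
        foreach (int n in Numbers())
        {
            if (n == 2)
            {
                break; // foreach disposes the enumerator on the way out
            }
        }
    }
}
```

After calling YieldDemo.Run(), the log holds "producing 1", "producing 2" and "cleanup". The third item is never produced, but the finally block still runs, which is the same mechanism that disposes the SafeFindHandle.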

This just gets file paths, but it could easily be extended to return other information from WIN32_FIND_DATA and return a set of objects. Dates can be converted from the FILETIME structure by using bit shifts and DateTime.FromFileTime, e.g.

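static DateTime GetFileTime(FILETIME fileTime)
{
    // Cast the low half through uint so that a set high bit is not
    // sign-extended when the field is declared as int.
    long fileTimeValue = (long)(uint)fileTime.dwLowDateTime
        | ((long)fileTime.dwHighDateTime << 32);
    return DateTime.FromFileTime(fileTimeValue);
}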
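The same combining of two 32-bit halves applies to the file size, which WIN32_FIND_DATA reports as nFileSizeHigh and nFileSizeLow. The helper below is a small sketch of that idea. FindDataHelpers and its method names are invented for this example, and it takes plain uint values so it does not depend on any particular declaration of the native structures.

```csharp
using System;

static class FindDataHelpers
{
    // Joins the high and low 32-bit halves into one 64-bit value.
    public static long Combine(uint high, uint low)
    {
        return ((long)high << 32) | low;
    }

    // Interprets the combined value as a Windows file time (UTC).
    public static DateTime ToDateTimeUtc(uint high, uint low)
    {
        return DateTime.FromFileTimeUtc(Combine(high, low));
    }
}
```

Assuming the structure declares the size fields as uint, the size of an entry would then be FindDataHelpers.Combine(findData.nFileSizeHigh, findData.nFileSizeLow).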

This approach should only be used when the directories you're working with are likely to contain large numbers of files, but it does allow operations to be performed on the files while the list is still being retrieved from the file system.
