Friday, October 30, 2009

Enumerating Files in a Directory from C#

Enumerating files in a directory with C# seems straightforward: there's Directory.GetFiles, which returns strings, and DirectoryInfo.GetFiles, which returns FileInfo objects. However, both of these methods return arrays, and when there are very large numbers of files this can be an expensive call.

In .NET 4.0, Microsoft is addressing this issue with methods like DirectoryInfo.EnumerateFiles.

We can get this functionality in .NET 2.0/3.x by making native calls to the functions that GetFiles uses, namely FindFirstFile, FindNextFile and FindClose. Pinvoke.net is an excellent resource for using native Windows API calls from .NET code, and their entry on FindFirstFile gives a good example of computing the total size of a directory.

The sample code there shows how native handles should be wrapped in classes deriving from SafeHandle, which makes sure unmanaged resources get cleaned up, in this case by calling FindClose.
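A minimal sketch of such a wrapper, based on the pinvoke.net sample (the exact declaration there may differ slightly):

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Wraps the native find handle. SafeHandleZeroOrMinusOneIsInvalid gives us
// IsInvalid for free and guarantees ReleaseHandle runs, even on finalization.
internal sealed class SafeFindHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FindClose(IntPtr handle);

    // Called by the interop layer, which then owns the handle.
    private SafeFindHandle() : base(true) { }

    protected override bool ReleaseHandle()
    {
        // Release the unmanaged find handle exactly once.
        return FindClose(handle);
    }
}
```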

Using the DllImports and associated structures provided there, we can then easily write a method that uses the yield keyword to enumerate the files in a directory, without adding them all to a collection and without the consuming code knowing about native structures like WIN32_FIND_DATA:
public static IEnumerable<string> EnumerateFiles(string directory, string filePattern)
{
    string pattern = directory + @"\" + filePattern;
    WIN32_FIND_DATA findData;
    using (SafeFindHandle findHandle = FindFirstFile(pattern, out findData))
    {
        if (!findHandle.IsInvalid) // was the input valid, e.g. a valid directory passed?
        {
            do
            {
                if ((findData.dwFileAttributes & FileAttributes.Directory) == 0) // if not a directory
                {
                    yield return Path.Combine(directory, findData.cFileName);
                }
            }
            while (FindNextFile(findHandle, out findData));
        }
    }
}

Using the yield keyword means that at no point are all the strings loaded into memory. If consuming code breaks out of the enumeration, the code after the yield statement will not be executed, so in this case no further calls to FindNextFile would be made. Finally blocks will, however, be executed when the enumeration is broken, which means that the SafeFindHandle will get disposed (a using statement implicitly uses a finally block), releasing the find handle.
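For example (a hypothetical consumer; the directory and pattern are made up), breaking out of the loop stops the native enumeration early:

```csharp
// Stops after the first match: no further FindNextFile calls are made,
// and the using/finally inside EnumerateFiles still runs FindClose.
foreach (string path in EnumerateFiles(@"C:\Logs", "*.log"))
{
    Console.WriteLine(path);
    break;
}
```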

This just gets file paths, but it could easily be extended to return other information from WIN32_FIND_DATA as a set of objects. Dates can be converted from the FILETIME structure using bit shifts and DateTime.FromFileTime, e.g.

static DateTime GetFileTime(FILETIME fileTime)
{
    // Combine the two 32-bit halves into a 64-bit value; the uint cast
    // prevents sign extension of the low half.
    long fileTimeValue = (uint)fileTime.dwLowDateTime
        | ((long)fileTime.dwHighDateTime << 32);
    return DateTime.FromFileTime(fileTimeValue);
}

This approach should only be used when the directories you're working with are likely to contain large numbers of files, but it does allow operations to be performed on the files while the list is being retrieved from the file system.

Wednesday, October 21, 2009

Testing Console Applications

I did some work this week on a command line installer for RBS. I needed to make some modifications to the install process, and I wanted to do this the TDD way. I did not want to refactor the entire project just to make it testable, so I thought I would try to come up with a nice way to test my console application. I needed to capture the output of the console application to check that my new features were working correctly.

RBS is Remote Blob Storage; see the RBS blog: http://blogs.msdn.com/sqlrbs/

First I tried launching the exe and capturing the output. This worked well, but I could not debug my new code. I know true TDDers don't use debuggers, but I needed to :)

I managed to get this working by referencing my console project from my test project (yes, you can reference an EXE). I could then add code like:

InstallProviderSetup.Main(new string[] { "-CLIENTCONFIG" });

As you can see, you can just call the static Main method from code. To see if my test worked I needed to check the output of the console application. I did this with the following code:

var stdout = GetStdOut();
Assert.IsTrue(stdout.Contains("The required switch CONFIGURATIONFILE"), "The required switch CONFIGURATIONFILE not in console out");
Assert.IsTrue(stdout.Contains("The required switch NAME"), "The required switch NAME not in console out");

The rest of the code is:

MemoryStream memoryStreamConsole;
StreamWriter streamWriterConsole;

[TestInitialize()]
public void TestInitialize()
{
    memoryStreamConsole = new MemoryStream();
    streamWriterConsole = new StreamWriter(memoryStreamConsole);
    Console.SetOut(streamWriterConsole);
}

protected string GetStdOut()
{
    streamWriterConsole.Flush();
    var rval = Encoding.Default.GetString(memoryStreamConsole.ToArray());
    System.Diagnostics.Trace.WriteLine(rval);
    return rval;
}
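As an aside, a StringWriter can play the same role as the MemoryStream/StreamWriter pair; a minimal self-contained sketch (the message text here is invented as a stand-in for the installer's output):

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Redirect Console.Out to an in-memory writer...
        var writer = new StringWriter();
        TextWriter original = Console.Out;
        Console.SetOut(writer);

        // ...run the code under test (a stand-in for InstallProviderSetup.Main)...
        Console.WriteLine("The required switch NAME was not supplied");

        // ...then restore stdout and inspect what was captured.
        Console.SetOut(original);
        string stdout = writer.ToString();
        Console.WriteLine(stdout.Contains("required switch NAME")); // True
    }
}
```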

Feel free to view the entire project at codeplex

http://sqlrbs.codeplex.com/

Diagonal is using Microsoft RBS to give Wisdom (an Electronic Records Management (EDRM) system) Content Addressable Storage (CAS). So far we have integrated with EMC; as more CAS vendors create RBS providers, Wisdom will support more CAS systems.
You can find out more about Diagonal and Wisdom by visiting the Wisdom website.

Thursday, October 8, 2009

Private Members in Unit Tests


I was running a technical session on TDD today. Whilst doing my research into ways to work with private members I discovered a new one. The C# language has evolved and the tools in Visual Studio have improved, providing new ways to solve old problems. You can now solve the private member problem in the following ways:

I won't go into the first few in much detail, as I believe the last one supersedes them.

#if DEBUG or TEST

You can surround the public accessors to your private members with compiler directives so that they only get built when you build your unit tests.
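For instance (Widget and its member are invented for illustration):

```csharp
using System;

public class Widget
{
    private int count;

    public void Poke() { count++; }

#if DEBUG
    // Compiled only into debug/test builds; release builds never ship this accessor.
    public int CountForTests { get { return count; } }
#endif
}

class Program
{
    static void Main()
    {
        var w = new Widget();
        w.Poke();
#if DEBUG
        Console.WriteLine(w.CountForTests); // 1 in a debug build
#endif
    }
}
```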

Internal members and internal visible to

Make your private members internal rather than private, and then use the InternalsVisibleTo attribute to make them visible to your tests (new in .NET 2).
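A sketch, assuming the test assembly is called TestProject1:

```csharp
// In the class library, typically in Properties/AssemblyInfo.cs:
using System.Runtime.CompilerServices;

[assembly: InternalsVisibleTo("TestProject1")]

// The member is made internal instead of private, so TestProject1 can read it:
public class Class1
{
    internal string nottelling = "my tests cannot access me";
}
```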

Partial classes

You can put your unit tests in a partial class and exclude this class when you build the release code (new in .NET 2).
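A sketch of the partial-class approach; the test half would live in a file excluded from the release build configuration (the file names are assumptions):

```csharp
// Class1.cs - production code
public partial class Class1
{
    private string nottelling = "my tests cannot access me";
}

// Class1.Tests.cs - excluded from the release build
public partial class Class1
{
    // Test-only accessor for the private field above
    public string NotTellingForTests
    {
        get { return nottelling; }
    }
}
```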

Create a private accessor

This is a feature which is new in VS2008. You can see it working as follows:
  • Create a new Class Library project
  • Add a private member to Class1 (the generated one)
    string nottelling = "my tests cannot access me";
  • Add a test project
  • Open Class1.cs and right click on the text Class1.
  • Select Create Private Accessor -> TestProject1
  • This will create a private accessor in TestProject1.
  • Add a new unit test and enter the following code:
[TestMethod()]
public void Test()
{
    var class1 = new Class1_Accessor();
    Assert.AreEqual("my tests cannot access me", class1.nottelling);
}

Old problems may have different solutions using newer tools; you should always keep looking for new ways to solve old problems.

Friday, October 2, 2009

Trees & Hierarchies in SQL


Hierarchies and trees are a challenge in SQL. This is an example of a very simple tree:

Parent ID  ID    Name
           0001  Earth
0001       0002  EU
0001       0003  USA
0002       0004  UK
0002       0005  France
0004       0006  England
0006       0007  Yorkshire
In our application the Parent ID and ID columns are both GUIDs; to help with readability I have used numbers in the examples.


This looks like
Earth
  EU
    France
    UK
      England
        Yorkshire
  USA


The table structure above works very well in most cases. It's very easy to select the parent or the children of an item, and you can easily add or delete items. You get problems when you need to find things like all the descendants of an item.

You can use CTEs (common table expressions), but under heavy load and with lots of data the performance just does not hold up.
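For reference, the recursive CTE version looks something like this (the table and column names are assumptions based on the example above):

```sql
-- All descendants of the node @id, walking the ParentID links recursively
WITH Descendants AS
(
    SELECT ID, ParentID, Name FROM Tree WHERE ParentID = @id
    UNION ALL
    SELECT t.ID, t.ParentID, t.Name
    FROM Tree t
    INNER JOIN Descendants d ON t.ParentID = d.ID
)
SELECT Name FROM Descendants;
```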

We looked at the nested set model, but it does not cope with simple deletes and inserts. (http://www.developersdex.com/gurus/articles/112.asp)

The final approach we took was to add an incremental number and a row path column to our table, as follows:

Row bigint
RowPath varbinary(160)

When we generate the RowPath we pad each row number with zeros so every segment starts at the same place. We do not use delimiters; we use fixed widths.

This is a simple representation of how the table will look. To keep it readable, each row number here is padded to only 2 digits (not 16).

Row  Parent ID  ID    Name       RowPath
1               0001  Earth      01
2    0001       0002  EU         0102
3    0001       0003  USA        0103
4    0002       0004  UK         010204
5    0002       0005  France     010205
6    0004       0006  England    01020406
7    0006       0007  Yorkshire  0102040607


This is an example of how the data looks in SQL when each segment is padded to 16 hex digits (8 bytes):

0x0000000000000795
0x00000000000007950000000000000796
0x0000000000000797
0x00000000000007970000000000000798
0x000000000000079700000000000007980000000000000799
0x00000000000007970000000000000798000000000000079A
0x00000000000007970000000000000798000000000000079B
To get the items beneath a node we then need to run the following SQL, where @path is the RowPath of the node and @pathplusone is that path incremented by one (see the function below for how to increment it):

SELECT Name FROM Table WHERE RowPath > @path AND RowPath < @pathplusone


We increment the path with the following SQL

CREATE FUNCTION [dbo].[IncrementRowPath]
(
    @VarBinary VARBINARY(160) = 0x0000000000000000
)
RETURNS VARBINARY(160)
AS
BEGIN
    DECLARE @Result    VARBINARY(160)
    -- If the data is the wrong size.
    IF(@VarBinary IS NULL OR DATALENGTH(@VarBinary) % 8 != 0)
        SET @Result = 0x0000000000000000
    -- Increment the last 8 byte segment.
    ELSE
    BEGIN
        DECLARE @LastSegment BIGINT
        SET @LastSegment = CAST(SUBSTRING(@VarBinary, DATALENGTH(@VarBinary) - 7, 8) AS BIGINT)
        SET @Result = SUBSTRING(@VarBinary, 1, DATALENGTH(@VarBinary) - 8) + CONVERT(VARBINARY(8), @LastSegment + 1)
    END
    RETURN @Result
END
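Putting it together (the table name Tree and the RowPath value here are hypothetical):

```sql
-- Fetch everything beneath the node whose RowPath is @path
DECLARE @path VARBINARY(160)
DECLARE @pathplusone VARBINARY(160)
SET @path = 0x00000000000007970000000000000798
SET @pathplusone = dbo.IncrementRowPath(@path)
SELECT Name FROM Tree WHERE RowPath > @path AND RowPath < @pathplusone
```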

I hope this helps you out with trees.

The only issue is that you need an index on RowPath. This index is going to be large, but in our testing and our live implementations it has not caused a problem.

SPSite constructor may break AAM link translation

You can use Alternate Access Mappings (AAM) in SharePoint to host the same content on different URLs.

For example, you could have two web applications in the Default zone with the host names example.local and mysite.local, which you publish to the Internet zone using www.example.com and mysite.example.com respectively.

AAM will normally translate links in the HTML automatically, so if the end user is browsing on www.example.com (the Internet zone) the link to the My Site in the top right of the page will include the mysite.example.com host (as this URL is also from the Internet zone).

However, you may find that the default URLs are displayed in links generated by your custom code in web parts or controls, regardless of which zone you are in.

This will occur if you create a new SPSite object and only specify the URL of the site in the constructor. SharePoint will then treat all content returned from this object as being in the default zone and will not translate the links.

Ensure that when you create SPSite objects you specify the current zone in the constructor. If you don't do this, it will use the default zone (and the links retrieved from that object in Url properties, Image fields, Url fields, HTML content etc. will refer to that zone).

So, instead of doing this:

using (SPSite site = new SPSite(url))
{
...
}

Do this:

SPContext context = SPContext.Current;
// This context only exists when the code is executed from a web application.
// For code running from console applications, STSADM extensions, and
// (potentially) event handlers, you may need to obtain the context differently.
using (SPSite site = new SPSite(url, context.Zone))
{
...
}

To Spin or not to Spin and a little PFX

With the advent of multi-core computers more developers are getting into spinning up threads. I won’t go into how to spin up a thread, but it’s getting easier: .NET 4 will bring parallel extensions to the framework so you can do parallel “for” loops. Check out http://blogs.msdn.com/pfxteam/ for more info.

You may think running something on more than one thread won’t be any faster; you may be very surprised by the results.

But just because you can spin a thread it does not mean you should.

A good rule to follow is

Don’t spin if your code is going to be executed by a web service or web site

Web sites / web services by their nature are multi threaded. IIS will decide how many threads to spin to service your content. If you spin extra threads you may add load to the server.

Do spin threads if your code is going to be executed on a desktop machine

Generally your program is not going to have multiple instances all busy doing work for multiple users, so it’s good to spin threads. But do consider the effect of nesting. Say we are going to spin more than one thread for this loop:

public static void DoSubWork()
{
  // We could spin 4 threads for this
  for (int i = 0; i < 5000; i++)
  {
  }
}

And we decide to spin more threads for this loop:

public static void DoLotsOfWork()
{
  // We could spin 4 threads for this
  for (int i = 0; i < 1000000; i++)
  {
  }
}

If we then make DoLotsOfWork() call DoSubWork(), we are going to be spinning far more threads. That’s where PFX comes in: PFX manages the thread pool count so that it does not exceed a fixed limit. With PFX your code would look like:

public static void DoLotsOfWork()
{
  Parallel.For(1, 1000000, i =>
  {
    DoSubWork();
  });
}

public static void DoSubWork()
{
  Parallel.For(1, 5000, i =>
  {
  });
}

If you call DoLotsOfWork() you will see that it does not spin more than 4 threads. You can verify this by debugging the code and viewing the Threads window (Debug – Windows – Threads, or Ctrl+D, T).

This is my thread list (just the threads that are doing the looping):

840 ThreadExample.Program.DoLotsOfWork
1880 ThreadExample.Program.DoLotsOfWork.AnonymousMethod
6724 ThreadExample.Program.DoLotsOfWork.AnonymousMethod
4332 ThreadExample.Program.DoSubWork.AnonymousMethod
6176 ThreadExample.Program.DoSubWork.AnonymousMethod

I do think one limitation of PFX (or at least of the CTP) is that you cannot choose how many threads to use; it will use core count * 2. I can understand the reason for this, but if you are spinning threads to download lots of images from a website then you may want to base the number of threads on bandwidth rather than CPU resources. (I will have to check .NET 4 to see if this has changed.)
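For what it's worth, the released .NET 4 API does let you cap this via ParallelOptions.MaxDegreeOfParallelism. A small sketch that measures the peak concurrency (the limit of 2 is arbitrary):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        int current = 0, peak = 0;

        // Ask Parallel.For to run at most 2 iterations concurrently.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
        Parallel.For(0, 200, options, i =>
        {
            // Track how many iterations are in flight right now.
            int now = Interlocked.Increment(ref current);
            int seen;
            do { seen = peak; }
            while (now > seen && Interlocked.CompareExchange(ref peak, now, seen) != seen);
            Thread.Sleep(1); // simulate work
            Interlocked.Decrement(ref current);
        });

        Console.WriteLine(peak <= 2); // True: the cap was honoured
    }
}
```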

AOP and Transactions

A project I work on started out as a .NET 1.1 solution, and we made a decision to use Serviced Components to handle transactions.

We had our reasons for this, things like:

  • Developers don’t need to worry about transactions, they are handled.
  • In some installations we need to span databases.
  • Transactional Message Queues.

It worked great, but then .NET 2 came along and we had heard good things about System.Transactions. We wanted to use it, not just because it was cool and new, but because Serviced Components did give us a few headaches.

So, we had a fully transactional API and all of our business classes derived from a common base class. To make them 'non serviced components' we just removed the base class. But how could we start a transaction on the first method call and commit it when the object was disposed?

That’s where AOP comes in. I looked on the web for some examples and found one on MSDN that fitted our requirements:

http://msdn.microsoft.com/msdnmag/issues/02/03/AOP/default.aspx

This gave me a good start.

We needed AOP because we check each method call for an error; if we get an error we set a bool, and then in the dispose we just abort the transaction.

The code below shows how we did this. Note: this code was taken from a production system and I have removed some parts of it, but it should work.

[TransactionSupport]
public abstract class AOPServicedComponent : ContextBoundObject, IDisposable
{
    // The call to Dispose is intercepted by the aspect, which commits or
    // rolls back the transaction; there is nothing to clean up here.
    public virtual void Dispose()
    {
    }
}

#region AOP Transaction

internal class TransactionSupportedAspect : IMessageSink
{
    private IMessageSink m_next;
    private bool disposing;

    TransactionScope ts;

    bool inerror = false;

    internal TransactionSupportedAspect(IMessageSink next)
    {
        // Cache the next sink in the chain
        m_next = next;
    }

    public IMessageSink NextSink
    {
        get { return m_next; }
    }

    public IMessage SyncProcessMessage(IMessage msg)
    {
        IMessage returnMethod;
        if (preProcessTransaction(msg) == TransactionParticipation.Suppress)
        {
            using (new TransactionScope(TransactionScopeOption.Suppress))
            {
                returnMethod = m_next.SyncProcessMessage(msg);
            }
        }
        else
        {
            returnMethod = m_next.SyncProcessMessage(msg);
        }
        postProcessTransaction(msg, returnMethod);
        return returnMethod;
    }

    public IMessageCtrl AsyncProcessMessage(IMessage msg, IMessageSink replySink)
    {
        throw new InvalidOperationException();
    }

    public static string ContextName
    {
        get { return "TransactionSupported"; }
    }

    private TransactionParticipation? preProcessTransaction(IMessage msg)
    {
        // We only want to process method calls
        if (!(msg is IMethodMessage)) return null;

        IMethodMessage call = msg as IMethodMessage;
        Type t = Type.GetType(call.TypeName);

        if (t.Name == "IDisposable")
            disposing = true;
        else
            disposing = false;

        // Read the participation level from the method's attribute, defaulting
        // to Required. (An assumption: this lookup stands in for a section
        // removed from the original production code.)
        TransactionParticipation participation = TransactionParticipation.Required;
        object[] attrs = call.MethodBase.GetCustomAttributes(typeof(TransactionParticipationAttribute), true);
        if (attrs.Length > 0)
            participation = ((TransactionParticipationAttribute)attrs[0]).Participation;

        if (!disposing && participation != TransactionParticipation.Suppress)
        {
            // Create the transaction scope
            ts = new TransactionScope();
        }

        // set us up in the callContext
        call.LogicalCallContext.SetData(ContextName, this);
        return participation;
    }

    private void postProcessTransaction(IMessage msg, IMessage msgReturn)
    {
        // We only want to process method return calls
        if (!(msg is IMethodMessage) ||
            !(msgReturn is IMethodReturnMessage)) return;

        IMethodReturnMessage retMsg = (IMethodReturnMessage)msgReturn;

        if (disposing && ts != null)
        {
            // Commit the transaction
            if (!inerror)
            {
                ts.Complete();
            }
            // else it will be rolled back when it is disposed

            ts.Dispose();
            ts = null;
        }
        Exception e = retMsg.Exception;
        if (e != null) inerror = true;
    }
}

public class TransactionSupportedProperty : IContextProperty, IContributeObjectSink
{
    public IMessageSink GetObjectSink(MarshalByRefObject o, IMessageSink next)
    {
        return new TransactionSupportedAspect(next);
    }

    public bool IsNewContextOK(Context newCtx)
    {
        return true;
    }

    public void Freeze(Context newContext)
    {
    }

    public string Name
    {
        get { return "TransactionSupportedProperty"; }
    }
}

[AttributeUsage(AttributeTargets.Class)]
public class TransactionSupportAttribute : ContextAttribute
{
    public TransactionSupportAttribute() : base("TransactionSupported") { }

    public override void GetPropertiesForNewContext(IConstructionCallMessage ccm)
    {
        ccm.ContextProperties.Add(new TransactionSupportedProperty());
    }
}

[AttributeUsage(AttributeTargets.Method)]
public class TransactionParticipationAttribute : Attribute
{
    TransactionParticipation participation = TransactionParticipation.Required;

    public TransactionParticipation Participation
    {
        get { return participation; }
    }

    public TransactionParticipationAttribute() { }

    public TransactionParticipationAttribute(TransactionParticipation participation)
    {
        this.participation = participation;
    }
}

public enum TransactionParticipation
{
    UseExisting,
    Required,
    Suppress
}
#endregion
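A hypothetical business class using the aspect might look like this (CustomerManager and its methods are invented for illustration):

```csharp
// Deriving from AOPServicedComponent routes every call through
// TransactionSupportedAspect, so Save runs inside an ambient TransactionScope.
public class CustomerManager : AOPServicedComponent
{
    public void Save()
    {
        // database work here enlists in the ambient transaction
    }

    [TransactionParticipation(TransactionParticipation.Suppress)]
    public void WriteAuditLog()
    {
        // runs outside any transaction
    }
}

// Consuming code: the Dispose call at the end of the using block is
// intercepted by the aspect, which commits (or rolls back on error).
// using (var mgr = new CustomerManager())
// {
//     mgr.Save();
// }
```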


We didn’t just leave it there: we added code to allow us to control the TransactionScope with the use of attributes, just like Serviced Components, and we added code to the 'post' method to do some tricks with exceptions (I may blog this later).