Tuesday, April 1, 2008

ServiceHost vs InstanceContext

Well, I might be missing something, but from what I can tell using ServiceHost doesn't make much sense in a PeerChannel application. From my experiments, if you create a ServiceHost and then create a DuplexChannel, a message sent by invoking a service operation on the channel actually appears to arrive twice. Also, creating a 'regular' Channel (ChannelFactory instead of DuplexChannelFactory) doesn't push the message to all nodes; it seems like only some of the nodes actually receive it. There are a few examples out there using a 'regular' Channel, but when I run more than 3 nodes using that configuration, not every node gets the message.

So what I do recommend is using an InstanceContext hosted on a separate thread and DuplexChannels for all sending of data. A quick sample might look like this…

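    [ServiceBehavior(InstanceContextMode = InstanceContextMode.Single,
        ConcurrencyMode = ConcurrencyMode.Reentrant,
        UseSynchronizationContext = false)]
    public class ServiceImplementation : IPeerServiceContract
    { /* snip */ }

    public static class WorkerThread
    {
        /* snip variables */
        public static void Run()
        {
            // The InstanceContext takes incoming calls in place of a ServiceHost.
            service = new ServiceImplementation();
            host = new InstanceContext(service);
            host.Open();

            // Assumption: the type argument is the service contract, and the
            // channel field is declared as IPeerServiceContract.
            channelFactory = new DuplexChannelFactory<IPeerServiceContract>(
                host, "WCFPeerPerformance");
            channelFactory.Credentials.Peer.MeshPassword = MeshPassword;
            channel = channelFactory.CreateChannel();
            // The contract interface has no Open(), so cast before opening.
            ((ICommunicationObject)channel).Open();
        }
    }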
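
For reference, the contract behind a sample like this might look roughly like the sketch below. This is only an assumption on my part: the operation name Broadcast and its parameter are placeholders, not the real contract. What I am fairly sure of is that PeerChannel operations need to be one-way, and that the usual pattern is for the contract to name itself as the callback contract so the same interface is used for both sending and receiving.

```csharp
using System.ServiceModel;

// Hypothetical contract: Broadcast and its parameter are placeholder names.
// The contract is its own callback contract, so one interface covers both
// the outgoing channel and the incoming calls on the InstanceContext.
[ServiceContract(CallbackContract = typeof(IPeerServiceContract))]
public interface IPeerServiceContract
{
    // PeerChannel messages are fire-and-forget, so the operation is one-way.
    [OperationContract(IsOneWay = true)]
    void Broadcast(string message);
}
```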

PeerChannel Performance Improvement

My advisor wanted me to drill down into the following areas to make sure I was doing things correctly, since the performance was so bad on my first test runs. Specifically, he wanted me to look into:

  1. Why was I getting "The system hit the limit set for throttle MaxConcurrentCalls"?
  2. How does the graph pruning work?

Well, after digging into the protocol spec and hitting Google about a few interesting sections, I've rewritten pretty much the entire application. The good news is that my application runs ridiculously better, so here are some of the tips/hints I figured out (with the help of some great Microsoft folks out there, MVPs and other bloggers)…

Threading Issues
First, the biggest tip-off should have been that the WinForm I was using kept locking up and not responding. Having WCF run within the WinForm context is a very bad idea. If your WinForm freezes or locks up, I would start by fixing that first. There are a few articles [Here, Here, or Here] that talk about UseSynchronizationContext and how it should be false to improve performance (as stated in the articles, when it is false WCF doesn't synchronize your callbacks to the UI thread). It seems (to me at least) that even if you decorate your actual WinForm with the ServiceBehavior, setting the InstanceContextMode, the ConcurrencyMode and UseSynchronizationContext, it still becomes nonresponsive as you add more nodes to the mesh (I am testing on a single box currently).

For some reason, decorating the form itself seems to be the popular way to set up the samples out on the net. IMHO it is much better to separate the responsibilities into three classes: 1) the UX View, 2) the Service Implementation class, and 3) the Worker Thread class. The Service Implementation class carries the ServiceBehavior and handles any incoming calls. The Worker Thread class collaborates with the service and holds both the InstanceContext (which gets an instance of the Service Implementation class) and the DuplexChannel that implements the ServiceContract. Finally, the View just launches the Worker Thread class on a dedicated thread. Take a look at the Perf2 version on CodePlex Version 1.0 if you want to see it in action...
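
To make the View's part concrete, here is a minimal sketch of how it could start the worker on a dedicated thread. It relies on the WorkerThread class from the sample earlier in this post; MainForm, ShowMessage and messageList are names I made up for illustration, not the ones in the CodePlex project. Because UseSynchronizationContext is false, callbacks arrive on non-UI threads, so I am assuming the view marshals any UI update back itself with BeginInvoke.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical view: MainForm, ShowMessage and messageList are placeholder names.
public class MainForm : Form
{
    private readonly ListBox messageList = new ListBox { Dock = DockStyle.Fill };

    public MainForm()
    {
        Controls.Add(messageList);
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);

        // Run the PeerChannel setup off the UI thread.
        var worker = new Thread(WorkerThread.Run)
        {
            IsBackground = true,        // don't keep the process alive on exit
            Name = "PeerChannelWorker"
        };
        worker.Start();
    }

    // Intended to be called by the service implementation from a non-UI thread.
    public void ShowMessage(string text)
    {
        if (InvokeRequired)
        {
            // Hop back to the UI thread without blocking the caller.
            BeginInvoke(new Action<string>(ShowMessage), text);
            return;
        }
        messageList.Items.Add(text);
    }
}
```

The form itself carries no WCF attributes at all in this layout, which is the whole point: it only starts the thread and displays whatever the service hands it.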

Graph Pruning
When I was originally running my tests, I also created a simple way to see who was connected to whom. The strange thing was that the mesh always looked fully connected. In short, the mistake I made here was assuming it would start pruning right off the bat. The two most important things here are that the Maintenance Timer needs to run to clean up nodes, and that it can only be effective after the LinkUtility is calculated, which runs on another Timer. I need to run some more tests on this to be sure, but it does appear to be removing some connections, just slowly… The biggest helps here were reading the actual Peer Channel Protocol Specification, the Patent and the blog post "How many channels can a mesh handle?"