Friday, February 21, 2014

Improved Method To Avoid A System.InvalidOperationException Multi-Threaded Race Condition In CRM 2011

To improve the performance of creating connections to CRM via the C# SDK, we followed the steps outlined here and here to cache the ServiceManagement and Credentials objects. After some quick testing, everything ran smoothly and we pushed the changes to prod. A couple of days later, these exceptions started showing up in our logs:
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
   at System.Collections.Generic.List`1.Enumerator.MoveNext()
   at Microsoft.Xrm.Sdk.Client.ServiceConfiguration`1.CreateLocalChannelFactory()
A quick Google search turned up this page, which attributed the problem to a race condition between creating a connection to CRM and enabling the proxy types on the OrganizationServiceProxy. I decided to look into what the SDK was actually doing to create the race condition, and this is what I found at the call site where the exception was being thrown:
private ChannelFactory<TService> CreateLocalChannelFactory()
{
      lock (ServiceConfiguration<TService>._lockObject)
      {
          ServiceEndpoint local_0 = new ServiceEndpoint(this.CurrentServiceEndpoint.Contract, this.CurrentServiceEndpoint.Binding, this.CurrentServiceEndpoint.Address);
          foreach (IEndpointBehavior item_0 in (Collection<IEndpointBehavior>)this.CurrentServiceEndpoint.Behaviors)
              local_0.Behaviors.Add(item_0);
          local_0.IsSystemEndpoint = this.CurrentServiceEndpoint.IsSystemEndpoint;
          local_0.ListenUri = this.CurrentServiceEndpoint.ListenUri;
          local_0.ListenUriMode = this.CurrentServiceEndpoint.ListenUriMode;
          local_0.Name = this.CurrentServiceEndpoint.Name;
          ChannelFactory<TService> local_2 = new ChannelFactory<TService>(local_0);
          if (this.ClaimsEnabledService || this.AuthenticationType == AuthenticationProviderType.LiveId)
              ChannelFactoryOperations.ConfigureChannelFactory<TService>(local_2);
          local_2.Credentials.IssuedToken.CacheIssuedTokens = true;
          return local_2;
      }
}
Even with the lock statement, enumerating CurrentServiceEndpoint.Behaviors was apparently still throwing the exception. I then checked the OrganizationServiceProxy to see what EnableProxyTypes() was doing:
public void EnableProxyTypes(Assembly assembly)
{
      ClientExceptionHelper.ThrowIfNull((object) assembly, "assembly");
      ClientExceptionHelper.ThrowIfNull((object) this.ServiceConfiguration, "ServiceConfiguration");
      OrganizationServiceConfiguration serviceConfiguration = this.ServiceConfiguration as OrganizationServiceConfiguration;
      ClientExceptionHelper.ThrowIfNull((object) serviceConfiguration, "orgConfig");
      serviceConfiguration.EnableProxyTypes(assembly);
}
Hmm, nothing there looked to be updating the Behaviors collection. It must be something in the OrganizationServiceConfiguration’s EnableProxyTypes():
public void EnableProxyTypes(Assembly assembly)
{
      ClientExceptionHelper.ThrowIfNull((object) assembly, "assembly");
      ClientExceptionHelper.ThrowIfNull((object) this.CurrentServiceEndpoint, "CurrentServiceEndpoint");
      lock (this._lockObject)
      {
          ProxyTypesBehavior local_0 = this.CurrentServiceEndpoint.Behaviors.Find<ProxyTypesBehavior>();
          if (local_0 != null)
              ((Collection<IEndpointBehavior>) this.CurrentServiceEndpoint.Behaviors).Remove((IEndpointBehavior) local_0);
          this.CurrentServiceEndpoint.Behaviors.Add((IEndpointBehavior) new ProxyTypesBehavior(assembly));
      }
}
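And there it was, the cause of the race condition. One thread calls CreateLocalChannelFactory(), which enumerates CurrentServiceEndpoint.Behaviors, while another thread calls EnableProxyTypes(), which removes from and adds to that same collection. Even though the two calls come from different OrganizationServiceProxy instances, those instances share the same ServiceConfiguration. And even though both methods are wrapped in lock statements, they use different lock objects: one locks on ServiceConfiguration<TService>._lockObject, the other on this._lockObject, so neither lock ever blocks the other.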
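To make the failure mode concrete, here is a minimal, self-contained sketch of the same pattern outside of CRM. Nothing in it comes from the SDK: the class, the two lock objects, and the list of strings are hypothetical stand-ins for the two _lockObject fields and the Behaviors collection. The two events are only there to force the unlucky interleaving every time, rather than waiting for it to happen by chance.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class LockMismatchDemo
{
    // Hypothetical stand-ins for the two different lock objects used by the SDK methods.
    private static readonly object ReaderLock = new object();
    private static readonly object WriterLock = new object();

    // Hypothetical stand-in for CurrentServiceEndpoint.Behaviors.
    private static readonly List<string> Behaviors = new List<string> { "a", "b" };

    // Returns true if the enumeration failed with InvalidOperationException.
    public static bool Run()
    {
        var readerStarted = new ManualResetEventSlim(false);
        var writerDone = new ManualResetEventSlim(false);

        var writer = new Thread(() =>
        {
            readerStarted.Wait();
            lock (WriterLock) // a different object, so this never waits for the reader
            {
                Behaviors.Add("proxy-types");
            }
            writerDone.Set();
        });
        writer.Start();

        var threw = false;
        try
        {
            lock (ReaderLock)
            {
                foreach (var behavior in Behaviors)
                {
                    readerStarted.Set(); // let the writer run in the middle of the enumeration
                    writerDone.Wait();   // continue only after the list has been modified
                }
            }
        }
        catch (InvalidOperationException)
        {
            threw = true; // the next MoveNext() detects that the list changed
        }

        writer.Join();
        return threw;
    }
}
```

If both blocks locked on the same object, the writer could not enter its lock until the foreach had finished, and the enumeration would complete normally.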

The fix suggested by the only Google result for this error is to check whether the ServiceConfiguration’s CurrentServiceEndpoint already has any EndpointBehaviors before enabling the proxy types. That still leaves a small (although very unlikely) window in which the exception can occur. I decided to simplify it. The only thing the OrganizationServiceProxy does in its EnableProxyTypes() is pass the call on to the ServiceConfiguration’s EnableProxyTypes(), and the ServiceConfiguration is shared among all threads. So the call to EnableProxyTypes() can be made directly after the ServiceConfiguration is created, before it is returned and used by any OrganizationServiceProxy. This removes the race condition, as well as the need to check for existing behaviors before calling EnableProxyTypes(). Below is the utility class that we use to create our OrganizationServiceProxies, sharing the ServiceConfiguration and Credentials.

Before diving into the code, here are some things that aren’t shown below:
  • CrmServiceEntity is a class that contains all of the information required to connect to CRM. It overrides Equals() (a dictionary key also needs a matching GetHashCode()), so it is a valid key to use.
  • GetOrAddSafe is an extension method that ensures that only one CrmServiceCreationInfo object gets created per CrmServiceEntity.
And now the code, which is pretty simple. A call comes in to CreateService, with a CrmServiceEntity parameter. GetOrAddSafe is then called, looking for any existing value in the ConcurrentDictionary. If none is found, it calls the constructor for CrmServiceCreationInfo with the CrmServiceEntity parameter. Keep in mind this is locked and will not be called twice for the same CrmServiceEntity.

The ServiceConfiguration and ClientCredential get created as normal, but then, if proxy types are enabled, EnableProxyTypes is called on the ServiceConfiguration. Some reflection magic is required to call the method, since it lives on an internal class, and there is a fail-safe in case Microsoft ever changes the class. The end result is that before the ServiceConfiguration object is ever returned, it already has its proxy settings set, which means the race condition can no longer happen. Enjoy!
private static ConcurrentDictionary<CrmServiceEntity, CrmServiceCreationInfo> _crmServiceCreationInfos = new ConcurrentDictionary<CrmServiceEntity, CrmServiceCreationInfo>();
private static readonly object _crmServiceCreationLock = new object();
private static OrganizationServiceProxy CreateService(CrmServiceEntity entity)
{
      var crmServiceCreationInfo = _crmServiceCreationInfos.GetOrAddSafe(_crmServiceCreationLock, entity, e => new CrmServiceCreationInfo(e));
      var orgService = new OrganizationServiceProxy(crmServiceCreationInfo.ServiceConfiguration, crmServiceCreationInfo.ClientCredential);
      if (entity.ImpersonationUserId != Guid.Empty)
      {
          orgService.CallerId = entity.ImpersonationUserId;
      }
      return orgService;
}


private class CrmServiceCreationInfo
{
      public IServiceManagement<IOrganizationService> ServiceConfiguration { get; set; }
      public ClientCredentials ClientCredential { get; set; }
      public CrmServiceCreationInfo(CrmServiceEntity entity)
      {
          var orgUri = GetOrganizationServiceUri(entity);
          ServiceConfiguration = ServiceConfigurationFactory.CreateManagement<IOrganizationService>(orgUri);
          ClientCredential = GetCredentials(entity);
          if (entity.EnableProxyTypes)
          {
              // As of at least CRM 2011 Rollup 15 there exists the potential that sharing the Service Configuration and EnablingProxyTypes could cause a 
              // System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
              //    at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
              //    at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
              //    at System.Collections.Generic.List`1.Enumerator.MoveNext()
              //    at Microsoft.Xrm.Sdk.Client.ServiceConfiguration`1.CreateLocalChannelFactory()
              // http://social.microsoft.com/Forums/en-US/d8d81294-5c11-4490-824d-649c653c7335/linq-exception-occurs-while-retrieving-paged-crm-data-in-a-multihreaded-manner

              // Rather than skipping EnableProxyTypes when it has already been enabled (which could still cause the issue),
              // enable it here, where it is guaranteed to execute only once.
              // ServiceConfiguration should be an OrganizationServiceConfiguration, which is an internal type. If that ever
              // changes, fall back to a temporary OrganizationServiceProxy and enable the proxy types through it.

              var type = ServiceConfiguration.GetType();
              var method = type.GetMethod("EnableProxyTypes", new[] { typeof(System.Reflection.Assembly) });
              if (method == null)
              {
                  LogManager.GetCurrentClassLogger().Warn("EnableProxyTypes doesn't exist for " + type.FullName);
                  using (var orgService = new OrganizationServiceProxy(ServiceConfiguration, ClientCredential))
                  {
                      orgService.EnableProxyTypes(GetEarlyBoundProxyAssembly());
                  }
              }
              else
              {
                  method.Invoke(ServiceConfiguration, new Object[] { GetEarlyBoundProxyAssembly() });
              }
          }
      }
}
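
GetOrAddSafe is not shown in this post. As a rough idea of what such an extension method could look like, here is a minimal sketch that matches the call in CreateService (lock object, key, factory). The class name and the body are assumptions, not the original implementation. The reason for not simply calling ConcurrentDictionary.GetOrAdd is that GetOrAdd may invoke the value factory more than once when several threads race on the same key, and here the factory must run only once per key.

```csharp
using System;
using System.Collections.Concurrent;

public static class ConcurrentDictionaryExtensions
{
    // Hypothetical sketch: returns the existing value for the key, or creates it
    // while holding lockObject so that the factory runs at most once per key.
    public static TValue GetOrAddSafe<TKey, TValue>(
        this ConcurrentDictionary<TKey, TValue> dictionary,
        object lockObject,
        TKey key,
        Func<TKey, TValue> valueFactory)
    {
        TValue value;

        // Fast path: no locking once the value exists.
        if (dictionary.TryGetValue(key, out value))
        {
            return value;
        }

        lock (lockObject)
        {
            // Check again inside the lock; another thread may have added the value
            // while this thread was waiting.
            if (!dictionary.TryGetValue(key, out value))
            {
                value = valueFactory(key);
                dictionary[key] = value;
            }
            return value;
        }
    }
}
```

With a helper along these lines, the CrmServiceCreationInfo constructor, and with it the EnableProxyTypes call, runs once per CrmServiceEntity no matter how many threads call CreateService at the same time.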