Wednesday 12 August 2015

Enumerable Timer with Stopwatch using an Extension Method

There have been numerous times when I’ve wanted to run a deferred query using either Linq-to-SQL (IQueryable<T>) or Linq-to-Objects (IEnumerable<T>) to capture the execution time for analysis.  Doing this for Linq-to-Objects is relatively easy using an extension method:

 public static IEnumerable<T> WithTimer<T>(this IEnumerable<T> source, Stopwatch watch)
 {
#if TRACE
  watch.Reset();
  watch.Start();
  foreach (var item in source)
   yield return item;
  watch.Stop();
#else
  return source;
#endif
 }

We simply pass in a StopWatch which is started and stopped pre- and post- enumeration respectively.  This allows us to get the total elapsed time outside of the enumeration as follows:

 Stopwatch watch = new Stopwatch();
 Enumerable.Range(0, 1000000).Where(n => n % 2 == 0).WithTimer(watch).ToList();
 Console.WriteLine(watch.ElapsedMilliseconds);

However, doing this for Linq-to-SQL is marginally more difficult as YIELD RETURN isn't supported for IQueryable<T> as it isn't an iterator interface type.  We therefore have to create our own Queryable wrapper (TimerQueryable<T>) for the IQueryable source and then wrap the enumerator which is returned by GetEnumerator into a TimerEnumerator<T> to control starting and stopping the timer.

    public class TimerQueryable<T> : IQueryable<T>
    {
        readonly IQueryable<T> _queryable;
        readonly IQueryProvider _provider;
        readonly Stopwatch _watch;

        public TimerQueryable(IQueryable<T> queryable, Stopwatch watch)
        {
            _watch = watch;
            _queryable = queryable;
            _provider = queryable.Provider;
        }

        public IEnumerator<T> GetEnumerator()
        {
            return new TimerEnumerator<T>(this._provider.CreateQuery<T>(Expression).GetEnumerator(), _watch);
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }

        public Type ElementType
        {
            get { return _queryable.ElementType; }
        }

        public System.Linq.Expressions.Expression Expression
        {
            get { return _queryable.Expression; }
        }

        public IQueryProvider Provider
        {
            get { return _queryable.Provider; }
        }
    }

    public class TimerEnumerator<T> : IEnumerator<T>
    {
        IEnumerator<T> _enumerator;
        Stopwatch _watch;

        public TimerEnumerator(IEnumerator<T> enumerator, Stopwatch watch)
        {
            _enumerator = enumerator;
            _watch = watch;
        }

        public T Current
        {
            get { return _enumerator.Current; }
        }

        public void Dispose()
        {
            _watch.Stop();
            _enumerator.Dispose();
        }

        object System.Collections.IEnumerator.Current
        {
            get { return this.Current; }
        }

        public bool MoveNext()
        {
            if (!_watch.IsRunning)
            {
                _watch.Reset();
                _watch.Start();
            }
            return this._enumerator.MoveNext();
        }

        public void Reset()
        {
            this._enumerator.Reset();
        }
    }
The StopWatch in the TimerEnumerator<T> class is started upon the enumeration in MoveNext and stopped upon the enumerator being disposed of in Dispose.  The timer isn't disposed of as we'd like to get the total elapsed time outside of the enumerator.  As IEnumerator<T> derives from IDisposable it would be fair to assume that the enumerator will be disposed correctly if the developer is adhering to the IDisposable pattern.

As there is only one instance of the StopWatch created and passed into multiple TimerEnumerator<T> instances it would be useful to mention that this isn't inherently thread-safe, but then again, do you know of an IQueryable or IEnumerable source which is?  Trying to execute two queries at the same time against an Entity Framework context in different threads would most certainly cause exceptions to be thrown.

Additionally, as we don't want this overhead in production code I have used the TRACE constant so that the timers will only work with the TRACE constant defined in the project - it doesn't make sense to run these timers when efficiency is paramount.

The full source code of the Enumerable Timer can be found on GitHub.

Wednesday 10 June 2015

Return Await vs Returning Task

I'm a true believe in trying to keep code as simple as possible by minimising the amount I have to write whilst not compromising functionality and readability.  I ran into a problem the other day when using the async/await pattern along with the Task Parallel Library (TPL).  Observe the following code snippet:

Task<int> CalculateValue()
{
    using (var calc = new Calculator())
    {
        return calc.Calculate();
    }
}

First thing to mention is I like to conform to standards - if my class is directly using unmanaged resources or any indirect unmanaged resource through another managed class then I like to implement the IDisposable pattern to ensure there are no memory leaks throughout the lifetime of the application.

So back to the problem at hand - the Calculate method on the Calculator class returns a Task<int> as this is a long running calculation which is done in another thread.  All is good except for the crucial part that upon the Calculate method being called the execution is returned back to the CalculateValue method whereby the Dispose method on the Calculator class is called long before the Calculate method completing.  The reason is simply because I didn't decorate the calc.Calculate call with an await and the CalculateValue method with an async keyword.

To resolve this problem I should change to the following:

async Task<int> CalculateValue()
{
    using (var calc = new Calculator())
    {
        return await calc.Calculate();
    }
}

So once the execution of the Task returned by the Calculate method is complete the Dispose method on the Calculator is subsequently called.

The reason I didn't implement it like this in the first place is because I don't like to add complexity to where I believe it isn't required.  Adding the await keyword on the calc.Calculate call would imply it has code to run after the call is complete - in my opinion it didn't have anything further to call apart from correctly disposing of the calculator which I easily missed.  Implementing the call this way would have been perfectly fine (no async and await keywords necessary):

Calculator _calc = new Calculator()
Task<int> CalculateValue()
{
    return calc.Calculate();
}

As I inject most of my dependencies using a DI framework it has been a rare circumstance where I've ran into this problem.

Saturday 10 January 2015

EF Mapping with Discriminator Column and Concrete Tables for Derived Types

A problem I recently ran into with Entity Framework 6 was where I wanted to use two features at the same time, specifically the Table-Per-Hierarchy mapping and Table Splitting. However, when using both of then at the same time it caused a conflict with the framework which it was unable to handle.

The following database schema was used for mapping to my domain model where I wanted the Person.PersonType column as a discriminator for my derived types.



My domain model was as follows (PersonType property has been omitted as discriminators cannot be mapped to properties):

class Person
{
 public int PersonId { get; set; }
 public string Name { get; set; }
}

class Customer : Person
{
 public string CustomerNumber { get; set; }}

class Operator : Person
{
 public string OperatorNumber { get; set; }
}
The model mapping I had in place was as follows:
  modelBuilder
    .Entity<Person>()
    .HasKey(n => n.PersonId)
    .Map(map =>
    {
        map.Properties(p => new { p.Name });
        map.ToTable("dbo.Person");
    })
    .Map<Customer>(map =>
    {
        map.Requires("PersonType").HasValue("C");
        map.Properties(p => new { p.CustomerNumber });
        map.ToTable("dbo.Customer");
    });
Which caused the following SQL exception to be thrown:

EntityCommandExecutionException
An error occurred while executing the command definition. See the inner exception for details
Invalid column name 'PersonType'.Invalid column name 'PersonType'.

The SQL which was generated by the entity framework was as follows:

SELECT 
    CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN '0X' WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN '0X0X' ELSE '0X1X' END AS [C1], 
    [Extent1].[PersonId] AS [PersonId], 
    [Extent1].[Name] AS [Name], 
    CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[CustomerNumber] END AS [C2], 
    CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [Project2].[OperatorNumber] END AS [C3]
    FROM   [dbo].[Person] AS [Extent1]
    LEFT OUTER JOIN  (SELECT 
        [Extent2].[PersonId] AS [PersonId], 
        [Extent2].[CustomerNumber] AS [CustomerNumber], 
        cast(1 as bit) AS [C1]
        FROM [dbo].[Customer] AS [Extent2]
        WHERE [Extent2].[PersonType] = N'C' ) AS [Project1] ON [Extent1].[PersonId] = [Project1].[PersonId]
    LEFT OUTER JOIN  (SELECT 
        [Extent3].[PersonId] AS [PersonId], 
        [Extent3].[OperatorNumber] AS [OperatorNumber], 
        cast(1 as bit) AS [C1]
        FROM [dbo].[Operator] AS [Extent3]
        WHERE [Extent3].[PersonType] = N'O' ) AS [Project2] ON [Extent1].[PersonId] = [Project2].[PersonId]

As you can see from the generated SQL it is trying to use the discriminator column on our satellite tables (dbo.Operator and dbo.Customer) rather than on our base table (dbo.Person).  As of yet I haven’t found a way to change the way in which the entity framework mapping works for the discriminator column to achieve both TPH mappings and Table Splitting.  To get around this problem I had to create a view on top of all of the table schemas involved in the hierarchy mapping:

create view dbo.vw_PersonExtended
as

    select
        p.Name, p.PersonId, p.PersonType, c.CustomerNumber, o.OperatorNumber
    from
        dbo.Person p
        left join dbo.Customer c on c.PersonId=p.PersonId
  left join dbo.Operator o on c.PersonId=o.PersonId
And mapping this view to the base class type Person and removing the derived class table mapping as follows:
   modelBuilder
    .Entity<Person>()
    .HasKey(n => n.PersonId)
    .Map(map =>
    {
        map.Properties(p => new { p.Name });
        map.ToTable("dbo.vw_PersonExtended");
    })
    .Map<Customer>(map =>
    {
        map.Requires("PersonType").HasValue("C");
        map.Properties(p => new { p.CustomerNumber });
    });

The derived types Person and Operator are no longer mapped to their individual concrete tables, but instead their values come from the mapping in the view.

Additionally, we have to create either an INSTEAD OF INSERT trigger on top of this view or create a stored procedure to allow our DB context to support persisting new entities.

The stored procedure would be:

create procedure dbo.usp_InsertCustomer
    @Name varchar(50),
    @CustomerNumber varchar(50)
as
begin

        set nocount on
        declare @id int

        insert into dbo.Person (Name, PersonType)
        values (@Name, 'C')
        set @id = scope_identity()

        insert into dbo.Customer (PersonId, CustomerNumber)
        values (@id, @CustomerNumber)

        select @id as PersonId

end

And mapped as follows:

modelBuilder
        .Entity<Customer>()
        .MapToStoredProcedures(map => map.Insert(i => i.HasName("usp_InsertCustomer")));

The obvious drawback to this approach is that you have to create views and stored procedures on the database side when you want to implement the discriminator hierarchy pattern in Entity Framework as well as the plumbing work in the database context mapping. This however shouldn't impede the performance of the implementation, it'll just make the implementation more time consuming.