Wednesday 17 December 2014

SQL Scalar Value Function Inefficiencies

Are scalar value functions (SVF) really the best place to centralise the logic in your DB?

Although I’m against the practice, many developers place business logic of varying complexity into the database through inline CASE expressions, for example a simple calculation of a shipping cost. As the database grows and more queries come to depend on this calculation, it becomes good practice to centralise the code for reuse rather than duplicating the logic across numerous queries.

One option for centralising this logic is a scalar value function, called inline in the SQL statement, either in the SELECT list or as a predicate in the WHERE clause. Using one inside a query actually impedes performance: the SVF is invoked once per row in a separate execution context from the main query, its cost is largely hidden from the optimiser, and its presence prevents the plan from going parallel.
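As a sketch of the WHERE-clause case (using the table and function defined below; the threshold of 5 is an arbitrary value for illustration):

```sql
-- The scalar UDF in the predicate runs once per row, and its cost
-- is largely invisible to the optimiser.
select ProductID
from dbo.temp_Products
where dbo.usvf_CalculateShipping(Price, [Type], [Weight]) > 5;
```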

If we take a simple Products data table with 100k records as an example (this simply generates some random numbers for the properties of a product):

set nocount on

create table dbo.temp_Products
(
 [ProductID] int,
 [Type] tinyint,
 [Price] float,
 [Weight] float 
)

declare @rowcnt int = 0
while (@rowcnt <= 100000)
begin
 insert into dbo.temp_Products
 select @rowcnt, 1+rand()*4, 1+rand()*100, 1+rand()*10

 set @rowcnt = @rowcnt + 1
end

And a scalar value function which performs a simple calculation of a shipping cost:

create function dbo.usvf_CalculateShipping
(
    @PricePerKG float = 1,
    @Type tinyint,
    @WeightInKG float = 1
)
returns float
as
begin

    return  /*get the appropriate factor to apply*/
            case
                when @Type = 1 then 0.1
                when @Type = 2 then 0.2
                when @Type = 3 then 0.35
                when @Type = 4 then 0.43
            end * @PricePerKG * @WeightInKG

end
go

When we call the SVF inside a query, performance suffers relative to running the logic inline. To compare the two, take the following SQL statements:

select ProductID,
       case
           when [Type] = 1 then 0.1
           when [Type] = 2 then 0.2
           when [Type] = 3 then 0.35
           when [Type] = 4 then 0.43
       end * Price * [Weight]
from dbo.temp_Products

select ProductID, dbo.usvf_CalculateShipping(Price, [Type], [Weight])
from dbo.temp_Products

The first query, which runs the logic inline, produces a result in considerably less time than the second. The results are as follows:

(100001 row(s) affected)
Table 'temp_Products'. Scan count 1, logical reads 390, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 212 ms.

(100001 row(s) affected)
Table 'temp_Products'. Scan count 1, logical reads 390, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 842 ms,  elapsed time = 938 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

To get around this, the solution is to use an inline table-valued function (TVF) to hold the logic; the function's definition is then expanded into the plan of the calling query, effectively placing the logic inline.

create function dbo.utvf_CalculateShipping
(
    @PricePerKG float = 1,
    @Type tinyint,
    @WeightInKG float = 1
)
returns table as return 
select case
         when @Type = 1 then 0.1
         when @Type = 2 then 0.2
         when @Type = 3 then 0.35
         when @Type = 4 then 0.43
       end * @PricePerKG * @WeightInKG as Shipping
go

If I run the following two queries you’ll find the performance of both is almost identical:

select ProductID,
       case
           when [Type] = 1 then 0.1
           when [Type] = 2 then 0.2
           when [Type] = 3 then 0.35
           when [Type] = 4 then 0.43
       end * Price * [Weight]
from dbo.temp_Products

select t.ProductID, tt.Shipping
from dbo.temp_Products t
cross apply dbo.utvf_CalculateShipping(t.Price, t.[Type], t.[Weight]) tt



(100001 row(s) affected)
Table 'temp_Products'. Scan count 1, logical reads 390, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 109 ms,  elapsed time = 310 ms.

(100001 row(s) affected)
Table 'temp_Products'. Scan count 1, logical reads 390, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 109 ms,  elapsed time = 285 ms.

Monday 13 October 2014

Thread-Safe Singleton Construction

The Singleton design pattern is a common technique for ensuring only one instance of a class exists at runtime:

class NonThreadSafeSingleton
{
 static NonThreadSafeSingleton instance;

 public static NonThreadSafeSingleton Instance
 {
  get
  {
   if (instance == null)
    instance = new NonThreadSafeSingleton();
   return instance;
  }
 }
}

The above implementation of the Singleton pattern works in a single-threaded application, but it isn't safe to use across multiple threads.  If multiple threads access the Instance property concurrently, we may see the following interleaving at runtime:

Order  Thread 1                                   Thread 2
1      if (instance == null)
2                                                 if (instance == null)
3      instance = new NonThreadSafeSingleton();
4                                                 instance = new NonThreadSafeSingleton();
5      return instance;
6                                                 return instance;

As you can see from the order of execution, the NonThreadSafeSingleton class is instantiated twice due to a race condition, when in reality our desired execution is as follows:

Order  Thread 1                                   Thread 2
1      if (instance == null)
2      instance = new NonThreadSafeSingleton();
3      return instance;
4                                                 return instance;

If we're going to make this thread-safe one approach in .NET 2.0 would be to use an exclusive lock:

class ThreadSafeSingletonWithLock
{
 static ThreadSafeSingletonWithLock instance;
 static object locker = new Object();

 public static ThreadSafeSingletonWithLock Instance
 {
  get
  {
   lock (locker)
   {
    if (instance == null)
     instance = new ThreadSafeSingletonWithLock();
   }
   return instance;
  }
 }
}

But looking at the sequence of execution we now run into the problem that the lock is always acquired, even when the instance has already been constructed (steps 1 and 5 below):

Order  Thread 1                                        Thread 2
1      lock (locker)
2      if (instance == null)
3      instance = new ThreadSafeSingletonWithLock();
4      return instance;
5                                                      lock (locker)
6                                                      return instance;

This causes additional overhead which isn't desired.  It is actually possible to synchronise the object construction whilst minimising the amount of locking required, using what is known as double-checked locking:

class ThreadSafeSingletonWithDoubleCheckLock
{
 static ThreadSafeSingletonWithDoubleCheckLock instance;
 static object locker = new Object();

 public static ThreadSafeSingletonWithDoubleCheckLock Instance
 {
  get
  {
   if (instance == null)
   {
    lock (locker)
    {
     if (instance == null)
      instance = new ThreadSafeSingletonWithDoubleCheckLock();
    }
   }
   return instance;
  }
 }
}

It is still possible for two threads to contend on the lock during object construction, although if construction is quick the window for this is small.  Any instance retrievals after instantiation are done outside of the lock, which is an improvement at runtime.  Although this implementation of the thread-safe Singleton pattern is better than the prior one, it feels a little messy having to write the null check twice; an inexperienced developer may think the second check is unnecessary and run the risk of removing that line of code.
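For completeness, the classic form of this pattern declares the field volatile so that the newly constructed instance is safely published to other threads (a sketch; in practice the CLR's relatively strong memory model often masks the issue, but volatile makes the intent explicit):

```csharp
using System;

class VolatileDoubleCheckedSingleton
{
    // volatile ensures the write to 'instance' cannot become visible
    // to other threads before the constructor has finished running.
    static volatile VolatileDoubleCheckedSingleton instance;
    static readonly object locker = new object();

    VolatileDoubleCheckedSingleton() { }

    public static VolatileDoubleCheckedSingleton Instance
    {
        get
        {
            if (instance == null)
            {
                lock (locker)
                {
                    if (instance == null)
                        instance = new VolatileDoubleCheckedSingleton();
                }
            }
            return instance;
        }
    }
}
```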

An even better implementation is provided in the .NET 4.0 BCL with Lazy<T>.  This allows a single instance of a class to be instantiated on first request in a thread-safe way.   An example of its usage is shown below:

class ThreadSafeLazySingleton
{
 readonly static Lazy<ThreadSafeLazySingleton> instance = new Lazy<ThreadSafeLazySingleton>(() => new ThreadSafeLazySingleton());

 private ThreadSafeLazySingleton(){}

 public static ThreadSafeLazySingleton Instance
 {
  get { return instance.Value; }
 }
}

The Lazy<T> overloaded constructors provide performance options during construction.  For example, if the instance isn't going to be used between multiple threads then it is possible to define the Lazy instance as non-thread safe to ensure the highest possible performance.
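As a sketch of those options, the constructor overloads taking a LazyThreadSafetyMode value (or an isThreadSafe flag) make the choice explicit:

```csharp
using System;
using System.Threading;

class LazyModes
{
    static void Main()
    {
        // Full synchronisation: the factory runs at most once (the default).
        var safe = new Lazy<int>(() => 42, LazyThreadSafetyMode.ExecutionAndPublication);

        // No synchronisation: cheapest option, only valid for single-threaded use.
        var unsafeLazy = new Lazy<int>(() => 42, LazyThreadSafetyMode.None);

        // The bool overload is shorthand: isThreadSafe: false maps to LazyThreadSafetyMode.None.
        var shorthand = new Lazy<int>(() => 42, isThreadSafe: false);

        Console.WriteLine(safe.Value + unsafeLazy.Value + shorthand.Value); // prints 126
    }
}
```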

By using the Lazy<T> class we achieve the following:
  1. Encapsulation of the locking mechanism: as the synchronisation of instantiation is internal to Lazy<T>, any future optimisations made by the .NET team should not affect consumers of the class unless they intend to introduce a breaking change;
  2. The singleton instance can be used safely across multiple threads;
  3. The double-check locking is gracefully encapsulated inside the class, without requiring the developer to implement it themselves.
The private constructors in the earlier examples have been omitted for brevity.

Wednesday 8 October 2014

Binding ConverterParameter

There have been numerous cases where I've wanted to bind the ConverterParameter of a converter to a property on another object or to a property on my view model.

XAML Binding
<Label Content="{Binding BoundedValue
            , Converter={StaticResource valueIfNullConverter}
            , ConverterParameter={Binding DefaultNullValue}}"/>
Converter
    public class ValueIfNullConverter : IValueConverter
    {
        public object Convert(object value, Type targetType, object parameter, System.Globalization.CultureInfo culture)
        {
            return value ?? System.Convert.ToString(parameter);
        }

        public object ConvertBack(object value, Type targetType, object parameter, System.Globalization.CultureInfo culture)
        {
            throw new NotSupportedException();
        }
    }

Only to find this failing at runtime with the following exception:

A 'Binding' cannot be set on the 'ConverterParameter' property of type 'Binding'. A 'Binding' can only be set on a DependencyProperty of a DependencyObject.

This therefore implies the ConverterParameter property isn't exposed as a DependencyProperty and therefore can't be bound to.

To overcome this problem we can use a MultiBinding along with a multi-value converter whereby we bind the "values" of the converter to both the data source and converter parameter properties as follows:

        <Label>
            <Label.Content>
                <MultiBinding Converter="{StaticResource valueIfNullConverter}">
                    <Binding Path="BoundedValue" />
                    <Binding Path="DefaultNullValue" />
                </MultiBinding>
            </Label.Content>
        </Label>
And changing our converter to a multi-value converter:
    public class ValueIfNullConverter : IMultiValueConverter
    {
        public object Convert(object[] values, Type targetType, object parameter, System.Globalization.CultureInfo culture)
        {
            if (values != null && values.Length == 2)
            {
                return values[0] ?? System.Convert.ToString(values[1]);
            }

            return values;
        }

        public object[] ConvertBack(object value, Type[] targetTypes, object parameter, System.Globalization.CultureInfo culture)
        {
            throw new NotSupportedException();
        }
    }
In the multi-value converter above, the bound value and parameter are now accessible inside the 'values' collection argument instead of accessing the parameter through the 'parameter' argument.

Thursday 18 September 2014

Lazy loading with ConcurrentDictionary for Thread-Safe Factory Methods

Developers who have started using the ConcurrentDictionary class in a multithreaded environment may have noticed some unusual behaviour with its GetOrAdd and AddOrUpdate methods: the delegate parameters to these methods are NOT invoked atomically. Because the delegates run without blocking other callers, a delegate can be invoked multiple times if several threads access the dictionary for the same key at the same time. This may not have been your intention in the design of your code, especially if the delegate performs a time-consuming operation (e.g. accessing a data layer).

This behaviour is by design: ConcurrentDictionary keeps reads lock-free and uses fine-grained locking for writes, and the delegates are run outside of the dictionary’s internal locks in order to prevent unknown code from blocking every thread accessing the dictionary. The design goal of this class is to keep access from multiple threads as quick as possible.

To get around this issue, the ConcurrentDictionary can be combined with the Lazy<T> class: the dictionary may still race on which Lazy wrapper gets stored, but because a Lazy<T> is cheap to construct and only the stored wrapper's Value is ever materialised, the underlying delegate is guaranteed to be invoked only once.

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        Consumer consumer = new Consumer();
        Parallel.For(0, 10, (i) => consumer.Consume("NonLazyKey"));
        Parallel.For(0, 10, (i) => consumer.LazyConsume("LazyKey"));
        Console.ReadKey();
    }
}
 
class Consumer
{
    ConcurrentDictionary<string, int> _cache = new ConcurrentDictionary<string, int>();
    ConcurrentDictionary<string, Lazy<int>> _lazyCache = new ConcurrentDictionary<string, Lazy<int>>();
 
    public int Consume(string key)
    {
        return _cache.GetOrAdd(key, k => this.GetValue(k));
    }
 
    public int LazyConsume(string key)
    {
        return _lazyCache.GetOrAdd(key, new Lazy<int>(() => this.GetValue(key))).Value;
    }
 
    int GetValue(string key)
    {
        Console.WriteLine("Getting Value for Key: {0}", key);
        return 1;
    }
}

In the example provided, the Consumer class implements two internal caches: one declared with a string key and an integer value, the other declared with a string key and a Lazy<int> value. Two methods retrieve values from the caches by key, and both subsequently call the GetValue method to compute a value (in real code GetValue could be long running). GetValue writes to the console every time it is called, so you can see how many times each Consume method triggers it. In this example LazyConsume ends up calling GetValue only once, whereas the non-lazy Consume can call it multiple times when threads race on the same key.

Using the ConcurrentDictionary and Lazy<T> classes in conjunction is a safe design pattern to adopt whenever you need to ensure your factory value generators aren't called multiple times.
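For convenience, the pattern can be wrapped in a small extension method (a sketch; GetOrAddLazy is a hypothetical name of my own, not part of the BCL):

```csharp
using System;
using System.Collections.Concurrent;

static class ConcurrentDictionaryExtensions
{
    // Wraps the valueFactory in a Lazy<TValue> so it runs at most once per key,
    // even if several threads race on GetOrAdd for the same key.
    public static TValue GetOrAddLazy<TKey, TValue>(
        this ConcurrentDictionary<TKey, Lazy<TValue>> dictionary,
        TKey key,
        Func<TKey, TValue> valueFactory)
    {
        return dictionary.GetOrAdd(key, k => new Lazy<TValue>(() => valueFactory(k))).Value;
    }
}
```

Callers then declare their cache as a ConcurrentDictionary<string, Lazy<int>> and call _lazyCache.GetOrAddLazy(key, this.GetValue).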

Sunday 3 August 2014

Linq 2 SQL ObjectTrackingEnabled Memory Leaks

If you're working with disconnected entities in Linq 2 SQL, you should pay attention to the ObjectTrackingEnabled flag on the data context: its implied default value is TRUE, and leaving it set can cause memory leaks through subscribed events.

Take the following data access layer snippet for retrieving a Person entity (AdventureWorks db):
public Person GetPerson()
{
    using (AdventureWorksDataContext ctx = new AdventureWorksDataContext())
    {
        //ctx.ObjectTrackingEnabled = false;
        return ctx.Persons.First();
    }
}
If the ObjectTrackingEnabled flag is NOT set to FALSE on the data context, the implied change tracker object in Linq 2 SQL will remain alive for as long as the entity does.  This is because the change tracker subscribes to the PropertyChanged and PropertyChanging events of the entities materialised while ObjectTrackingEnabled is TRUE.  This becomes problematic when calling this method many times to retrieve numerous People within your application, because every instantiation of the data context also creates a new instance of the change tracker.  You therefore end up with one change tracker for every entity!  However, once an entity becomes eligible for garbage collection, its change tracker will be collected with it.

To prove this point the following code snippet shows two examples: the first, where the change tracker is kept alive by keeping the ObjectTrackingEnabled set to TRUE, and the second where the change tracker is discarded by setting the ObjectTrackingEnabled to FALSE.  The GetPerson method in the DataAccessLayer now contains some code where it retrieves the ChangeTracker object on the data context and assigns it to the WeakReference output parameter so the lifetime of the object can be tracked.
static void Main(string[] args)
{
    DataAccessLayer layer = new DataAccessLayer();

    WeakReference refCTWithTracking;
    Person person1 = layer.GetPerson(true, out refCTWithTracking);
    GC.Collect();
    Console.WriteLine("ChangeTrackerWithTracking IsAlive: {0}", refCTWithTracking.IsAlive);

    WeakReference refCTWithoutTracking;
    Person person2 = layer.GetPerson(false, out refCTWithoutTracking);
    GC.Collect();
    Console.WriteLine("ChangeTrackerWithoutTracking IsAlive: {0}", refCTWithoutTracking.IsAlive);

    Console.ReadKey();
}

public class DataAccessLayer
{
    public Person GetPerson(bool objectChangeTracking, out WeakReference refChangeTracker)
    {
        using (AdventureWorksDataContext ctx = new AdventureWorksDataContext())
        {
            ctx.ObjectTrackingEnabled = objectChangeTracking;

            PropertyInfo propServices = typeof(AdventureWorksDataContext).GetProperty("Services", BindingFlags.NonPublic | BindingFlags.Instance);
            PropertyInfo propChangeTracker = propServices.PropertyType.GetProperty("ChangeTracker", BindingFlags.Instance | BindingFlags.NonPublic);
            refChangeTracker = new WeakReference(propChangeTracker.GetValue(propServices.GetValue(ctx)));

            return ctx.Persons.First();
        }
    }
}
It would have been a better design for Microsoft to explicitly unsubscribe from the PropertyChanged and PropertyChanging events when disposing of the data context, as the lifetime of the change tracker should be scoped to that of the data context.  In my opinion, the change tracker is redundant once the data context has been disposed.  Linq 2 SQL even prevents you from reattaching a disconnected entity to another context while these events are still subscribed to by another change tracker.

Friday 1 August 2014

.NET Event Memory Leaks

A common cause of memory leaks in .NET applications is a subscribing class failing to unsubscribe from the events of another class it consumes before that subscriber is discarded.

The example below shows where memory leaks can cause a problem (copy and paste into LinqPad):
void Main()
{
 //subscriber subscribes to publisher event
 PublisherSubscriberExample(true);

 //subscriber does NOT subscribe to publisher event
 PublisherSubscriberExample(false);
}

void PublisherSubscriberExample(bool subscribeToEvents)
{
 String.Format("SubscribeToEvents: {0}", subscribeToEvents).Dump();
 Publisher pub = new Publisher();
 Subscriber sub = new Subscriber(pub, subscribeToEvents);
 WeakReference refPub = new WeakReference(pub);
 WeakReference refSub = new WeakReference(sub);
 //print publisher and subscriber object state pre dispose
 String.Format("Pub: {0}", refPub.IsAlive).Dump();
 String.Format("Sub: {0}", refSub.IsAlive).Dump();
 //drop the reference to the subscriber object so it becomes eligible for collection
 sub = null;
 //force garbage collection for the subscriber object
 GC.Collect();
 //print publisher and subscriber object state post dispose
 String.Format("Pub: {0}", refPub.IsAlive).Dump();
 String.Format("Sub: {0}", refSub.IsAlive).Dump();
}

class Publisher
{
 public event EventHandler PublisherEvent;

 public Publisher() {}
}

class Subscriber
{
 Publisher _publisher;

 public Subscriber(Publisher publisher, bool subscribeToEvent)
 {
  _publisher = publisher;
 
  if (subscribeToEvent)
   _publisher.PublisherEvent += PublisherEvent;
 }

 void PublisherEvent(object sender, EventArgs e)
 {
 }
}
We have a Subscriber and Publisher class; the Subscriber takes a Publisher instance and subscribes to an event.  When the Subscriber instance goes out of scope (i.e. the reference is dropped and garbage collection is requested), the Publisher instance still maintains a reference to the Subscriber through its event's delegate list, because the Subscriber has NOT explicitly unsubscribed from its original subscription. While the Publisher instance remains reachable, so does the Subscriber.

To get around this issue it is good practice to implement IDisposable on the Subscriber class so that the event can be unsubscribed from upon disposing of the object:
class Subscriber : IDisposable
{
 Publisher _publisher;

 public Subscriber (Publisher publisher, bool subscribeToEvent)
 {
  _publisher = publisher;
 
  if (subscribeToEvent)
   _publisher.PublisherEvent += PublisherEvent;
 }

 public void Dispose()
 {
  //free managed resources
  this.Dispose(true);
 }

 void Dispose(bool disposing)
 {
  if (disposing)
  {
   //free managed resources
   _publisher.PublisherEvent -= PublisherEvent;
  }
 }

 void PublisherEvent(object sender, EventArgs e)
 {
 }
}
You would then call Dispose on objects which expose it (or wrap them in a using block) instead of simply setting the reference to null.
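Putting it together, a minimal usage sketch (repeating cut-down Publisher/Subscriber classes, with a hypothetical Raise helper and Received counter added for illustration, so the snippet is self-contained):

```csharp
using System;

class Publisher
{
    public event EventHandler PublisherEvent;

    // Helper so the example can raise the event from outside the class.
    public void Raise()
    {
        EventHandler handler = PublisherEvent;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}

class Subscriber : IDisposable
{
    readonly Publisher _publisher;
    public int Received;

    public Subscriber(Publisher publisher)
    {
        _publisher = publisher;
        _publisher.PublisherEvent += OnPublisherEvent;
    }

    public void Dispose()
    {
        // Unsubscribing removes the publisher's reference back to this
        // instance, allowing the subscriber to be garbage collected.
        _publisher.PublisherEvent -= OnPublisherEvent;
    }

    void OnPublisherEvent(object sender, EventArgs e)
    {
        Received++;
    }
}

class Demo
{
    static void Main()
    {
        Publisher pub = new Publisher();
        using (Subscriber sub = new Subscriber(pub))
        {
            pub.Raise(); // handled: sub.Received == 1
        } // Dispose runs here, detaching the handler

        pub.Raise(); // no longer handled
    }
}
```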