Querying Entity Framework. Part 24 – Aggregating Nullable Fields

This is a very simple case for an Entity Framework model: I have a nullable column in the Clients table.

I want to count the values and to sum them up. The LINQ queries against the EF model can be very simple.
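
A minimal sketch of such queries; the nullable int? column name Discount is an assumption, substitute your own:

var count = context.Clients.Count();                                // counts all rows
var countNotNull = context.Clients.Count(c => c.Discount != null);  // counts only rows with a value
var sum = context.Clients.Sum(c => c.Discount);                     // int? – sums only the non-null values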

But these queries reveal some magic: Count() returns an int, while Sum() returns an int?. So you may fall into one of the following cases.

Method  | Return type | On all rows                   | On non-null rows only         | On an empty dataset
Count() | int         | counts all rows, null or not  | counts only rows with a value | zero
Sum()   | int?        | sums only the non-null values | sums only the non-null values | null

The most dangerous is the last case, when Sum() returns null, so you should add a check for a null value. If you want to get zero instead, you need to change the query, for example by adding the DefaultIfEmpty() method.
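
Two possible shapes of that fix, sketched under the same assumption about the Discount column:

var total = context.Clients.Sum(c => c.Discount) ?? 0;   // null check on the result
var total2 = context.Clients
    .Select(c => c.Discount ?? 0)
    .DefaultIfEmpty()                                    // supplies a default 0 for an empty dataset
    .Sum();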

Querying Entity Framework. Part 23 – Filtering Nullable Field

When you run queries against non-nullable columns (NOT NULL in SQL), everything goes smoothly. You should pay closer attention when you deal with nullable fields (one such case is covered at the end of this post).

Now let’s take a closer look at Entity Framework queries against a nullable field. I’ve added a new column LastUpdate of type DateTime? to my Client entity.

So I have some records with concrete dates and some with NULL values.

1. Filtering “greater than or equal to”

C# understands it perfectly. We have a simple and correct SQL query.
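
A sketch of the kind of query meant here; the cut-off date is arbitrary:

var since = new DateTime(2016, 1, 1);
var clients = context.Clients
    .Where(c => c.LastUpdate >= since)
    .ToList();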

2. Filtering all concrete dates aka WHERE … IS NOT NULL
You can use either of these two forms.
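
Both variants, sketched:

var q1 = context.Clients.Where(c => c.LastUpdate != null);
var q2 = context.Clients.Where(c => c.LastUpdate.HasValue);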

Both generate the same correct SQL query.

3. Filtering multiple dates aka WHERE … IN

This case is a bit harder because C# cannot compare a DateTime[] array with a DateTime? field. You need to take the non-nullable value from the column.
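
A sketch of this filter; the specific dates are arbitrary:

var dates = new[] { new DateTime(2016, 1, 1), new DateTime(2016, 2, 1) };
var clients = context.Clients
    .Where(c => c.LastUpdate != null && dates.Contains(c.LastUpdate.Value))
    .ToList();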

Looks good, but I’m not satisfied with the conversion of the C# DateTime array values to the SQL Server datetime2 type, especially since I set an explicit annotation to use the date type in the database.

If you don’t want to build a query with every parameter regardless of whether it is null or not, you might build a dynamic filter like in the following sample.
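
A sketch of such a dynamic filter, assuming two optional DateTime? parameters dateFrom and dateTo:

IQueryable<Client> query = context.Clients;
if (dateFrom != null)
    query = query.Where(c => c.LastUpdate >= dateFrom);
if (dateTo != null)
    query = query.Where(c => c.LastUpdate <= dateTo);
var result = query.ToList();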

What happens if both parameters are null? No filters will be added, and you’ll get a query returning all(!) rows, regardless of whether they have a concrete value or not. In this case add another check for when both parameters are null. One of the simplest implementations is to add a third if statement.
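
For example, the extra check could look like this:

if (dateFrom == null && dateTo == null)
    query = query.Where(c => c.LastUpdate != null);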

Code First. Executing Stored Procedures With Table-Valued Parameter

Today I need to retrieve data from Entity Framework using a stored procedure, passing input values from a .NET collection through a table-valued parameter. I wrote about the technologies used here in these posts:

  1. Using Table-Valued Parameters in SQL Server Stored Procedures – how to create SQL Server type and stored procedure;
  2. Calling SQL Server Stored Procedure With Table-Valued Parameter in C# – calling stored procedure via ADO.NET;
  3. Code First. Executing Stored Procedures – calling stored procedures via DbContext.Database.SqlQuery().

As stated in the MSDN article Table-Valued Parameters, a table-valued parameter can be populated from:

  • DataTable
  • DbDataReader
  • IEnumerable<SqlDataRecord>

If you have a .NET collection with the parameter values, you can transform it into an IEnumerable<SqlDataRecord>. There is a good article on C# Corner, Passing Table Valued Parameter to Stored Procedure: Part 2, that shows how to make this conversion.

Let’s start to write the code.

1. Create SQL Server type UserType to store the row of a table-valued parameter

2. Create stored procedure with a readonly parameter of type UserType

3. Create .NET class UserEntry to store the row of a table-valued parameter
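
A minimal sketch of such a class; the single Id property is an assumption and must mirror the columns of the SQL Server type UserType:

public class UserEntry
{
    public int Id { get; set; }   // must match the column(s) of the UserType SQL Server type
}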

4. Create .NET class UserParamCollection that inherits from a list of the preceding class and also implements IEnumerable<SqlDataRecord>
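
One possible implementation, sketched under the assumption that UserType has a single int column named Id:

using System.Collections.Generic;
using System.Data;
using Microsoft.SqlServer.Server;

public class UserParamCollection : List<UserEntry>, IEnumerable<SqlDataRecord>
{
    IEnumerator<SqlDataRecord> IEnumerable<SqlDataRecord>.GetEnumerator()
    {
        // the metadata must match the definition of the SQL Server type UserType
        var record = new SqlDataRecord(new SqlMetaData("Id", SqlDbType.Int));
        foreach (var entry in this)
        {
            record.SetInt32(0, entry.Id);
            yield return record;
        }
    }
}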

5. Convert List<UserEntry> to the newly created class UserParamCollection
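
For example:

var userList = new List<UserEntry> { new UserEntry { Id = 1 }, new UserEntry { Id = 2 } };
var userParams = new UserParamCollection();
userParams.AddRange(userList);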

6. Create SqlParameter

Here you need to specify the parameter type SqlDbType.Structured. TypeName must be the same as the SQL Server type defined in step 1.
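
A sketch; the dbo schema for the type is an assumption:

var tvp = new SqlParameter("@UserList", SqlDbType.Structured)
{
    TypeName = "dbo.UserType",
    Value = userParams
};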

7. Call stored procedure and pass the preceding SqlParameter

You must place the parameter name of the SQL Server stored procedure right after the name of the stored procedure, as follows: “[dbo].[ListUsers] @UserList”. If you omit it, you’ll get an error:
Procedure or function 'ListUsers' expects parameter '@UserList', which was not supplied.
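
The call itself could look like this; the result class UserResult is hypothetical and should match the columns returned by the stored procedure:

var users = context.Database
    .SqlQuery<UserResult>("[dbo].[ListUsers] @UserList", tvp)
    .ToList();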

Because SqlQuery() returns a collection of type System.Data.Entity.Infrastructure.DbRawSqlQuery, it would be better to cast it to IEnumerable to conform to the application’s interfaces.

Querying Entity Framework. Part 22 – Multiple Fields In Join, GroupBy And OrderBy

Now I want to summarize how to use multiple fields in LINQ queries. You know that LINQ to Entities allows you to make joins, grouping and sorting based on a single field or on multiple fields.

Generally speaking, when the query operation is based on a single field, you use that single field alone. When the operation is based on multiple fields, you use a construct like the following:
new { field1, field2... }

Here are examples of how to do it in C#.

Join

There are identical key selectors in the inner and outer parts of the join.
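
A sketch of such a join; the City and PhoneNumber fields are assumptions about the model:

var query = from c in context.Clients
            join p in context.Phones
              on new { c.CountryCode, c.City } equals new { p.CountryCode, p.City }
            select new { c.Name, p.PhoneNumber };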

GroupBy
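
Grouping by multiple fields, sketched under the same assumptions:

var groups = context.Clients
    .GroupBy(c => new { c.CountryCode, c.City })
    .Select(g => new { g.Key.CountryCode, g.Key.City, Count = g.Count() });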

OrderBy

In this case EF will sort the rows by both fields in ascending order. The last row in T-SQL will look like this:
ORDER BY [Extent1].[CountryCode] ASC, [Extent1].[Name] ASC

If you use the OrderByDescending() method, the output will be sorted by both fields, but in descending order.
When you need a different sort direction per field, you should use ThenBy()/ThenByDescending() for the second and subsequent fields.
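
For example, a sketch mixing directions:

var sorted = context.Clients
    .OrderBy(c => c.CountryCode)
    .ThenByDescending(c => c.Name);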

Not All .NET Types Can Be Used In Entity Framework

Recently I needed to use an unsigned integer type in an Entity Framework model. Everything seemed OK, but the migration did not create a column for that property.

I knew that EF could not create a SQL type for char (System.Char), even if you place an annotation like this:
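
The annotation in question is not shown here; it was presumably something along these lines (the property name is hypothetical):

[Column(TypeName = "char")]
public char Category { get; set; }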

I wrote about it in my post Entity Framework. C# To SQL Type Translation.

It was a surprise for me that EF could not create any type for the unsigned types ushort, uint or ulong. A Column annotation specifying a signed SQL type doesn’t help either.

And the most “amazing” thing is that EF rejects the sbyte type (as of EF version 6.1.3). It generates an error:

There is no store type corresponding to the conceptual side type 'SByte' of primitive type 'SByte'.

The conclusion: of the common .NET primitive types, you can use everything in an Entity Framework model except char, ushort/uint/ulong and sbyte.

Querying Entity Model. Part 21 – Intersect And Except

SQL Server and LINQ both have commands that operate on sets, like Union, Intersect and Except. I investigated the behavior of Union in Entity Framework earlier, in my post Querying Entity Model. Part 15 – Union. Now let’s look at Intersect and Except.

I want to see the intersection between two sets of country codes taken from the Clients and Phones tables (I use the same simple EF model throughout the Querying Entity Model series).
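
A sketch of the query:

var codes = context.Clients.Select(c => c.CountryCode)
    .Intersect(context.Phones.Select(p => p.CountryCode))
    .ToList();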

Wow! Entity Framework uses exactly the INTERSECT operator. Well done.

The Except case is similar to the Intersect one.
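
For example:

var codesOnlyInClients = context.Clients.Select(c => c.CountryCode)
    .Except(context.Phones.Select(p => p.CountryCode))
    .ToList();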

I used a collection with one field, but EF is capable of operating on collections with many fields. The only thing you should take care of is that you intersect/except collections of the same type.

Querying Entity Model. Part 20 – Retrieving Some Records From Groups

Entity Framework allows you to retrieve records from groups in any quantity and from any position, i.e. the first record, records 1…N, the second record, or records 5…10. This can be done with LINQ methods like First(), Take() and Skip().

When I write T-SQL code I often use the ROW_NUMBER() function to generate serial numbers within groups of records and then filter on the required row numbers. I suspected that EF would generate less efficient code with the APPLY operator (see the post Querying Entity Model. Part 7 – Left Join With First Row). Now it’s time to confirm this.

Retrieving first row
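
A sketch; grouping by CountryCode is an assumption about the model:

var firstRows = context.Clients
    .GroupBy(c => c.CountryCode)
    .Select(g => g.FirstOrDefault());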

Another way is to use the Take() method.
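
For example:

var firstRows2 = context.Clients
    .GroupBy(c => c.CountryCode)
    .SelectMany(g => g.Take(1));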

Entity Framework uses SELECT DISTINCT to get group key values and APPLY operator to get a record for each group. EF uses OUTER APPLY in the first case, and CROSS APPLY in the second.

Retrieving first N rows
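
The same pattern with Take(2), sketched:

var firstTwoRows = context.Clients
    .GroupBy(c => c.CountryCode)
    .SelectMany(g => g.Take(2));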

The only difference in T-SQL code is the line CROSS APPLY (SELECT TOP (2).

Retrieving rows with offset

I try to get exactly the second row from each group. The Skip() method needs a sorted list, so I have to place OrderBy() before it.
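
A sketch; ordering the rows inside each group by Name is an assumption:

var secondRows = context.Clients
    .GroupBy(c => c.CountryCode)
    .SelectMany(g => g.OrderBy(c => c.Name).Skip(1).Take(1));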

EF generates T-SQL code with OFFSET .. FETCH for SQL Server 2016.

Querying Entity Model. Part 19 – Full Outer Join

There is a rarely used join in SQL: the full outer join. It combines the intersection between two sets with the exceptions from both the left and right parts of the join. A full outer join might be used to merge data from source and target tables, giving you the intersecting rows to update, new rows to insert, and missing rows to delete.

This type of join can also be implemented with LINQ queries on Entity Framework in four steps (sketched in code after the list):

  1. Left outer join
  2. Right outer join
  3. Concat both outer joins
  4. Get distinct result
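
A minimal sketch of those four steps; joining Clients and Phones on CountryCode and the Id properties are assumptions about the model:

var leftJoin = from c in context.Clients
               join p in context.Phones on c.CountryCode equals p.CountryCode into lj
               from p in lj.DefaultIfEmpty()
               select new { ClientId = (int?)c.Id, PhoneId = (int?)p.Id };

var rightJoin = from p in context.Phones
                join c in context.Clients on p.CountryCode equals c.CountryCode into rj
                from c in rj.DefaultIfEmpty()
                select new { ClientId = (int?)c.Id, PhoneId = (int?)p.Id };

var fullOuterJoin = leftJoin.Concat(rightJoin).Distinct().ToList();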

Here is the abridged T-SQL code (I’ve deleted extra lines from the SELECT clause for readability).

As you see, the T-SQL code repeats the logic of the LINQ query. A comparison of execution plans shows that the native FULL OUTER JOIN has a cost of 36% against 64% for the LINQ version in my case. You might see an even bigger difference.

And native T-SQL code is more elegant 🙂

Entries In Entity Framework ChangeTracker Could Degrade Database Write Performance

Let’s imagine a case when you need to load a large number of rows into the database through an Entity Framework application. If you have tried that, you know this process is very, very slow.
1. Worst case for performance

SaveChanges() opens a connection, starts a transaction, makes one INSERT, commits that single record and closes the connection, and so on for every record.
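
A sketch of this pattern; the newClients collection is assumed:

foreach (var client in newClients)
{
    context.Clients.Add(client);
    context.SaveChanges();   // one connection, transaction and commit per record
}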

2. A less bad case

This time SaveChanges() opens a connection and starts a transaction just once. Then it emits an INSERT command for each record and commits all records at once.
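
The same load, sketched with a single SaveChanges() call:

foreach (var client in newClients)
{
    context.Clients.Add(client);
}
context.SaveChanges();   // one transaction and one commit for all records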

In both cases Entity Framework keeps the entities in the ChangeTracker with the state Unchanged after saving them to the database. When you call SaveChanges() again, it iterates over the ChangeTracker collection to find what to save. The more items there are in the ChangeTracker, the more time is needed to process them all.

So you can try to restrict the number of entries in the ChangeTracker or disable the change tracking mechanism altogether.

To restrict the number of entries in the ChangeTracker you can:

  • recreate DbContext;
  • remove entries from the ChangeTracker by setting their state to Detached.

So you can save some records (a batch), then get rid of all unnecessary entries in the ChangeTracker.
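
A sketch of the batch-and-detach approach; the batch size and the newClients collection are assumptions (EntityState lives in System.Data.Entity):

const int batchSize = 100;
for (int i = 0; i < newClients.Count; i++)
{
    context.Clients.Add(newClients[i]);
    if ((i + 1) % batchSize == 0)
    {
        context.SaveChanges();
        // detach everything that has already been saved
        foreach (var entry in context.ChangeTracker.Entries().ToList())
            entry.State = EntityState.Detached;
    }
}
context.SaveChanges();   // save the last, possibly incomplete batch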

DbContext.Configuration has two options that can improve the performance (both are shown in the snippet after the list):

  • ValidateOnSaveEnabled – when it’s false, EF does not validate entity against model;
  • AutoDetectChangesEnabled – when it’s false, EF does not compare original and current states of entities and does not change the state of the entity automatically. In other words, you can change the entity, but EF won’t notice that.
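
Setting both options, a sketch:

context.Configuration.ValidateOnSaveEnabled = false;
context.Configuration.AutoDetectChangesEnabled = false;
// ... perform the bulk insert here ...
context.Configuration.AutoDetectChangesEnabled = true;   // switch back for normal operations
context.Configuration.ValidateOnSaveEnabled = true;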

I’ve made a test to estimate the performance for the following cases where I call the SaveChanges() method:

  1. immediately after each record was added to context;
  2. after all records;
  3. after adding a batch of records;
  4. after adding a batch of records, then recreating a DbContext;
  5. after adding a batch of records, then removing entries from ChangeTracker collection;
  6. after adding all records, with the ValidateOnSaveEnabled and/or AutoDetectChangesEnabled options turned off.

Here is a graphical representation of a series of tests using different numbers of records (from 200 to 2000) and different batch sizes (50, 100 or 200).

[Chart: SaveChanges() performance comparison]

There are 3 trends:

  1. the highest – where SaveChanges() occurs after each record. Of course, committing every INSERT takes much longer than committing a batch.
  2. the medium – saving after all records or after each batch have been added.
  3. the lowest – where the ChangeTracker collection was not allowed to grow too much or the auto-detect changes feature was turned off.

In my opinion, there are two preferable ways to optimize Entity Framework performance for massive write operations:

  1. call SaveChanges() after adding a relatively small batch of records and remove the entries from the ChangeTracker collection;
  2. turn off AutoDetectChangesEnabled option.

I would not recommend recreating the DbContext because you may have additional code executed on context creation. Recreation also clears the EF metadata and query caches. Both would have a negative impact on overall performance.

If you set AutoDetectChangesEnabled = false, don’t forget to switch it back on when you return to normal operations; otherwise EF could overlook updates to entities.

Ignoring DbUpdateException And Continue

Entity Framework has internal capabilities to validate data against the EF model, but there may be a situation when you have some restrictions at the database level. For example, you are importing a large amount of data on a table-per-table basis, and suddenly some rows break a foreign key constraint. In this case the database raises an error, EF receives it and returns a DbUpdateException to your code. What’s next?

1) You can examine the entity which caused the exception. DbUpdateException has a property Entries that holds the failed entities.
2) If you want to ignore this entity (maybe it’s an expected situation where some data may break a foreign key and those rows should be skipped), you need to change the state of the DbEntityEntry to Detached. This removes the entity from the context, so it will never be saved later.
3) After these steps you must repeat SaveChanges().

The following code snippet is an example of how you can implement the logic described above.
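
A sketch of such a retry loop, assuming a DbContext instance named context with pending changes (DbUpdateException is in System.Data.Entity.Infrastructure, EntityState in System.Data.Entity):

var saved = false;
while (!saved)
{
    try
    {
        context.SaveChanges();
        saved = true;
    }
    catch (DbUpdateException ex)
    {
        // detach every entity that caused the failure so it won't be saved on the next attempt
        foreach (var entry in ex.Entries)
        {
            entry.State = EntityState.Detached;
        }
    }
}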