Entries In Entity Framework ChangeTracker Could Degrade Database Write Performance

Let’s imagine the case when you need to load a large amount of rows into database through Entity Framework application. If you had tried that, you know this process is very-very slow.
1. Worst case for performance

SaveChanges() will open connection, start transaction and make one INSERT, and so on for all records. Then commit that unique record and closes connection.

2. Less worse case

This time SaveChanges() opens connections and starts transaction just once. Then it emits INSERT command for each record and commits all records simultaneously.

In both cases Entity Framework preserves entities in ChangeTracker with a state Unchanged after saving them in database. When you call SaveChanges() again, it iterates ChangeTracker collection to find what to save. The more items will be in ChangeTracker the more time is needed to process them all.

So you can try to restrict the number of entries in ChangeTracker or disable change tracking mechanism at all.

To restrict the number of entries in ChangeTracker you can do:

  • recreate DbContext;
  • remove entries from ChangeTracker by setting the state to Detached.

So you can save some records (a batch), then get rid of all unnecessary entries from ChangeTracker.

DbContext.Configuration has two options that could improve the performance:

  • ValidateOnSaveEnabled – when it’s false, EF does not validate entity against model;
  • AutoDetectChangesEnabled – when it’s false, EF does not compare original and current states of entities and does not change the state of the entity automatically. In other words, you can change the entity, but EF won’t notice that.

I’ve made a test to estimate the performance for the following cases where I call SaveChanges() method:

  1. immediately after each record was added to context;
  2. after all records;
  3. after adding a batch of records;
  4. after adding a batch of records, then recreating a DbContext;
  5. after adding a batch of records, then removing entries from ChangeTracker collection;
  6. after all records where ValidateOnSaveEnabled and/or AutoDetectChangesEnabled options are turned off.

Here is the graphical representation of a series of tests using different number of records (from 200 to 2000) and different sizes of batches (50, 100 or 200).

savechanges-comparison-chart

There are 3 trends:

  1. highest – where SaveChanges() occurs after each record. Of course, committing every INSERT is much longer then committing a batch.
  2. medium – saving after all records or a batch were added.
  3. lowest – where ChangeTracker collection were not allowed to grow too much or ChangeTracker auto-detect feature was turned off.

On my opinion, there are two preferable ways to optimize the Entity Framework performance for massive write operations:

  1. call SaveChanges() after adding a relatively small batch of records and remove entries from ChangeTracker collection;
  2. turn off AutoDetectChangesEnabled option.

I would not recommend to recreate a DbContext cause you can have some additional code executed of context creation. Also recreation will clear EF cache for metadata and queries. These would lead to a negative impact on overall performance.

If you set AutoDetectChangesEnabled = false, don’t forget to switch back when you return to normal operations. Otherwise EF could overlook updates in entities.

Ignoring DbUpdateException And Continue

Entity Framework has internal capabilities to validate data against the EF model, but there may be a situation when you have some restrictions on the database level. For example, you are importing a large amount of data on a table-per-table basis, and suddenly some rows are breaking the foreign key constraints. In this case database throws an error, sends it to EF, and EF returns DbUpdateException to your code. What’s next?

1) You can examine the entity which caused the exception. DbUpdateException has a property Entries that holds failed entities.
2) If you want to ignore this entity (maybe it’s an expected situation where some data could break foreign key and that rows should be skipped), you need to change the state of DbEntityEntry to Detached. This will remove the entity from context. So it’ll never be saved later.
3) After these steps you must repeat SaveChanges().

The following code snippet is an example of how you can implement the above mentioned logic.

Implicit and Explicit Transactions In Entity Framework

When you call SaveChanges() method on Entity Framework database context, EF makes the following steps:

  1. opens a connection
  2. starts an implicit transaction
  3. then it generates single T-SQL command for each record to be inserted/updated/deleted
  4. after all it commits the transaction
  5. closes connection

Based on my experiment model (I use it through all Quering Entity Framework posts), I’ve created a simple test with two save operations.

The log shows two sequences of operations described above (log content is not included).

Entity Framework also allows to use explicit transactions. They are described in MSDN article Working with Transactions (EF6 Onwards). My code would be transformed to the following:

Now there’s only one transaction.

I’ve noticed a couple of interesting moments.
1. When EF uses an implicit transaction, transaction starts at the moment when you call SaveChanges() method and commits when all commands sent to SQL Server would finish. The connection opens/closes just before and after this implicit transaction.
When you start explicit transaction, EF opens the connection and starts transaction right when you call BeginTransaction() method. Connection is closed when you go out of the scope of explicit transaction, i.e. out of using statement. So don’t make large calculations when the transaction is opened. It would better to have a transaction as short as possible.

2. EF does not use direct T-SQL commands like INSERT or UPDATE. Instead it wraps those commands as a dynamic SQL and executes them with sp_executesql stored procedure.

ef-saves-data

This wrapping creates execution plan on the first insert (SQL Server needs some time and CPU to compile it), and that execution plan would be reused for any subsequent identical inserts. It’s a very good behavor!

You can run the following query to see what’s in the SQL Server plan cache.

ef-execution-plan-reuse

Inserting Into Entity Framework. Add() vs AddRange()

Entity Framework offers two methods to insert records into database. Add() method allows to insert a single entity. AddRange() method inserts an IEnumerable collection. The main question is does the second method provide any benefit over the first one?

The most important is the database performance. I’ve created a simple test where I added single records with Add() method and a List with AddRange(). I called SaveChanges() after each block.

Also I’ve enabled context database logging with the following command:

The log in Output window shows that Entity Framework generates the same sequence of T-SQL commands.

Yeah, EF generates single INSERT command per each record. So the only benefit is the ease of writing C# code, but you’ll never get any gain from database.
[UPDATE 2016-09-22]
One benefit is the ease of writing C# code. Another is some internal EF optimization in working with ChangeTracking. As stated in MSDN article Performance Considerations for Entity Framework 4, 5, and 6 “the advantage of using the range methods is that the cost of DetectChanges is only paid once for the entire set of entities as opposed to once per each added entity”. But from the database perspective it won’t give you any gain.

Querying Entity Model. Part 18 – Limiting The Number Of Rows Returned

When working with a large number of rows in a table, sometimes you need to retrieve just a part of them. It could be the first N rows (for example, from 1 till 100) or any of the subsequent blocks with N rows (maybe 101…200, or 701…800). The last technique is also called pagination.

LINQ offers four methods to restrict the output:

  • Take() and Skip() – with a concrete number of rows;
  • TakeWhile() and SkipWhile() – with a lambda expression.

You can find the descriptions of these methods in MSDN article Enumerable Methods.

Let’s take this simple query.

It’s not a good idea to take first rows without any ordering because SQL Server does not guarantee the order of rows, especially when the table does not have a clustered index. But it works and Entity Framework generates the following T-SQL code:

Good! The code is correct, it uses a well-known SELECT TOP statement.

Now I want to get next 100 rows. But Entity Framework requires an explicit ordering now. OK, I place an explicit OrderBy() method.

You’ll be surprised that Entity Framework could generate different T-SQL code here. It depends on the version of SQL Server you’re working with. For SQL Server 2012+ it generates code with OFFSET/FETCH.

For previous versions (I’ve tested on SQL Server 2008R2) it uses ROW_NUMBER() function.

* I’ve indented the inner subquery to improve the readability.

Great! Entity Framework does the best it could do here 🙂

Querying Entity Model. Part 17 – ORDER BY

LINQ supplies 4 methods to order the result set, and all these methods are implemented in Entity Framework. The first method should be OrderBy() for ascending ordering or OrderByDescending() otherwise. The second and next subsequent methods should be ThenBy() or ThenByDescending().

This LINQ query will generate the following T-SQL code on the model used in all my posts about Querying Entity Model.

Good! Entity Framework produces the correct ORDER BY clause.

If you place two OrderBy() methods in one LINQ query one after another, you’ll get a wrong result – it will be sorted only by the last expression.

So the correct sequence must be OrderBy – ThenBy.

Encrypting appSettings Key In ASP.NET Web.config

I needed a solution to encrypt a single key in appSettings section in ASP.NET Web.config file. The article on MSDN Encrypting and Decrypting Configuration Sections tells about encrypting the whole section but I want to hide only a particular key.
It seems that apsnet_regiis could encrypt only sections, so I’ve found a way how to extract this key to another section and encrypt that section in one of StackOverflow answers http://stackoverflow.com/a/6224769

Here are the steps that must be done to get the result:
1. Add a line in configSections with the name of your new section

2. Add new section under configuration

3. In .NET code get the section and key from it

4. After you deploy your Web application to IIS, you should encrypt section secureAppSettings according to the MSDN article mentioned above. Launch Command Prompt as Administrator (it’s important!) and run command (something like this):

Web API Anonymous Authentication And IIS Express

I’ve been tested a Web API web service and I needed an anonymous authentication. This can be done with attribute [AllowAnonymous].
You can decorate a whole controller

or a particular method(s)

The most curious thing is that it does not work when you run Visual Studio debugger! The secret is hidden in the properties of the Web API project. Click the project in Solution Explorer, then press F4 (don’t choose Properties from context menu). Here you will see authentication settings.

webapi-project-settings

All these properties are saved in .csproj file.

SSDT Database Project Publishing vs Code First Migration

I work both with SSDT database projects and .NET projects based on Entity Framework Code First models. Now I’ll say some words about how these two types of projects do the deployment.

1. SSDT database project

  • You write the code (it is Transact-SQL code) to create tables, stored procedures, functions and other SQL Server database objects in declarative style. It means that you write it in a way of how it should be in the final stage. Don’t worry about changes, no ALTER commands – Visual Studio do it during deployment.
  • In SSDT database project terminology deployment is called “publishing”. To deploy your database you should find Publish command 🙂
  • During publishing Visual Studio compare the actual state of the database with the state that should be according to your project. Then it generate the ALTER script.

2. Code First Migration

  • You write your model in declarative style too but as a C# class.
  • To create the database you use NuGet Package Manager to run two commands: Add-Migration and Update-Database.
  • During migration Visual Studio looks into previous migration classes and generates the difference. Then it creates a new class with changes.
  • Update-Database generates T-SQL commands according to the code in the migration class. The info about applied migrations is saved in the database in the dbo.__MigrationHistory table.

The main difference between these two deployments is that SSDT database project produces change script based on actual database, whereas Code First Migration does not. So if you use migrations, change your tables only through migrations. Direct changes to database would not be taken into account by migration process.