Comparison: Entity Framework vs C#. When Equality Operators Are Not Equal

When you compare numbers, you know that 5 equals 5. But C#, Entity Framework, SQL and many programming languages have a special value called “null”. It’s something unknown. So when you compare a definite number, for example 5, with null, the result will be “false”, because 5 and unknown is not the same.
But the situation differs when you compare null with null. Guess will it be true or false? But the right answer is: It depends… And now I’ll tell you why.

T-SQL has a concept that unknown value NULL is not equal to another unknown value NULL. At the same time C# and Entity Framework believe that two nulls are equal. This comes from the reference types where null means that a pointer does not reference to any variable. So when 2 pointers are null, so they are equal, because they don’t reference to any variable.

To better understand the differences between Entity Framework and C# I created a simple test to compare values. Here is my C# project StringEqualityTest that you can download and run in your own pace.

The results are the following:

Left operand Right operand C# Entity Framework Explanation
“A” “A” true true Exact match
“B” “b” (lowercase) false true* EF uses SQL Server comparison rules
“B” “B ” (with trailing space) false true EF uses SQL Server comparison rules
“” “” (empty string) true true Exact match
null null true true EF uses C# rules
* – not always

Let’s look deeper into what’s going on.

When Entity Framework compares two not null strings, it applies rules defined in SQL Server. It seems that EF sends a query and merely receives the result from SQL Server. And all the rules come from SQL Server. That’s the reason why SQL Server discards trailing spaces (look at “B” = “B “). Also it uses a collation for comparison and ordering. In my case it’s a Cyrillic_General_CI_AS, CI means Case Insensitive, AS means Accent Sensitive. So “B” = “b”. If you have another collation, the result might be different.

Just compare these examples with C# rules where “B” <> “b”, and “B” <> “B “. C# always returns the same result.

But when EF compares null strings, it shows they are equal like C# do. To do this, EF generate a special query for SQL Server to receive the result in C# logic. When you look at SELECT command generated by EF, you’ll find something interesting.

EF adds a second condition when both operands are null. That’s why EF shows the result in C# style while SQL Server executes all the comparison as it did forever.

Entity Framework has a dedicated parameter that could change the null comparison logic to be implemented by SQL Server. In EF 6.0 it’s a UseDatabaseNullSemantics. When you set it to true (the default is false), you’ll get a SQL Server rules for null equality. Some previous versions of EF uses an opposite parameter UseCSharpNullComparisonBehavior.

This time the SQL query would not have an additional condition for nulls.

UPDATE:
I’ve missed an important option that influence SQL behavor (thanks to Arthur Zubarev for his valuable comment). There’s an option ANSI_NULLS. If you set it to false, so SQL Server would think that nulls are equal. But you can hardly see it in action because SQL Server Native Client driver automatically set it to ON when connecting.

Querying Entity Model. Part 6 – Implementing WHERE EXISTS

This time I’ll show how we can select data from one entity when there’s a dependency on the other entity based on two or more fields. This Entity Framework LINQ query is the equivalent of the SQL query in the form of SELECT … WHERE EXISTS.

I use the same Clients and Phones entities, and I want to find all clients that have phone number in their home countries, i.e. if John is from USA and he has a U.S. phone, not Canadian or Mexican number.

The LINQ query has lambda syntax and is quite simple.

I use Any() function to get a boolean result – if there’s a record that would satisfy the conditions in Where clause.

T-SQL query will be the following:

Yeah, Entity Framework has made a SELECT … WHERE EXISTS command with two conditions on ClientId and CountryCode, but…
Do you notice that T-SQL query has an additional comparison where both Codes are null? It’s a feature of Entity Framework I’ll tell about in my next post Comparison: Entity Framework vs C#. When Equality Operators Are Not Equal.

Querying Entity Model. Part 5 – Implementing WHERE IN Analogue

Working with SQL databases, you could encounter some cases where you write something like:

SELECT ...
FROM ...
WHERE ... IN (subquery)

This type of queries can be implemented in Entity Framework model. We need to place a subquery in Where clause.
My example uses the same Client and Phone entities from the first post about Querying Entity Model. The LINQ query is quite simple, and it has lambda syntax.

T-SQL query generated by Entity Framework uses EXISTS command instead of WHERE … IN. But the result is the same comparing with the alternative query.

If you want to get a WHERE … NOT IN query, place negation operator ! in Where clause right before the subquery:

[UPDATE 2016-08-31]
You can use another form of writing the subquery evaluation with Any() method.

Entity Framework generates the same T-SQL query.

Querying Entity Model. Part 4 – Join On Multiple Fields

Usually we join tables on just one field, but there may be situations when you need to link records based on two or more fields. Entity Framework, along with T-SQL, is capable to do such joins.

This time I slightly modify the EF model that I use before. Now I add just one property with country code to each of my classes Client and Phone. This code would mean which country the client and his/her phone number are registered in.

My query will select all clients and their home phone numbers, i.e. the country code must be the same. Queries are written in lambda syntax.

1. Join on multiple fields using anonymous types

2. Join on multiple fields using cross join and where

And again Entity Framework shows an excellent result. It generates the same T-SQL code with inner join.

Code Highlighting In WordPress.org Blog

I’ve found a very pretty WordPress.org plugin Crayon Syntax Highlighter written by Aram Kocharyran (http://aramk.com). The first impression is very good, it makes the appearance of C# or T-SQL code in blog posts really better. Thanks, Aram!

If you have a WordPress.org blog and install this plugin, you need to change your posts, so the code would be shown correctly (for example, angle brackets, i.e. “greater than” or “less than” mathematical signs). The most important is to change <pre> tags, so they should include class and decode attributes that reflect programming languages. So the tag in default configuration should look like <pre class=”lang:c# decode:true ” > for C# or <pre class=”lang:tsql decode:true ” > for Transact-SQL code.

Querying Entity Model. Part 3 – Cross Join

If you need to create a cross join in Entity Framework model, i.e. to link every record from the first table with every record from the second table, it can be implemented with a quite simple LINQ query.

In this example I use the same EF model with Client and Phone entities. LINQ query written in lambda syntax selects all possible pairs of client name and phone number.

This query generates the following T-SQL code:

Querying Entity Model. Part 2 – Left Outer Join

After investigating inner joins in Entity Framework model I started to explore left outer joins. It’s where you have one record in left table and zero or many records in the right table.

I’ve found three different ways to write LINQ query in lambda syntax and fortunately they all produce the same T-SQL code. So you can choose any style of writing that looks good for your taste.

I use the same entities as described in my previous post Querying Entity Model. Part 1 – Inner Join

1. Left outer join using GroupJoin and SelectMany (DefaultIfEmpty inside SelectMany)

2. Left outer join – rearranged (DefaultIfEmpty inside GroupJoin)

3. Left outer join using SelectMany

As I mentioned earlier, all these LINQ queries generate the same T-SQL code:

Querying Entity Model. Part 1 – Inner Join

Here are some investigations about how we can query an Entity Framework tables created in Code First model. The most important is how it might be done with LINQ and what T-SQL code the EF would create in any situation. This time I touch only inner joins whereas there are left outer joins, not in/not exists, and so on.

UPDATE 2015-01-15
My LINQ queries have lambda syntax cause I like it too much 🙂

I use a couple of very simple tables (entities):

Note that I use some features to precisely create the table structure using annotations and explicit foreign key field. For more details look at my post Code First. Creating Relationships Between Entities.

Generally, there are 2 ways to join entities: using Join, like in ordinary SQL query, or using Navigation properties defined in EF model. But there are some nuances how it can be implemented.

1. Inner join on Client entity

The resulting T-SQL query will be:

2. Inner join on dedicated foreign key property
Pay attention that my Phone class has a dedicated property ClientId. This field is served as a foreign key and links with Client entity.

The resulting T-SQL query will be:

The most curious that EF produces different T-SQL code in these two queries, but the execution plans are the same.

EF inner join execution plan

3. Inner join through Navigation property
We define Navigation properties in each classes, thus it gives us access to corresponding entity. In our case we will get access from Phone to the corresponding Client.

Wow! The most compact code you could ever write for inner join. And this LINQ query produces the same T-SQL code as in case 2.

4. Inner join using DbContext.Set
This approach could be handy when you write universal processing with generic types, for example, <T1> and <T2>. When a generic type does not know about the presence of navigation properties, the only possible way is to use an explicit Join here.

And again, T-SQL code is the same as in step 2.

Code First Model Conventions. Something You’d Like To Turn Off

When Entity Framework creates a database according to your Code First model, it applies some conventions that makes this process more intelligent. For example, it can recognize primary key, set a relationships between entities (1:1, 1:many, many:many), and so on. The full list is available on MSDN article System.Data.Entity.ModelConfiguration.Conventions Namespace. Currently (January 2016) there’s 46 conventions.

In my opinion Entity Framework makes some unnecessary work with all these conventions. So it would better to turn off some of them, and now I’ll explain why I think so.

ForeignKeyIndexConvention

EF creates indexes for every foreign key field. For example, if we have a Client and Phone entities and the last has a FK to Client, we’ll get an additional index on Phones table.

FK index

Is it good in all cases? I don’t think so. It depends on data in these tables and how you query them. Maybe this index would never be used by SQL Server, or you would have too much indexes that can negatively impact your OLTP performance.

To understand the problem better you can watch video recorded by Kimberly L. Tripp about SQL Server index internals, indexing strategies and analysis available in TechNet SQL Server Video Archive.

OneToManyCascadeDeleteConvention
ManyToManyCascadeDeleteConvention

EF creates foreign keys and automatically sets cascade delete. Just imagine if you have a Product entity that has a foreign key to Category entity, and someone accidentally delete a category record. SQL Server immediately deletes all product records belonging to that category. That’s a rather dangerous option, and I suppose it could be turned on with great precaution and only in those cases where it really needed.

My opinion is to turn them off, and if you really need them, let’s revive them. To do this you need to find your database context class, it’s derived from DbContext class, and place some lines of code there;

  • Add a reference to Conventions namespace

  • Override OnModelCreating method

 

Code First. Creating Relationships Between Entities

When you create a one-to-many relationship between two entities in Code First Entity Framework model, you need to set “links” in both entities. In my case they are Client and Phone entities.

During the migration EF will create a field Client_Id (entity name + field name of the primary key) and a foreign key.
Phone table - Client_Id

I don’t like auto-generated names for table fields and I want to get a precise control over it. For example, I want to give another name for a foreign key field, say, ClientId without underscore. EF allows to do this with a couple of lines of code. You need to add a ClientId property and a ForeignKey attribute pointing to ClientId.

To be frankly, that’s my favorite way to declare a relationship.

Another case is when you need to create two or more relationships between the same two entities. At this time you must set an InverseProperty attribute in Client class, or you’ll get 4 fields and foreign keys instead of 2 required.

Now we get the following structure. Please pay attention that auto-generated fields Client_Id and SecondClient_Id are null.

Phone table - 2 autogenerated relationships

When I placed ForeignKey attributes in Phone class to replace the auto-generated names with my own, I’ve got an error on Update-Database phase.

Introducing FOREIGN KEY constraint 'FK_dbo.Phones_dbo.Clients_OwnerClientId' on table 'Phones' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
Could not create constraint or index. See previous errors.

That’s because EF tries to make the FK fields not null and use a cascade delete convention. Open the database migration .cs file and look at these lines:

If you change the declaration of FK fields to nullable type (i.e. int?), everything will be OK. The declaration of foreign key in Phone class

and the resulting database structure:

Phone table - 2 relationships

The other option is to change manually the database migration .cs file and set cascadeDelete to false.