Another kind of ugly T-SQL queries generated by Entity Framework is pivot queries. For example, I have a normalized table with 3 essential columns: operation name, date and numeric result. I need to build a LINQ query that will produce a pivot table with operation date and calculate a sum or results by each operation.
LINQ allows to build a query but I need to place exact operation names in this query.
1 2 3 4 5 6 7 8 9 10 11 12 |
var pivot = context.RegularTable .GroupBy(r => r.OperationDate) .Select(r => new { OperationDate = r.Key, Op1Result = r.Where(x => x.Operation == "op1").Sum(x => x.Result), Op2Result = r.Where(x => x.Operation == "op2").Sum(x => x.Result), Op3Result = r.Where(x => x.Operation == "op3").Sum(x => x.Result), Op4Result = r.Where(x => x.Operation == "op4").Sum(x => x.Result), Op5Result = r.Where(x => x.Operation == "op5").Sum(x => x.Result) } ); |
The T-SQL code is a bit dreadful.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 |
SELECT 1 AS [C1], [Project6].[OperationDate] AS [OperationDate], [Project6].[C1] AS [C2], [Project6].[C2] AS [C3], [Project6].[C3] AS [C4], [Project6].[C4] AS [C5], [Project6].[C5] AS [C6] FROM ( SELECT [Project5].[OperationDate] AS [OperationDate], [Project5].[C1] AS [C1], [Project5].[C2] AS [C2], [Project5].[C3] AS [C3], [Project5].[C4] AS [C4], (SELECT SUM([Extent6].[Result]) AS [A1] FROM [dbo].[RegularTable] AS [Extent6] WHERE ([Project5].[OperationDate] = [Extent6].[OperationDate]) AND (N'op5' = [Extent6].[Operation])) AS [C5] FROM ( SELECT [Project4].[OperationDate] AS [OperationDate], [Project4].[C1] AS [C1], [Project4].[C2] AS [C2], [Project4].[C3] AS [C3], (SELECT SUM([Extent5].[Result]) AS [A1] FROM [dbo].[RegularTable] AS [Extent5] WHERE ([Project4].[OperationDate] = [Extent5].[OperationDate]) AND (N'op4' = [Extent5].[Operation])) AS [C4] FROM ( SELECT [Project3].[OperationDate] AS [OperationDate], [Project3].[C1] AS [C1], [Project3].[C2] AS [C2], (SELECT SUM([Extent4].[Result]) AS [A1] FROM [dbo].[RegularTable] AS [Extent4] WHERE ([Project3].[OperationDate] = [Extent4].[OperationDate]) AND (N'op3' = [Extent4].[Operation])) AS [C3] FROM ( SELECT [Project2].[OperationDate] AS [OperationDate], [Project2].[C1] AS [C1], (SELECT SUM([Extent3].[Result]) AS [A1] FROM [dbo].[RegularTable] AS [Extent3] WHERE ([Project2].[OperationDate] = [Extent3].[OperationDate]) AND (N'op2' = [Extent3].[Operation])) AS [C2] FROM ( SELECT [Distinct1].[OperationDate] AS [OperationDate], (SELECT SUM([Extent2].[Result]) AS [A1] FROM [dbo].[RegularTable] AS [Extent2] WHERE ([Distinct1].[OperationDate] = [Extent2].[OperationDate]) AND (N'op1' = [Extent2].[Operation])) AS [C1] FROM ( SELECT DISTINCT [Extent1].[OperationDate] AS [OperationDate] FROM [dbo].[RegularTable] AS [Extent1] ) AS [Distinct1] ) AS [Project2] ) AS [Project3] ) AS [Project4] ) AS [Project5] ) AS [Project6] |
If you look at execution plan, you could find 5 nested loops for each operation name and 6 table scans (for 5 operation names + 1 for operation date). SQL query written with PIVOT operator is more compact.
1 2 3 4 5 6 7 8 9 10 |
SELECT OperationDate, [Op1], [Op2], [Op3], [Op4], [Op5] FROM ( SELECT OperationDate, Operation, Result FROM RegularTable ) t PIVOT ( SUM(Result) FOR Operation IN ([Op1], [Op2], [Op3], [Op4], [Op5]) ) AS PivotTable; |
Comparison of execution plans shows that pure T-SQL code is twice cheaper. This result is based on a small set of records.
SQL PIVOT uses just one table scan, but it makes a sort operation. I see a potential threat that it would lead to TempDB spill where it would be run on a large set of records. So the situation could be different on big amount of data.