SQL Join With First Matching Rows. Choosing the Best Approach

It’s a very old task for SQL developers to write a query that will join two tables and will pick only first matches from the second table to every row from the first table. In my case I need to combine client name from Clients table and his/her first phone number from Phones table.

After making some investigations I stopped at three different queries.

1. Join with SELECT TOP 1 subquery

2. Using CROSS APPLY operator

3. Subquery with SQL Server Window function (ROW_NUMBER)

If any client doesn’t have a phone number, you need to make some slight modifications to the code above: change JOIN to LEFT JOIN, and CROSS APPLY to OUTER APPLY. In this case you’ll have a client name with corresponding NULL instead a phone number.

I won’t estimate the simplicity of the code and ease of understanding. The code that looks shorter might not be the most effective. We need to compare query costs and choose the least one.

And now SQL Server will show its magic. The percentage of each query costs are 50%, 49% and 1% (just look at the screenshot below).

Join with first matching rows

So the most effective is the last query that uses a join with a ranking subquery (SQL Server creates a temporary table here). This query also operates with a minimum number of pages to retrieve the result. You can switch on the I/O statistics (run SET STATISTICS IO ON command) and look at Messages tab in SSMS. In my case I have the following output:
(70347 row(s) affected)
Table 'Phones'. Scan count 70713, logical reads 215349...
Table 'Clients'. Scan count 5, logical reads 833...
Table 'Worktable'. Scan count 0, logical reads 0...
Table 'Worktable'. Scan count 0, logical reads 0...

(70347 row(s) affected)
Table 'Phones'. Scan count 70708, logical reads 213139...
Table 'Clients'. Scan count 1, logical reads 761...

(70347 row(s) affected)
Table 'Phones'. Scan count 5, logical reads 2210...
Table 'Clients'. Scan count 5, logical reads 833...
Table 'Worktable'. Scan count 0, logical reads 0...

Tip: If you would use a LEFT JOIN in the last query, don’t place a “row_num = 1” condition in the WHERE clause, only after JOIN … ON. If you place it in WHERE clause, SQL Server will make an left outer join, and then filter rows (all NULL values will be rejected here). So you will get an equivalent of inner join.

Leave a Reply

Your email address will not be published. Required fields are marked *