Orange is my favorite color

I use fairly vanilla SQL most of the time so I was shocked to read this post on the PostgreSQL list when I ran into some slowness with a NOT IN query. While my experience is with Postgres, apparently this works with most databases.

If you have a NOT IN with a subquery that returns lots of rows, an EXISTS or NOT EXISTS will usually be faster. And not just a little bit. Like by a factor of 200:1 in my NOT EXISTS case! I am doing batch modification to a new schema and used this query to get a count of the remaining records while I was running my conversion scripts:

SELECT count(*) as total
FROM tblLookupAuthenticateOld
WHERE uidMember NOT IN (SELECT uidMember FROM tblLookupAuthenticate)

The EXPLAIN looks like:

"Aggregate  (cost=9033491.00..9033491.01 rows=1 width=0)"
"  ->  Seq Scan on tbllookupauthenticateold  (cost=1772.98..9033453.73 rows=14910 width=0)"
"        Filter: (NOT (subplan))"
"        SubPlan"
"          ->  Materialize  (cost=1772.98..2304.14 rows=29816 width=36)"
"                ->  Seq Scan on tbllookupauthenticate  (cost=0.00..1510.16 rows=29816 width=36)"

Look at that cost: 9033491.00!! This took about 100 seconds to run on my laptop. It’s because the query is effectively doing a sequential scan for every record in the first table. Ugh. Then I swapped it to NOT EXISTS:

SELECT count(*) as total
FROM tblLookupAuthenticateOld
WHERE NOT EXISTS (select * FROM tblLookupAuthenticate WHERE uidMember = tblLookupAuthenticateOld.uidMember)

The EXPLAIN looks like:

"Aggregate  (cost=247542.82..247542.83 rows=1 width=0)"
"  ->  Seq Scan on tbllookupauthenticateold  (cost=0.00..247505.54 rows=14910 width=0)"
"        Filter: (NOT (subplan))"
"        SubPlan"
"          ->  Index Scan using tbllookupauthenticate_pkey1 on tbllookupauthenticate  (cost=0.00..8.28 rows=1 width=298)"
"                Index Cond: (uidmember = $0)"

This query ran in about 500ms. That’s a “decent” improvement of 200:1. I’m planning to run a search in Eclipse tonight for “NOT IN” to see if there are any places where NOT IN once made sense but not [NOT] EXISTS is more appropriate as the subquery returns more and more rows.

Leave a Comment

XHTML: Line-breaks are automatic. Available tags are <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong> <img src="">