Monday, August 17, 2015

IN vs Exists performance



Oracle Tip: Understand the difference between IN and EXISTS in subqueries

When using rule-based optimization, you can determine the performance of a rule-based query by understanding which table is the driving table and how many rows each part returns. Find out how with this hands-on solution. 






When coding a SQL statement with tables in master-detail relationships, it's common to have to decide whether to write the query using the WHERE EXISTS (. . .) clause or the WHERE value IN (. . .) clause. You may resist using WHERE EXISTS because it has the awkward syntax of returning a value, which you always ignore.
However, there's a difference when using rule-based optimization. You can determine the performance of a rule-based query by understanding which table is the driving table and how many rows each part returns.
When you write a query using the IN clause, you're telling the rule-based optimizer that you want the inner query to drive the outer query (think: IN = inside to outside). For example, to query the 14-row EMP table for the direct reports to the employee KING, you could write the following:

select ename from emp e
    where mgr in (select empno from emp where ename = 'KING');
Here's the EXPLAIN PLAN for this query:

OBJECT     OPERATION
————— ————————————————————
                 SELECT STATEMENT()
                  NESTED LOOPS()
EMP                TABLE ACCESS(FULL)
EMP                 TABLE ACCESS(BY INDEX ROWID)
PK_EMP               INDEX(UNIQUE SCAN)

This query is virtually equivalent to this:

select e1.ename from emp e1,(select empno from emp where ename = 'KING') e2
    where e1.mgr = e2.empno;
You can write the same query using EXISTS by moving the outer query column to a subquery condition, like this:

select ename from emp e
    where exists (select 0 from emp where e.mgr = empno and ename = 'KING');
When you write EXISTS in a where clause, you're telling the optimizer that you want the outer query to be run first, using each value to fetch a value from the inner query (think: EXISTS = outside to inside).
The EXPLAIN PLAN result for the query is:

OBJECT     OPERATION
————— ————————————————————
                 SELECT STATEMENT()
                  FILTER()
EMP                TABLE ACCESS(FULL)
EMP                 TABLE ACCESS(BY INDEX ROWID)
PK_EMP               INDEX(UNIQUE SCAN)
This is virtually similar to the PL/SQL code:

set serveroutput on;
declare
    l_count integer;
begin
    for e in (select mgr,ename from emp) loop
        select count(*) into l_count from emp
         where e.mgr = empno and ename = 'KING';
        if l_count != 0 then
            dbms_output.put_line(e.ename);
        end if;
    end loop;
end;
To determine which clause offers better performance in rule-based optimization, consider how many rows the inner query will return in comparison to the outer query. In many cases, EXISTS is better because it requires you to specify a join condition, which can invoke an INDEX scan. However, IN is often better if the results of the subquery are very small. You usually want to run the query that returns the smaller set of results first.
Some people avoid the EXISTS clause because of the requirement to return a result from the query—even though the result is never used. Depending on personal style, people often use 'X,' 1, 0, or null. From looking at the EXPLAIN PLAN output, it appears that the optimizer throws out whatever value you enter and uses 0 all the time. Many developers get into the habit of always entering some constant value.


Which is faster – IN or EXISTS?

AUGUST 20, 2010


in-vs-exists
A question asked multiple times over on Oracle forums: Which is faster – IN or EXISTS?
The short answer, post-Oracle 9i is:
Both are pretty much the same!

Huh? What about that thumb rule – IN for small inner query, EXISTS for big inner query?

That used to be true in the pre-9i versions of Oracle.
The recommendation at that time were:
  • If the majority of the filtering criteria is in the subquery, use IN.
  • If the majority of the filtering criteria is in the main query, use EXISTS.
In other words,
  • IN for big outer query and small inner query.
  • EXISTS for small outer query and big inner query.
If both the outer query and inner query were large, either could work well – the choice would depend on indexes and other factors.
In current versions of Oracle, the optimizer (CBO) costs both to produce the best plan.

So which should I use now – IN or EXISTS?

Use whichever makes logical sense in the context. Or whichever is your personal favorite. It’s pretty much the same thing.


You Asked

Tom:

can you give me some example at which situation
IN is better than exist, and vice versa. 

and we said...

Well, the two are processed very very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select * 
  from t1, ( select distinct y from t2 ) t2
 where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to 
the original table -- typically.


As opposed to 

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:


   for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x )
      then 
         OUTPUT THE RECORD
      end if
   end loop

It always results in a full scan of T1 whereas the first query can make use of an index 
on T1(x).


So, when is where exists appropriate and in appropriate?

Lets say the result of the subquery
    ( select y from T2 )

is "huge" and takes a long time.  But the table T1 is relatively small and executing ( 
select null from t2 where y = x.x ) is very very fast (nice index on t2(y)).  Then the 
exists will be faster as the time to full scan T1 and do the index probe into T2 could be 
less then the time to simply full scan T2 to build the subquery we need to distinct on.


Lets say the result of the subquery is small -- then IN is typicaly more appropriate.


If both the subquery and the outer table are huge -- either might work as well as the 
other -- depends on the indexes and other factors. 

No comments:

Post a Comment