Click to See Complete Forum and Search --> : EXPLAIN on very similar SQL statements is different


kirilisa
04-27-2008, 08:30 PM
Can anyone tell me what is the difference between the two SQL statements? One has an INNER JOIN as opposed to a LEFT JOIN for one of its joins, and they do the self-join in opposite orders, but otherwise they are the same (and return the same stuff, albeit not in the same order). I ran EXPLAIN on them both. is EXPLAIN indicating that the first is more efficient? (b/c in practice it feels like the second is) And if they are so similar why does EXPLAIN show such difference anyway? Why isn't the ContactFK_1 index being used on table c in the second statement?

When doing multiple joins, are the joins done first on the full tables, and then any applicable WHERE criteria is applied? Or is the WHERE criteria applied before/during the join? When is it better to do a join on a subselect (thus applying the WHERE criteria earlier)? Yeek.


mysql> EXPLAIN SELECT
pc.contactId as ci,
c.contactId,
c.firstName, c.lastName,
pc.firstName as fn, pc.lastName as ln,
a.searchAddress,
n.value
FROM (`Contact` pc)
LEFT JOIN `Contact` c ON pc.contactId = c.primaryContactId
LEFT JOIN `Address` a ON a.contactId = pc.contactId
LEFT JOIN `ContactNumber` n ON n.contactId = pc.contactId
WHERE
(a.invalidFlag is null or a.invalidFlag = 0)
AND
c.searchName LIKE '%wilkinso%';
+----+-------------+-------+-------+------------------------+------------------------+---------+--------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+------------------------+------------------------+---------+--------------------+-------+-------------+
| 1 | SIMPLE | pc | index | PRIMARY | Contact_NameI | 46 | NULL | 61055 | Using index |
| 1 | SIMPLE | a | ref | Address_FKIndex1 | Address_FKIndex1 | 4 | fcprd.pc.contactId | 1 | Using where |
| 1 | SIMPLE | n | ref | ContactNumber_FKIndex1 | ContactNumber_FKIndex1 | 4 | fcprd.pc.contactId | 1 | |
| 1 | SIMPLE | c | ref | Contact_FKIndex1 | Contact_FKIndex1 | 5 | fcprd.pc.contactId | 1 | Using where |
+----+-------------+-------+-------+------------------------+------------------------+---------+--------------------+-------+-------------+

mysql> EXPLAIN SELECT
pc.contactId as ci,
c.contactId,
c.firstName, c.lastName,
pc.firstName as fn, pc.lastName as ln,
a.searchAddress,
n.value
FROM Contact AS c
INNER JOIN Contact AS pc on c.primaryContactId = pc.contactId
LEFT JOIN Address AS a on pc.contactId = a.contactId
LEFT JOIN ContactNumber AS n on pc.contactId = n.contactId
WHERE
(a.invalidFlag is null or a.invalidFlag = 0)
and
(c.searchName LIKE '%wilkinso%')
+----+-------------+-------+--------+------------------------+------------------------+---------+--------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+------------------------+------------------------+---------+--------------------------+-------+-------------+
| 1 | SIMPLE | c | ALL | Contact_FKIndex1 | NULL | NULL | NULL | 61055 | Using where |
| 1 | SIMPLE | pc | eq_ref | PRIMARY | PRIMARY | 4 | fcprd.c.primaryContactId | 1 | |
| 1 | SIMPLE | a | ref | Address_FKIndex1 | Address_FKIndex1 | 4 | fcprd.c.primaryContactId | 1 | Using where |
| 1 | SIMPLE | n | ref | ContactNumber_FKIndex1 | ContactNumber_FKIndex1 | 4 | fcprd.c.primaryContactId | 1 | |
+----+-------------+-------+--------+------------------------+------------------------+---------+--------------------------+-------+-------------+

chazzy
04-27-2008, 09:52 PM
where is applied last. join syntax is applied first. if you want to limit it in the way you're suggesting, i'd suggesting throwing more into the ON clause.

nonamepub
04-27-2008, 09:59 PM
When joining two or more tables, you're performing a cartesian product - Every tuple, or element, is being paired with every element in the other table. This might not sound like it makes sense. For instance:

Employee Table (EmployeeID, Name)
Works_For Table (EmployeeID, Boss)

It wouldn't make sense to pair every employee with every boss, since this joining would produces some relation that doesn't "technically" make sense.

More intuitive, and useful, is the natural join. This allows a joining of two or more tables BASED ON SAME ATTRIBUTES. So if we natural joined the tables above, we would end up with a table (EmployeeID, Name, Boss) with the tuples being lined-up based on the similar attribute, EmployeeID.

Now, you would use outer joins in the following case: Lets say the Work_for table has a tuple with an EmployeeID and a Boss, but this person isn't in the Employee Table. You may want him to show up in the result - but the resulting table will have the attribute Name... What should the database put for this person's name!? The outter join pads any missing fields with null values.

Employee right outer join Works_For
... left outer join ...
... full outer join ...

I would put natural in front to force the table to join based on common attributes. So, Employee left natural join Works_For would produce a table with employeeID, name, and boss, and would pad null values for BOSS if there was a tuple in the Employee Table that did NOT have a boss corresponding to him/her.

Hope this helps.

P.S. In your code, you use the format: "a JOIN b ON a.x = b.x"
This is just explicitly labeling the common attribute. The natural keyword would take care of this for you.

kirilisa
04-27-2008, 11:24 PM
Thanks guys for getting back!

Chazzy, you are saying, turn the statement into something like this:


SELECT
pc.contactId as primaryId,
c.contactId,
c.firstName, c.lastName,
pc.firstName as primaryFname, pc.lastName as primaryLname,
a.searchAddress, a.defaultFlag, a.invalidFlag,
cn.value, cn.typeCode, cn.defaultFlag
FROM Contact AS c
INNER JOIN Contact AS pc on c.primaryContactId = pc.contactId
LEFT JOIN Address AS a on (pc.contactId = a.contactId and a.invalidFlag = 0 and a.defaultFlag = 1)
LEFT JOIN ContactNumber AS cn on (pc.contactId = cn.contactId and cn.defaultFlag = 'Y')
WHERE
(c.searchName LIKE '%wilkins%')
order by primaryId


That actually is faster (and I could simply change the LEFT JOIN to an INNER JOIN if e.g. address search criteria were supplied and thus couldn't allow NULL padding in address results). I didn't realize you could put WHERE type stuff into the ON clause in a join.

I still don't understand why when I EXPLAIN it the query thru c isn't using the index, though.

Nonamepub, I thought that putting in a NATURAL was kind of like INNER? but you're saying you can still make a NATURAL JOIN pad with NULL? I couldn't get it to work as it started throwing me a 'Column contactId in from clause is ambiguous' error when I replaced the LEFT JOINs with NATURAL JOINs, presumably because e.g. Address.contactId isn't sure whether to be natural joining with c.contactId or pc.contactId, given the self join.

[Also, it seems like it won't allow you to put anything in the ON clause if using a natural join so it wouldn't be possible to do what chazzy suggested. Or am I wrong?]

nonamepub
04-28-2008, 10:17 AM
The natural join will only combine tables based on matching columns. So, if you're joining two tables with different column names, but the data is suppose to be the same, you would use the inner join and specify the constraint such as: "c.primaryContactId = pc.contactId" (which is what you have).

chazzy
04-28-2008, 02:23 PM
the difference is clearly the use of INNER JOIN vs. LEFT JOIN. inner join requires a "scale-down" of values, where as left join doesn't. inner join requires a matching value on both sides, and left join doesn't (can handle nulls).

kirilisa
04-29-2008, 01:23 AM
So you're saying that even though in this case with regards to the self-join INNER JOIN and LEFT JOIN will return the same thing (because there happen to be no rows resulting from the self-join which would need to be padded with null), that it still affects the efficiency of the search? (I was under the impression that INNER JOINS were generally less efficient)

Nonamepub, I was referring to mysql's page here http://dev.mysql.com/doc/refman/5.0/en/join.html where it says "The NATURAL [LEFT] JOIN of two tables is defined to be semantically equivalent to an INNER JOIN or a LEFT JOIN with a USING clause that names all columns that exist in both tables."

chazzy
04-29-2008, 09:21 AM
INNER JOIN is more efficient actually. Left join has to figure out the possible null values, inner join drops the rows without a match.

But what you're saying is correct - even just using INNER JOIN vs LEFT JOIN will affect query performance. If you can guarantee that the row will exist in the target table, us INNER, otherwise use LEFT, unless you don't want the row returned if the target's missing.