SQL Topics

The price of perfection is infinite patience, but the cost of progress is a lack of time.


This site will be built up over a period of time; it is dedicated to helping individuals to apply SQL efficiently. There is good general SQL material available elsewhere. We are going to confine ourselves to useful oddball topics and areas of difficulty in SQL. Once enough topics exist, this page will only carry an index to each topic.

SQL date operations

One of the confusing portability problems for SQL is that of using vendor-specific functions not covered by the ANSI standard. I am not going to go into the politics as to why this is so, even for ubiquitous atomic database components such as dates among leading database vendors; I'll just limit myself to the technical issues.
SQL date arithmetic has two components: finding the elapsed number of days between two dates, and generating one date from another. Let's take a look at some of the portability problems to do this among presumably ANSI-standard database products: Sybase (and its derivative MS SQL), DB2 and Oracle. I'm also going to include date unit extraction from a date.
SQL date operations Adaptive Sybase Anywhere (and original Watcom SQL)
Sybase Adaptive Enterprise (and MS SQLServer)
Oracle DB2 (v5)
Advance/retard date

<date> [+-] <number of days>

dateadd(day, [+-]<number of days>, <date>) <date> [+-] <number of days> <date> [+-] <number of days>
Number of days between two dates <date1> - <date2> datediff(day, <date2>, <date1>) <date1> - <date2> days(<date1>) - days(<date2>)
Today's date today(*) getdate() sysdate current date
Day of a date day(<date>) datepart(dd, <date>) to_char(<date>, 'DD') day(<date>)
Month of a date month(<date>) datepart(mm, <date>) to_char(<date>, 'MM') month(<date>)
Year of a date year(<date>) datepart(yy, <date>) to_char(<date>, 'YYYY') year(<date>)
Quarter of a date quarter(<date>) datepart(qq, <date>) to_char(<date>, 'Q') quarter(<date>)
Weekday of a date dayname(<date>) datepart(dw, <date>) to_char(<date>, 'DAY') dayname(<date>)

 

This table actually doesn't seem that bad. The nitpicking problems show up in actual implementation code.

For example, addressing an Oracle database with the query "Customers that have orders with each order date at least greater than 30 days ago" requires the following code - note the use of the round() function to ensure non-fractional comparisons. Oracle's date arithmetic will otherwise produce fractional day values if there is a time component in the date:

select b1.NAME as "NAME"
from DEMO.CUSTOMER b1
where b1.CUSTOMER_ID in
     (select c1.CUSTOMER_ID
      from DEMO.SALES_ORDER c1
      where not exists
          (select null from DEMO.SALES_ORDER c3
           where round(sysdate) - c3.ORDER_DATE < 30

           and c1.CUSTOMER_ID = c3.CUSTOMER_ID))

A similar query for DB2 - "Department names for departments that haven't hired any empoyees in the last year" - is as follows:

select f1.DEPTNAME as "department names"
from MIKE.DEPARTMENT f1
where f1.DEPTNO in
    (select g1.WORKDEPT from MIKE.EMPLOYEE g1
     where not exists
         (select * from MIKE.EMPLOYEE g3
          where days( current date ) - days(g3.HIREDATE) < 365
          and g1.WORKDEPT = g3.WORKDEPT))

Mixing and matching WHERE and HAVING

This topic probably has more to do with the original design of SQL than with any standards committee.

The normal structure of an SQL sentence is as follows:

SELECT
FROM
[WHERE]
[AND|OR] [< more WHERE conditions>]
[GROUP BY]
[HAVING][NOT]
[AND|OR][<more HAVING conditions>]
[UNION]
[ORDER BY]

Most of the time this arrangement causes no difficulties. However, note that in the above the implied conjunction (logical operator) connecting the WHERE and HAVING clauses is AND. What happens if you need to say OR instead? Well, there are two basic solutions, both of them ugly.

One solution is to throw the WHERE conditions into the HAVING clause and risk not only making life difficult for the SQL optimizer, but also winding up with something that is incorrect. In the following example, if you replace prod_id with quantity you will very likely not get what you want, because you have to group.

select b1.lname as "lname"
from "DBA".customer b1
where b1.id in
     (select d1.cust_id
      from "DBA".sales_order d1
      where d1.id in
          (select c1.id from "DBA".sales_order_items c1
           group by c1.id, c1.prod_id
           having avg(c1.quantity) > 90
           or c1.prod_id > 100))

The second approach takes into account that HAVING conditions involve aggregations, and aggregations can usually be rewritten as correlated subqueries that fit into WHERE clauses. These can also reduce the optimizer's efficiency in some systems. Here I use quantity as I originally intended.

select b1.lname as "lname"
from "DBA".customer b1
where b1.id in
(select d1.cust_id
   from "DBA".sales_order d1
   where ( d1.id in
      (select c1.id
       from "DBA".sales_order_items c1
       where ( c1.quantity > 100
       or 90 <
           (select avg(c3.quantity)
            from "DBA".sales_order_items c3
            where c1.id = c3.id)))))

There is a third, elegant solution: use a union instead of an OR operator.

select m1.lname as "lname"
from "DBA".customer m1
where m1.id in
    (select o1.cust_id
     from "DBA".sales_order o1
     where o1.id in
         (select n1.id
          from "DBA".sales_order_items n1
          where n1.quantity > 100))
union all
select m1.lname as "lname"
from "DBA".customer m1
where m1.id in
    (select o1.cust_id
     from "DBA".sales_order o1
     where o1.id in
        (select n1.id
         from "DBA".sales_order_items n1
         group by n1.id
         having avg(n1.quantity) > 90))

Aggregations against a full table display

The knee-jerk idiom for this kind of query is a HAVING clause that groups everything in sight - not always a good idea.

Negated conditions don't always lead to similar SQL

Just because you found an idiom that works for a particular relationship between tables don't assume it will also work for other relationships. Sometimes an SQL idiom that works for closely related tables fails to produce a correct answer when used on distantly related tables.