Here are some SQL optimizations that are meant to be simple to implement and not compromise the overall code readability while giving you a performance boost.
1. UNION ALL vs. UNION
Both operators are used to combine the result sets of two or more SELECT statements. The difference is that UNION removes the duplicate records by performing a DISTINCT operation on the result set, whereas UNION ALL does not remove duplicates and is, therefore, faster than UNION.
Solution: Use UNION ALL instead of UNION when unioned results are mutually exclusive or when duplicates are not a problem.
2. Using leading wildcards when filtering
When you use leading wildcards in a query, e.g., WHERE name like ‘%Miles%’, the query is not sargable – which is a fancy way of saying that a query cannot take advantage of an index to speed up its execution.
If, on the other hand, you replace the leading wildcard with a trailing wildcard like the following: WHERE name like ‘Miles%’, your query will be able to leverage any of the indexes you have defined on the column when performing the search – the query will be sargable.
Solution: Replace ‘%string%’ with ‘string%’ where possible.
3. Using OR in the JOIN predicate vs. UNION ALL
The OR operator is an expensive database used to chain multiple JOIN conditions—especially when used in the JOIN clause. One way to boost the query performance is to split the JOIN predicates into individual queries and use UNION to combine the result sets.
Using OR:
SELECT p.name
FROM dim_date d
INNER JOIN fact_salesorder f ON p.dim_product = f.dim_productid OR p.market = f.store_market
Using UNION ALL:
SELECT p.name
FROM dim_product p
INNER JOIN fact_salesorder f ON p.dim_productid = f.dim_productid
UNION ALL
SELECT p.name
FROM dim_product p
INNER JOIN fact_salesorder f ON p.market = f.store_market AND p.dim_productid != f.dim_productid
Solution: Split conditions chained with OR in the JOIN predicate into multiple queries combined with UNION ALL.
4. Using ANY_VALUE(column) vs. GROUP BY column
When writing analytical queries, you will often use the GROUP BY clause to report metrics qualified by the selected attributes you want to report on.
But sometimes, it doesn’t make sense to add all the attributes in the GROUP BY - they are not grouping data at a higher granularity but are only used for display purposes alongside the other attributes and metrics.
For example, if you want to see the total sales by customer name and id, we might write the following query:
SELECT c.dim_customerid, c.name, sum(f.sales_amount)
FROM dim_customer c
INNER JOIN fact_salesorder f ON c.dim_customerid = f.dim_customerid
GROUP BY c.dim_customerid, c.name;
Suppose we know that each dim_customerid corresponds to one single customer name. In that case, we can safely remove the name attribute from the GROUP BY and instead select one single value from the group using the ANY_VALUE aggregate function in the select clause:
SELECT c.dim_customerid, any(c.name), sum(f.sales_amount)
FROM dim_customer c
INNER JOIN fact_salesorder f ON c.dim_customerid = f.dim_customerid
GROUP BY c.dim_customerid;
5. Using indexes to speed up the query performance.
Indexes are the primary way for users to accelerate query performance. While indexes are common with Online Transaction Processing applications, they are used sparingly with big data. If you want to find out more on how to use indexes with a data warehouse, check out our blog on Indexes in Action.