“My Looker is slow”
If you are familiar with (persistent) derived tables, datagroups and caching, aggregate awareness, and SQL Runner, you are probably heard a lot of “My Looker is slow” and are struggling with performance.
Too slow is any dashboard that takes more than a few seconds. The more Looks (e.g. chart) in each dashboard, the more queries to queue up. In short, each query behind a Look has to be even faster.
Looker has plenty of great options for improving performance. Looker engineering has had to optimize for Looker on Snowflake, Looker on BigQuery, Looker on Redshift - all because analysts, and especially executives, do not want to wait.
Stop. It’s not Looker.
It’s the wrong approach. It’s a last resort and a bandaid to avoid fixing the real problem; the data warehouse. It is impossible to fix all the issues at scale through Looker because no matter what Looker does, it still has to wait for some query to complete. Even with datagroups and caching, or with persistent derived tables, interactive analytics needs new data that is not in the cache. And for that first time data, all the data warehouses are too slow.
Try some short-term fixes
There are a few approaches used to improve performance in the short term, either in the data warehouse, in Looker, or both.
- Aggregation: Aggregate data either before loading or in the data warehouse to reduce the amount of data. It improves performance, but you can no longer drill down unless you maintain the original fact table in the data warehouse as well. Aggregation can be done in Looker, so you have some options.
- Hardware: Increase the size of a cluster, or add clusters. You should benchmark to estimate your current and future costs so you don’t have any surprises. There are too many stories of breaking the budget.
- Caching: Beyond caching in Looker, you may be able to use more caching or materialized views in the data warehouse. Assuming you do not have frequent updates and are mostly performing repetitive queries, this can work well. Caching will generally not work well for interactive and ad hoc queries.
- Pruning: Perform a combination of pruning or aggregating older data. This is a short-term fix if your data volumes are continuing to grow.
- Partition: Break up your data into different databases, tables or clusters. In many cases this has already been done.
The long-term solution
The longer term solution is already proven by many data-driven SaaS, high tech, and industry leaders. They solve it one use case at a time, which is possible because of the way data analytics and pipelines have evolved.
Take Appsflyer, one of the leading mobile analytics and attribution vendors. Over 12,000 companies rely on its analytics. Internally they have 1,000 Looker users analyzing tens of terabytes of data, and billions of rows, or more. When they could not deliver a specific report to executives, they decided to solve that problem differently. Athena couldn’t even perform the bigger queries, and the smaller ones were taking 1 minute or more. They were already using BigQuery, and evaluated Snowflake and other specialized vendors.
They decided to implement Firebolt specifically for that use case, quickly. Looker doesn’t really care which data warehouse it uses. It can use several at the same time. Appsflyer took the existing Athena schema, created the tables in Firebolt, ran SQL ingestion scripts, and modified LookML to point to Firebolt. The dashboards didn’t need to change.
The first project took a few weeks. New projects take a few days or less.
They achieved sub-seconds performance for every query, with 33-183x faster performance and 1/5th the cost of any of the alternatives.
What were some of the key reasons for such faster performance?
What were some of the key reasons for such faster performance?
- Sub-seconds data access - Firebolt has a unique storage and data access layer with F3 (Firebolt File Format) that does not fetch entire segments or partitions from S3, only exact data ranges. The result is much more intelligent data access with 10x or faster data access times.
- Query optimization - Firebolt takes Looker SQL and turns it into a physical query plan, using indexing, multi-level pushdown optimization, cost-based optimization, and just-in-time compilation - all tuned with Looker in mind - that delivers the fastest performance each time.
- Indexing - Indexing matters. Firebolt is the only cloud data warehouse that has indexes for predicates (tables), text searches, joins and aggregations. The combination delivers much faster data operations with much less computing needed per query.
- Query execution - Firebolt’s query execution engine takes all the query optimization and combines it with vectorized processing, LLVM, and a host of other innovations to deliver really fast execution of standard Looker SQL.
How to learn more
If you want to see Looker and Firebolt in action and have questions, attend a product showdown session with Looker.
If you’re interested in Appsflyer’s experience, read more about their experiences, and watch their webinar.
Or you can just do what Appsflyer and others did. Pick your use case, your slow Looker dashboards, and just get it done in a few weeks.