What is data warehousing
Data warehousing is the process of developing, deploying and managing a data warehouse. In order to perform data warehousing you also need to perform data integration, which is the process of integrating multiple data sources together for analytics, data migration or operations. You will also have analytics or business intelligence processes.
They are all separate processes, though they should work well together so that you can build new reports quickly.
Challenges
Apart from the technical challenges a cloud data warehouse or any data warehouse faces, data warehousing face a combination of challenges that force most companies to find their own balance between agility, consistency and governance.
- Agility: End users want and need new reports in days. But for many organizations, it can take 1-3 weeks to explain all the requirements for a new report, and 1-3 months to deliver it. Part of the reason is it can take 1-3 iterations to get the report right, and as many data warehouse administrators will tell you, if you touch a data warehouse it takes a month to test and roll out to production.
- Consistency: Data needs to be extracted from various applications and other sources, integrated, merged, cleansed and made consistent. This requires some combination of data integration, data quality, and other tools such as master data management.
- Governance: Beyond the tools and best practices required to ensure a level of data quality and consistency, there are also all kinds of internal business, customer, security, and regulatory requirements for different types of data.
Making data more accessible and shortening the time to new reports can come at the cost of data quality and consistency, and data governance.
Benefits
The biggest benefit of data warehousing and its related discipline of data governance is visibility. The more data you can expose, and the more agile users can be, the more employees can improve business performance, and the happier customers will be. But bad, ungoverned or exposed data can lead to bad decisions, data breaches or regulatory fines.
Increasingly, data warehousing has incorporated the use of data lakes for holding raw data, which has improved both data access and agility. You end up with two processes. The more traditional data integration and data governance processes are focused on moving data from the data sources into the data lake. These processes can still take weeks to months.
But they do enable faster data access by data engineers. Now whenever people need new reports, data engineers can find the best data in the data lake and load it on their own using SQL-based ELT (or ETL). This can shorten new report creation from weeks to hours.
Firebolt and data warehousing
Firebolt is a cloud data warehouse that supports most traditional and newer data warehouse processes. Many companies use Firebolt with a data lake, and use Firebolt for SQL-based ELT, to help deliver new reports much faster as well as deliver more self-service analytics, including more interactive ad hoc, operational and customer-facing analytics.