Understanding ‘Upsert’
Note: This article describes ‘Upsert’ functionality and is not specific to Firebolt. To understand how to perform ‘Upsert’ in Firebolt, please access Firebolt help center here.
When dealing with large volumes of data, a relational database management system (RDBMS) helps users manage their database. It is this support system that offers key features to handle data. One feature of a RDBMS is ‘Upsert’. The word ‘Upsert’ is derived from a combination of two words ‘update’ and ‘insert’. As the name goes, the Upsert command allows the user to alter existing data to an extent by inserting a whole new row or just updating an existing one. An example of a scenario where a database operation as Upsert can be of use is when updating or expanding employee information on a database. For instance, if the company wishes to add a new row to include employee ID or change employee contact details, Upsert can get it done. Although ‘Upsert’ is merely a command for an operation that must be executed, it is not universal (i.e., not all RDBMS have ‘Upsert’ as a command).
Here is an example of using upsert to update an existing table of employee details:
SQL Statement
UPSERT INTO employees (id, name, email) VALUES (2, ‘Shane’, ‘Shane@testxyzAsia.io’
);
Current Table
Table After Upsert
In the above example, 2nd row old employee details were replaced with new employee information.
The example given below shows how upsert is used to insert a new row of employee details.
SQL Statement
UPSERT INTO employees (id, name, email) VALUES (3, ‘Raj’, ‘Raj@testxyzHR.corp’
);
Current Table
Table After Upsert
‘Upsert’ is a simple function when performed on a database because the in-built indexes of a database facilitate identification of a specific record or entry, enabling easy changes to an entry or entries. Without indexes identifying specific records becomes difficult for operations like Upsert. Although indexes have been commonly used in RDBMS, they are not extensively found in data warehouses. Despite their lack of popularity, they are now considered essential in data warehouses due to the need to maintain a history of data with reference to the source data that is served into the ETL tool. A typical use case is when trying to maintain slowly changing dimensions in a data warehouse. In such cases, new records have to be inserted, data in the warehouse that is no longer in source have to be flagged or removed and updates done to the source must reflect in the warehouse. So, Upsert can help make some of these changes. However, it could be a tedious task to do the update and insert tasks separately for each instance. Hence, the MERGE command combines insert, update and delete operations such that these tasks are performed in one go without the need to write separate syntax for each. ‘Merge’ command can be used to combine data from a source table into a target table. Based on a specific condition, records can be updated, deleted or inserted simplifying the process of ingesting data.