Like a traditional data warehouse, a cloud data warehouse collects, integrates, and stores data from internal and external data sources. Data is typically transferred from a source system through a data pipeline. In one approach, the data is extracted from the source system, transformed, and then loaded into the data warehouse, a process known as ETL (extract, transform, load). Alternatively, data can be loaded directly into a central repository and transformed there, a process known as ELT (extract, load, transform). From there, users can use different business intelligence (BI) tools to access, mine, and report on data. Cloud data warehouses should also support streaming use cases so that data can be acted on in real time or near-real time.
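To make the difference between transforming before loading (ETL) and loading before transforming (ELT) concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a warehouse; the sample records, table names, and function names are all hypothetical, and a real cloud data warehouse would be loaded through its own connectors rather than this way.

```python
import sqlite3

# Hypothetical source records; amounts arrive as strings, as they might from a CSV export.
SOURCE_ROWS = [
    {"order_id": 1, "amount": "19.99", "country": "us"},
    {"order_id": 2, "amount": "5.00", "country": "de"},
    {"order_id": 3, "amount": "12.50", "country": "us"},
]


def run_etl(conn, rows):
    """ETL: transform in the pipeline, then load the cleaned rows."""
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, amount REAL, country TEXT)")
    cleaned = [(r["order_id"], float(r["amount"]), r["country"].upper()) for r in rows]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn, rows):
    """ELT: load raw rows first, then transform inside the warehouse with SQL."""
    conn.execute("CREATE TABLE orders_raw (order_id INTEGER, amount TEXT, country TEXT)")
    raw = [(r["order_id"], r["amount"], r["country"]) for r in rows]
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", raw)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, CAST(amount AS REAL) AS amount, UPPER(country) AS country "
        "FROM orders_raw"
    )


def revenue_by_country(conn, table):
    """Report query a BI tool might issue against either table."""
    query = (
        f"SELECT country, ROUND(SUM(amount), 2) FROM {table} "
        "GROUP BY country ORDER BY country"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database standing in for the warehouse
run_etl(conn, SOURCE_ROWS)
run_elt(conn, SOURCE_ROWS)
etl_result = revenue_by_country(conn, "orders_etl")
elt_result = revenue_by_country(conn, "orders_elt")
```

Both paths end with the same report-ready table. The practical difference is where the transformation runs: in the pipeline for ETL, or inside the warehouse for ELT, which also leaves the raw copy (orders_raw in this sketch) available for later reprocessing.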
Cloud data warehouses store structured and semi-structured data and handle its processing, integration, cleansing, and loading within a public cloud environment. You can also use them with a cloud data lake to collect and store unstructured data. With some providers, it’s even possible to unify your data warehouse and data lake so that you maintain and centrally manage a single copy of your enterprise data.
Cloud providers take different approaches to cloud data warehouse services. For example, some cloud data warehouses use a cluster-based architecture similar to a traditional data warehouse. In contrast, others adopt a modern serverless architecture, which further minimizes data management responsibilities. Either way, most cloud data warehouses provide built-in data storage and capacity management features, as well as automatic upgrades.
Other key capabilities of cloud data warehouses include:
Massively parallel processing (MPP)
Columnar data stores
Self-service ETL and ELT data integration
Disaster recovery features and automatic backups
Compliance and data governance tools
Built-in integrations for BI, AI, and machine learning
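
Two of the capabilities listed above, columnar data stores and massively parallel processing, can be illustrated with a small sketch. It is a toy model only: the records and helper names are hypothetical, threads stand in for warehouse nodes, and real engines add compression, distributed storage, and query planning that are not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical row-oriented records, as a transactional system might store them.
rows = [
    {"order_id": 1, "country": "US", "amount": 20.0},
    {"order_id": 2, "country": "DE", "amount": 5.0},
    {"order_id": 3, "country": "US", "amount": 12.5},
    {"order_id": 4, "country": "FR", "amount": 7.5},
]


def to_columns(records):
    """Pivot row records into a column-oriented layout: one list per field."""
    return {name: [rec[name] for rec in records] for name in records[0]}


def partition(values, n_parts):
    """Split a column into up to n_parts contiguous chunks, one per simulated node."""
    size = -(-len(values) // n_parts)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


columns = to_columns(rows)

# An aggregate over one field reads only that column, not whole records.
amounts = columns["amount"]

# MPP-style scatter/gather: each worker sums its partition, then the partials are combined.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_sums = list(pool.map(sum, partition(amounts, 2)))
total = sum(partial_sums)
```

The column layout means an analytical query such as a total over amount never touches order_id or country, and the partitioned sum shows why such queries can scale out: each node works on its own slice and only the small partial results are merged.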