Data Warehouse Architecture Best Practices and Guiding Principles (2020)

The organization of a data warehouse can have different structures in different implementations. Some may have one ODS (operational data store), while others may have multiple data marts. Some may have a small number of data sources, while others may have dozens of data sources. Given this, it is much more reasonable to present the different layers of a data warehouse architecture rather than discussing any specific system.

What is data warehouse architecture?

A data warehouse architecture is built around a central database whose design depends on how it will be used and on company-specific requirements. Each data warehouse architecture has its advantages and disadvantages in development, operation and maintenance.

When starting to implement one, it is important to know which architectural concept will be used for further development, so that both developers and users share the same understanding.

Data warehouse architecture components

These are the key components of a data warehouse architecture.

Databases

The central component of a data warehouse architecture is a database in which all company data is stored and managed for reporting purposes. This means that you must choose which type of database to use to store data in your warehouse.

The following four types of databases can be used:

Typical relational databases

These are the row-oriented databases you might use on a daily basis. For example, Microsoft SQL Server, SAP, Oracle, and IBM DB2.

Analytics databases

These are designed specifically for data warehousing, to support and manage analytics. For example, Teradata and Greenplum.

Data warehouse appliances

These are not exactly a type of storage database, but several vendors now offer appliances that bundle data management software with the hardware for storing data. For example, SAP HANA, Oracle Exadata and IBM Netezza.

Cloud-based databases

These can be hosted and accessed in the cloud, so you don’t need to buy hardware to set up your data warehouse. For example, Amazon Redshift, Microsoft Azure SQL and Google BigQuery.

Extraction, transformation and loading tools (ETL)

ETL tools are fundamental to a data warehouse architecture. With these tools, you can extract data from various sources, convert it into a suitable structure, and load it into the data warehouse.

The ETL tool you choose determines the following:

  • The time spent extracting data
  • The approaches used to extract data
  • The types of transformations applied and how easy they are to apply
  • The business rules defined for data validation and cleansing, which improve the quality of the final analysis
  • How missing data is filled in
  • How information is distributed from the central repository to your BI applications
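The extract, transform and load steps can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular ETL tool works: the source rows, the `orders` table, the validation rule and the default used to fill a missing amount are all assumptions made up for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows, standing in for an export from an operational system.
SOURCE_ROWS = [
    {"order_id": "1", "customer": " alice ", "amount": "19.90"},
    {"order_id": "2", "customer": "Bob", "amount": ""},        # missing amount
    {"order_id": "x", "customer": "Carol", "amount": "5.00"},  # invalid id
]


def extract():
    """Extract: return the raw records from the (hypothetical) source."""
    return list(SOURCE_ROWS)


def transform(rows, default_amount=0.0):
    """Transform: apply a validation rule, clean values, fill missing data."""
    cleaned = []
    for row in rows:
        if not row["order_id"].isdigit():
            continue  # assumed business rule: reject rows without a numeric id
        amount = float(row["amount"]) if row["amount"] else default_amount
        cleaned.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

The row with the invalid id is rejected during transformation, and the missing amount is filled with the assumed default before loading.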

Metadata

Metadata describes the data warehouse and provides a framework for the data. It helps with the construction, storage, handling and use of the data warehouse.

It can be divided into two types:

Technical metadata

This includes information that developers and managers can use to perform development and warehouse management tasks.

Business Metadata

This includes information that provides an easy to understand view of the data stored in the repository.

Metadata plays an important role for both business and technical teams in understanding the data available in the warehouse and converting it into information.
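As a rough illustration of the two kinds of metadata, the sketch below keeps a technical and a business description side by side for one column. The class names, fields, and the `orders.amount` entry are hypothetical and chosen only for this example; real metadata repositories track far more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechnicalMetadata:
    """What developers and administrators need (hypothetical fields)."""
    table: str
    column: str
    data_type: str
    source_system: str


@dataclass(frozen=True)
class BusinessMetadata:
    """What business users need to interpret the data (hypothetical fields)."""
    term: str
    definition: str
    owner: str


# A tiny catalog keyed by (table, column); the entry is invented for illustration.
CATALOG = {
    ("orders", "amount"): (
        TechnicalMetadata("orders", "amount", "REAL", "shop_db"),
        BusinessMetadata("Order amount", "Gross value of an order in EUR.", "Sales team"),
    ),
}


def describe(table, column):
    """Combine both views into one human-readable description."""
    technical, business = CATALOG[(table, column)]
    return (
        f"{business.term}: {business.definition} "
        f"(stored as {technical.table}.{technical.column}, {technical.data_type})"
    )


summary = describe("orders", "amount")
print(summary)
```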

Data warehouse access tools

A data warehouse uses a database or a group of databases. Business users generally cannot work directly with databases, so they rely on a number of tools. Some of these tools include:

Query and reporting tools

They allow users to create business reports for analysis, which can take the form of spreadsheets, calculations or interactive visuals.

Application development tools

They help create customized reports and present them in a form suited to specific reporting purposes.

Data mining tools

They automate the process of identifying patterns and relationships in large amounts of data using advanced statistical modeling methods.

OLAP Tools

They help build a multi-dimensional data warehouse and enable the analysis of company data from a variety of perspectives.
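Looking at the same data from different perspectives amounts to aggregating one measure along different combinations of dimensions. The following is a minimal sketch of that idea in plain Python, not a description of how an OLAP tool is implemented; the fact rows, the dimension names and the `roll_up` helper are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, revenue).
FACTS = [
    ("2020", "EU", "laptop", 1200.0),
    ("2020", "EU", "phone", 800.0),
    ("2020", "US", "laptop", 1500.0),
    ("2021", "EU", "laptop", 1000.0),
]
DIMENSIONS = ("year", "region", "product")


def roll_up(facts, by):
    """Sum the revenue measure, grouped by the chosen dimensions."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last field is the measure
    return dict(totals)


by_year = roll_up(FACTS, ["year"])
by_region_product = roll_up(FACTS, ["region", "product"])
print(by_year)
print(by_region_product)
```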

Data warehouse bus

It defines the flow of data within a data warehouse architecture and includes data marts. A data mart is an access layer used to deliver data to users. It partitions the data prepared for a particular user group.
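One simple way to picture a data mart is as a slice of the warehouse prepared for one user group. The sketch below models that with a SQLite view; the `sales` table, the `mart_eu_sales` view and the sample rows are hypothetical, and real data marts are often separate physical stores rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('EU', 'laptop', 1200.0),
        ('EU', 'phone', 800.0),
        ('US', 'laptop', 1500.0);

    -- Hypothetical data mart for the EU sales team: only their slice,
    -- already aggregated the way they report on it.
    CREATE VIEW mart_eu_sales AS
        SELECT product, SUM(amount) AS revenue
        FROM sales
        WHERE region = 'EU'
        GROUP BY product;
    """
)

eu_rows = conn.execute(
    "SELECT product, revenue FROM mart_eu_sales ORDER BY product"
).fetchall()
print(eu_rows)
```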

Web-enabled data warehouse versus traditional architectures

One way to integrate a company's internal data and use it for analysis is to use a data warehouse. There are many ways to implement one, and most of them fall into two groups: "traditional" data warehouses, where the layers of the architecture are physically persisted, and virtual data warehouses, where the layers are described more or less only logically and have almost no physical representation.

Traditional approaches attempt to optimize performance when processing analytical queries by storing redundant data. The presentation layer to be queried is often a multidimensional data mart. Virtual, or more often semi-virtual, approaches try to minimize redundancy by describing the processing steps logically and computing results only on demand.

In the virtual approach, performance is sacrificed for greater flexibility and faster development. The two approaches therefore sit at opposite ends of the trade-off between high performance and high flexibility.
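The difference between persisting a layer and describing it only logically can be sketched with a stored summary table versus a view over the same query. This is an illustrative assumption, not a recipe: the table and view names are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_sales (product TEXT, amount REAL);
    INSERT INTO source_sales VALUES ('laptop', 1200.0), ('phone', 800.0);

    -- Traditional style: persist a redundant, pre-aggregated copy.
    CREATE TABLE sales_summary AS
        SELECT product, SUM(amount) AS revenue
        FROM source_sales GROUP BY product;

    -- Virtual style: describe the same layer logically; it is computed per query.
    CREATE VIEW sales_summary_virtual AS
        SELECT product, SUM(amount) AS revenue
        FROM source_sales GROUP BY product;
    """
)

# New data arrives in the source after the summary table was built.
conn.execute("INSERT INTO source_sales VALUES ('laptop', 300.0)")


def revenue(relation, product):
    """Read one product's revenue; `relation` is a fixed name here, not user input."""
    row = conn.execute(
        f"SELECT revenue FROM {relation} WHERE product = ?", (product,)
    ).fetchone()
    return row[0]


persisted = revenue("sales_summary", "laptop")        # stays stale until reloaded
virtual = revenue("sales_summary_virtual", "laptop")  # recomputed on demand
print(persisted, virtual)
```

The persisted table answers without re-aggregating but has to be refreshed by a load job, while the view is always current at the cost of recomputing the aggregate on every query.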

Factors to consider when selecting a data warehouse architecture

To understand which type of architecture best suits a company, we must know the advantages and disadvantages of the different types. Let's look at which data warehouse architecture is most successful.

Types of data warehouse architectures

A data warehouse architecture defines the layout of the data and the storage structure. Because data must be organized and cleansed to be valuable, the architecture focuses on finding the most effective technique for extracting information from the raw data in the staging area and transforming it into a simple, consumable structure using a dimensional model that delivers valuable business intelligence.
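A dimensional model is commonly laid out as a star schema: a central fact table of measures that references surrounding dimension tables. The sketch below shows one possible layout in SQLite; the table names, columns and sample rows are hypothetical and exist only to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who, what, when" of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- The fact table holds the measures and points at the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'laptop', 'hardware'), (2, 'support plan', 'services');
    INSERT INTO dim_date VALUES (1, 2020, 1), (2, 2020, 2);
    INSERT INTO fact_sales VALUES (1, 1, 1200.0), (1, 2, 900.0), (2, 2, 150.0);
    """
)

# A typical analytical query: join facts to dimensions, then aggregate.
by_category = conn.execute(
    """
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category
    """
).fetchall()
print(by_category)
```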

There are three main types of architecture to consider when designing a company's data warehouse.

Single-tier architecture

A single-tier data warehouse architecture focuses on creating a compact data set and minimizing the volume of stored data. It is advantageous for eliminating redundancy, but it is not suitable for businesses with complex data requirements and numerous data streams.

Two-tier architecture

This design separates the physically available data sources from the warehouse itself. The two-tier architecture is more efficient at storing and organizing data than the single-tier one, but it is not scalable and supports only a limited number of users.

Three-tier architecture

This is the most common type of data warehouse architecture because it creates a well-organized data flow from raw information to valuable insight.

The bottom tier generally consists of the database server, which creates an abstraction layer over data from numerous sources, such as the transactional databases used by front-end applications.

The middle tier contains an online analytical processing (OLAP) server. From the user's point of view, this tier reshapes the data into a structure better suited to analysis and complex querying.

The third and top tier is the client tier, which contains the tools and application programming interfaces (APIs) used for high-level data analysis, querying and reporting.
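The division of responsibilities between the three tiers can be sketched as three small functions that hand data upward. This is a deliberately simplified illustration: the function names, the `sales` table and the sample rows are assumptions, and in practice each tier is a separate system rather than a function.

```python
import sqlite3


def bottom_tier():
    """Bottom tier: a database server holding data gathered from the sources."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("EU", 100.0), ("EU", 50.0), ("US", 70.0)],  # hypothetical rows
    )
    return conn


def middle_tier(conn):
    """Middle tier: reshape the stored data into an analysis-friendly form."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows.fetchall())


def top_tier(totals):
    """Top tier: a client tool that turns the result into a report."""
    return [f"{region}: {total:.2f}" for region, total in sorted(totals.items())]


report = top_tier(middle_tier(bottom_tier()))
print(report)
```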

Metadata in the data warehouse

A metadata repository is an integral part of the data warehouse architecture. The metadata is stored and managed in this repository. In this way, the information in the data warehouse can be found quickly and securely and can be used autonomously.

The metadata for a data warehouse has three main purposes: the administration of the system, the specification of the meaning of the stored content, and navigation.

Developers and administrators of a data warehouse mainly need technically oriented metadata. This includes information about the data sources, rules for improving data quality, rules for transformation and consolidation steps, mapping information between the data sources and the data warehouse models, as well as the metadata of the data models in the database itself.

Users of the data warehouse primarily need metadata to understand and evaluate the data contained there. The metadata are particularly important as they allow semantic interpretation of the content of the data warehouse. These are, for example, definitions of commercial terms used or the connection of specialized vocabulary to data objects.

The descriptions of the reports that can be generated, the responsible contact persons and the requirements for access rights to certain data areas are also important metadata.

For self-service access to data in the data warehouse, an end-user-friendly navigation component is required, which is also based on metadata. For example, this provides functions for free querying, navigation, electronic distribution of reports and access to data in operational feeds.

Data Warehouse Architecture FAQs

What is data warehouse architecture?

The data warehouse architecture can be defined as the way data is collected within an enterprise or business. The architecture makes it easier for those in charge of the corresponding areas to find all the information by levels.

What is Enterprise Data Warehouse Architecture?

An enterprise data warehouse is the place where all the information of a particular company is deposited. It holds all the information from the company's source systems, for example the data that Google offers through Analytics, or the data in a CRM such as Salesforce. The enterprise data warehouse architecture makes it possible to see all this dispersed information in one place, on one platform, easily and for the most part quickly.