Real-time analysis of machine data, sensor values and social media discussions; relevant, cost-efficient and flexible data via the cloud, stored for years, across formats and clearly arranged in a single solution: IT departments, businesses, developers and Big Data specialists are looking for the "one-stop shop" that combines IT landscapes, software platforms, applications, and internal and external information. This sounds like a utopia, but thanks to digital networks and new technologies we are steadily approaching it. But does this make the data warehouse superfluous?
On the one hand, a real-time platform that includes all data and IT solutions sounds unrealistic from today's perspective. On the other hand, even such a platform would not replace a data warehouse (DWH), as the two have completely different tasks to perform. What is certain is that, driven by digitization and the Internet of Things, DWH and Big Data technologies are moving ever closer together.
Over the years, the demands on a data warehouse have hardly changed: it is still the central point of contact for all company information, used to prepare and analyze the relevant data. What has changed are the required processing speed and the underlying data volume, and both will continue to grow in the future.
The modern architecture of the data warehouse
An old accusation against the DWH is that decisions are based 90 percent on qualitative data and only 10 percent on quantitative information, whereas a data warehouse consists of a collection of quantitative data. To counter this accusation, software companies and developers in the Big Data and data warehousing field are currently working on coupling qualitative information more closely with the data warehouse.
Many understand the Data Lake architecture as an accumulation of unstructured data, in which users first store all information in a Hadoop cluster for later processing. Data scientists then evaluate this information for relevance and potential. Standard reporting and analysis, on the other hand, still require a data warehouse, especially when current business figures need to be compared with historical data or checked against proven quality criteria.
The question of relevance
There is already a fundamental difference between Big Data projects and data warehouse projects in the questions they address: in the latter, experts have been able to work out with clients over the years which key performance indicators and results are relevant and should appear in the reports. In Big Data projects, such clear specifications are missing, because the questions a project should answer often cannot even be formulated clearly in advance. Unstructured data in particular poses the greatest challenge in this context.
Data experts such as mip GmbH also observe that companies want to experiment with data lakes and new technologies without knowing exactly what the technology makes possible or what they should be looking for. Often, inexperienced employees or students are assigned to such projects. Yet the more unstructured the information, the harder it becomes to derive sensible, relevant objectives or questions from it.
The best results are achieved by a well-coordinated team of departmental experts and data specialists who are familiar with the company itself on the one hand and with the new technologies and their possibilities on the other. A data scientist may master the statistical tools but will not necessarily understand the goals and processes that matter to the company.
Aids and tools such as advanced analytics solutions or cognitive systems such as IBM's Watson can help in the search for relevant questions.
Finding the right balance between historical data and its connection to a Big Data or Data Lake environment
Recognizing valuable patterns
In addition to the right question, information from new sources, for example from mobile applications, plays a decisive role in almost all areas. So far there are no generally applicable rules for handling these sources. Big Data and DWH experts are therefore working on extracting the unstructured data from the sources in order to analyze it and detect the usable patterns it contains.
In the next step, the data must be prepared for the data warehouse and transferred there, because patterns can only be identified, differentiated and evaluated if appropriate benchmarks are available and progress and trends can be compared. This is why historical data remains necessary.
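The benchmark idea described above can be sketched in a few lines. This is a hedged illustration, not a real DWH interface: the function name `detect_trend`, the threshold and the sample order counts are all invented for the example.

```python
# Hypothetical sketch: evaluating a newly observed metric against
# historical benchmarks that only a data warehouse can supply.
# All names and numbers here are illustrative, not a real API.

def detect_trend(history, current, threshold=0.1):
    """Flag the current value as a trend if it deviates from the
    historical average by more than the given relative threshold."""
    baseline = sum(history) / len(history)
    deviation = (current - baseline) / baseline
    if deviation > threshold:
        return "upward"
    if deviation < -threshold:
        return "downward"
    return "stable"

# Historical monthly order counts, as they might come from the DWH.
monthly_orders = [1000, 1040, 980, 1010, 990]
print(detect_trend(monthly_orders, 1250))  # clearly above the baseline
```

Without the historical list there is no baseline, and the current value of 1250 is just a number; this is the point the paragraph makes about benchmarks.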
In contrast, operational systems contain only limited historical data. This part of the IT infrastructure is designed mainly to process individual transactions one after the other, and the transactions themselves are volatile. Collecting or comparing data over a longer period is not the task of operational systems; it is, once again, one of the core tasks of the DWH.
Today, AI-driven systems such as Watson or learning robots are in the spotlight of both the media and industry. But to develop cognitive skills, these computers or machines first have to learn. In a cognitive process, specific patterns, systematics or profiles are compared with the information already stored and categorized. Only in this way can relationships between results be established, procedures optimized and new skills developed. Learning therefore always requires a comparison with the past.
Wide range of applications
DWHs have a wide range of applications across industries. In dynamic sectors such as retail, consumer behavior can change overnight. To better predict these rapid fluctuations with predictive analytics tools, current results must be compared with past data profiles in order to identify purchase patterns.
In the fashion industry, for example, while it is difficult to predict changes in style or new color preferences, trends can be identified and predicted over the years based on the number of pieces sold or the choice of preferred fabrics.
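The fabric example can be made concrete with a small sketch. The data and the helper `rising_items` are invented for illustration; the point is only that multi-year sales history, as held in a DWH, is what makes the trend visible.

```python
# Illustrative sketch with hypothetical data: spotting a fabric trend
# from units sold per year, the kind of year-over-year comparison a
# data warehouse makes possible.

def rising_items(sales_by_year):
    """Return the items whose sales increased in every recorded year."""
    rising = []
    for item, yearly in sales_by_year.items():
        if all(later > earlier for earlier, later in zip(yearly, yearly[1:])):
            rising.append(item)
    return sorted(rising)

units_sold = {
    "linen":   [1200, 1500, 2100],  # growing every year
    "denim":   [5000, 4800, 4900],  # fluctuating
    "viscose": [800, 950, 1300],    # growing every year
}
print(rising_items(units_sold))  # ['linen', 'viscose']
```

A single year's figures would tell us nothing here; only the stored history separates a steady trend from a one-off fluctuation.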
In production, in turn, established quality characteristics and criteria already exist for the value of products and processes, and these, too, are based on data from the past. The quality of products or manufacturing processes can only be validated and ultimately optimized if the sensor or log data currently being reported is compared with historical data records over a certain period and evaluated.
Another use case applies to virtually every company: the tax-law environment in which they operate obliges them to document their data over a longer period of time.
The DWH thus remains an essential part of a company's success, according to a survey of data experts: 99 percent stated that data warehousing is important or indispensable to their business processes [1]. For this reason, leading data experts such as mip GmbH advise companies to set up a high-performance central data warehouse, preferably together with specialists, and to prepare the acquired data in a targeted manner in order to create the prerequisites for modern and innovative technologies, tools and applications.
[1] Dimensional Research: "The State of the Data Warehouse", 2015.
Real Time Data Warehouse
What is near real time data warehouse?
A conventional data warehouse is typically refreshed in large nightly batch loads. Near real-time data warehouse technology, by contrast, updates the warehouse far more frequently, in close to real time, so that users can respond to issues as they occur.
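The difference can be sketched as a micro-batch loop: instead of one nightly load, small batches are appended at short intervals. Everything below (the in-memory "warehouse" list, the source queue, the batch size) is a stand-in for real infrastructure, not an actual DWH tool.

```python
# Minimal sketch of the micro-batch idea behind a near real-time DWH.
# The in-memory list and queue are hypothetical stand-ins.

from collections import deque

warehouse = []                    # stands in for a DWH fact table
source_queue = deque([            # events from operational systems
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": 35.5},
    {"order_id": 3, "amount": 12.0},
])

def load_micro_batch(queue, batch_size=2):
    """Move up to batch_size pending events into the warehouse."""
    batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
    warehouse.extend(batch)
    return len(batch)

# Run small loads every few seconds or minutes instead of once per night:
while source_queue:
    load_micro_batch(source_queue)

print(len(warehouse))  # all three events have arrived
```

Shrinking the interval between such loads is what moves a warehouse from nightly batch toward near real time; the trade-off is higher load on both source systems and the warehouse itself.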
Why it is important for an airline to use a real time data warehouse?
It is important for an airline to use a real-time data warehouse because it needs to know, essentially, the four W's: who, what, when and where. Real-time data allows every part of the airline's system to track a customer through each step of their journey and to connect that journey with past data.