Big Data, Smart Data – terms that go together with digitization like milk and coffee. Today there is hardly an economic sector that does not use data as a raw material. But what really lies behind these large amounts of data? Volume, veracity and variety are just some of the dimensions of this wealth of information. An attempt at an explanation.
Big Data is undoubtedly part of Industry 4.0 as we know it, and the data collected and evaluated there already serves as a basis for decision making. It helps us review existing business processes, adjust them, and even shows data analysts the business areas of the future – at least for those who know what to do with Big Data. At its core, it is simply a huge amount of data. But behind the big data lie so-called dimensions that coax fine-grained information out of unstructured data.
Definition of Big Data
A generally valid definition has so far been sought in vain. Reading through the Internet, studies, technical journals and books, the following description probably comes closest to the term: Big Data is an unstructured collection of data that can no longer be tamed with conventional IT infrastructure. Large and complex data sets are collected, stored, searched, distributed, analyzed and displayed.
To do this, companies need special high-performance data centers that specialize in rapidly structuring these data volumes. This, however, has nothing to do with artificial intelligence. Let's take an example to illustrate the scale: the millions of transactions executed every day on sales platforms like Amazon or eBay, together with real-time predictions of customer buying behavior, generate around 100 petabytes of data.
This corresponds to 100,000,000 gigabytes or 3,125,000 smartphones with 32 gigabytes of storage space.
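The conversion above is easy to verify with a few lines of Python (using decimal units, i.e. 1 petabyte = 1,000,000 gigabytes):

```python
# Sanity check for the scale comparison above
# (decimal units: 1 petabyte = 1,000,000 gigabytes).
PETABYTE_IN_GB = 1_000_000

total_pb = 100
total_gb = total_pb * PETABYTE_IN_GB  # total volume in gigabytes
phones_32gb = total_gb // 32          # smartphones with 32 GB each

print(total_gb)      # 100000000
print(phones_32gb)   # 3125000
```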
What is Smart Data?
The term essentially describes the end result after large amounts of data have been collected, sorted and analyzed: data sets that users can actually do something useful with. The general rule is: only those who understand the data can create added value. In the future, Smart Data should not only answer the question "What is happening in my plant right now?" but also "Why is it happening?" and even "What will happen soon?"
Only through intelligent processing is big data converted into smart data. This requires the use of so-called semantic technologies. Imagine a Google search without semantics, i.e. without structured and pre-analyzed data: hardly any targeted hits would be shown. In this way, Smart Data improves the speed and quality of data-driven decisions and ensures a verifiable accumulation of knowledge. Today, Smart Data makes it possible to represent reality digitally in detail. A good example is the key figures of e-commerce: after a simple evaluation, they give the merchant a fairly accurate picture of the customer.
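The step from raw transaction logs to e-commerce key figures can be sketched in a few lines. The records and field names below are invented for illustration; real pipelines would of course operate on far larger volumes and richer schemas:

```python
from collections import defaultdict

# Hypothetical raw e-commerce transactions (the "big" data):
# unaggregated events, one per purchase.
transactions = [
    {"customer": "C1", "category": "books",  "amount": 12.99},
    {"customer": "C1", "category": "books",  "amount": 8.50},
    {"customer": "C1", "category": "garden", "amount": 49.90},
    {"customer": "C2", "category": "toys",   "amount": 19.99},
]

# The "smart" data: per-customer key figures a merchant can act on.
profiles = defaultdict(lambda: {"orders": 0, "revenue": 0.0, "categories": set()})
for t in transactions:
    p = profiles[t["customer"]]
    p["orders"] += 1
    p["revenue"] += t["amount"]
    p["categories"].add(t["category"])

for customer, p in sorted(profiles.items()):
    print(customer, p["orders"], round(p["revenue"], 2), sorted(p["categories"]))
```

The same raw events could answer many different questions; it is the aggregation chosen here (orders, revenue, categories per customer) that turns them into actionable key figures.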
What is Volume?
The data volumes mentioned above make up the Volume dimension. Of course, this also includes Smart Data. Prepared and fully analyzed data sets can easily reach petabyte sizes. Google Maps can be cited as an example: billions of analyzed data sets and the visual map material associated with them speak for themselves.
What is Variety in Big Data?
The Variety dimension describes the diversity of the available data and its actual sources. It therefore represents the greatest challenge on the road to intelligent information.
At the beginning of a big data analysis, patterns and their interrelationships are, for example, linked and compared with various other data and sources. An example from social media: more than 30 billion pieces of content are shared on Facebook every month. In addition, there are more than 400 million wearable medical trackers, and an estimated 400 million tweets are posted daily by 200 million active users – a diversity that must first be broken down and then brought back together at the end.
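The core of the Variety challenge – bringing heterogeneous sources back together – can be sketched as a normalization step. The records, field names and target schema below are invented for illustration:

```python
import json

# Invented examples of three very different source formats:
# a JSON social-media post, a CSV line from a medical tracker,
# and a line of free text from customer support.
json_post = '{"user": "anna", "text": "new running shoes!", "likes": 17}'
sensor_csv = "tracker42,2024-05-01,heart_rate,78"
free_text = "Customer called about a delayed delivery."

def normalize(source, kind, raw):
    """Map a record from any source into one common, comparable schema."""
    if kind == "json":
        content = json.loads(raw)["text"]
    elif kind == "csv":
        fields = raw.split(",")
        content = fields[-2] + "=" + fields[-1]
    else:  # unstructured free text is passed through as-is
        content = raw
    return {"source": source, "content": content}

records = [
    normalize("social",  "json", json_post),
    normalize("tracker", "csv",  sensor_csv),
    normalize("support", "text", free_text),
]
for r in records:
    print(r["source"], "->", r["content"])
```

Only once every source has been mapped into the shared schema can the records be linked and compared in a single analysis.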
What is Velocity?
As a rule, the data volumes must be analyzed and made available quickly. The processing time naturally depends on the amount of data, but the computers are tuned for their particular data sets.
We are talking about processing times of a few hours up to several days. An IBM manager once told me a few years ago: "To get the results that a high-performance computer calculates from this amount of data in a day, six billion people would have to use pocket calculators for more than 1,000 years."
What is Veracity?
It is not only since Donald Trump and his "fake news" that information has had to be checked for correctness. This also includes the reliability, significance and trustworthiness of the collected data. Data quality is paramount: in the end it determines both the duration of the analysis and the correctness of the resulting intelligent information. Note, however, that an inadequate processing method will not produce the desired results even if the data itself is excellent.
What is Feasibility in Big Data?
Feasibility answers the question of where my data actually originates and what it consists of. Is it information from text files, from sensors, or a fragmented data set? Information from the Internet and mobile phone networks also falls under the Feasibility dimension.
What is Visibility?
The visibility of information cannot always be taken for granted today. Companies hold billions of data records that fall under the term "Dark Data". This data is not only left unanalyzed, it is not even captured in terms of its content or economic value. For some data-driven companies, it is a real treasure trove.
What is Volatility?
How long will the data really be available? How long does the original source exist, and how long is it accessible? To answer these questions, the available storage space is important, but the legal aspects must also be examined: How long may I keep customer data or personalized data? Am I allowed to keep it at all?
What is Value in Big Data?
How valuable is the available data? Is it worthwhile to evaluate it through analysis? The rule is: viewed in isolation, a piece of data or information initially has no value. Only through analysis and relevant questions does the respective data set acquire economic or scientific importance. If the data proves valuable even in the mass, experts can react to market situations with determination.