To master Big Data successfully, companies have to assemble the right mix of tools from various toolboxes. No single solution solves every problem.
It is not a matter of replacing BI suites and data warehouses with new Big Data tools, but of integrating new techniques into existing systems in a meaningful way.
Successfully Mastering Big Data
Data volumes are growing and will continue to grow, and word of this has probably reached every company by now. However, reducing Big Data to the sheer size of the data avalanche falls short. The subject has many facets, and this is exactly what makes tackling the Big Data phenomenon anything but trivial for user companies. Several aspects come together:
In addition to the growing volume of data, the number of information sources that companies must monitor is also increasing. Data no longer flows into companies only from the classic transactional systems. Today, machine data and information from social networks must also be channeled correctly.
As the number of data sources increases, so does the variety of data. In addition to structured transaction data, which can be captured in relational database systems in the classic way, there is weakly structured or unstructured data such as text, images and videos. Analyzing, managing and processing these types of data in a meaningful way requires new approaches.
At the same time, data and information must be made available to a growing number of users. This affects not only the company's own employees but the entire value chain, from suppliers to customers. So it is not only the number of data sources that is growing, but also the number of data consumers.
The different data sources, the different data types and the increasingly distributed information pose new challenges for data protection. In addition, increasingly complex information infrastructures are prone to errors and manipulation. As a result, data integrity and quality are becoming ever more important.
But the complexity surrounding Big Data does not stop there. The landscape of offerings and solutions is as complex and opaque as the challenges posed by the flood of data. With the spread of the Big Data concept, a confusing vendor landscape has developed, say the analysts at Experton Group. Both complex packages and individual modules are marketed as Big Data solutions. In addition, some vendors combine existing third-party products with their own solutions. It is becoming increasingly difficult to keep an overview.
Wind turbine data
From the analysts’ point of view, the issue is further complicated by the fact that many vendors base their communication on theoretical use cases. Concrete references are a rarity in this still young market. Where they exist, they are often highly specific and difficult to transfer to other companies. One example is IBM’s Big Data project at the Danish wind turbine manufacturer Vestas, which evaluates up to 160 different factors, and therefore data in the petabyte range, to select the right location for a wind turbine.
The same applies to SAP’s “Oncolyzer”, which is designed to evaluate a wide variety of medical data in the shortest possible time using the HANA in-memory database, thus enabling individualized cancer therapy. Given such isolated cases, it remains difficult for other companies to find the right answer to their own Big Data problem.
The Big Five
The analysts have defined five thematic areas that users should consider when looking for solutions:
- Big Data Infrastructure: data storage solutions, data and database connectivity, devices, computer hardware
- Big Data Aggregation: Bringing together data from different sources, integration, data security, integrity and quality
- Big Data Analytics: Business Intelligence Solutions, Data Warehouse, Advanced Analysis
- Big Data Syndication: Visualization and delivery of results to many users, concepts such as Linked Open Data
- Big Data Consulting and Services: Consulting and Services
The technological challenges begin with the infrastructure. Three-quarters of all IT decision makers see a need to act on their storage and database systems. By contrast, only half of the respondents saw an impact on analysis and reporting.
The DB market is booming
On the infrastructure side, database vendors are also in demand. For a long time, conditions in this market seemed clear. Relational database management systems (RDBMS) were established in user companies, and the market had been carved up among the three main suppliers: Oracle, IBM and Microsoft. For some time now, however, there has been unrest: in the wake of Big Data, classic systems are reaching their limits.
Discussions about what the future of databases might look like are intensifying. Techniques such as NoSQL, in-memory databases and Hadoop are attracting more attention.
SQL or NoSQL
Especially with the growing flood of unstructured data, which can hardly be forced into the rigid structure of a relational database, interest in NoSQL systems is increasing. The abbreviation stands for “Not only SQL”: NoSQL is not primarily intended as a substitute for relational systems but as a complement to them.
While conventional databases are based on tables and relations, NoSQL databases can use different data models. This also means that not all NoSQL databases are alike. The different variants each have strengths and weaknesses, so it is important to check carefully whether a given application scenario matches the respective NoSQL database.
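To illustrate how the data models differ, the following Python sketch represents one hypothetical customer order in a relational, a document and a key-value model. Plain Python lists and dicts stand in for the databases; no real database product or API is used, and all names and values are made up for illustration. Other NoSQL variants, such as column-family and graph stores, are not shown.

```python
import json

# A hypothetical customer order, stored in three different data models.
# Plain Python structures stand in for real databases.

# 1. Relational model: fixed columns, one row per order line; the customer
#    name lives in a separate table linked by the customer_id foreign key.
customers_table = [("C1", "Alice")]            # (customer_id, name)
order_lines_table = [
    ("O1", "C1", "SKU-1", 2),                  # (order_id, customer_id, sku, qty)
    ("O1", "C1", "SKU-7", 1),
]

# 2. Document model: one nested, self-describing record per order. A free-text
#    field such as "review" can be added without changing any schema.
document = {
    "_id": "O1",
    "customer": {"id": "C1", "name": "Alice"},
    "lines": [{"sku": "SKU-1", "qty": 2}, {"sku": "SKU-7", "qty": 1}],
    "review": "Fast delivery, well packed.",
}

# 3. Key-value model: the store only knows the key; the value is an opaque
#    string that the application has to interpret itself.
key_value_store = {"order:O1": json.dumps(document)}


def total_qty_relational(order_id):
    # Filter the order-line rows belonging to this order and sum quantities.
    return sum(qty for (oid, _cid, _sku, qty) in order_lines_table if oid == order_id)


def customer_name_relational(order_id):
    # The name requires a join between the two tables via customer_id.
    cids = {cid for (oid, cid, _sku, _qty) in order_lines_table if oid == order_id}
    return next(name for (cid, name) in customers_table if cid in cids)


def total_qty_document(doc):
    # Everything about the order lives in one record.
    return sum(line["qty"] for line in doc["lines"])


def total_qty_key_value(key):
    # Fetch by key, then decode the opaque value in application code.
    order = json.loads(key_value_store[key])
    return sum(line["qty"] for line in order["lines"])


if __name__ == "__main__":
    print(total_qty_relational("O1"), customer_name_relational("O1"))
    print(total_qty_document(document), document["customer"]["name"])
    print(total_qty_key_value("order:O1"))
```

The document answers the question from a single record, the relational form filters rows and needs a join for the customer name, and the key-value store cannot look inside its values at all, leaving that work to the application. Which trade-off fits depends on the application scenario.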
Node by Node
The architecture is often based on many interconnected standard servers. Scaling is achieved simply by adding more compute nodes. A prominent example is Hadoop. The framework essentially consists of two parts: the Hadoop Distributed File System (HDFS) distributes the data across the different nodes, where it is processed using MapReduce, a programming model developed by Google. The basic idea behind it is to break computing tasks down into many small sub-tasks and distribute them across a cluster.
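The map/shuffle/reduce idea can be sketched in a few lines of Python using the classic word-count example. This is an in-process simulation under simplifying assumptions: threads from `concurrent.futures` stand in for cluster nodes, and the input "splits" are short strings rather than HDFS blocks. It is not Hadoop's API; Hadoop jobs are typically written against its Java `Mapper` and `Reducer` classes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(mapped_outputs):
    """Shuffle: group all intermediate values by key across all mappers."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits, workers=4):
    # Each split is an independent sub-task, so the map phase runs in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))
    grouped = shuffle_phase(mapped)
    # Each key can also be reduced independently of all the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda item: reduce_phase(*item), grouped.items())
    return dict(results)


if __name__ == "__main__":
    splits = [
        "big data needs new tools",
        "new tools complement old tools",
        "data data everywhere",
    ]
    print(word_count(splits))
```

Because each split is mapped independently and each key is reduced independently, both phases parallelize naturally; only the shuffle step needs to see all intermediate pairs, which is where a real cluster moves data over the network.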
This parallelization, together with the fact that the data is processed where it is stored, is meant to deliver results much faster. Hadoop now appears to be increasingly establishing itself in the database industry. Vendors such as Cloudera and Intel are building their own distributions of the open-source stack by supplementing the framework with additional tools. In addition, major database vendors such as Oracle, IBM and Microsoft now offer connectors that link their systems to Hadoop.
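The idea of processing data where it is stored, combined with scaling by adding nodes, can be illustrated with a small Python simulation. It is a hypothetical sketch: node names are made up, blocks are placed round-robin with three replicas (HDFS's default replication factor is 3, but its real placement policy is rack-aware), and a simple greedy rule assigns each block's task to the least-loaded node that already holds a copy. It does not model Hadoop's actual scheduler.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Store each block on `replication` distinct nodes.

    Round-robin placement is a simplifying assumption; real HDFS uses a
    rack-aware placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    placement = {}
    for block in range(num_blocks):
        start = block % len(nodes)
        placement[block] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement


def schedule_local(placement, nodes):
    """Assign each block's task to the least-loaded node holding a replica,
    so the task reads its input locally instead of over the network."""
    load = {node: 0 for node in nodes}
    assignment = {}
    for block, replicas in placement.items():
        node = min(replicas, key=lambda n: load[n])  # ties: first replica wins
        assignment[block] = node
        load[node] += 1
    return assignment, load


def simulate(num_blocks, num_nodes):
    """Return how many blocks the busiest node processes."""
    nodes = [f"node{i}" for i in range(num_nodes)]  # hypothetical node names
    placement = place_blocks(num_blocks, nodes)
    assignment, load = schedule_local(placement, nodes)
    # Every task runs on a node that already stores its block.
    assert all(assignment[b] in placement[b] for b in placement)
    return max(load.values())


if __name__ == "__main__":
    for n in (4, 6):
        print(f"{n} nodes: busiest node processes {simulate(12, n)} blocks")
```

With 12 blocks, the busiest node processes 3 blocks on a 4-node cluster and 2 blocks on a 6-node cluster, while every task still reads a local replica: adding nodes spreads the work without shipping the data to the computation.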