Big Data – what is that? What does it involve, which techniques are used, and what is it for? The term Big Data was introduced in 2001 by Douglas Laney, then an analyst at the Meta Group (now Gartner). He used it to describe what was happening in companies around the world at the time: the generation of data by ever larger companies and customer bases from increasingly diverse sources. Laney defined “big data” as data with a large volume, a wide variety of data formats, and a high velocity (rate) at which new data is created.
In principle, Big Data today is not about creating aggregated summary tables from data as in the past, but rather about drilling into the individual process down to the document level and thus recognizing the patterns that then “point the way forward”. A number of technical terms mark the essence, processing and use of Big Data. In our glossary we explain the most important ones:
- Ad Targeting
- Automatic Identification and Capture (AIDC)
- Behavior Analysis
- Business Intelligence (BI)
- Call Detail Record (CDR) Analysis
- Clickstream Analytics
- Competitive Monitoring
- Complex Event Processing (CEP)
- Data Aggregation
- Data Analytics
- Data Architecture and design
- Data Exhaust
- Data Virtualization
- Distributed object
- Distributed Processing
- In-Database analytics
- In-memory database
- In-Memory Data Grid (IMDG)
- Machine-generated data
- Operational Data Store (ODS)
- Pattern recognition
- Predictive Analysis
- Recommendation Engine
- Risk analysis
- Sentiment Analysis
- Variable Pricing
- Parallel Data Analysis
- Query Analysis
- Reference Data
Big Data: Glossary of Terms
The terms around Big Data…
Big Data – what is it really? Everybody talks about it, and everybody understands something different by it. Browse our glossary of the most important and most frequently used terms (some say “buzzwords”) to understand exactly what each one means.
Ad Targeting
The attempt to attract the attention of potential customers, usually through “tailor-made” advertising.
Algorithm
A mathematical formula cast in software with which a data set is analyzed.
Analytics
Software-based algorithms and statistical methods are used to interpret the data. This requires an analytical platform – software, or software plus hardware – that provides the tools and computing power to perform various analytical queries. There are a number of different forms and purposes, which are described in more detail in this glossary.
Automatic Identification and Capture (AIDC)
Any method of automatically identifying and collecting data about a given situation and then storing it in a computer system. For example, information from an RFID chip that is read by a scanner.
Behavior Analysis
Behavior Analysis uses information about human behavior to understand intentions and predict future behavior.
Business Intelligence (BI)
The umbrella term for the identification, extraction and analysis of data.
Call Detail Record (CDR) Analysis
The analysis of the data that telecommunication companies collect on the use of mobile phone calls, such as the time and duration of calls.
Cassandra
A distributed database management system for very large structured databases (a “NoSQL” database system), developed as open source by the Apache Foundation.
Clickstream Analytics
The analysis of a user’s web activities by evaluating his or her clicks on a website.
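As a minimal sketch – the click log and page names below are invented for illustration – evaluating a clickstream amounts to counting page views and reconstructing each user’s path through the site:

```python
from collections import Counter

# A tiny invented click log: (user, page) events in chronological order.
clicks = [
    ("user1", "/home"), ("user1", "/products"), ("user1", "/checkout"),
    ("user2", "/home"), ("user2", "/products"),
]

# How often was each page viewed?
page_views = Counter(page for _, page in clicks)

# Which path did each user take through the site?
paths = {}
for user, page in clicks:
    paths.setdefault(user, []).append(page)
```

Real clickstream systems work on server logs or tracking events at far larger scale, but the two questions – how popular is a page, and how do users move between pages – stay the same.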
Competitive Monitoring
Tables in which competitors’ activities on the web are automatically recorded.
Complex Event Processing (CEP)
A process in which all activities in an organization’s systems are monitored and analyzed. If necessary, a real-time response can be given immediately.
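The idea can be sketched in a few lines – the event stream, window size and threshold below are invented: the monitor scans a sliding window over incoming events and raises an alert as soon as a suspicious pattern appears.

```python
def monitor(events, threshold=3, window=3):
    """Alert when `threshold` failed logins occur within `window` consecutive events."""
    alerts = []
    for i in range(len(events) - window + 1):
        recent = events[i:i + window]
        if sum(e == "login_failed" for e in recent) >= threshold:
            alerts.append(i)  # position in the stream where the pattern starts
    return alerts

stream = ["login_ok", "login_failed", "login_failed", "login_failed", "login_ok"]
print(monitor(stream))  # → [1]
```

Production CEP engines evaluate such rules continuously over live streams rather than over a finished list, which is what makes the real-time response possible.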
Data Aggregation
The collection of data from different sources in order to prepare a report or an analysis.
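A minimal sketch of the principle, with invented source data: records from several systems are merged into one report keyed by a common dimension.

```python
def aggregate(*sources):
    """Combine records from several sources into one report keyed by region."""
    report = {}
    for source in sources:
        for region, value in source.items():
            report[region] = report.get(region, 0) + value
    return report

# Two invented sources: sales figures from a CRM and from a webshop.
crm = {"north": 120, "south": 80}
webshop = {"north": 30, "east": 50}
report = aggregate(crm, webshop)  # → {"north": 150, "south": 80, "east": 50}
```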
Data Analytics
A piece of software used to extract information from a data set. The result can be a report, a status, or an action that is automatically initiated.
Data Architecture and design
It explains how the company’s data is structured. Usually this is done in three process steps: The conceptual assignment of the business units, the logical assignment of the relationships within the business unit, and the physical construction of a system that supports the activities.
Data Exhaust
The data that a person generates as a by-product of his or her activity on the Internet.
Data Virtualization
The process of abstracting different data sources through a single data access layer.
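The single access layer can be illustrated as follows – the class, the source names and the lookup data are all invented: callers ask the layer for data without knowing which system actually holds it.

```python
class DataAccessLayer:
    """A single access layer that hides where the data actually lives."""

    def __init__(self):
        self._sources = {}

    def register(self, name, fetch):
        # `fetch` is any callable that resolves a key in that source.
        self._sources[name] = fetch

    def get(self, name, key):
        return self._sources[name](key)

# Two invented backends: a CRM and an ERP system.
crm_db = {"c1": "Alice"}
erp_db = {"c1": {"open_orders": 2}}

layer = DataAccessLayer()
layer.register("crm", crm_db.get)
layer.register("erp", erp_db.get)

customer = layer.get("crm", "c1")  # → "Alice"
```

In a real deployment the registered callables would wrap database drivers or web services; the point is that consumers only ever talk to the one layer.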
Distributed object
A piece of software that makes it possible to work with objects that reside on other computers.
De-Identification
The removal of all data that associates a person with a certain piece of information.
Distributed Processing
The execution of a process across several networked computers.
Drill
Apache Drill is an open-source SQL query engine for Hadoop and NoSQL data management systems.
Hadoop
A free framework written in Java by the Apache Foundation for scalable, distributed software running in a cluster. It is based on the well-known MapReduce algorithm from Google Inc. as well as concepts from the Google File System.
HANA
SAP’s software and hardware platform with in-memory computing for real-time analysis and high transaction volumes.
In-Database Analytics
In-database analytics refers to the integration of analysis methods into the database. The advantage is that the data does not have to be moved for analysis.
In-memory database
Any database system that uses main memory for data storage.
In-Memory Data Grid (IMDG)
Distributed data storage in the main memory of many servers for fast access and better scalability.
Machine-generated data
All data that is automatically generated by a computing process, an application or another non-human source.
MapReduce
A method in which a large problem is divided into smaller ones that are distributed to different computers in a network or cluster, or to computers in different locations, for processing (“map”). The results are then collected and presented in a combined (“reduced”) result. Google has protected the process under the name “MapReduce”.
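The map/reduce split can be sketched in a few lines of Python – the function names and the toy documents below are illustrative, not Google’s or Hadoop’s API. The classic example is counting words across many documents:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big", "data analysis"]
result = reduce_phase(map_phase(docs))  # → {"big": 2, "data": 2, "analysis": 1}
```

In a real cluster the map calls run in parallel on many machines near the data, and the framework shuffles the intermediate pairs so that each reducer sees all counts for its keys.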
Mashup
The combination of different data sets within an application in such a way that the result is improved.
NoSQL
Databases that are not structured in a relational way and that can handle large volumes of data. They do not require fixed table layouts and scale horizontally. Apache Cassandra, for example, is a NoSQL database.
Operational Data Store (ODS)
It collects data from various sources so that further operations can be performed on it before the data is exported to a data warehouse.
Pattern recognition
The classification of automatically recognized patterns.
Predictive Analysis
This form of analysis uses statistical functions on one or more data sets to predict future trends or events.
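One of the simplest such statistical functions is a least-squares trend line; the sales figures below are invented for illustration. The sketch fits y = a·x + b to historical values and extrapolates one step ahead:

```python
def linear_trend(xs, ys):
    """Fit y = a*x + b by ordinary least squares and return (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

def predict(xs, ys, x_future):
    a, b = linear_trend(xs, ys)
    return a * x_future + b

# Invented monthly sales for months 1..4; forecast month 5.
sales = [100, 110, 120, 130]
forecast = predict([1, 2, 3, 4], sales, 5)  # → 140.0
```

Real predictive analytics uses far richer models (regression with many variables, time-series methods, machine learning), but the pattern – fit to the past, extrapolate to the future – is the same.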
Recommendation Engine
An algorithm is used to analyze a customer’s orders on a website and to immediately select and offer additional suitable products.
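A very small sketch of one such algorithm – the order data is invented, and real engines use far more sophisticated collaborative filtering: products are scored by how often they co-occur in past orders with what is already in the basket.

```python
from collections import Counter

def recommend(orders, basket, top_n=2):
    """Suggest products that most often co-occur with items in the basket."""
    scores = Counter()
    for order in orders:
        if set(order) & set(basket):       # order shares an item with the basket
            for item in order:
                if item not in basket:     # don't recommend what they already have
                    scores[item] += 1
    return [item for item, _ in scores.most_common(top_n)]

# Invented past orders.
orders = [["laptop", "mouse"], ["laptop", "mouse", "bag"], ["phone", "case"]]
print(recommend(orders, ["laptop"]))  # → ['mouse', 'bag']
```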
Risk analysis
The application of statistical methods to one or more data sets in order to assess the risk of a project, action or decision.
Sentiment Analysis
In this process, people’s posts on social networks about a product or a company are statistically evaluated.
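The simplest form of such an evaluation is lexicon-based scoring – the word lists and posts below are invented, and production systems use much larger lexicons or trained models: each post is scored by counting positive and negative words.

```python
# Invented miniature sentiment lexicon.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "broken", "terrible"}

def sentiment(post):
    """Score a post: +1 per positive word, -1 per negative word."""
    words = [w.strip(".,!?") for w in post.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = ["Great phone, love it", "Screen arrived broken, terrible"]
scores = [sentiment(p) for p in posts]  # → [2, -2]
```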
Variable Pricing
The purchase price of a product follows supply and demand. This requires real-time monitoring of consumption and stock levels.
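A toy pricing rule illustrates the mechanism – the base price, bounds and demand/stock figures are invented: the price rises when demand outstrips stock and falls when stock piles up, within fixed limits.

```python
def dynamic_price(base, demand, stock):
    """Scale the base price by the demand/stock ratio, clamped to 0.8x..1.5x."""
    factor = demand / max(stock, 1)          # avoid division by zero
    factor = min(max(factor, 0.8), 1.5)      # keep the price within sane bounds
    return round(base * factor, 2)

high = dynamic_price(base=100.0, demand=30, stock=20)  # → 150.0
low = dynamic_price(base=100.0, demand=10, stock=50)   # → 80.0
```

Airlines and large webshops apply far more elaborate versions of this, recomputing prices continuously from live demand and inventory signals.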
Parallel Data Analysis
An analytical problem is divided into sub-tasks and the algorithms are applied to each component of the problem simultaneously and in parallel.
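The split-apply-combine idea can be sketched with Python’s standard library – the data and chunk size are invented, and a thread pool stands in for what would be separate machines in a real cluster:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """The same algorithm is applied to each partition of the data."""
    return sum(chunk)

data = list(range(1, 101))
# Split the problem into four sub-tasks.
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

# Apply the algorithm to every partition in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum, chunks))

# Combine the partial results.
total = sum(partials)  # → 5050
```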
Query Analysis
In this process, a search query is analyzed and optimized to obtain the best possible result.
Reference Data
Data that describes a physically or virtually existing object and its properties.