Big Data Analytics: Glossary and Terminology - 【2020】

What is Big Data? What does it involve, which techniques does it use, and what is it for? The definition most often cited goes back to 2001 and Douglas Laney, at the time an analyst at the Meta Group (later acquired by Gartner). He used it to describe what was happening in companies around the world: ever larger amounts of data were being generated by companies and their customers, from increasingly diverse sources. Laney characterized such data by its large volume, its wide variety of formats, and the high speed (velocity) at which new data is created.

In principle, Big Data today is not about creating aggregated summary tables from data, as in the past. Rather, it is about capturing individual processes down to the document level and recognizing the patterns that “point the way forward”. A number of technical terms describe the nature, processing and use of Big Data. In our glossary we explain the most important ones:


Big Data: Glossary of Terms

Big Data: what is it really? Everybody talks about it, and everybody understands something different by it. Our glossary lists the most important and most frequently used terms (some say “buzzwords”) and explains what each one means.

Ad Targeting

The attempt to attract the attention of potential customers, usually through “tailor-made” advertising.


Algorithm

A mathematical formula, implemented in software, with which a data set is analyzed.


Analytics

Software-based algorithms and statistical methods are used to interpret the data. This requires an analytical platform, consisting of software or of software plus hardware, that provides the tools and computing power to run various analytical queries. Analytics comes in a number of different forms and serves different purposes, which are described in more detail in this glossary.

Automatic Identification and Data Capture (AIDC)

Any method of automatically identifying and collecting data about a given situation and then storing it in a computer system. One example is information from an RFID chip that is read by a scanner.

Behavior Analysis

Behavior Analysis uses information about human behavior to understand intentions and predict future behavior.

Business Intelligence (BI)

The general term for the identification, extraction and analysis of data.

Call Detail Record (CDR) Analysis

A CDR contains the data that telecommunication companies collect about mobile phone calls, such as the time and duration of each call. CDR analysis evaluates this data.


Cassandra

An open source distributed database management system from Apache for very large structured databases (a “NoSQL” database system).

Clickstream Analytics

It refers to the analysis of a user’s web activities by evaluating their clicks on a website.
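As a rough illustration of what evaluating clicks can mean, the following sketch counts how often users move from one page to the next. The click log, the user IDs and the function name `page_transitions` are invented for this example; a real clickstream would come from web server logs or a tracking tool.

```python
from collections import Counter

# Hypothetical click log: (user id, page), already ordered by time.
clicks = [
    ("u1", "/home"), ("u1", "/products"), ("u1", "/cart"),
    ("u2", "/home"), ("u2", "/products"),
]

def page_transitions(clicks):
    """Count how often users go from one page directly to another."""
    last_page = {}           # last page seen per user
    transitions = Counter()  # (from_page, to_page) -> count
    for user, page in clicks:
        if user in last_page:
            transitions[(last_page[user], page)] += 1
        last_page[user] = page
    return transitions

transitions = page_transitions(clicks)
```

Frequent transitions show the typical paths through a site, while rare ones can point to pages where visitors drop off.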

Competitive Monitoring

Tables in which the activities of competitors on the web are automatically stored.

Complex Event Processing (CEP)

A process in which all activities in an organization’s systems are monitored and analyzed so that, if necessary, a response can be given in real time.

Data Aggregation

The collection of data from different sources in order to prepare a report or for analysis.
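A minimal sketch of such an aggregation, assuming two invented sources (`web_orders` and `store_orders`) that share the fields `region` and `revenue`:

```python
from collections import defaultdict

# Two hypothetical data sources with the same record layout.
web_orders = [
    {"region": "north", "revenue": 120.0},
    {"region": "south", "revenue": 80.0},
]
store_orders = [
    {"region": "north", "revenue": 200.0},
]

def aggregate_revenue(*sources):
    """Combine records from several sources into one total per region."""
    totals = defaultdict(float)
    for source in sources:
        for record in source:
            totals[record["region"]] += record["revenue"]
    return dict(totals)

report = aggregate_revenue(web_orders, store_orders)
```

In practice the sources rarely share a layout from the start, so a mapping step onto common field names usually comes before the aggregation itself.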

Data Analytics

The use of software to extract information from a data set. The result can be a report, a status or an action that is automatically initiated.

Data Architecture and Design

This describes how a company’s data is structured. Usually this is done in three steps: the conceptual mapping of the business units, the logical mapping of the relationships within each business unit, and the physical construction of a system that supports the activities.

Data Exhaust

The data that a person generates as a by-product of his or her activity on the Internet.

Data Virtualization

The process of abstracting different data sources through a single data access layer.

Distributed Object

A software component that allows a program to work with objects located on another computer.


De-identification

The removal of all data that links a person to a particular piece of information.
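A very simplified sketch of this idea drops the fields that identify a person directly. The field names and the set `DIRECT_IDENTIFIERS` are assumptions made for this example. Removing such fields alone does not guarantee anonymity, because combinations of the remaining fields can still point to an individual.

```python
# Hypothetical list of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}

def deidentify(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record without directly identifying fields."""
    return {key: value for key, value in record.items()
            if key not in identifiers}

customer = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "age_group": "30-39",
    "purchase": "laptop",
}
cleaned = deidentify(customer)
```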

Distributed Processing

The execution of a process through different networked computers.


Drill

Apache Drill is an open source SQL query engine for Hadoop and NoSQL data management systems.


Hadoop

A free framework from the Apache Foundation, written in Java, for scalable, distributed software running on a cluster. It is based on the well-known MapReduce algorithm from Google Inc. and on ideas from the Google File System.


HANA

SAP’s software and hardware platform with in-memory computing for real-time analysis and high transaction volumes.

In-Database Analytics

In-database analytics refers to the integration of analysis methods into the database. The advantage is that the data does not have to be moved for analysis.

In-memory database

Any database system that uses the main memory for data storage.

In-Memory Data Grid (IMDG)

Distributed data storage in the main memory of many servers for fast access and better scalability.

Machine-generated data

All data that is automatically generated by a calculation process, an application or a non-human source.


MapReduce

A method in which a large problem is divided into smaller ones that are distributed for processing to different computers in a network or cluster, possibly in different locations (“map”). The partial results are then collected and combined into a single result (“reduce”). Google introduced the method under the name “MapReduce”.
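The map and reduce steps can be illustrated with the classic word count. The sketch below runs on a single machine and only mimics the structure. In a real cluster, each chunk would be mapped on a different computer. All function names are chosen for this example and do not belong to any framework.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group the values of all pairs by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the values collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(chunks):
    # Each chunk could be processed on a separate machine.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))

chunks = ["big data is big", "data is everywhere"]
counts = word_count(chunks)
```

Because the map step treats every chunk independently, adding more machines lets the same program handle more data.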


Mashup

Different data sets are combined within an application in such a way that the result is improved.


NoSQL

Databases that are not structured in a relational way and that can handle large volumes of data. They do not require fixed table layouts and they scale horizontally. Apache Cassandra, for example, is a NoSQL database.

Operational Data Store (ODS)

It collects data from various sources so that more operations can be performed before the data is exported to a data warehouse.

Pattern recognition

The classification of automatically recognized patterns.

Predictive Analysis

This form of analysis uses statistical functions in one or more data sets to predict future trends or events.
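One of the simplest statistical functions for this purpose is a straight trend line fitted by least squares. The sketch below uses invented monthly sales figures and a hand-written helper `fit_line`. Real predictive models are usually far more elaborate.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept  # predicted sales for month 5
```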

Recommendation Engine

An algorithm is used to analyze customer orders on a website and immediately select and offer additional suitable products.
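One simple way to build such an algorithm is to count which products were bought together in past orders. This is only one of many possible approaches. The orders and the function names `build_cooccurrence` and `recommend` are invented for this sketch.

```python
from collections import Counter
from itertools import combinations

# Hypothetical past orders, each a set of products.
orders = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "bag"},
    {"laptop", "bag"},
    {"mouse", "pad"},
]

def build_cooccurrence(orders):
    """Count for every product how often each other product was bought with it."""
    together = {}
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            together.setdefault(a, Counter())[b] += 1
            together.setdefault(b, Counter())[a] += 1
    return together

def recommend(product, together, top_n=2):
    """Return the products most often bought together with the given one."""
    return [item for item, _ in together.get(product, Counter()).most_common(top_n)]

together = build_cooccurrence(orders)
suggestions = recommend("laptop", together)
```

The counts can be computed in advance, so the lookup at order time is fast enough to show suggestions immediately.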

Risk analysis

The application of statistical methods to one or more data sets in order to assess the risk of a project, action or decision.

Sentiment Analysis

In this process, people’s posts on social networks about a product or a company are statistically evaluated.

Variable Pricing

The purchase price of a product follows supply and demand. This requires real-time monitoring of consumption and stock levels.

Parallel Data Analysis

An analytical problem is divided into sub-tasks and the algorithms are applied to each component of the problem simultaneously and in parallel.
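The following sketch shows the pattern on a deliberately small problem: a list of numbers is split into chunks, each chunk is summed by a separate worker, and the partial sums are combined. It uses threads from Python’s standard library to stay short. For CPU-heavy work, a process pool or a cluster would be the more likely choice. The function names are assumptions of this example.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Divide the data into at most `parts` chunks of similar size."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def partial_sum(chunk):
    """The sub-task: analyze one chunk on its own."""
    return sum(chunk)

def parallel_total(data, workers=4):
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The same function is applied to every chunk concurrently.
        return sum(pool.map(partial_sum, chunks))

total = parallel_total(list(range(1, 101)))
```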

Query Analysis

In this process, a search query is optimized to obtain the best possible result.

Reference Data

Data that describes a physical or virtual object and its properties.
