The words of cloud. Kodlot’s Data in Cloud Glossary.


Here’s a glossary of important terms related to data and cloud engineering.


A/B Testing

A method used to compare two or more versions of a webpage or application to determine which performs better and yields higher user engagement or conversion rates.
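The core of an A/B test is comparing conversion rates between variants. A minimal sketch, with made-up visitor and conversion counts; a real test would also check statistical significance before declaring a winner:

```python
# Hypothetical conversion counts for two page variants (illustrative numbers).
def conversion_rate(conversions, visitors):
    """Fraction of visitors who converted."""
    return conversions / visitors

rate_a = conversion_rate(120, 2400)  # variant A
rate_b = conversion_rate(156, 2400)  # variant B

# The variant with the higher observed rate "wins" this simple comparison.
winner = "B" if rate_b > rate_a else "A"
lift = round(rate_b - rate_a, 4)
```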

Agile Development

An iterative approach to software development that emphasizes collaboration, adaptability, and quick delivery of working software.

AI (Artificial Intelligence)

The simulation of human intelligence in machines that can perform tasks requiring human-like cognitive abilities.


Analytics

The process of analyzing and interpreting data to gain insights and inform decision-making.

API (Application Programming Interface)

A set of rules and protocols that allow different software applications to communicate and interact with each other.

Artificial Neural Network

A network of interconnected nodes (neurons) that mimics the structure and function of the human brain, used in machine learning and deep learning.
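The building block of such a network is a single neuron: a weighted sum of inputs passed through an activation function. A minimal sketch with illustrative weights:

```python
import math

# A single artificial neuron: weighted sum of inputs plus a bias,
# squashed through a sigmoid activation into the range (0, 1).
def neuron(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))

out = neuron([1.0, 0.5], [0.4, -0.2], bias=0.1)  # z = 0.4
```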

Auto Scaling

A feature provided by cloud providers that automatically adjusts the number of resources allocated to an application based on demand, ensuring optimal performance and cost efficiency.
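The decision logic behind auto scaling can be sketched as a simple threshold policy. The thresholds and instance counts below are illustrative, not any provider's defaults:

```python
# Toy scaling policy: add an instance under heavy load, remove one when
# mostly idle, and stay within configured bounds.
def desired_instances(current, avg_cpu, min_n=1, max_n=10):
    if avg_cpu > 0.75:              # scale out under heavy load
        return min(current + 1, max_n)
    if avg_cpu < 0.25:              # scale in when mostly idle
        return max(current - 1, min_n)
    return current                  # otherwise hold steady
```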

AWS (Amazon Web Services)

Amazon's comprehensive cloud computing platform, offering a wide range of cloud services and infrastructure.


Azure (Microsoft Azure)

Microsoft's cloud computing platform that enables users to access, manage, and develop applications and services globally through its network of data centers.


Big Data

Extremely large and complex data sets that require specialized processing techniques to extract value and insights.


Blockchain

A decentralized and distributed digital ledger that records transactions across multiple computers, ensuring transparency and security.

Business Intelligence (BI)

The process of collecting, analyzing, and presenting data to help businesses make informed decisions and gain insights into their operations and market trends.


Cloud Computing

The delivery of on-demand computing resources over the internet, including servers, storage, databases, and software applications.

Cloud Migration

The process of moving applications, data, and other business elements from on-premises infrastructure to cloud-based infrastructure.


Cloud-Native

An approach to software development that leverages cloud technologies and services from the ground up, allowing applications to be designed, built, and operated specifically for the cloud environment.

Continuous Integration/Continuous Delivery (CI/CD)

A software development practice that involves frequently integrating code changes and delivering them to production environments.


Data Catalog

A centralized repository or tool that provides a comprehensive inventory and metadata information about the available datasets within an organization, enabling easier data discovery and understanding.

Data Cleansing

Identifying and correcting or removing errors, inconsistencies, and inaccuracies in data.
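A minimal cleansing pass over hypothetical customer records, showing the typical steps: trim whitespace, normalize case, and drop exact duplicates.

```python
# Cleanse (name, email) records: normalize formatting and deduplicate.
def cleanse(records):
    seen, cleaned = set(), []
    for name, email in records:
        row = (name.strip().title(), email.strip().lower())
        if row not in seen:           # keep only the first occurrence
            seen.add(row)
            cleaned.append(row)
    return cleaned

rows = [(" alice ", "A@EXAMPLE.COM"), ("Alice", "a@example.com")]
result = cleanse(rows)
```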

Data Governance

The management and control of data assets, including data quality, data privacy, and data security.

Data Integration

Combining data from different sources and systems into a unified view for analysis and decision-making.

Data Lake

A centralized repository that stores raw and unprocessed data in its native format, allowing for flexible and exploratory analysis.

Data Lakehouse

A hybrid data storage architecture that combines the flexibility and scalability of data lakes with the structure and query capabilities of data warehouses, enabling unified data analytics.

Data Mart

A subset of a data warehouse that focuses on specific business functions or departments, providing a more targeted and optimized view of data.

Data Masking

The process of obfuscating sensitive data in non-production environments to protect confidentiality while preserving the data’s characteristics and usability for development, testing, or analytics purposes.
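One common masking technique is obscuring the identifying part of a value while keeping its shape realistic. A sketch for e-mail addresses; the regular expression is simplified, not a full RFC-compliant matcher:

```python
import re

# Mask the local part of e-mail addresses while keeping the domain,
# so masked data still looks like real data in test environments.
def mask_email(text):
    return re.sub(r"\b[\w.+-]+@", "***@", text)

masked = mask_email("Contact alice.smith@example.com for details")
```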

Data Mining

Discovering patterns, correlations, and insights from large datasets using statistical and machine learning techniques.

Data Pipeline

A series of processes and tools that extract, transform, and load (ETL) data from various sources into a target destination for analysis.
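The extract, transform, and load stages can be sketched as three composed functions. The source and target here are stand-ins (a list instead of a real database or warehouse):

```python
# A minimal ETL-style pipeline: extract raw records, transform them,
# and load them into a target destination.
def extract():
    return ["3", "1", "2"]               # stand-in for reading a source system

def transform(rows):
    return sorted(int(r) for r in rows)  # parse and order the raw values

def load(rows, target):
    target.extend(rows)                  # stand-in for writing to a warehouse

warehouse = []
load(transform(extract()), warehouse)
```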

Data Privacy

The protection of sensitive and personally identifiable information to ensure compliance with privacy regulations and maintain the confidentiality of individuals’ data.

Data Quality

The assessment and improvement of data to ensure accuracy, consistency, completeness, and validity for reliable analysis and decision-making.

Data Replication

The process of copying and synchronizing data from one source to one or more target systems to ensure data consistency and availability across different locations or databases.

Data Science

The interdisciplinary field that combines scientific methods, algorithms, and systems to extract knowledge and insights from data.

Data Visualization

The representation of data in graphical or visual formats to facilitate understanding, analysis, and communication.

Deep Learning

A subset of machine learning that uses artificial neural networks to model and solve complex problems.


DevOps

A set of practices that combines software development (Dev) and IT operations (Ops) to streamline the software development lifecycle, improve collaboration, and achieve faster and more reliable releases.

Disaster Recovery (DR)

The set of strategies, policies, and procedures implemented to recover and restore IT infrastructure and operations after a natural or human-induced disaster.


Docker

An open-source platform that enables developers to package and distribute applications and their dependencies as containers.



Elasticity

The ability of a system or infrastructure to automatically scale resources up or down based on demand.

ELT (Extract, Load, Transform)

A data integration approach where data is first extracted from source systems, loaded into a target environment, and then transformed for analysis and reporting.

ETL (Extract, Transform, Load)

A traditional data integration approach where data is first extracted from source systems, transformed into a consistent format, and then loaded into a target system for analysis and reporting.


Fault Tolerance

The ability of a system to continue operating correctly in the event of failures or disruptions.


Google Cloud Platform (GCP)

Google's cloud computing platform, providing secure, reliable, and high-performance infrastructure and services for computing, data analytics, and machine learning.



Hadoop

An open-source framework for distributed storage and processing of large datasets across clusters of computers.

High Availability

The design and implementation of systems or architectures that are resistant to failures and ensure continuous operation even during outages or disruptions.

Hybrid Cloud

A cloud computing environment that combines private and public clouds, allowing organizations to leverage the benefits of both while maintaining control over sensitive data.


IaaS (Infrastructure as a Service)

A cloud computing model that provides virtual machines, storage, and networking infrastructure as scalable and on-demand services.

Immutable Infrastructure

An infrastructure approach where components, such as servers or containers, are never modified after deployment, but instead, new instances are created with updates or changes.

Internet of Things (IoT)

The network of physical devices, vehicles, appliances, and other objects embedded with sensors, software, and connectivity to exchange data and interact.



Kubernetes

An open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications, simplifying the operation of complex container environments.


Load Balancing

The distribution of network traffic across multiple servers or resources to optimize resource utilization, improve performance, and ensure high availability and scalability.
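The simplest distribution strategy is round-robin: each incoming request goes to the next server in rotation. A sketch with hypothetical server names:

```python
import itertools

# Round-robin load balancing: hand requests to servers in rotation.
servers = itertools.cycle(["web-1", "web-2", "web-3"])

# Five incoming requests are spread evenly across the pool.
assignments = [next(servers) for _ in range(5)]
```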


Machine Learning

A branch of artificial intelligence that focuses on developing algorithms and models that enable computers to learn and make predictions or decisions based on data.

Machine-to-Machine (M2M)

The direct communication and interaction between devices or systems without human intervention, enabling data exchange and automation.


Microservices

An architectural approach where applications are built as a collection of small, independent, and loosely coupled services that can be developed, deployed, and scaled independently.


Multi-Cloud

The strategy of using multiple cloud service providers to distribute workloads and leverage different cloud offerings, reducing vendor lock-in and increasing flexibility.


Multi-Tenancy

A software architecture that allows a single application or system to serve multiple customers or tenants, keeping their data and configurations isolated and secure.


Natural Language Processing (NLP)

A branch of artificial intelligence that focuses on the interaction between computers and human language, enabling machines to understand, interpret, and generate human language.

NoSQL (Not Only SQL)

A non-relational database approach that provides flexible data models and scalable storage solutions for handling unstructured and semi-structured data.



Observability

The ability to gain insights into a system's internal states and behaviors by collecting and analyzing data from various sources, facilitating system monitoring, troubleshooting, and performance optimization.

Object Storage

A data storage architecture that manages data as objects, each with its unique identifier, metadata, and content, providing scalability and durability for unstructured data.


Orchestration

Coordinating and managing multiple systems, processes, or services to achieve a desired outcome or workflow.


Predictive Analytics

The use of statistical models and algorithms to analyze historical data and make predictions or forecasts about future events or trends.

Pipeline Orchestration

Coordinating and managing data pipelines, including scheduling, sequencing, and monitoring data processing tasks and workflows.

Public Cloud

A cloud computing model where cloud services and infrastructure are provided by third-party providers and made available to the general public over the internet.


Real-time Analytics

Analyzing data as it is generated or received, enabling immediate insights and decision-making.

Real-time Data

Data that is generated, processed, and made available for analysis immediately or with minimal delay, enabling instant insights and responses to changing conditions.


SaaS (Software as a Service)

A cloud computing model where software applications are delivered over the internet as a service, eliminating the need for local installation and maintenance.


Scalability

The ability of a system or infrastructure to handle increasing workload or demand by adding or removing resources dynamically.

Serverless Architecture

An architectural paradigm where applications are built using serverless functions, allowing developers to focus on writing code without managing servers or infrastructure.

Serverless Computing

A cloud computing execution model where cloud providers dynamically allocate resources to run code in response to events or triggers without the need for provisioning or managing servers.

Serverless Functions

Units of code that are executed in a serverless computing environment, triggered by specific events or requests, and billed based on actual usage, providing a scalable and cost-efficient execution model.
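The shape of a serverless function is a stateless handler that receives an event and returns a response. A sketch with a hypothetical event format, not any specific provider's schema:

```python
import json

# A stateless handler: input arrives as an event dict, output is a
# response dict. The "statusCode"/"body" shape here is illustrative.
def handler(event, context=None):
    name = event.get("name", "world")
    body = json.dumps({"greeting": f"hello {name}"})
    return {"statusCode": 200, "body": body}

resp = handler({"name": "Kodlot"})
```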

SQL (Structured Query Language)

A programming language used to communicate with and manipulate relational databases.
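A basic SQL round trip, here driven from Python's built-in sqlite3 module: create a table, insert rows, and query them back.

```python
import sqlite3

# In-memory relational database for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# SELECT with ORDER BY: retrieve and sort rows declaratively.
names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY name")]
conn.close()
```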

Streaming Analytics

The real-time analysis of streaming data, allowing organizations to gain insights, detect patterns, and trigger actions based on the continuous data flow.
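A common streaming-analytics primitive is the sliding window: each new reading yields a statistic over the most recent values, giving continuous insight as data arrives. A minimal sketch:

```python
from collections import deque

# Sliding-window mean over a stream: yields the average of the most
# recent `size` values after each new reading.
def windowed_mean(stream, size=3):
    window = deque(maxlen=size)   # old values fall off automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)

means = list(windowed_mean([10, 20, 30, 40]))
```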

Streaming Data

Continuous, real-time flows of data arriving in high volumes from various sources, often requiring immediate processing and analysis.



Terraform

An open-source infrastructure-as-code tool used to define and provision cloud resources and infrastructure in a declarative manner.


Virtual Machine (VM)

An emulation of a computer system that runs on a host machine, enabling multiple operating systems to run concurrently.

Virtual Private Cloud (VPC)

A logically isolated section of a cloud provider’s infrastructure that allows users to define and control their own virtual network environment, including subnets, IP ranges, and routing.


Virtualization

The creation of virtual instances or representations of computing resources, such as servers, storage, or networks, to enable efficient resource utilization and isolation.


Workflow Automation

The use of technology to automate and streamline manual or repetitive business processes, reducing human intervention and errors and improving efficiency.


XaaS (Anything as a Service)

An umbrella term that refers to the delivery of various services over the internet, where “X” can represent different types of services such as Software as a Service (SaaS), Platform as a Service (PaaS), or Infrastructure as a Service (IaaS).


YAML (YAML Ain’t Markup Language)

A human-readable data serialization format used for configuration files in Infrastructure as Code and other applications.


Zero Downtime Deployment

A deployment strategy that ensures continuous availability of an application or system during software updates or deployments, avoiding any service interruptions.

Remember, these definitions are meant to provide a basic understanding of the terms. Their precise definitions and applications may vary depending on specific contexts and technologies.


Get more tips on how to grant read access to all tables in the AWS Glue Data Catalog.