Data Management Glossary
Data Lake
A data lake is a collection of data stored in its natural, raw state. The term typically refers to unstructured data that sits on different storage environments and clouds. A data lake supports data of all types: for example, a single data lake may hold videos, blogs, log files, seismic files, and genomics data. You can think of each of your Network Attached Storage (NAS) devices as a data lake.
One big challenge with data lakes is combing through them to find the relevant data. With unstructured data, you may have billions of files strewn across different data lakes, and finding data that fits specific criteria can be like finding a needle in a haystack.
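To make the needle-in-a-haystack problem concrete, here is a minimal sketch of a brute-force search across several storage locations. It assumes each data lake is reachable as a local or mounted directory; the function name `find_files` and its parameters are hypothetical and chosen only for illustration. Every file must be visited and inspected on each search, which is why this approach becomes slow at the scale of billions of files.

```python
import os
import time
from typing import Iterator, Sequence


def find_files(roots: Sequence[str], suffix: str, older_than_days: float) -> Iterator[str]:
    """Yield paths under any root whose name ends with `suffix` and whose
    last modification is more than `older_than_days` days in the past."""
    cutoff = time.time() - older_than_days * 86400  # seconds per day
    for root in roots:
        # os.walk visits every directory below the root, one by one.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    mtime = os.stat(path).st_mtime
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                if name.endswith(suffix) and mtime < cutoff:
                    yield path
```

A call such as `find_files(["/mnt/nas1", "/mnt/nas2"], ".log", 365)` (the mount points are placeholders) walks both trees in full on every search, with no reuse of earlier results.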
A virtual data lake is a collection of data that fits certain criteria. As the name implies, it is virtual because the data is not moved: the data continues to reside in its original location, while the virtual data lake provides a discrete handle for manipulating the entire data set. The Komprise Global File Index can be considered a virtual data lake for file and object metadata.
Some key aspects of data lakes – both physical and virtual:
- Data Lakes Support a Variety of Data Formats: Data lakes are not restricted to data of any particular type.
- Data Lakes Retain All Data: Data that does not match a search is not deleted from the data lake. A virtual data lake provides a discrete handle to the subset of data across different storage silos that fits specific criteria, but nothing is moved or deleted.
- Virtual Data Lakes Do Not Physically Move Data: A virtual data lake provides a virtual aggregation of all data that fits certain criteria while leaving that data in place. Deep Analytics can be used to specify the criteria.
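The virtual data lake idea can be sketched as a saved query over a metadata index. The example below is a simplified illustration, not the Komprise Global File Index or any Komprise API: `FileRecord`, `VirtualDataLake`, and all field names and sample locations are hypothetical. It shows the two properties described above: membership is defined by criteria, and the underlying data is neither moved nor deleted.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class FileRecord:
    """One metadata entry; the file itself stays at `location`."""
    location: str    # hypothetical address, e.g. "nas1:/genomics/s1.bam"
    size_bytes: int
    file_type: str
    owner: str


class VirtualDataLake:
    """A named handle to every indexed record that matches some criteria."""

    def __init__(self, name: str, index: Sequence[FileRecord],
                 criteria: Callable[[FileRecord], bool]) -> None:
        self.name = name
        self._index = index        # shared metadata index, never modified here
        self._criteria = criteria

    def members(self) -> List[FileRecord]:
        # Evaluated against the index on each call; no data is copied or moved.
        return [r for r in self._index if self._criteria(r)]

    def total_bytes(self) -> int:
        return sum(r.size_bytes for r in self.members())


# Sample index spanning three storage silos (all values are made up).
index = [
    FileRecord("nas1:/genomics/s1.bam", 500, "bam", "lab-a"),
    FileRecord("s3://bucket/video/intro.mp4", 300, "mp4", "marketing"),
    FileRecord("nas2:/genomics/s2.bam", 700, "bam", "lab-b"),
]

genomics = VirtualDataLake("genomics", index, lambda r: r.file_type == "bam")
for record in genomics.members():
    print(record.location)
```

Records that do not match the criteria remain in the index untouched, and the matching files keep their original locations; the virtual data lake only groups them under one name.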
