Actian Data Wiki

Commonly used terms in the data world, all in one place.

Active metadata is metadata that is automatically generated, updated, and made accessible across the data ecosystem.

Agentic AI refers to autonomous AI systems that proactively perform tasks and make decisions with minimal human input.

AI governance is the framework and set of policies that ensure responsible, ethical, and compliant use of AI systems.

AI-assisted refers to tasks, decisions, or processes that are enhanced or supported by Artificial Intelligence, where humans remain in control and make the final judgments.

A business glossary is a set of standard definitions for business terms that aligns understanding across teams.

Compliance and privacy ensure adherence to regulations like GDPR, CCPA, and HIPAA.

A data catalog is a structured inventory of data assets to improve discoverability and understanding.
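
For illustration, a data catalog can be as simple as a searchable registry of asset descriptions; the sketch below is a minimal in-memory version with hypothetical names, not any particular product's API.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One data asset registered in the catalog."""
    name: str
    owner: str
    description: str
    tags: list[str] = field(default_factory=list)

class DataCatalog:
    """Minimal in-memory inventory of data assets."""
    def __init__(self):
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[CatalogEntry]:
        """Find assets whose name, description, or tags mention the keyword."""
        kw = keyword.lower()
        return [
            e for e in self._entries.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or any(kw in t.lower() for t in e.tags)
        ]

catalog = DataCatalog()
catalog.register(CatalogEntry("sales_orders", "sales-team",
                              "Daily order transactions", ["sales", "orders"]))
print([e.name for e in catalog.search("orders")])  # ['sales_orders']
```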

A data contract is a formal agreement between data producers and consumers that defines data expectations, formats, and SLAs to ensure quality and consistency.
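
A contract is often expressed as a machine-readable specification that can be checked automatically; below is a minimal, hypothetical sketch of an "orders" contract with a simple validation check (field names and SLA values are invented for illustration).

```python
# Hypothetical data contract for an "orders" feed, expressed as a plain dict.
orders_contract = {
    "dataset": "orders",
    "owner": "sales-data-team",
    "schema": {                      # expected columns and their types
        "order_id": int,
        "customer_id": int,
        "amount": float,
        "currency": str,
    },
    "sla": {"freshness_hours": 24, "max_null_rate": 0.01},
}

def validate_record(record: dict, contract: dict) -> list[str]:
    """Return a list of contract violations for a single record."""
    violations = []
    for column, expected_type in contract["schema"].items():
        if column not in record:
            violations.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            violations.append(f"{column} is not {expected_type.__name__}")
    return violations

print(validate_record({"order_id": 1, "customer_id": 2, "amount": "9.99"},
                      orders_contract))
# ['missing column: currency', 'amount is not float'] (order may vary by schema)
```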

Data democratization means making data accessible and understandable to non-technical users.

Data fabric is a centralized data architecture for transporting, storing, accessing, and managing data across environments.

Data governance is a set of policies, processes, and roles that ensure the quality, security, and availability of an organization’s data, promoting its proper use and management throughout its lifecycle.

Data lineage refers to tracing the origin, movement, and transformation of data across systems.
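
Conceptually, lineage forms a directed graph from source datasets to derived ones; the sketch below uses a hypothetical mapping to trace everything upstream of one dataset.

```python
# Lineage edges: each derived dataset maps to the datasets it was built from.
lineage = {
    "raw_orders": [],
    "raw_customers": [],
    "cleaned_orders": ["raw_orders"],
    "customer_revenue": ["cleaned_orders", "raw_customers"],
}

def upstream_sources(dataset: str) -> set[str]:
    """Trace every dataset that feeds into the given dataset."""
    sources = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        parent = stack.pop()
        if parent not in sources:
            sources.add(parent)
            stack.extend(lineage.get(parent, []))
    return sources

print(upstream_sources("customer_revenue"))
# {'cleaned_orders', 'raw_orders', 'raw_customers'}
```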

Data literacy is the ability of stakeholders to read, understand, and communicate using data.

Data management is the process of collecting, storing, organizing, and maintaining analytics data in a way that ensures its accessibility, reliability, and security.

A data mesh is a decentralized data architecture focused on domain ownership.

Data monetization is turning data assets into financial value through direct or indirect means.

Data observability is monitoring the health and reliability of data pipelines and systems.
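
Two of the most common observability signals are freshness and volume; the sketch below shows illustrative checks for both (the thresholds and dataset names are assumptions).

```python
from datetime import datetime, timedelta, timezone

def check_freshness(last_loaded_at: datetime, max_age: timedelta) -> bool:
    """Pass only if the dataset was loaded recently enough."""
    return datetime.now(timezone.utc) - last_loaded_at <= max_age

def check_volume(row_count: int, expected_min: int, expected_max: int) -> bool:
    """Pass only if the row count falls inside the expected range."""
    return expected_min <= row_count <= expected_max

last_load = datetime.now(timezone.utc) - timedelta(hours=30)
alerts = []
if not check_freshness(last_load, max_age=timedelta(hours=24)):
    alerts.append("orders table is stale")
if not check_volume(row_count=120, expected_min=1_000, expected_max=50_000):
    alerts.append("orders row count outside expected range")
print(alerts)
```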

Data ownership refers to the accountability of a designated person or role for the overall management and governance of a specific dataset.

A data product is a curated, governed, and reusable dataset built with user needs in mind, treated as a product with clear ownership and lifecycle management.

Data profiling is analyzing data to understand its structure, content, and quality.
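
A minimal profile typically covers row counts, nulls, and distinct values; the sketch below computes those with the standard library on a toy set of rows.

```python
from collections import Counter

rows = [
    {"country": "US", "amount": 120.0},
    {"country": "US", "amount": None},
    {"country": "DE", "amount": 75.5},
]

def profile(rows: list, column: str) -> dict:
    """Summarize a column: row count, null count, distinct values, top value."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(1),
    }

print(profile(rows, "country"))
print(profile(rows, "amount"))
```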

Data quality is a measure of the accuracy, completeness, and reliability of data.
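
Quality is usually scored against explicit rules; the sketch below computes two illustrative metrics, completeness and validity, on a toy column.

```python
def completeness(values: list) -> float:
    """Share of values that are not null."""
    return sum(v is not None for v in values) / len(values)

def validity(values: list, predicate) -> float:
    """Share of non-null values that satisfy a business rule."""
    non_null = [v for v in values if v is not None]
    return sum(predicate(v) for v in non_null) / len(non_null) if non_null else 0.0

amounts = [120.0, None, 75.5, -3.0]
report = {
    "completeness": completeness(amounts),                    # 0.75
    "validity_non_negative": validity(amounts, lambda v: v >= 0),  # ~0.67
}
print(report)
```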

Data readiness is the state of data being clean, complete, and context-rich enough for analytics or AI use.

Data residency ensures data remains within specific geographic or regulatory boundaries.

Data sensitivity classification is tagging data by its level of sensitivity and risk, such as whether it contains PII (Personally Identifiable Information).
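
Classification is often automated with pattern matching as a first pass; the sketch below uses two rough, illustrative regex patterns (real classifiers are far more thorough).

```python
import re

# Very rough illustrative patterns; real classifiers cover many more cases.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify(value: str) -> str:
    """Tag a value as 'restricted' if it matches a PII pattern, else 'internal'."""
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(value):
            return f"restricted ({label})"
    return "internal"

print(classify("contact: jane.doe@example.com"))  # restricted (email)
print(classify("order total: 49.99"))             # internal
```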

Data sharing is the exchange of data within and outside an organization, typically with analytical use cases in mind.

Data sovereignty is a concept that data is subject to the laws and regulations of the nation where it is collected.

Data stewardship is the practice of overseeing an organization’s data assets to ensure they are accessible, reliable, and secure.

Data strategy is the overarching plan to manage, use, and derive value from data assets.

Data trust is confidence in data accuracy, lineage, and governance.

Data virtualization is abstracting data access without physically replicating data sources.

DataOps is applying DevOps principles to data pipelines for better agility and quality.

Enterprise Data Marketplace (EDM) is a platform for sharing and exchanging data products within an organization.

Federated data governance is a decentralized governance model where individual domains manage their data with shared standards and policies to ensure consistency, compliance, and accountability across the organization.

A federated knowledge graph is a knowledge graph in which portions of the graph are scoped to specific domains, letting each domain express its concepts in its own way without forcing other domains to follow the same ontology or graph structure.

A flexible metamodel is a metamodel that is powered by a knowledge graph.

Governance by design is embedding governance controls and policies directly into data contracts.

A knowledge graph is a semi-structured database that is completely flexible in how it is organized and searched, and that can be visualized as a network.
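
A common way to picture this is as a set of subject-predicate-object triples that can be traversed in any direction; the sketch below uses a hypothetical mini-graph about data assets.

```python
# A tiny knowledge graph stored as (subject, predicate, object) triples.
triples = [
    ("orders", "produced_by", "sales_domain"),
    ("orders", "contains", "customer_id"),
    ("customer_id", "refers_to", "customers"),
    ("customers", "owned_by", "crm_team"),
]

def neighbors(node: str):
    """All facts in which the node appears as subject or object."""
    return [t for t in triples if node in (t[0], t[2])]

def related(node: str, predicate: str):
    """Objects connected to the node via a given relationship."""
    return [o for s, p, o in triples if s == node and p == predicate]

print(related("orders", "contains"))   # ['customer_id']
print(neighbors("customers"))
```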

An LLM (Large Language Model) is an AI model trained on large amounts of text to understand and generate human-like language.

Master Data Management (MDM) is the practice of creating a single source of truth for key business entities.
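
At its core, MDM matches records for the same entity across systems and merges them into a "golden record"; the sketch below applies one simple, illustrative survivorship rule (prefer the most recently updated non-null value).

```python
# Two source systems hold slightly different records for the same customer.
crm_record  = {"customer_id": "C-42", "name": "Ada Lovelace", "email": None,
               "updated_at": "2024-01-10"}
shop_record = {"customer_id": "C-42", "name": "A. Lovelace",
               "email": "ada@example.com", "updated_at": "2024-03-02"}

def merge_golden_record(records: list) -> dict:
    """Build a single golden record: for each field, keep the most recently
    updated non-null value (a simple survivorship rule)."""
    ordered = sorted(records, key=lambda r: r["updated_at"])
    golden = {}
    for record in ordered:                 # later records overwrite earlier ones
        for field, value in record.items():
            if value is not None:
                golden[field] = value
    return golden

print(merge_golden_record([crm_record, shop_record]))
```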

Metadata management is the process of organizing, controlling, and using metadata (data about data) to improve data accessibility, quality, and usability, ultimately enabling better data governance and business decision-making.

A metamodel is a “model of a model” – it defines the structure, rules, and relationships for constructing other models within a given domain.

An ontology describes the related concepts within a domain. An ontology goes beyond a taxonomy by describing how the concepts relate and interact.

PII (Personally Identifiable Information) is sensitive data requiring special handling and protection.

Policy enforcement is automatically applying data usage rules and controls.
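
In code, enforcement often looks like a guard that runs before a data operation; the sketch below uses a hypothetical role-based policy and a Python decorator (operation and role names are invented for illustration).

```python
from functools import wraps

# Hypothetical usage policy: which roles may perform which data operations.
POLICY = {"export_customer_data": {"data_steward", "compliance_officer"}}

def enforce_policy(operation: str):
    """Block the call unless the caller's role is allowed by the policy."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, role: str, **kwargs):
            if role not in POLICY.get(operation, set()):
                raise PermissionError(f"{role!r} may not perform {operation!r}")
            return func(*args, role=role, **kwargs)
        return wrapper
    return decorator

@enforce_policy("export_customer_data")
def export_customer_data(*, role: str) -> str:
    return "export started"

print(export_customer_data(role="data_steward"))   # allowed
# export_customer_data(role="analyst")             # raises PermissionError
```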

RAG (Retrieval-Augmented Generation) is an AI technique that enhances the accuracy and relevance of LLM (Large Language Model) outputs by allowing them to access and incorporate information from external knowledge sources, rather than relying solely on their pre-trained data.
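
The core loop is retrieve-then-generate; the sketch below stands in keyword-overlap retrieval for real vector search and stops at prompt assembly rather than calling an actual LLM.

```python
# Toy "knowledge source": a few documents with keyword-overlap retrieval.
documents = [
    "Data lineage traces the origin and transformation of data across systems.",
    "A data contract defines expectations, formats, and SLAs between producers and consumers.",
    "Synthetic data is artificially generated data used for testing.",
]

def retrieve(question: str, k: int = 2) -> list:
    """Rank documents by shared words with the question (a stand-in for
    vector search in a real RAG pipeline)."""
    words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str) -> str:
    """Augment the prompt with retrieved context before sending it to an LLM."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What does a data contract define?"))
```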

A semantic layer is a business-friendly abstraction of complex data sources to enable better understanding.
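
In practice, a semantic layer maps business terms onto queries over the physical schema; the sketch below is a hypothetical mapping for two metrics (table and column names are invented).

```python
# Business-friendly metric definitions mapped onto the physical schema.
SEMANTIC_LAYER = {
    "total revenue": {
        "sql": "SELECT SUM(amount) FROM fact_orders",
        "description": "Sum of all order amounts, in the reporting currency.",
    },
    "active customers": {
        "sql": "SELECT COUNT(DISTINCT customer_id) FROM fact_orders "
               "WHERE order_date >= CURRENT_DATE - INTERVAL '90' DAY",
        "description": "Customers with at least one order in the last 90 days.",
    },
}

def resolve(metric_name: str) -> str:
    """Translate a business term into the query a user never has to write."""
    return SEMANTIC_LAYER[metric_name.lower()]["sql"]

print(resolve("Total Revenue"))
```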

Synthetic data is artificially generated data used for testing or privacy-preserving analytics.
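
The simplest form is rows that mimic a real table's shape without containing real values; the sketch below generates hypothetical order records with the standard library.

```python
import random

random.seed(42)  # reproducible output for testing

def synthetic_orders(n: int) -> list:
    """Generate order-like rows that mimic a real table's shape without
    containing any actual customer data."""
    countries = ["US", "DE", "FR", "IN"]
    return [
        {
            "order_id": 10_000 + i,
            "customer_id": random.randint(1, 500),
            "country": random.choice(countries),
            "amount": round(random.uniform(5, 500), 2),
        }
        for i in range(n)
    ]

for row in synthetic_orders(3):
    print(row)
```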

Taxonomy is a hierarchical classification of data into categories and subcategories.
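
A taxonomy is naturally represented as a tree; the sketch below models a hypothetical one as nested mappings and lists every category path.

```python
# A small taxonomy of data assets as nested mappings (category -> subcategories).
taxonomy = {
    "customer data": {
        "profile": ["name", "email"],
        "behavior": ["page_views", "purchases"],
    },
    "finance data": {
        "revenue": ["orders", "invoices"],
        "costs": ["payroll", "cloud_spend"],
    },
}

def paths(tree, prefix=()):
    """Yield every category path from root to leaf."""
    if isinstance(tree, dict):
        for key, subtree in tree.items():
            yield from paths(subtree, prefix + (key,))
    else:
        for leaf in tree:
            yield prefix + (leaf,)

for path in paths(taxonomy):
    print(" > ".join(path))   # e.g. customer data > profile > name
```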