Silectis

Simplifying data engineering, accelerating insights.

Data Engineering Glossary

If you’re new to data engineering or are a practitioner of a related field, such as data science or business intelligence, we thought it might be helpful to have a handy list of commonly used terms to get you up to speed. This data engineering glossary is by no means exhaustive, but it should provide some foundational context and information.

Advanced Analytics: The process of discovering deeper insights in data than typically enabled by most business intelligence (BI) tools. Advanced analytics can be performed by employing sophisticated tools and techniques, including machine learning (ML) and artificial intelligence (AI), data/text mining, semantic analysis, sentiment analysis, network and cluster analysis, multivariate statistics, and more.
Apache Airflow: Apache Airflow is a platform to “programmatically author, schedule and monitor workflows.”
Artificial Intelligence: AI is a broad term used to describe engineered systems that have been taught to do a task that typically requires human intelligence.
BI (Business Intelligence): Strategies and systems used by enterprises to conduct data analysis and make pertinent business decisions.
Big Data: Large volumes of structured or unstructured data.
Big Data Processing: In order to extract value or insights out of big data, one must first process it using big data processing software or frameworks, such as Hadoop.
BigQuery: Google’s cloud data warehouse.
Cassandra: An open-source, distributed NoSQL database maintained by the Apache Software Foundation.
Data Architecture: Data architecture is a composition of models, rules, and standards for all data systems and interactions between them.
Data Catalog: An organized inventory of data assets relying on metadata to help with data management.
Data Engineering: Data engineering is a process by which data engineers make data useful. Data engineers design, build, and maintain data pipelines that transform data from a raw state to a useful one, ready for analysis or data science modeling.
Data Ingestion: The process by which data is moved from one or more sources into a storage destination where it can be put into a data pipeline and transformed for later analysis or modeling.
Data Integration: Combining data from various, disparate sources into one unified view.
Data Lake: A storage repository where data is stored in its raw format. Data lakes allow for more flexibility than a more rigid data warehouse.
Data Lineage: Data lineage describes the origin of data and the changes made to it over time.
Data Management: Data management is the practice of collecting, maintaining, and utilizing data securely and effectively.
Data Migration: The process of permanently moving data from one storage system to another. Data migration may involve transforming data as part of the migration process.
Data Mining: The process of finding patterns, correlations, or anomalies within data sets to predict outcomes.
Data Pipeline: A data pipeline is a set of steps that ingest and integrate data from raw sources and move the data to a destination for analysis or data science. Data pipelines can be automated and maintained so that consumers of the data always have reliable data to work with.
Data Science: Data science is a practice that uses scientific methods, algorithms, and systems to find insights within structured and unstructured data.
Data Visualization: Graphic representation of a set or sets of data.
Data Warehouse: A storage system used for data analysis and reporting.
Database: A collection of structured data.
ETL: Extract, transform, load: the three-step data integration process used to blend data from different sources.
Flat File: A type of database that stores data in a plain text format.
Flink: A big data processing tool maintained by the Apache Software Foundation, with the ability to process streaming data in real time.
Hadoop / HDFS: Apache’s open-source software framework for storing and processing big data. HDFS stands for Hadoop Distributed File System.
JSON: JavaScript Object Notation, a text-based data-interchange format for storing and transporting data.
Kafka: Apache Kafka is the Apache Software Foundation’s open-source platform for streaming data.
Kubernetes / k8s: Open-source system for automating the deployment, scaling, and management of containerized applications. Also called k8s.
Machine Learning (ML): ML generally refers to algorithms that learn patterns from data rather than following explicitly programmed rules.
MapReduce: MapReduce is a component of the Hadoop framework that’s used to process, in parallel, big data stored within the Hadoop Distributed File System (HDFS).
Metadata: A set of data that describes and gives information about other data.
MySQL: An open-source relational database management system with a client-server model.
NoSQL: A non-relational database.
Open Source: Software whose source code is freely available to use and modify.
Parquet: A column-oriented data storage format that’s part of the Hadoop ecosystem.
PostgreSQL: A free, open-source relational database management system, also known as Postgres.
PySpark: The Python API for Apache Spark, which lets you write Spark applications in the Python programming language.
Redshift: Amazon’s cloud data warehouse.
S3: Amazon Simple Storage Service, Amazon’s object storage service.
SQL: Structured Query Language, a domain-specific language used to query and manage data in relational databases.
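
To see how a few of these terms fit together, here is a minimal sketch of an ETL-style data pipeline in Python. It is an illustration only, not a production pattern: the order records, the `orders` table, and the `extract`, `transform`, `load`, and `run_pipeline` function names are all hypothetical, and an in-memory SQLite database stands in for a real database or data warehouse.

```python
import json
import sqlite3

# Hypothetical raw records, standing in for a JSON file or API response.
RAW_JSON = """
[
  {"order_id": 1, "customer": " Alice ", "amount": "19.99"},
  {"order_id": 2, "customer": "bob", "amount": "5.00"},
  {"order_id": 3, "customer": "Alice", "amount": null}
]
"""


def extract(raw):
    """Extract: parse raw JSON text into Python dictionaries."""
    return json.loads(raw)


def transform(records):
    """Transform: drop rows with no amount, tidy names, convert amounts to numbers."""
    cleaned = []
    for record in records:
        if record["amount"] is None:
            continue  # skip incomplete records
        cleaned.append(
            (
                record["order_id"],
                record["customer"].strip().title(),
                float(record["amount"]),
            )
        )
    return cleaned


def load(rows, connection):
    """Load: write the cleaned rows into a database table using SQL."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


def run_pipeline(raw, connection):
    """Run the three steps in order: extract, then transform, then load."""
    load(transform(extract(raw)), connection)


# An in-memory SQLite database is an assumption made to keep the sketch self-contained.
connection = sqlite3.connect(":memory:")
run_pipeline(RAW_JSON, connection)

# Once the data is loaded, SQL can be used to analyze it.
totals = connection.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

In a real pipeline, each step would typically read from and write to external systems (for example, files in S3 and a warehouse such as Redshift or BigQuery), and a scheduler such as Apache Airflow could run the steps on a regular basis.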

Note: We will continue to add to the above Data Engineering Glossary over time.

Copyright © 2023 Silectis, Inc. All Rights Reserved. Silectis® is a registered trademark of Silectis, Inc. Unauthorized use is expressly prohibited.
Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation.