Data Engineering

Advance Your SQL Skills with dbt for Data Engineering

Managing SQL code at scale is one of the biggest challenges in data engineering. As data teams grow and pipelines become more complex, traditional approaches to SQL development quickly become unwieldy. This LinkedIn Learning course explores how dbt (data build tool) transforms the way we think about SQL development, bringing software engineering best practices to analytics engineering.
Course Approach
Real-World Problem Solving: Each chapter presents actual situations and challenges that data engineers face, with focused code examples showing practical solutions. ...

The Future in Tech: Data Engineering Powers AI Revolution

Originally streamed live on August 3, 2023 - LinkedIn Learning’s “The Future in Tech” series
Data engineering is the unsung hero fueling the rapid growth and consumption of artificial intelligence. It transforms AI’s potential into reality, driving digital innovation and reshaping the world. In this comprehensive discussion, we explore how data engineering unlocks and enables democratized use of artificial intelligence.
Video: The Future in Tech - Data Engineering and AI Discussion (1,668 views) ...

Hands-On Introduction: Data Engineering

In this course, instructor Vinoo Ganesh gives you an overview of the fundamental skills you need to become a data engineer. Learn how to solve complex data problems in a scalable, concrete way. Explore the core principles of the data engineer toolkit—including ELT, OLTP/OLAP, orchestration, DAGs, and more—as well as how to set up a local Apache Airflow deployment and full-scale data engineering ETL pipeline. Along the way, Vinoo helps you boost your technical skill set using real-world, hands-on scenarios. ...

The Efficiently Guide to Snowflake (Top Down)

Originally published on Efficiently (Substack)
The majority of my career has been focused on making data systems more efficient, whether that means performance, scalability, or cost. This series aims to democratize knowledge about how to Efficiently operationalize data.
TLDR
4 changes you can make right now to run Snowflake more Efficiently:
1. File a Snowflake support ticket and request access to the GET_QUERY_STATS function
2. ALTER WAREHOUSE <warehouseName> SET AUTO_SUSPEND = 60;
3. For multi-cluster warehouses:
   ALTER WAREHOUSE <warehouseName> SET MIN_CLUSTER_COUNT = 1;
   ALTER WAREHOUSE <warehouseName> SET SCALING_POLICY = ECONOMY;
4. ALTER WAREHOUSE <warehouseName> SET STATEMENT_TIMEOUT_IN_SECONDS = 36000;
Snowflake + Driving
Snowflake optimization resembles efficient driving. There are four parallel constraints: ...
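As a rough illustration of the TLDR settings above, the following sketch renders the ALTER WAREHOUSE statements for one warehouse. The function name `warehouse_tuning_statements` and the warehouse name `ANALYTICS_WH` are hypothetical and not from the original post; the sketch only builds SQL strings and does not connect to Snowflake or validate the identifier.

```python
from typing import List


def warehouse_tuning_statements(warehouse: str, multi_cluster: bool = False) -> List[str]:
    """Build the ALTER WAREHOUSE statements listed in the TLDR for one warehouse.

    Hypothetical helper: it only formats strings and performs no identifier
    quoting or validation, so pass trusted warehouse names only.
    """
    settings = [("AUTO_SUSPEND", "60")]
    if multi_cluster:
        # These two settings only apply to multi-cluster warehouses.
        settings += [("MIN_CLUSTER_COUNT", "1"), ("SCALING_POLICY", "ECONOMY")]
    settings.append(("STATEMENT_TIMEOUT_IN_SECONDS", "36000"))
    return [
        f"ALTER WAREHOUSE {warehouse} SET {name} = {value};"
        for name, value in settings
    ]


# ANALYTICS_WH is a placeholder warehouse name used for illustration.
for statement in warehouse_tuning_statements("ANALYTICS_WH", multi_cluster=True):
    print(statement)
```

Generating the statements from one list keeps the values in a single place if they need to be applied to several warehouses; review the output before running it.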

Hands-On: Predicate Pushdown

Originally published on Efficiently (Substack)
We’ve spoken a lot about on-disk and distributed storage, as well as blocks. All of this theory is great, but let’s talk about it in practice. In this post, I’m going to:
- Read a CSV dataset into Spark
- Write the dataset into 5 Parquet files (treating each file as a block)
- Introspect the metadata stored in the files
- Run queries that demonstrate the power of predicate pushdown
Hands-On: Setup
The tutorial uses an airports dataset. Download it via: ...
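The idea behind the steps in this post can be sketched without Spark. The example below is a conceptual, pure-Python illustration and not the post's actual code: it splits 100 integers into 5 in-memory "blocks", stores a min and max value for each block (standing in for the statistics Parquet keeps in its file metadata), and skips any block whose statistics prove that no row can match the filter. The names `Block`, `write_blocks`, and `scan_greater_than` are made up for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """One 'file' of rows plus the min/max statistics kept alongside it."""
    rows: List[int]
    min_value: int
    max_value: int


def write_blocks(values: List[int], num_blocks: int) -> List[Block]:
    """Sort the values and split them into roughly equal blocks with min/max stats."""
    ordered = sorted(values)
    size = -(-len(ordered) // num_blocks)  # ceiling division
    blocks = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        blocks.append(Block(chunk, chunk[0], chunk[-1]))
    return blocks


def scan_greater_than(blocks: List[Block], threshold: int) -> Tuple[List[int], int]:
    """Return the rows greater than threshold and the number of blocks read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        if block.max_value <= threshold:
            continue  # statistics prove no row can match, so skip the block
        blocks_read += 1
        matches.extend(v for v in block.rows if v > threshold)
    return matches, blocks_read


blocks = write_blocks(list(range(100)), num_blocks=5)
rows, blocks_read = scan_greater_than(blocks, threshold=79)
print(len(rows), "rows matched;", blocks_read, "of", len(blocks), "blocks read")
```

With the values 0 to 99 sorted into 5 blocks, the filter `value > 79` reads only the last block. Sorting before writing matters here: if the values were scattered randomly across blocks, every block's min/max range would likely overlap the filter and nothing could be skipped.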