Knowledge Hub
Deep dives into data engineering, governance patterns, cloud architecture, and practical tutorials to level up your data stack.
Apache Airflow Tutorial: Build Production DAGs
Step-by-step Apache Airflow tutorial with runnable DAGs, TaskFlow API examples, scheduling patterns, and production pitfalls to avoid.
Python for Data Engineering: The Practical Toolkit
The Python libraries, patterns, and practices that separate production data engineering from scripts — with runnable code examples for ETL, API ingestion, and testing.
SQL Window Functions Tutorial: Rank, Aggregate, Compare
Learn SQL window functions with runnable examples — rankings, running totals, LAG/LEAD, and common pitfalls across PostgreSQL, Spark SQL, and BigQuery.
Apache Spark Tutorial: From Zero to Your First Data Pipeline
A hands-on Apache Spark tutorial covering core concepts, PySpark DataFrames, transformations, and real-world pipeline patterns for data engineers.
DuckDB Tutorial: Analytical SQL Directly in Your Browser
Get started with DuckDB in 15 minutes. Learn read_parquet, read_csv_auto, PIVOT, and when DuckDB beats SQLite and PostgreSQL for analytical SQL.
Building a REST API Data Pipeline in Python
A step-by-step guide to building a production-grade REST API data pipeline in Python. Covers authentication, pagination, rate limits, schema validation, and common pitfalls, with real, runnable code.
Excel to SQL: A Migration Guide for Business Analysts
A complete guide to Excel-to-SQL migration for business analysts, with a 25-row concept mapping table, SQL code examples, common pitfalls, and tips for making the switch stick.