
The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI

Astronomer

101 episodes

  • The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI

    Building Airflow CTL with Buğra Öztürk at Mollie

    2026-04-30 | 19 min.
    Buğra Öztürk, Senior Data Engineer at Mollie and Committer and PMC member on the Apache Airflow project, joins us to walk through Airflow CTL — what it is, how it differs from the existing Airflow CLI and where it is headed under AIP-94.

    Key Takeaways:

    00:00 Introduction.
    03:10 Buğra has contributed to Airflow since 2022, from docs changes up to Committer and PMC member — a path he hopes inspires others to start small and contribute.
    04:05 Airflow CTL solves secure user interaction by abstracting database credentials behind the public core API.
    05:13 Airflow CLI and Airflow CTL are complementary — CLI handles administration and database management while CTL handles secure user interactions via the API.
    07:08 Airflow CTL authenticates via the API, acquires a JWT token and stores it securely in the OS keyring — running on the user's machine and never requiring direct database access.
    08:21 Concrete use cases include local DAG development without the UI and CI/CD automation using headless mode with short-lived JWT tokens.
    10:08 AIP-94 describes the long-term vision — decoupling all remote commands from the Airflow CLI and routing them through Airflow CTL.
    13:12 Airflow CTL is currently at 0.X and already being used in CI and deployment automations. The move to 1.0 with full CLI parity is the next milestone under AIP-94.
    16:09 Multi-team deployment becoming generally available in a future Airflow release is Buğra's most-anticipated upcoming feature beyond Airflow CTL.
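
    The takeaways above mention short-lived JWT tokens for headless CI/CD use. As a rough illustration of what "short-lived" means in practice, the sketch below reads the standard `exp` claim from a JWT payload and checks whether the token should be refreshed. This is not Airflow CTL's code: the helper names (`jwt_claims`, `is_expired`, `make_demo_token`) and the `ci-bot` subject are hypothetical, and the payload is decoded without signature verification, which is only acceptable for a local expiry check.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    """Base64url-encode bytes without padding, as JWT segments are encoded."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def is_expired(token: str, now: float, leeway: float = 0.0) -> bool:
    """Return True when the token's `exp` claim (minus leeway) has passed."""
    return now >= jwt_claims(token)["exp"] - leeway


def make_demo_token(exp: int) -> str:
    """Build an unsigned demo token (hypothetical, for illustration only)."""
    header = b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps({"sub": "ci-bot", "exp": exp}).encode())
    return f"{header}.{payload}.signature-placeholder"


if __name__ == "__main__":
    token = make_demo_token(exp=1_000)
    # A CI job might refresh slightly early by passing a leeway in seconds.
    print(is_expired(token, now=990, leeway=30))
```

    A CI job could run a check like this before each API call and request a fresh token when it returns True, rather than keeping a long-lived credential around.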

    Resources Mentioned:

    Buğra Öztürk
    https://www.linkedin.com/in/bugraozturk93/

    Mollie
    https://www.linkedin.com/company/mollie/

    Mollie | Website
    https://www.mollie.com/

    Apache Airflow CTL
    https://airflow.apache.org/

    AIP-94 on Airflow Confluence
    https://lists.apache.org/thread/d2o1pr78wxdp1wozq519stp0pkcv6k6c

    Apache Airflow GitHub
    https://www.github.com/apache/airflow

    Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

    #AI #Automation #Airflow #MachineLearning
  • The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI

    Introducing Airflow’s Common AI Provider with Pavan Kumar Gopidesu and Kaxil Naik

    2026-04-23 | 28 min.
    In this episode, we explore the newly released Apache Airflow common AI provider — what problem it solves, how it was built and what's coming next.

    Kaxil Naik, Senior Director of Engineering at Astronomer and Apache Airflow PMC member, and Pavan Kumar Gopidesu, Lead Data Engineer at Experian and Apache Airflow PMC member, join us to walk through the provider's first release and the technical decisions behind it.

    Key Takeaways:

    00:00 Introduction.
    04:05 The common AI provider was born from a real production problem.
    07:10 Airflow already had the primitives needed for durable agent execution, making it the natural foundation for AI orchestration.
    09:15 The LLM schema compare operator uses Apache DataFusion to fetch source schemas.
    11:07 Apache DataFusion was chosen for its speed.
    13:09 Hook tool sets expose Airflow's provider hooks to agents with an allowed methods list that blocks destructive operations.
    15:20 Passing durable=True to an LLM operator caches tool calls and LLM outputs mid-task.
    18:13 The provider offers three abstraction levels.
    21:20 The provider currently requires Airflow 3 — the team is open to adding Airflow 2.11 support if demand is high enough.
    24:10 MCP server configs can be stored as Airflow connections.
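
    The idea of exposing hooks to agents through an allowed methods list can be sketched in plain Python. The example below is not the provider's implementation: `AllowedMethodsProxy` and `FakeStorageHook` are hypothetical names invented for illustration, and the real provider's classes and parameters may differ. It only shows the general pattern of refusing any method that is not explicitly allow-listed.

```python
class AllowedMethodsProxy:
    """Expose only allow-listed methods of a wrapped object (hypothetical)."""

    def __init__(self, target, allowed):
        self._target = target
        self._allowed = frozenset(allowed)

    def call(self, method, *args, **kwargs):
        # Reject anything not explicitly allowed, including destructive calls.
        if method not in self._allowed:
            raise PermissionError(f"method {method!r} is not on the allow-list")
        return getattr(self._target, method)(*args, **kwargs)


class FakeStorageHook:
    """Stand-in for a provider hook; not a real Airflow class."""

    def __init__(self):
        self.files = {"a.csv": b"1,2,3"}

    def list_keys(self):
        return sorted(self.files)

    def delete_key(self, key):
        del self.files[key]


if __name__ == "__main__":
    hook = FakeStorageHook()
    proxy = AllowedMethodsProxy(hook, allowed=["list_keys"])
    print(proxy.call("list_keys"))
    try:
        proxy.call("delete_key", "a.csv")
    except PermissionError as err:
        print(err)
```

    Defaulting to deny means a newly added destructive method on the wrapped object stays unreachable until someone deliberately adds it to the list.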

    Resources Mentioned:

    Kaxil Naik
    https://www.linkedin.com/in/kaxil/

    Pavan Kumar Gopidesu
    https://www.linkedin.com/in/pavan-kumar-gopidesu/

    Astronomer | LinkedIn
    https://www.linkedin.com/company/astronomer/

    Astronomer | Website
    https://www.astronomer.io

    Experian
    https://www.linkedin.com/company/experian/

    Apache Airflow
    https://www.linkedin.com/company/apache-airflow

    Apache Airflow common AI provider docs
    https://airflow.apache.org/docs/apache-airflow-providers-common-ai/stable/commits.html

    Apache DataFusion
    https://datafusion.apache.org/

    Pydantic AI
    https://pydantic.dev/docs/ai/overview/

    Airflow Slack
    https://airflow.apache.org/docs/apache-airflow-providers-slack/stable/index.html

    Introducing the Common AI Provider: LLM and AI Agent Support for Apache Airflow
    https://airflow.apache.org/blog/common-ai-provider/


    #Automation #Airflow #MachineLearning
  • The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI

    Building AI Debugging Agents Into Airflow DAGs at Jeppesen ForeFlight with Samantha Blaney Cuevas

    2026-04-16 | 22 min.
    Aviation data pipelines run on strict 28-day publication cycles, and the margin for error is zero. In this episode, we're joined by Samantha Blaney Cuevas, Software Engineer at Jeppesen ForeFlight, to explore how her team orchestrates a complex, time-sensitive data pipeline with Airflow and where AI is starting to fit into that picture.

    Key Takeaways:

    00:00 Introduction.
    04:05 Airflow orchestrates almost all business logic and data transformations across the cycle, with custom timetables built to track busy and slow periods programmatically.
    06:10 Cycle-aware sensing tasks handle irregular source deliveries, including duplicates and early or late arrivals, without disrupting the pipeline.
    08:07 The two main AI use cases are pipeline debugging and cycle awareness — both designed to reduce the manual overhead of monitoring a complex DAG dependency graph.
    09:03 The Data Port agent is a two-task DAG that routes Slack pipeline alerts to either a predefined command list or an AI token, depending on whether the fix is already known.
    13:10 AI is still in development at Jeppesen ForeFlight — the team is focused on token efficiency and scoping how much autonomy to give agents across different environments.
    15:04 Airflow setup and MCP configuration were straightforward; the harder design work was deciding which environments agents could access across QA, staging and production.
    17:06 Airflow's skills repo and agent tooling are helping onboard new developers and extend pipeline awareness to analysts who work alongside engineers on the cycle.
    19:10 Samantha would like to see single-task retries with different parameters in Airflow — resetting one task without clearing the full pipeline run.
    21:05 A future AI use case under consideration is live DAG editing and re-upload within Airflow to make one-off fixes without halting pipeline progress.
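
    The routing decision described at 09:03, where an alert goes either to a predefined command list or to an AI-backed path depending on whether the fix is already known, can be sketched as a simple lookup. Everything in this example is an assumption made for illustration: the alert strings, the `KNOWN_FIXES` table, the command names and the `route_alert` function are hypothetical and are not taken from Jeppesen ForeFlight's DAG.

```python
# Hypothetical table of alert signatures with an already-known remediation.
KNOWN_FIXES = {
    "source file missing": "rerun_sensor",
    "duplicate delivery": "skip_duplicate",
}


def route_alert(alert_text: str) -> tuple[str, str]:
    """Return ("command", name) for a known fix, else ("agent", alert_text)."""
    normalized = alert_text.lower()
    for signature, command in KNOWN_FIXES.items():
        if signature in normalized:
            return ("command", command)
    # No known remediation: hand the raw alert to the AI-backed path.
    return ("agent", alert_text)


if __name__ == "__main__":
    print(route_alert("ALERT: Source file missing for cycle 2604"))
    print(route_alert("ALERT: unexpected schema drift in waypoint table"))
```

    Checking the known-fix table first keeps the cheap, deterministic path in front of the agent, which fits the episode's emphasis on token efficiency.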

    Resources Mentioned:

    Samantha Blaney Cuevas
    https://www.linkedin.com/in/samantha-blaney/

    Jeppesen ForeFlight | LinkedIn
    https://www.linkedin.com/company/jeppesen-foreflight/

    Jeppesen ForeFlight | Website
    http://www.foreflight.com

    Astronomer Airflow Skills Repo
    http://www.github.com/astronomer/airflow-llm-providers-demo

    Apache Airflow
    https://airflow.apache.org/


    #AI #Automation #Airflow
  • The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI

    Introducing Airflow 3.2

    2026-04-09 | 26 min.
    We introduce Airflow 3.2 and its updates for teams that build and operate data pipelines.
    Astronomer’s Head of Customer Education, Marc Lamberti, and Senior Manager of Developer Relations, Kenten Danas, break down what’s new, from asset partitioning to async Python tasks and DAG versioning. They explore how these updates improve scheduling, performance and observability in production workflows.

    Key Takeaways:

    00:00 Introduction.
    02:10 Airflow 3 architecture separates workers from the metadata database.
    03:05 Plugin versioning and UI-based backfills simplify operations.
    06:20 Asset partitioning enables granular, partition-level scheduling.
    07:15 Triggering DAGs on partitions instead of full datasets.
    11:05 Deferrable operators reduce worker slot usage.
    12:00 Async operators reduce database pressure and overhead.
    14:10 Async improves throughput, not single task speed.
    22:20 Inlets and outlets improve asset lineage visibility.
    23:00 DAG version markers show changes directly in the UI.
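
    The point at 14:10, that async improves throughput rather than the speed of any single task, can be illustrated with plain `asyncio`. This is a generic sketch and does not use Airflow's API; the function names are hypothetical. Each simulated I/O task takes the same time either way, but under `asyncio.gather` the waits overlap, which is measured here as the peak number of tasks in flight.

```python
import asyncio


async def fake_io_task(state: dict, delay: float = 0.01) -> None:
    """Simulate an I/O-bound task and track how many run at once."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(delay)  # stands in for a network or database wait
    state["in_flight"] -= 1


async def run_concurrently(n: int) -> int:
    """Start all tasks together; their waits overlap."""
    state = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(fake_io_task(state) for _ in range(n)))
    return state["peak"]


async def run_sequentially(n: int) -> int:
    """Await each task before starting the next; no overlap."""
    state = {"in_flight": 0, "peak": 0}
    for _ in range(n):
        await fake_io_task(state)
    return state["peak"]


if __name__ == "__main__":
    print("peak in flight (gather):", asyncio.run(run_concurrently(5)))
    print("peak in flight (sequential):", asyncio.run(run_sequentially(5)))
```

    Each task still waits the full delay in both versions; only the number of tasks making progress during that wait changes.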

    Resources Mentioned:

    Marc Lamberti
    https://www.linkedin.com/in/marclamberti/

    Apache Airflow
    https://airflow.apache.org/

    Astronomer | LinkedIn
    https://www.linkedin.com/company/astronomer/

    Astronomer | Website
    https://www.astronomer.io/

    3.2 Webinar
    https://www.astronomer.io/events/webinars/introducing-airflow-3-2-video

    Asset Partitioning Guide
    https://www.astronomer.io/docs/learn/airflow-partitioned-runs

    Asynchronous Processes Guide
    https://www.astronomer.io/docs/learn/deferrable-operators

    Release Notes
    https://airflow.apache.org/docs/apache-airflow/stable/release_notes.html#airflow-3-2-0-2026-04-07

    Provider Registry
    https://airflow.apache.org/registry/


    #AI #Automation #Airflow #MachineLearning
  • The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI

    Reflections on a Decade of Data Engineering at Seattle Data Guy

    2026-04-03 | 26 min.
    Lessons from the past decade of data engineering reveal how much the ecosystem has changed and what has stayed surprisingly consistent.

    In this episode, Benjamin Rogojan, Owner and Data Consultant at Seattle Data Guy, joins us to reflect on how the data engineering landscape has evolved alongside Apache Airflow. We explore when Airflow makes sense as an orchestrator, why batch processing is still dominant and how AI is reshaping the workflows and responsibilities of modern data engineers.

    Key Takeaways:

    00:00 Introduction.
    03:00 Airflow becomes valuable when workflows involve many pipelines, teams and dependencies.
    05:00 Data engineers are still focused on making data accessible and aligning work with business needs.
    05:30 Batch pipelines remain the most common approach even as real-time use cases grow.
    07:45 Many “real-time” requests are actually event-driven batch workflows.
    09:00 Airflow replaced many custom-built pipeline systems with built-in dependency management.
    11:00 Modern orchestration tools often build on Airflow concepts or differentiate from them.
    14:00 AI can assist with writing SQL and pipelines but still requires experienced engineers.
    15:30 Organizations are collecting increasingly granular data, creating more engineering demand.
    19:00 The data stack has shifted rapidly from Hadoop-era systems to modern cloud platforms.

    Resources Mentioned:

    Benjamin Rogojan
    https://www.linkedin.com/in/benjaminrogojan/

    Seattle Data Guy
    https://www.linkedin.com/company/seattle-data-guy/

    Apache Airflow
    https://airflow.apache.org

    Airflow Summit / Airflow Conference
    https://airflowsummit.org

    Snowflake
    https://www.snowflake.com

    HubSpot Data Sharing / APIs
    https://developers.hubspot.com

    MLflow
    https://mlflow.org


    #AI #Automation #Airflow


About The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI

Welcome to The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI, the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: https://www.astronomer.io/podcast/
Generated: 5/3/2026 - 7:43:58 PM