3 Comments

Thank you, Sven, for sharing your thoughts on Starlake. I work in the agriculture sector, and we use Starlake daily to extract and ingest data, generating DAGs from templates to orchestrate everything. For now, our ingestion DAGs are scheduled rather than event-triggered. We are also considering its transformation features and plan to take advantage of column-level lineage to trace the origin of downstream data.


Thanks, Sven, for this thoughtful article! On your question about what makes Starlake a game-changer: one of our banking clients cut their data pipeline development time by up to 90% with Starlake, roughly a 10x improvement over their previous setup. They're thrilled because it automates complex data engineering tasks, improves data quality, and accelerates time to insight. We're excited to keep helping companies bridge the data engineering gap and achieve outstanding results!


Great read! I really appreciate your in-depth and honest review of Starlake.

Just wanted to offer a small clarification: Starlake allows you to define schedules for workload runs.

It supports:

Cron-based scheduling: Lets users define fixed schedules for workload execution based on time patterns (e.g., "run this process every day at 2 AM").

Event-driven orchestration: Supports triggering workloads dynamically when specific events occur, using dataset-aware Directed Acyclic Graphs (DAGs). By tracking data origins, transformations, and destinations, Starlake determines how datasets are interconnected (see the sketch after this list for both styles).
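To make the difference concrete, here is a minimal sketch of the two trigger styles expressed as Apache Airflow DAGs, one of the orchestrators Starlake can generate DAGs for. The dataset URI, DAG ids, and task ids are illustrative placeholders I made up for this example, not actual Starlake-generated code:

```python
# Sketch of the two trigger styles, written against Apache Airflow 2.4+.
# All names below (dataset URI, dag_ids, task_ids) are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.empty import EmptyOperator

# A dataset the ingestion job produces and the transform job consumes.
orders = Dataset("bq://analytics/orders")

# 1. Cron-based scheduling: run the ingestion every day at 2 AM.
with DAG(
    dag_id="load_orders",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",
    catchup=False,
) as load_dag:
    # `outlets` declares the dataset this task refreshes.
    EmptyOperator(task_id="ingest_orders", outlets=[orders])

# 2. Event-driven orchestration: run the transform whenever
#    the `orders` dataset is refreshed, instead of on a clock.
with DAG(
    dag_id="transform_orders",
    start_date=datetime(2024, 1, 1),
    schedule=[orders],  # dataset-aware trigger, not a cron expression
    catchup=False,
) as transform_dag:
    EmptyOperator(task_id="build_orders_mart")
```

Wired this way, the orchestrator runs `transform_orders` as soon as `ingest_orders` updates the `orders` dataset, which is the freshness-driven behavior described above.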

Additionally, by leveraging data lineage and the schedules of dependent datasets, Starlake automatically aligns those schedules so downstream data meets its expected freshness. This automatic alignment reduces the need for manual adjustments or complex configuration, simplifying workload management while maintaining reliability.

Keep up the excellent work!
