- Container management: see https://docs.metaflow.org/metaflow/dependencies
- Search for artifacts: see https://docs.metaflow.org/metaflow/client
- Automatic publishing of web apps: we have this internally but it is not open-source yet.

Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. If you don't consider them basically equivalent, what would you say are the key differences? It seems to rely on AWS Batch for production DAG execution. We will fix these links.

; concepts - collaboration, versioning, archiving, dependency management). Looking forward to testing Metaflow myself. Provides a rich GUI with features including DAG visualization, execution progress monitoring, scheduling, and triggering. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. a social scientist working on datasets which can be held entirely in memory). Netflix has been using Metaflow internally for about two years, so we have many war stories :).
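To make "executes your tasks ... while following the specified dependencies" concrete, here is a minimal sketch in plain Python. It does not use Airflow's API. The task names and the `run_in_dependency_order` helper are hypothetical, and it runs tasks one at a time rather than on an array of workers.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task graph: each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "train": {"transform"},
    "report": {"transform", "train"},
}


def run_in_dependency_order(deps, run_task):
    """Run each task only after all of its upstream tasks have finished."""
    executed = []
    for task in TopologicalSorter(deps).static_order():
        run_task(task)
        executed.append(task)
    return executed


# The lambda stands in for real work; a real scheduler would dispatch to workers.
order = run_in_dependency_order(dependencies, lambda name: None)
print(order)
```

A real scheduler adds retries, parallel dispatch, and time-based triggering on top of this ordering step.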

The centralized DAG scheduler seems like a pretty important part. Could you please explain why or why not? :) Is Metaflow not used in Netflix suggestions? and then export that to an Airflow/etc. compatible format? re: 3. > I wouldn’t qualify Metaflow as anti-UI. I am still missing well-established standards for data formats, workflow definitions, and project descriptions; hopefully open-source ninjas will deliver on this front before proprietary pirates destroy the field with progress-inhibiting closed things.

This isn't a scheduler that will trigger 'flows' though, is it? Hi omarhaneef, Is there a reason to use this over DVC[1], which is language- and framework-agnostic and supports a large number of storage backends? Do you use more traditional monitoring that will alert you somehow if a workflow fails, or if a workflow hasn't run at all for X hrs/days?

We support AWS today but technically other clouds could be supported too. Integration with common packages for Data Science: PyTorch, Ignite, pandas, OpenCV. This allows you to resume workflows, reproduce past results, and inspect anything about the workflow e.g.
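The resume behavior mentioned above can be sketched in plain Python. This is an illustration under stated assumptions, not Metaflow's implementation or API. The `run_flow` helper, the step names, and the pickle-file layout are all hypothetical. The idea is that each step's output is persisted, so a later run reuses stored results and executes only the missing steps.

```python
import pickle
import tempfile
from pathlib import Path


def run_flow(steps, store_dir):
    """Run steps in order, reusing any result already persisted in store_dir."""
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    executed, value = [], None
    for name, func in steps:
        artifact = store / f"{name}.pkl"
        if artifact.exists():
            # Step finished in an earlier run: load its stored output.
            value = pickle.loads(artifact.read_bytes())
        else:
            value = func(value)
            artifact.write_bytes(pickle.dumps(value))
            executed.append(name)
    return value, executed


# Hypothetical three-step flow; each step receives the previous step's output.
steps = [
    ("load", lambda _: [1, 2, 3]),
    ("square", lambda xs: [x * x for x in xs]),
    ("total", lambda xs: sum(xs)),
]

with tempfile.TemporaryDirectory() as tmp:
    # First run stops before "total", as if the flow had been interrupted.
    first_result, first_executed = run_flow(steps[:2], tmp)
    # Second run resumes: "load" and "square" come from the store.
    final_result, resumed_executed = run_flow(steps, tmp)
```

Because every intermediate value stays in the store, the same files can also be opened later to inspect what a past run produced.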

), Package dependencies which are not used in many cases (e.g. It would be great to have a scheduler and monitoring UI that are equally lightweight. Happy to answer any questions! Is it true that, right now, running a DAG "every day at 3am UTC" requires an external service?

> It's a reasonable orchestration engine. Looking forward to exploring it. Supports an automatic pipeline-resuming option using the intermediate data files stored locally or in the cloud (AWS, GCP, Azure) or databases as defined in. https://github.com/quantumblacklabs/kedro. We are a bit on the fence about it internally. If I search "Scheduler" in your docs, the top result is a roadmap item, and searching "Cron" turns up nothing. Their flexibility and customizability are unbeatable. Thanks for taking a look at Metaflow. Metaflow is a new workflow tool developed by a team at Netflix. Why not use Secrets Manager for this? Thanks for open sourcing this! This article compares open-source Python packages for pipeline/workflow development: Airflow, Luigi, Gokart, Metaflow, Kedro, PipelineX. https://github.com/janushendersonassetallocation/loman/tree/... https://docs.metaflow.org/internals-of-metaflow/technical-ov... https://spark.apache.org/docs/latest/ml-guide.html, https://docs.metaflow.org/metaflow/dependencies. MLflow - An open source machine learning platform.

Can you point us to the offending links? Analogously, I imagine there are initial frictions when moving to Metaflow.

I have used Airflow in the past, and it seems they have addressed various pain points with this new library. The first part of your understanding matches my expectation too. Dynamic Workflows 8.

What would be involved in getting it to work with a different cloud? I read over the page a couple of times, and couldn’t deduce that, and I think it’s a really enticing feature!

Prefect is built by the Airflow core devs after they took their initial learnings and built something new.

- We have spent time and effort in keeping the API surface area clean and highly usable. This is encapsulated by their "Datastore" model, which can persist flow code, config, and data locally or in S3. The fact that Metaflow works directly in Python piques my interest.

For tracking the execution of production runs, we have historically relied on the UI of the scheduler itself (Meson). What about Databricks made you abandon sklearn? Yes - definitely think so. Lean project template compared with pure Kedro.

