Best practices for writing Airflow DAGs
tags: #airflow
- Idempotency:
  - For a given input, running the program once has the same effect as running it multiple times. A good idea is to make every task atomic and idempotent.
  - Idempotency paves the way for one of Airflow’s most useful features: retries.
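As a rough illustration of an atomic, idempotent task body, the plain-Python sketch below writes one day's output to a location derived from the run's date and overwrites it on every run. The function name `write_daily_partition`, the `dt=<date>/rows.json` layout, and the sample rows are assumptions made for this example, not part of Airflow.

```python
import json
import tempfile
from pathlib import Path


def write_daily_partition(base_dir: Path, logical_date: str, rows: list[dict]) -> Path:
    """Write one day's rows to a file named after the run's date.

    The target is overwritten, never appended to, so a retry or a manual
    re-run leaves exactly the same file behind (idempotent).
    """
    target = base_dir / f"dt={logical_date}" / "rows.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first and then swap it in, so a task killed
    # midway does not leave a half-written partition behind (atomic).
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(rows, sort_keys=True))
    tmp.replace(target)
    return target


# Running the "task" twice for the same date produces the same single file.
base = Path(tempfile.mkdtemp())
rows = [{"id": 1, "amount": 10}]
first = write_daily_partition(base, "2024-01-01", rows).read_text()
second = write_daily_partition(base, "2024-01-01", rows).read_text()
```

An append-style write would instead duplicate rows on every retry, which is why overwriting a date-keyed target is the usual choice here.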
- Set retries:
  - Tasks can be killed off unexpectedly. When this happens, a zombie process may show up in the Airflow logs.
  - Retries can be set at different levels, in the following order of precedence:
    - Tasks: pass the `retries` parameter to the task operator.
    - DAGs: include `retries` in the DAG’s `default_args` object.
    - Deployments: set the environment variable `AIRFLOW__CORE__DEFAULT_TASK_RETRIES`.
  - Setting `retries` to `2` will protect a task from most problems common to distributed environments.
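The precedence between the three levels can be modelled in a few lines of plain Python. This is only a sketch of the rule described in these notes, not Airflow's own resolution code: `effective_retries` is a hypothetical helper, and the fallback of `0` when nothing is set is assumed to match Airflow's default configuration.

```python
import os
from typing import Any, Mapping, Optional


def effective_retries(
    task_retries: Optional[int] = None,
    default_args: Optional[Mapping[str, Any]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Resolve retries as: task level > DAG default_args > deployment."""
    env = os.environ if env is None else env
    # 1. Task level: retries=... passed directly to the operator.
    if task_retries is not None:
        return task_retries
    # 2. DAG level: the "retries" key in the DAG's default_args.
    if default_args and "retries" in default_args:
        return int(default_args["retries"])
    # 3. Deployment level: environment variable (assumed 0 when unset).
    return int(env.get("AIRFLOW__CORE__DEFAULT_TASK_RETRIES", "0"))


deployment = {"AIRFLOW__CORE__DEFAULT_TASK_RETRIES": "1"}
from_task = effective_retries(3, {"retries": 2}, deployment)
from_dag = effective_retries(None, {"retries": 2}, deployment)
from_deployment = effective_retries(None, {}, deployment)
```

In a real DAG file the same three inputs would be the operator's `retries` argument, the `default_args` dictionary given to the DAG, and the environment of the Airflow deployment.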
- Using template fields, variables and macros
  - Bad practice: `yesterday = datetime.today() - timedelta(1)`
  - Good practice: `yesterday =`
    - because `datetime.today()` is relative to the current date, not the DAG execution date.
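The difference between the two approaches can be sketched in plain Python. Both function names below are made up for this illustration. The second function assumes the run's date arrives as a `YYYY-MM-DD` string, which is the format Airflow's `{{ ds }}` template variable renders to; inside a templated field, an expression such as `{{ macros.ds_add(ds, -1) }}` performs the same subtraction.

```python
from datetime import date, datetime, timedelta


def yesterday_from_wall_clock() -> str:
    # Bad: the result depends on when the code happens to run, so a retry
    # or backfill on a later day silently targets a different date.
    return (datetime.today() - timedelta(days=1)).date().isoformat()


def yesterday_from_logical_date(ds: str) -> str:
    # Good: the result depends only on the run's own date, so re-running
    # the same DAG run always yields the same value.
    return (date.fromisoformat(ds) - timedelta(days=1)).isoformat()


# Re-running the task for the 2024-03-01 run gives the same answer on any day.
stable = yesterday_from_logical_date("2024-03-01")
```

Deriving dates from the run rather than the clock is also what keeps the task idempotent across retries and backfills.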