Scheduling#
Because Cosmos uses Airflow to power scheduling, you can leverage Airflow's scheduling capabilities to schedule your dbt projects, including cron-based scheduling, timetables, and data-aware scheduling. For more information, see the Airflow documentation or the Astronomer documentation.
Time-Based Scheduling#
To schedule a dbt project on a time-based schedule, you can use Airflow’s scheduling options. For example, to run a dbt project every day starting on January 1, 2023, you can use the following DAG:
from datetime import datetime

from cosmos import DbtDag

jaffle_shop = DbtDag(
    # ...
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
)
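The schedule parameter also accepts a raw cron expression or one of Airflow's cron "presets" (strings like @daily that Airflow expands to a cron expression). The mapping below is an illustrative sketch, not a Cosmos API; any of these strings could be passed as the DbtDag schedule instead of @daily:

```python
# Illustrative mapping of Airflow's cron presets to the cron
# expressions they are equivalent to. A raw expression such as
# "0 6 * * 1-5" (06:00 on weekdays) is equally valid as a schedule.
CRON_PRESETS = {
    "@hourly": "0 * * * *",   # once an hour, on the hour
    "@daily": "0 0 * * *",    # once a day at midnight
    "@weekly": "0 0 * * 0",   # once a week, Sunday at midnight
    "@monthly": "0 0 1 * *",  # first day of the month at midnight
    "@yearly": "0 0 1 1 *",   # January 1 at midnight
}
```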
Data-Aware Scheduling#
By default, Cosmos emits Airflow Datasets when running dbt projects. This allows you to use Airflow’s data-aware scheduling capabilities to schedule your dbt projects. Cosmos emits datasets in the following format:
Dataset("DBT://{connection_id}/{project_name}/{model_name}")
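To make the URI scheme above concrete, here is a minimal sketch of how the three components are assembled into a dataset URI. The helper name and the example values (my_conn, project_one, my_model) are illustrative, not part of the Cosmos API:

```python
# Sketch: assembling a dataset URI in the
# "DBT://{connection_id}/{project_name}/{model_name}" format shown above.
def dbt_dataset_uri(connection_id: str, project_name: str, model_name: str) -> str:
    return f"DBT://{connection_id}/{project_name}/{model_name}"

uri = dbt_dataset_uri("my_conn", "project_one", "my_model")
# uri == "DBT://my_conn/project_one/my_model"
```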
For example, let's say you have:

- A dbt project (project_one) with a model called my_model that runs daily
- A second dbt project (project_two) with a model called my_other_model that you want to run immediately after my_model

Then, you can use Airflow's data-aware scheduling capabilities to schedule my_other_model to run after my_model. For example, you can use the following DAGs:
from datetime import datetime

from cosmos import DbtDag, get_dbt_dataset

project_one = DbtDag(
    # ...
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
)

project_two = DbtDag(
    # ...
    # data-aware (dataset) scheduling requires Airflow 2.4+
    schedule=[get_dbt_dataset("my_conn", "project_one", "my_model")],
    dbt_project_name="project_two",
)
In this scenario, project_one runs once a day and project_two runs immediately after project_one completes. You can view these dependencies in Airflow's UI.