How to create a Cylc CESM workflow from scratch

This section is useful if you want to learn how to create your own workflows from scratch, or if you want to learn more about how Cylc works.

In short, if you can run something on the command line or through the batch queue, Cylc can run it for you. All you need to do is tell Cylc what you want to run, when it should run, and how to run it.

Most users keep all of their Cylc “suites” (suite is Cylc's term for a single workflow) under one directory. The convention adopted at NCAR is to create a top-level Cylc suite directory in your home directory and then, for every suite you create, a separate subdirectory inside it. The CESM case name is typically used as the subdirectory name, but any name that makes sense to you will work. Inside that subdirectory, create a file named “suite.rc” (Cylc expects this exact name). This file tells Cylc everything about your workflow.
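As an illustration of this layout, here is a minimal Python sketch that creates a suite directory and an empty suite.rc file. The top-level directory name `cylc_suites` and the helper `create_suite_dir` are hypothetical; only the one-subdirectory-per-suite convention and the fixed file name come from the text above.

```python
from pathlib import Path


def create_suite_dir(casename: str, home: Path, top_dir: str = "cylc_suites") -> Path:
    """Create <home>/<top_dir>/<casename>/suite.rc and return its path.

    `top_dir` is a hypothetical name for the top-level suite directory;
    use whatever name your site has adopted.
    """
    suite_dir = home / top_dir / casename
    suite_dir.mkdir(parents=True, exist_ok=True)
    # Cylc expects this exact file name inside the suite directory.
    suite_rc = suite_dir / "suite.rc"
    suite_rc.touch(exist_ok=True)  # creates an empty file; existing contents are kept
    return suite_rc
```

For example, `create_suite_dir("helloworld", Path.home())` would create ~/cylc_suites/helloworld/suite.rc if it does not already exist.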

Below is a copy of a suite.rc file similar to the one run for CMIP6 production, shortened for simplicity. A suite.rc file tells Cylc everything about the tasks you want to run. These files are generated automatically by the CESM Workflow script, so users typically don’t have to worry about their contents. If you like, you can copy this example and use it as a starting point for your own Cylc suite.

 #!Jinja2
 {% set dates_atm_averages = ['0005-01-01','0010-01-01'] %}
 {% set dates_atm_diagnostics = ['0005-01-01','0010-01-01'] %}
 {% set dates_case_run = ['0003-01-01', '0005-01-01', '0007-01-01', '0009-01-01', '0011-01-01'] %}
 {% set dates_case_st_archive = ['0003-01-01', '0005-01-01', '0007-01-01', '0009-01-01', '0011-01-01'] %}
 {% set dates_ice_averages = ['0005-01-01','0010-01-01'] %}
 {% set dates_ice_diagnostics = ['0005-01-01','0010-01-01'] %}
 {% set dates_lnd_averages = ['0005-01-01','0010-01-01'] %}
 {% set dates_lnd_diagnostics = ['0005-01-01','0010-01-01'] %}
 {% set dates_ocn_averages = ['0005-01-01','0010-01-01'] %}
 {% set dates_ocn_diagnostics = ['0005-01-01','0010-01-01'] %}
 {% set dates_timeseriesL = ['0011-01-01'] %}
 {% set dates_xconform = ['0011-01-01'] %}
 {% set ATMDIAG_test_first_yr = [1,5] %}
 {% set ATMDIAG_test_nyrs = [5,5] %}
 {% set OCNDIAG_YEAR0 = [1,1] %}
 {% set OCNDIAG_YEAR1 = [5,10] %}
 {% set OCNDIAG_TSERIES_YEAR0 = [1,1] %}
 {% set OCNDIAG_TSERIES_YEAR1 = [5,10] %}
 {% set LNDDIAG_clim_first_yr_1 = [1,5] %}
 {% set LNDDIAG_trends_first_yr_1 = [1,5] %}
 {% set LNDDIAG_clim_num_yrs_1 = [5,5] %}
 {% set LNDDIAG_trends_num_yrs_1 = [5,5] %}
 {% set ICEDIAG_BEGYR_DIFF = [1,1] %}
 {% set ICEDIAG_ENDYR_DIFF = [5,10] %}
 {% set ICEDIAG_BEGYR_CONT = [1,1] %}
 {% set ICEDIAG_ENDYR_CONT = [5,10] %}
 {% set ICEDIAG_YRS_TO_AVG = [5,10] %}
 title = helloworld
 [cylc]
     [[environment]]
         MAIL_ADDRESS=johnsmith@ucar.edu,janesmith@ucar.edu
     [[event hooks]]
         shutdown handler = cylc email-suite
 [scheduling]
     [[dependencies]]
         graph = """
                 case_run_0003-01-01 => case_st_archive_0003-01-01
                 case_st_archive_0003-01-01 => case_run_0005-01-01
                 case_run_0005-01-01 => case_st_archive_0005-01-01
                 case_st_archive_0005-01-01 => atm_averages_0005-01-01 & ocn_averages_0005-01-01 & lnd_averages_0005-01-01 & ice_averages_0005-01-01 & case_run_0007-01-01
                 atm_averages_0005-01-01 => atm_diagnostics_0005-01-01 => atm_diagnostics_0005-01-01_post
                 ocn_averages_0005-01-01 => ocn_diagnostics_0005-01-01 => ocn_diagnostics_0005-01-01_post
                 lnd_averages_0005-01-01 => lnd_diagnostics_0005-01-01 => lnd_diagnostics_0005-01-01_post
                 ice_averages_0005-01-01 => ice_diagnostics_0005-01-01 => ice_diagnostics_0005-01-01_post
                 case_run_0007-01-01 => case_st_archive_0007-01-01
                 case_st_archive_0007-01-01 => case_run_0009-01-01
                 case_run_0009-01-01 => case_st_archive_0009-01-01
                 case_st_archive_0009-01-01 => case_run_0011-01-01
                 case_run_0011-01-01 => case_st_archive_0011-01-01
                 case_st_archive_0011-01-01 => atm_averages_0010-01-01 & ocn_averages_0010-01-01 & lnd_averages_0010-01-01 & ice_averages_0010-01-01 & timeseriesL_0011-01-01
                 atm_averages_0010-01-01 => atm_diagnostics_0010-01-01 => atm_diagnostics_0010-01-01_post
                 ocn_averages_0010-01-01 => ocn_diagnostics_0010-01-01 => ocn_diagnostics_0010-01-01_post
                 lnd_averages_0010-01-01 => lnd_diagnostics_0010-01-01 => lnd_diagnostics_0010-01-01_post
                 ice_averages_0010-01-01 => ice_diagnostics_0010-01-01 => ice_diagnostics_0010-01-01_post
                 timeseriesL_0011-01-01 => xconform_0011-01-01
                """

 [runtime]
     [[root]]
         [[[environment]]]
         {% for i in range(0,dates_atm_averages|length) %}
         [[atm_averages_{{dates_atm_averages[i]}} ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/; ./pp_config --set ATMDIAG_test_first_yr={{ATMDIAG_test_first_yr[i]}}; ./pp_config --set ATMDIAG_test_nyrs={{ATMDIAG_test_nyrs[i]}};  /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/atm_averages
         [[[job]]]
                 method = pbs
                 execution time limit = PT12H
         [[[directives]]]
                 -N = averages
                 -q = regular
                 -l = select=4:ncpus=18:mpiprocs=18
                 -A = ACCT00099
         [[[event hooks]]]
                 started handler = cylc email-suite
                 succeeded handler = cylc email-suite
                 failed handler = cylc email-suite
         {% endfor %}

         {% for i in range(0,dates_atm_diagnostics|length) %}
         [[atm_diagnostics_{{dates_atm_diagnostics[i]}} ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld; /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/atm_diagnostics
         [[[job]]]
                 method = pbs
                 execution time limit = PT12H
         [[[directives]]]
                 -N = diagnostics
                 -q = regular
                 -l = select=1:ncpus=18:mpiprocs=18
                 -A = ACCT00099
         [[[event hooks]]]
                 started handler = cylc email-suite
                 succeeded handler = cylc email-suite
                 failed handler = cylc email-suite
         {% endfor %}

         {% for i in range(0,dates_atm_diagnostics|length) %}
         [[atm_diagnostics_{{dates_atm_diagnostics[i]}}_post ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/; /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/copy_html
         {% endfor %}

         {% for i in range(0,dates_case_run|length) %}
         [[case_run_{{dates_case_run[i]}} ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld; /gpfs/fs1/work/cmip6/cases/DECK/helloworld/case.run.cylc
         [[[job]]]
                 method = pbs
                 execution time limit = PT12H
                 execution retry delays = PT30S, PT120S, PT600S
         [[[directives]]]
                 -A = ACCT00099
                 -q = regular
                 -N = helloworld.run
                 -r = n
                 -j = oe
                 -S = /bin/bash
                 -l = select=141:ncpus=36:mpiprocs=12:ompthreads=3
         [[[event hooks]]]
                 started handler = cylc email-suite
                 succeeded handler = cylc email-suite
                 failed handler = cylc email-suite
         {% endfor %}

         {% for i in range(0,dates_case_st_archive|length) %}
         [[case_st_archive_{{dates_case_st_archive[i]}} ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld; /gpfs/fs1/work/cmip6/cases/DECK/helloworld/case.st_archive.cylc
         [[[job]]]
                 method = pbs
                 execution time limit = PT1H
         [[[directives]]]
                 -A = ACCT00099
                 -q = regular
                 -N = helloworld.st_archive
                 -r = n
                 -j = oe
                 -S = /bin/bash
                 -l = select=1:mpiprocs=1:ompthreads=1
         [[[event hooks]]]
                 started handler = cylc email-suite
                 succeeded handler = cylc email-suite
                 failed handler = cylc email-suite
         {% endfor %}

         {% for i in range(0,dates_ice_averages|length) %}
         [[ice_averages_{{dates_ice_averages[i]}} ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/; ./pp_config --set ICEDIAG_BEGYR_DIFF={{ICEDIAG_BEGYR_DIFF[i]}}; ./pp_config --set ICEDIAG_ENDYR_DIFF={{ICEDIAG_ENDYR_DIFF[i]}}; ./pp_config --set ICEDIAG_BEGYR_CONT={{ICEDIAG_BEGYR_CONT[i]}}; ./pp_config --set ICEDIAG_ENDYR_CONT={{ICEDIAG_ENDYR_CONT[i]}}; ./pp_config --set ICEDIAG_YRS_TO_AVG={{ICEDIAG_YRS_TO_AVG[i]}};  /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/ice_averages
         [[[job]]]
                 method = pbs
                 execution time limit = PT12H
         [[[directives]]]
                 -N = averages
                 -q = regular
                 -l = select=4:ncpus=4:mpiprocs=4
                 -A = ACCT00099
         [[[event hooks]]]
                 started handler = cylc email-suite
                 succeeded handler = cylc email-suite
                 failed handler = cylc email-suite
         {% endfor %}

         {% for i in range(0,dates_ice_diagnostics|length) %}
         [[ice_diagnostics_{{dates_ice_diagnostics[i]}} ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld; /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/ice_diagnostics
         [[[job]]]
                 method = pbs
                 execution time limit = PT12H
         [[[directives]]]
                 -N = diagnostics
                 -q = regular
                 -l = select=1:ncpus=8:mpiprocs=8
                 -A = ACCT00099
         [[[event hooks]]]
                 started handler = cylc email-suite
                 succeeded handler = cylc email-suite
                 failed handler = cylc email-suite
         {% endfor %}

         {% for i in range(0,dates_ice_diagnostics|length) %}
         [[ice_diagnostics_{{dates_ice_diagnostics[i]}}_post ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/; /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/copy_html
         {% endfor %}

         {% for i in range(0,dates_lnd_averages|length) %}
         [[lnd_averages_{{dates_lnd_averages[i]}} ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/; ./pp_config --set LNDDIAG_clim_first_yr_1={{LNDDIAG_clim_first_yr_1[i]}}; ./pp_config --set LNDDIAG_trends_first_yr_1={{LNDDIAG_trends_first_yr_1[i]}}; ./pp_config --set LNDDIAG_clim_num_yrs_1={{LNDDIAG_clim_num_yrs_1[i]}}; ./pp_config --set LNDDIAG_trends_num_yrs_1={{LNDDIAG_trends_num_yrs_1[i]}};  /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/lnd_averages
         [[[job]]]
                 method = pbs
                 execution time limit = PT12H
         [[[directives]]]
                 -N = averages
                 -q = regular
                 -l = select=4:ncpus=18:mpiprocs=18
                 -A = ACCT00099
         [[[event hooks]]]
                 started handler = cylc email-suite
                 succeeded handler = cylc email-suite
                 failed handler = cylc email-suite
         {% endfor %}

         {% for i in range(0,dates_lnd_diagnostics|length) %}
         [[lnd_diagnostics_{{dates_lnd_diagnostics[i]}} ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld; /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/lnd_diagnostics
         [[[job]]]
                 method = pbs
                 execution time limit = PT12H
         [[[directives]]]
                 -N = diagnostics
                 -q = regular
                 -l = select=1:ncpus=16:mpiprocs=16
                 -A = ACCT00099
         [[[event hooks]]]
                 started handler = cylc email-suite
                 succeeded handler = cylc email-suite
                 failed handler = cylc email-suite
         {% endfor %}

         {% for i in range(0,dates_lnd_diagnostics|length) %}
         [[lnd_diagnostics_{{dates_lnd_diagnostics[i]}}_post ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/; /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/copy_html
         {% endfor %}

         {% for i in range(0,dates_ocn_averages|length) %}
         [[ocn_averages_{{dates_ocn_averages[i]}} ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/; ./pp_config --set OCNDIAG_YEAR0={{OCNDIAG_YEAR0[i]}}; ./pp_config --set OCNDIAG_YEAR1={{OCNDIAG_YEAR1[i]}}; ./pp_config --set OCNDIAG_TSERIES_YEAR0={{OCNDIAG_TSERIES_YEAR0[i]}}; ./pp_config --set OCNDIAG_TSERIES_YEAR1={{OCNDIAG_TSERIES_YEAR1[i]}};  /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/ocn_averages
         [[[job]]]
                 method = pbs
                 execution time limit = PT12H
         [[[directives]]]
                 -N = averages
                 -q = regular
                 -l = select=4:ncpus=4:mpiprocs=4
                 -A = ACCT00099
         [[[event hooks]]]
                 started handler = cylc email-suite
                 succeeded handler = cylc email-suite
                 failed handler = cylc email-suite
         {% endfor %}

         {% for i in range(0,dates_ocn_diagnostics|length) %}
         [[ocn_diagnostics_{{dates_ocn_diagnostics[i]}} ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld; /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/ocn_diagnostics
         [[[job]]]
                 method = pbs
                 execution time limit = PT12H
         [[[directives]]]
                 -N = diagnostics
                 -q = regular
                 -l = select=1:ncpus=16:mpiprocs=16
                 -A = ACCT00099
         [[[event hooks]]]
                 started handler = cylc email-suite
                 succeeded handler = cylc email-suite
                 failed handler = cylc email-suite
         {% endfor %}

         {% for i in range(0,dates_ocn_diagnostics|length) %}
         [[ocn_diagnostics_{{dates_ocn_diagnostics[i]}}_post ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/; /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/copy_html
         {% endfor %}

         {% for i in range(0,dates_timeseriesL|length) %}
         [[timeseriesL_{{dates_timeseriesL[i]}} ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld; /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/timeseriesL
         [[[job]]]
                 method = pbs
                 execution time limit = PT12H
         [[[directives]]]
                 -N = timeseries
                 -q = regular
                 -l = select=16:ncpus=9:mpiprocs=9
                 -A = ACCT00099
         [[[event hooks]]]
                 started handler = cylc email-suite
                 succeeded handler = cylc email-suite
                 failed handler = cylc email-suite
         {% endfor %}

         {% for i in range(0,dates_xconform|length) %}
         [[xconform_{{dates_xconform[i]}} ]]
         script = cd /gpfs/fs1/work/cmip6/cases/DECK/helloworld; /gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess/xconform
         [[[job]]]
                 method = pbs
                 execution time limit = PT12H
         [[[directives]]]
                 -N = xconform
                 -q = regular
                 -l = select=16:ncpus=4:mpiprocs=4
                 -A = ACCT00099
         [[[event hooks]]]
                 started handler = cylc email-suite
                 succeeded handler = cylc email-suite
                 failed handler = cylc email-suite
         {% endfor %}

Once this file is set up, you need to register your suite. After it is registered, you can run Cylc commands from anywhere on the file system, and Cylc will know where your suite is and what it contains. To register your suite, run:

cylc register <whatever you want to name your suite>.suite /the/path/to/the/directory/containing/suite.rc/

You can name the suite whatever you like; by convention, it is named <casename>.suite so that it is easy to recognize. Whatever you name it, remember the name, because you will use it in every cylc and gcylc command. If you forget, you can find a list of all of your registered suites in your ~/cylc-run/ directory. Cylc creates this directory automatically; it holds all of the run-time information, including the suite’s database, which contains all of the status information.
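To check from a script instead, the following Python sketch lists the entries in ~/cylc-run. It assumes that each top-level subdirectory there belongs to one registered suite, which holds for simple names such as helloworld.suite; the helper name `registered_suites` is hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def registered_suites(run_dir: Optional[Path] = None) -> List[str]:
    """Return the sorted names of the subdirectories of ~/cylc-run.

    Assumes one top-level subdirectory per registered suite; hierarchical
    suite names (e.g. "group/name") would need a recursive walk instead.
    """
    run_dir = run_dir if run_dir is not None else Path.home() / "cylc-run"
    if not run_dir.is_dir():
        return []  # nothing registered yet
    return sorted(p.name for p in run_dir.iterdir() if p.is_dir())
```

Calling `registered_suites()` with no argument looks in your own ~/cylc-run.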

Here are some other useful commands while creating a new Cylc suite or modifying an existing one:

cylc graph <whatever you named your suite>.suite  # Creates a graphical representation of your workflow; helpful for checking that the dependencies are connected correctly.

cylc validate <whatever you named your suite>.suite  # Checks the suite definition for incorrect syntax and other configuration errors.
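The register, validate, and graph commands can also be scripted. The following is a minimal Python sketch, assuming the `cylc` executable is on your PATH. It builds the argument lists in the same form as the commands in this section and runs validate before graph, so syntax errors surface before you look at the graph. The helper names `cylc_argv` and `check_suite` are hypothetical.

```python
import shutil
import subprocess
from typing import List


def cylc_argv(command: str, suite_name: str, *extra: str) -> List[str]:
    """Build the argument list for `cylc <command> <suite_name> [extra...]`."""
    return ["cylc", command, suite_name, *extra]


def check_suite(suite_name: str) -> bool:
    """Run `cylc validate` and, only if it succeeds, `cylc graph`.

    Returns False without running anything if cylc is not on PATH.
    """
    if shutil.which("cylc") is None:
        return False
    # Validate first: there is little point drawing a graph that fails to parse.
    if subprocess.run(cylc_argv("validate", suite_name)).returncode != 0:
        return False
    subprocess.run(cylc_argv("graph", suite_name))
    return True
```

For example, `subprocess.run(cylc_argv("register", "helloworld.suite", "/path/to/suite/dir/"), check=True)` followed by `check_suite("helloworld.suite")` would register, validate, and graph a suite in one go. Each call waits for its command to finish.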

Line-by-line description of the above suite.rc file

Line 1 (#!Jinja2): Tells Cylc that this file uses Jinja2 templating, which makes the file easier to read and more compact.

Lines 2-13: These Jinja2 arrays list the dates for which each task is run. For each component, dates_*_averages must have the same number of entries as dates_*_diagnostics. For example, if dates_atm_averages has two entries, dates_atm_diagnostics must also have two entries.
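This length rule is easy to break when editing by hand. Here is a Python sketch of a pre-check, assuming the Jinja2 arrays have been copied into a plain dictionary; the dictionary layout and the function name are hypothetical, and this is not a check Cylc performs for you.

```python
from typing import Dict, List

COMPONENTS = ("atm", "ice", "lnd", "ocn")


def mismatched_components(arrays: Dict[str, List[str]]) -> List[str]:
    """Return components whose averages and diagnostics date lists differ in length."""
    bad = []
    for comp in COMPONENTS:
        # A component missing from the dictionary counts as two empty lists (consistent).
        averages = arrays.get(f"dates_{comp}_averages", [])
        diagnostics = arrays.get(f"dates_{comp}_diagnostics", [])
        if len(averages) != len(diagnostics):
            bad.append(comp)
    return bad


# Based on lines 2-13 of the example, with one ocn entry removed on purpose.
example = {
    "dates_atm_averages": ["0005-01-01", "0010-01-01"],
    "dates_atm_diagnostics": ["0005-01-01", "0010-01-01"],
    "dates_ocn_averages": ["0005-01-01", "0010-01-01"],
    "dates_ocn_diagnostics": ["0005-01-01"],
}
print(mismatched_components(example))  # ['ocn']
```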

Lines 14-28: These arrays are also part of the Jinja2 templating. They hold the values that need to be changed in the post-processing XML files for the diagnostics. Each array must have the same number of values as its corresponding arrays in lines 2-13. For example, if dates_atm_averages and dates_atm_diagnostics both have two entries, ATMDIAG_test_first_yr and ATMDIAG_test_nyrs must also have two entries.
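A companion sketch for this second rule, in Python. It maps each parameter array to a component by its name prefix (ATMDIAG_ to atm, and so on; this mapping is an assumption based on the names in the example) and reports arrays whose length differs from that component's dates_*_averages list. The function name is hypothetical.

```python
from typing import Dict, List

# Assumed mapping from variable-name prefix to component, based on the example names.
PREFIX_TO_COMPONENT = {"ATMDIAG_": "atm", "ICEDIAG_": "ice", "LNDDIAG_": "lnd", "OCNDIAG_": "ocn"}


def bad_parameter_arrays(arrays: Dict[str, list]) -> List[str]:
    """Return parameter arrays whose length differs from their component's date count."""
    bad = []
    for name, values in arrays.items():
        for prefix, comp in PREFIX_TO_COMPONENT.items():
            if name.startswith(prefix):
                dates = arrays.get(f"dates_{comp}_averages", [])
                if len(values) != len(dates):
                    bad.append(name)
    return sorted(bad)


example = {
    "dates_atm_averages": ["0005-01-01", "0010-01-01"],
    "ATMDIAG_test_first_yr": [1, 5],
    "ATMDIAG_test_nyrs": [5, 5],
    "dates_ice_averages": ["0005-01-01", "0010-01-01"],
    "ICEDIAG_YRS_TO_AVG": [5],  # deliberately one entry short
}
print(bad_parameter_arrays(example))  # ['ICEDIAG_YRS_TO_AVG']
```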

Line 29 (title = helloworld): The title of your workflow, or suite.

Line 30 ([cylc]): The section of the suite.rc file for settings that are not specific to any one task.

Line 31 ([[environment]]): The section where environment settings are specified.

Line 32: Sets the email addresses that will receive job update messages. Multiple addresses can be given as a comma-separated list.

Line 33 ([[event hooks]]): The section where you can set event hooks.

Line 34: Sends an email to the list when the Cylc suite shuts down, whether it was stopped manually or stopped after the workflow completed.

Line 35 ([scheduling]): The section that describes the scheduling of tasks.

Line 36 ([[dependencies]]): The section that describes the dependencies between all of the tasks.

Lines 37-57: These lines define the workflow “graph”. The most common syntax used in this section is:

task1 => task2  # run task1; after it finishes successfully, run task2

task1 => task2 & task3  # run task1; after it finishes successfully, run task2 and task3
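To see how these arrows translate into dependencies, here is a small Python sketch that turns graph lines into (upstream, downstream) edges and derives one valid run order with the standard-library graphlib module (Python 3.9+). It handles only the two forms above plus chains such as a => b => c; Cylc's full graph syntax (the | operator, trigger qualifiers such as :fail, cycling) is not covered, and this is not how Cylc itself parses the graph.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple


def parse_graph(text: str) -> List[Tuple[str, str]]:
    """Turn lines like 'a => b & c => d' into (upstream, downstream) edges."""
    edges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # Each '=>' separates a stage; '&' lists the tasks within a stage.
        stages = [[t.strip() for t in s.split("&")] for s in line.split("=>")]
        for upstream, downstream in zip(stages, stages[1:]):
            edges.extend((u, d) for u in upstream for d in downstream)
    return edges


def run_order(edges: List[Tuple[str, str]]) -> List[str]:
    """Return one valid order in which every task follows all of its dependencies."""
    deps: Dict[str, Set[str]] = {}
    for u, d in edges:
        deps.setdefault(d, set()).add(u)
        deps.setdefault(u, set())
    return list(TopologicalSorter(deps).static_order())


graph = """
    task1 => task2 & task3
    task2 => task4 => task5
"""
edges = parse_graph(graph)
print(edges)
# [('task1', 'task2'), ('task1', 'task3'), ('task2', 'task4'), ('task4', 'task5')]
```

More than one run order is usually valid (task3 can run any time after task1), so `run_order` returns just one of them.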

Lines 59-61: The start of the [runtime] section, which defines what each task executes and how it is executed.

Line 62: A Jinja2 for loop over the dates in dates_atm_averages. When Cylc processes the suite.rc file, the body of the loop is repeated once for each date. Without this loop, you would have to write out the same block by hand for every date.

Line 63: The unique name of the task being defined. The Jinja2 expression in the name inserts the date for the current loop index, so each generated task gets its own name, such as atm_averages_0005-01-01.
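To make the loop concrete, the following Python sketch reproduces what the Jinja2 loop on lines 62-63 generates: one section header per date. It uses plain string formatting, not Jinja2, so it illustrates the idea rather than Cylc's actual processing; the function name is hypothetical.

```python
from typing import List

dates_atm_averages = ["0005-01-01", "0010-01-01"]  # line 2 of the example


def expand_headers(task: str, dates: List[str]) -> List[str]:
    """Mimic `{% for i in range(0, dates|length) %} [[task_{{dates[i]}} ]]`."""
    return [f"[[{task}_{dates[i]}]]" for i in range(len(dates))]


for header in expand_headers("atm_averages", dates_atm_averages):
    print(header)
# [[atm_averages_0005-01-01]]
# [[atm_averages_0010-01-01]]
```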

Line 64: The commands to run, separated by semicolons. The first command changes into the case’s post-processing directory. The script then runs pp_config once per variable to change values in the env_diags_atm.xml file, and finally runs the averaging script.
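The script value follows a fixed pattern: cd into the post-processing directory, one pp_config --set call per variable, then the task's own script. Here is a Python sketch that assembles such a line for a given loop index. The function name `averages_script` is hypothetical; the paths and variable names are copied from the example.

```python
from typing import Dict, List

PP_DIR = "/gpfs/fs1/work/cmip6/cases/DECK/helloworld/postprocess"


def averages_script(pp_dir: str, settings: Dict[str, List[int]], i: int, task_script: str) -> str:
    """Build the semicolon-separated script line for loop index i."""
    commands = [f"cd {pp_dir}/"]
    # One pp_config call per variable, using the i-th value of each array.
    commands += [f"./pp_config --set {name}={values[i]}" for name, values in settings.items()]
    commands.append(f"{pp_dir}/{task_script}")
    return "; ".join(commands)


atm_settings = {"ATMDIAG_test_first_yr": [1, 5], "ATMDIAG_test_nyrs": [5, 5]}
print(averages_script(PP_DIR, atm_settings, 1, "atm_averages"))
```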

Line 65 ([[[job]]]): The section for job-specific settings.

Line 66: Sets the batch scheduler to PBS.

Line 67: This line sets the wallclock time to be 12 hours.
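Durations in this file, such as PT12H, PT1H, and PT30S, are written in ISO 8601 notation. The following Python sketch converts them to seconds and to the HH:MM:SS form that PBS walltime requests use. It deliberately handles only the hours/minutes/seconds subset (no days, weeks, or fractional values), and the function names are hypothetical.

```python
import re

# Time-only ISO 8601 durations: PT[nH][nM][nS], e.g. PT12H, PT1H30M, PT600S.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_seconds(text: str) -> int:
    """Convert a time-only ISO 8601 duration such as 'PT12H' to seconds."""
    text = text.strip()
    match = _DURATION.match(text)
    if match is None or text == "PT":
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def as_hhmmss(total: int) -> str:
    """Format a number of seconds as HH:MM:SS."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(duration_seconds("PT12H"), as_hhmmss(duration_seconds("PT12H")))  # 43200 12:00:00
```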

Line 68 ([[[directives]]]): Sets the directives to use in the job’s submission script. You can usually find them at the top of the post-processing script you want to run; for CESM itself, they are at the top of the .case.run script in your case directory.
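Because the directives usually come from a script's header, a small Python sketch can pull the #PBS lines out of such a header and print them in the suite.rc key = value form. The header text below is a made-up example modeled on the directives in this file, not the actual contents of .case.run. Repeated flags (for example, several -l lines) are kept as separate pairs so you can decide how to combine them. The function name is hypothetical.

```python
from typing import List, Tuple


def pbs_directives(script_text: str) -> List[Tuple[str, str]]:
    """Extract (flag, value) pairs from '#PBS -X value' lines in a script header."""
    pairs = []
    for line in script_text.splitlines():
        line = line.strip()
        if not line.startswith("#PBS"):
            continue
        parts = line[len("#PBS"):].split(None, 1)
        if parts:
            value = parts[1].strip() if len(parts) > 1 else ""
            pairs.append((parts[0], value))
    return pairs


header = """#!/bin/bash
#PBS -N helloworld.run
#PBS -q regular
#PBS -l select=141:ncpus=36:mpiprocs=12:ompthreads=3
#PBS -A ACCT00099
"""
for flag, value in pbs_directives(header):
    print(f"{flag} = {value}")  # e.g. "-N = helloworld.run"
```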

Line 69: Sets the name of the job.

Line 70: Sets the name of the queue to run in.

Line 71: Sets the resource sizes for the job.
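The -l select=... values use the PBS Pro chunk syntax: the number after select= is the number of chunks, ncpus and mpiprocs are per chunk, and ompthreads is the number of OpenMP threads per MPI process. The following Python sketch multiplies these out, which is a quick way to confirm, for example, that the case_run request gives one core per OpenMP thread. It handles only a single chunk specification with integer values (no + joined specs, no mem=...), treats a missing ompthreads as 1 as a sketch assumption, and uses hypothetical function names.

```python
from typing import Dict


def parse_select(spec: str) -> Dict[str, int]:
    """Parse 'select=N:name=value:...' into {'chunks': N, name: value, ...}.

    Assumes a single chunk specification whose resource values are integers.
    """
    if not spec.startswith("select=") or "+" in spec:
        raise ValueError(f"unsupported select spec: {spec!r}")
    first, *resources = spec[len("select="):].split(":")
    parsed = {"chunks": int(first)}
    for item in resources:
        name, value = item.split("=", 1)
        parsed[name] = int(value)
    return parsed


def summarize(spec: str) -> Dict[str, int]:
    """Total cores, MPI ranks, and OpenMP threads requested by a select spec."""
    r = parse_select(spec)
    ranks = r["chunks"] * r["mpiprocs"]  # mpiprocs is per chunk
    return {
        "cores": r["chunks"] * r["ncpus"],  # ncpus is per chunk
        "mpi_ranks": ranks,
        # ompthreads is per MPI rank; treated as 1 when absent (sketch assumption)
        "threads": ranks * r.get("ompthreads", 1),
    }


print(summarize("select=141:ncpus=36:mpiprocs=12:ompthreads=3"))
# {'cores': 5076, 'mpi_ranks': 1692, 'threads': 5076}
```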

Line 72: Sets the account to submit the job under.

Line 73: Event hooks for this particular job.

Lines 74-76: Email the list when this task starts, succeeds, or fails.

Line 77: Ends the for loop that creates a copy of this task for each date specified.

Most of the remaining task sections follow the same format: the same Jinja2 loop creates one task per date, followed by job, directives, and event hooks sections. The _post tasks are also similar to one another; one of them is described below.

Lines 96-99: This task copies the post-processing HTML output to a web host. Unlike the other tasks, it has no job, directives, or event hooks sections, because it runs on the command line rather than through the queue and needs no queue submission settings. It sets only the script variable, which lists the commands to run.

Another unique setting appears in the case_run_{{dates_case_run[i]}} section:

Line 107 (execution retry delays): Lets Cylc resubmit your task if it fails (exits with a non-zero status). In this case, Cylc will try to run CESM up to 3 more times and mark the task as failed only if every attempt fails. The first retry waits 30 seconds before resubmitting; the second and third wait 2 minutes and 10 minutes, respectively. This was used as a fault-tolerance mechanism: spacing the resubmissions out gives transient machine problems time to clear.
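Here is a Python sketch of the schedule these delays imply: one initial attempt plus one retry per listed delay, each retry waiting for its delay before resubmitting. It includes a parser for the time-only ISO 8601 subset used in this file and is an illustration, not Cylc's own implementation; the function names are hypothetical.

```python
import re
from typing import List, Tuple

_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def to_seconds(text: str) -> int:
    """Convert a time-only ISO 8601 duration (e.g. 'PT120S') to seconds."""
    text = text.strip()
    match = _DURATION.match(text)
    if match is None or text == "PT":
        raise ValueError(f"unsupported duration: {text!r}")
    h, m, s = (int(g) if g else 0 for g in match.groups())
    return h * 3600 + m * 60 + s


def retry_schedule(delays: str) -> List[Tuple[int, int]]:
    """Return (attempt number, seconds waited before that attempt) for every attempt."""
    waits = [to_seconds(d) for d in delays.split(",")]
    # Attempt 1 is the original submission and has no delay.
    return [(1, 0)] + [(n + 2, w) for n, w in enumerate(waits)]


print(retry_schedule("PT30S, PT120S, PT600S"))
# [(1, 0), (2, 30), (3, 120), (4, 600)]
```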