2. Spec Options
All of the following keys are optional. If the user spec omits a key, the value from the default specs is used.
2.1. General Options
-
name
- Type
String
Name of the spec.
-
description
- Type
String
Description of the spec.
2.2. Experiments
-
experiment_default
- Type
Mapping
The default options for each experiment in experiments. Available options are:
-
experiment_default.name
- Type
String
- Default
<default>
Name of the experiment.
-
experiment_default.envs
- Type
Mapping
- Format
"<Environment variable name 1>": "<Value 1>"
"<Environment variable name 2>": "<Value 2>"
...
- Default
{}
- Example
experiment_default:
  envs:
    LOCAL_DIR: $HOME/test
    REMOTE_FILE: $HOME/test.log
Default environment variables.
These environment variables are added to every experiment regardless of the command type, which makes this a convenient place for commonly used environment variables.
-
experiment_default.depends_on
- Type
Mapping
- Format
"<Command type 1>":
  - "<Name of other experiment 1>"
  - "<Name of other experiment 2>"
  - ...
"<Command type 2>":
  - ...
...
- Default
{}
- Example
experiment_default:
  depends_on:
    run:
      - Installation
      - Configuration
A list of experiments that this experiment depends on.
Noodles checks whether all experiments in depends_on have already been deployed successfully. If any of them has not been deployed, Noodles skips this experiment and retries it in the next deployment round.
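The dependency check described above can be sketched roughly as follows. This is a minimal illustration, not Noodles' actual code: the function name `dependencies_met` and the `deployed` set are hypothetical, and a command type with no entry in `depends_on` is assumed to have no dependencies.

```python
def dependencies_met(depends_on, command_type, deployed):
    """Return True if every experiment listed for this command type is deployed.

    depends_on   -- mapping of command type to a list of experiment names
    command_type -- e.g. "run", "stop", "download"
    deployed     -- set of experiment names already deployed (hypothetical bookkeeping)
    """
    required = depends_on.get(command_type, [])
    return all(name in deployed for name in required)


depends_on = {"run": ["Installation", "Configuration"]}

# Only one of the two dependencies is deployed, so the experiment would be
# skipped and retried in the next deployment round.
print(dependencies_met(depends_on, "run", {"Installation"}))  # False
print(dependencies_met(depends_on, "run", {"Installation", "Configuration"}))  # True
```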
-
experiment_default.requirements
- Type
Mapping
- Format
"<Command type 1>":
  - "<Requirement ID 1>": "<Expression 1>"
  - "<Requirement ID 2>": "<Expression 2>"
  - "static:<Requirement ID 3>": "<Expression 3>"
  - "dynamic:<Requirement ID 4>": "<Expression 4>"
  - ...
"<Command type 2>":
  - ...
...
- Default
{}
- Example
experiment_default:
  requirements:
    run:
      - cpu_usage: "<=0.7"
      - cpu_load: "<=32"
      - gpu_usage: "<=0.5"
      - static:free_quota: ">=0.2"
      - dynamic:has_lock_file: "==No"
    stop:
      - has_lock_file: "==Yes"
    download:
      - count_output_files: "!=0"
A list of requirements.
Each requirement ID (e.g., cpu_usage) is mapped to a group of commands specified in requirements. The mapped commands are run on each server in \(S\) (see Deploy Experiments to Servers). The resulting metric is then compared against the expression, which has the form <Operator><Value>.
The available operators are:
==
!=
<
>
<=
>=
Noodles tries to convert the value that follows the operator using the Python standard-library function ast.literal_eval(). If the conversion fails, the value is used in its original string form.
Each requirement ID can carry an extra prefix, static: or dynamic:. With the static: prefix, Noodles does not check the requirement again if another experiment has already checked it in the same deployment round. With the dynamic: prefix, the requirement is checked for every experiment. When the prefix is omitted, the dynamic: behavior is used. See Check Requirements for the procedure.
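As an illustration of how an `<Operator><Value>` expression might be evaluated against a metric, here is a minimal sketch. The helper names `parse_value` and `check` are hypothetical and not part of Noodles. The sketch assumes the operator is always at the start of the expression and that the `static:` or `dynamic:` prefix has already been stripped from the requirement ID.

```python
import ast
import operator

# Two-character operators come first so "<=" is not mistaken for "<".
OPERATORS = [
    ("==", operator.eq),
    ("!=", operator.ne),
    ("<=", operator.le),
    (">=", operator.ge),
    ("<", operator.lt),
    (">", operator.gt),
]


def parse_value(text):
    """Convert text with ast.literal_eval(); keep the original string if that fails."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def check(metric, expression):
    """Compare a metric against an "<Operator><Value>" expression."""
    for symbol, compare in OPERATORS:
        if expression.startswith(symbol):
            return compare(metric, parse_value(expression[len(symbol):]))
    raise ValueError(f"Unknown operator in expression: {expression!r}")


print(check(0.35, "<=0.7"))  # True: 0.7 is converted to a float
print(check(40, "<=32"))     # False
print(check("No", "==No"))   # True: "No" is not a Python literal, so it stays a string
```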
-
experiment_default.commands
- Type
Mapping
- Format
"<Command type 1>": "<Command 1>"
"<Command type 2>":
  - "<Command A>"
  - "<Command B>"
  - "local:<Command C>"
  - "remote:<Command D>"
  - ...
...
- Default
{}
- Example
experiment_default:
  commands:
    run: "touch exp.lock && ./exp1.sh $PARAMETER"
    stop:
      - pkill exp1
      - rm exp.lock
    download:
      - local:scp $NOODLES_SERVER_AUTHORITY:$REMOTE_FILE $LOCAL_DIR
Commands to be run on a server.
Commands will only be executed on servers when the above requirements are met.
Each command can carry an extra prefix, local: or remote:, to run the command on the local or the remote machine respectively. If no prefix is specified, the command is deployed to the remaining servers selected from servers.
Adding local: to a command is useful for checking problems on the local machine when the command fails on a remote server.
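To make the prefix convention concrete, the sketch below splits a command into its target and the command text. The helper names `normalize` and `split_target` are hypothetical. Treating an unprefixed command as "remote" is a simplification of "deployed to the selected server" and is an assumption of this sketch.

```python
def normalize(commands):
    """Accept a single command string or a list of commands, as the Format allows."""
    return [commands] if isinstance(commands, str) else list(commands)


def split_target(command):
    """Return (target, command text); target is "remote" when no prefix is given."""
    for prefix in ("local:", "remote:"):
        if command.startswith(prefix):
            return prefix[:-1], command[len(prefix):]
    return "remote", command


download = normalize("local:scp $NOODLES_SERVER_AUTHORITY:$REMOTE_FILE $LOCAL_DIR")
for command in download:
    print(split_target(command))
# ('local', 'scp $NOODLES_SERVER_AUTHORITY:$REMOTE_FILE $LOCAL_DIR')

for command in normalize(["pkill exp1", "rm exp.lock"]):
    print(split_target(command))
# ('remote', 'pkill exp1')
# ('remote', 'rm exp.lock')
```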
-
experiment_default.write_outputs
- Type
Mapping
- Format
"<Command type 1>":
  stdout_to: "<Path to write STDOUT output>"
  stderr_to: "<Path to write STDERR output>"
"<Command type 2>":
  stdout_to: ...
  stderr_to: ...
...
- Default
{}
- Example
experiment_default:
  write_outputs:
    run:
      stdout_to: $LOCAL_DIR/exp1.stdout.log
      stderr_to: $LOCAL_DIR/exp1.stderr.log
The paths to write STDOUT and STDERR outputs.
By default, STDOUT is written to the terminal, and STDERR is written to the terminal only when it is nonempty.
Users can redirect the outputs to files by specifying paths in the options stdout_to and/or stderr_to (for STDOUT and STDERR respectively) under the option write_outputs. If stderr_to is specified, STDERR is written whether or not it is empty.
-
before_all_experiments
- Type
List
- Format
Same as
experiments
- Default
[]
- Example
See example in
experiments
A list of experiments to run before running experiments.
-
experiments
- Type
List
- Format
- "<Experiment 1>"
- "<Experiment 2>"
- ...
- Default
[]
- Example
experiments:
  - name: Experiment 1
    envs:
      LOCAL_DIR: $HOME/exp1
      REMOTE_FILE: $HOME/exp1.log
      PARAMETER: 3
    depends_on:
      run:
        - Installation
        - Configuration
    requirements:
      run:
        - cpu_usage: "<=0.7"
        - cpu_load: "<=32"
        - gpu_usage: "<=0.5"
        - free_quota: ">=0.2"
        - has_lock_file: "==No"
      stop:
        - has_lock_file: "==Yes"
      download:
        - count_output_files: "!=0"
    commands:
      run: "touch exp.lock && ./exp1.sh $PARAMETER"
      stop:
        - pkill exp1
        - rm exp.lock
      download:
        - local:scp $NOODLES_SERVER_AUTHORITY:$REMOTE_FILE $LOCAL_DIR
    write_outputs:
      run:
        stdout_to: $LOCAL_DIR/exp1.stdout.log
        stderr_to: $LOCAL_DIR/exp1.stderr.log
  - name: Experiment 2
    envs:
      LOCAL_DIR: $HOME/exp2
      REMOTE_FILE: $HOME/exp2.log
      PARAMETER: 5
    ...
  ...
A list of main experiments.
Each experiment overrides the options in experiment_default. See experiment_default.
-
after_all_experiments
- Type
List
- Format
Same as
experiments
- Default
[]
- Example
See example in
experiments
A list of experiments to run after running
experiments
.
2.3. Servers
-
server_default
- Type
Mapping
- Example
server_default:
  name: <default>
  private_key_path: $HOME/.ssh/id_rsa
  port: 22
  username: user1
  hostname: server1.example.com
The default options for each server in servers. Available options are:
-
server_default.name
- Type
String
- Default
<default>
Default name of the server.
-
server_default.private_key_path
- Type
String
- Default
$HOME/.ssh/id_rsa
Path to the private key on the local machine.
-
server_default.port
- Type
Integer
- Default
22
Port to connect to.
-
server_default.username
- Type
String
- Default
$USER
Username on the server (e.g.,
user1
).
-
server_default.hostname
- Type
String
- Default
localhost
Hostname of the server (e.g., example.com, 123.123.123.123).
If the hostname is the special value localhost, the commands are run on the local machine without ssh.
-
servers
- Type
List
- Format
- "<Server 1>"
- "<Server 2>"
- ...
- Default
[]
- Example
servers:
  - name: Server 1
    username: user1
    hostname: server1.example.com
  - name: Server 2
    private_key_path: temp/id_rsa
    port: 64
    username: user2
    hostname: 123.123.123.123
  - name: Local server
    username: $USER
    hostname: localhost
A list of servers.
Each server overrides the options in server_default. See server_default.
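One way to picture the override is a key-by-key overlay of each server entry on the defaults. The sketch below is an assumption about the merge semantics (a shallow merge), not a description of Noodles internals, and `resolve_server` is a hypothetical name. The default values are the ones listed under server_default, with the private key path written as `$HOME/.ssh/id_rsa` as in the server_default example.

```python
# Defaults as listed in the server_default options.
SERVER_DEFAULT = {
    "name": "<default>",
    "private_key_path": "$HOME/.ssh/id_rsa",
    "port": 22,
    "username": "$USER",
    "hostname": "localhost",
}


def resolve_server(server, default=SERVER_DEFAULT):
    """Overlay one entry of `servers` on the defaults (assumed shallow merge)."""
    return {**default, **server}


server = resolve_server(
    {"name": "Server 1", "username": "user1", "hostname": "server1.example.com"}
)
print(server["port"])              # 22, taken from the defaults
print(server["private_key_path"])  # $HOME/.ssh/id_rsa, taken from the defaults
print(server["hostname"])          # server1.example.com, set by the server entry
```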
2.4. Requirements
-
requirements
- Type
Mapping
- Default
See requirements in default specs.
- Format
"<Requirement ID 1>": "<Command 1>"
"<Requirement ID 2>":
  - "<Command A>"
  - "<Command B>"
  - ...
...
- Example
requirements:
  # Check whether the file exists
  has_file: "[ -f $PATH ] && echo -n Yes || echo -n No"
  # Get average CPU usage over 3 seconds (Output: Three floats between 0-100 (%)) (Reference: https://askubuntu.com/a/941997)
  cpu_usage:
    - "(grep 'cpu ' /proc/stat;sleep 0.1;grep 'cpu ' /proc/stat) | awk -v RS='' '{print ($13-$2+$15-$4)*100/($13-$2+$15-$4+$16-$5)}'"
    - "sleep 1.5"
    - "(grep 'cpu ' /proc/stat;sleep 0.1;grep 'cpu ' /proc/stat) | awk -v RS='' '{print ($13-$2+$15-$4)*100/($13-$2+$15-$4+$16-$5)}'"
    - "sleep 1.5"
    - "(grep 'cpu ' /proc/stat;sleep 0.1;grep 'cpu ' /proc/stat) | awk -v RS='' '{print ($13-$2+$15-$4)*100/($13-$2+$15-$4+$16-$5)}'"
  # Get average CPU load over last 1 minute (Output: CPU load greater or equal to 0.0) (Reference: https://stackoverflow.com/a/24839903)
  cpu_load: awk '{print $1}' /proc/loadavg
Commands to run to check requirements on servers.
After the commands are executed on a server, Noodles tries to convert each line of the STDOUT output using the Python standard-library function ast.literal_eval() and takes the mean of the converted values as the metric. If the conversion fails, the original STDOUT is used as the metric.
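The metric computation described above can be approximated with the following sketch. The function name `compute_metric` is hypothetical, and the sketch assumes the STDOUT of all commands for one requirement is available as a single string. How Noodles combines the output of several commands is not specified here.

```python
import ast
from statistics import mean


def compute_metric(stdout):
    """Average the per-line values of STDOUT, or return STDOUT unchanged on failure."""
    try:
        values = [ast.literal_eval(line) for line in stdout.splitlines()]
        return mean(values)
    except (ValueError, SyntaxError, TypeError):
        # Non-literal text (e.g. "Yes"), empty output, or non-numeric literals.
        return stdout


print(compute_metric("12.5\n10.0\n"))  # 11.25, the mean of the two lines
print(compute_metric("Yes"))           # Yes, kept as the original string
```

A string metric such as `Yes` can then be compared with an expression like `==Yes`, while numeric metrics work with the ordering operators.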
2.5. Deployment
-
write_status_to
- Type
Mapping
- Default
{}
- Format
"<Command type 1>": "<Path 1>"
"<Command type 2>": "<Path 2>"
...
The paths to which Noodles writes the deployment status, one per command type.
The following status fields are written:
Command type: <The given command type>
User spec path: <The given path of the user spec>
Start time: <Start time of current Noodles run>
Previous round time: <Time of the end of previous deployment round>
Elapsed time: <Elapsed time from start time>
Stage: <Current stage (before_all_experiments, experiments or after_all_experiments)>
Round #: <Current deployment round number>
Deployed experiments: <A list of deployed experiment names>
Undeployed experiments: <A list of undeployed experiment names>
Undeployed experiments (For command): <A filter string of comma-separated undeployed experiment names>
The file will be updated before the first deployment and after each successful deployment.
-
round_interval
- Type
Number
- Default
10
The interval between deployment rounds.
See how it’s used in Deploy Experiments to Servers.
-
deployment_interval
- Type
Number
- Default
0
The interval between deploying experiments within a round.
See how it’s used in Deploy One Experiment.
-
commands_interval
- Type
Number
- Default
0
The interval to execute the commands.
2.6. Error Handling
-
check_any_errors
- Type
Boolean
- Default
True
Whether to check for a nonzero return code or nonempty STDERR.
When this option is turned on, any nonzero return code and/or nonempty STDERR makes Noodles look for the first matching error handler in error_handlers. Otherwise, all errors are ignored.
-
error_handlers
- Type
List
- Default
[]
- Format
- name: "<Name 1>"
  return_code: "<return code pattern 1>"
  stderr_pattern: "<STDERR pattern 1>"
  commands: "<Response command 1>"
  action: "<Action to take 1>"
- name: "<Name 2>"
  return_code: "<return code pattern 2>"
  stderr_pattern: "<STDERR pattern 2>"
  commands:
    - "<Response command A>"
    - "<Response command B>"
    - ...
  action: "<Action to take 2>"
...
- Example
error_handlers:
  - name: Abort when command not found
    return_code: "\\d+"
    stderr_pattern: "^bash: line \\d+: .+: command not found\\s+$"
    action: abort
  - name: Ignore SSH resolve hostname error
    return_code: 255
    stderr_pattern: "^ssh: Could not resolve hostname .+: Name or service not known\\s+$"
    action: retry
  - name: Ignore git clone already exists error
    return_code: 0
    stderr_pattern: "^fatal: destination path '.+' already exists and is not an empty directory.\\s+$"
    action: continue
  - name: Send email when unknown error occurred (Catch all)
    return_code: ".+"
    stderr_pattern: "[\\S\\s]+"
    commands: "echo Check unknown error | mail -s \"Error!\" user1@gmail.com"
    action: abort
List of error handlers.
If the option check_any_errors is turned on and executing some commands produces a nonzero return code and/or nonempty STDERR, Noodles checks each error handler in this option and uses the first match.
Available actions are:
abort (Noodles will abort the whole procedure, raise an error and exit)
retry (Noodles will skip the experiment and retry it in the next deployment round)
continue (Noodles will ignore the errors and treat the deployment as successful)
If there is a match, the response commands are executed regardless of the action, and then the configured action is taken. If there is no match, no response commands are executed and the abort action is taken.
When the response commands are executed, the currently chosen server spec and environment variables from the current experiment are used.
If return_code is an integer, it is compared directly to the return code returned by the commands. Otherwise, it is treated as a regex pattern and handled as follows.
return_code and stderr_pattern are each passed as the argument pattern to the Python standard-library function re.fullmatch(pattern, string), where the argument string is the return code or the STDERR string to be handled.
An error handler matches only when both return_code and stderr_pattern match.
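The matching rules above can be sketched as follows. The helper names `matches` and `choose_handler` are hypothetical, the handlers are plain dictionaries mirroring two entries from the example, and converting the return code to a string before the regex match is an assumption of this sketch.

```python
import re


def matches(handler, return_code, stderr):
    """Return True when both return_code and stderr_pattern match."""
    expected = handler["return_code"]
    if isinstance(expected, int):
        # Integer: direct comparison with the return code.
        code_ok = expected == return_code
    else:
        # Otherwise: treated as a regex pattern on the return code.
        code_ok = re.fullmatch(expected, str(return_code)) is not None
    stderr_ok = re.fullmatch(handler["stderr_pattern"], stderr) is not None
    return code_ok and stderr_ok


def choose_handler(handlers, return_code, stderr):
    """Return (name, action) of the first match, or (None, "abort") if none match."""
    for handler in handlers:
        if matches(handler, return_code, stderr):
            return handler["name"], handler["action"]
    return None, "abort"


handlers = [
    {
        "name": "Abort when command not found",
        "return_code": r"\d+",
        "stderr_pattern": r"^bash: line \d+: .+: command not found\s+$",
        "action": "abort",
    },
    {
        "name": "Ignore SSH resolve hostname error",
        "return_code": 255,
        "stderr_pattern": r"^ssh: Could not resolve hostname .+: Name or service not known\s+$",
        "action": "retry",
    },
]

stderr = "ssh: Could not resolve hostname foo: Name or service not known\n"
print(choose_handler(handlers, 255, stderr))
# ('Ignore SSH resolve hostname error', 'retry')
```

Because handlers are tried in order, a catch-all entry such as the last one in the example belongs at the end of the list.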