fi_mon_sampler(1) Libfabric Programmer's Manual
NAME
fi_mon_sampler - Simple sampler for ofi_hook_monitor provider.
SYNOPSIS
fi_mon_sampler [OPTIONS] <target> sample from file(s) at <target>
DESCRIPTION
Extract data from the ofi_hook_monitor provider via communication files.
<target> can either be one communication file or a folder of files.
Data is exported based on -f <format> and either printed to stdout (only for single files), or stored per communication file at -o <outpath>.
The sampler can watch the communication files for changes via the option -w <msec> for repeated sampling.
The name format of the output files is based on the ofi_hook_monitor provider and is as follows: <ppid>_<pid>_<sequential id>_<job id>_<provider name>.
ppid and pid are taken from the perspective of the monitored application.
In a batched environment running SLURM, job id is set to the SLURM job ID; otherwise it is set to 0.
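The naming scheme above can be split back into its fields with a few lines of shell. This is a minimal sketch, not part of fi_mon_sampler: parse_mon_name is a hypothetical helper, the file name in the usage line is made up for illustration, and the sketch assumes that the first four fields never contain an underscore while the provider name may. If the sampler appends a file extension, it would end up in the provider field.

```shell
# Hypothetical helper (not part of libfabric): split an output file name of the
# form <ppid>_<pid>_<sequential id>_<job id>_<provider name> into its fields.
parse_mon_name() {
    name=$(basename "$1")
    # With IFS set to "_", read assigns the first four fields and leaves the
    # remainder (including any further underscores) in the last variable.
    IFS=_ read -r ppid pid seq jobid provider <<EOF
$name
EOF
    printf 'ppid=%s pid=%s seq=%s job=%s provider=%s\n' \
        "$ppid" "$pid" "$seq" "$jobid" "$provider"
}

# Usage with an illustrative, made-up file name:
# parse_mon_name /tmp/out/100_101_0_0_ofi_rxm
```

Keeping the remainder in the last field avoids truncating provider names that themselves contain underscores.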
HOW TO RUN
Launch a libfabric application with FI_HOOK=monitor to enable the ofi_hook_monitor provider.
Adjust the monitor provider settings according to fi_hook(7).
Then launch the sampler via fi_mon_sampler -o <outpath> <target>.
By default, the ofi_hook_monitor provider stores data at /dev/shm/ofi/<uid>/<hostname>.
The sampler will generate output files in the directory specified by <outpath>, one for each monitored provider.
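The default data location can be assembled in a shell without relying on the bash-specific $UID and $HOSTNAME variables. This is a minimal sketch under two assumptions: that <uid> is the numeric user ID reported by id -u, and that <hostname> matches the node name reported by uname -n. mon_default_target is a hypothetical helper name, not part of libfabric.

```shell
# Hypothetical helper: print the default ofi_hook_monitor data directory.
# Assumes <uid> is the numeric user ID and <hostname> equals `uname -n`.
mon_default_target() {
    printf '/dev/shm/ofi/%s/%s\n' "$(id -u)" "$(uname -n)"
}

# Usage, with the sampler options documented on this page:
# fi_mon_sampler -o "$HOME" "$(mon_default_target)"
```

If the monitor provider was configured to store its data elsewhere (see fi_hook(7)), pass that location as <target> instead.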
OPTIONS
-w <msec>
    Watch files for changes, check every <msec> milliseconds.
-f <format>
    Output format. Currently only supports CSV.
-o <outpath>
    Output file path. Uses stdout if unset.
USAGE EXAMPLES
Launch a libfabric application and enable the ofi_hook_monitor provider:
FI_HOOK=monitor fi_pingpong [OPTIONS]
Launch another fi_pingpong with the respective settings.
Finally, launch the sampler:
fi_mon_sampler -o $HOME -w 1000 -f csv /dev/shm/ofi/$UID/$HOSTNAME
OUTPUT
Output files will be generated in the folder specified by -o <outpath>.
In -f csv mode, this folder will contain a CSV file with data for all monitored libfabric functions.
For each function, both the count and sum counters are exported, indicated by the column name suffixes _c and _s respectively.
In addition, each function is monitored for each data size bucket.
Refer to fi_hook(7) for more details.
Example CSV output, first four columns, first three rows:
mon_recv_0_64_c,mon_recv_0_64_s,mon_recv_64_512_c,mon_recv_64_512_s
0,0,0,0
22529,0,0,0
113664,0,0,0
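A CSV file like the one above can be post-processed with standard tools. The following is a minimal sketch that prints the value in the last data row for every count column (suffix _c). last_counts is a hypothetical helper name, and the sketch assumes a single header line followed by plain comma-separated numeric rows without quoting.

```shell
# Hypothetical helper: for every column whose name ends in _c, print the
# column name and the value found in the last non-empty data row.
last_counts() {
    awk -F, '
        # First line: remember the column names.
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; n = NF; next }
        # Every later non-empty line overwrites the remembered values.
        NF > 0  { for (i = 1; i <= n; i++) last[i] = $i }
        END {
            for (i = 1; i <= n; i++)
                if (name[i] ~ /_c$/) print name[i], last[i]
        }
    ' "$1"
}

# Usage: last_counts <path to a CSV file written by the sampler>
```

Whether the exported values are running totals or per-sample values is not stated here, so the sketch only reports what the last row contains; refer to fi_hook(7) before interpreting the numbers.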