Performance-Monitoring Counters Library, for Intel/AMD Processors and Linux
This example introduces:
rabbit command-line options --duration m,n --user m,n --os m,n
an experiment to measure operating system activity
Command-Line Options
--duration m,n
m, n = 1-bit flag, default 0
Some events may extend over several cycles. Count occurrences (0)
or cycles (1) associated with the event.
Most events count occurrences or cycles regardless of this option.
Use 'rabbit -d' for more information.
--user m,n
--os m,n
m, n = 1-bit flag, default 1
When enabled (1), events are counted while the Current Privilege Level
(CPL) is in user mode (--user) or system mode (--os). When disabled (0),
events in that mode are not counted. For the Pentium, user mode is CPL 3
and system mode is CPL 0, 1 or 2; for the Pentium Pro/II/III and Athlon,
user mode is CPL 1, 2 or 3 and system mode is CPL 0.
Formally, CPL 0 = OS kernel; 1, 2 = OS services; 3 = applications.
Linux typically uses only CPL 0 and 3.
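The rule above can be sketched as a small model. This is an illustration only, not code from rabbit or its library: the names USER_CPLS, SYSTEM_CPLS, is_counted and the family labels are hypothetical, and the sketch assumes that the two values of each option apply to the first and second counter respectively.

```python
# Illustrative model only; not part of the rabbit tool or its library.
# CPL sets per processor family, as described in the text.
USER_CPLS = {
    "pentium": {3},
    "p6_athlon": {1, 2, 3},   # Pentium Pro/II/III and Athlon
}
SYSTEM_CPLS = {
    "pentium": {0, 1, 2},
    "p6_athlon": {0},
}

def is_counted(family, cpl, user_flag, os_flag):
    """Return True if an event at privilege level `cpl` would be counted."""
    if user_flag and cpl in USER_CPLS[family]:
        return True
    if os_flag and cpl in SYSTEM_CPLS[family]:
        return True
    return False

# --user 1,0 --os 0,1 : (user_flag, os_flag) for each of the two counters,
# assuming the first value of each option belongs to the first counter.
counters = [(1, 0), (0, 1)]
for cpl in (0, 3):            # the two levels Linux typically uses
    print(cpl, [is_counted("p6_athlon", cpl, u, o) for u, o in counters])
```

With these flags, each privilege level is seen by exactly one of the two counters, which is what allows user and system activity to be separated.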
On the Athlon, four counters are available.
Be careful not to use '-o' when you mean '--o'. '-o 0' is taken as the
output option (-output) and will create and write to the directory '0'.
Example
To see what fraction of system activity is taken by the operating
system and daemons, we simply count cycles. We use the same event in
both counters, with the first counter counting user mode only and the
second counting system mode only, on a 200 MHz Pentium Pro running
Linux 2.0.28 (h% is the prompt).
h% rabbit --e 121 -d
0x79 121 cycles processor is not halted
0x79 121 cycles processor is not halted
Since only two events are used, and we are only looking for an average, we
do not need to sample frequently. The first case is an idle system:
h% rabbit -s 1 --e 121 --user 1,0 --os 0,1 sleep 150
------------------------ performance counters ------------------------
Host processor: h
Command executed: sleep 150
Options: --duration 0,0 --user 1,0 --os 0,1
Options: --mesi 0xf,0xf --bus_agent 1,1 --compare 0,0 --invert 0,0
Options: --Enable 1,1 --PC 1,1 --APIC 0,0
Sampling: rate = 1 sample per second, 151 taken
Event                                        Events       Events/sec
---------------------------------- ---------------- ----------------
0x79 121 cpu_clk_unhalted                   1168492          7815.64
0x79 121 cpu_clk_unhalted               29900221370     199992202.93
resource usage:
time = 0.00 sec user, 0.00 sec sys, 149.51 sec real, 0.00% of cpu
page reclaims, faults = 7, 59
Almost all the system activity on an idle system occurs in a timed delay
loop inside the scheduler, ultimately waiting for an interrupt. Note that
the Pentium II (450 MHz, Linux 2.0.36) will show different results, and you
might have some fun trying to explain why:
holmes% rabbit -s 1 --e 121 --user 1,0 --os 0,1 sleep 150
------------------------ performance counters ------------------------
Host processor: holmes
Command executed: sleep 150
Options: --duration 0,0 --user 1,0 --os 0,1
Options: --mesi 0xf,0xf --bus_agent 1,1 --compare 0,0 --invert 0,0
Options: --MMX 0x3f,0x3f
Options: --Enable 1,1 --PC 1,1 --APIC 0,0
Sampling: rate = 1 sample per second, 151 taken
Event                                        Events       Events/sec
---------------------------------- ---------------- ----------------
0x79 121 cpu_clk_unhalted                   1095160          7282.76
0x79 121 cpu_clk_unhalted                  88178872        586384.85
resource usage:
time = 0.00 sec user, 0.01 sec sys, 150.38 sec real, 0.01% of cpu
page reclaims, faults = 10, 73
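One way to quantify the difference between the two idle runs is to express the unhalted-cycle rate as a fraction of the nominal clock rate. The following is a minimal sketch using the Events/sec figures copied from the two outputs above; it assumes the clocks run at exactly 200 MHz and 450 MHz, and the function name unhalted_fraction is illustrative.

```python
def unhalted_fraction(user_rate, os_rate, clock_hz):
    """Fraction of nominal clock cycles during which the CPU was not halted."""
    return (user_rate + os_rate) / clock_hz

# Events/sec for user mode and system mode, taken from the idle runs above.
ppro = unhalted_fraction(7815.64, 199992202.93, 200e6)   # 200 MHz Pentium Pro
pii = unhalted_fraction(7282.76, 586384.85, 450e6)       # 450 MHz Pentium II

print(f"Pentium Pro idle: {ppro:.2%} of nominal cycles unhalted")
print(f"Pentium II  idle: {pii:.2%} of nominal cycles unhalted")
```

The sketch only restates the measurements as ratios; the reason for the difference is left as the exercise suggested above.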
The second case is a user program with no system calls:
h% rabbit -s 1 --e 121 --user 1,0 --os 0,1 foo
< output from foo omitted >
------------------------ performance counters ------------------------
Host processor: h
Command executed: foo
command exited with non-zero status 33
Options: --duration 0,0 --user 1,0 --os 0,1
Options: --mesi 0xf,0xf --bus_agent 1,1 --compare 0,0 --invert 0,0
Options: --Enable 1,1 --PC 1,1 --APIC 0,0
Sampling: rate = 1 sample per second, 151 taken
Event                                        Events       Events/sec
---------------------------------- ---------------- ----------------
0x79 121 cpu_clk_unhalted               30006478362     199621756.24
0x79 121 cpu_clk_unhalted                  56859145        378262.40
resource usage:
time = 150.76 sec user, 0.05 sec sys, 150.32 sec real, 100.33% of cpu
page reclaims, faults = 7, 50
As a simple check of the data, the user and system rates in each
Pentium Pro run sum to the 200 MHz clock rate:
7815.64 + 199992202.93 = 200000018.57, and
199621756.24 + 378262.40 = 200000018.64.
In the second run, the operating system took about
378262.40 / 200000018.64 = 0.19% of the cycles otherwise available to
the user program. You can expect these numbers to vary somewhat,
depending on system configuration and transitory conditions.
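The arithmetic above can be reproduced with a few lines of Python. This is only a sketch for checking the numbers: the Events/sec values are copied by hand from the two Pentium Pro runs, and the variable names are illustrative.

```python
# Events/sec values copied from the two Pentium Pro runs above.
idle_user, idle_os = 7815.64, 199992202.93      # sleep 150
busy_user, busy_os = 199621756.24, 378262.40    # foo

# In each run, user-mode plus system-mode cycles should add up to
# roughly the 200 MHz clock rate.
idle_total = idle_user + idle_os
busy_total = busy_user + busy_os

# Share of cycles spent in system mode while the user program runs.
os_share = busy_os / busy_total

print(f"idle total: {idle_total:.2f} cycles/sec")
print(f"busy total: {busy_total:.2f} cycles/sec")
print(f"OS share while foo runs: {os_share:.2%}")
```

To apply the same check to another machine, substitute the Events/sec values from its own rabbit output.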
Author: Don Heller, dheller@scl.ameslab.gov
Last revised: 30 October 2000