Performance-Monitoring Counters Library, for Intel/AMD Processors and Linux
This example introduces
   rabbit command-line options --duration m,n --user m,n --os m,n
   an experiment to measure operating system activity



Command-Line Options

--duration m,n
   m, n = 1-bit flag, default 0
   Some events may extend over several cycles. Count occurrences (0) or cycles (1) associated with the event. Most events count occurrences or cycles independent of this option. Use 'rabbit -d' for more information.

--user m,n
--os m,n
   m, n = 1-bit flag, default 1
   When enabled (1), events are counted while the Current Privilege Level (CPL) is in user mode (--user) or system mode (--os). When disabled (0), events in that mode are not counted.
   For the Pentium, user mode is CPL 3; for the Pentium Pro/II/III and Athlon it is CPL 1, 2 or 3.
   For the Pentium, system mode is CPL 0, 1 or 2; for the Pentium Pro/II/III and Athlon it is CPL 0.
   Formally, CPL 0 = OS kernel; 1, 2 = OS services; 3 = applications. Linux typically uses only CPL 0 and 3.

On the Athlon, four counters are available.

Be careful not to use '-o' when you mean '--o'. '-o 0' will create and write to the directory '0' (-output).
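The privilege-level rules for --user and --os can be restated as a small lookup. The sketch below is only an illustrative model of the rules as described in the text. It is not part of rabbit or its library, and the function names and processor-name strings are hypothetical.

```python
# Illustrative model of the --user / --os rules; not part of rabbit itself.
# The processor-name strings and both function names are hypothetical.
CPL3_ONLY_USER = {"pentium"}
CPL1_TO_3_USER = {"pentium pro", "pentium ii", "pentium iii", "athlon"}


def is_user_mode(processor, cpl):
    """Return True if this CPL counts as user mode on the given processor."""
    if cpl not in (0, 1, 2, 3):
        raise ValueError(f"CPL must be 0..3, got {cpl}")
    name = processor.lower()
    if name in CPL3_ONLY_USER:
        return cpl == 3
    if name in CPL1_TO_3_USER:
        return cpl >= 1
    raise ValueError(f"unknown processor: {processor}")


def is_counted(processor, cpl, user_flag=1, os_flag=1):
    """Apply one counter's --user and --os flags (both default to 1)."""
    if is_user_mode(processor, cpl):
        return bool(user_flag)
    return bool(os_flag)


# A counter run with --user 1 --os 0 counts only user-mode events.
print(is_counted("Pentium Pro", 3, user_flag=1, os_flag=0))
print(is_counted("Pentium Pro", 0, user_flag=1, os_flag=0))
```

Since Linux typically uses only CPL 0 and 3, the Pentium and Pentium Pro rules give the same answer in practice; they differ only for CPL 1 and 2.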
Example

To see what fraction of system activity is taken by the operating system and daemons, we simply select an event that counts cycles. We use the same event in both counters, with the first counter restricted to user mode and the second to system mode, on a 200 MHz Pentium Pro running Linux 2.0.28 (h% is the prompt).

   h% rabbit --e 121 -d
   0x79 121 cycles processor is not halted
   0x79 121 cycles processor is not halted

Since only two events are used, and we are only looking for an average, we do not need to sample frequently. The first case is an idle system:

   h% rabbit -s 1 --e 121 --user 1,0 --os 0,1 sleep 150
   ------------------------ performance counters ------------------------
   Host processor: h
   Command executed: sleep 150
   Options: --duration 0,0 --user 1,0 --os 0,1
   Options: --mesi 0xf,0xf --bus_agent 1,1 --compare 0,0 --invert 0,0
   Options: --Enable 1,1 --PC 1,1 --APIC 0,0
   Sampling: rate = 1 sample per second, 151 taken

   Event                                        Events       Events/sec
   ---------------------------------- ---------------- ----------------
   0x79 121 cpu_clk_unhalted                   1168492          7815.64
   0x79 121 cpu_clk_unhalted               29900221370     199992202.93

   resource usage:
   time = 0.00 sec user, 0.00 sec sys, 149.51 sec real, 0.00% of cpu
   page reclaims, faults = 7, 59

Almost all the system activity on an idle system occurs in a timed delay loop inside the scheduler, ultimately waiting for an interrupt.
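The two counter rows of the idle run can be pulled out of the report and compared with a few lines of Python. This is a minimal sketch, assuming each row has the layout seen in the report (hex code, decimal code, event name, count, rate). The regular expression and the parse_rows helper are hypothetical and not part of rabbit.

```python
import re

# Assumed row layout: hex code, decimal code, event name, count, rate.
ROW = re.compile(
    r"^\s*(0x[0-9a-fA-F]+)\s+(\d+)\s+(\S+)\s+(\d+)\s+(\d+(?:\.\d+)?)\s*$"
)


def parse_rows(report):
    """Return one dict per counter row found in the report text."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "code": int(m.group(1), 16),
                "name": m.group(3),
                "events": int(m.group(4)),
                "rate": float(m.group(5)),
            })
    return rows


# Counter rows copied from the idle Pentium Pro run.
IDLE = """\
0x79 121 cpu_clk_unhalted                   1168492          7815.64
0x79 121 cpu_clk_unhalted               29900221370     199992202.93
"""

# With --user 1,0 --os 0,1 the first row is user mode, the second system mode.
user, system = parse_rows(IDLE)
total = user["events"] + system["events"]
print(f"user share of unhalted cycles: {100 * user['events'] / total:.4f}%")
```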
Note that the Pentium II (450 MHz, Linux 2.0.36) will show different results, and you might have some fun trying to explain why:

   holmes% rabbit -s 1 --e 121 --user 1,0 --os 0,1 sleep 150
   ------------------------ performance counters ------------------------
   Host processor: holmes
   Command executed: sleep 150
   Options: --duration 0,0 --user 1,0 --os 0,1
   Options: --mesi 0xf,0xf --bus_agent 1,1 --compare 0,0 --invert 0,0
   Options: --MMX 0x3f,0x3f
   Options: --Enable 1,1 --PC 1,1 --APIC 0,0
   Sampling: rate = 1 sample per second, 151 taken

   Event                                        Events       Events/sec
   ---------------------------------- ---------------- ----------------
   0x79 121 cpu_clk_unhalted                   1095160          7282.76
   0x79 121 cpu_clk_unhalted                  88178872        586384.85

   resource usage:
   time = 0.00 sec user, 0.01 sec sys, 150.38 sec real, 0.01% of cpu
   page reclaims, faults = 10, 73

The second case is a user program with no system calls:

   h% rabbit -s 1 --e 121 --user 1,0 --os 0,1 foo
   < output from foo omitted >
   ------------------------ performance counters ------------------------
   Host processor: h
   Command executed: foo
   command exited with non-zero status 33
   Options: --duration 0,0 --user 1,0 --os 0,1
   Options: --mesi 0xf,0xf --bus_agent 1,1 --compare 0,0 --invert 0,0
   Options: --Enable 1,1 --PC 1,1 --APIC 0,0
   Sampling: rate = 1 sample per second, 151 taken

   Event                                        Events       Events/sec
   ---------------------------------- ---------------- ----------------
   0x79 121 cpu_clk_unhalted               30006478362     199621756.24
   0x79 121 cpu_clk_unhalted                  56859145        378262.40

   resource usage:
   time = 150.76 sec user, 0.05 sec sys, 150.32 sec real, 100.33% of cpu
   page reclaims, faults = 7, 50

A simple check of the two Pentium Pro runs: 7815.64 + 199992202.93 = 200000018.57 and 199621756.24 + 378262.40 = 200000018.64 cycles per second. Both totals agree with the 200 MHz clock, so the two counters together account for every cycle. The operating system therefore took about 378262.40 / 200000018.64 = 0.19% of the cycles otherwise available to the user program. You can expect these numbers to vary somewhat, depending on system configuration and transitory conditions.
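The arithmetic of this check can be reproduced directly from the Events/sec columns of the two Pentium Pro runs. The sketch below is only a convenience for repeating the calculation with your own numbers; the helper name os_share is hypothetical.

```python
def os_share(user_rate, os_rate):
    """Return (total cycles/sec, fraction of those cycles spent in system mode)."""
    total = user_rate + os_rate
    return total, os_rate / total


# Events/sec values copied from the idle run and the 'foo' run on the Pentium Pro.
idle_total, idle_share = os_share(7815.64, 199992202.93)
busy_total, busy_share = os_share(199621756.24, 378262.40)

# Both totals should sit close to the 200 MHz clock rate.
print(f"idle total: {idle_total:.2f} cycles/sec")
print(f"busy total: {busy_total:.2f} cycles/sec")
print(f"OS share while foo runs: {100 * busy_share:.2f}%")
```

If the two totals did not agree with the clock rate, that would suggest the counters were not covering all privilege levels between them.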

Author: Don Heller, dheller@scl.ameslab.gov
Last revised: 30 October 2000