Performance-Monitoring Counters Library, for Intel/AMD Processors and Linux
This example introduces
pmc_control_t -- complete description of a measurement experiment
pmc_control_null -- pmc_control_t initializer
pmc_control_print() -- print a pmc_control_t structure
hardware control fields -- processor-specific details
Compile with
gcc -o menu1 -O `pmc_options` menu1.c -lpmc
Try this example:
menu1
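#include <stdio.h>      /* stdout */
#include <stdlib.h>     /* exit() */
#include <pmc_lib.h>

int main(int argc, char * argv[])
{
    /* start from the library's clean initial state */
    pmc_control_t Ctl = pmc_control_null;

    /* show every component of the control structure */
    pmc_control_print(stdout, "initialized by pmc_control_null", &Ctl);
    exit(0);
}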
Synopsis
typedef struct { ... } pmc_control_t;
extern const pmc_control_t pmc_control_null;
void pmc_control_print(FILE * outfile,
                       const char * const description,
                       const pmc_control_t * const ctl);
The control information for the library is encapsulated in an object of type
pmc_control_t, which is defined in pmc_lib.h. The components of this type
do not normally need to be manipulated directly. The actual definition of
pmc_control_t is system-dependent, as we are using hardware features that
differ from one processor to another, and operating system features that may
differ from one Linux release to another. The pmc_control_t object points to
some arrays whose sizes are not determined until runtime, but the size of the
object itself does not change.
pmc_control_null is provided for clean initialization at the start of the
program.
Declaring more than one pmc_control_t object is allowed but discouraged.
Ctl can be local to main(), as in this example, or global; it is always
passed to the library routines by address, and should not be deallocated.
A global Ctl cannot be initialized cleanly at compile time (linkage problems
associated with some character strings produce "initializer element is not
constant"), so assign pmc_control_null to it at the start of the program
instead. Ctl can be modified directly, or indirectly from the command line
by pmc_getargs(); the latter is preferred.
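The run-time initialization of a global control object can be sketched as follows. This is a minimal illustration only: demo_control_t, demo_control_null and demo_control_setup() are hypothetical stand-ins, so that the fragment compiles without the library. With the library itself, the same pattern would use pmc_control_t and pmc_control_null.

```c
/* Hypothetical stand-in for pmc_control_t; the real type is in pmc_lib.h. */
typedef struct {
    const char *label;
    int sample_rate;
} demo_control_t;

/* Hypothetical stand-in for pmc_control_null (values taken from the
   sample output of this example: label = '', sample_rate = 100). */
const demo_control_t demo_control_null = { "", 100 };

/* Global control object, declared without an initializer ... */
demo_control_t Ctl;

/* ... and assigned at run time, before any other use. */
void demo_control_setup(void)
{
    Ctl = demo_control_null;
}
```

Calling demo_control_setup() as the first statement of main() gives the global object the same starting state that a local object gets from its initializer.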
pmc_control_print() may be useful for debugging; it is used here and in the
next example only to demonstrate the components of pmc_control_t. Always use
pmc_getargs() to set *ctl, or initialize *ctl with pmc_control_null, before
using pmc_control_print(). If ctl points to a pmc_control_t structure that
has not been initialized properly, or if certain data structures internal to
the library have not been initialized by pmc_getargs(), then pmc_control_print()
will print garbage, or dereference an invalid pointer, or both, or worse.
What is being controlled? The Intel processors since the Pentium (and also
the AMD Athlon, MIPS R10000, IBM Power2, Sun UltraSPARC, HP PA-8000, and
Compaq/DEC Alpha, in their own ways) have three or more counters that can
provide useful information to the programmer and system designer. On the
Intel processors, the first is a 64-bit cycle counter that is the purest
measure of elapsed time. The second and third are 40-bit event counters,
each of which can be set to count one of many different events, occurring
in user or system mode, and so on. The specifics are processor-dependent;
the Athlon, for example, has four 48-bit event counters.
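Because the event counters are narrower than 64 bits, a difference between two raw readings has to be taken modulo the counter width. The following is a minimal sketch of that arithmetic, plus a cycles-to-seconds conversion. The function names are hypothetical and are not part of the library, which may handle these details differently.

```c
#include <stdint.h>

/* Difference between two readings of a counter that is `bits` wide,
   assuming at most one wraparound between the readings. */
uint64_t counter_delta(uint64_t start, uint64_t stop, unsigned bits)
{
    uint64_t mask = (bits >= 64) ? UINT64_MAX
                                 : ((UINT64_C(1) << bits) - 1);
    return (stop - start) & mask;
}

/* Convert a cycle count to seconds for a clock rate given in MHz. */
double cycles_to_seconds(uint64_t cycles, double mhz)
{
    return (double)cycles / (mhz * 1.0e6);
}
```

For example, counter_delta(start, stop, 40) suits the Pentium event counters and counter_delta(start, stop, 48) the Athlon ones. An event that occurred once per cycle at 450 MHz would wrap a 40-bit counter after roughly 40 minutes.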
The library is designed so the decisions about which events to measure, and
which subsidiary controls to exert, can easily be delayed until runtime.
For the two event counters, there are two independent sets of selection and
control bits. The Current Privilege Level (CPL) indicates whether the
processor is executing user code or system code (0 = OS kernel; 1, 2 = OS
services; 3 = applications), but the Pentium and the Pentium Pro divide the
levels differently for the counters: the Pentium treats CPL 3 as user mode
and CPL 0, 1, 2 as operating system mode, while the Pentium Pro treats
CPL 1, 2, 3 as user mode and CPL 0 as operating system mode. Some events can
only be selected for a particular counter. The processors with MMX and SSE
have more events defined.
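The way the user and operating system flags interact with the CPL can be written as a small predicate. This sketch follows the flag descriptions given in this document for the two processor families. The type and function names are hypothetical and do not come from the library.

```c
#include <stdbool.h>

/* Hypothetical family tags, for illustration only. */
typedef enum { FAMILY_PENTIUM, FAMILY_PENTIUM_PRO } cpu_family_t;

/* Does privilege level `cpl` (0..3) count as user mode for the counters? */
bool cpl_is_user(cpu_family_t family, int cpl)
{
    if (family == FAMILY_PENTIUM)
        return cpl == 3;    /* Pentium: user = CPL 3, OS = CPL 0,1,2 */
    return cpl >= 1;        /* Pentium Pro: user = CPL 1,2,3, OS = CPL 0 */
}

/* Is an event at privilege level `cpl` counted, given the two flags? */
bool cpl_is_counted(cpu_family_t family, int cpl,
                    bool user_flag, bool os_flag)
{
    return cpl_is_user(family, cpl) ? user_flag : os_flag;
}
```

The two families disagree only for CPL 1 and 2. Linux on these processors runs the kernel at CPL 0 and applications at CPL 3, so the difference should not normally be visible there.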
Intel Pentium and Pentium Processor with MMX Technology
    one 32-bit control register for two counters
    event select field, 6 bits
        use pmc_getargs() for the list of events and their codes
    user mode flag, 1 bit
        enable or disable counting for CPL = 3
    operating system mode flag, 1 bit
        enable or disable counting for CPL = 0,1,2
    counter control, 1 bit
        count events or clocks (duration)
    pin control, 1 bit
        the external pin PM0 or PM1 changes when the counter increments
        or when the counter overflows (usually not relevant)
Intel Pentium Pro/II/III, AMD Athlon
    two 32-bit control registers, one for each counter (four on the Athlon)
    event select field, 8 bits
        use pmc_getargs() for the list of events and their codes
    unit mask field, 8 bits
        a secondary qualifier for some events
    user mode flag, 1 bit
        enable or disable counting for CPL = 1,2,3
    operating system mode flag, 1 bit
        enable or disable counting for CPL = 0
    counter control, 1 bit
        count events or clocks (duration)
    pin control, 1 bit
        the external pin PM0 or PM1 changes when the counter increments
        or when the counter overflows (usually not relevant)
    APIC interrupt enable flag, 1 bit
        generate local APIC interrupt on overflow, enable or disable
        (APIC = advanced programmable interrupt controller)
    enable counters flag, 1 bit
        enable or disable both counters
    counter mask field, 8 bits
        if zero, increment the counter by the number of events in this cycle
        if nonzero, increment the counter by 1 only if at least this many
        events occur in this cycle (but see invert flag)
    invert flag, 1 bit
        applies only if the counter mask field is nonzero
        if clear, increment the counter by 1 if >= mask events occurred
        in this cycle
        if set, increment the counter by 1 if < mask events occurred
        in this cycle
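To make the Pentium Pro/II/III field list concrete, the sketch below packs the fields into one 32-bit event-select value. The bit positions are those given in Intel's documentation for the P6 PerfEvtSel registers, not taken from the library source. The struct and function names are hypothetical. The `cc` member stands for the bit this document calls "counter control" (Intel's edge-detect flag), and how the library's duration[] setting maps onto it is an assumption left open here.

```c
#include <stdint.h>

/* Hypothetical container for the fields of one P6 event-select register. */
typedef struct {
    unsigned event;    /* event select, 8 bits        (bits 0-7)   */
    unsigned umask;    /* unit mask, 8 bits           (bits 8-15)  */
    unsigned user;     /* user mode flag              (bit 16)     */
    unsigned os;       /* operating system mode flag  (bit 17)     */
    unsigned cc;       /* counter control             (bit 18)     */
    unsigned pc;       /* pin control                 (bit 19)     */
    unsigned apic;     /* APIC interrupt enable       (bit 20)     */
    unsigned enable;   /* enable counters             (bit 22)     */
    unsigned invert;   /* invert flag                 (bit 23)     */
    unsigned cmask;    /* counter mask, 8 bits        (bits 24-31) */
} p6_select_t;

/* Pack the fields into a 32-bit register image; bit 21 stays zero. */
uint32_t p6_evtsel_encode(const p6_select_t *s)
{
    uint32_t v = 0;
    v |= (uint32_t)(s->event  & 0xffu);
    v |= (uint32_t)(s->umask  & 0xffu) << 8;
    v |= (uint32_t)(s->user   & 1u)    << 16;
    v |= (uint32_t)(s->os     & 1u)    << 17;
    v |= (uint32_t)(s->cc     & 1u)    << 18;
    v |= (uint32_t)(s->pc     & 1u)    << 19;
    v |= (uint32_t)(s->apic   & 1u)    << 20;
    v |= (uint32_t)(s->enable & 1u)    << 22;
    v |= (uint32_t)(s->invert & 1u)    << 23;
    v |= (uint32_t)(s->cmask  & 0xffu) << 24;
    return v;
}
```

With event 0, unit mask 0xf, and the user, os and enable flags set, the result is 0x00430f00. Writing such a value to the hardware requires kernel support, which is outside the scope of this sketch.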
The rabbit examples employing --compare and --user show how these controls
can be used for program analysis.
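The effect of the counter mask and the invert flag can be modeled in a few lines. The sketch below is a software model of the rule described in the field list above, applied to a hypothetical trace of events per cycle. It does not read any hardware counter, and the function names are illustrative only.

```c
#include <stddef.h>

/* Amount added to the counter in one cycle in which `events` events
   occurred, for a given counter mask and invert flag. */
unsigned cycle_increment(unsigned events, unsigned cmask, int invert)
{
    if (cmask == 0)
        return events;                   /* no threshold: count every event */
    if (invert)
        return events < cmask ? 1u : 0u; /* cycles below the threshold */
    return events >= cmask ? 1u : 0u;    /* cycles at or above the threshold */
}

/* Final counter value after a trace of n cycles. */
unsigned long count_trace(const unsigned *events, size_t n,
                          unsigned cmask, int invert)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += cycle_increment(events[i], cmask, invert);
    return total;
}
```

For the trace {0, 1, 2, 3, 0, 2, 1}, a mask of 0 gives 9 (all events), a mask of 2 gives 3 (cycles with two or more events), and a mask of 2 with the invert flag set gives 4 (cycles with fewer than two events).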
There are some combinations of controls that are ineffective, and there are
some events that are not counted exactly right by the processor. The Intel
and AMD documentation is available through their web sites; see
pmc_processor.c for some references, or a summary for the Pentium Pro/II/III.
On a 450-MHz Pentium II, this example yields
pmc_control_t (initialized by pmc_control_null) =
event[] = 0, 0
duration[] = 0, 0
user[] = 1, 1
os[] = 1, 1
pc[] = 0, 0
mesi[] = 0xf, 0xf
bus[] = 1, 1
mmx[] = 0x3f, 0x3f
compare[] = 0, 0
invert[] = 0, 0
apic[] = 0, 0
enable[] = 1, 1
label = ''
mhz = 450
sample_rate = 100
flush_rate = 0
rotate_rate = 1
clean = 0
stats = 0
raw = 0
trim = 0
file_input = 0
file_output = 0
input_file = ''
output_directory = ''
event_pairs = 0
replication = 1
The component arrays events[], labels[], counters[] are omitted if there
is nothing to print; the next example will introduce them. There are
event_pairs events and labels, and event_pairs * replication counters.
Each of the components shown here can be modified from the command line
by pmc_getargs().
The following components are used by pmc_counter_init() to construct the
selector, clean and stats fields of a pmc_counter_t:
    event       event selection
    duration    counter control
    user        user mode
    os          operating system mode
    pc          pin control
    mesi        unit mask for L2 cache
    bus         unit mask for external bus logic
    mmx         unit mask for MMX instructions
    compare     counter mask
    invert      counter mask, invert flag
    apic        local APIC enable
    enable      global enable
    clean       measurement overhead removal
    stats       level of statistical detail retained
Forward References
pmc_getargs()
pmc_counter_t [more details]
pmc_counter_init()
Author: Don Heller, dheller@scl.ameslab.gov
Last revised: 2 August 2000