Performance-Monitoring Counters Library, for Intel/AMD Processors and Linux
This example introduces
   pmc_control_t            -- complete description of a measurement experiment
   pmc_control_null         -- pmc_control_t initializer
   pmc_control_print()      -- print a pmc_control_t structure
   hardware control fields  -- processor-specific details


Compile with
   gcc -o menu1 -O `pmc_options` menu1.c -lpmc
Try this example:
   menu1
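
   #include <stdio.h>      /* stdout */
   #include <stdlib.h>     /* exit() */
   #include <pmc_lib.h>

   int main(int argc, char * argv[])
   {
     /* start from the library's null control object */
     pmc_control_t Ctl = pmc_control_null;

     pmc_control_print(stdout, "initialized by pmc_control_null", &Ctl);

     exit(0);
   }
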
Synopsis

   typedef struct { ... } pmc_control_t;

   extern const pmc_control_t pmc_control_null;

   void pmc_control_print
     (FILE * outfile, const char * const description, const pmc_control_t * const ctl);

The control information for the library is encapsulated in an object of type pmc_control_t, which is defined in pmc_lib.h. The components of this type do not normally need to be manipulated directly. The actual definition of pmc_control_t is system-dependent, because the library relies on hardware features that differ from one processor to another and on operating system features that may differ from one Linux release to another. The pmc_control_t object points to some arrays whose sizes are not determined until runtime, but the size of the object itself does not change.

pmc_control_null is provided for clean initialization at the start of the program. Declaring more than one pmc_control_t object is allowed, but discouraged. Ctl can be local to main(), as in this example, or global; it is always passed to the library routines by address, and should not be deallocated. If Ctl is global, it cannot be initialized cleanly at compile time (there are linkage problems associated with some character strings: "initializer element is not constant"). Ctl can be modified directly, or indirectly from the command line by pmc_getargs(); the latter is preferred.

pmc_control_print() may be useful for debugging; it is used here and in the next example only to demonstrate the components of pmc_control_t. Always use pmc_getargs() to set *ctl, or initialize *ctl with pmc_control_null, before calling pmc_control_print(). If ctl points to a pmc_control_t structure that has not been initialized properly, or if certain data structures internal to the library have not been initialized by pmc_getargs(), then pmc_control_print() will print garbage, or dereference an invalid pointer, or both, or worse.
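
The remark about a global Ctl can be illustrated without the library. The sketch below is only an illustration: demo_control_t, demo_control_null and demo_control_reset() are hypothetical stand-ins for pmc_control_t and pmc_control_null, with made-up components. In standard C a file-scope initializer must be a constant expression, and the value of a const object is not one, so the global is left zero-initialized and the null object is copied into it at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pmc_control_t; the real type is in pmc_lib.h. */
typedef struct {
    int event[2];
    int user[2];
    const char *label;
} demo_control_t;

/* Hypothetical stand-in for pmc_control_null: a const object,
   which is not a constant expression in standard C. */
const demo_control_t demo_control_null = { {0, 0}, {1, 1}, "" };

/* Writing "demo_control_t Ctl = demo_control_null;" here, at file scope,
   is what draws "initializer element is not constant".
   The global is therefore declared without an initializer ... */
demo_control_t Ctl;

/* ... and the null object is copied into it once, early in the program. */
void demo_control_reset(demo_control_t *ctl)
{
    assert(ctl != NULL);
    *ctl = demo_control_null;
}
```

With the real library, the corresponding step would presumably be the assignment Ctl = pmc_control_null; as the first statement of main(), before any other library call.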
What is being controlled?

The Intel processors since the Pentium (and also the AMD Athlon, MIPS R10000, IBM Power2, Sun UltraSPARC, HP PA-8000, and Compaq/DEC Alpha, in their own ways) have three or more counters that can provide useful information to the programmer and system designer. The first is a 64-bit cycle counter, which is the purest measure of elapsed time. The second and third are 40-bit event counters, each counting one event selected from many different events, occurring in user or system mode, and so on. The specifics are processor-dependent. The Athlon has four 48-bit event counters.

The library is designed so that the decisions about which events to measure, and which subsidiary controls to exert, can easily be delayed until runtime. For the two event counters, there are two independent sets of selection and control bits. The Current Privilege Level (CPL) indicates whether the processor is executing user code or system code (0 = OS kernel; 1, 2 = OS services; 3 = applications), but the Pentium and the Pentium Pro divide these levels between user mode and operating system mode differently for the counters. Some events can only be selected for a particular counter. The processors with MMX and SSE have more events defined.

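
Because the event counters are 40 bits wide, a long measurement can wrap around. The following is a minimal sketch of how the difference between two raw readings could be taken modulo 2^40. It assumes at most one wraparound between the readings; the names COUNTER_BITS, COUNTER_MASK and counter_delta40() are hypothetical and do not come from the library, which may handle this differently.

```c
#include <assert.h>
#include <stdint.h>

/* Width of an event counter on the Pentium and Pentium Pro family,
   as stated in the text (the Athlon would use 48). */
#define COUNTER_BITS 40
#define COUNTER_MASK ((UINT64_C(1) << COUNTER_BITS) - 1)

/* Number of events between two raw 40-bit readings.
   Unsigned subtraction followed by masking gives the right answer
   even when the counter wrapped once between start and stop. */
uint64_t counter_delta40(uint64_t start, uint64_t stop)
{
    assert(start <= COUNTER_MASK && stop <= COUNTER_MASK);
    return (stop - start) & COUNTER_MASK;
}
```

For example, counter_delta40(100, 250) is 150, and a reading that goes from the maximum value COUNTER_MASK to 4 gives a difference of 5.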
Intel Pentium and Pentium Processor with MMX Technology
   one 32-bit control register for two counters
      event select field, 6 bits
         use pmc_getargs() for the list of events and their codes
      user mode flag, 1 bit
         enable or disable counting for CPL = 3
      operating system mode flag, 1 bit
         enable or disable counting for CPL = 0,1,2
      counter control, 1 bit
         count events or clocks (duration)
      pin control, 1 bit
         the external pin PM0 or PM1 changes when the counter increments
         or when the counter overflows (usually not relevant)

Intel Pentium Pro/II/III, AMD Athlon
   two 32-bit control registers, one for each counter (four on the Athlon)
      event select field, 8 bits
         use pmc_getargs() for the list of events and their codes
      unit mask field, 8 bits
         a secondary qualifier for some events
      user mode flag, 1 bit
         enable or disable counting for CPL = 1,2,3
      operating system mode flag, 1 bit
         enable or disable counting for CPL = 0
      counter control, 1 bit
         count events or clocks (duration)
      pin control, 1 bit
         the external pin PM0 or PM1 changes when the counter increments
         or when the counter overflows (usually not relevant)
      APIC interrupt enable flag, 1 bit
         generate local APIC interrupt on overflow, enable or disable
         (APIC = advanced programmable interrupt controller)
      enable counters flag, 1 bit
         enable or disable both counters
      counter mask field, 8 bits
         if zero, increment the counter by the number of events in this cycle
         if nonzero, increment the counter only if at least this many events
         occur in this cycle (but see invert flag)
      invert flag, 1 bit
         if the counter mask field is nonzero, either
            flag clear: increment the counter by 1 if >= mask events occurred in this cycle
            flag set:   increment the counter by 1 if < mask events occurred in this cycle

The rabbit examples employing --compare and --user show how these controls can be used for program analysis. There are some combinations of controls that are ineffective, and there are some events that are not counted exactly right by the processor.

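
The Pentium Pro/II/III fields listed above can be sketched as a packed 32-bit value, together with the counter mask and invert rule. This is an illustration only: the struct p6_select_t and the functions p6_pack() and p6_increment() are hypothetical, not the library's code. The bit positions are taken from Intel's published layout of the P6 event select registers, not from this page. Bit 18 (edge detect) is left clear, since its exact relation to the "counter control" field above is an assumption not made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container; component names follow the text above. */
typedef struct {
    unsigned event;    /* event select, 8 bits           */
    unsigned umask;    /* unit mask, 8 bits              */
    unsigned user;     /* count at CPL = 1,2,3           */
    unsigned os;       /* count at CPL = 0               */
    unsigned pc;       /* pin control                    */
    unsigned apic;     /* local APIC interrupt on overflow */
    unsigned enable;   /* enable counters                */
    unsigned invert;   /* invert the mask comparison     */
    unsigned compare;  /* counter mask, 8 bits           */
} p6_select_t;

/* Pack the fields into one 32-bit control value (assumed P6 layout). */
uint32_t p6_pack(const p6_select_t *s)
{
    assert(s != NULL);
    assert(s->event <= 0xff && s->umask <= 0xff && s->compare <= 0xff);
    return  (uint32_t)s->event                    /* bits  0-7  */
          | (uint32_t)s->umask          << 8      /* bits  8-15 */
          | (uint32_t)(s->user   != 0)  << 16
          | (uint32_t)(s->os     != 0)  << 17
          | (uint32_t)(s->pc     != 0)  << 19
          | (uint32_t)(s->apic   != 0)  << 20
          | (uint32_t)(s->enable != 0)  << 22
          | (uint32_t)(s->invert != 0)  << 23
          | (uint32_t)s->compare        << 24;    /* bits 24-31 */
}

/* Amount added to the counter in one cycle, given how many events
   occurred in that cycle, following the rule described in the list. */
unsigned p6_increment(unsigned events_this_cycle,
                      unsigned compare, unsigned invert)
{
    if (compare == 0)
        return events_this_cycle;              /* plain event count */
    if (invert)
        return events_this_cycle < compare;    /* 1 if fewer than mask */
    return events_this_cycle >= compare;       /* 1 if at least mask */
}
```

With a counter mask of 2, a cycle in which 3 events occur adds 1 to the counter when the invert flag is clear and 0 when it is set; with a counter mask of 0 the same cycle adds 3.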
The Intel and AMD documentation is available through their web sites; see pmc_processor.c for some references, or a summary for the Pentium Pro/II/III.
On a 450-MHz Pentium II, this example yields

   pmc_control_t (initialized by pmc_control_null) =
     event[] = 0, 0
     duration[] = 0, 0
     user[] = 1, 1
     os[] = 1, 1
     pc[] = 0, 0
     mesi[] = 0xf, 0xf
     bus[] = 1, 1
     mmx[] = 0x3f, 0x3f
     compare[] = 0, 0
     invert[] = 0, 0
     apic[] = 0, 0
     enable[] = 1, 1
     label = ''
     mhz = 450
     sample_rate = 100
     flush_rate = 0
     rotate_rate = 1
     clean = 0
     stats = 0
     raw = 0
     trim = 0
     file_input = 0
     file_output = 0
     input_file = ''
     output_directory = ''
     event_pairs = 0
     replication = 1

The component arrays events[], labels[], counters[] are omitted if there is nothing to print; the next example will introduce them. There are event_pairs events and labels, and event_pairs * replication counters.

Each of the components shown here can be modified from the command line by pmc_getargs().

The following components are used by pmc_counter_init() to construct the selector, clean and stats fields of a pmc_counter_t:

   event      event selection
   duration   counter control
   user       user mode
   os         operating system mode
   pc         pin control
   mesi       unit mask for L2 cache
   bus        unit mask for external bus logic
   mmx        unit mask for MMX instructions
   compare    counter mask
   invert     counter mask, invert flag
   apic       local APIC enable
   enable     global enable
   clean      measurement overhead removal
   stats      level of statistical detail retained

Forward References
   pmc_getargs()
   pmc_counter_t
   pmc_counter_init()

Author: Don Heller, dheller@scl.ameslab.gov
Last revised: 2 August 2000