Resolve named Intel performance events to perf
This library resolves named Intel performance counter events
(for example INST_RETIRED.ANY)
and turns them into perf_event_attr attributes. It also
supports listing all events and resolving numeric events back to names.
In the standard workflow the user runs "event_download.py"
to download the current event list; these functions can then
resolve or walk names. Alternatively,
a JSON event file from https://download.01.org/perfmon
can be specified through the EVENTMAP= environment variable.
read_events - Read JSON performance counter event list
int
read_events
(char * fn)
Arguments
- fn
- File name to read. NULL to choose the default location.
Description
Read the JSON event list fn. The other functions in the library
automatically read the default event list for the current CPU,
but calling this explicitly is useful to choose a specific one.
Return
-1 on failure, otherwise 0.
resolve_event - Resolve named performance counter event
int
resolve_event
(char * name,
struct perf_event_attr * attr)
Arguments
- name
- Name of performance counter event (case-insensitive)
- attr
- perf_event_attr to initialize with name.
Description
The attr structure is cleared first, so fields such as
attr->sample_type and attr->read_format typically have to be
set up _after_ this call.
Return
-1 on failure, otherwise 0.
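The ordering described above can be sketched as follows. This is a minimal illustration, not the library's code: fake_resolve is a hypothetical stand-in for resolve_event that clears attr and fills in a raw encoding directly instead of looking one up by name, and the value 0x00c0 and the chosen sample_type/read_format flags are only examples.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/* Hypothetical stand-in for resolve_event(): clears attr, then fills in
 * a raw event. The real function derives the encoding from the name. */
static int fake_resolve(unsigned long long config, struct perf_event_attr *attr)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));   /* anything set earlier is lost */
	attr->type = PERF_TYPE_RAW;
	attr->size = sizeof(*attr);
	attr->config = config;
	return 0;
}

/* Set the sampling fields only after the resolve step has cleared attr. */
static int setup_sampling(struct perf_event_attr *attr)
{
	if (fake_resolve(0x00c0, attr) < 0)
		return -1;
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
	attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED;
	return 0;
}
```

Setting sample_type before the resolve step would have no effect, because the structure is zeroed as part of resolving.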
walk_events - Walk all the available performance counter events
int
walk_events
(int (*func) (void *data, char *name, char *event, char *desc),
void * data)
Arguments
- func
- Callback to call on each event.
- data
- Abstract data pointer to pass to callback.
Description
The callback gets passed the data argument, the name of the
event, the translated event in perf form (cpu/.../) and a
description of the event.
Return
-1 on failure, otherwise 0.
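A callback with the documented signature might look like the sketch below, which counts events whose name contains a substring. The names match_count, count_matches and fake_walk are hypothetical; fake_walk stands in for walk_events and iterates over a small illustrative table so the example runs without the library or an event file. The sketch assumes that a callback return value of 0 means "keep walking", which the text above does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback type as documented for walk_events(). */
typedef int (*event_cb)(void *data, char *name, char *event, char *desc);

/* State passed through the abstract data pointer. */
struct match_count {
	const char *needle;   /* substring to look for in event names */
	int hits;             /* number of matching events seen */
};

/* Count events whose name contains needle. Returning 0 is assumed
 * here to mean "continue with the next event". */
static int count_matches(void *data, char *name, char *event, char *desc)
{
	struct match_count *mc = data;

	(void)event;
	(void)desc;
	if (strstr(name, mc->needle))
		mc->hits++;
	return 0;
}

/* Hypothetical stand-in for walk_events() over a fixed, illustrative
 * table of name / perf form / description triples. */
static int fake_walk(event_cb func, void *data)
{
	static char *table[][3] = {
		{ "inst_retired.any_p", "cpu/event=0xc0,umask=0x0/",
		  "Instructions retired" },
		{ "br_inst_retired.all_branches", "cpu/event=0xc4,umask=0x0/",
		  "Branch instructions retired" },
		{ "cpu_clk_unhalted.thread_p", "cpu/event=0x3c,umask=0x0/",
		  "Unhalted core cycles" },
	};
	size_t i;

	assert(func != NULL);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		int ret = func(data, table[i][0], table[i][1], table[i][2]);

		if (ret)
			return ret;
	}
	return 0;
}
```

Carrying state in a struct behind the data pointer keeps the callback free of global variables, so the same function can serve several walks at once.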
rmap_event - Map numeric event back to name and description.
int
rmap_event
(unsigned event,
char ** name,
char ** desc)
Arguments
- event
- Event code (umask +
- name
- Put pointer to event name into this. No need to free.
- desc
- Put pointer to description into this. No need to free. Can be NULL.
Description
Offcore matrix events are not fully supported.
Ignores bits other than umask/event for now, so some events that use
cmask or inv may be misidentified.
Return
-1 on failure, otherwise 0.
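The limitation can be illustrated with the Intel raw event layout, where the event select occupies bits 0-7, the unit mask bits 8-15, the invert flag bit 23 and the counter mask bits 24-31. The helpers below are hypothetical and only show what a lookup restricted to umask/event sees; they are not the library's implementation, and the encodings used are arbitrary values.

```c
#include <assert.h>
#include <stdint.h>

/* Field accessors for the low 32 bits of an Intel raw event config. */
#define RAW_EVENT(c)  ((unsigned)((c) & 0xff))
#define RAW_UMASK(c)  ((unsigned)(((c) >> 8) & 0xff))
#define RAW_INV(c)    ((unsigned)(((c) >> 23) & 0x1))
#define RAW_CMASK(c)  ((unsigned)(((c) >> 24) & 0xff))

/* Key seen by a umask/event-only lookup: bits above 15 are dropped. */
static unsigned lookup_key(uint32_t config)
{
	return config & 0xffffu;
}

/* True when config carries qualifiers that such a lookup ignores. */
static int has_ignored_bits(uint32_t config)
{
	return (config & ~(uint32_t)0xffffu) != 0;
}

/* Two arbitrary encodings with the same umask/event; the second adds
 * cmask=1, yet both produce the same lookup key. */
static void example(void)
{
	uint32_t plain = 0x0279u;
	uint32_t with_cmask = 0x01000279u;

	assert(lookup_key(plain) == lookup_key(with_cmask));
	assert(has_ignored_bits(with_cmask));
}
```

A caller that needs to tell such variants apart could check has_ignored_bits first and treat the reverse-mapped name as approximate when it returns true.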
json_events - Read JSON event file from disk and call event callback.
int
json_events
(const char * fn,
int (*func) (void *data, char *name, char *event, char *desc),
void * data)
Arguments
- fn
- File name to read or NULL for default.
- func
- Callback to call for each event
- data
- Abstract pointer to pass to func.
Description
Call func for each event in the JSON file.
The callback is passed the data pointer, the event name, the event
in perf format and a description.
Return
-1 on failure, otherwise 0.
get_cpu_str - Return string describing the current CPU.
char *
get_cpu_str
( void)
Arguments
- void
- no arguments
Description
Used to store JSON event lists in the cache directory.
format_raw_event - Format a resolved event for perf's command line tool
char *
format_raw_event
(struct perf_event_attr * attr,
char * name)
Arguments
- attr
- Previously resolved perf_event_attr.
- name
- Name to add to the event or NULL.
Return
A string with the formatted event. The caller must free the string.
A simple perf library to manage the perf ring buffer
This library provides a simple wrapping layer for the perf
mmap ring buffer. It allows a user program to access perf
events with zero copy.
perf_iter_init - Initialize iterator for perf ring buffer
void
perf_iter_init
(struct perf_iter * iter,
struct perf_fd * pfd)
Arguments
- iter
- Iterator to initialize.
- pfd
- perf_fd from perf_fd_open to use with the iterator.
Description
Needs to be called first to start walking a perf buffer.
perf_buffer_read - Access data in perf ring iterator.
struct perf_event_header *
perf_buffer_read
(struct perf_iter * iter,
void * buffer,
int bufsize)
Arguments
- iter
- Iterator to copy data from
- buffer
- Temporary buffer to use for wrapped events
- bufsize
- Size of buffer
Description
Return the next available perf_event_header in the ring buffer.
This is normally zero copy, but events that wrap around the end
of the ring are copied into the temporary buffer supplied, and a
pointer into that buffer is returned.
Return
NULL when nothing available, otherwise perf_event_header.
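The wrapped-event handling described above can be sketched in isolation. The function ring_peek below is hypothetical and works on a plain byte array rather than the perf mmap area; it only shows the general technique of returning a pointer in place when a record is contiguous and stitching it together in a caller-supplied buffer when it wraps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to len bytes starting at offset pos in a ring of
 * ring_size bytes. Contiguous records are returned in place (zero
 * copy); records that wrap past the end are assembled in tmp.
 * Returns NULL if tmp is too small for a wrapped record. */
static const void *ring_peek(const unsigned char *ring, size_t ring_size,
			     size_t pos, size_t len,
			     void *tmp, size_t tmp_size)
{
	size_t first;

	assert(ring != NULL && ring_size > 0 && len <= ring_size);
	pos %= ring_size;
	if (pos + len <= ring_size)
		return ring + pos;          /* no wrap: point into the ring */

	if (tmp == NULL || len > tmp_size)
		return NULL;
	first = ring_size - pos;            /* bytes before the wrap point */
	memcpy(tmp, ring + pos, first);
	memcpy((unsigned char *)tmp + first, ring, len - first);
	return tmp;
}
```

This is why the temporary buffer should be at least as large as the biggest record expected: only wrapped records touch it, but any record can end up wrapped.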
perf_iter_continue - Allow the kernel to log over our data.
void
perf_iter_continue
(struct perf_iter * iter)
Arguments
- iter
- Iterator.
Description
Tell the kernel we are finished with the data and it can
continue logging.
perf_fd_open - Open a perf event with ring buffer for the current thread
int
perf_fd_open
(struct perf_fd * p,
struct perf_event_attr * attr,
int buf_size_shift)
Arguments
- p
- perf_fd to initialize
- attr
- perf event attribute to use
- buf_size_shift
- log2 of buffer size.
Return
-1 on error, otherwise 0.
perf_fd_close - Close perf_fd
void
perf_fd_close
(struct perf_fd * p)
Arguments
- p
- perf_fd to close.
perf_enable - Start perf collection on pfd
int
perf_enable
(struct perf_fd * p)
Arguments
- p
- perf fd
Return
-1 for error, otherwise 0.
perf_disable - Stop perf collection on pfd
int
perf_disable
(struct perf_fd * p)
Arguments
- p
- perf fd
Return
-1 for error, otherwise 0.
interrupts_init - Initialize interrupt counter per thread
void
interrupts_init
( void)
Arguments
- void
- no arguments
Description
Must be called for each application thread.
interrupts_exit - Free interrupt counter per thread.
void
interrupts_exit
( void)
Arguments
- void
- no arguments
Description
Must be called for each application thread.
get_interrupts - get current interrupt counter.
unsigned long long
get_interrupts
( void)
Arguments
- void
- no arguments
Description
Get the current hardware interrupt count. If the number changed
over a measurement period, some sort of context switch occurred
and the sample for that period should be discarded.
This returns absolute numbers.
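The discard rule can be sketched as a before/after comparison around the measured code. The helpers run_clean and run_until_clean are hypothetical, and the counter is read through a function pointer so the sketch does not depend on the library; in real use the reader would wrap get_interrupts. The scripted fake_irqs reader exists only to make the example runnable.

```c
#include <assert.h>
#include <stddef.h>

/* Reader for an absolute interrupt count, e.g. a wrapper around
 * get_interrupts(). */
typedef unsigned long long (*irq_reader)(void);

/* Run work() once. Returns 1 if the interrupt count was unchanged
 * across it (clean period), 0 if the period should be discarded. */
static int run_clean(irq_reader read_irqs, void (*work)(void))
{
	unsigned long long before, after;

	assert(read_irqs != NULL && work != NULL);
	before = read_irqs();
	work();
	after = read_irqs();
	return after == before;
}

/* Retry until a period completes without an interrupt. Returns the
 * number of attempts used, or -1 after max_tries failures. */
static int run_until_clean(irq_reader read_irqs, void (*work)(void),
			   int max_tries)
{
	int i;

	for (i = 1; i <= max_tries; i++)
		if (run_clean(read_irqs, work))
			return i;
	return -1;
}

/* Scripted fake counter: reads 0 on the first call and 1 afterwards,
 * simulating one interrupt during the first period. */
static unsigned long long fake_calls;

static unsigned long long fake_irqs(void)
{
	return fake_calls++ == 0 ? 0 : 1;
}

static void no_work(void)
{
}
```

Because the counter is absolute, only the difference between two reads is meaningful; the retry limit keeps a busy system from looping forever.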
Measuring of predefined counter groups in a process
Higher level interface to measure CPU performance counters in process
context. The program calls the appropriate functions around
code that should be measured in an individual thread.
The data is accumulated globally and printed.
measure_group_init - Initialize a measurement group
void
measure_group_init
(struct measure * g,
char * name)
Arguments
- g
- measurement group (usually predefined)
- name
- name of measurements or NULL
Description
Initialize a measurement group and allocate the counters.
All measurements with the same name are printed together (so multiple
names can be used to measure different parts of the program).
Exits when the counters cannot be allocated.
Has to be freed in the same thread with measure_group_finish.
Only one measurement group per thread can be active at a time.
measure_group_start - Start measuring in a measurement group.
void
measure_group_start
( void)
Arguments
- void
- no arguments
Description
Start a measurement period for the current group in this thread.
Multiple measurement periods are accumulated.
measure_group_stop - Stop measuring a measurement group
void
measure_group_stop
( void)
Arguments
- void
- no arguments
Description
Stop the measurement for the current measurement group.
measure_group_finish - Free the counter resources of a group
void
measure_group_finish
( void)
Arguments
- void
- no arguments
Description
Has to be called in the thread that executed measure_group_init.
measure_print_all - Print the accumulated data for all measurement groups
void
measure_print_all
(FILE * fh)
Arguments
- fh
- stdio stream to write the data to
measure_free_all - Free the accumulated data from past measurements
void
measure_free_all
( void)
Arguments
- void
- no arguments
Ring 3 counting for CPU performance counters
This library allows accessing CPU performance counters from ring 3
using the perf_events subsystem. This is useful to measure specific
parts of programs (e.g. excluding initialization code).
Requires a Linux 3.3+ kernel.
rdpmc_open - initialize a simple ring 3 readable performance counter
int
rdpmc_open
(unsigned counter,
struct rdpmc_ctx * ctx)
Arguments
- counter
- Raw event descriptor in the form UUEE (UU = unit mask, EE = event)
- ctx
- Pointer to struct rdpmc_ctx that is initialized
Description
The counter will be set up to count CPU events excluding the kernel.
Must be called for each thread using the counter.
The caller must make sure counter is suitable for the running CPU.
Only works in 3.3+ kernels.
Must be closed with rdpmc_close.
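The UUEE form of the counter argument can be built from its two parts as in the sketch below. The helper raw_counter is hypothetical and not part of the library; it simply places the unit mask in bits 8-15 and the event select in bits 0-7.

```c
#include <assert.h>

/* Pack a unit mask and an event select into the UUEE form described
 * for rdpmc_open(): unit mask in bits 8-15, event in bits 0-7. */
static unsigned raw_counter(unsigned umask, unsigned event)
{
	assert(umask <= 0xff && event <= 0xff);
	return (umask << 8) | event;
}
```

For example, raw_counter(0x00, 0x3c) gives 0x003c, the Intel architectural encoding for unhalted core cycles, and raw_counter(0x41, 0x2e) gives 0x412e, the architectural last-level cache miss event. Whether a given encoding is valid still depends on the running CPU, as noted above.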
rdpmc_open_attr - initialize a raw ring 3 readable performance counter
int
rdpmc_open_attr
(struct perf_event_attr * attr,
struct rdpmc_ctx * ctx,
struct rdpmc_ctx * leader_ctx)
Arguments
- attr
- perf struct perf_event_attr for the counter
- ctx
- Pointer to struct rdpmc_ctx that is initialized.
- leader_ctx
- context of group leader or NULL
Description
This allows more flexible setup with a custom perf_event_attr.
For simple uses rdpmc_open should be used instead.
Must be called for each thread using the counter.
Must be closed with rdpmc_close.
rdpmc_close - free a ring 3 readable performance counter
void
rdpmc_close
(struct rdpmc_ctx * ctx)
Arguments
- ctx
- Pointer to rdpmc_ctx context.
Description
Must be called by each thread for each context it initialized.
rdpmc_read - read a ring 3 readable performance counter
unsigned long long
rdpmc_read
(struct rdpmc_ctx * ctx)
Arguments
- ctx
- Pointer to initialized rdpmc_ctx structure.
Description
Read the current value of a running performance counter.