Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add high-precision, low-distortion, but expensive timer, as build time option. #31872

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Mar 14, 2022

Work in progress: The timer thread should not be a Python thread.

This works as follows:

  • Spins up a thread to update a global "now" variable from the system monotonic clock every ~10μs.
  • On entry to each recorded function, grab "now" from global variable and push it to timer stack
  • On exit from recorded function, grab "now" from global variable and increment named time by the time delta.

Calling the system monotonic clock for each event is too expensive and distorts the results.

Some care is needed to get this to work properly. Specifically, the machine needs to have spare cores, so that timer thread runs on a different CPU core from the timed program. The timed program needs to be single-threaded.
When benchmarking, we expect both of these to be true.

Provided that those two requirements are met, this approach has low-overhead (a few memory accesses and sometimes a cache miss to fetch "now") and high precision ("now" is updated every ~10μs).

Example output:

$ rm /tmp/py_timings/* ; ./python -c "pass" ; python3 Tools/scripts/sum_timers.py

dealloc: 468.0 μs 5.9%
eval: 2.2 ms 27.4%
free: 152.0 μs 1.9%
gc: 1.2 ms 15.1%
import_: 65.0 μs 0.8%
main: 1.4 ms 17.6%
malloc: 632.0 μs 7.9%
native: 1.9 ms 23.5%

$ rm /tmp/py_timings/* ; time ./python -m test ; python3 Tools/scripts/sum_timers.py

...
dealloc: 14.4 secs 0.7%
eval: 489.3 secs 24.1%
free: 5.8 secs 0.3%
gc: 395.1 secs 19.4%
import_: 838.1 ms 0.0%
main: 3.1 secs 0.2%
malloc: 8.7 secs 0.4%
native: 1115.0 secs 54.9%

@@ -56,6 +56,7 @@ PyAPI_FUNC(int) _Py_IsFinalizing(void);
/* Random */
PyAPI_FUNC(int) _PyOS_URandom(void *buffer, Py_ssize_t size);
PyAPI_FUNC(int) _PyOS_URandomNonblock(void *buffer, Py_ssize_t size);
extern int _PyOS_URandomNonblockNoRaise(void *buffer, Py_ssize_t size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the declaration of this private function to Include/internal/pycore_pylifecycle.h.

@@ -1477,6 +1477,20 @@ AS_VAR_IF([enable_pystats], [yes], [
AC_DEFINE([Py_STATS], [1], [Define if you want to enable internal statistics gathering.])
])

# Check for --enable-pytimers
AC_MSG_CHECKING([for --enable-pytimers])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document the new configuration option in https://docs.python.org/dev/using/configure.html doc.

int now = _Py_atomic_load(&(the_timer.now));
*the_timer.top = now - *the_timer.top;
the_timer.top++;
*the_timer.top = now;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add an assertion to detect overflow? (out of times bounds)

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#ifdef Py_TIMERS sounds very generic. Would it be to find a longer name to avoid name conflict, or make it private (rename to _Py_TIMERS)? Examples:

  • Py_RECORD_CEVAL_TIMING
  • Py_CEVAL_TIMERS
  • Py_INSTRUMENT_CALLS
  • etc.

I'm mostly worried about adding a name in the "public" namespace, if it's private, i don't care :-)


#ifdef Py_TIMERS

typedef struct timer_records {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might help to document each timer, maybe just mention which function is related. Like:

     // pymain_main()
     uint64_t main;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants