Disables rerouting of blocked alarm signals while inside an app handler. This is not bulletproof: I have it treat the handler as exited when we see sigreturn or sigprocmask, but nothing requires the app to exit a handler via siglongjmp or sigreturn, or to avoid calling sigprocmask inside the handler. If we get it wrong, though, it falls back to not rerouting alarms at all, which we can live with, as alarms are typically not used as required messages that must reach a particular thread.
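A minimal sketch of the heuristic described above, not the actual DR code: every name here (thread_state_t, on_thread_event, should_reroute_alarm) is hypothetical and only illustrates the intended state changes. The flag is set on handler entry and cleared on sigreturn or any mask change; if the app leaves the handler some other way (e.g., plain longjmp), the flag stays set and alarms for that thread are simply not rerouted.

```c
#include <stdbool.h>

/* Hypothetical per-thread state; these names are illustrative, not DR's. */
typedef struct {
    bool in_app_handler; /* Best-effort guess: are we inside an app handler? */
    bool alarm_blocked;  /* Is the alarm signal blocked in the app's mask? */
} thread_state_t;

typedef enum {
    EVENT_HANDLER_ENTRY, /* An app signal handler was just invoked. */
    EVENT_SIGRETURN,     /* The app executed sigreturn. */
    EVENT_SIGPROCMASK    /* The app changed its signal mask. */
} thread_event_t;

/* Updates the best-effort flag for one observed event. */
static void
on_thread_event(thread_state_t *ts, thread_event_t ev)
{
    switch (ev) {
    case EVENT_HANDLER_ENTRY: ts->in_app_handler = true; break;
    case EVENT_SIGRETURN:
    case EVENT_SIGPROCMASK:
        /* Assumption: either of these ends the handler. */
        ts->in_app_handler = false;
        break;
    }
}

/* A blocked alarm is rerouted only if the option is on and we do not
 * believe the thread is inside an app handler.  A stale "true" flag
 * therefore degrades to "never reroute" for this thread.
 */
static bool
should_reroute_alarm(const thread_state_t *ts, bool option_enabled)
{
    return option_enabled && ts->alarm_blocked && !ts->in_app_handler;
}
```

The failure mode is deliberately one-sided in this sketch: a wrong guess can only suppress rerouting, never cause an extra reroute.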
The rerouting of alarms was seen to cause problems on large apps, where alarms can accumulate because DR overhead causes a new one to arrive while the app is still in the handler for the prior one. Rerouting sent them to new threads, rather than marking them pending and dropping them if several accumulated.
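For reference, the native "mark pending and drop" behavior can be observed with a small standalone program. This is only an illustration of standard (non-realtime) signal semantics on Linux, not code from this change; the helper name deliveries_after_blocked_raises is made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t alarm_count;

static void
count_alarm(int sig)
{
    (void)sig;
    alarm_count++;
}

/* Raises SIGALRM n times while it is blocked, then unblocks it and returns
 * how many times the handler actually ran (or -1 on error).
 */
static int
deliveries_after_blocked_raises(int n)
{
    struct sigaction act, old_act;
    sigset_t block, old_mask;
    int i;

    memset(&act, 0, sizeof(act));
    act.sa_handler = count_alarm;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGALRM, &act, &old_act) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) != 0)
        return -1;

    alarm_count = 0;
    for (i = 0; i < n; i++)
        raise(SIGALRM); /* Stays pending: the signal is blocked. */

    /* Unblocking delivers the pending signal before sigprocmask returns. */
    sigprocmask(SIG_UNBLOCK, &block, NULL);

    /* Restore the caller's mask and handler. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGALRM, &old_act, NULL);
    return (int)alarm_count;
}
```

Because standard signals are not queued, several blocked instances collapse into a single pending one, so the handler runs once no matter how many were raised.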
Adds an on-by-default option -reroute_alarm_signals that can be turned off to disable rerouting of all alarm signals.
Adds a test to linux.sigmask which fails without this change.
The test was not easy to set up and it revealed two bugs in the rerouting, which are fixed here:
- Don't re-route synchronous fault signals.
- Properly re-route receive-now signals.
Adds a new rstat counting rerouted signals.
Issue: #2311
Activity
Forgot to include in the description yet another issue I had to fix to get this test to work:
Partially fixes #5017 (closed) by delivering an alarm signal to the app when there is no itimer for anyone.
And to add #5017 (closed) to the Issue: line.
added Google-Affecting Google-Verified labels
requested review from @abhinav92003
mentioned in issue #5484
     */
    kernel_sigset_t app_sigblocked;
    mutex_t sigblocked_lock;
    /* This is a not-guaranteed-accurate indicator of whether we're inside an
     * app signal handler. We can't know for sure when a handler ends if the
     * app exits with a longjmp instead of siglongjmp.
     */

Last updated by Derek Bruening
    pthread_t init_thread = (pthread_t)arg;
    unblocked_thread = pthread_self();
    intercept_signal(SIGALRM, alarm_handler, false);

    /* Get init thread inside its handler. */
    pthread_kill(init_thread, SIGALRM);
    wait_cond_var(child_ready);
    reset_cond_var(child_ready);

    print("init thread now inside handler: setting up itimer\n");
    struct itimerval t;
    t.it_interval.tv_sec = 0;
    t.it_interval.tv_usec = 10000;
    t.it_value.tv_sec = 0;
    t.it_value.tv_usec = 10000;
    int res = setitimer(ITIMER_REAL, &t, NULL);

The kernel would pick an arbitrary thread to send this signal to. When running under DR, it could be the unblocked_thread too, right? But we need it to be delivered to the init_thread to test this scenario. So, to ensure we do test this scenario, are we just sending enough signals and hoping that one will be delivered to the init_thread?

We indeed have no guarantees, but in my experiments all day yesterday I could never get a signal to be sent to any thread except the initial thread for any process-wide signal, whether from an itimer or sent via SYS_kill (even when sent from a different thread): I tried quite hard! POSIX lets the kernel send it anywhere but it seems quite biased in implementation. I have a comment about this and how I had to invert the test from how I first wrote it. I don't know that we can make a test with a higher % chance of testing what we want on future kernels -- it took a long time just to get this test doing what I wanted.
    thread_sig_info_t *info = (thread_sig_info_t *)dcontext->signal_field;
    int i;
    kernel_sigset_t safe_set;

    /* We assume any change to the mask ends a handler: e.g., sigprocmask. */
    info->in_app_handler = false;

                  frame->pretcode); /* pretcode has same offs for plain */
#endif

    if (receive_now && reroute) {
        /* We can't delay, but we can't have interrupted any DR locks, so we
         * can call this now.
         */
        handled = reroute_to_unmasked_thread(dcontext, frame, sig);
        if (handled)
            receive_now = false;
        else
            blocked = true;

    if (pthread_join(thread, &retval) != 0)
        perror("failed to join thread");

    /* Test alarm signal rerouting. Since process-wide signals are overwhelmingly
     * delivered to the initial thread (I can't get them to go anywhere else), we
     * need this thread to be the one sitting in a SIGALRM handler while we test whether
     * signals are rerouted from there. We create a thread to put us in the handler
     * and drive the test.
     */
    if (pthread_create(&thread, NULL, test_alarm_signals, (void *)pthread_self()) != 0) {
        perror("failed to create thread");
        exit(1);
    }
    sigemptyset(&set);
    while (!should_exit) {
        sigsuspend(&set);

In my tests with the reroute suppression from this PR disabled, this init thread would enter the handler many times b/c of the rerouted signals, so the loop was needed for that. It is also generally how you should always use things like sigsuspend, as they can be interrupted by other things (e.g., ptrace attach!), so it seems good programming practice to always loop (though testing for EINTR is maybe what would generally be done).
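The looping pattern mentioned above can be sketched in isolation as follows. This is not the test's code: the helper name wait_for_alarm and the one-shot 10ms timer are assumptions made for the example, and it blocks SIGALRM before arming the timer so the wakeup cannot be lost between the flag check and the call to sigsuspend.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t should_exit;

static void
on_alarm(int sig)
{
    (void)sig;
    should_exit = 1;
}

/* Arms a one-shot 10ms ITIMER_REAL and waits for the handler to set
 * should_exit.  Returns how many times sigsuspend returned, or -1 on error.
 */
static int
wait_for_alarm(void)
{
    struct sigaction act, old_act;
    struct itimerval t;
    sigset_t block, old_mask, wait_mask;
    int wakeups = 0;

    memset(&act, 0, sizeof(act));
    act.sa_handler = on_alarm;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGALRM, &act, &old_act) != 0)
        return -1;

    /* Block SIGALRM so it can only be delivered inside sigsuspend. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) != 0)
        return -1;

    should_exit = 0;
    memset(&t, 0, sizeof(t));
    t.it_value.tv_usec = 10000; /* One-shot: it_interval stays zero. */
    if (setitimer(ITIMER_REAL, &t, NULL) != 0)
        return -1;

    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGALRM);
    /* Loop on the flag: sigsuspend may return for reasons other than
     * the signal we are waiting for.
     */
    while (!should_exit) {
        sigsuspend(&wait_mask); /* Returns -1 with errno set to EINTR. */
        wakeups++;
    }

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGALRM, &old_act, NULL);
    return wakeups;
}
```

Checking the flag rather than the return value keeps the loop correct regardless of what interrupted the wait.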