This approach eliminates the originally reported race. It also eliminates the deadlock reported in gh-96071, so we can remove the workaround that was added for that issue.
This is mostly a cherry-pick of 1c0a104 (AKA gh-126989). The difference is that we add PyInterpreterState.threads_preallocated at the end of PyInterpreterState, instead of adding PyInterpreterState.threads.preallocated inside the nested threads struct. Appending the field keeps the offsets of all existing fields unchanged, which avoids ABI disruption.
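As a rough illustration of the ABI point, here is a minimal sketch. The structs are hypothetical, heavily simplified stand-ins, not the real PyInterpreterState layout; only the field names threads, preallocated and threads_preallocated come from this change. Growing the nested struct shifts every field that follows it, while appending to the outer struct leaves existing offsets alone.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Hypothetical "before" layout; `finalizing` stands in for every
 * field that follows `threads` in the real struct. */
struct threads_before { void *head; long next_unique_id; };
struct interp_before {
    struct threads_before threads;
    int finalizing;
};

/* Option A (as on main): new member inside the nested struct. */
struct threads_nested { void *head; void *preallocated; long next_unique_id; };
struct interp_nested {
    struct threads_nested threads;
    int finalizing;
};

/* Option B (this backport): new member appended to the outer struct. */
struct interp_appended {
    struct threads_before threads;
    int finalizing;
    void *threads_preallocated;
};

/* Appending keeps the offset of every pre-existing field. */
static_assert(offsetof(struct interp_appended, finalizing) ==
              offsetof(struct interp_before, finalizing),
              "appending must not move existing fields");

/* Returns 1 if option A moves a pre-existing field. */
int nested_shifts_existing_fields(void)
{
    return offsetof(struct interp_nested, finalizing) !=
           offsetof(struct interp_before, finalizing);
}

/* Returns 1 if option B leaves pre-existing fields where they were. */
int appended_keeps_existing_fields(void)
{
    return offsetof(struct interp_appended, threads) ==
               offsetof(struct interp_before, threads) &&
           offsetof(struct interp_appended, finalizing) ==
               offsetof(struct interp_before, finalizing);
}
```

Code compiled against the old layout reads fields by fixed offset, so only option B stays compatible with already-built extensions.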
gh-120182 added new global state (interp_count), but did not make access to it thread-safe. This change eliminates the possible race.
(cherry picked from commit 2c66318cdc, AKA gh-120529)
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>