The methods are already wrapped with a lock, which makes them thread-safe in
the free-threaded build. This replaces `PyThread_acquire_lock` with `PyMutex` and
removes some macros and allocation handling code.
Also add a free-threading test to check that there are no data races and
that the locking works.
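
For illustration, the shape of such a free-threading test can be sketched roughly as below. This is not the test added in this PR: `LockedAccumulator` and `hammer` are hypothetical stand-ins, with a pure-Python class guarded by `threading.Lock` playing the role of the C object whose methods take the internal mutex. The idea is only to show several threads starting together and calling a method on one shared object, then checking that no update was lost.

```python
import threading


class LockedAccumulator:
    """Hypothetical stand-in for an object whose methods take an internal lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._chunks = []

    def feed(self, data):
        # Every mutation happens while holding the lock.
        with self._lock:
            self._chunks.append(data)

    def result(self):
        with self._lock:
            return b"".join(self._chunks)


def hammer(obj, num_threads=8, calls_per_thread=1000):
    """Call obj.feed() from several threads at once and return the result."""
    barrier = threading.Barrier(num_threads)

    def worker():
        # Wait until all threads are ready so the calls really overlap.
        barrier.wait()
        for _ in range(calls_per_thread):
            obj.feed(b"x")

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.result()


if __name__ == "__main__":
    out = hammer(LockedAccumulator())
    # With working locking, every feed() call is accounted for.
    assert len(out) == 8 * 1000
```

A test of this shape is most informative when run on a free-threaded build, ideally under a race detector such as ThreadSanitizer, since a lost update may not show up on every run.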