gh-82045: Correct and deduplicate "isprintable" docs; add test. (GH-130118)

We had the definition of what makes a character "printable" documented in three places, giving two different definitions.
The definition in the comment on `_PyUnicode_IsPrintable` was inverted; correct that.

With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database.  That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics.
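
The category counts above can be checked from Python rather than looked up. This is a minimal sketch, assuming the standard `unicodedata` module is available; the helper name `categories_by_class` is hypothetical and is not part of this change.

```python
import sys
import unicodedata


def categories_by_class():
    """Group every general category seen in the database by its first letter.

    Hypothetical helper for illustration; it scans all code points once.
    """
    by_class = {}
    for codepoint in range(sys.maxunicode + 1):
        category = unicodedata.category(chr(codepoint))
        by_class.setdefault(category[0], set()).add(category)
    return by_class


classes = categories_by_class()
other = classes["C"]       # the "Other" categories
separator = classes["Z"]   # the "Separator" categories
```

If `other` holds exactly the five listed `C*` categories and `separator` exactly the three `Z*` categories, then "everything in Other or Separator" and the explicit eight-item list describe the same set of characters.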

Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place.

Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs.

Also add a thorough test that the implementation agrees with this definition.
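
A check of this kind can be sketched as follows. This is an illustrative version, not necessarily the test added by this commit; the names `is_printable_by_definition` and `find_disagreements` are hypothetical, and the sketch assumes `unicodedata` and `str.isprintable` are built from the same Unicode database version.

```python
import sys
import unicodedata


def is_printable_by_definition(char):
    """Apply the documented rule: Other and Separator are nonprintable,
    except the ASCII space, which is printable."""
    if char == " ":
        return True
    return unicodedata.category(char)[0] not in ("C", "Z")


def find_disagreements():
    """Return code points where str.isprintable differs from the rule."""
    return [
        codepoint
        for codepoint in range(sys.maxunicode + 1)
        if chr(codepoint).isprintable()
        != is_printable_by_definition(chr(codepoint))
    ]
```

An empty result from `find_disagreements()` means the implementation and the documented definition agree on every code point.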

Author:    Greg Price <gnprice@gmail.com>

Co-authored-by: Greg Price <gnprice@gmail.com>
Stan Ulbrych 2025-02-14 17:16:47 +00:00 committed by GitHub
parent 6666b38c28
commit 3402e133ef
6 changed files with 34 additions and 34 deletions

@@ -142,18 +142,10 @@ int _PyUnicode_IsNumeric(Py_UCS4 ch)
     return (ctype->flags & NUMERIC_MASK) != 0;
 }
 
-/* Returns 1 for Unicode characters to be hex-escaped when repr()ed,
-   0 otherwise.
-   All characters except those characters defined in the Unicode character
-   database as following categories are considered printable.
-      * Cc (Other, Control)
-      * Cf (Other, Format)
-      * Cs (Other, Surrogate)
-      * Co (Other, Private Use)
-      * Cn (Other, Not Assigned)
-      * Zl Separator, Line ('\u2028', LINE SEPARATOR)
-      * Zp Separator, Paragraph ('\u2029', PARAGRAPH SEPARATOR)
-      * Zs (Separator, Space) other than ASCII space('\x20').
+/* Returns 1 for Unicode characters that repr() may use in its output,
+   and 0 for characters to be hex-escaped.
+
+   See documentation of `str.isprintable` for details.
 */
 int _PyUnicode_IsPrintable(Py_UCS4 ch)
 {