Commit Graph

8 Commits

Author SHA1 Message Date
Alex Vandiver 1f9373585a string_validation: Use enumerate rather than `range(len(...))`. 2022-01-11 15:59:50 -08:00
Alex Vandiver ce09c8b65f string_validation: Make `unicode_non_chars` a set, for efficiency. 2022-01-11 15:59:38 -08:00
Alex Vandiver df50280c54 string_validation: Loosen to allow some `Cn` unicode characters.
Under the unicodedata distributed with Python 3.6, some Emoji are
classified as `Cn`, and not `So`:

```
$ unicode 1f929 --long
U+1F929 GRINNING FACE WITH STAR EYES
UTF-8: f0 9f a4 a9 UTF-16BE: d83edd29 Decimal: 🤩 Octal: \0374451
🤩
Category: So (Symbol, Other); East Asian width: W (wide)
Unicode block: 1F900..1F9FF; Supplemental Symbols and Pictographs
Bidi: ON (Other Neutrals)

$ python3.6 -c 'import unicodedata; print(unicodedata.category("\U0001f929"))'
Cn

$ python3.7 -c 'import unicodedata; print(unicodedata.category("\U0001f929"))'
So
```

Drop `Cn` from the list of excluded Unicode character classes, and
replace it with an explicit list of the 66 non-characters, which are
invariant.

Co-authored-by: Shlok Patel <shlokcpatel2001@gmail.com>
2022-01-11 15:17:53 -08:00
Alex Vandiver 4f482c234c string_validation: Standardize missing topic with missing stream name.
Co-authored-by: Shlok Patel <shlokcpatel2001@gmail.com>
2022-01-11 15:17:53 -08:00
Alex Vandiver 58c8eebda2 string_validation: Make check_stream_topic merely check, not alter.
Co-authored-by: Shlok Patel <shlokcpatel2001@gmail.com>
2022-01-11 15:17:53 -08:00
Alex Vandiver 1cdb93f6aa string_validation: Factor out topic validation.
Co-authored-by: Shlok Patel <shlokcpatel2001@gmail.com>
2022-01-11 15:17:53 -08:00
Alex Vandiver 94dbb540b1 string_validation: Give a more specific message for empty stream names.
Co-authored-by: Shlok Patel <shlokcpatel2001@gmail.com>
2022-01-11 15:17:53 -08:00
Alex Vandiver 3574637fbf string_validation: Factor out stream name validation.
Co-authored-by: Shlok Patel <shlokcpatel2001@gmail.com>
2022-01-11 15:17:53 -08:00