
Add Turkic patronymic detection to patronymic_name_order (#198)

Merged
derek73 merged 12 commits into master from feat/turkic-patronymics on Jul 3, 2026

@derek73 (Owner) commented Jul 3, 2026

Summary

  • Extends the opt-in patronymic_name_order flag to also detect and rotate reversed, no-comma Azerbaijani/Central-Asian Turkic formal-order names (Surname GivenName PatronymicRoot Marker, e.g. "Aliyev Vusal Said oglu" → first=Vusal, middle="Said oglu", last=Aliyev), alongside the existing East-Slavic rotation (closes #185, "Support Turkic patronymics (oglu/qizi/uly) in patronymic_name_order")
  • Renames is_patronymic() → is_east_slavic_patronymic() and the patronymic/patronymic_cyrillic regex keys to east_slavic_patronymic/east_slavic_patronymic_cyrillic, now that a second, structurally different patronymic family exists (pure rename, single call site, zero behavior change)
  • Fixes a pre-existing bug found during review: east_slavic_patronymic_cyrillic was missing re.I, so capitalized irregular-form patronymics like "Ильич" failed to match and rotate, while the Latin equivalent ("Ilyich") worked fine

Test plan

  • Full suite passes: uv run pytest tests/ -q (1148 passed, 22 xfailed)
  • uv run mypy nameparser/ clean
  • uv run ruff check nameparser/ tests/ clean
  • uv run sphinx-build -b html docs docs/_build -q -W clean
  • Manually verified all three documented cases (natural order, comma order, reversed order) rotate/parse correctly
  • Manually verified the Cyrillic capitalization fix: HumanName("Иванов Иван Ильич", constants=Constants(patronymic_name_order=True)) now rotates correctly

🤖 Generated with Claude Code

derek73 added 9 commits July 2, 2026 03:31
Makes room for a second, structurally different patronymic family
(Turkic). Pure rename, no behavior change — single call site, not
referenced by name in docs.

Extends patronymic_name_order to handle the Azerbaijani/Central-Asian
4-token shape (Surname GivenName PatronymicRoot Marker), e.g.
'Aliyev Vusal Said oglu'. Standalone marker words, not suffixes, so
detection is whole-word matched against a strict 4-token guard (#185).
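
For readers skimming the commit list, the 4-token guard described above can be sketched roughly as follows. This is a standalone illustration, not the code in nameparser: `rotate_turkic` and `TURKIC_MARKERS` are hypothetical names, and the marker list is limited to the three forms named in the title of #185.

```python
import re
from typing import Dict, Optional

# Hypothetical marker list (oglu/qizi/uly, from the title of #185);
# the library's real list may well be longer.
TURKIC_MARKERS = re.compile(r"(?:oglu|qizi|uly)", re.IGNORECASE)


def rotate_turkic(full_name: str) -> Optional[Dict[str, str]]:
    """Rotate 'Surname Given Root Marker' into first/middle/last.

    Returns None when the strict 4-token guard does not hold.
    """
    # Comma-separated input is a different shape and is left alone here.
    if "," in full_name:
        return None
    tokens = full_name.split()
    # Strict guard: exactly four tokens.
    if len(tokens) != 4:
        return None
    surname, given, root, marker = tokens
    # fullmatch() makes this a whole-word test, so "ogluu" is rejected.
    if TURKIC_MARKERS.fullmatch(marker) is None:
        return None
    return {"first": given, "middle": f"{root} {marker}", "last": surname}


print(rotate_turkic("Aliyev Vusal Said oglu"))
```

Because the marker must be the final token, a natural-order name such as "Vusal Said oglu Aliyev" falls through the guard in this sketch and is left to whatever ordinary parsing would do.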
…cstring

The Constants class-attribute docstring (the canonical Sphinx-linked
API doc, referenced via :py:obj: from docs/customize.rst) only
described the East-Slavic behavior, drifting from the customize.rst
prose already updated for Turkic support. Adds the missing sentence
and a second doctest example (#185).

Unlike its Latin sibling and the new turkic_patronymic_marker_cyrillic
pattern, this pattern had no re.I flag. The irregular-form alternatives
(ильич, кузьмич, лукич, фомич, фокич) are short enough that the
capitalized first letter falls within the matched suffix itself, so
capitalized real-world patronymics like "Ильич" failed to match and
HumanName("Иванов Иван Ильич", constants=Constants(patronymic_name_order=True))
did not rotate, while the equivalent Latin-script name did.
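
The failure mode is easy to reproduce in isolation. The patterns below are simplified stand-ins (assumptions, not the library's actual regexes): one for the irregular forms listed above and one for an ordinary long suffix, each compiled with and without re.I.

```python
import re

# Simplified stand-ins for illustration; not nameparser's real patterns.
IRREGULAR = r"(?:ильич|кузьмич|лукич|фомич|фокич)$"
REGULAR_SUFFIX = r"(?:ович|евич)$"

irregular_strict = re.compile(IRREGULAR)        # no re.I: the buggy shape
irregular_fixed = re.compile(IRREGULAR, re.I)   # with re.I: the fixed shape
regular_strict = re.compile(REGULAR_SUFFIX)     # no re.I, long suffix

# The whole word *is* the alternative, so the capital "И" must match.
print(bool(irregular_strict.search("Ильич")))   # False
print(bool(irregular_fixed.search("Ильич")))    # True

# A long suffix sits entirely in the lowercase tail, so case never mattered.
print(bool(regular_strict.search("Иванович")))  # True
```

This is why the bug only surfaced for the short irregular forms: for suffix-style patronymics the capital letter is outside the matched span.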

Also strengthens test_no_regex_collision_latin/_cyrillic with positive
sanity assertions confirming each word list actually matches its own
family's regex, so the non-collision assertions are non-vacuous.
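
A rough sketch of what "non-vacuous" means here, using hypothetical, simplified Latin-script patterns and word lists (none of these names or lists are taken from the test suite):

```python
import re

# Hypothetical, simplified patterns and word lists, for illustration only.
EAST_SLAVIC = re.compile(r".+(?:ovich|evich|ovna|evna|ich)$", re.I)
TURKIC_MARKER = re.compile(r"(?:oglu|qizi|uly)", re.I)

EAST_SLAVIC_WORDS = ["Ivanovich", "Petrovna", "Ilyich"]
TURKIC_WORDS = ["oglu", "qizi", "uly"]


def check_no_collision() -> bool:
    # Positive sanity checks: each list really matches its own family.
    # Without these, a broken pattern or an empty list would let the
    # negative checks below pass trivially.
    assert all(EAST_SLAVIC.fullmatch(w) for w in EAST_SLAVIC_WORDS)
    assert all(TURKIC_MARKER.fullmatch(w) for w in TURKIC_WORDS)

    # The actual non-collision claims.
    assert not any(TURKIC_MARKER.fullmatch(w) for w in EAST_SLAVIC_WORDS)
    assert not any(EAST_SLAVIC.fullmatch(w) for w in TURKIC_WORDS)
    return True


check_no_collision()
```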
@derek73 derek73 self-assigned this Jul 3, 2026
@derek73 derek73 added this to the v1.3.0 milestone Jul 3, 2026
derek73 added 3 commits July 3, 2026 02:35
…guard asymmetry

Addresses PR review feedback on #198:
- Add a regex-level test confirming is_turkic_patronymic_marker() only
  matches whole marker words, not substrings (e.g. "ogluu", "Bogluchik"),
  guarding against a future accidental .match()->.search() swap.
- Soften docs/customize.rst's "mirroring the strictness of the
  East-Slavic guard" claim, which overstated parity: the Turkic guard
  has no middle-token disambiguation check analogous to East-Slavic's,
  since marker words are a small closed set unlikely to coincide with
  an ordinary given name.
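
To make the .match() versus .search() concern concrete, here is a minimal sketch. It assumes a pattern anchored only at the end, which may not be how the real pattern is written; "Mamedoglu" is an invented token chosen to show the difference.

```python
import re

# Assumed shape: alternation anchored at the end only (hypothetical).
MARKER = re.compile(r"(?:oglu|qizi|uly)\Z", re.I)


def is_marker_match(token: str) -> bool:
    # .match() pins the start, \Z pins the end: whole word only.
    return MARKER.match(token) is not None


def is_marker_search(token: str) -> bool:
    # .search() drops the start anchor, so a marker-final word slips through.
    return MARKER.search(token) is not None


for token in ["oglu", "ogluu", "Bogluchik", "Mamedoglu"]:
    print(token, is_marker_match(token), is_marker_search(token))
```

Under this assumed pattern, only the last token distinguishes the two calls: .match() rejects "Mamedoglu" while .search() accepts it. Using fullmatch() with an unanchored alternation would be one way to make the whole-word intent independent of how the pattern is anchored.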
…atronymic_name_order

Consistency fix: this handler only ever implemented East-Slavic
rotation logic (sibling to handle_turkic_patronymic_name_order()), so
the generic name was misleading now that a family-specific sibling
exists — the same asymmetry that is_patronymic -> is_east_slavic_patronymic
already fixed for the helper method. The patronymic_name_order flag
itself stays generic, since it's the public umbrella opt-in switch
covering both families by design.

Pure rename, zero behavior change — single call site, not referenced
by name in shipped docs.
…mic_order.py

Consistency with the handle_east_slavic_patronymic_name_order() and
is_east_slavic_patronymic() renames, and mirrors the naming of the
sibling test_turkic_patronymic_order.py. Pure file rename, no content
changes — not referenced by path anywhere outside gitignored planning docs.
@derek73 derek73 merged commit 03f2125 into master Jul 3, 2026
8 checks passed
@derek73 derek73 deleted the feat/turkic-patronymics branch July 3, 2026 09:49