The original plan was to lift the regexes directly, but I’d forgotten that Standard Ebooks is a GPL3 codebase, while this project is MIT-licensed. We can’t copy everything over directly, so the new plan is that I’ll copy over my own original contributions, plus anything else whose authors agree it can be contributed.
At Standard Ebooks we use python-titlecase to format a bunch of stuff throughout our productions (thanks!) but we also have some additional rules and changes to meet our specific needs. These start at [redacted]; the comments as a list give you a good overview:
- Uppercase Roman numerals, but only if they are valid Roman numerals and they are not `MIX` (which is much more likely to be an English word than a Roman numeral) or `DI` which may be an Italian word
- Lowercase `and`, `or` even if preceded by punctuation
- pip_titlecase capitalizes all prepositions preceded by parenthesis; we only want to capitalize ones that aren't the first word of a subtitle
  - OK: From Sergeant Bulmer (of the Detective Police) to Mr. Pendril
  - OK: Three Men in a Boat (To Say Nothing of the Dog)
- Uppercase words preceded by en or em dash
- Lowercase `and`, if it's not the very first word, and not preceded by an em-dash
- Lowercase `the`, if preceded by a dash (like `Puss-in-Boots` or `Jack-in-the-Box`)
- Lowercase "in", if followed by a semicolon (but not words like "inheritance")
- Lowercase `th’`, sometimes used poetically
- Lowercase `o’`
- Uppercase words that begin compound words, like `to-night` (which might appear in poetry)
- Lowercase `from`, `with`, as long as they're not the first word and not preceded by a parenthesis
- Capitalise the first word after an opening quote or italicisation that signifies a work (this relies on SE-specific markup)
- Lowercase `the` if preceded by `vs.`
- Lowercase `de`, `von`, `van`, `le`, `du` as in `Charles de Gaulle`, `Werner von Braun`, etc., and if not the first word and not preceded by an `“`
- Uppercase word following `Or,`, since it is probably a subtitle
- Uppercase word following `:`, except `or,`, which indicates a kind of subtitle
- Uppercase words after an initial contraction, like `O'Keefe` or `L'Affaire`, but only if there are at least 3 letters after, to prevent catching things like `I'm` or `E're`
- Uppercase letter after `Mc`
- Uppercase first letter after beginning contraction
- Uppercase first letter
- Lowercase `by`
- Lowercase leading `d’`, as in `Marie d’Elle`
- Uppercase `l’` as in `l’Affaire`, but not if it's the first letter
- Uppercase leading `A-` as in `A-Breaking`
- Uppercase some known initialisms
- Lowercase `À` (as in `À La Carte`) unless it's the first word
- Uppercase initialisms
- Uppercase `No.` as in Number
- Lowercase `V.` as in versus in a legal case
- Lowercase `mm` (millimeters, as in `50 mm gun`) unless it's followed by a period, in which case it's likely `Mm.` (Monsieurs)
- Lowercase `al-` (as in the Arabic definite article) unless it’s the first word
- …and some special cases
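
To give a feel for the shape of these rules, here is a minimal sketch of three of them (Roman numerals, name particles, and `Mc`) written as a post-processing pass over a string that has already been titlecased. This is not the Standard Ebooks code: the names `ROMAN`, `NOT_NUMERALS`, `PARTICLES`, `is_roman_numeral` and `post_process` are made up for illustration, the regex is my own, and the sketch splits on spaces only, so it ignores punctuation attached to words.

```python
import re

# Strict Roman numeral pattern for 1-3999 (hypothetical, not the SE regex).
ROMAN = re.compile(r"^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")
# Valid numerals that are more likely to be ordinary words.
NOT_NUMERALS = {"MIX", "DI"}
# Name particles kept lowercase unless they are the first word.
PARTICLES = {"de", "von", "van", "le", "du"}


def is_roman_numeral(word: str) -> bool:
    """Return True for a valid Roman numeral that is not on the exclusion list."""
    upper = word.upper()
    # The pattern also matches the empty string, so guard against it.
    return bool(upper) and upper not in NOT_NUMERALS and ROMAN.match(upper) is not None


def post_process(title: str) -> str:
    """Apply a few extra casing rules to an already-titlecased string."""
    out = []
    for index, word in enumerate(title.split(" ")):
        if is_roman_numeral(word):
            word = word.upper()
        elif index > 0 and word.lower() in PARTICLES:
            word = word.lower()
        else:
            # Uppercase the letter after a leading "Mc" (Mcdonald -> McDonald).
            word = re.sub(r"^Mc([a-z])", lambda m: "Mc" + m.group(1).upper(), word)
        out.append(word)
    return " ".join(out)


print(post_process("Charles De Gaulle and Chapter Xiv"))
```

A pass like this will still misfire on words that happen to be valid numerals (`Liv`, `Vi`), which is why the exclusion list exists and would probably need to grow.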
Would python-titlecase be interested in any of these? I’d be happy to upstream them as PRs.
minchinweb