100 changes: 100 additions & 0 deletions aip/general/0143/aip.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
# Standardized codes

Many common concepts, such as spoken languages, countries, currency, and so on,
have common codes (usually formalized by the [International Organization for
Standardization][iso]) that are used in data communication and processing.
These codes address the issue that there are often different ways to express
the same concept in written language (for example, "United States" and "USA",
or "Español" and "Spanish").

## Guidance

For concepts where a standardized code exists and is in common use, fields
representing these concepts **should** use the standardized code for both input

  • @Alfus The important part here is that the string itself is the immutable, canonical code, and is used on the wire format. You do not need to create a separate enum, and if you do, then you fall out of synchronization quickly.
    • @hudlow What this does bring to mind is that we validate that enums are lower snake strings. But this guidance correctly says, do not bastardize canonical codes into lower snake.
    • @lukesneeringer It sounds like the IBM linter needs an exception here, allowing industry standard formats.


@hudlow Perhaps a specific note about not bastardizing strings into a company's string enum format.

and output.

```typescript
// A message representing a book.
interface Book {
  // Other fields...

  // The IETF BCP-47 language code representing the language in which
  // the book was originally written.
  // https://en.wikipedia.org/wiki/IETF_language_tag
  languageCode: string;
}
```

- Fields representing standardized concepts **must** use the appropriate data
type for the standard code (usually `string`).
- Fields representing standardized concepts **must** indicate which standard
they follow, preferably with a link (either to the standard itself, the
Wikipedia description, or something similar).
- The field name **should** end in `_code` or `_type` unless the concept has an
obviously clearer suffix.
- When accepting values provided by users, validation **should** be
case-insensitive unless this would introduce ambiguity (for example, accept
both `en-gb` and `en-GB`). When providing values to users, APIs **should**
use the canonical case (in the example above, `en-GB`).
- Standardized code fields **may** have a default value; if they do, omitting
the field **must** be the sentinel that selects the default.
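
The case-handling and default rules above can be sketched as follows. This is a minimal illustration, not a complete BCP-47 implementation: the function names and the `en-US` default are hypothetical, and the casing logic follows only the general convention (lowercase language, title-case four-letter script, uppercase two-letter region) without checking subtags against the registry.

```typescript
// Hypothetical default; the guidance only says that a default is permitted.
const DEFAULT_LANGUAGE_CODE = "en-US";

// Accepts any letter case on input and returns the conventional casing.
function canonicalizeLanguageCode(input: string): string {
  let afterSingleton = false;
  return input
    .trim()
    .split("-")
    .map((subtag, index) => {
      const lower = subtag.toLowerCase();
      if (lower.length === 1) {
        // Single-character subtags introduce extensions or private use;
        // everything after them stays lowercase.
        afterSingleton = true;
        return lower;
      }
      if (index === 0 || afterSingleton) return lower;
      if (lower.length === 2) return lower.toUpperCase(); // region, e.g. "GB"
      if (lower.length === 4) {
        return lower.charAt(0).toUpperCase() + lower.slice(1); // script, e.g. "Hant"
      }
      return lower;
    })
    .join("-");
}

// Omission of the field (undefined) is the sentinel for the default.
function resolveLanguageCode(input?: string): string {
  return input === undefined
    ? DEFAULT_LANGUAGE_CODE
    : canonicalizeLanguageCode(input);
}
```

With this sketch, a request carrying `en-gb` and one carrying `en-GB` resolve to the same stored value, and responses always carry the canonical form.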

**Note:** The string itself _is_ the immutable, canonical code, used on the
wire format. Services **should not** create a separate enum with a different
wire format, and **should not** use company-specific strings, because doing so
makes it difficult to use multiple APIs together.


Maybe add a quick section about defaults, saying that defaults are permissible, and the sentinel value must be empty.

### Content types

Fields representing a content or media type **must** use [IANA media types][].
For legacy reasons, the field **should** be called `mime_type`.

### Countries and regions

Fields representing individual countries or nations **must** use the [Unicode
CLDR region codes][cldr] ([list][]), such as `US` or `CH`, and the field
**must** be called `region_code`.

**Important:** We use `region_code` and not `country_code` to include regions
distinct from any country, and avoid political disputes over whether or not
some regions are countries.
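
As an illustration of the `region_code` rule, the sketch below normalizes input to uppercase and checks only the *shape* of a region code (two letters or three digits). The `Address` interface and function name are hypothetical, and a real service would presumably also check the value against the CLDR data.

```typescript
// Shape of a region subtag: two letters (e.g. "US") or three digits (e.g. "419").
const REGION_CODE_SHAPE = /^(?:[A-Z]{2}|[0-9]{3})$/;

// Hypothetical resource carrying a region code.
interface Address {
  // The Unicode CLDR region code, such as "US" or "CH".
  // http://cldr.unicode.org/
  regionCode: string;
}

// Accepts any letter case and returns the uppercase canonical form.
function canonicalizeRegionCode(input: string): string {
  const code = input.trim().toUpperCase();
  if (!REGION_CODE_SHAPE.test(code)) {
    throw new Error(`Not shaped like a CLDR region code: ${input}`);
  }
  return code;
}

const address: Address = { regionCode: canonicalizeRegionCode("ch") };
```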

### Currency

Fields representing currency **must** use [ISO-4217 currency codes][iso-4217],
such as `USD` or `CHF`, and the field **must** be called `currency_code`.

**Note:** To represent an amount of money in a particular currency, rather
than the currency code itself, use [`google.type.Money`][money].
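
The difference between a currency code field and a monetary amount can be sketched as below. The `Money` interface is a hypothetical TypeScript mirror of the linked message's `currency_code`, `units`, and `nanos` fields (with `units` simplified to `number`), and the `Book` fields are assumptions for illustration only.

```typescript
// Hypothetical TypeScript mirror of the Money message's three fields.
// `units` is an int64 in the proto; `number` is a simplification here.
interface Money {
  currencyCode: string; // ISO-4217 code, e.g. "USD"
  units: number; // whole units of the currency
  nanos: number; // billionths of a unit, same sign as `units`
}

// Hypothetical resource showing both kinds of field.
interface Book {
  // The ISO-4217 code of the currency the book is sold in.
  // https://en.wikipedia.org/wiki/ISO_4217
  currencyCode: string;
  // An amount of money, which carries its own currency code.
  price: Money;
}

// Shape check only (three letters); does not consult the ISO-4217 list.
function canonicalizeCurrencyCode(input: string): string {
  const code = input.trim().toUpperCase();
  if (!/^[A-Z]{3}$/.test(code)) {
    throw new Error(`Not shaped like an ISO-4217 code: ${input}`);
  }
  return code;
}

const book: Book = {
  currencyCode: canonicalizeCurrencyCode("chf"),
  price: { currencyCode: "CHF", units: 24, nanos: 500000000 }, // 24.50 CHF
};
```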

### Language

Fields representing spoken languages **must** use [IETF BCP-47 language
codes][bcp-47] ([list][]), such as `en-US` or `de-CH`, and the field **must**
be called `language_code`.

### Time zones

Fields representing a time zone **should** use the [IANA TZ][] codes, and the
field **must** be called `time_zone`.

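Fields **may** also represent a UTC offset rather than a time zone (the two are
subtly different: an offset is a fixed difference from UTC, whereas the offset
of a time zone can change, for example with daylight saving time). In this
case, the field **must** use the [ISO-8601 format][] to represent this, and the
field **must** be named `utc_offset`.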
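
To make the distinction concrete, here is a minimal sketch of a resource with both fields and a parser for the ISO-8601 offset forms (`Z`, `±hh`, `±hhmm`, `±hh:mm`). The `Event` interface and `parseUtcOffsetMinutes` are hypothetical names, and the example values are illustrative.

```typescript
// Hypothetical resource; a real one would likely carry only one of these.
interface Event {
  // IANA TZ code, e.g. "Europe/Zurich". http://www.iana.org/time-zones
  timeZone?: string;
  // ISO-8601 UTC offset, e.g. "+01:00".
  utcOffset?: string;
}

// Converts an ISO-8601 UTC offset into signed minutes east of UTC.
function parseUtcOffsetMinutes(offset: string): number {
  if (offset === "Z") return 0;
  const match = /^([+-])(\d{2})(?::?(\d{2}))?$/.exec(offset);
  if (match === null) {
    throw new Error(`Not an ISO-8601 UTC offset: ${offset}`);
  }
  const sign = match[1] === "-" ? -1 : 1;
  const hours = Number(match[2]);
  const minutes = match[3] === undefined ? 0 : Number(match[3]);
  if (hours > 23 || minutes > 59) {
    throw new Error(`UTC offset out of range: ${offset}`);
  }
  return sign * (hours * 60 + minutes);
}

const event: Event = { utcOffset: "+05:30" };
const offsetMinutes = parseUtcOffsetMinutes(event.utcOffset ?? "Z");
```

A fixed offset such as `+05:30` always maps to the same number of minutes, whereas a `time_zone` value has to be resolved against the TZ database for a specific instant.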

## Changelog

- **2020-05-12**: Replaced `country_code` guidance with `region_code`,
correcting an original error.

<!-- prettier-ignore-start -->
[bcp-47]: https://en.wikipedia.org/wiki/IETF_language_tag
[cldr]: http://cldr.unicode.org/
[iana media types]: https://www.iana.org/assignments/media-types/media-types.xhtml
[iana tz]: http://www.iana.org/time-zones
[iso]: https://www.iso.org/
[iso-4217]: https://en.wikipedia.org/wiki/ISO_4217
[iso-8601 format]: https://en.wikipedia.org/wiki/ISO_8601#Time_offsets_from_UTC
[list]: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
[money]: https://github.com/googleapis/api-common-protos/blob/master/google/type/money.proto
<!-- prettier-ignore-end -->
7 changes: 7 additions & 0 deletions aip/general/0143/aip.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
id: 143
state: approved
created: 2019-07-24
placement:
  category: fields
  order: 40