Transliteration - Topograph

Some of the registers Topograph integrates return company data in non-Latin scripts: Bulgarian and Ukrainian in Cyrillic, Greek in Greek, Chinese (mainland and Hong Kong) in Simplified Chinese. There are two distinct features for working with that data in Latin form. They are independent and can be used together.

The two features at a glance

| | `transliterate=true` query flag | `companyNameTransliterations` field |
| --- | --- | --- |
| What it does | Rewrites every non-Latin user-facing string in the response into a single Latin form | Lists the canonical Latin spellings of the company’s legal and commercial names |
| Scope | Whole response: names, addresses, descriptions, person names, status labels | Company names only |
| Output | Single string per field, replaces the original | `string[]` alongside the original Cyrillic |
| Default | Off. Opt in by setting `transliterate=true` | Always populated when data exists. No flag required |
| Search impact | None. Response-time only | Indexed in search, so Latin queries match Cyrillic-only records |
| Standards | One generic romanization across all scripts | Per-country standards. For Ukrainian: generic folding, KMU 55, ISO 9, deduped |
| When to use | "I don’t read Cyrillic, give me the whole payload in Latin" | "Show me the canonical Latin spellings I can display or screen against" |
| Coverage today | All scripts (Cyrillic, Greek, Hangul, Hebrew, Arabic, Thai, CJK) | Ukrainian. Other non-Latin countries follow in later releases |

The two features are aligned: element [0] of `companyNameTransliterations` is the same string the response flag would put on `legalName`, so customers who use both get consistent names. The rest of this page documents the response flag. For the per-country name field, see the country page (e.g. Ukraine).
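
The alignment between the two features can be checked on the client side. The sketch below is illustrative only: it assumes both fields sit at the same level of the company object, and the sample values are invented for the example, not actual API output.

```python
def first_spelling_matches(flagged: dict, unflagged: dict) -> bool:
    """Return True when element [0] of companyNameTransliterations
    equals the legalName returned with transliterate=true."""
    spellings = unflagged.get("companyNameTransliterations") or []
    return bool(spellings) and spellings[0] == flagged.get("legalName")


# Invented sample payloads (assumed shape, not real register data).
unflagged = {
    "legalName": "ТОВ Приклад",
    "companyNameTransliterations": ["TOV Pryklad", "TOV Prykład"],
}
flagged = {"legalName": "TOV Pryklad"}

print(first_spelling_matches(flagged, unflagged))
```

A check like this is mainly useful in integration tests, to catch the case where one side was fetched from a different release than the other.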

Response flag: `transliterate=true`

Transliteration is opt-in. Pass transliterate=true as a query parameter on /v2/search or /v2/company, and the response comes back with user-facing string fields rewritten in Latin characters.

What gets transformed

The transform touches fields that carry user-facing text in the source language:

Company names: legalName, commercialNames, legacyLegalNames, legacyCommercialNames
Address components: addressLine1, addressLine2, city, state, region, careOf, poBox
Person names: title, firstName, middleName, lastName, fullName, suffix
Establishment, document, and graph node names: name
Activity, control, and relationship descriptions: description, activityDescription
Legal form, role, and status local labels: localName
Status explanations: additionalInfo

Fields that are already in Latin form, or that represent identifiers rather than text, are never transformed. That includes legalNameInEnglish, englishTranslation, standardized, iso20275Code, iso5009Code, countryCode, postalCode, id, dates, URLs, phone numbers, and any numeric or enumerated value. Strings that are already in Latin script pass through untouched regardless of which field they live on.
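
To see which parts of a payload the flag would be expected to touch, the rule above (listed field name plus non-Latin content) can be approximated on the client. This is a rough sketch, not the API's implementation: the field set is copied from the list above, the nesting of the sample object is hypothetical, and "non-Latin" is approximated through Unicode character names.

```python
import unicodedata

# Field names taken from the "What gets transformed" list.
TRANSFORMED_FIELDS = {
    "legalName", "commercialNames", "legacyLegalNames", "legacyCommercialNames",
    "addressLine1", "addressLine2", "city", "state", "region", "careOf", "poBox",
    "title", "firstName", "middleName", "lastName", "fullName", "suffix",
    "name", "description", "activityDescription", "localName", "additionalInfo",
}


def has_non_latin(text: str) -> bool:
    """True if any letter in the string is outside the Latin script."""
    return any(
        ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN")
        for ch in text
    )


def eligible_paths(node, path="", field=None):
    """Yield paths of strings that sit on a listed field and contain non-Latin letters."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from eligible_paths(value, f"{path}.{key}" if path else key, key)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from eligible_paths(item, f"{path}[{i}]", field)
    elif isinstance(node, str) and field in TRANSFORMED_FIELDS and has_non_latin(node):
        yield path


# Hypothetical payload; the "address" nesting is an assumption.
sample = {
    "legalName": "Пример ООД",
    "legalNameInEnglish": "Primer Ltd",
    "commercialNames": ["Primer", "Пример"],
    "address": {"city": "София", "postalCode": "1000", "countryCode": "BG"},
}

print(list(eligible_paths(sample)))
# ['legalName', 'commercialNames[1]', 'address.city']
```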

What does not change

The flag is a presentation toggle, not a data mutation:

Cached content on the Topograph side stays in the native script. Repeated calls with and without the flag hit the same cache.
Billing is unchanged. Transliteration is free.
Only the response is affected: non-Latin fields are replaced there with their romanized equivalents, and the native-script value is not returned alongside. If you need the native-script value as well, issue a second call without the flag.

How routing works

The API picks a romanization strategy per string based on two signals: the Unicode script of the string and the country code of the record.

| Script detected | Library used | Standard |
| --- | --- | --- |
| Cyrillic | `transliteration` (unidecode) | BGN/PCGN-compatible |
| Greek | `transliteration` (unidecode) | ISO 843 |
| Hangul | `transliteration` (unidecode) | Revised Romanization |
| Hebrew, Arabic, Thai | `transliteration` (unidecode) | ALA-LC compatible |
| CJK with hiragana or katakana | `kuroshiro` + `kuromoji` | Hepburn |
| CJK with kanji only, country = JP | `kuroshiro` + `kuromoji` | Hepburn |
| CJK with kanji only, other country | `pinyin-pro` | Hanyu Pinyin, toneless |
| Latin or ASCII | no-op | n/a |
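
The routing table can be read as a small decision function over the two signals. The sketch below is one possible reading, not the API's code: it detects scripts through Unicode character names, looks at the whole string at once, and returns a strategy label only, without calling any of the libraries. The label names and the "unlisted" fallback are assumptions.

```python
import unicodedata

# Unicode character-name prefixes mapped to coarse script labels.
PREFIXES = (
    ("CYRILLIC", "cyrillic"), ("GREEK", "greek"), ("HANGUL", "hangul"),
    ("HEBREW", "hebrew"), ("ARABIC", "arabic"), ("THAI", "thai"),
    ("HIRAGANA", "kana"), ("KATAKANA", "kana"),
    ("CJK", "han"), ("LATIN", "latin"),
)
UNIDECODE_SCRIPTS = {"cyrillic", "greek", "hangul", "hebrew", "arabic", "thai"}


def script_of(ch: str) -> str:
    """Map one character to a script label via its Unicode name."""
    name = unicodedata.name(ch, "")
    for prefix, script in PREFIXES:
        if name.startswith(prefix):
            return script
    return "other"


def pick_strategy(text: str, country_code: str) -> str:
    """Choose a romanization strategy from the string's scripts and the record's country."""
    scripts = {script_of(ch) for ch in text if ch.isalpha()}
    if "kana" in scripts or ("han" in scripts and country_code == "JP"):
        return "hepburn"    # kuroshiro + kuromoji rows
    if "han" in scripts:
        return "pinyin"     # pinyin-pro row
    if scripts & UNIDECODE_SCRIPTS:
        return "unidecode"  # transliteration rows
    if scripts <= {"latin"}:
        return "no-op"      # Latin, ASCII, or no letters at all
    return "unlisted"       # script not covered by the table


print(pick_strategy("東京", "JP"))  # hepburn
print(pick_strategy("東京", "CN"))  # pinyin
```

Note how the same kanji-only string routes differently depending on the country code, which is why the record's country is needed as a second signal.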

Mixed-script strings are handled automatically. A string like "Nokia 中国" on a Chinese record becomes "Nokia Zhong Guo": the Latin run passes through untouched and the Chinese run is routed to pinyin.
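
Per-run handling of a mixed-script string can be sketched as follows. This is an illustration under stated assumptions: the two-entry lookup table is a stand-in for a real pinyin library and covers only the characters in the example, and the run splitting shown here is one plausible approach rather than the API's actual logic.

```python
import itertools
import unicodedata

# Stand-in for a pinyin library: only the two characters from the example.
TOY_PINYIN = {"中": "Zhong", "国": "Guo"}


def is_han(ch: str) -> bool:
    """True for CJK unified ideographs."""
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")


def romanize_mixed(text: str) -> str:
    """Split the string into Han and non-Han runs and romanize only the Han runs."""
    parts = []
    for han, run in itertools.groupby(text, key=is_han):
        chunk = "".join(run)
        if han:
            # One syllable per character, separated by spaces.
            parts.append(" ".join(TOY_PINYIN.get(ch, ch) for ch in chunk))
        else:
            # Latin text, digits, spaces and punctuation pass through untouched.
            parts.append(chunk)
    return "".join(parts)


print(romanize_mixed("Nokia 中国"))  # Nokia Zhong Guo
```

The sketch does not insert a space where a Han run directly touches a Latin run; how that boundary is spaced in practice is not specified here.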

Known limitations

Transliteration is a mechanical transform. It is not the same as translation, and it is not the same as the company’s own branding in Latin characters.

Japanese proper nouns are imperfect. Kuroshiro reads kanji through a morphological dictionary. For unusual personal and company names, the reading the dictionary picks may differ from the reading the company actually uses. If accuracy on a specific Japanese entity matters, cross-check against the source.
Chinese is character-level, not word-level. Output is spaced per syllable (Bei Jing Xiao Mi), not joined (Beijing Xiaomi). Word-level joining would require a separate segmentation pass and we do not do it today.
Source-provided Latin names are ignored. Some Hong Kong and Chinese records carry an official English name from the source register. The flag always produces a mechanical romanization of the native string, not that official English name. If you want the official English name, read legalNameInEnglish: that field is never transliterated, so it is returned as-is with or without the flag.
Streaming search is not transliterated. /v2/search?stream=true returns Server-Sent Events straight from the search pipeline. The flag is ignored in streaming mode. Use the standard JSON mode when you need transliteration.
Determinism across versions. The output of pinyin-pro and kuroshiro can change between library or dictionary versions. Library versions are pinned in the API image, so the output is stable within a release.
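
Because Chinese output is spaced per syllable, comparing it against a joined spelling needs some normalization on the consumer side. One possible workaround, which is an assumption of this sketch and not something the API does, is to compare on a key that ignores whitespace and case.

```python
def spacing_insensitive_key(name: str) -> str:
    """Drop all whitespace and case-fold, so syllable spacing does not affect matching."""
    return "".join(name.split()).casefold()


per_syllable = "Bei Jing Xiao Mi"  # form described above for the flag's output
joined = "Beijing Xiaomi"          # word-level form used elsewhere

print(spacing_insensitive_key(per_syllable) == spacing_insensitive_key(joined))  # True
```

The trade-off is a higher chance of false positives, since distinct names that differ only in syllable boundaries collapse to the same key. Treat it as a candidate filter rather than a final match.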

Examples

Search a Bulgarian company and get Latin results:

curl -H "x-api-key: $TOPOGRAPH_API_KEY" \
  "https://api.topograph.co/v2/search?country=BG&query=%D0%92%D0%B0%D0%B7%D1%80%D0%B0%D0%B6%D0%B4%D0%B0%D0%BD%D0%B5&transliterate=true"

Retrieve a Chinese company profile in Latin:

curl -H "x-api-key: $TOPOGRAPH_API_KEY" \
  -X POST "https://api.topograph.co/v2/company?transliterate=true" \
  -H "Content-Type: application/json" \
  -d '{"countryCode":"CN","id":"91310000MA1FL10P8Q","dataPoints":["company"]}'

Fetch a previously created request in Latin:

curl -H "x-api-key: $TOPOGRAPH_API_KEY" \
  "https://api.topograph.co/v2/company/253299d1-e8d0-4268-945b-f175f98bc114?transliterate=true"

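When the search URL is built in code rather than typed by hand, the Cyrillic query has to be percent-encoded as UTF-8. A minimal sketch using only the Python standard library is shown below; the helper name `search_url` is hypothetical, and the endpoint and parameters are the ones from the first curl example.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.topograph.co/v2/search"


def search_url(country: str, query: str, transliterate: bool = True) -> str:
    """Build a /v2/search URL, percent-encoding the query as UTF-8."""
    params = {"country": country, "query": query}
    if transliterate:
        params["transliterate"] = "true"
    return f"{BASE_URL}?{urlencode(params)}"


print(search_url("BG", "Вазраждане"))
# https://api.topograph.co/v2/search?country=BG&query=%D0%92%D0%B0%D0%B7%D1%80%D0%B0%D0%B6%D0%B4%D0%B0%D0%BD%D0%B5&transliterate=true
```

The resulting URL is the same one used in the first curl example. Sending it still requires the `x-api-key` header shown there.
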
When to use this

Reach for transliterate=true when:

Your downstream system (CRM, KYC engine, search index, spreadsheet export) only indexes Latin text.
You want to display company and officer names in a Western UI without glyph fallbacks.
You are matching names across jurisdictions and need a consistent Latin form on both sides.

Do not reach for it when:

You need the company’s own brand in Latin. Use legalNameInEnglish instead when the register provides it.
You need to preserve the native-script value alongside the Latin form. Issue a second call without the flag, or store the unflagged response and call again with the flag only when rendering.
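
If both forms are needed, the two-call approach can be combined on the client. The sketch below assumes the flagged and unflagged responses share the same shape and only looks at top-level fields. The helper name and the sample values are invented for illustration.

```python
def side_by_side(native: dict, latin: dict, fields=("legalName",)) -> dict:
    """Pair native-script and Latin values for fields where the two responses differ."""
    return {
        field: {"native": native.get(field), "latin": latin.get(field)}
        for field in fields
        if native.get(field) != latin.get(field)
    }


# Invented sample responses: one fetched without the flag, one with it.
native = {"legalName": "Пример ООД", "countryCode": "BG"}
latin = {"legalName": "Primer OOD", "countryCode": "BG"}

print(side_by_side(native, latin, fields=("legalName", "countryCode")))
# {'legalName': {'native': 'Пример ООД', 'latin': 'Primer OOD'}}
```

Since cached content stays in the native script and both calls hit the same cache, the second call should not trigger a new register lookup.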
