- How resolution works: the high-level pipeline (classification → cache → LLM extraction → geocoding → validation).
- Supported input formats: what the endpoint accepts, with examples for compound and ambiguous inputs.
- Geocode a location: the GET /api/locations/geocode endpoint with a live playground.
How resolution works
Pipeline overview
The Geocode endpoint runs each input through a small, fixed sequence of steps. Classification can short-circuit early; otherwise the pipeline checks the cache and, on a miss, uses LLM extraction to pull structured location parts out of the input, geocodes each part to canonical coordinates, validates the result, and caches it for future lookups. The method field on every response tells you which branch produced the result:
| method value | Meaning |
|---|---|
| cache | Step 3 hit: served from the cache. |
| llm | Steps 4–7: extracted, geocoded, validated, and cached for next time. |
| remote_keyword | Step 1: input matched a known remote sentinel (e.g. "Remote", "Worldwide"). |
| no_match | Step 4 returned no parts, or step 6 rejected every candidate. |
| error | Step 1 detected a placeholder / non-location (e.g. "TBD", "Various"). |
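The control flow above can be sketched roughly as follows. This is an illustration only: the function name resolve, the injected callables (classify, normalize, extract, geocode, validate), and the dict-shaped result are hypothetical stand-ins, not the service's actual code.

```python
from typing import Callable, Optional


def resolve(
    raw: str,
    cache: dict,
    classify: Callable[[str], Optional[str]],
    normalize: Callable[[str], str],
    extract: Callable[[str], list],
    geocode: Callable[[dict], Optional[dict]],
    validate: Callable[[str, dict], bool],
) -> dict:
    """Hypothetical orchestration of the steps; every callable is a stand-in."""
    text = raw.strip()

    # Step 1: classification can short-circuit before any other work.
    kind = classify(text)
    if kind == "remote":
        return {"method": "remote_keyword", "locations": [{"country": "Remote"}]}
    if kind == "placeholder":
        return {"method": "error", "locations": []}  # mirrors the table above

    # Steps 2-3: build the normalized key, then look it up in the cache.
    key = normalize(text)
    if key in cache:
        return {"method": "cache", "locations": cache[key]}

    # Step 4: LLM extraction into {city, region, country} parts.
    parts = extract(text)
    # Step 5: geocode each part; parts that cannot be resolved are dropped.
    geocoded = [g for g in map(geocode, parts) if g is not None]
    # Step 6: keep only results that validate against the original input.
    accepted = [g for g in geocoded if validate(text, g)]
    if not accepted:
        # Failures are not cached, so a later retry can still succeed.
        return {"method": "no_match", "locations": []}

    # Step 7: cache successful resolutions only.
    cache[key] = accepted
    return {"method": "llm", "locations": accepted}
```

Passing the steps in as callables keeps the sketch independent of any particular LLM or geocoder; in a real service they would be fixed components.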
Steps
Classify (remote / placeholder)
The trimmed input is checked against two regex sets before anything else runs.
Remote sentinels (returned immediately as remote_keyword, with country: "Remote" and no coordinates): remote, anywhere, global, worldwide, work from home, wfh, telecommute, distributed, optionally followed by "/", "|", or "," plus one of traveling, travel, anywhere, remote, virtual, or a trailing parenthetical.
Placeholders (returned as an error, succeeded: false, no geocoding attempted):
- N Locations (e.g. "2 Locations", "15 locations")
- multiple / various / several / many / all locations
- Single-word placeholders: hq, headquarters, head office, office, location, flexible, hybrid, onsite, on-site, various, n/a, tbd, tba
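As a rough illustration of this classification step, the two pattern sets can be approximated with a pair of regular expressions. The patterns below are assembled from the word lists above and are an assumption about their shape; the service's real regex sets may differ, and the names REMOTE_RE, PLACEHOLDER_RE, and classify are hypothetical.

```python
import re
from typing import Optional

_REMOTE_WORDS = (
    r"(?:remote|anywhere|global|worldwide|work from home|wfh"
    r"|telecommute|distributed)"
)
_SUFFIX_WORDS = r"(?:traveling|travel|anywhere|remote|virtual)"

# Approximation of the remote sentinel set (not the service's real regex).
REMOTE_RE = re.compile(
    rf"^{_REMOTE_WORDS}"
    rf"(?:\s*[/|,]\s*{_SUFFIX_WORDS})?"  # optional separator plus qualifier
    rf"(?:\s*\([^)]*\))?$",              # optional trailing parenthetical
    re.IGNORECASE,
)

# Approximation of the placeholder set (not the service's real regex).
PLACEHOLDER_RE = re.compile(
    r"^(?:"
    r"\d+\s+locations"                                     # "2 Locations"
    r"|(?:multiple|various|several|many|all)\s+locations"  # "Multiple Locations"
    r"|hq|headquarters|head office|office|location|flexible"
    r"|hybrid|onsite|on-site|various|n/a|tbd|tba"
    r")$",
    re.IGNORECASE,
)


def classify(raw: str) -> Optional[str]:
    """Return "remote", "placeholder", or None for a real location string."""
    text = raw.strip()
    if REMOTE_RE.match(text):
        return "remote"
    if PLACEHOLDER_RE.match(text):
        return "placeholder"
    return None
```

Because both patterns are anchored at start and end, an input such as "Remote or San Francisco, CA" is not short-circuited and continues through the rest of the pipeline.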
Normalize
Real inputs are normalized into a stable cache key — abbreviations
are expanded, diacritics folded, and casing standardized. Inputs
that differ only in case, punctuation, or common abbreviations
share a cache entry.
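A minimal sketch of such a normalization, assuming a small hypothetical abbreviation table (the actual expansion rules are not documented here), might look like this. The function name cache_key is also an assumption.

```python
import re
import unicodedata

# Hypothetical abbreviation table; the real expansion rules may differ.
ABBREVIATIONS = {"st": "saint", "ft": "fort", "mt": "mount"}


def cache_key(raw: str) -> str:
    """Build a stable key: fold diacritics, standardize case, drop punctuation."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() standardizes case; runs of punctuation become a single space.
    tokens = re.sub(r"[^\w\s]+", " ", folded.casefold()).split()
    # Expand known abbreviations token by token.
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)
```

With this sketch, "St. Louis, MO" and "saint louis MO" produce the same key, so the second lookup would reuse the first one's cache entry.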
Cache lookup
The normalized key is looked up in the cache. On hit, the previously
resolved locations are returned immediately with method: "cache".
Only successful resolutions are cached. Failed lookups are not
written, so the same input can succeed later as data quality improves.
LLM extraction
On cache miss, the raw input is passed to a language model that
handles translation, abbreviation expansion, multi-location splitting
(e.g. "US or Canada" → two parts), and noise removal
(e.g. "Store 04494 - Lynchburg, VA" → Lynchburg / Virginia / United States).
It returns zero or more structured {city, region, country} parts.
If the model returns no usable parts, the response is no_match.
Geocode each part
Each extracted part is sent to a structured geocoder, which returns
canonical city / region / country / country code labels and
latitude/longitude coordinates. The geocoder’s labels are
authoritative — the saved location uses its casing and spelling,
not the LLM’s. Parts the geocoder cannot resolve are dropped.
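The "geocoder labels win" rule can be illustrated with a toy lookup. Everything here is a stand-in: FAKE_GEOCODER replaces the real structured geocoder, the field names country_code, latitude, and longitude are assumptions, and the coordinates are approximate.

```python
# Hypothetical stand-in for the structured geocoder, keyed by
# lowercase (city, country). Field names are assumptions.
FAKE_GEOCODER = {
    ("lynchburg", "united states"): {
        "city": "Lynchburg",
        "region": "Virginia",
        "country": "United States",
        "country_code": "US",
        "latitude": 37.41,   # approximate
        "longitude": -79.14,  # approximate
    },
}


def geocode_parts(parts: list) -> list:
    """Replace LLM-extracted parts with canonical geocoder records."""
    resolved = []
    for part in parts:
        key = (
            (part.get("city") or "").lower(),
            (part.get("country") or "").lower(),
        )
        hit = FAKE_GEOCODER.get(key)
        if hit is None:
            continue  # parts the geocoder cannot resolve are dropped
        # The geocoder's casing and spelling replace the LLM's labels.
        resolved.append(dict(hit))
    return resolved
```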
Validation
Each geocoded result is validated against the original input by
comparing token overlap between the input string and the canonical
labels. This catches:
- Mis-geocodes (e.g. Springfield, IL resolving to Springfield, MA)
- LLM confabulations (inventing a country the input never mentioned)
The overlap score is reported in the fuzzy_confidence field. Results below the threshold are rejected.
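One simple way to compute a token-overlap score is the share of input tokens that also appear in the canonical labels. The sketch below uses that definition as an assumption: the service's actual scoring formula, its threshold, and whether labels include a region abbreviation such as "IL" are not documented here, and the names overlap_score, is_valid, and THRESHOLD are hypothetical.

```python
import re
import unicodedata

THRESHOLD = 0.75  # hypothetical cut-off, not the service's real value


def _tokens(text: str) -> set:
    """Lowercased, diacritic-folded word tokens."""
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return set(re.findall(r"\w+", folded.casefold()))


def overlap_score(raw_input: str, labels: list) -> float:
    """Share of input tokens that appear in any canonical label."""
    input_tokens = _tokens(raw_input)
    if not input_tokens:
        return 0.0
    label_tokens = set()
    for label in labels:
        label_tokens |= _tokens(label)
    return len(input_tokens & label_tokens) / len(input_tokens)


def is_valid(raw_input: str, labels: list) -> bool:
    return overlap_score(raw_input, labels) >= THRESHOLD
```

For "Springfield, IL", labels for Springfield, Illinois cover both input tokens, while labels for Springfield, Massachusetts cover only one of the two, so the mis-geocode falls below the assumed threshold. A score this simple would also penalize noisy inputs such as store numbers, so the real validation is presumably more forgiving.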
If every result is rejected, the response is no_match.
Response method field
The method field tells you which path produced the result:
| Method | When |
|---|---|
| cache | Cache hit on the normalized input key |
| remote_keyword | Input matched a remote/distributed sentinel; no geocoding performed |
| llm | Resolved via LLM extraction → geocoding → validation → cached |
| no_match | LLM returned no parts, geocoder returned no coordinates for any part, or validation rejected all candidates |
Placeholder inputs ("2 Locations", "Multiple Locations", single-word
"office", etc.) and empty inputs return errors without a method.
Most repeated lookups hit the cache and return in single-digit
milliseconds. Cache misses go through the full pipeline and typically
complete in a few hundred milliseconds.
Supported input formats
What the endpoint accepts
The Geocode endpoint is designed for messy, real-world location strings from job postings, ATS systems, and user profiles. Because extraction is LLM-driven, there is no fixed “format grammar”: the model expands abbreviations, translates non-English names, splits compound expressions, and strips noise before each part is geocoded to canonical labels. The table below shows formats that have been observed working well in practice. It is not exhaustive.
| Format | Examples |
|---|---|
| City, State (US) | San Francisco, CA · New York, NY · Austin, Texas · Portland, OR · Denver, Colorado · Raleigh, NC |
| City, Country | London, UK · Tokyo, Japan · Berlin, DE · Mumbai, India · Dubai, UAE · São Paulo, Brazil |
| City, Province (CA) | Toronto, ON · Vancouver, BC · Montreal, QC · Calgary, AB · Ottawa, Ontario · Halifax, NS |
| City, State (AU) | Sydney, NSW · Melbourne, VIC · Brisbane, QLD · Perth, WA |
| City, Constituent Country (UK) | London, England · Edinburgh, Scotland · Cardiff, Wales · Belfast, Northern Ireland |
| City, Bundesland, Germany | Munich, Bavaria, Germany · Stuttgart, Baden-Württemberg |
| City, State, India | Pune, Maharashtra, India · Bengaluru, Karnataka |
| Country codes (ISO alpha-2) | US · GB · DE · FR · JP · IN · AU · CA · BR · SG |
| Country codes (ISO alpha-3) | DEU · GBR · ITA · SWE · AUS · NZL · JPN · CHN |
| Country-first formats | Germany, Berlin · ITA, Rome · India, Mumbai |
| Full country names | United States · United Kingdom · Germany · Japan · South Korea |
| Country name variants | Deutschland · España · Schweiz · 日本 · भारत · Czechia · Türkiye |
| International / diacritics | München, Deutschland · São Paulo, Brazil · Zürich, Switzerland · Köln, DE · Malmö, Sweden |
| City aliases & abbreviations | NYC · SF · LA · DC · CDMX · Bay Area · Silicon Valley · DFW · RTP |
| City transliterations | Bombay (→ Mumbai) · Peking (→ Beijing) · Calcutta (→ Kolkata) · Madras (→ Chennai) |
| Well-known cities (standalone) | London · Tokyo · Dubai · Singapore · Amsterdam · Dublin · Tel Aviv |
| Compound locations | US or Canada · New York and San Francisco · London & Paris · EMEA or APAC |
| Remote keywords | Remote · Anywhere · Global · Worldwide · Work from Home · WFH · Telecommute · Distributed |
| Region aliases | EMEA · APAC · LATAM · DACH · BENELUX · NORDICS · ANZ · GCC · MENA |
| Ambiguous codes | CA (California or Canada) · IN (Indiana or India) · DE (Delaware or Germany) · GA (Georgia state or country) |
Compound locations
The endpoint handles compound expressions with "or", "and", "&", ";",
and "|" separators. Splitting is performed during LLM extraction, which
returns one {city, region, country} part per location. Each part is
then independently geocoded, and the resulting array is returned in the
locations field of a single response.
| Input | Resolved parts |
|---|---|
| US or Canada | US, Canada |
| New York and San Francisco | New York, San Francisco |
| London & Paris | London, Paris |
| EMEA or APAC | EMEA, APAC |
| Remote or San Francisco, CA | Remote, San Francisco, CA |
Ambiguous codes
Some 2-letter codes match both a US state or Canadian province abbreviation and an ISO country code. Disambiguation is delegated to the LLM, which uses any surrounding city or context tokens in the input to decide.
| Code | US state / Canadian province | Country |
|---|---|---|
| CA | California | Canada |
| IN | Indiana | India |
| DE | Delaware | Germany |
| GA | Georgia (state) | Georgia (country) |
| CO | Colorado | Colombia |
| AL | Alabama | Albania |
| ME | Maine | Montenegro |
| NL | Newfoundland & Labrador | Netherlands |
| SK | Saskatchewan | Slovakia |
- A standalone two-letter code is typically resolved to the more common US state interpretation. CA alone → California.
- A code paired with a city is disambiguated by city context. Los Angeles, CA → California; Toronto, CA → Canada; Amsterdam, NL → Netherlands; Tel Aviv, IL → Israel.
- Three-letter codes are unambiguous ISO alpha-3 country codes. DEU → Germany; GBR → United Kingdom.
If the model picks Canada for an input
that contained California, validation will typically reject it.
For best results with ambiguous codes, include a city name or use the
full state/country name. For example, prefer "California" or
"Canada" over a bare "CA" when precision matters.
Tips for best results
Be specific
Include both city and state/country when possible. "San Francisco, CA" is
better than just "CA".
Use full names for ambiguous codes
Use "California" or "Canada" instead of "CA". Use "Germany" instead
of "DE".
Standard separators for compounds
Use "or", "and", "&", ";", or "|" between locations.
"US or Canada" works; "US, Canada" may be treated as a single
location.
Don't worry about case or diacritics
The pipeline normalizes case and handles diacritics. "münchen",
"MÜNCHEN", and "Munchen" all resolve to Munich, Germany.
City aliases are supported
Common abbreviations like NYC, SF, LA, and DC, and transliterations
like Bombay and Peking, are handled automatically.
Noise is stripped automatically
Internal references and qualifiers like "Store 04494 - Lynchburg, VA"
or "Greater San Francisco Area" are normalized down to the
underlying city.
Error handling
The endpoint returns HTTP 200 with succeeded: false for inputs that
were classified as placeholders or that the pipeline could not resolve.
Missing or whitespace-only location query parameters return HTTP 400
with a ProblemDetails body.
Common scenarios
Missing or empty location query parameter (HTTP 400): the response is a ProblemDetails body.
Placeholder input (HTTP 200 with succeeded: false): inputs the classifier recognizes as placeholders never reach the
geocoder. The full placeholder set is:
- Count placeholders: "2 Locations", "15 locations", …
- Quantifier placeholders: "Multiple Locations", "Various Locations", "Several Locations", "Many Locations", "All Locations"
- Single-word placeholders: hq, headquarters, head office, office, location, flexible, hybrid, onsite, on-site, various, n/a, tbd, tba
Unresolvable input (HTTP 200 with succeeded: false): the pipeline ran but could not resolve the input.
A ProblemDetails body carries title,
detail, and status fields.
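On the client side, the two failure styles need different handling: HTTP 400 signals a malformed request, while HTTP 200 with succeeded: false signals an input the pipeline declined or could not resolve. The helper below is a hedged sketch of that branching. It takes an already-parsed status code and JSON body, uses only field names mentioned in this section (succeeded, method, locations, title, detail), and the function name interpret is hypothetical.

```python
def interpret(status: int, body: dict) -> str:
    """Summarize a Geocode response from its HTTP status and parsed JSON body."""
    if status == 400:
        # ProblemDetails body: title, detail, status.
        return f"bad request: {body.get('title')}: {body.get('detail')}"
    if status == 200 and not body.get("succeeded", False):
        # Placeholder or unresolvable input; method may be absent.
        return f"unresolved ({body.get('method', 'no method')})"
    if status == 200:
        count = len(body.get("locations", []))
        return f"resolved {count} location(s) via {body.get('method')}"
    return f"unexpected status {status}"
```

Checking succeeded rather than the status code alone matters here, because an unresolved input still arrives with HTTP 200.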
Endpoints
Full request/response reference and a live “Try it” playground are on the dedicated page below.
Geocode a location
GET /api/locations/geocode — resolve a free-text location string to canonical city/region/country and coordinates.
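Because inputs often contain spaces, commas, and non-ASCII characters, the location query parameter should be URL-encoded. A small sketch of building the request URL with the Python standard library follows; the host api.example.com is a placeholder rather than the real service, and geocode_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder host, not the real service


def geocode_url(location: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for a free-text location string."""
    # urlencode percent-encodes commas and non-ASCII characters,
    # and encodes spaces as "+".
    query = urlencode({"location": location})
    return f"{base_url}/api/locations/geocode?{query}"
```

The resulting URL can be passed to any HTTP client; authentication, if required, is outside the scope of this sketch.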
