The Locations API turns messy, real-world location strings — from job postings, ATS systems, and user profiles — into structured, geocoded data. This page covers:
  • How resolution works — the high-level pipeline: classification → cache → LLM extraction → geocoding → validation.
  • Supported input formats — what the endpoint accepts, with examples for compound and ambiguous inputs.
  • Geocode a location — the GET /api/locations/geocode endpoint with a live playground.

How resolution works

Pipeline overview

The Geocode endpoint runs each input through a small, fixed sequence of steps. Classification can short-circuit early; otherwise the input is normalized and looked up in the cache. On a cache miss, the pipeline uses LLM extraction to pull structured location parts out of the input, geocodes each part to canonical coordinates, validates the results, and caches them for future lookups. The method field on most responses tells you which branch produced the result:
method value | Meaning
--- | ---
cache | Step 3 hit: served from the cache.
llm | Steps 4–7: extracted, geocoded, validated, and cached for next time.
remote_keyword | Step 1: input matched a known remote sentinel (e.g. "Remote", "Worldwide").
no_match | Step 4 returned no parts, step 5 resolved none of them, or step 6 rejected every candidate.
(no method) | Step 1 detected a placeholder / non-location (e.g. "TBD", "Various"). The response carries succeeded: false and an error message instead.
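A client usually only needs to branch on method. The Python sketch below assumes the response body has already been parsed into a dict; describe_outcome and its return labels are illustrative names, not part of the API. A response without a recognized method is treated as a rejected placeholder.

```python
from typing import Any, Dict


def describe_outcome(body: Dict[str, Any]) -> str:
    """Map a parsed geocode response to a coarse, client-side outcome label (illustrative)."""
    method = body.get("method")
    if method in ("cache", "llm"):
        # Both branches return resolved locations; only the path through the pipeline differs.
        return "resolved"
    if method == "remote_keyword":
        # Remote sentinel: country is "Remote" and there are no coordinates.
        return "remote"
    if method == "no_match":
        return "unresolved"
    # No recognized method: treat as a rejected (placeholder) input.
    return "rejected"
```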

Steps

Step 1: Classify (remote / placeholder)

The trimmed input is checked against two regex sets before anything else runs.

Remote sentinels (returned immediately as remote_keyword, with country: "Remote" and no coordinates): remote, anywhere, global, worldwide, work from home, wfh, telecommute, distributed, optionally followed either by a separator (/, |, or a comma) plus one of traveling, travel, anywhere, remote, or virtual, or by a trailing parenthetical.

Placeholders (returned as an error, succeeded: false, no geocoding attempted):
  • N Locations (e.g. "2 Locations", "15 locations")
  • multiple/various/several/many/all locations
  • Standalone placeholder terms: hq, headquarters, head office, office, location, flexible, hybrid, onsite, on-site, various, n/a, tbd, tba
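The Step 1 rules can be approximated with a few regular expressions. This Python sketch is a reconstruction from the description above, not the service's actual patterns; classify and the "placeholder" label are hypothetical names, and this remote pattern also accepts a separator suffix and a parenthetical together.

```python
import re
from typing import Optional

# Approximation of the Step 1 patterns; the service's exact regexes are not published.
_REMOTE_WORD = r"(?:remote|anywhere|global|worldwide|work from home|wfh|telecommute|distributed)"
_REMOTE_SUFFIX = r"(?:\s*[/|,]\s*(?:traveling|travel|anywhere|remote|virtual))?"
_PARENTHETICAL = r"(?:\s*\([^)]*\))?"
REMOTE_RE = re.compile(rf"^{_REMOTE_WORD}{_REMOTE_SUFFIX}{_PARENTHETICAL}$", re.IGNORECASE)

PLACEHOLDER_RES = [
    re.compile(r"^\d+\s+locations?$", re.IGNORECASE),  # "2 Locations", "15 locations"
    re.compile(r"^(?:multiple|various|several|many|all)\s+locations$", re.IGNORECASE),
    re.compile(
        r"^(?:hq|headquarters|head office|office|location|flexible|hybrid"
        r"|onsite|on-site|various|n/a|tbd|tba)$",
        re.IGNORECASE,
    ),
]


def classify(raw: str) -> Optional[str]:
    """Return "remote_keyword", "placeholder", or None if the input continues down the pipeline."""
    text = raw.strip()
    if REMOTE_RE.match(text):
        return "remote_keyword"
    if any(pattern.match(text) for pattern in PLACEHOLDER_RES):
        return "placeholder"
    return None
```

Anything that returns None here, such as "Remote or San Francisco, CA", moves on to normalization and, on a cache miss, to LLM extraction.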

Step 2: Normalize

Real inputs are normalized into a stable cache key — abbreviations are expanded, diacritics folded, and casing standardized. Inputs that differ only in case, punctuation, or common abbreviations share a cache entry.
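A minimal sketch of such a key in Python, assuming NFKD decomposition for diacritic folding, casefold() for casing, and punctuation treated as a token separator. The ABBREVIATIONS table is hypothetical; the service's actual expansion list is not documented.

```python
import re
import unicodedata

# Hypothetical abbreviation table; the real expansion list is not documented.
ABBREVIATIONS = {"st": "saint", "ft": "fort", "mt": "mount"}


def cache_key(raw: str) -> str:
    # Fold diacritics: decompose (NFKD) and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Standardize casing, then split on anything that is not a word character.
    tokens = re.findall(r"\w+", folded.casefold())
    # Expand common abbreviations token by token.
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)
```

With this key, "São Paulo, Brazil" and "SAO PAULO,  brazil" share one entry. A blind expansion such as st → saint would also rewrite "Main St", so a real table has to be chosen with care.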

Step 3: Cache lookup

The normalized key is looked up in the cache. On a hit, the previously resolved locations are returned immediately with method: "cache".

Only successful resolutions are cached. Failed lookups are not written, so the same input can succeed later as data quality improves.
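The cache contract (serve hits as method "cache", never store failures) fits in a few lines. This is an in-memory Python stand-in under that assumption; ResolutionCache and lookup are hypothetical names, and the real store, expiry policy, and key format are not described on this page.

```python
from typing import Dict, List, Optional


class ResolutionCache:
    """In-memory stand-in for the service's cache (the real backing store is not documented)."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[dict]] = {}

    def get(self, key: str) -> Optional[List[dict]]:
        return self._entries.get(key)

    def put(self, key: str, locations: List[dict]) -> None:
        # Only successful resolutions are stored; an empty result is never cached,
        # so the same input can be retried later and may succeed.
        if locations:
            self._entries[key] = list(locations)


def lookup(cache: ResolutionCache, key: str) -> Optional[dict]:
    """Return a cache-hit response, or None to continue with LLM extraction."""
    hit = cache.get(key)
    if hit is None:
        return None
    return {"succeeded": True, "locations": hit, "method": "cache"}
```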

Step 4: LLM extraction

On a cache miss, the raw input is passed to a language model that handles translation, abbreviation expansion, multi-location splitting (e.g. "US or Canada" → two parts), and noise removal (e.g. "Store 04494 - Lynchburg, VA" → Lynchburg / Virginia / United States). It returns zero or more structured {city, region, country} parts.

If the model returns no usable parts, the response is no_match.
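The extraction output is described only as zero or more {city, region, country} parts. Assuming the model replies with a JSON array of such objects, a defensive parser might look like this in Python; Part, parse_parts, and the JSON reply format are assumptions. An empty result corresponds to the no_match branch.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Part:
    city: Optional[str]
    region: Optional[str]
    country: Optional[str]


def _clean(value: Any) -> Optional[str]:
    # Treat non-strings and blank strings as missing.
    if not isinstance(value, str):
        return None
    value = value.strip()
    return value or None


def parse_parts(model_output: str) -> List[Part]:
    """Parse a reply assumed to be a JSON array of {city, region, country} objects."""
    try:
        raw = json.loads(model_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(raw, list):
        return []
    parts = []
    for item in raw:
        if not isinstance(item, dict):
            continue
        part = Part(_clean(item.get("city")), _clean(item.get("region")), _clean(item.get("country")))
        # A part with no usable field cannot be geocoded, so it is dropped.
        if part.city or part.region or part.country:
            parts.append(part)
    return parts
```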

Step 5: Geocode each part

Each extracted part is sent to a structured geocoder, which returns canonical city / region / country / country code labels and latitude/longitude coordinates. The geocoder’s labels are authoritative — the saved location uses its casing and spelling, not the LLM’s. Parts the geocoder cannot resolve are dropped.
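The "geocoder labels win" rule can be made explicit by building each location purely from the geocoder's reply. In this Python sketch the Geocoder callable and its reply keys (city, region, country, country_code, latitude, longitude) are a hypothetical interface modeled on the response fields; the actual geocoder is not named on this page.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (city, region, country) as extracted by the LLM; any field may be None.
ExtractedPart = Tuple[Optional[str], Optional[str], Optional[str]]
# Hypothetical geocoder interface: canonical labels plus coordinates, or None if unresolved.
Geocoder = Callable[[ExtractedPart], Optional[Dict]]


def geocode_parts(parts: List[ExtractedPart], geocoder: Geocoder) -> List[Dict]:
    results = []
    for part in parts:
        hit = geocoder(part)
        if hit is None:
            continue  # parts the geocoder cannot resolve are dropped
        # Copy the geocoder's labels verbatim; the LLM's spelling is not reused.
        results.append({
            "city": hit.get("city"),
            "region": hit.get("region"),
            "country": hit["country"],
            "country_code": hit["country_code"],
            "latitude": hit["latitude"],
            "longitude": hit["longitude"],
        })
    return results
```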

Step 6: Validation

Each geocoded result is validated against the original input by comparing token overlap between the input string and the canonical labels. This catches:
  • Mis-geocodes (e.g. Springfield, IL resolving to Springfield, MA)
  • LLM confabulations (inventing a country the input never mentioned)
The match score (0.0–1.0) is returned in each location’s fuzzy_confidence field. Results below the threshold are rejected. If every result is rejected, the response is no_match.
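The exact scoring formula and threshold are not documented. As a rough illustration only, the Python sketch below scores a candidate by the share of input tokens that also appear in its canonical labels, after folding case and diacritics; overlap_score is a hypothetical name.

```python
import re
import unicodedata
from typing import Optional, Sequence, Set


def _tokens(text: str) -> Set[str]:
    # Strip diacritics, casefold, and split on non-word characters.
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return set(re.findall(r"\w+", folded.casefold()))


def overlap_score(raw_input: str, labels: Sequence[Optional[str]]) -> float:
    """Share of input tokens found in the canonical labels (a simplified, assumed rule)."""
    input_tokens = _tokens(raw_input)
    if not input_tokens:
        return 0.0
    label_tokens: Set[str] = set()
    for label in labels:
        if label:  # skip missing labels such as a null city
            label_tokens |= _tokens(label)
    return len(input_tokens & label_tokens) / len(input_tokens)
```

For the input "Springfield, Illinois", a candidate labeled Springfield / Illinois / United States scores 1.0, while Springfield / Massachusetts / United States scores 0.5. Plain token overlap has blind spots: "NYC" shares no tokens with "New York", and "IL" shares none with "Illinois", so a validator built on this idea would also need alias and abbreviation handling.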

Step 7: Cache the result

Accepted results are saved and the cache is updated under the normalized key, so the next request for the same input is served instantly. The response is returned with method: "llm".
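Putting the steps together, the control flow that decides the method field can be sketched as one Python function. Every dependency (classifier, normalizer, LLM, geocoder, scorer, threshold) is injected as a plain callable or value, so the names and signatures here are hypothetical. The response shapes follow the examples on this page, except the remote_keyword body, which is assumed.

```python
from typing import Callable, Dict, List, Optional

PLACEHOLDER_ERROR = "Location string is a placeholder, not a real location."
NO_MATCH_ERROR = "Could not resolve the provided location."


def resolve(
    raw: str,
    *,
    classify: Callable[[str], Optional[str]],   # "remote_keyword", "placeholder", or None
    normalize: Callable[[str], str],            # builds the cache key
    cache: Dict[str, List[dict]],
    extract: Callable[[str], List[dict]],       # LLM extraction (stubbed)
    geocode: Callable[[dict], Optional[dict]],  # structured geocoder (stubbed)
    score: Callable[[str, dict], float],        # validation score in [0.0, 1.0]
    threshold: float,
) -> dict:
    text = raw.strip()
    kind = classify(text)
    if kind == "remote_keyword":
        # Assumed shape: a single "Remote" entry with no coordinates.
        return {"input": raw, "succeeded": True,
                "locations": [{"country": "Remote"}], "method": "remote_keyword"}
    if kind == "placeholder":
        return {"input": raw, "succeeded": False, "error": PLACEHOLDER_ERROR}

    key = normalize(text)
    if key in cache:
        return {"input": raw, "succeeded": True, "locations": cache[key], "method": "cache"}

    no_match = {"input": raw, "succeeded": False, "method": "no_match", "error": NO_MATCH_ERROR}
    parts = extract(text)
    if not parts:
        return no_match

    accepted = []
    for part in parts:
        location = geocode(part)
        if location is None:
            continue  # unresolved parts are dropped
        confidence = score(raw, location)
        if confidence >= threshold:
            accepted.append({**location, "fuzzy_confidence": confidence})
    if not accepted:
        return no_match  # failures are never cached

    cache[key] = accepted
    return {"input": raw, "succeeded": True, "locations": accepted, "method": "llm"}
```

Injecting the dependencies keeps the branching logic testable with stubs, without a model or geocoder behind it.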

Response method field

The method field tells you which path produced the result:
Method | When
--- | ---
cache | Cache hit on the normalized input key
remote_keyword | Input matched a remote/distributed sentinel; no geocoding performed
llm | Resolved via LLM extraction → geocoding → validation → cached
no_match | LLM returned no parts, geocoder returned no coordinates for any part, or validation rejected all candidates
Placeholders ("2 Locations", "Multiple Locations", single-word "office", etc.) and empty inputs return errors without a method.
Most repeated lookups hit the cache and return in single-digit milliseconds. Cache misses go through the full pipeline and typically complete in a few hundred milliseconds.

Supported input formats

What the endpoint accepts

The Geocode endpoint is designed for messy, real-world location strings from job postings, ATS systems, and user profiles. Because extraction is LLM-driven, there is no fixed “format grammar” — the model expands abbreviations, translates non-English names, splits compound expressions, and strips noise before each part is geocoded to canonical labels. The table below shows formats that have been observed working well in practice. It is not exhaustive.
Format | Examples
--- | ---
City, State (US) | San Francisco, CA · New York, NY · Austin, Texas · Portland, OR · Denver, Colorado · Raleigh, NC
City, Country | London, UK · Tokyo, Japan · Berlin, DE · Mumbai, India · Dubai, UAE · São Paulo, Brazil
City, Province (Canada) | Toronto, ON · Vancouver, BC · Montreal, QC · Calgary, AB · Ottawa, Ontario · Halifax, NS
City, State (Australia) | Sydney, NSW · Melbourne, VIC · Brisbane, QLD · Perth, WA
City, Constituent Country (UK) | London, England · Edinburgh, Scotland · Cardiff, Wales · Belfast, Northern Ireland
City, Bundesland, Germany | Munich, Bavaria, Germany · Stuttgart, Baden-Württemberg
City, State, India | Pune, Maharashtra, India · Bengaluru, Karnataka
Country codes (ISO alpha-2) | US · GB · DE · FR · JP · IN · AU · CA · BR · SG
Country codes (ISO alpha-3) | DEU · GBR · ITA · SWE · AUS · NZL · JPN · CHN
Country-first formats | Germany, Berlin · ITA, Rome · India, Mumbai
Full country names | United States · United Kingdom · Germany · Japan · South Korea
Country name variants | Deutschland · España · Schweiz · 日本 · भारत · Czechia · Türkiye
International / diacritics | München, Deutschland · São Paulo, Brazil · Zürich, Switzerland · Köln, DE · Malmö, Sweden
City aliases & abbreviations | NYC · SF · LA · DC · CDMX · Bay Area · Silicon Valley · DFW · RTP
City transliterations | Bombay (→ Mumbai) · Peking (→ Beijing) · Calcutta (→ Kolkata) · Madras (→ Chennai)
Well-known cities (standalone) | London · Tokyo · Dubai · Singapore · Amsterdam · Dublin · Tel Aviv
Compound locations | US or Canada · New York and San Francisco · London & Paris · EMEA or APAC
Remote keywords | Remote · Anywhere · Global · Worldwide · Work from Home · WFH · Telecommute · Distributed
Region aliases | EMEA · APAC · LATAM · DACH · BENELUX · NORDICS · ANZ · GCC · MENA
Ambiguous codes | CA (California or Canada) · IN (Indiana or India) · DE (Delaware or Germany) · GA (Georgia or Gabon)

Compound locations

The endpoint handles compound expressions separated by "or", "and", "&", ";", or "|". Splitting is performed during LLM extraction, which returns one {city, region, country} part per location. Each part is then geocoded independently, and the resulting array is returned in the locations field of a single response.
curl "https://connect.jobo.world/api/locations/geocode?location=US+or+Canada" \
  -H "X-Api-Key: YOUR_API_KEY"
Returns multiple locations:
{
  "input": "US or Canada",
  "succeeded": true,
  "locations": [
    {
      "city": null,
      "region": null,
      "country": "United States",
      "country_code": "US",
      "latitude": 39.8283,
      "longitude": -98.5795,
      "display_name": "United States",
      "fuzzy_confidence": 1.0
    },
    {
      "city": null,
      "region": null,
      "country": "Canada",
      "country_code": "CA",
      "latitude": 56.1304,
      "longitude": -106.3468,
      "display_name": "Canada",
      "fuzzy_confidence": 1.0
    }
  ],
  "method": "llm"
}
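The same request from Python, using only the standard library. build_url, geocode, and coordinates are illustrative helper names; the endpoint URL and X-Api-Key header are taken from the curl example, and the 10-second timeout is an arbitrary choice.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

GEOCODE_URL = "https://connect.jobo.world/api/locations/geocode"


def build_url(location: str) -> str:
    # urlencode quotes with quote_plus, so "US or Canada" becomes "US+or+Canada" as in the curl call.
    return GEOCODE_URL + "?" + urllib.parse.urlencode({"location": location})


def geocode(location: str, api_key: str) -> dict:
    request = urllib.request.Request(build_url(location), headers={"X-Api-Key": api_key})
    # urlopen raises urllib.error.HTTPError for 4xx/5xx, e.g. the 400 for a missing location.
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def coordinates(body: dict) -> List[Tuple[str, float, float]]:
    """(display_name, latitude, longitude) per resolved part; empty when succeeded is false."""
    if not body.get("succeeded"):
        return []
    return [
        (loc["display_name"], loc["latitude"], loc["longitude"])
        for loc in body.get("locations", [])
        if loc.get("latitude") is not None  # remote results carry no coordinates
    ]
```

For example, coordinates(geocode("US or Canada", "YOUR_API_KEY")) would yield one (display_name, latitude, longitude) tuple for each part in the response above.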
Examples of compound splitting:
Input | Resolved parts
--- | ---
US or Canada | US · Canada
New York and San Francisco | New York · San Francisco
London & Paris | London · Paris
EMEA or APAC | EMEA · APAC
Remote or San Francisco, CA | Remote · San Francisco, CA

Ambiguous codes

Some two-letter codes match both a US state or Canadian province abbreviation and an ISO 3166-1 alpha-2 country code. Disambiguation is delegated to the LLM, which uses any surrounding city or context tokens in the input to decide.
Code | US state / Canadian province | Country
--- | --- | ---
CA | California | Canada
IN | Indiana | India
DE | Delaware | Germany
GA | Georgia (state) | Gabon
CO | Colorado | Colombia
AL | Alabama | Albania
ME | Maine | Montenegro
NL | Newfoundland and Labrador (province) | Netherlands
SK | Saskatchewan (province) | Slovakia
Heuristics the resolver tends to apply:
  1. Standalone two-letter code is typically resolved to the more common US state interpretation. CA alone → California.
  2. Code paired with a city is disambiguated by city context. Los Angeles, CA → California; Toronto, CA → Canada; Amsterdam, NL → Netherlands; Tel Aviv, IL → Israel.
  3. Three-letter ISO alpha-3 country codes are unambiguous. DEU → Germany; GBR → United Kingdom.
The validation step compares the canonical labels back against the original input string. If the resolver settles on Canada for an input that contained California, validation will typically reject it.
For best results with ambiguous codes, include a city name or use the full state/country name. For example, prefer "California" or "Canada" over a bare "CA" when precision matters.
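The three heuristics can be written down as explicit rules. This Python sketch is a deterministic stand-in for what the LLM is described as doing; the tiny lookup tables (including the city hints) and the interpret_code name are hypothetical.

```python
from typing import Optional

# Tiny illustrative tables; the service delegates this decision to the LLM instead.
STATE_NAMES = {"CA": "California", "IN": "Indiana", "DE": "Delaware", "CO": "Colorado"}
COUNTRY_NAMES = {"CA": "Canada", "IN": "India", "DE": "Germany", "CO": "Colombia"}
ALPHA3_NAMES = {"DEU": "Germany", "GBR": "United Kingdom"}
# Hypothetical city -> ISO alpha-2 country hints, standing in for "context tokens".
CITY_COUNTRY = {"los angeles": "US", "toronto": "CA", "mumbai": "IN", "berlin": "DE"}


def interpret_code(text: str) -> Optional[str]:
    pieces = [p.strip() for p in text.split(",")]
    code = pieces[-1].upper()
    if len(pieces) == 1 and code in ALPHA3_NAMES:
        return ALPHA3_NAMES[code]  # heuristic 3: alpha-3 codes are unambiguous
    if code not in STATE_NAMES:
        return None  # not one of the ambiguous codes this sketch knows about
    if len(pieces) == 1:
        return STATE_NAMES[code]  # heuristic 1: a bare code defaults to the US state
    city_country = CITY_COUNTRY.get(pieces[0].lower())
    if city_country == code:
        return COUNTRY_NAMES[code]  # heuristic 2: the city places it in that country
    return STATE_NAMES[code]
```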

Tips for best results

Be specific

Include both city and state/country when possible. "San Francisco, CA" is better than just "CA".

Use full names for ambiguous codes

Use "California" or "Canada" instead of "CA". Use "Germany" instead of "DE".

Standard separators for compounds

Use "or", "and", "&", ";", or "|" between locations. "US or Canada" works; "US, Canada" may be treated as a single location.

Don't worry about case or diacritics

The pipeline normalizes case and handles diacritics. "münchen", "MÜNCHEN", and "Munchen" all resolve to Munich, Germany.

City aliases are supported

Common abbreviations like NYC, SF, LA, DC, and transliterations like Bombay, Peking are handled automatically.

Noise is stripped automatically

Internal references and qualifiers like "Store 04494 - Lynchburg, VA" or "Greater San Francisco Area" are normalized down to the underlying city.

Error handling

The endpoint returns HTTP 200 with succeeded: false for inputs that were classified as placeholders or that the pipeline could not resolve. Missing or whitespace-only location query parameters return HTTP 400 with a ProblemDetails body.

Common scenarios

Missing or empty location query parameter (HTTP 400):
{
  "title": "Invalid location",
  "detail": "The 'location' query parameter is required.",
  "status": 400
}
Placeholder strings (HTTP 200, succeeded: false): Inputs the classifier recognizes as placeholders never reach the geocoder. The full placeholder set is:
  • Count placeholders: "2 Locations", "15 locations", …
  • Quantifier placeholders: "Multiple Locations", "Various Locations", "Several Locations", "Many Locations", "All Locations"
  • Standalone placeholder terms: hq, headquarters, head office, office, location, flexible, hybrid, onsite, on-site, various, n/a, tbd, tba
{
  "input": "2 Locations",
  "succeeded": false,
  "error": "Location string is a placeholder, not a real location."
}
Unresolvable location — extraction returned no usable parts, the geocoder returned no coordinates, or validation rejected every candidate (HTTP 200, succeeded: false):
{
  "input": "xyzzy-unknown-place-12345",
  "succeeded": false,
  "method": "no_match",
  "error": "Could not resolve the provided location."
}
Service errors (HTTP 503 if the request is cancelled, HTTP 500 for unexpected failures) return a ProblemDetails body with title, detail, and status fields.
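One way a client might map these outcomes to actions is sketched below in Python. The action names and the decision to retry on 500/503 are client-side choices, not behavior defined by the API.

```python
from typing import Any, Dict


def next_action(status: int, body: Dict[str, Any]) -> str:
    """Decide what a client should do with a geocode response (illustrative policy)."""
    if status == 400:
        # ProblemDetails: the location parameter was missing or blank.
        return "fix_request"
    if status in (500, 503):
        # ProblemDetails with title/detail/status; retrying later is a client choice.
        return "retry_later"
    if status == 200 and body.get("succeeded"):
        return "use_locations"
    if status == 200 and body.get("method") == "no_match":
        return "store_unresolved"
    if status == 200:
        # succeeded: false without no_match: the input was classified as a placeholder.
        return "skip_placeholder"
    return "unexpected"
```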

Endpoints

The full request/response reference and a live “Try it” playground are on the dedicated page below.

Geocode a location

GET /api/locations/geocode — resolve a free-text location string to canonical city/region/country and coordinates.