
# Career Sites Feed API

> Stream newly discovered career sites — company listing pages and careers portals — in high-volume batches. Each row is a slim card-sized payload with a link to the full company profile.

The Career Sites Feed surfaces career sites — company listing pages, careers portals, and dedicated jobs subdomains — as Jobo discovers them. Each row carries just enough to render a card: the same lightweight company fields you'd see in a job preview, plus the ATS the site runs on, the listing URL where its jobs are posted, and a `details_url` to fetch the fully enriched company profile on demand.

* **[Stream career sites](/api-reference/career-sites-feed/retrieve-a-bulk-feed-of-newly-discovered-career-sites)** — `POST /api/career-sites/feed`. Cursor-paginated stream of slim career-site rows.
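You can fetch a single page using only the Python standard library. This is a minimal sketch that assumes the `connect.jobo.world` host and `X-Api-Key` header used in the end-to-end example below. `build_feed_request` and `fetch_page` are hypothetical helper names, and the response is assumed to carry `career_sites`, `next_cursor`, and `has_more`.

```python
import json
import urllib.request

# Assumption: same host and auth header as the end-to-end sync example.
FEED_URL = "https://connect.jobo.world/api/career-sites/feed"


def build_feed_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for one page of the feed."""
    return urllib.request.Request(
        FEED_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def fetch_page(api_key: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON page (career_sites, next_cursor, has_more)."""
    with urllib.request.urlopen(build_feed_request(api_key, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_page("YOUR_API_KEY", {"batch_size": 50})` returns the first page. To continue, pass its `next_cursor` back as `cursor`.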

There is no companion "expired" endpoint: unlike job postings, career sites don't expire — they get updated. Use `updated_after` to pick up changes on subsequent runs. When you need the rich company data (funding, leadership, ratings, etc.), follow `details_url` (or call `GET /api/companies/{id}`) for the rows you care about.
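The sketch below shows one way to follow up on selected rows. It assumes the profile endpoint accepts the same `X-Api-Key` header as the feed. It also assumes that when a row has no `details_url`, you can fall back to `GET /api/companies/{id}` on the feed's host. `profile_url` and `fetch_profile` are hypothetical names.

```python
import json
import urllib.request

# Assumption: profiles live on the same host as the feed and use the same auth header.
API_BASE = "https://connect.jobo.world"


def profile_url(site: dict) -> str:
    """Prefer the row's details_url; otherwise build GET /api/companies/{id}."""
    return site.get("details_url") or f"{API_BASE}/api/companies/{site['id']}"


def fetch_profile(site: dict, api_key: str, timeout: float = 30.0) -> dict:
    """Fetch the fully enriched company profile for one feed row."""
    req = urllib.request.Request(profile_url(site), headers={"X-Api-Key": api_key})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Call `fetch_profile` only for the rows you actually need, for example those on a particular `ats`. Each feed row stays cheap to store.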

<Note>
  The Career Sites Feed is metered and page-capped because each row carries
  enriched company data. See [Billing & Credits](/pricing) for wallet behavior
  and current pricing links.
</Note>

***

## Keeping your directory in sync

A typical integration runs an **initial backfill**, then an **incremental sync** on a schedule. Use the company `id` as your primary key — it's stable across updates, so syncs are simple upserts.
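The snippet below sketches the upsert, using SQLite from the standard library as a stand-in for your production database. Postgres accepts the same `INSERT ... ON CONFLICT (id) DO UPDATE` form. The table layout and `upsert_rows` are illustrative assumptions: the table keeps a few columns for querying plus the full row as JSON.

```python
import json
import sqlite3

# SQLite stands in for your real database; the upsert syntax matches Postgres.
SCHEMA = """
CREATE TABLE IF NOT EXISTS career_sites (
    id          TEXT PRIMARY KEY,  -- stable Jobo company id
    name        TEXT NOT NULL,
    ats         TEXT,
    listing_url TEXT,
    updated_at  TEXT,
    raw         TEXT NOT NULL      -- full feed row as JSON
)
"""

UPSERT = """
INSERT INTO career_sites (id, name, ats, listing_url, updated_at, raw)
VALUES (:id, :name, :ats, :listing_url, :updated_at, :raw)
ON CONFLICT (id) DO UPDATE SET
    name = excluded.name,
    ats = excluded.ats,
    listing_url = excluded.listing_url,
    updated_at = excluded.updated_at,
    raw = excluded.raw
"""


def upsert_rows(conn: sqlite3.Connection, rows: list) -> None:
    """Insert new career sites and overwrite existing ones, keyed by id."""
    params = [
        {
            "id": r["id"],
            "name": r["name"],
            "ats": r.get("ats"),
            "listing_url": r.get("listing_url"),
            "updated_at": r.get("updated_at"),
            "raw": json.dumps(r),
        }
        for r in rows
    ]
    with conn:  # one transaction per batch: commits on success, rolls back on error
        conn.executemany(UPSERT, params)
```

Re-seeing a row simply overwrites it, so the overlap used by the incremental sync does no harm.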

### 1. Initial backfill

Page through the entire feed once with cursor pagination. **Persist `next_cursor` after every successful batch** so a crash resumes mid-stream rather than restarting from page 1. Stop when `has_more` is `false`.
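The end-to-end example below checkpoints with a plain file write, so a crash during that write could leave a truncated state file. One way to harden it is to write to a temporary file in the same directory and then swap it into place with `os.replace`. The sketch below does this with hypothetical `save_checkpoint` and `load_checkpoint` helpers. The state shape mirrors the example's `last_run_at` / `cursor` object.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state to a temp file, then replace the real file in one step."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit disk before the swap
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)  # don't leave stray temp files behind
        raise


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh one for the very first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"last_run_at": None, "cursor": None}
```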

### 2. Incremental sync

Run on a schedule (a few times per day is plenty — new career sites are discovered in modest volumes). Each run:

1. Read your stored `last_run_started_at`. Record `now` as `this_run_started_at` **before** the first request.
2. Call `/feed` with `updated_after = last_run_started_at - 15m` (a small overlap protects against clock skew).
3. Page through with `cursor` until `has_more` is `false`. Upsert each row by `id`.
4. After the loop succeeds, persist `this_run_started_at` as the new `last_run_started_at`.
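Steps 1 and 2 boil down to computing one request body per page. This minimal sketch uses a hypothetical `incremental_body` helper. It assumes `last_run_started_at` was stored with an explicit UTC offset, which is what `datetime.isoformat()` produces for a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

OVERLAP = timedelta(minutes=15)  # protects against clock skew between runs


def incremental_body(last_run_started_at: Optional[str],
                     cursor: Optional[str] = None,
                     batch_size: int = 50) -> dict:
    """Request body for one page of an incremental run.

    With no previous run, updated_after is omitted, so the run pages the
    whole feed (the initial backfill).
    """
    body: dict = {"batch_size": batch_size}
    if last_run_started_at:
        since = datetime.fromisoformat(last_run_started_at) - OVERLAP
        body["updated_after"] = since.astimezone(timezone.utc).isoformat()
    if cursor:
        body["cursor"] = cursor
    return body
```

Call it once per page, passing the previous response's `next_cursor`.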

### 3. Backoff & rate limits

* On `400 Invalid cursor`, drop the cursor and restart pagination — don't loop on the same value.
* Watch the `X-Credits-Balance` response header so you can alert before your credits run out.
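You can handle both rules in one fetch wrapper. The sketch below takes a hypothetical `send(body)` callable that returns `(status, headers, payload)`, so it can sit on top of any HTTP client. It makes three assumptions:

* A `400` on a request that carried a cursor means the cursor is invalid. The wrapper restarts from the first page once and does not loop.
* The `X-Credits-Balance` header holds a plain number.
* The alert threshold is an illustrative value.

```python
from typing import Callable, Mapping, Tuple

# Hypothetical transport: send(body) -> (status_code, headers, json_payload).
Send = Callable[[dict], Tuple[int, Mapping[str, str], dict]]

LOW_BALANCE = 1000.0  # hypothetical alert threshold, in credits


def fetch_with_cursor_reset(send: Send, body: dict,
                            alert: Callable[[float], None] = print) -> dict:
    """Fetch one page. On a 400 for a request that carried a cursor, drop the
    cursor and restart pagination from the first page (once, never looping)."""
    status, headers, payload = send(body)
    if status == 400 and "cursor" in body:
        body = {k: v for k, v in body.items() if k != "cursor"}
        status, headers, payload = send(body)
    if not 200 <= status < 300:
        raise RuntimeError(f"Feed returned {status}: {payload}")
    balance = headers.get("X-Credits-Balance")
    if balance is not None and float(balance) < LOW_BALANCE:
        alert(float(balance))
    return payload
```

Restarting means some rows are fetched twice, which is harmless when you upsert by `id`. A plain `dict` lookup is case-sensitive. With `requests`, pass `r.headers`, which is case-insensitive.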

### End-to-end example

This sample implements the full workflow using an in-memory `store` and a local JSON state file. Replace `store` with your real database (for example, a Postgres upsert keyed on `id`).

<CodeGroup>
  ```python Python theme={null}
  """Incremental sync: pages newly discovered / updated career sites since the last run.
     Cursor is checkpointed so a crash resumes mid-stream."""
  import json, datetime as dt, requests, pathlib

  API_KEY = "YOUR_API_KEY"
  URL = "https://connect.jobo.world/api/career-sites/feed"
  STATE = pathlib.Path("career_sites_sync_state.json")
  OVERLAP = dt.timedelta(minutes=15)

  state = json.loads(STATE.read_text()) if STATE.exists() else {"last_run_at": None, "cursor": None}
  # Record the start time before the first request. A resumed run keeps its
  # original start so rows updated while it was interrupted aren't skipped.
  resuming = bool(state.get("cursor") and state.get("run_started_at"))
  this_run_at = (state["run_started_at"] if resuming
                 else dt.datetime.now(dt.timezone.utc).isoformat())
  store = {}  # ← your DB. key = company["id"]

  def post(body):
      r = requests.post(URL,
                        headers={"X-Api-Key": API_KEY, "Content-Type": "application/json"},
                        json=body,
                        timeout=30)  # fail instead of hanging forever
      r.raise_for_status()
      return r.json()

  updated_after = None
  if state["last_run_at"]:
      updated_after = (dt.datetime.fromisoformat(state["last_run_at"]) - OVERLAP).isoformat()

  cursor = state.get("cursor")
  while True:
      body = {"batch_size": 50}
      if updated_after: body["updated_after"] = updated_after
      if cursor:        body["cursor"] = cursor
      data = post(body)

      for site in data["career_sites"]:
          store[site["id"]] = site  # ← upsert in your DB

      cursor = data["next_cursor"]
      # Checkpoint the cursor together with this run's start time.
      STATE.write_text(json.dumps({**state, "cursor": cursor, "run_started_at": this_run_at}))
      if not data["has_more"]: break

  # Commit the new checkpoint only after the sync succeeded.
  STATE.write_text(json.dumps({"last_run_at": this_run_at, "cursor": None}))
  print(f"Synced. {len(store)} career sites in store.")
  ```

  ```javascript Node.js theme={null}
  // ES module with top-level await; needs Node 18+ for the global fetch.
  import fs from "node:fs";

  const API_KEY = "YOUR_API_KEY";
  const FEED_URL = "https://connect.jobo.world/api/career-sites/feed";
  const STATE_FILE = "career_sites_sync_state.json";
  const OVERLAP_MS = 15 * 60 * 1000;

  const state = fs.existsSync(STATE_FILE)
    ? JSON.parse(fs.readFileSync(STATE_FILE, "utf8"))
    : { last_run_at: null, cursor: null };
  // Record the start time before the first request. A resumed run keeps its
  // original start so rows updated while it was interrupted aren't skipped.
  const thisRunAt =
    state.cursor && state.run_started_at ? state.run_started_at : new Date().toISOString();
  const store = new Map(); // ← your DB, keyed by company.id

  async function post(body) {
    const r = await fetch(FEED_URL, {
      method: "POST",
      headers: { "X-Api-Key": API_KEY, "Content-Type": "application/json" },
      body: JSON.stringify(body),
      signal: AbortSignal.timeout(30_000), // fail instead of hanging forever
    });
    if (!r.ok) throw new Error(`Feed ${r.status}: ${await r.text()}`);
    return r.json();
  }

  const updatedAfter = state.last_run_at
    ? new Date(new Date(state.last_run_at).getTime() - OVERLAP_MS).toISOString()
    : null;

  let cursor = state.cursor;
  while (true) {
    const body = { batch_size: 50 };
    if (updatedAfter) body.updated_after = updatedAfter;
    if (cursor) body.cursor = cursor;
    const data = await post(body);

    for (const site of data.career_sites) store.set(site.id, site); // ← upsert

    cursor = data.next_cursor;
    // Checkpoint the cursor together with this run's start time.
    fs.writeFileSync(STATE_FILE, JSON.stringify({ ...state, cursor, run_started_at: thisRunAt }));
    if (!data.has_more) break;
  }

  fs.writeFileSync(
    STATE_FILE,
    JSON.stringify({ last_run_at: thisRunAt, cursor: null })
  );
  console.log(`Synced. ${store.size} career sites in store.`);
  ```
</CodeGroup>

***

## Filters

All request-body fields are optional. Values inside a list field are OR-matched, and separate filters are combined with AND. `cursor` and `batch_size` control pagination rather than filtering.

| Field               | Type        | Description                                                                                                             |
| ------------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------- |
| `industries`        | `string[]`  | Industry tags ("fintech", "biotech", …).                                                                                |
| `company_sizes`     | `string[]`  | Size bands ("1-10", "11-50", "51-200", …).                                                                              |
| `country_codes`     | `string[]`  | ISO 3166-1 alpha-2 country codes.                                                                                       |
| `funding_stages`    | `string[]`  | Funding stage tags ("seed", "series\_a", …).                                                                            |
| `company_types`     | `string[]`  | One of `"for_profit"`, `"non_profit"`.                                                                                  |
| `founded_after`     | `integer`   | Only career sites whose company was founded in or after this calendar year.                                             |
| `is_agency`         | `boolean`   | Restrict to agency (`true`) or non-agency (`false`) companies.                                                          |
| `updated_after`     | `date-time` | Only career sites whose `updated_at` is strictly greater than this UTC timestamp. Use this for incremental sync.        |
| `discovered_after`  | `date-time` | Only companies first discovered on or after this UTC timestamp. Stable across re-enrichment — use for backfill windows. |
| `discovered_before` | `date-time` | Only companies first discovered strictly before this UTC timestamp. Pair with `discovered_after` for bounded backfills. |
| `cursor`            | `string`    | Opaque pagination cursor from the previous response's `next_cursor`.                                                    |
| `batch_size`        | `integer`   | Rows per batch. Default `50`, max `50`.                                                                                 |
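For large backfills, `discovered_after` (inclusive) and `discovered_before` (exclusive) let you split the feed into non-overlapping time slices. This keeps each cursor stream short and makes it easy to retry a single slice. The sketch below uses a hypothetical `backfill_windows` helper and an arbitrary example filter set. Page each body to completion with `cursor` before moving to the next one. The sketch also assumes cursors are tied to the filters they were issued for, so it never carries a cursor across windows.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator


def backfill_windows(start: datetime, end: datetime, step: timedelta,
                     filters: dict) -> Iterator[dict]:
    """Yield one request body per [start, end) slice of discovery time.

    discovered_after is inclusive and discovered_before is exclusive, so
    adjacent windows neither overlap nor leave gaps.
    """
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield {
            **filters,
            "discovered_after": lo.isoformat(),
            "discovered_before": hi.isoformat(),
            "batch_size": 50,
        }
        lo = hi
```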

***

## CareerSiteDto schema

Each row in `career_sites[]` has this shape:

| Field            | Type            | Description                                                                                                                      |
| ---------------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------- |
| `id`             | `string (uuid)` | Stable Jobo company identifier — use as the primary key when upserting.                                                          |
| `name`           | `string`        | Company display name.                                                                                                            |
| `website`        | `string\|null`  | Company marketing website, when known.                                                                                           |
| `logo_url`       | `string\|null`  | Hosted logo URL.                                                                                                                 |
| `summary`        | `string\|null`  | Short (one-paragraph) company summary.                                                                                           |
| `industries`     | `string[]`      | 1–3 industry tags describing what the company does.                                                                              |
| `categories`     | `string[]`      | Business-model bucket tags (`b2b`, `b2c`, `saas`, `service-provider`).                                                           |
| `linkedin_url`   | `string\|null`  | Company LinkedIn profile URL.                                                                                                    |
| `crunchbase_url` | `string\|null`  | Company Crunchbase profile URL.                                                                                                  |
| `ats`            | `string`        | ATS / job-board provider the career site runs on (e.g. `"greenhouse"`, `"lever"`, `"workday"`).                                  |
| `listing_url`    | `string`        | Career-site URL where the company's jobs are listed.                                                                             |
| `discovered_at`  | `string\|null`  | UTC timestamp the career site was first discovered. Stable across re-enrichment runs.                                            |
| `updated_at`     | `string`        | UTC timestamp the company profile was last updated.                                                                              |
| `details_url`    | `string\|null`  | URL to the public, fully enriched company profile (`GET /api/companies/{id}`). Returns funding, leadership, ratings, press, etc. |
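If you want typed rows in Python, a frozen dataclass that mirrors the table is one option. This is a sketch, not an official SDK type, and it makes three assumptions:

* `CareerSite` and `from_row` are hypothetical names.
* Timestamps are ISO 8601 strings.
* Unknown keys can be dropped, so new response fields don't break parsing.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import List, Optional


def _ts(value: Optional[str]) -> Optional[datetime]:
    # Assumes ISO 8601; fromisoformat() before Python 3.11 rejects a trailing "Z".
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass(frozen=True)
class CareerSite:
    id: str
    name: str
    ats: str
    listing_url: str
    updated_at: datetime
    website: Optional[str] = None
    logo_url: Optional[str] = None
    summary: Optional[str] = None
    industries: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    linkedin_url: Optional[str] = None
    crunchbase_url: Optional[str] = None
    discovered_at: Optional[datetime] = None
    details_url: Optional[str] = None

    @classmethod
    def from_row(cls, row: dict) -> "CareerSite":
        """Build a CareerSite from one career_sites[] row, parsing timestamps."""
        known = {f.name for f in fields(cls)}
        data = {k: v for k, v in row.items() if k in known}  # ignore unknown keys
        data["updated_at"] = _ts(row["updated_at"])
        data["discovered_at"] = _ts(row.get("discovered_at"))
        return cls(**data)
```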

***

## Endpoint

<CardGroup cols={1}>
  <Card title="Stream career sites" icon="rss" href="/api-reference/career-sites-feed/retrieve-a-bulk-feed-of-newly-discovered-career-sites">
    `POST /api/career-sites/feed`: cursor-paginated stream of slim career-site rows for newly discovered career sites, each linking to its full company profile.
  </Card>
</CardGroup>
