All platforms

SuccessFactors Jobs API.

SAP's human experience management suite used by enterprises globally for talent acquisition and HR management.

SuccessFactors
Live
300K+ jobs indexed monthly
<3h average discovery time
1h refresh interval
Companies using SuccessFactors
W.L. Gore · SAP · Siemens · BMW · Volkswagen
Developer tools

Try the API.

Test Jobs, Feed, and Auto-Apply endpoints against https://connect.jobo.world with live request/response examples, then copy ready-to-use curl commands.

What's in every response.

Data fields, real-world applications, and the companies already running on SuccessFactors.

Data fields
  • Global enterprise coverage
  • Multi-industry
  • Structured data
  • Compensation info
  • Requirements detail
  • Multi-locale support
Trusted by
W.L. Gore · SAP · Siemens · BMW · Volkswagen · Nestlé
DIY GUIDE

How to scrape SuccessFactors.

Step-by-step guide to extracting jobs from SuccessFactors-powered career pages—endpoints, authentication, and working code.

Hybrid · Advanced · ~60 requests/minute (session-based) · Auth required

Identify domain type and fetch homepage

SuccessFactors has two URL patterns: SAP-hosted (*.jobs.hr.cloud.sap) and custom domains. First, identify the pattern and fetch the homepage to get authentication credentials.

Step 1: Identify domain type and fetch homepage
import requests
from bs4 import BeautifulSoup
import re

# SAP-hosted pattern: https://{company}.jobs.hr.cloud.sap/
# Custom domain pattern: https://careers.{company}.com/

company_url = "https://wlgore.jobs.hr.cloud.sap"

# Fetch homepage to get CSRF token and session cookie
session = requests.Session()
response = session.get(company_url)
html = response.text

# Extract CSRF token from HTML (usually in script or meta tags)
csrf_match = re.search(r"x-csrf-token[\"':\s]+[\"']([^\"']+)[\"']", html, re.IGNORECASE)
csrf_token = csrf_match.group(1) if csrf_match else None

print(f"CSRF Token: {csrf_token[:20]}..." if csrf_token else "CSRF token not found")

Discover job categories

Each SuccessFactors instance has unique category IDs. Parse the homepage navigation to discover available job categories for the target company.

Step 2: Discover job categories
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

# Find category links in navigation
# Pattern: /go/{category-name}/{categoryId}/
categories = []
for link in soup.find_all('a', href=True):
    href = link['href']
    match = re.match(r'/go/([^/]+)/(\d+)', href)
    if match:
        categories.append({
            'name': match.group(1),
            'id': int(match.group(2))
        })

print(f"Found {len(categories)} categories")
for cat in categories[:5]:
    print(f"  - {cat['name']}: {cat['id']}")

# Use first category or a default
category_id = categories[0]['id'] if categories else None

Fetch job listings via API

Use the /services/recruiting/v1/jobs POST endpoint to retrieve job listings. The API returns metadata but NOT full job descriptions.

Step 3: Fetch job listings via API
import requests

api_url = f"{company_url}/services/recruiting/v1/jobs"

headers = {
    "Content-Type": "application/json",
    "x-csrf-token": csrf_token,
}

payload = {
    "locale": "en_US",
    "pageNumber": 0,
    "sortBy": "",
    "keywords": "",
    "location": "",
    "facetFilters": {},
    "brand": "",
    "skills": [],
    "categoryId": category_id,  # Company-specific ID
    "alertId": "",
    "rcmCandidateId": ""
}

response = session.post(api_url, headers=headers, json=payload)
data = response.json()

jobs = data.get("jobSearchResult", [])
print(f"Found {len(jobs)} jobs on page 0")

Parse job metadata from API response

Extract job information from the API response. Each job contains basic metadata like title, location, and dates, but the full description is not included.

Step 4: Parse job metadata from API response
parsed_jobs = []

for item in jobs:
    job = item.get("response", {})
    parsed_jobs.append({
        "id": job.get("id"),
        "title": job.get("unifiedStandardTitle"),
        "url_title": job.get("urlTitle"),
        "location": job.get("jobLocationShort", [None])[0],
        "country": job.get("jobLocationCountry", [None])[0],
        "state": job.get("jobLocationState", [None])[0],
        "business_unit": job.get("businessUnit_obj", [None])[0],
        "posted_date": job.get("unifiedStandardStart"),
        "currency": job.get("currency", [None])[0],
        "locales": job.get("supportedLocales", []),
    })

# Clean up location string (remove <br/> tags)
for job in parsed_jobs:
    if job["location"]:
        job["location"] = job["location"].replace("<br/>", "").strip()

print(f"Parsed {len(parsed_jobs)} jobs")
if parsed_jobs:
    print(f"Sample: {parsed_jobs[0]['title']} - {parsed_jobs[0]['location']}")

Fetch full job details from HTML

Since the API doesn't include job descriptions, fetch individual job detail pages and parse the HTML to extract complete job information.

Step 5: Fetch full job details from HTML
from bs4 import BeautifulSoup
import time

def fetch_job_details(session, base_url, job):
    """Fetch full job details from the HTML detail page."""
    # Build job detail URL: /job/{urlTitle}/{id}-{locale}/
    detail_url = f"{base_url}/job/{job['url_title']}/{job['id']}-en_US"

    response = session.get(detail_url)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract job description (common selectors)
    description = None
    desc_elem = soup.find(class_='job-description')
    if not desc_elem:
        desc_elem = soup.find('main')
    if not desc_elem:
        desc_elem = soup.find('article')

    if desc_elem:
        description = desc_elem.get_text(separator=' ', strip=True)

    # Extract additional fields from HTML
    req_id = None
    for elem in soup.find_all(string=re.compile(r'Req\s*Id:', re.I)):
        req_id = elem.find_next(string=re.compile(r'\d+'))
        if req_id:
            req_id = re.search(r'\d+', str(req_id)).group()
            break

    return {
        **job,
        "description": description,
        "req_id": req_id,
        "detail_url": detail_url,
    }

# Fetch details for first job as example
job_with_details = fetch_job_details(session, company_url, parsed_jobs[0])
print(f"Description length: {len(job_with_details.get('description', '') or '')} chars")

Handle pagination for complete results

SuccessFactors uses page-based pagination. Iterate through all pages until no more jobs are returned, adding delays to avoid rate limiting.

Step 6: Handle pagination for complete results
import time

def fetch_all_jobs(session, base_url, csrf_token, category_id, locale="en_US"):
    """Fetch all jobs with pagination handling."""
    api_url = f"{base_url}/services/recruiting/v1/jobs"
    headers = {
        "Content-Type": "application/json",
        "x-csrf-token": csrf_token,
    }

    all_jobs = []
    page_number = 0

    while True:
        payload = {
            "locale": locale,
            "pageNumber": page_number,
            "sortBy": "",
            "keywords": "",
            "location": "",
            "facetFilters": {},
            "brand": "",
            "skills": [],
            "categoryId": category_id,
            "alertId": "",
            "rcmCandidateId": ""
        }

        response = session.post(api_url, headers=headers, json=payload)
        data = response.json()
        jobs = data.get("jobSearchResult", [])

        if not jobs:
            break

        all_jobs.extend(jobs)
        print(f"Page {page_number}: {len(jobs)} jobs (total: {len(all_jobs)})")

        page_number += 1
        time.sleep(1)  # Rate limiting: ~60 requests/minute

    return all_jobs

all_jobs = fetch_all_jobs(session, company_url, csrf_token, category_id)
print(f"Total jobs fetched: {len(all_jobs)}")
Common issues
Critical: CSRF token not found in page

SuccessFactors embeds CSRF tokens in the HTML page, typically in script tags or meta tags. Try multiple regex patterns: x-csrf-token, csrfToken, _csrf, or X-CSRF-Token. Some implementations may use different token names.
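That fallback chain can be sketched as an ordered list of patterns (the token names mirror the variants above; real markup differs per instance, so extend the list for your target):

```python
import re

# Common token spellings, tried in order; extend per target instance.
CSRF_PATTERNS = [
    r"x-csrf-token[\"':\s]+[\"']([^\"']+)[\"']",
    r"csrfToken[\"':\s]+[\"']([^\"']+)[\"']",
    r"_csrf[\"':\s]+[\"']([^\"']+)[\"']",
    r"<meta[^>]+name=[\"']?X-CSRF-Token[\"']?[^>]+content=[\"']([^\"']+)[\"']",
]

def find_csrf_token(html):
    """Return the first token any pattern matches, or None."""
    for pattern in CSRF_PATTERNS:
        match = re.search(pattern, html, re.IGNORECASE)
        if match:
            return match.group(1)
    return None
```

Drop this in place of the single-pattern search from Step 1 when a target instance uses a different token name.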

High: API returns empty job descriptions

The SuccessFactors listings API only returns metadata, not full descriptions. You must fetch individual job detail HTML pages and parse the description from the page content. Plan for N+1 API calls where N is the number of jobs.
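A minimal sketch of that N+1 loop, reusing the `fetch_job_details` helper from Step 5 (pass it with `session` and `base_url` already bound, e.g. via `functools.partial`; the default delay paces the extra N requests against the rate limit):

```python
import time

def enrich_jobs(jobs, fetch_details, delay=1.0):
    """Fetch full details for every job: one detail-page request per job.

    fetch_details(job) -> enriched dict, e.g. Step 5's fetch_job_details
    with session and base_url bound via functools.partial.
    """
    enriched = []
    for i, job in enumerate(jobs):
        enriched.append(fetch_details(job))
        if i < len(jobs) - 1:
            time.sleep(delay)  # stay under the per-session rate limit
    return enriched
```

Usage: `enrich_jobs(parsed_jobs, partial(fetch_job_details, session, company_url))`.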

Medium: Category ID is unknown or invalid

Each company has unique category IDs. Parse the homepage navigation to discover categories using the /go/{name}/{id} URL pattern. Some implementations allow categoryId to be omitted or set to null to return all jobs.
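A hypothetical payload builder reflecting that: it produces the Step 3 request body and simply omits `categoryId` when discovery found nothing (whether the all-jobs fallback works varies by instance, so verify against your target):

```python
def jobs_payload(locale="en_US", page=0, category_id=None):
    """Build the jobs-search request body; drop categoryId to ask for all jobs."""
    payload = {
        "locale": locale, "pageNumber": page, "sortBy": "",
        "keywords": "", "location": "", "facetFilters": {},
        "brand": "", "skills": [], "alertId": "", "rcmCandidateId": "",
    }
    if category_id is not None:
        payload["categoryId"] = category_id
    return payload
```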

High: Session cookie expires during scraping

JSESSIONID cookies expire after inactivity (typically 15-30 minutes). Implement session refresh logic that re-fetches the homepage when receiving 401/403 errors. Store the timestamp of last token fetch and refresh proactively.
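One way to implement the proactive refresh is a small age-tracking cache (a sketch; the 10-minute default deliberately undercuts the ~15-minute lower bound on expiry):

```python
import time

class TokenCache:
    """Track how old the CSRF token/session is and flag when to re-fetch."""

    def __init__(self, max_age=600):  # seconds; well under the ~15 min expiry
        self.max_age = max_age
        self.token = None
        self.fetched_at = 0.0

    def store(self, token, now=None):
        self.token = token
        self.fetched_at = time.time() if now is None else now

    def is_stale(self, now=None):
        now = time.time() if now is None else now
        return self.token is None or (now - self.fetched_at) > self.max_age
```

When `is_stale()` returns True, or on any 401/403 response, re-fetch the homepage as in Step 1 and `store()` the new token before retrying.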

Medium: Custom domain has different API structure

Not all SuccessFactors instances use the standard SAP-hosted pattern. Custom domains may have different URL structures or lack the API entirely. Detect this by checking for /services/recruiting/v1/jobs endpoint availability and fall back to HTML-only scraping.
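That detection step can be sketched as a strategy selector; `probe` is a stand-in for a minimal POST to `/services/recruiting/v1/jobs` (e.g. via `requests`) that returns the HTTP status code:

```python
def choose_strategy(probe):
    """Pick 'api' or 'html_only' from a probe of the jobs endpoint.

    probe() -> HTTP status code of a minimal POST to
    /services/recruiting/v1/jobs. 404/405 means the endpoint is absent;
    any other status (even 401/403) suggests the API exists.
    """
    if probe() in (404, 405):
        return "html_only"
    return "api"
```

Treating 401/403 as "API exists" matters: those usually mean a stale CSRF token, not a missing endpoint.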

High: Rate limiting or CAPTCHA challenges

SuccessFactors may show CAPTCHAs for automated requests. Use realistic User-Agent headers, add 1-2 second delays between requests, and respect rate limits. Consider residential proxies for large-scale scraping operations.
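A sketch of such realistic headers (the values are illustrative, not required by the platform; update the User-Agent string periodically and apply with `session.headers.update(...)`):

```python
def browser_headers(locale="en-US"):
    """Headers that resemble a real browser request."""
    return {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept": "text/html,application/xhtml+xml,application/json;q=0.9,*/*;q=0.8",
        "Accept-Language": f"{locale},en;q=0.9",
    }
```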

Low: Locale-specific content missing

Jobs may only exist in certain locales (en_US, de_DE, ja_JP, zh_CN). The locale parameter affects both API responses and HTML content. Try multiple locales if jobs appear missing, or check the supportedLocales field in the API response.
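The multi-locale retry can be sketched as follows, where `fetch_page` wraps the Step 3 request with a given locale and returns the `jobSearchResult` list:

```python
def fetch_with_locale_fallback(fetch_page,
                               locales=("en_US", "de_DE", "ja_JP", "zh_CN")):
    """Try locales in order; return (locale, jobs) for the first non-empty result."""
    for locale in locales:
        jobs = fetch_page(locale)
        if jobs:
            return locale, jobs
    return None, []
```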

Best practices
  1. Use the POST API for efficient job discovery, then fetch HTML for descriptions
  2. Cache CSRF tokens and session cookies, but refresh before they expire
  3. Parse category IDs from homepage navigation rather than hardcoding
  4. Add 1-2 second delays between requests to avoid rate limiting
  5. Implement fallback to HTML-only scraping for custom domain implementations
  6. Store both API metadata and HTML-parsed descriptions for complete data
Or skip the complexity

One endpoint. All SuccessFactors jobs. No scraping, no sessions, no maintenance.

Get API access
cURL
curl "https://enterprise.jobo.world/api/jobs?sources=successfactors" \
  -H "X-Api-Key: YOUR_KEY"
Ready to integrate

Access SuccessFactors job data today.

One API call. Structured data. No scraping infrastructure to build or maintain — start with the free tier and scale as you grow.

99.9% API uptime
<200ms avg response
50M+ jobs processed