Data Engineer, Platform at Basis

New York, New York, United States

Compensation: Not Specified
Experience Level: Mid-level (3 to 4 years), Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: AI Research, Nonprofit, Machine Learning

Requirements

  • Demonstrated significant achievements in data engineering for ML/AI systems (e.g., building data pipelines for model training or evaluation at scale, developing feature stores or data platforms serving multiple teams, creating data quality frameworks and implementing governance systems, designing data architectures that enabled new ML capabilities)
  • Strong proficiency in data technologies including SQL (expert level), Python for data processing, distributed computing frameworks (Spark, Dask), and workflow orchestration tools (Airflow, Dagster, Prefect)
  • Experience with cloud data platforms including data warehouses (Snowflake, BigQuery, Redshift), data lakes, object storage (S3), and streaming systems (Kafka, Kinesis, Flink) for both batch and real-time processing
  • Understanding of ML data requirements including feature engineering, training/validation/test splits, data versioning, experiment reproducibility, and the specific data needs of different model types and training procedures
  • Skilled at data quality and governance including implementing validation frameworks, anomaly detection, data lineage tracking, metadata management, and ensuring compliance with privacy and security policies
  • Knowledge of data modeling principles for both relational and NoSQL systems, understanding of schema design, normalization/denormalization tradeoffs, and performance optimization
  • Value data provenance and documentation, ensuring data pipelines are transparent, decisions are documented, and others can understand and trust the data delivered
  • Technically excellent: treat data quality as a first-class concern and combine software engineering discipline with a deep understanding of data systems and ML requirements
  • Aspire to do rigorous, high-quality, robust data engineering: iterate, learn from real usage, and explore different approaches
  • Enjoy collaborative efforts, building data foundations for large-scale problems

Responsibilities

  • Build trustworthy data pipelines with comprehensive provenance and quality gates
  • Curate documented datasets for training and evaluation
  • Ensure data infrastructure scales reliably
  • Work on both platform-specific data needs and cross-project data coordination, preventing duplicate work and facilitating shared datasets
  • Help scale data operations to support medium-scale models
  • Ensure data governance meets the standard expected when serving external customers
  • Build systems that researchers can trust for reproducible experiments

Skills

Data Pipelines
Data Provenance
Data Quality
ML Data Pipelines
Data Lineage
Data Infrastructure
Data Governance
Scalable Data Systems
Software Engineering
Dataset Curation

Basis

Platform for developing financial applications

About Basis

Basis provides a platform that assists businesses in creating financial applications. The platform includes a variety of tools and integrations that simplify the process of building, testing, and deploying these applications. Clients, which range from financial institutions to fintech startups, can access the platform through a subscription model, paying for its features and support. Basis distinguishes itself from competitors by focusing on streamlining the development process and offering custom solutions tailored to the specific needs of its clients. The company's goal is to empower businesses to innovate and enhance their financial operations through improved application offerings.

Headquarters: 1688 Pine St UNIT E211, San Francisco, CA 94109, USA
Year Founded: 2022
Total Funding: $6M
Company Stage: Seed
Industries: Enterprise Software, Fintech
Employees: 1-10

Risks

Emerging fintech startups offer similar tools at lower costs.
Regulatory scrutiny on data privacy may increase compliance costs.
Economic downturns could reduce demand for Basis's services.

Differentiation

Basis provides real-time cash flow profiles for B2B lenders.
The platform integrates data from revenue, accounting, and banking sources.
Basis offers a subscription-based model with premium services and custom solutions.

Upsides

Increased demand for real-time financial data analytics benefits Basis's offerings.
The rise of embedded finance aligns with Basis's focus on lending stacks.
Open banking trends enhance Basis's cash flow profile accuracy.
