Data Engineer, Platform at Basis

New York, New York, United States

Compensation: Not Specified
Experience Level: Mid-level (3 to 4 years), Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: AI Research, Nonprofit, Machine Learning

Requirements

  • Demonstrated significant achievements in data engineering for ML/AI systems (e.g., building data pipelines for model training or evaluation at scale, developing feature stores or data platforms serving multiple teams, creating data quality frameworks and implementing governance systems, designing data architectures that enabled new ML capabilities)
  • Strong proficiency in data technologies including SQL (expert level), Python for data processing, distributed computing frameworks (Spark, Dask), and workflow orchestration tools (Airflow, Dagster, Prefect)
  • Experience with cloud data platforms including data warehouses (Snowflake, BigQuery, Redshift), data lakes, object storage (S3), and streaming systems (Kafka, Kinesis, Flink) for both batch and real-time processing
  • Understanding of ML data requirements including feature engineering, training/validation/test splits, data versioning, experiment reproducibility, and the specific data needs of different model types and training procedures
  • Skilled at data quality and governance including implementing validation frameworks, anomaly detection, data lineage tracking, metadata management, and ensuring compliance with privacy and security policies
  • Knowledge of data modeling principles for both relational and NoSQL systems, understanding of schema design, normalization/denormalization tradeoffs, and performance optimization
  • Value data provenance and documentation, ensuring data pipelines are transparent, decisions are documented, and others can understand and trust the data delivered
  • Technically excellent: treat data quality as a first-class concern and combine software engineering discipline with a deep understanding of data systems and ML requirements
  • Aspire to do rigorous, high-quality, robust data engineering: iterate, learn from real usage, and explore different approaches
  • Enjoy collaborative efforts, building data foundations for large-scale problems

Responsibilities

  • Build trustworthy data pipelines with comprehensive provenance and quality gates
  • Curate documented datasets for training and evaluation
  • Ensure data infrastructure scales reliably
  • Work on both platform-specific data needs and cross-project data coordination, preventing duplicate work and facilitating shared datasets
  • Help scale data operations to support medium-scale models
  • Ensure data governance meets the standard required for serving external customers
  • Build systems that researchers can trust for reproducible experiments

Skills

Key technologies and capabilities for this role

Data Pipelines, Data Provenance, Data Quality, ML Data Pipelines, Data Lineage, Data Infrastructure, Data Governance, Scalable Data Systems, Software Engineering, Dataset Curation

Questions & Answers

Common questions about this position

What skills and experience are required for this Data Engineer role?

The role requires demonstrated achievements in data engineering for ML/AI systems such as building data pipelines at scale, developing feature stores, creating data quality frameworks, and designing data architectures. Strong proficiency is needed in SQL (expert level), Python for data processing, distributed computing frameworks like Spark and Dask, and workflow orchestration tools like Airflow, Dagster, or Prefect.

What is the salary or compensation for this position?

This information is not specified in the job description.

Is this Data Engineer role remote or does it require office work?

This information is not specified in the job description.

What is the company culture like at Basis?

Basis emphasizes collaboration, both internally and with external partners, and is building a new kind of organization that puts human values first. It seeks people who enjoy working on problems larger than they can tackle alone and who aspire to rigorous, high-quality data engineering grounded in iteration and learning.

What makes a strong candidate for this Data Engineer position?

Strong candidates have significant achievements in ML/AI data engineering, treat data quality as a first-class concern, combine software engineering discipline with deep understanding of data systems and ML requirements, and enjoy collaboration on large-scale problems.

Basis

Platform for developing financial applications

About Basis

Basis provides a platform that assists businesses in creating financial applications. The platform includes a variety of tools and integrations that simplify the process of building, testing, and deploying these applications. Clients, which range from financial institutions to fintech startups, can access the platform through a subscription model, paying for its features and support. Basis distinguishes itself from competitors by focusing on streamlining the development process and offering custom solutions tailored to the specific needs of its clients. The company's goal is to empower businesses to innovate and enhance their financial operations through improved application offerings.

Headquarters: 1688 Pine St UNIT E211, San Francisco, CA 94109, USA
Year Founded: 2022
Total Funding: $6M
Company Stage: Seed
Industries: Enterprise Software, Fintech
Employees: 1-10

Risks

Emerging fintech startups offer similar tools at lower costs.
Regulatory scrutiny on data privacy may increase compliance costs.
Economic downturns could reduce demand for Basis's services.

Differentiation

Basis provides real-time cash flow profiles for B2B lenders.
The platform integrates data from revenue, accounting, and banking sources.
Basis offers a subscription-based model with premium services and custom solutions.

Upsides

Increased demand for real-time financial data analytics benefits Basis's offerings.
The rise of embedded finance aligns with Basis's focus on lending stacks.
Open banking trends enhance Basis's cash flow profile accuracy.
