AI Data Engineer

@A Leading BioTech VC

Pay Range

$220,000 - $250,000

Contract

About the Company

Posthaste Labs is a boutique consulting firm that provides development and data services to VCs and startups. We focus on building the highest-quality products that enable our customers to gain traction in the market quickly while ensuring their code and data ecosystems are robust and scalable.

About the Role

As an AI Data Engineer with Posthaste Labs, you will be the bridge between traditional data engineering and AI enablement, working primarily in Snowflake and dbt to build the data foundations, tooling, and infrastructure that allow AI systems to execute more efficiently, autonomously, and reliably. Rather than building the AI models themselves, you will focus on creating the robust, well-structured data layer and supporting tooling that make AI agents and LLM-powered workflows performant and scalable. You will design pipelines that prepare, curate, and serve data to AI systems while building the feedback loops, evaluation frameworks, and orchestration tooling that enable those systems to operate with minimal human intervention.

You will be embedded directly with the client, a leading BioTech VC, while retaining the autonomy to determine tools, technologies, and processes. You will work directly with the client to shape requirements around AI-driven use-cases, define technical approaches, and execute on agreed-upon deliverables, operating within a team of six that owns the end-to-end development of these use-cases.

This is an hourly contract position to start. Candidates should be able to consistently provide 30+ hours a week, may work up to 50 hours if desired, and can set their own schedule. Candidates must be based in the US but are otherwise free to choose when and where they work.

Key Responsibilities

  • Client engagement: Proactively identify gaps, issues, and AI/data needs, and share informed opinions with the client. Lead interactions and projects with clear communication, translating complex AI concepts for non-technical stakeholders.
  • Snowflake data architecture: Design, build, and maintain the core data layer in Snowflake — creating well-structured schemas, views, and materialized assets that serve as the reliable foundation AI systems depend on to operate autonomously.
  • dbt pipeline development: Build and maintain a robust dbt project that transforms raw data from disparate sources into clean, well-modeled, AI-ready datasets. Optimize for reusability, performance, and clarity so that AI agents can consume data with minimal ambiguity.
  • AI enablement tooling: Create the tooling and infrastructure that allows AI systems to execute more efficiently and autonomously — including structured data contracts, metadata layers, context-generation pipelines, and automated data preparation workflows that reduce the need for human-in-the-loop intervention.
  • Orchestration & feedback loops: Build orchestration frameworks and feedback loops that enable AI agents to trigger data refreshes, evaluate their own outputs against ground truth, and self-correct — moving toward increasingly autonomous operation.
  • Entity resolution with ML: Develop advanced entity resolution algorithms that leverage both traditional matching techniques and ML-based approaches to reconcile data from disparate sources, rank matches, and intelligently correlate records.
  • Data quality & governance: Implement comprehensive dbt tests, data quality monitoring, and validation frameworks to ensure that the data AI systems consume is accurate, complete, and trustworthy. Build guardrails that prevent AI systems from operating on stale or corrupted data.
  • Evaluation & observability: Design evaluation harnesses and observability tooling that measure how effectively AI systems are using the data layer — tracking accuracy, latency, cost, and drift to surface issues before they impact downstream AI performance.
  • Data preparation for modeling: Prepare and curate high-quality datasets for ML training, fine-tuning, and evaluation across multiple use-cases, including unstructured text classification, named entity recognition (NER), and semantic search.

Experience and Qualifications

  • 5+ years of data engineering experience building production data pipelines
  • 5+ years of SQL experience; strong Snowflake experience required
  • 4+ years with dbt, including designing and maintaining large-scale dbt projects
  • 5+ years working with Python for data engineering and automation
  • 2+ years of hands-on experience building tooling or infrastructure that supports LLM-powered applications, AI agents, or autonomous workflows
  • Experience designing data contracts, metadata layers, or structured interfaces that AI systems consume
  • Familiarity with LLM/agent frameworks (LangChain, LlamaIndex, or similar) — enough to understand what AI systems need from the data layer
  • 3+ years developing entity resolution or record linkage algorithms
  • 2+ years of experience with Spark (preferred)
  • 3+ years in startups or fast-paced environments
  • Strong understanding of data quality practices including dbt tests, monitoring, and validation frameworks
  • Proven experience moving analytical and AI-enabling workloads into scalable production processes
  • Highly opinionated on data architecture and how to structure data for AI consumption, and able to articulate tradeoffs between different approaches
  • Able to take highly ambiguous AI/data projects and convert them into structured, well-managed deliverables