Sr. Data Engineer

About the Company

Posthaste Labs is a boutique consulting firm that provides development and data services to VCs and startups. Our focus is on building the highest quality products that will enable our customers to quickly gain traction in the market while ensuring that their code and data ecosystems are robust and able to scale.

About the Role

As a Sr. Data Engineer at Posthaste Labs, you will be responsible for developing both the business logic and the technical implementation of the data pipelines the client uses to run their operations. The pipelines will be built in dbt and will need to merge data across several distinct sources using information that changes over time. You will be embedded directly with the client, a leading biotech firm, with the autonomy to determine tools, technologies and processes. You will work directly with the client to shape requirements, define timelines and execute on agreed-upon deliverables, as part of a team of six solving end-to-end use cases.

This is an hourly contract position to start. The candidate should be able to consistently provide 30+ hours a week, may work up to 50 hours if desired, and can set their own schedule. Candidates must be based in the US but can otherwise work when and where they want.

Key Responsibilities

  • Client engagement: Proactively identify gaps, issues and data needs, and share opinions with the client. Lead interactions and projects with clear communication.
  • Entity resolution: Develop matching algorithms that reconcile entity data from disparate sources, rank candidate matches and intelligently link records together.
  • Data quality transparency: Create robust monitoring and validation frameworks that provide constant visibility into input and output data quality.
  • Pipeline development: Create well-defined dbt models that optimize for reusability and performance while delivering value for the use cases they serve.
  • Data preparation for modeling: Understand the data needs of the ML models used for unstructured text classification, and clean and curate data sets to train them.
  • Measurement: Lead analytical measurement of the entity resolution process's efficacy, and build tooling that surfaces data quality issues early.

Experience and Qualifications

  • 5+ years using SQL / building data pipelines in SQL (Snowflake experience highly preferred)
  • 3+ years developing entity resolution algorithms
  • 3+ years with dbt
  • 5+ years working with Python
  • 2+ years of experience with Spark (preferred)
  • 3+ years in startups
  • Proven experience moving analytical workloads into scalable production processes
  • Strong, well-articulated opinions on analytical approaches, with the ability to explain tradeoffs between methodologies
  • Able to turn a highly ambiguous initiative into a structured, well-managed project
Pay:

$160,000 - $200,000