REMOTE Lead ETL Engineer (Python, PySpark) - $130-180k + Benefits!

Location: Houston, Texas, United States
Type: Full-time
Posted: 09.AUG.2021

About Us:

Located in the heart of Silicon Valley (but with a fully remote team all around the US!), we are an exciting new digital healthcare company that leverages data and artificial intelligence to build a one-of-a-kind virtual pharmacist, ensuring patients are on the safest and most effective medication regimens possible. We are VC-backed, mission-driven to improve the lives of patients, and have a proven track record of providing valuable services to blue-chip health companies.

We are looking for a software engineer to join a strong, diverse team of engineers building our data platform. This person will have a direct impact on the infrastructure and delivery of our core enterprise data offerings. As a Senior/Lead ETL Engineer, you will be responsible for data pipelines and for data ingestion and integration platforms. You will work closely with technical stakeholders to optimize the ingestion of large enterprise healthcare datasets and to build novel healthcare data pipelines. This position will support, maintain, and develop software using a variety of tools, including the AWS stack, Python, Pandas, and PySpark.


You will participate in all aspects of our data platform, including:

  • Writing production-level ETL pipelines using Python, Pandas, and PySpark
  • Building and improving our ETL framework for data processing, validation, cleaning, and debugging
  • Working closely with the Data Science and Clinical teams on data exploration

Requirements:

  • 7+ years of experience writing production-level Python code and unit tests
  • Experience developing complex algorithms in Python
  • Experience working with JSON, data frames, and other data structures
  • Experience developing, deploying, and monitoring ETL pipelines in PySpark
  • Experience with data validation processes
  • Experience with AWS services (S3, DynamoDB, EMR)

Bonus Points:

  • Demonstrable understanding of healthcare data
  • Experience setting up and executing jobs through the AWS API with EMR, Glue, or Apache Airflow
  • Experience with PostgreSQL and shell scripting
  • Experience processing EHR and healthcare claims data using Python, PySpark, and SQL

So, if you are a Senior ETL Engineer with Python and PySpark experience, please apply today!