Data Engineer - Airflow, AWS S3, EMR, Kubernetes EKS and Python

Company: KSN Technologies, Inc.
Location: Oakland, California, United States
Type: Full-time
Posted: 07.APR.2021


Description

Data Engineer - Airflow, AWS S3, EMR, Kubernetes EKS and Python

Oakland, CA

6 Months Contract

Looking for candidates who can work on our W2.

• Top Skills: Strong Python and general programming experience building data pipelines for product application development, customer/product focused and implementation focused for features (not only database work). Must have Spark expertise. Airflow is used for scheduling/pipelines, and Great Expectations (a data validation tool) is part of the stack. Most data is healthcare data, so healthcare domain expertise would be helpful. The work involves a multi-step process:
o Spark (Spark SQL, Spark DataFrame APIs), using Spark 3
o Airflow
o Python (understanding algorithms)
o AWS S3, EMR, Kubernetes EKS

Generation of a medically-related cohort is the gateway to all the work we do at Komodo Health. As a member of the team responsible for Komodo's standardized approach to healthcare cohort generation (Prism), you are responsible for the backbone data retrieval and transformations required by our internal service layer used to serve cohort generation along a variety of axes - patients, payers, providers, etc. You are responsible for ensuring that complex cohort generation queries are supported through the cohort generation API, adhere to SLAs, and for the underlying caching necessary to support multiple cohort generation access patterns.

Looking back on your first 12 months at Client you will have…
• Expanded or enhanced the Prism internal data model to ensure timely performance for all types of cohort generation requests
• Recommended and implemented additional pre-calculation and caching strategies as needed to ensure that SLAs are met
• Supporting API output, as above, ideally within a low query volume / high data volume environment
• Demonstrably deep experience with Python
• Demonstrably deep experience with relevant 'big data' processing either via Spark or through a modern MPP database like Snowflake, ideally with experience in both
• Experience with separate caching/cache invalidation strategies
• Understanding of, and ability to design for, non-functional concerns such as performance, cost optimization, maintainability, and developer experience
• Strong communication with engineers, product managers, and salespeople
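The pre-calculation and cache-invalidation responsibilities above can be illustrated with a minimal time-to-live cache in plain Python. This is a hypothetical sketch: the class, key names, and helper function are invented, and a production system would use a shared cache store rather than in-process memory:

```python
import time

# Hypothetical TTL cache: cached cohort results expire after ttl_seconds,
# forcing recomputation. Expiry is one simple invalidation strategy.

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:  # stale: invalidate
            del self._store[key]
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())

cache = TTLCache(ttl_seconds=300)

def cohort_size(cohort_key, compute):
    """Serve a cached result when fresh, otherwise recompute and cache."""
    cached = cache.get(cohort_key)
    if cached is not None:
        return cached
    value = compute()
    cache.put(cohort_key, value)
    return value

calls = []
def expensive_query():
    calls.append(1)  # stands in for a Spark or Snowflake query
    return 12345

print(cohort_size("payer:acme", expensive_query))  # computes
print(cohort_size("payer:acme", expensive_query))  # served from cache
```

TTL expiry trades freshness for simplicity; event-driven invalidation (evicting keys when upstream data changes) is the usual complement when SLAs demand fresher results.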

- provided by Dice

 