Data Platform Engineer - India

Onehouse | Full-time | $110k - $270k* | Bangalore, India | Jan 27, 2022
About Onehouse
Onehouse delivers a new bedrock for your data through a cloud-native managed lakehouse service built on open, interoperable, industry-proven technology. Founded by a former Uber data architect and the creator of Apache Hudi, Onehouse accelerates the inevitable transition of the data lake into a lakehouse, unlocking incremental processing to replace old-school batch processing on the lake. Onehouse makes it possible to blend the ease of use of a warehouse with the scale of a data lake in a fully managed product. Engineers can build data lakes in minutes, process data in seconds, and own their data in open source formats instead of being locked in to individual vendors.

https://www.onehouse.ai

Job Description
Are you a passionate Data Engineer who wants to reinvent the future of data engineering infrastructure for the entire industry? Have you always wanted to dig deeper and contribute directly to open source projects like Apache Hudi or Apache Spark/Flink? As a data engineer at Onehouse, you will contribute directly to Apache Hudi and the surrounding open source ecosystem, while deploying and operating these technologies at massive scale for our customers.

Responsibilities

  • Be the thought leader on all things data engineering within the company: schemas, frameworks, and data models.
  • Implement new sources and connectors to seamlessly ingest data streams.
  • Build scalable job management on Kubernetes to ingest, store, manage, and optimize petabytes of data on cloud storage.
  • Optimize Spark or Flink applications to run flexibly in batch or streaming modes based on user needs, balancing latency against throughput.
  • Tune clusters for resource efficiency and reliability to keep costs low while still meeting SLAs.

Must Haves

  • 3+ years of experience in building and operating data pipelines in Apache Spark or Apache Flink.
  • 2+ years of experience with workflow orchestration tools such as Apache Airflow or Dagster.
  • Proficient in Java and in build and packaging tools such as Maven and Gradle.
  • Adept at writing efficient SQL queries and troubleshooting query plans.
  • Experience managing large-scale data on cloud storage.
  • Great problem-solving skills and an eye for detail. Can debug failed jobs and queries in minutes.
  • Operational excellence in monitoring, deploying, and testing job workflows.
  • Open-minded, collaborative, self-starter, fast-mover.

Bonus Skills

  • Hands-on experience with Kubernetes (k8s) and its related toolchain in a cloud environment.
  • Experience operating and optimizing terabyte-scale data pipelines.
  • Deep understanding of the internals of Spark, Flink, Presto, Hive, and Parquet.
  • Hands-on experience with open source projects like Hadoop, Hive, Delta Lake, Hudi, NiFi, Drill, Pulsar, Druid, Pinot, etc.
  • Operational experience with stream processing pipelines using Apache Flink or Kafka Streams.

Who We Are
At Onehouse, our mission is to help companies of all sizes supercharge their data engineering and data science by automating painful data infrastructure buildout. We are a team of self-driven, inspired, and seasoned builders who have created large-scale data systems, as well as globally distributed platforms that sit at the heart of some of the best-known companies out there, including Uber, LinkedIn, Confluent, and Microsoft. We have set out on an ambitious goal: to build the world's best fully managed and self-optimizing data lake platform. We are very well funded and backed by some of the top-tier VCs in Silicon Valley, as well as numerous well-known angel investors from top Silicon Valley companies.

Why join us

  • Fun team, challenging problems! One day, we will be managing the largest database in existence!
  • Contribute directly to open source, including an exciting and growing data project - Apache Hudi.
  • Create instant impact by contributing to Hudi, which is already in use by numerous large enterprises globally.
  • Learn and grow with an experienced team that includes numerous staff-level engineers.
  • Get in early on a fast-moving space: the next few years are widely expected to reshape the data landscape.
  • Work with a founding team that created a large, fast-growing technology category - transactional data lakes.

We are growing fast and looking for rising talent who can grow with us to become future leaders of the team. Come help build this unicorn-to-be!
