THE COMPANY
Our mission is to build the Covariant Brain, a universal AI that gives robots the ability to see, reason about, and act on the world around them. Bringing AI from research in the lab to the infinite variability and constant change of our customers' real-world operations requires new ideas, approaches, and techniques.
Success in the real world requires a team that represents that world: diversity of backgrounds, points of view, and experiences. Our common denominator: ambitious expectations, love of learning, empathy for those around us, and a team-first mindset.
THE ROLE
Production Engineers at Covariant play a mission-critical role in ensuring the seamless operation and future scalability of our services. Embedded within our production and research teams, you'll be at the forefront of every significant engineering endeavor. As a Production Engineer, you will drive innovation and efficiency in our projects by applying your expertise in AWS, Docker, Kubernetes, Puppet, and Terraform to architect scalable and resilient infrastructure for our AI robotics systems.
AREAS OF FOCUS
- Own and orchestrate large GPU clusters across different cloud providers, using IaC and scripts to provide researchers with a single cohesive interface
- Help other teammates architect and build scalable tooling for our edge robot fleet
- Collaborate with brilliant researchers to evolve our training and inference tooling to be state-of-the-art
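To make the first bullet concrete: presenting heterogeneous GPU clusters behind one cohesive interface often starts with a thin provider abstraction. This is a minimal sketch only; the provider classes, node fields, and names below are hypothetical illustrations, not Covariant's actual tooling:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class GpuNode:
    """One GPU machine, regardless of where it runs."""
    name: str
    provider: str
    gpus: int


class ClusterProvider(Protocol):
    """Anything that can enumerate its GPU nodes."""
    def list_nodes(self) -> list[GpuNode]: ...


class AwsProvider:
    # In practice this would call the cloud API; hard-coded for illustration.
    def list_nodes(self) -> list[GpuNode]:
        return [GpuNode("aws-a100-0", "aws", 8)]


class BareMetalProvider:
    def list_nodes(self) -> list[GpuNode]:
        return [GpuNode("colo-h100-0", "colo", 8)]


def all_nodes(providers: list[ClusterProvider]) -> list[GpuNode]:
    """Flatten every provider's inventory into a single cohesive view."""
    return [node for p in providers for node in p.list_nodes()]
```

Because `ClusterProvider` is a structural protocol, adding a new cloud or colocation provider means writing one class with a `list_nodes` method, and the researcher-facing interface stays unchanged.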
YOU WILL
- Design, build, manage and monitor the infrastructure we use to deploy our AI software and robotics solutions
- Develop and evolve software engineering and operational practices for the unique needs of distributed AI-powered cyber-physical systems
- Identify and establish healthy engineering and operational culture and processes
- Deliver previously impossible robotics capabilities that solve real needs for our partners and customers
- Collaborate with, learn from, and support a diverse and cross-functional team, including mechanical, electrical, and robotics engineers, AI/ML researchers, and business development
YOU HAVE
- Substantial experience operating and automating production systems in both cloud and bare-metal environments, deploying and administering Linux systems and/or wide-area networks, and building new tools or extending existing ones to add new capabilities
- A track record of accelerating developer productivity through improved tooling, automation, and education
- A track record of partnering with stakeholders to deliver solutions throughout the development process
- A solid foundation in Python, Linux, and networking
- Commitment to continuous learning and willingness to pick up new languages or technologies as needed, to solve real problems and deliver business impact
NICE TO HAVES
- Desire to work with a small, collaborative team with a high degree of autonomy and responsibility
- Motivation to work on challenging real-world engineering problems without prior solutions
- Excitement to join coworkers who strive to be inclusive, thoughtful, and down-to-earth
- Self-direction and enjoyment in figuring out which problem is the most important to work on
- Prior experience with one or more of the following: deploying client-side software, including protecting source code, establishing secure licensing, and performing release engineering; setting up and scaling developer tooling and CI/CD systems; building ML or IoT data pipelines that process images and metadata from live deployments; or managing high-bandwidth deep learning or supercomputing hardware
SAMPLE WEEK IN THE LIFE
- Monday: Start the week with a team meeting to discuss ongoing projects and explore potential collaborations. Resume work on the rollout of BigProxy v2 in the development environment, refining probing tests to enhance its reliability. Also, schedule a discussion with our Tailscale account representative to renew our contract.
- Tuesday: Address an urgent issue: the networking backplane of one of our GPU clusters is not performing optimally. Conduct a troubleshooting session with the cluster provider to adjust the NCCL topology file, following unexpected changes on their end.
- Wednesday: Develop a new alert in Datadog to monitor the performance of the GPU cluster backplane, ensuring it is adaptable for use with various providers.
- Thursday: Collaborate with a colleague on deploying a PyPI server in our cloud infrastructure. Continue the implementation and testing of BigProxy v2, which was paused on Tuesday.
- Friday: Lead a presentation at the weekly engineering deep dive to discuss the features and potential rollout of BigProxy v2, which consolidates all connections from remote deployments to the cloud through a single channel and simplifies SSH access to GPU clusters outside AWS/GCP. Gather and incorporate feedback from the team to finalize the deployment strategy.
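The probing tests mentioned on Monday are internal to BigProxy v2, but the kind of reachability probe such tests build on can be sketched with the standard library alone. The host and port below are placeholders, and real probes would also check latency and proxy-specific behavior:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False
```

A scheduled job running probes like this against each proxy endpoint, and feeding the results into a monitoring system such as Datadog, is one common way to turn ad hoc checks into the kind of alert described for Wednesday.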
SALARY RANGE
Base pay is one element of our total rewards package, which may also include comprehensive benefits and equity, depending on eligibility. The annual base salary range for this position is $165,000 to $210,000. The actual base pay offered will be determined by factors such as years of relevant experience, skills, and education, and decisions will be made on a case-by-case basis.
COMPANY CORE VALUES
LEARNING CONSTANTLY
STRIVING FOR EMPATHY
TAKING ON THE IMPOSSIBLE, TOGETHER
BENEFITS (US)
Health, dental, and vision insurance for you and your family
Unlimited PTO and flexible work hours
401(k) plan and company match
Lunch and dinner each day (for on-site employees)
Monthly Health & Wellness budget
Quarterly Learning budget
At covariant.ai we don’t just accept difference—we celebrate it, we support it, and we thrive on it for the benefit of our employees, our products, and our community. Covariant.ai is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status.