About the Team
HashiCorp solves development, operations, and security challenges in infrastructure so organizations can focus on business-critical tasks. We build products to give organizations a consistent way to manage their move to cloud-based IT infrastructures for running their applications. Our products enable companies large and small to mix and match AWS, Microsoft Azure, Google Cloud, and other clouds as well as on-premises environments, easing their ability to deliver new applications.
At HashiCorp, we have used the Tao of HashiCorp as our guiding principles for product development and operate according to a strong set of company principles for how we interact with each other. We value top-notch collaboration and communication skills, both among internal teams and in how we interact with our users.
The Role
As an Engineering Manager for the SRE team, you will lead a strategic effort to improve our cloud products' reliability and operational readiness. This role encompasses driving initiatives to enhance our operational resilience and maintain the reliability of our cloud-based products. With a focus on rapid identification, response, and resolution of incidents, you will be at the forefront of ensuring high availability and performance across HashiCorp’s offerings.
What you'll do (responsibilities)
- Lead and manage incident response and disaster recovery efforts across high-availability SaaS environments.
- Design and execute robust disaster recovery strategies to ensure alignment with Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).
- Drive compliance with organizational and industry standards by embedding best practices for disaster recovery, resilience, and fault tolerance, leveraging Chaos Engineering where appropriate.
- Define and evolve the incident response framework to enable rapid, coordinated resolution of operational disruptions.
- Proactively identify and mitigate potential points of failure through automation and predictive tooling to enhance system stability.
- Analyze incident patterns and root causes to drive continuous improvement in reliability engineering practices and response processes.
- Develop, maintain, and scale engineering tools for real-time incident detection, diagnostics, and automated remediation.
- Collaborate with cross-functional teams to build frameworks for incident simulation, root cause analysis, and reproducibility at scale.
- Own and lead disaster recovery (DR) drills and chaos testing exercises, documenting findings and delivering actionable recommendations for resilience enhancement.
- Partner closely with development, operations, and security teams to ensure cohesive incident management and comprehensive post-incident reviews.
What you’ll need (basic qualifications)
- Minimum of 12 years of professional experience, including at least 2 years in a managerial capacity within a Site Reliability Engineering (SRE) focused team.
- Demonstrate hands-on leadership in SRE for high-availability SaaS environments with a strong focus on reliability and operational excellence.
- Possess a strong background in cloud-based software development and have led teams addressing scalability, performance, and reliability challenges.
- Demonstrate excellent leadership and project management skills, with a track record of mentoring engineers and driving cross-functional collaboration.
- Show a proactive approach to problem-solving, capable of anticipating and mitigating potential issues before they impact customers.
- Have experience with agile methodologies, lead teams with empathy, and are committed to delivering high-quality, reliable software solutions.
“HashiCorp is an IBM subsidiary which has been acquired by IBM and will be integrated into the IBM organization. HashiCorp will be the hiring entity. By proceeding with this application you understand that HashiCorp will share your personal information with other IBM subsidiaries involved in your recruitment process, wherever these are located. More information on how IBM protects your personal information, including the safeguards in case of cross-border data transfer, is available here: link to IBM privacy statement.”