
Site Reliability Engineer V

ID.me | Full-time | $181k - $210k | McLean, Virginia | Apr 15, 2024

Company Overview

ID.me is a high-growth enterprise software company that simplifies how people prove and share their identity online. The company empowers people to control their data through a portable and trusted login, which means they don’t need to create a new password when visiting sites that have the ID.me button. ID.me’s digital identity network has over 117 million registered members, and is used by fourteen federal agencies, agencies in 30 states and over 600 corporations for secure identity proofing and verification. ID.me’s technology meets the federal standards for consumer authentication set by the Commerce Department and is approved as a NIST 800-63-3 IAL2 / AAL2 credential service provider by the Kantara Initiative. In addition to helping people control their credentials and data, the company’s “No Identity Left Behind” initiative strives to expand digital access and inclusion for all people. The company offers multiple pathways to identity verification: online self-serve, live video chat agents, and in person. ID.me is passionate about building a robust identity network that does not compromise access for traditionally underserved groups. ID.me has received numerous awards including Deloitte’s 2023 Technology Fast 500, Washington Business Journal’s Fastest Growing Companies, Entrepreneur Magazine’s 100 Brilliant Companies and Wall Street Journal’s Startup of the Year finalist. In recent quarters, ID.me announced it raised $132 million in Series D funding, led by Viking Global Investors with participation from CapitalG, Morgan Stanley Counterpoint, FTV Capital, PSP Growth, Auctus Investment Group, Moonshots Capital, and Scout Ventures. ID.me’s most recent round brings the total investment in ID.me to over $275 million since its founding in 2010.

The Site Reliability Engineer V (SRE) will combine software and systems engineering to build and run distributed, fault-tolerant systems at scale. SREs ensure our services have the appropriate reliability and uptime to protect and promote our customers’ experience.

Note that candidates must be located in the Washington DC or San Francisco Bay area as this role requires an onsite presence.


  • Design, build, implement, and maintain platform tooling that improves reliability across the entire product surface area, to improve the availability, scalability, latency, and efficiency of services
  • Manage end-to-end distributed systems availability and ensure high performance of applications
  • Build automation solutions to prevent problem recurrence
  • Build visibility into SLIs, SLOs, SLAs, and dependency metrics to manage operational burden and systems reporting
  • Design, build, implement, and maintain observability ecosystem to provide visibility across the platform services and applications
  • Proactively identify risks and develop engineering processes and/or tooling to reduce availability risk
  • Evangelize best practices and mentor service owners on reliability, resiliency, and scalability for new and existing services and/or features
  • Participate in an on-call rotation and hold retrospective root cause analysis meetings, focusing on identifying remediations and product resiliency opportunities

Ideal Qualifications 

  • At least 7 years of experience working in medium or large scale production systems
  • The ability to take a systematic approach to analyzing, troubleshooting, and diagnosing system problems in order to locate and resolve them
  • Experience in software development or systems engineering with code
  • Experience designing for scale and automation-forward ecosystems and solutions
  • Possess a breadth of engineering skills with an interest in service reliability, automation, monitoring, and capacity planning
  • Understanding of modern application architecture (e.g. microservices, EDA)
  • Experience with APM services and solutions (e.g. OpenTelemetry, Honeycomb, New Relic, Dynatrace, AppDynamics, Datadog)
  • Experience with time-series observability solutions (e.g. InfluxDB, Prometheus, Grafana)
  • Experience with scaled indexed logging solutions (e.g. Splunk, Elasticsearch, OpenSearch)
  • Experience running and operating Ruby on Rails applications and infrastructure
  • Deep knowledge of major cloud service providers and solutions (Amazon Web Services, Google Cloud Platform, Microsoft Azure)
  • Previous experience working within site reliability engineering culture (e.g. improving reliability through systems engineering automation, chaos testing, synthetics, and process improvement)
  • Experience designing, building, implementing, and operating distributed systems and cloud infrastructure at scale
  • Experience with container computing and container orchestration (e.g. proprietary systems such as Google Kubernetes Engine (GKE), multi-cloud solutions such as Kubernetes, or Nomad)
  • Experience with configuration management systems (e.g. Ansible, Puppet, Chef, SaltStack, Consul)
  • Experience with virtual networking (e.g. cloud networking, service mesh, SDN)
  • Experience in security automation (e.g. cloud proprietary solutions such as Google Secret Manager or Vault)
  • Experience with infrastructure-as-code (e.g. Terraform)
  • Strong written communication skills
  • Ability to work in an asynchronous environment
  • Experience in supporting a 24/7 operational infrastructure including on-call rotations

Ideal candidate will thrive in the following culture:

  • Must have an obsession for building quality products 
  • Ability to thrive when there are changing priorities and shifting of gears
  • Strong oral and written communication skills
  • Must be a team player with a strong, self-managing work ethic
  • Must be a self-starter with a passion for platform engineering, learning and continuous improvement

Day to Day Life

  • Ensure observability tooling and integrations are providing telemetry and logging statistics across the entirety of systems and applications
  • Enable the Engineering organization to identify and triage operational issues, empowering teams to own and operate autonomously
  • Contribute to defining and executing the Observability Roadmap to maintain and modernize cloud-native observability within the organization
  • Integrate telemetry and logging frameworks to the cloud platform
  • Evaluate new and existing observability technologies to ensure capabilities are inclusive of black box solutions (e.g. COTS) as well as Engineering-created software
  • Manage distributed system and application scaling activity directly (as applicable) as well as in an advisory capacity on behalf of Engineering development teams

At ID.me, we believe that an in-office culture fosters professional growth and development, mentorship, collaboration, and accelerated innovation. This position will be in-office based at one of our locations in either McLean, VA or Sunnyvale, CA. Working in an office together allows our culture to thrive and gives our team members the chance to establish real connections with their coworkers and the opportunity for lifelong friendships. Our work is critical to protecting online identity and we’re confident that working together is how we’ll change the world.

The annual base salary listed below for this role is based on experience, skills, education, relevant training and geographic location. Company bonus, incentive for sales roles, equity, and benefits are available depending on the role. ID.me offers comprehensive medical, dental, vision, health savings account, flexible spending accounts (medical, limited purpose, dependent care, commuter benefit accounts), basic and voluntary life and AD&D insurance, 401(k) with company match, parental leave, ability to participate in unlimited paid time off subject to the terms and conditions of the PTO policy, including 8 company-wide holidays, short and long-term disability insurance, accident and critical illness insurance, referral bonus policy, employee assistance program, pet insurance, travel assistant program, wellbeing and childcare discounts, benefit advocates, and a learning and development benefit.

The above represents the anticipated total rewards package for this job requisition. Final offers may vary from the amount listed based on qualifications, professional experiences, skills, education, relevant training, geographic location, and other job related factors.

Pay Range
$181,488 - $210,000 USD

ID.me maintains a work environment free from discrimination, where employees are treated with dignity and respect. All employees share in the responsibility for fulfilling our commitment to equal employment opportunity. ID.me does not discriminate against any employee or applicant on the basis of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. ID.me adheres to these principles in all aspects of employment, including recruitment, hiring, training, compensation, promotion, benefits, social and recreational programs, and discipline. In addition, ID.me's policy is to provide reasonable accommodation to qualified employees who have protected disabilities to the extent required by applicable laws, regulations and ordinances where a particular employee works. Upon request we will provide you with more information about such accommodations.

Please review our Privacy Policy, including our CCPA policy. If you provide ID.me with any personally identifiable information you confirm that you have read and agree to be bound by the terms and conditions set out in our Privacy Policy. ID.me participates in E-Verify.

