Smart is a UK-based retirement technology company and one of the world's largest recordkeepers. Our mission is to use our modern technology to disrupt the retirement market and improve outcomes for retirement savers all over the world. In the UK, our own pooled employer plan - the Smart Pension Master Trust - has over 700,000 participants. We also provide Platform-as-a-Service solutions to other retirement platforms. In addition to the UK, we already operate with partners in Ireland, Dubai and Australia, and we launched our US business in 2020. Our strategic investors include Legal & General Investment Management, Natixis, Barclays Bank and LINK Group (Australia).
Smart is growing its technology team, and as part of that growth we are looking to hire Site Reliability and Infrastructure Engineers to help us dramatically scale our operations. Are you an opinionated, fast-moving engineer with prior SRE, systems or infrastructure engineering experience looking to make your dent in the universe? Smart could be your chance!
Responsibilities
Site Reliability Engineer
Are you someone who has a passion for uptime? Are you opinionated on what SRE is and isn’t? Do you know the difference between good and bad instrumentation? We are looking for seasoned engineers to join our US or European SRE team and help us level up!
- Build and configure developer- and client-facing tooling to improve platform and service observability.
- Reduce manual work (toil) for the technology team (yourself included!) by implementing appropriate solutions or processes as required.
- Be a security champion within the SRE sphere of responsibility.
- Own and support engineering in the implementation of monitoring and alerting as we move to a 24/7/365 platform.
- Coach the development team to build and leverage application and platform metrics more effectively.
- Define Service Level Objectives for the Smart Platform, including working with other departments to define appropriate availability targets.
- Keep up to date with best practices in full-stack site reliability.
- Assist our Infrastructure team in maintaining our AWS environments.
- Own and improve our incident management tooling and integrations.
- Update or produce documents to describe changes to the platform.
Infrastructure Engineer
As part of our growth, we are now looking to hire a Senior Infrastructure Engineer to own and extend our internal platforms to support next-generation scaling requirements. This position includes some limited out-of-hours on-call work after your initial training period, and you will be supported by the rest of the engineering team. We cherish work-life balance here at Smart, so this support will be reasonable and compensated.
- Own and be accountable for the reliability, security and deployment of the platform infrastructure.
- Provide support and input to the teams running services on top of that infrastructure.
- Design and document standards, processes, and procedures, and update documentation as systems change.
- Build, maintain, and support our AWS, Kubernetes and Heroku (legacy) environments.
- Reduce manual work (toil) for the technology team (yourself included!) by implementing appropriate solutions or processes as required.
- Design & implement improved data security strategies for our platform.
- Own & implement appropriate monitoring & alerting as we move to a 24/7/365 platform.
- Define Service Level Objectives for the Smart Platform, including working with other departments to define appropriate availability targets.
Requirements
Site Reliability Engineer
Essential
- Experience supporting platforms (websites and/or APIs) on hyperscale cloud providers (especially AWS) and SaaS products.
- Experience defining appropriate full-stack performance and security metrics to monitor web platforms, and alerting on breaches of those metrics.
- Experience with continuous delivery and working in zero-downtime deploy environments.
- Comfortable with command-line tools and environments. Linux experience is essential.
- Proficient with at least one server-side programming language such as Ruby or Python and with configuration management tools like Terraform or Ansible.
- Enjoys complex problem solving and delivering results.
Helpful
- Experienced user of SaaS observability tooling, especially Datadog.
- Experience working with or supporting incident management tooling (PagerDuty, Opsgenie, etc.).
- Experience with container deployment platforms like Kubernetes, and especially AWS EKS.
- Prior evidence of developing command-line tools.
- Experience with serverless technologies, e.g. AWS Lambda.
Infrastructure Engineer
Essential
- Proven track record of supporting web platforms (e.g. websites or APIs) in production at scale.
- Experience in cloud infrastructure design, deployment, and support.
- Experience defining metrics to monitor web platforms, and alerting on those metrics.
- Experience with systems and network security in a cloud environment.
- Experience with incident response and on-call rotations.
- Experience with continuous delivery and zero-downtime deployments.
- Comfortable with command-line tools and environments. Linux/Unix experience is preferred.
- Experience with configuration management tools like Terraform or Ansible.
- Ability to solve complex problems and deliver results.
- Ability to define and document standards, via runbooks, design documents, RFCs, etc.

At Smart, we're a diverse team, made up of people from different backgrounds, experiences and skills. Our goal is to build great products to help people plan for their financial futures. We’re constantly developing new ideas to help people look after their pension schemes, in the UK and abroad. We’ve grown to a team of over 500 talented people, all dedicated to creating the best experience for our customers. Recently we made it onto Great Places to Work UK's Best Workplaces 2020 for medium-sized companies! If you think you can help us build a smarter future, come and work with us.
Our Recruitment Data Policy is here. Please click on the link if you have any questions about how we store your data or to know your rights.
Benefits
- Health Insurance (via Aetna) including Dental and Vision.
- Life, Short- and Long-Term Disability Insurance.
- $500 personal training budget to spend on books, courses, conferences or training materials to help you develop.
- 5-week sabbatical after 5 years.
- 15 days vacation per year, plus 1 extra vacation day after 2 years and each year thereafter, up to a maximum of 20 days. 10 holidays. 10 sick days per year.
- 401(k) Plan with company match.
- Enhanced maternity and paternity