GRAIL is a healthcare company whose mission is to detect cancer early, when it can be cured. GRAIL is focused on alleviating the global burden of cancer by developing pioneering technology to detect and identify multiple deadly cancer types early. The company is using the power of next-generation sequencing, population-scale clinical studies, and state-of-the-art computer science and data science to enhance the scientific understanding of cancer biology, and to develop its multi-cancer early detection blood test. GRAIL is headquartered in Menlo Park, CA with locations in Washington, D.C., North Carolina, and the United Kingdom. GRAIL, LLC is a wholly-owned subsidiary of Illumina, Inc. (NASDAQ:ILMN). For more information, please visit www.grail.com.

GRAIL is seeking a Staff Software Engineer for our Site Reliability Engineering (SRE) team to help us improve the security and reliability of production systems that are critical to our mission to detect cancer early and save lives. You will contribute to the architecture, design, development, and implementation of critical cloud-based infrastructure, services, and applications, and be responsible for their secure, healthy, and reliable operation. You are someone who enjoys learning and implementing the best industry technology trends and practices. You foster and contribute to a creative and collaborative culture that delivers results. You embrace ambiguity and enjoy exploring new technologies to deliver robust, scalable solutions.

This is a hybrid role that requires you to be onsite 2 days a week in Menlo Park, CA.

Responsibilities

Ensure High Availability: Implement and maintain resilient cloud architectures, monitor system performance, and proactively identify and resolve potential bottlenecks or points of failure.
Incident Management: Play an active role in production on-call, responding swiftly to troubleshoot and resolve production issues. Minimize service disruptions and downtime by conducting thorough triaging and debugging of product or system issues. Continuously optimize the on-call process for sustainability and efficiency.
Automation and Tooling: Develop and maintain automation scripts, tools, and processes to streamline system deployment, monitoring, and management tasks. Your contributions will be vital in efficiently scaling cloud operations.
Performance Optimization: Optimize cloud infrastructure and applications for performance, scalability, and cost-effectiveness.
Security and Compliance: Collaborate with security engineers to implement best practices and ensure compliance with security standards and policies.
Monitoring and Alerting: Design and configure advanced monitoring systems to gain insights into system behavior, set up alerts, and respond proactively to potential issues. Create and maintain comprehensive dashboards and playbooks for production on-call.
Software Development Consultation: Engage actively in the entire software development lifecycle. Participate in system design reviews and provide valuable Site Reliability Engineering (SRE) insights during launch reviews, influencing and enhancing system architecture.

Preferred Qualifications

Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
3+ years of professional experience maintaining production systems on cloud-based services and infrastructure.
8+ years of software development experience in one or more programming languages, with a primary focus on leveraging and working on cloud-based services and infrastructure.
Strong knowledge of cloud platforms such as Google Cloud Platform, AWS, or Azure. We prefer AWS experience but will also consider GCP or Azure.
Practical experience with containerization technologies, including Docker and Kubernetes.
Familiarity with Python, Bash scripting, and Ansible.
Familiarity with infrastructure as code tools like Terraform is essential.
Solid understanding of databases, networking, security principles, and best practices.
Proficiency in using monitoring and alerting tools to detect and respond to potential issues effectively.

Desired Skills

AWS Certifications (such as Solutions Architect, Security, etc.)
Experience in a regulated industry or healthcare field

The expected, full-time, annual base pay scale for this position is $180,000 - $210,000. Actual base pay will consider skills, experience, and location.

Based on the role, colleagues may be eligible to participate in an annual bonus plan tied to company and individual performance, or an incentive plan. We also offer a long-term incentive plan to align company and colleague success over time.

In addition, GRAIL offers a progressive benefit package, including flexible time-off, a 401k with a company match, and, alongside our medical, dental, and vision plans, carefully selected mindfulness offerings.

GRAIL is an Equal Employment Employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability or any other legally protected status. We will reasonably accommodate all individuals with disabilities so that they can participate in the job application or interview process, perform essential job functions, and receive other benefits and privileges of employment. Please contact us to request accommodation. GRAIL maintains a drug-free workplace.

Staff Site Reliability Engineer #3718
