ABOUT US
Our company culture blends passion for technology, rock star output, and appreciation for a balanced, healthy lifestyle. Our vision is to build and maintain a community that respects, values, and celebrates our individuality; fosters an inclusive and equitable experience for all; and empowers us to be our authentic selves each day.
Bluescape provides virtual workspaces for hybrid and remote teams to meet and interact with all of their mission-critical content and data. We're all entrepreneurs here, no matter what role you are in, so if you want to work with smart, collaborative people; work on interesting projects; and contribute towards a cool and innovative product, then we want to hear from you!
ABOUT THE ROLE
Reporting to the SRE Manager, this position is responsible for developing and maintaining automation, tools, and configurations, and for system and application service uptime in a high-availability, customer-facing, 24x7 SaaS environment where uptime is business critical and service-impacting issues require an immediate response. You have, or will develop, skills in assessing tradeoffs in the installation, configuration, and diagnostics of open source Linux systems in a large-scale DevOps environment. The right candidate has excellent verbal and written communication skills and a demonstrated ability to work across departments towards a common goal. We require a passion for implementing open source tools, system/network/application diagnostics frameworks, and CI/CD environments for a SaaS enterprise, with a structured approach to achieving high-quality, sustainable production operations. The candidate will have knowledge of deploying Java, Node.js, and/or other typical enterprise application frameworks and languages.
RESPONSIBILITIES
- Take on new projects, prototype, and manage execution to completion.
- Develop and manage consistent and coherent SRE processes and practices to support software development, testing, builds and deployment.
- Guide and develop infrastructure and tools architecture design to enable high uptime, minimize failures, ensure application and data security, and expedite diagnostics.
- Identify, diagnose, and quickly resolve complex technical issues in a live production environment, and leverage those events to improve current technology and processes so that such issues are prevented.
- Work closely with the Engineering teams to escalate and/or triage issues to resolution.
- Review tickets and diagnostics with a post-mortem to identify trends/chronic issues.
- Hands-on implementation & upgrade of tools for monitoring, trending & diagnostics.
- Audit proactive monitoring of all systems to detect and resolve problems to ensure uninterrupted operation of all infrastructure systems.
- Update corresponding documentation on installation process & configurations.
- Consider security concerns with all work.
- Automate, Automate, Automate everything.
SKILLS AND REQUIREMENTS
- Remote position
- Bachelor's degree in computer science or equivalent. Advanced degree or equivalent work experience is preferred
- 5+ years of relevant work experience
- Solid knowledge of cloud architecture concepts and practices
- Knowledge of architectural design patterns, e.g. immutable production, fail fast, and statelessness
- Strong understanding of application release management and configuration, upgrades/patches, and support of Unix/Linux systems and of applications on Node.js or similar in a SaaS environment
- Passion for troubleshooting and triage of incidents, bringing issues to rapid resolution
- Ability to apply detailed knowledge of organizational procedures to make independent decisions and serve as a credible resource for technology teams
- Strong verbal and written communication skills, with the ability to work effectively across organizations
- Excellent problem-solving skills with the ability to analyze situations, identify existing or potential problems and recommend solutions
- Software engineering skills and computer science knowledge
- Excellent understanding of scalable, micro-service based architectures and experience in applying them to real-world problems
- Ability to take part in an on-call escalation rotation and coordinate work under production-critical situations is essential
- Knowledge of the use and maintenance of continuous integration and continuous deployment systems
BONUS POINTS
Extensive working knowledge of as many of the following technologies and areas as possible:
- Linux, Docker, Kubernetes, OpenShift & related open source software
- Automation using Ansible and Terraform in a cloud environment
- Working knowledge of databases
- Good networking fundamentals, including protocols, load balancers, VPN, switches/routers/firewalls, LDAP, SNMP, SMTP
- Good understanding of filesystem technologies, to build filesystems and/or troubleshoot filesystem issues
- Virtualization/cloud technologies: strong working knowledge of AWS with a good understanding of other technologies like OpenStack, OpenShift, and Google Cloud
- Web servers/reverse proxies such as Apache, nginx, and HAProxy
- Web application frameworks in Node.js, Python, etc.
- Monitoring, trending & diagnostics tools
- Logging tools such as Splunk, ELK stack, etc.
- Using source code control systems such as Git (or similar)
- Work/defect tracking & Wiki systems such as JIRA / Confluence
- Ability to prioritize and balance activity between projects with longer-term impact and immediate production-critical requirements, with a customer focus
- Be a self-starter and require minimal guidance
DISCLAIMER
Bluescape is an equal opportunity employer. In keeping with the values of Bluescape, we make all employment decisions, including hiring, evaluation, termination, promotional and training opportunities, without regard to race, religion, color, sex, age, national origin, ancestry, sexual orientation, physical handicap, mental disability, medical condition, disability, gender identity or expression, pregnancy or pregnancy-related condition, marital status, height and/or weight.