Senior DevOps Engineer (Prometheus/Grafana)
Asia / Taiwan, Taipei
Engineering – DevOps / Full-time / Onsite, Remote, or Hybrid
Responsibilities:
- Design, implement, and manage comprehensive monitoring solutions to ensure the high availability and performance of our microservices infrastructure and applications.
- Utilize advanced monitoring tools and scripting to automate the monitoring of our cloud environments, focusing on AWS.
- Develop and maintain robust logging and alerting mechanisms to identify and mitigate potential issues proactively.
- Collaborate with the infrastructure team to integrate monitoring solutions into the CI/CD pipeline, ensuring seamless deployments and operations.
- Conduct performance analysis, capacity planning, and scalability testing to ensure our systems meet current and future demands.
- Lead incident response and troubleshooting efforts, utilizing monitoring data to quickly resolve operational issues.
Requirements:
- Minimum of 5 years of hands-on experience with Kubernetes, Elasticsearch, Prometheus, Grafana, and AWS, with a strong emphasis on monitoring and observability in cloud-native environments.
- Proficiency in programming languages (such as Python, Go, or Rust) for automating monitoring tasks.
- Experience with infrastructure-as-code (IaC) tools and a strong understanding of CI/CD principles, including experience with Docker and Kubernetes for container orchestration.
- Deep knowledge of monitoring tools (such as Prometheus, Grafana, or the ELK stack) and of monitoring strategies for large-scale environments.
- Proven track record in managing and troubleshooting large-scale distributed systems, with an emphasis on performance tuning and optimization.
- Excellent problem-solving skills, with a focus on delivering high-quality, reliable, and scalable infrastructure solutions.
- Strong communication and teamwork skills, with the ability to work effectively in a fast-paced, collaborative environment.