Responsibilities:
- As a Site Reliability Engineer, you will combine software and infrastructure engineering to build and run large-scale, distributed, fault-tolerant systems
- Participate as a stakeholder in product roadmap planning, sprint planning, and standups
- Implement scalable and resilient infrastructure
- Scale systems sustainably and increase development velocity through automation
- Build and improve the performance of the software integration and deployment automation that delivers our product to our cloud platform
- Optimize logging, monitoring and alerting systems to ensure the highest availability and uptime
- Investigate and solve ambiguous underlying technical problems across different programming languages
- Troubleshoot networking, compute, and Kubernetes failures
- Practice sustainable incident response: perform root cause analysis, resolve incidents, and write precise, blameless postmortems
- Harden security across all systems
- Continuously improve the way the team operates, and ensure everyone is happy and efficient
- Maintain existing services and tools, augmenting and replacing as required
- Research, design, develop, and adopt tools that improve infrastructure reliability
- Document systems, particularly tribal knowledge
- Share your knowledge and experience with others in the squad
- Respond to emergencies outside working hours
Preferred qualifications:
- 1 year of demonstrated software infrastructure experience as a site reliability engineer, DevOps engineer, DevSecOps engineer, or software engineer.
- Infrastructure and application security engineering experience or understanding of information security is a plus.
- You have an understanding of basic networking.
- You have basic knowledge of Linux monitoring, troubleshooting, and administration.
- Competence in at least one programming language (ideally VueJS, Golang, or NodeJS). You must be able to write code and evaluate it for scalability and runtime performance.
- Experience with container orchestration platforms such as Kubernetes
- Experience working with at least one DBMS, e.g. MSSQL or MySQL
- Experience with monitoring, APM, and logging tooling (e.g. EFK, Grafana) is a plus
- Experience with configuration management tools (e.g. Helm) is a plus
- Experience with Infrastructure-as-Code tools such as Terraform is a plus
- Experience with multi-cluster Kubernetes setups is a plus
- Experience working with at least one major Cloud Provider (AWS/Azure/GCP) is a plus
- Understanding of cloud native security requirements is a plus
- A track record of codifying and automating infrastructure in a production environment is a plus
- You’re comfortable working with stakeholders and handling a large number of ambiguous technical problems every day.
- You are proactive, work independently, and are eager to learn new technologies.
- A genuine passion for technology and continuous improvement, with the ability to find innovative solutions to solve technical challenges.
- A can-do attitude and a growth mindset
- A fast learner with strong Googling skills