We are looking for a Software Developer with an SRE background to take on this role.
Team: 4 DevOps engineers, 1 SRE, and several office IT admins.
Projects you will work on
- Infrastructure automation with Terraform;
- Improving monitoring and tuning alerts in DataDog, as well as developing new metrics;
- Working with product managers and teams to define SLOs for services and improve their reliability;
- Automating auxiliary on-call processes;
- Setting up and tuning the OpsGenie alert system.
About the role
You’ll be working in a dynamic, rapidly evolving environment with the following responsibilities:
- Advise developers on choosing service level indicators and target values for those indicators;
- Participate in the SRE team's on-call rotation (solve user problems, respond to alerts, and prevent incidents);
- Manage incidents and conduct incident reviews;
- Collaborate with development teams to ensure the stable and reliable operation of services;
- Set up a monitoring system to respond to symptoms of problems;
- Help to equip services with tracing, metrics and logs;
- Write documentation (runbooks) for the actions you perform, in order to define repeatable processes and automate them;
- Ensure the observability of services and systems;
- Improve operational processes in cooperation with the DevOps team;
- Debug and investigate production issues in services and at different levels of the stack;
- Improve team practices through code review and incident handling.
What we expect
- Use Grafana or Prometheus in your work;
- Have skills in finding and solving problems in distributed systems;
- Know what percentiles are and understand charts;
- Are good with relational databases and know SQL (writing complex queries);
- Understand how HTTP APIs and gRPC work;
- Have experience with a monitoring and logging system (we use DataDog) and with manual troubleshooting;
- Solve tasks with Docker, Kubernetes, Helm or similar technologies on a daily basis;
- Know OpsGenie, PagerDuty, or a similar notification system;
- Are familiar with agile methodologies (as part of a Scrum team).
As a plus
- Have worked with Google Cloud Platform;
- Can develop Golang microservices (commercial experience or pet projects);
- Can read and understand HTML, JS, TypeScript, Python, Shell/Bash;
- Have experience with SLO/Error Budget;
- Can use API tools: curl, Postman/Bloom or similar;
- English: Upper-Intermediate.
What you can expect
- We offer flexible working hours and trust you to work enough hours to do your job well, at times that suit you and your team.
- A working environment that gives you autonomy and responsibility from day one.
- You should be comfortable with the idea that the quality of your work will influence the shape of your career.
- Participation in the company’s employee stock option program.
- Health insurance.
We are passionate about creating an equitable, high-performing workplace that gives people from all backgrounds the support they need to thrive, grow and meet their goals (whatever they may be).
If this sounds exciting to you, we’d love to hear from you!