At Mangopay, our mission is to power the payment infrastructure and payment operations of the world's biggest and most exciting marketplaces & platforms.
We provide marketplaces and platforms with powerful modular payment and regulatory solutions. Since 2013, we have enabled the success of some of the biggest names in e-commerce, retail, and cutting-edge platforms such as Vinted, Rakuten, Chrono24, La Redoute, Wallapop and more than 2,500 others.
Our team of 400+ is spread across Europe, with offices in Paris, Luxembourg and London. In an environment where marketplaces and fintech ventures are thriving, we're actively seeking exceptional individuals to tackle the challenges in our field and contribute to our growth. Our commitment to diversity is unwavering, and we are dedicated to promoting employee well-being, inclusivity, and equal opportunities. Joining Mangopay means you’ll be part of a dynamic, flexible, and rapidly growing team.
We’re currently looking for motivated and results-driven Senior & Lead Site Reliability Engineers (SREs) to join our Platform team. As a valued member of the Technology department, you will have the opportunity to work closely with cross-functional teams to deploy and manage systems, drive operational efficiency through automation, and troubleshoot issues across multiple environments.
If you’re a senior engineer with a strong cloud infrastructure background who is comfortable with ambiguity and aims to simplify and improve how infrastructure works, this could be the role for you. As a team, we’re responsible for designing, building, and operating the services we consume from AWS, along with the software we run on top, such as Kubernetes, Kafka, Redis, PostgreSQL and more. We’re also responsible for operating our network, and being on-call for the things we own and run.
To achieve this, we’re organised into three teams within the Platform Universe: Platform Engineering, Data Engineering, and Operations. Each squad is responsible for solving a specific set of problems for our customers and our engineers. We’re looking for engineers who are interested in joining our Operations SRE squad.
This role is a remote opportunity.
What will you be responsible for?
- Designing and implementing automation tools and frameworks to streamline our operations and deployment processes. This will involve creating new tools as well as improving existing ones
- Leading efforts to design and implement scalable, fault-tolerant systems that can handle our increasing user base and traffic
- Identifying areas for performance optimisation and conducting capacity planning to accommodate future growth
- Participating in architecture and design reviews to ensure that our systems are scalable, reliable, and secure. You will be working with other engineers to make sure that our systems are designed and built for the long term
- Building, maintaining and continuously improving our monitoring, alerting, and logging systems. This includes setting up new tools and constantly finding ways to improve our existing ones
- Identifying and troubleshooting production issues and providing quick resolution. You will be responsible for identifying problems and finding solutions, as well as working with other teams to ensure that they are resolved quickly
- Collaborating with development teams to ensure that our systems are designed and built to be reliable, robust, and scalable
- Monitoring and reporting on system performance and availability. You will be responsible for monitoring our systems to ensure that they are performing well and are available to our users
What do we expect from you?
- Strong experience with Amazon Web Services (AWS) is a must. You should have a deep understanding of AWS services and how to use them effectively
- Experience with migration projects and migrating environments is a must
- Experience with containerization technologies such as Docker and Kubernetes
- Extensive experience with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Ansible
- Excellent problem-solving and troubleshooting skills
- Strong experience with at least one programming language such as Python, Java, or Go
- Strong networking experience
- Experience with monitoring and logging tools such as Grafana, ELK stack, DataDog, Splunk or others
- Experience with CI/CD pipelines and tools such as TeamCity, GitLab, or CircleCI
- Strong understanding of networking concepts and protocols
- Excellent communication skills and ability to work in a team environment
If this role sounds interesting to you, we would love to hear from you!
We care about equal employment opportunities, so all qualified applicants will receive equal consideration regardless of their race, colour, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status.