Senior SRE Engineer
London (United Kingdom)
Category: Other techies
google-cloud-platform terraform docker performance amazon-web-services sysadmin
Beamery’s mission is to help companies acquire their greatest asset: their people. Our Talent Operating System lets companies attract, engage, and retain the best talent. It’s the one solution that enterprises need to deliver exceptional experiences at every stage of the talent journey and build meaningful relationships with their future employees.
We are lucky to be one of the fastest-growing companies in the world, and even luckier that the people at Beamery are not only superb at their jobs, but are also a reliable, friendly bunch who leave egos out of the equation. We are a team that cares about the right outcome above everything else.
We're looking for a Senior Platform Engineer to join our rapidly expanding Engineering Team here at Beamery HQ in London. You will join the Platform Team and work with cutting-edge technologies on an industry-leading SaaS product that powers experiences for millions of candidates across many of the world's biggest companies and brands.
This role gives you a broad operations remit to ensure Beamery is using the latest stable practices in the DevSecOps and SRE spaces. You will need to work across technologies, so a successful candidate must pick up new stacks quickly and be able to propose and implement new tools in quick iterations. You will also provide technical leadership and mentorship to ensure Beamery engineering standards are continuously improved.
What will you do as a Senior Platform Engineer at Beamery?
Building a market-leading B2B multi-tenanted SaaS platform that is disrupting the recruitment industry
Ownership & enablement of mission-critical production multi-tenanted SaaS operations: on-call rota, monitoring, alerting, configuration and change management, incident management and disaster recovery
Ownership & enablement of service scalability, elasticity, fault-tolerance and disaster recovery
Continuously improving SLOs such as availability, performance and recoverability
Elimination of operational toil & engineering knowledge gaps
Key contributor to service release management
Accountable for technical mentorship of junior engineers
Evolving Beamery engineering standards (coding standards, TDD/BDD practices, frameworks, tooling, docs) and processes (design reviews, code reviews, branch management, deployment, release management and service operations)
We'd love to meet someone who has…
7+ years of hands-on experience as an SRE/DevOps engineer delivering business-critical, scalable, cloud-based services, preferably multi-tenanted B2B or B2C SaaS services deployed on top of GCP or AWS IaaS/PaaS
Excellent understanding of DNS, cloud networking and infrastructure
Excellent debugging and analytical skills: ability to isolate root cause across networking/infrastructure, application and database stacks
Experience of managing email deliverability at scale (millions of emails monthly) is essential; in-depth knowledge of deliverability tools such as 250ok or MXToolbox is highly desirable, as is experience of managing ESPs (e.g. SendGrid) at large scale, IP pooling and email feedback loops
Operational experience of deploying and running services at scale on top of a Docker/Kubernetes stack and a service mesh (e.g. Istio) is highly desirable
Operational experience of using RabbitMQ, Kafka, MongoDB and Elasticsearch services at scale is highly desirable
Strong experience with logging (e.g. ELK, Stackdriver) and monitoring (e.g. Datadog, Prometheus) solutions
Operational experience with orchestration tools (CI/CD) and Infrastructure-as-Code tooling (e.g. Terraform) is a must
Understanding of IaaS/PaaS/Serverless deployments and related operational tradeoffs is a must
Experience of delivering software using Agile delivery methodologies (Scrum/Kanban) is a must
Experience of using software engineering practices such as TDD, pair programming, test automation, code reviews, code refactoring, branch management (GitFlow) and CI/CD practices
A degree in computer science, mathematics, physics or a related technical subject is highly desirable
How will success be measured?
Service availability, performance and recoverability SLOs
Service release failure rates
Time spent on production issues while on the on-call rota
Dog friendly office
Regular socials, food & drink
Quarterly team-building events
Flexible learning & development budget