National Express are recruiting an experienced Site Reliability Engineer to join our team, based at Head Office, Birmingham. As the successful candidate, with a focus on infrastructure, you will play a pivotal role in ensuring the reliability, performance, and security of our distributed infrastructure environment. You will leverage your deep technical expertise and problem-solving skills to support, evaluate, build, deliver, and maintain a high-quality infrastructure that meets the evolving needs of our business.
What you'll do:
Design, implement, and maintain highly available and scalable systems on AWS
Develop and maintain automation scripts and tools to streamline operations and reduce manual tasks
Monitor system performance, identify bottlenecks, and implement optimizations to improve response times
Forecast resource requirements and ensure adequate capacity to meet business needs
Operate and maintain traditional IT infrastructures, cloud ecosystems, and IT services to meet business needs. Manage compute, storage, and networking environments in an MSP environment
Manage infrastructure using IaC tools (e.g., Terraform, CloudFormation) to ensure consistency and reproducibility
Implement robust monitoring and alerting systems to proactively identify and address issues
Contribute to security best practices and implement measures to protect our systems and data
Manage relationships with 3rd party vendors for technical support, build, and maintenance
Participate in projects and service improvements related to infrastructure and data centre services. Identify, own, and implement proactive maintenance plans, including upgrades and patches
Provide specialist-level incident and problem management support. Perform problem identification, root cause analysis, and recommend service improvements
Meet service level agreements (SLAs) for infrastructure services
Adhere to National Express's processes, including change controls, problem records, and supportability of technology
Maintain infrastructure supporting documentation in line with improvements and maintenance activities
Drive continuous improvement initiatives to enhance system reliability and efficiency
What you'll need:
Three or more years of hands-on experience, leading to a deep understanding of and proficiency in AWS services (e.g., EC2, S3, RDS, Lambda, CloudFront) and best practices
Experience working within an ITIL framework in organisations of 3000+ users
Excellent customer-facing skills, including critical issue escalation and resolution, root cause analysis, and accountability
Hands-on understanding of Microsoft Server platforms, hyper-converged infrastructure, domain services, backup, business continuity, disaster recovery, and data centre operations
Proven experience with Infrastructure as Code (IaC) tools like Terraform
Strong scripting skills in Python or Bash
Experience with monitoring and observability tools such as AWS CloudWatch, Grafana, or Datadog
Knowledge of containerisation technologies (Docker, Kubernetes)
Understanding of serverless technologies and their use cases
Knowledge of microservices architectures
Solid understanding of networking concepts (TCP/IP, DNS, routing)
What we offer in return for your hard work and commitment...
Free Bus & Coach travel for yourself
Complimentary coach travel for a Nominated Person or complimentary bus travel for a Spouse or Partner
50% discount for friends and family on full fares on our coach services
Life Assurance
Company pension
Employee Assistance programme
Private online GP service
National Express is committed to creating an inclusive workplace that reflects the diverse communities we serve, and we positively encourage applications from all sectors of the community.
We are a Disability Confident Committed employer. Should you require any adjustments at any stage of the recruitment process, please let us know.
We reserve the right to close this advert early if we receive a high volume of applications before the advertised closing date.
Things to Note...
At National Express, we are really proud of our health and safety record and as a result, we operate a Drugs and Alcohol Policy which is applicable to all employees.
As part of your initial assessment, we will complete Drug and Alcohol testing and you may be subject to random tests during your employment.
You are applying for...
Site Reliability Engineer
Salary: Up to £70,000 per annum
Working Pattern: Full Time
Contract Type: Permanent