SOC Engineer
SkipTheDishes
Winnipeg, MB, Canada
3d ago

Description

We’re revolutionizing the way humanity eats, and there’s a lot of room for optimization and growth. That’s where you come in.

Your ingenuity will help us continue to drive innovation, making an impact on the reliability, performance, and scalability of Skip’s industry-leading technology.

The Role:

The Service Operations Team is responsible for the primary operation and availability of Skip / Just Eat's platforms and services around the world.

The team is responsible for identifying and resolving issues in production (ideally before they become visible to customers) and for working with the wider engineering community to track down and mitigate areas of risk.

You'll be working as part of the Reliability Engineering department to help define and shape the future of our platform and services.

You’ll be operating our global AWS platforms at scale.

Skills:

  • Incident management experience
  • Strong troubleshooting, problem-solving and investigative skills
  • Experience operating a production environment
  • Excellent communication skills
  • Experience working in an agile environment to deliver software
  • Experience with scripting / automation

Knowledge & Experience:

  • Worked with C# / .NET, Java, Python, Ansible
  • Experience with ELK, Splunk, Prometheus, Graphite, Grafana
  • Experience with AWS or other cloud providers
  • Used APM tools in production before
  • Operated a production Windows or Linux environment
  • Experience working in an agile environment
  • Worked a shift pattern before

Accountable & Responsible For:

  • Monitoring our production environments and reacting quickly to prevent or reduce customer-visible impact
  • Accurate escalation of incidents when required
  • Communicating production issues to key stakeholders
  • Troubleshooting, reproducing and mitigating issues in our production environments
  • Incident management of high-severity issues impacting our sites and services 24/7
  • Creating automation and tooling for the SRE team to improve our processes
  • Working on the Reliability Engineering backlog to improve how we operate our services in production
  • Supporting services prior to go-live through pre-launch reviews
  • Participating in wargames to test our operational response and identify areas of weakness in our platforms