Site Reliability Engineer

EA Digital Illusions CE AB / IT jobs / Stockholm
Please note that the application deadline has passed.

We are EA
And we make games - how cool is that? In fact, we entertain millions of people across the globe 24/7 with the most amazing and immersive interactive software in the industry. But making games and delivering a flawless player experience is hard work. That's why we employ the most creative, resourceful, and passionate people in the industry.

The Challenge Ahead
We are a group of Site Reliability Engineers who collaborate with multiple teams to provide online services that enhance the game experience. We support a multi-billion-dollar video game ecosystem and various non-development business units within EA - our portfolio is wide. Our environments are continuously challenged by marketing promotions, game launches, and security threats. We are passionate about automation and ensuring high standards.

Who You Are
A self-starter with a considerable breadth of technical knowledge and the ability to dig deep
Someone who communicates well with people across dozens of teams and practices
An engineer with a passion for excellence, a devotion to automation, and an eye for efficiency
An engineer with development experience who has improved operations with code
A curious problem solver who isn't afraid to get their hands dirty

Who We Are
We are a multi-discipline team of engineers supporting our live services and the developers who create them. As Site Reliability Engineers our role covers the entire life-cycle of a product, from helping the developers with architecture and delivery to on-call incident response and triage. We focus heavily on automation and continuous integration/delivery with an emphasis on solving operations issues using software, ensuring that everything we deliver is robust, efficient, and supportable. Our responsibilities include:
Using code to solve common operational problems in a results-focused way
Establishing monitoring, alerting, and dashboarding to continuously improve the observability of player experience, infrastructure and application performance, and business metrics
Hands-on design, analysis, development, and troubleshooting of highly-distributed large-scale production systems spanning on-prem and cloud-based hosting
Performing root cause analysis and post-mortems with an eye towards future prevention
Being the escalation path for on-call incident response and triage
Using automation technologies to ensure repeatability, eliminate toil, repair services, and reduce mean time to detection and resolution (MTTD and MTTR)
Using scale testing to measure, tune and optimize system performance
Designing and implementing CI/CD and app deployment solutions for anything we or our dev teams build
Preemptively creating stability, security, and performance improvements
Making sure every service is tuned for high-availability and disaster recovery
Maintaining security standards across everything we support
Producing documentation, runbooks, and support tooling for online support teams

Your Skills
The systems we support are incredibly diverse, produced by dozens of teams from around the world. The ideal candidate will have a diverse skillset and always be eager to expand it. More importantly, they will be able to apply their conceptual understanding to new technologies and tools rapidly. Being a self-starter with a personal dedication to continuous learning is key. Below is a list of skills we are looking for, in addition to those the successful candidate brings:
Leading by example in engineering quality through testing, teaching, and a team-minded attitude
Cross-functional knowledge of systems, storage, networking, security, and databases
Experience in monitoring infrastructure and application availability and reliability to meet service level indicators and objectives (SLIs and SLOs)
A strong understanding of *nix is mandatory; familiarity with RHEL and Debian is preferred
Understanding of standard networking protocols and components such as HTTP, DNS, ECMP, TCP/IP, UDP, ICMP, the OSI Model, subnetting, and load balancing strategies
Automation and orchestration: Chef, Puppet, Terraform, Packer, Jenkins
Experience in languages such as Python, Ruby, Bash, Java, Go, Perl, C/C++; strong skills in reading, understanding, and writing code in these languages
A strong understanding of distributed systems is a must
An understanding of the CAP theorem, Microservices, Twelve-Factor Apps, and techniques for high availability, service discovery, secret management, etc.
Virtualization, containerization, and cloud computing: AWS (preferred), GCP, Azure, VMware ecosystem, Kubernetes (preferred), Docker, Vagrant, etc.

What's in it for you? Glad you asked!
We love to brag about our great perks like comprehensive health and benefit packages, tuition reimbursement, 401k with company match, and, of course, free video games. And since we realize it takes world-class people to make world-class games, we offer competitive compensation packages and a culture that thrives on creativity and individuality. At EA, we live the "work hard/play hard" credo every day.

Publication date
2020-09-16

How to apply
The last day to apply is 2020-09-26

Address
EA Digital Illusions CE AB
Södermalmsallén 36
118 28 Stockholm

Employment type
This is a full-time position.

Employer
EA Digital Illusions CE AB (company registration no. 556710-6520)
Södermalmsallén 36
118 28 Stockholm

Workplace
EA Digital Illusions CE AB

Job number
5362208
