Staff Site Reliability Engineer

Job Description

What You'll Do
We are looking for a motivated, self-starting Staff Site Reliability Engineer for our Core Products Team, which maintains and develops new features for all of Datto's backup appliances (~75K devices and growing quickly). The backup device is a physical or virtual appliance that takes block-level backups of Windows, Mac, and Linux machines, turns them into raw disk images, and stores them on a local ZFS-based disk array. In the event of a disaster, our customers restore these backups/disk images instantly as KVM-based virtual machines, iSCSI targets, Samba shares, and many other formats. We also offer a virtual VMware/Hyper-V-based appliance and integrate with their hypervisors. We write code in modern Symfony-based PHP (with some Python and C++ sprinkled in), and we rely heavily on our Ubuntu-based Linux stack. We do amazing and exciting things every day, such as detecting when a VM has booted successfully, injecting drivers into the Windows registry before boot, and generating vmdk files on the fly. On top of that, we work with many low-level technologies, such as hypervisors and the ZFS filesystem. This is not your average PHP webdev gig! You will report to the Sr. Director of Software Engineering.
Your job function and responsibilities include:
Collaborate with Product and Software Development teams to define the Core products' reliability strategy, including Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
Drive product reliability improvement through monitoring, alerting, and application of software development best practices
Collect SLI metrics and establish monitoring based on SLO thresholds and other product requirements
Define and configure transaction volume, traffic, performance, and error rate monitoring, including alert thresholds, capacity planning, and performance impact analysis
Participate in SRE software engineering, writing code to continually reduce human intervention in operational tasks and to automate processes
Troubleshoot complex issues quickly and effectively
Develop a balanced on-call program with appropriate staffing
Communicate with Users, Support, and Development teams in the event of an incident
Diagnose and develop root cause solutions for failures and performance issues in our production environment
Your Experience:
Bachelor's degree in Computer Science or equivalent experience
Experience with software development, automation, infrastructure as code, and data-driven analysis
Experience with configuration management tools such as Puppet, Ansible, and Salt
Hands-on experience with mainline programming and scripting languages such as Bash, Python, Perl, and Ruby
Familiarity with standard tools and platforms that enable continuous delivery, such as GitLab, Jenkins, Kubernetes, Docker, JIRA, and ServiceNow
Significant experience with virtualized and bare metal infrastructure; KVM and OpenStack experience strongly preferred
Experience with monitoring and management in public cloud is highly desired
Strong root cause analysis and troubleshooting competency
Strong tendency to automate and monitor everything
Excellent communication skills
Ability to operate in a fast-paced environment
Self-motivated and willing to learn
Ability to work independently and as part of a team
Qualifications:
Bachelor's degree in Computer Science or equivalent experience
Solid understanding of Object-Oriented Programming fundamentals
Experience with OOP languages such as Java, PHP, C#, or C++
Experience with distributed systems, hypervisors, or file systems is a plus
Experience working with automation and data-driven analysis
Strong root cause analysis and troubleshooting competency
Strong communication ability

Work in United States
Base Salary

150,000 - 200,000 USD

Skills
  • Grafana
  • SLA
  • SLO
  • SLI
  • Linux
  • Python
  • PHP
  • Java
  • C++
  • IaC

Company

Company Name

Datto, Inc

Recruiter

David Feligno

Senior Technical Recruiter

Rochester, New York, United States

Recruiter Contacts

Phone
(619) 507-8124