Our Company
Changing the world through digital experiences is what Adobe’s all about. We give everyone, from emerging artists to global brands, everything they need to design and deliver exceptional digital experiences! We’re passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen. We’re on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity. We realize that new ideas can come from everywhere in the organization, and we know the next big idea could be yours!
The Opportunity
Adobe's Reliability Engineering team is looking for an experienced Database Reliability Engineer who is passionate about SRE principles to join our Database Services team.
As part of the Reliability Engineering organization, the Database Services team complements the SRE team by providing database operational support and architectural guidance to cloud-native teams that are building customer-facing services. Our focus is on designing, testing, building, and operating datastores that meet demanding scale and reliability requirements.
We support over 25 Adobe products across different regions in public clouds, using a mix of open-source database technologies: Cassandra, MongoDB, MySQL, and Postgres.
What You'll Do
Work on reliability and performance aspects for core database infrastructure pieces that allow Adobe products to scale
Ensure the highest level of uptime and Quality of Service (QoS) for Adobe’s mission-critical database environments through operational excellence
Implement solutions for automating the deployment, provisioning, and management of large-scale database environments
Use domain expertise to consult with application development teams, providing guidance and database architecture solutions
Improve observability by implementing smart monitoring, tracing, and logging
Participate in a cross-regional on-call rotation using a follow-the-sun model
Act as the main point of contact for production incidents: perform root cause analysis, identify and resolve underlying problem patterns, and work towards developing automated, self-healing solutions
Work in a dynamic, fast-paced environment with distributed teams and inter-dependent services
What You Need to Succeed
Bachelor's degree in Computer Science or related technical field
At least 3 years of relevant production experience supporting at-scale, highly available, mission-critical environments running at least one of the following open-source database management systems:
Experience working with hyperscale cloud providers (AWS, Azure, and/or GCP) and running at-scale database environments in virtual computing environments (Amazon EC2, Azure VM, etc.)
Experience with infrastructure automation and configuration management tools such as Chef, Ansible, Puppet, Terraform
Deep understanding of cluster management areas, such as scaling, consistency tuning, replication, and multi-datacenter configuration
Experience with security, monitoring, capacity planning, foolproof disaster recovery (DR), and backup and recovery for distributed database systems
Strong understanding of high availability strategies, horizontal partitioning, and clustering
Experience in performance monitoring and storage performance optimization, tuning database server configurations, queries, and indexes
Strong data modeling and data structure design skills
Good understanding of Linux OS concepts and of Linux and Unix shells
Experience with monitoring software such as Prometheus, Grafana, New Relic
Proficiency in at least one scripting language (e.g., Python, PHP, Perl, Ruby)
Intellectual curiosity to pursue the unknown and to learn continuously
Good social skills and desire to work in a dynamic and fast-paced environment
Able to work independently with minimal need for supervision