VMware

Senior Site Reliability Engineer - Opportunity for Working Remotely (BB-47453)

Found on: Neuvoo CR

Description:

Job Description

The Cloud Services Business Unit delivers the full VMware portfolio of enterprise capabilities as an integrated set of cloud services, enabling consistent infrastructure and operations across every major public cloud or service provider environment.

Our team enables Cloud Providers across the globe to consume VMware products. By offering a wide range of VMware-based cloud services on a geographical basis, Providers can offer cloud services that quickly and seamlessly extend their customers' data centers into the cloud using the same VMware products and tools they already use on premises.

Role: As a Senior Member of Technical Staff, Site Reliability, you will collaborate closely with product development teams on the management and deployment of multiple SaaS offerings. You have a background running large-scale applications in public clouds (AWS, GCP, Azure) deployed on Kubernetes. You are excited about helping teams succeed in building reliable, self-healing services.

Responsibilities:
  • Participate in architectural reviews with reliability and resiliency in mind.
  • Recommend preventive and corrective actions for incidents.
  • Collaborate with teams on improving deployment automation and the resiliency and security of our cloud products. You're intimately familiar with CI/CD tools and methodologies and know how to get the most out of them.
  • Work comfortably with development teams to address reliability and scale concerns across the stack. You're just as much dev as ops and flourish working in an Agile model.
  • Help teams improve the observability of their services through application and infrastructure instrumentation. Monitoring, alerting, metrics, and deep introspection of applications are a must and an area you're passionate about.
  • Troubleshoot complex operational issues within a microservices-based architecture.
  • Develop tooling to enhance development and troubleshooting efficiency
  • Participate in the on-call rotation to keep availability within the SLA.
Requirements:
  • 5-8 years of SRE/DevOps experience working on highly scalable distributed systems
  • Experience with metric and log aggregation tools (Prometheus, ELK, etc.)
  • Experience with monitoring tools such as Grafana or Wavefront
  • Experience working with Terraform, Ansible, or Helm
  • Knowledge of relational and non-relational databases, networking, Linux internals, filesystems, web architecture, and CI/CD principles
  • Scripting experience in any programming language
  • A solid understanding of cloud-based architectures and concepts, with hands-on experience using Public Clouds and Kubernetes
  • Experience using Git
  • Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
  • A "team-player attitude": rather than celebrating heroic efforts to resolve an incident, you prefer engineering practices that prevent incidents in the first place

  • Category: Engineering and Technology
    Subcategory: Site Reliability
    Experience: Manager and Professional
    Full Time/ Part Time: Full Time
    Posted Date: 2021-04-05



    Location: Heredia, Costa Rica
