Site Reliability Engineer

Employment Type: Full-Time

Industry: Engineering



About Us

Venmo was founded on the principles of breaking down the intimidating barriers around financial transactions to make them intuitive, friendly, and even fun. And it worked: people love sending money with Venmo, and we’re growing by leaps and bounds!

But we’re only just getting started. We want to take that magic of sending money with Venmo and cascade it into every place where people use money. That means connecting people to their money in the most intuitive and fun way possible, then connecting people with each other. Users already love Venmo, but we know there are lots of things we haven’t thought of to make the experience of using Venmo even more delightful and valuable. All that’s going to take a lot of figuring out. Let’s figure it out together!

Engineering at Venmo

At Venmo, we are creating a product that people love. We strive to create a delightful user experience while connecting the world and empowering people through payments. We are looking for intellectually curious people who want to be inspired and inspire others to change the world.

Engineering is a craft, and at Venmo we want the internals of our software to be as elegant as the end-user experience we are designing. We spend our days scaling our infrastructure and building new features to meet and exceed our users’ needs and wants. We teach and learn from one another, and push each other to be at our creative and analytical best.

What You’ll Do

  • Engage in and improve the whole lifecycle of our products—from ideation and design, through development, launch, operation and iteration.
  • Partner with product engineering teams throughout the PDLC on design, development, capacity planning, and ramp plans to ensure Venmo continues to scale and maximizes availability.
  • Ensure sufficient logging, monitoring and alerting strategies around availability, latency and overall system health.
  • Scale systems sustainably through automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Host incident reviews and blameless postmortems.
  • Continuously improve Incident Management policies, procedures, tools, and implementation.
What We’re Looking For

  • BS degree in Computer Science or related technical field involving systems engineering (e.g., physics or mathematics), or equivalent practical experience.
  • Software development background with the ability to analyze and improve an existing codebase.
  • Cloud-based architecture experience (ideally AWS).
  • Established ability to diagnose technical problems, debug code, and automate routine tasks.
  • Ability to support an always-available, production-grade service, 24/7/365.
  • Experience in one or more of the following: Java, Python, Golang, or shell scripting.
  • Experience with Unix/Linux operating systems internals and administration.
  • Ability to debug and optimize code and automate redundant tasks.
  • Great analytical and problem-solving skills.
  • Familiarity with orchestration tools (Ansible, Puppet, Chef, Terraform, etc.).
  • Established experience with monitoring/logging tools and best practices.
Preferred Qualifications

  • Proficiency in managing large-scale, cloud-based infrastructure.
  • Expertise in designing and troubleshooting large-scale distributed systems.
  • Strong communicator, both written and spoken.
  • Kubernetes and container experience.
We know the and can get in the way of meeting spectacular candidates. Please don't hesitate to apply.

