Job Details

Senior Systems Reliability Engineer (SRE) - Software Infrastructure

Job Info:

Category: Development, Infrastructure
Company Description: Leading global provider of data, news and analytics
Salary: Highly Competitive, Depending on Experience
Position Type: Permanent
Job Number: 8537

Job Description:


Our client's systems are fast and reliable, and this is the team that makes that possible:

We build middleware - the software infrastructure designed for creating large-scale, fault-tolerant applications that run on thousands of machines throughout the world. We're two dozen C++ programmers building a complex infrastructure using a variety of programming paradigms such as RPC, publish/subscribe and message queues. With thousands of clients depending on our infrastructure solutions, we are looking to grow our SRE team. That's where you come in.


What's in it for you:

As a Systems Reliability Engineer (SRE) working on this critical infrastructure, you'll focus on automating everything from build and deployment to outage response and remediation. You will work on all aspects of this end-to-end system to support the API.


We'll trust you to:

  • Take responsibility for deployment after Beta for our messaging and multicast services
  • Ensure level 1 support for production issues
  • Automate everything from outage response to quality checks for new builds
  • Provide feedback to developers to make this infrastructure increasingly resilient

You'll need to have:

  • Proven experience as a software engineer or developer working on high availability, large-scale distributed applications
  • Excellent programming skills. You don't need to be a rock star C++ programmer, but you need to know at least a little bit about C++, and you do need to be a great programmer in other programming languages such as Python, Ruby, Perl, Scala or JavaScript.
  • A strong understanding of the UNIX/Linux command line
  • A passion for performance excellence and an engineering mindset
  • Previous experience with data, statistics and latency numbers
  • A Bachelor's degree in Computer Science or equivalent experience

We'd love to see:

  • Strong leadership skills
  • Prior experience as a systems performance or site/systems reliability engineer
  • Extensive experience working with fault-tolerant approaches in a large-scale distributed environment with high performance systems
  • A deep understanding of Internet and networking protocols, including IP multicast (PGM)
  • Knowledge of how to analyze network, performance and application issues using standard tools (tcpdump or Wireshark)
  • 2+ years of Chef, Puppet or Ansible system configuration experience (error handling, idempotency, configuration management)
  • Experience with virtualization and Infrastructure as a Service models
  • The ability to handle periodic on-call duties as well as out-of-band requests



All qualified candidates are encouraged to apply by submitting their resume as an MS Word document, including a cover letter with a summary of relevant qualifications that clearly highlights any special or relevant experience.

Please Note: All inquiries will be treated with the utmost confidentiality. Your resume will not be submitted to any client company without your prior knowledge and consent.

Contact Recruiter
Senior Technical Recruiter
Andiamo Partners | 90 Broad Street, Suite 1501, New York, NY 10004
