Platform - SRE - Reliability Engineering - Java Developer - London

    Location(s): UK-London
    Schedule Type: Full Time
    Vice President/Executive Director



    At Goldman Sachs, our Engineers don’t just make things – we make things possible.  Change the world by connecting people and capital with ideas.  Solve the most challenging and pressing engineering problems for our clients.  Join our engineering teams that build massively scalable software and systems, architect low latency infrastructure solutions, proactively guard against cyber threats, and leverage machine learning alongside financial engineering to continuously turn data into action.  Create new businesses, transform finance, and explore a world of opportunity at the speed of markets.


    Engineering, which comprises our Technology Division and global strategists groups, is at the critical center of our business, and our dynamic environment requires innovative strategic thinking and immediate, real solutions.  Want to push the limits of digital possibilities?  Start here.




    Goldman Sachs Engineers are innovators and problem-solvers, building solutions in risk management, big data, mobile and more. We look for creative collaborators who evolve, adapt to change and thrive in a fast-paced global environment.



    Our team of engineers builds solutions to the most complex problems. We develop cutting-edge systems and processes that form the core of our key business and enable transactions to move in milliseconds. We provide real-time access to critical deal information and crunch billions of data points each day to inform firm-wide market insights and strategies. Team members have the opportunity to work at the forefront of technology innovation alongside industry leaders and make significant contributions to the field.


    Deployment and Runtime is responsible for Goldman Sachs' mission-critical production platforms, including:

    • Private and public cloud computing platforms
    • Distributed scheduling 
    • Entitlements platform 
    • Security and critical infrastructure services 
    • Linux and Windows engineering 
    • Core data centre platforms including network and storage 

    You will be part of a diverse global technical team that focuses on critical business problems and works with multiple technology engineering teams. We are responsible for a critical client-facing function and aim to innovate and drive technology solutions that will impact the firm's bottom line.


    Deployment and Runtime System Reliability Engineers (SREs) apply software engineering principles to production network management, focusing on availability, latency, performance, efficiency, change management, capacity planning and monitoring. SREs drive automation by building and adopting tools and practices that standardize Technology's runtime environment.

    Our rapidly evolving environment is enabled by its massive underlying networking infrastructure. Key to a stable network is enhancing our ability to use a broader range of testing, reporting and monitoring. The ultimate goal is to take the human element out of network issue detection, troubleshooting and remediation: we aim to detect network interruptions and then automatically mitigate or repair them within seconds.

    This is achieved through sophisticated engineering: autonomic and machine learning techniques that help us understand, statistically and inductively, the behaviour of these complex networked systems.

    An individual in this role is responsible for the design, development, deployment and support of products and platforms that leverage Java-based technologies and enable large-scale event processing in Deployment and Runtime engineering products. The individual will engage in both server-side and front-end development as required to achieve the desired outcomes.

    Responsibilities

    • Develop software and systems architectural frameworks and tooling 
    • Maintain network services by measuring and monitoring availability, latency and overall system health
    • Create and sustain scalable systems and services through automation and uplifts 
    • Expand and enhance the firm's network analytics offering (a Java-, Elasticsearch- and Kibana-based platform), driving performance, reliability and accessibility to new levels
    • Analyze data at scale to present novel insights into trends and outliers, building dashboards to present key metrics and tools to perform deep dives 
    • Work closely with adjacent teams in Deployment and Runtime, collaboratively building data models and providing guidance on implementation

    Basic qualifications

    • 5 or more years of experience in a relevant role (DevOps, Reliability Engineering, Application Programming, etc.)
    • Strong knowledge of Java, including the Collections Framework and concurrent programming
    • Experience with the Spring Framework, multithreading and synchronisation
    • Demonstrated experience developing REST services and service-oriented architectures (SOA)
    • Demonstrated experience developing and maintaining distributed, scalable, highly available systems 
    • Experience writing automated tests with tools such as JUnit and Mockito; experience with scenario-testing tools such as FitNesse and with test-driven development (TDD) methodologies would also be beneficial
    • Experience with debugging, troubleshooting, code optimization and issue resolution 
    • Thorough knowledge of, and experience in, all phases of the software development life cycle (SDLC)
    • Experience with distributed compute systems 
    • In-depth knowledge of at least one of the Linux or Windows platforms
    • Broad knowledge of other areas of technology (networking, hardware, etc.)
    • Must be able to handle multiple ongoing assignments and to work independently, as well as contribute as part of a highly collaborative and globally dispersed team
    • Strong analytical skills with the ability to break down and communicate complex issues, ideas and solutions
    • Strong interpersonal skills, including good client-facing skills and excellent oral and written communication

    Preferred qualifications

    • Working knowledge of Maven, SVN, NPM, Gulp 
    • Fundamental understanding of networking and authentication protocols 
    • Understanding of load-balancing/high-availability solutions 
    • Knowledge of at least one other programming language (e.g. Python or Perl) beyond basic scripting is a key advantage
    • Good knowledge of at least one database product (such as MSQL, PostgreSQL or DB2); knowledge of NoSQL products such as MongoDB would be an advantage


    The Goldman Sachs Group, Inc. is a leading global investment banking, securities and investment management firm that provides a wide range of financial services to a substantial and diversified client base that includes corporations, financial institutions, governments and individuals. Founded in 1869, the firm is headquartered in New York and maintains offices in all major financial centers around the world.

    © The Goldman Sachs Group, Inc., 2018. All rights reserved. Goldman Sachs is an equal employment/affirmative action employer Female/Minority/Disability/Vet.