Service Reliability Engineer (Remote)

SquarePeg Hires

This is a full-time position in Westminster, CO, posted April 2, 2021.

For a large financial services client, we are seeking Service Reliability Engineers (SRE) with expertise in AWS cloud and DevOps.

In this role, you will support the entire development lifecycle to incorporate service reliability best practices and reduce downtime.

You will begin as a contract employee supporting our client, then convert to a full-time role directly with the client at some point within the first year.

Responsibilities

Independently determine the needs of the customer while identifying and resolving conflicting or complementary needs across customer groups.
Applying advanced skill, knowledge, and experience, design and develop software solutions to meet customer needs.
Use a process-driven approach to lead design solutions.
Implement new software technology and coordinate simultaneous implementation tasks across teams.
May maintain or oversee the maintenance of existing software.

Requirements

4+ years of relevant professional experience as a full-stack developer or SRE.
Work with application stakeholders to define non-functional requirements (NFRs) covering performance, scalability, availability, resiliency, and reliability, including Service Level Objectives, Service Level Indicators, and Error Budgets.
Develop strategies to address non-functional requirements throughout the software or product development life cycle.
Work with architecture and development teams to create performant, highly resilient, and reliable architecture and design using performance engineering and chaos engineering principles.
Work with architecture and development teams to implement resiliency constructs, build fault tolerance, and develop optimal code.
Develop tools and utilities to automate manual operational tasks in production.
Take responsibility for incidents related to NFRs: update SOPs to capture the right set of metrics and logs for root cause analysis (RCA), perform root cause analysis of incidents, identify solutions, and ensure permanent closure of the incidents.
Analyze production utilization and incident patterns, identify improvement areas, and implement automation to improve productivity and avoid manual tasks and recurring incidents.
Excellent verbal and written communication skills, with experience presenting information and ideas to an audience in a way that is engaging and easy to understand.
Experience collaborating cross-functionally on availability and performance issues to identify root cause, determine areas for improvement, and drive those actions to closure through effective solutions.
Extensive knowledge of principles, advanced techniques, and theories to suggest and implement solutions on a specific project, program, or product.
Influencing skills, including negotiation, persuasion, meeting facilitation, and conflict resolution.
Adept at managing project plans, resources, and people to ensure successful project completion in an Agile/Scrum environment, facilitating the design and development of performance engineering and resiliency methodologies through collaboration with engineering and product teams to implement shift-left techniques in test design and automation.
Experience mentoring teams in writing performance and chaos engineering strategies and scripts, with a strong emphasis on automated deployment, infrastructure automation solutions, and continuous integration and delivery processes.
Skilled as a full-stack developer with a focus on cross-platform optimization and responsiveness of applications.
Strong understanding and knowledge of Java/J2EE technologies and frameworks: UI/JavaScript frameworks, Spring Boot/Spring Cloud frameworks, REST, microservices, and server-side frameworks.
Experience working with one of the major cloud technologies (AWS, GCP, or Azure).
Knowledge of cloud technologies and containerization using Docker and Kubernetes.
Excellent understanding of, and demonstrated experience with, DevOps/CI/CD tools such as Jenkins, Jules, and automated deployment tools.
Working knowledge of a Unix operating system.
Automation experience with Blue Prism, Selenium, or Ansible playbooks, and with programming languages such as Java, Perl, Python, or PowerShell.
Knowledge of performance tuning for enterprise-level Java/J2EE applications (web and application server configuration, JVM parameter tuning, GC and heap size, message broker).
Experience implementing resiliency design patterns using Hystrix, Resilience4j, service mesh, or similar frameworks, and validating them with Chaos Monkey-type frameworks.
Experience with performance engineering tools: monitoring tools, performance testing tools, and analysis tools.
Experience troubleshooting performance, scalability, and availability issues in production environments.
Skilled in cloud technologies and cloud computing, including Amazon Web Services (AWS) offerings, development, and networking platforms.
Experience defining, measuring, and improving reliability metrics (SLO/SLI), observability (monitoring, logging, and tracing solutions), operations processes (incident and problem management), and operations toil reduction through automation.
Experience designing, building, and implementing dashboards from application and infrastructure health perspectives, using tools such as Splunk, Dynatrace, Datadog, etc., to provide a single-pane view of all critical business and operational information to relevant stakeholders.

