Site Reliability Engineer

Your new company

Our client offers solutions that range from consultancy to the co-creation of innovative offers and service operations, as well as professional and sector-based solutions.
The vision: human-digital convergence is a key factor in a company's competitiveness and in creating value. The world is adapting to the evolution of the Web. To ensure both profitability and growth, organizations must reinvent themselves to meet their strategic challenges.
By combining collaborative platforms, professional expertise, and digital and industrial capacities, our client has established itself as a trusted business partner in digital and mobile transformation for organizations.

Your new role

The Site Reliability Engineer, as part of our client's Trust & Sign SRE team, is responsible for automating the processes for building and deploying software in our own data center (DC) and on AWS, and for developing the scripts and software needed for all of the SRE team's activities. The Site Reliability Engineer will also handle operations related to monitoring SLA-critical production platforms, resolving issues, and performing manual interventions. All of these activities are carried out in close cooperation with the software development teams.

Key Responsibilities / Main Activities:
• Deploy platforms on public or private cloud environments and work closely with the Development team to prepare operations.
• Harden and automate platforms before they go live by reviewing their design and implementation, tuning configuration as well as developing auxiliary tools and necessary monitoring of critical health indicators.
• Maintain platforms after go-live by measuring and monitoring their availability, performance and overall system health.
• Recover platforms during production incidents to meet the targeted SLOs; perform detailed root cause analysis to prevent regressions.
• Provide technical expertise on Trust & Sign products and support processes to internal and external customers, including defining SLIs/SLOs acceptable to all involved parties.
• Provide technical and first-level business support to Trust & Sign customers on an 8/5 basis. Ensure that each product has all of the required O&M functionalities.
• Validate readiness and maturity of new rollouts through development, execution and verification of automated smoke test suites.
• Understand, follow and improve upon all formally communicated methodologies, processes, policies and values, always focusing on delivering consistent, reliable, repeatable, scalable and quality outcomes.
• Provide support and encouragement to other team members and participate in the up-skilling and training of colleagues and new staff.
• Analyze failures and ensure operational recovery within the agreed SLA through standard procedures or ad hoc workarounds.
• Work on the continuous improvement process by analyzing recurrent incidents and designing long-term solutions.
• Help onboard new customers to existing services and assist them during their technical onboarding.
• Participate in on-call duties.

What you'll need to succeed
Typical education: Engineering degree
Experience: 5+ years
Technologies needed:
• Unix/Linux, Kubernetes, Docker, AWS Services, Bash scripting and automation, Java 11+, Git, Terraform, Jenkins, Grafana, Prometheus
• Basic understanding of networking topology and components of distributed web applications
• Basic understanding of SQL database design and operations; SQL syntax
• Understanding of commercial software development, testing and deployment processes

What you need to do now
If you're interested in this role, click 'apply now' to forward an up-to-date copy of your CV, or call us now.
If this job isn't quite right for you but you are looking for a new position, please contact us for a confidential discussion on your career. #1170035

Job Type
Technology & Internet Services

Talk to a consultant

Talk to Ruxandra Stanciu, the specialist consultant managing this position, located in Hays Bucharest
Premium Plaza, 63-69 Dr. Iacob Felix Street, 7th floor

Telephone: +40 723163701
