Our client offers solutions that range from consultancy to the co-creation of innovative offers and service operations, as well as professional and sector-based solutions.
The vision: human-digital convergence is a key driver of a company's competitiveness and its ability to create value. As the world evolves alongside the Web, organizations must reinvent themselves to meet their strategic challenges and secure both profitability and growth.
By combining collaborative platforms, professional expertise, and digital and industrial capabilities, our client has established itself as a trusted business partner for organizations undergoing digital and mobile transformation.
Your new role
The Site Reliability Engineer, as part of our client's Trust & Sign SRE team, is responsible for automating the processes used to build and deploy software in our own data center (DC) and on AWS, and for developing the scripts and software needed across all SRE team activities. The SRE Engineer will also handle operations related to monitoring SLA-critical production platforms, resolving issues and performing manual interventions. All of these activities will be carried out in close cooperation with the software development teams.
Key Responsibilities / Main Activities:
• Deploy platforms on public or private cloud environments and work closely with the Development team to prepare operations.
• Harden and automate platforms before they go live by reviewing their design and implementation, tuning their configuration, developing auxiliary tools and setting up monitoring of critical health indicators.
• Maintain platforms after go-live by measuring and monitoring their availability, performance and overall system health.
• Recover platforms during production incidents to meet the targeted SLOs; perform detailed root cause analysis to prevent recurrence.
• Provide technical expertise on Trust & Sign products and support processes to internal and external customers, including defining SLIs/SLOs acceptable to all parties involved.
• Provide technical and first-level business support to Trust & Sign customers on an 8/5 basis. Ensure that each product provides all the required O&M (operations and maintenance) functionality.
• Validate readiness and maturity of new rollouts through development, execution and verification of automated smoke test suites.
• Understand, follow and improve upon all formally communicated methodologies, processes, policies and values, always focusing on delivering consistent, reliable, repeatable, scalable, high-quality outcomes.
• Provide support and encouragement to other team members and participate in the up-skilling and training of colleagues and new staff.
• Analyze failures and ensure operational recovery within the agreed SLA through standard procedures or ad-hoc workarounds.
• Drive continuous improvement by analyzing recurring incidents and designing long-term solutions.
• Participate in onboarding new customers onto existing services and assist them during their technical onboarding.
• Take part in on-call duties.
What you'll need to succeed
Typical education: Engineering degree
Experience: 5+ years
Technologies needed:
• Unix/Linux, Kubernetes, Docker, AWS services, Bash scripting and automation, Java 11+, Git, Terraform, Jenkins, Grafana, Prometheus
• Basic understanding of networking topology and components of distributed web applications
• Basic understanding of SQL database design and operations, and of SQL syntax
• Understanding of commercial software development, testing and deployment processes
What you need to do now
If you're interested in this role, click 'apply now' to forward an up-to-date copy of your CV, or call us now.
If this job isn't quite right for you but you are looking for a new position, please contact us for a confidential discussion on your career. #1170035
Technology & Internet Services
Talk to a consultant
Talk to Ruxandra Stanciu, the specialist consultant managing this position, located at Hays Bucharest:
Premium Plaza, 63-69 Dr. Iacob Felix Street, 7th floor