5 Weeks ago

Hiring SRE Expert

Outage follow-up

Type of cooperation
Education level: Bachelor's Degree
Gender: No preference

Job Description / Tasks

A Site Reliability Engineer (SRE) plays a pivotal role in ensuring that an organization's IT services and infrastructure are highly available, scalable, and efficient. This position often involves a blend of development, operations, and troubleshooting tasks.


System Reliability and Availability: Ensure high availability and reliability of services and infrastructure. This includes proactive monitoring, incident response, and post-mortem analysis to prevent recurrence of incidents.


Performance Management: Monitor and optimize system performance to meet the service level objectives (SLOs) and service level agreements (SLAs). This involves understanding and managing the capacity and scalability of services.


Incident Management and Response: Lead the response to system outages and performance issues, including on-call duties. Develop automation tools that speed up incident resolution and prevent recurrence.


Automation and Tooling: Design and implement automation tools and frameworks to reduce manual operational work. This could include scripts for deployment, monitoring, and infrastructure management.


Cross-functional Collaboration: Work closely with development teams to design and implement scalable, reliable, and efficient systems. This involves providing input on architectural decisions, optimizing resource utilization, and ensuring system resilience.


Continuous Improvement: Continuously analyze current processes and systems for improvement opportunities. Implement best practices for system reliability and availability.


Disaster Recovery and Backup: Develop and maintain disaster recovery plans, including regular testing to ensure system resilience.


Documentation: Maintain detailed documentation of the system architecture, configurations, processes, and service records so that knowledge is shared and accessible within the team.

Requirements / Skills


Education: A bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.


Experience: Proven experience in a site reliability engineering role or similar, with a strong background in software development and system administration.


Technical Skills:


- Proficiency in one or more programming languages.


- Experience with cloud services, containerization, and container orchestration tools (Docker, Kubernetes).


- Strong understanding of networking principles and protocols.


- Experience with continuous integration and deployment (CI/CD) practices.


Problem-Solving Skills: Ability to troubleshoot and resolve complex technical issues under pressure.


Communication Skills: Excellent verbal and written communication skills, with the ability to effectively communicate technical concepts to non-technical stakeholders.


Teamwork: Ability to work collaboratively in a cross-functional team and interact effectively with developers, operations teams, and management.

Job Benefits


Loans


Health insurance


Game room


Snacks


Breakfast


Lunch


Occasional packages and gifts


Learning stipends


Resting space

About Agah Brokerage (کارگزاری آگاه)

  • Agah Financial Group first entered the country's capital market in 1384 SH (2005/06) with the founding of its brokerage. In the years since, all of our effort has gone into offering the best services and products to capital market customers. Today, by putting customer satisfaction first and backed by your trust, Agah is no longer just a brokerage... Agah Group is where the capital market, from zero to infinity, comes together.
