5 Weeks ago
A Site Reliability Engineer (SRE) plays a pivotal role in ensuring that an organization's IT services and infrastructure are highly available, scalable, and efficient. This position often involves a blend of development, operations, and troubleshooting tasks.
System Reliability and Availability: Ensure high availability and reliability of services and infrastructure. This includes proactive monitoring, incident response, and post-mortem analysis to prevent recurrence of incidents.
Performance Management: Monitor and optimize system performance to meet service level objectives (SLOs) and service level agreements (SLAs). This involves understanding and managing the capacity and scalability of services.
Incident Management and Response: Lead the response to system outages and performance issues, including on-call duties. Develop automation tools that speed up incident resolution and help prevent recurrence.
Automation and Tooling: Design and implement automation tools and frameworks to reduce manual operational work. This could include scripts for deployment, monitoring, and infrastructure management.
Cross-functional Collaboration: Work closely with development teams to design and implement scalable, reliable, and efficient systems. This involves providing input on architectural decisions, optimizing resource utilization, and ensuring system resilience.
Continuous Improvement: Regularly review current processes and systems for improvement opportunities, and implement best practices for system reliability and availability.
Disaster Recovery and Backup: Develop and maintain disaster recovery plans, including regular testing to ensure system resilience.
Documentation: Maintain detailed documentation of the system architecture, configurations, processes, and service records so that knowledge is shared and accessible within the team.
Requirements / Skills
Education: A bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
Experience: Proven experience in a site reliability engineering role or similar, with a strong background in software development and system administration.
Technical Skills:
- Proficiency in one or more programming languages.
- Experience with cloud services and with containerization and orchestration tools (Docker, Kubernetes).
- Strong understanding of networking principles and protocols.
- Experience with continuous integration and continuous deployment (CI/CD) practices.
Problem-Solving Skills: Ability to troubleshoot and resolve complex technical issues under pressure.
Communication Skills: Excellent verbal and written communication skills, with the ability to explain technical concepts clearly to non-technical stakeholders.
Teamwork: Ability to work collaboratively in a cross-functional team and interact effectively with developers, operations teams, and management.
Job Benefits
Loans
Health insurance
Game room
Snacks
Breakfast
Lunch
Occasional packages and gifts
Learning stipends
Resting space