Senior Site Reliability Engineer

Full-Time

Worldwide

Please let Casper Labs know you found this job on Remote3. It helps us get more jobs on our site. Thanks & All the best!

Posted on: September 11, 2024

The Senior Site Reliability Engineer will be responsible for ensuring the reliability, performance, and scalability of our blockchain platform. The ideal candidate will have extensive experience with Kubernetes, CI/CD, Terraform, and public cloud providers (AWS & IBM). This role involves collaborating with engineering teams, implementing robust infrastructure solutions, and driving continuous improvement in our operations.

Responsibilities

Guidance and Mentorship: Provide technical guidance and mentorship to engineers, fostering a culture of learning and collaboration.
Decision Making: Assist stakeholders in making informed technical decisions that align with best practices and business goals.
Knowledge Sharing: Actively share knowledge and expertise with team members to enhance overall team capability.
Kubernetes: Manage and optimize Kubernetes clusters to ensure high availability, performance, and scalability.
CI/CD Pipelines: Design, implement, and maintain continuous integration and continuous deployment pipelines to streamline development and deployment processes.
Terraform: Utilize Terraform for infrastructure as code, ensuring consistent and repeatable infrastructure deployments.
AWS & IBM: Leverage public cloud services from AWS and IBM to build and maintain scalable and resilient infrastructure solutions.
Monitoring and Optimization: Implement monitoring and alerting systems to proactively manage and optimize cloud infrastructure.
Incident Management: Lead incident response efforts to quickly diagnose and resolve reliability and performance issues.
Continuous Improvement: Identify areas for improvement in infrastructure and operations, implementing solutions to enhance reliability and efficiency.
Security: Ensure infrastructure security best practices are followed and proactively address potential vulnerabilities.
Cross-functional Collaboration: Work closely with development, product, and operations teams to ensure alignment and effective communication.
Stakeholder Engagement: Engage with stakeholders to understand their needs and translate them into technical requirements and solutions.
Documentation: Maintain comprehensive documentation of infrastructure, processes, and procedures to ensure knowledge transfer and operational continuity.

Requirements

Experience: 5-15 years of experience in site reliability engineering, DevOps, or a related field.
Technical Skills: Proficiency in Kubernetes, CI/CD, Terraform, and public cloud providers (AWS & IBM).
Soft Skills: Strong communication skills, with a low-ego, approachable demeanor.
Problem-Solving: Excellent problem-solving skills and the ability to work independently as a self-starter.
Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

Benefits

Fully remote, work-from-home environment
Flexible working hours
Paid time off
Periodic in-person offsites globally (travel permitting)
Long-term incentive programs
Continued education support
Advancement opportunities
