DevOps Engineer
Position Overview
Tekmetric is seeking a skilled DevOps Engineer with 5+ years of experience to join our team. You will be responsible for designing, implementing, and maintaining scalable, reliable, and secure cloud infrastructure. This role involves automating processes, ensuring high availability, and collaborating with various teams to enhance application reliability and scalability.
About Tekmetric
Tekmetric is an all-in-one, cloud-based platform designed to help auto repair shops operate more efficiently, grow their businesses, and improve customer service. Founded in 2017, Tekmetric has become an industry leader by focusing on innovation and real-world experience, with a commitment to transparency, integrity, and a service-first mindset. We are building a movement to empower repair shops and shape the future of auto repair.
Responsibilities
- Design and implement scalable infrastructure: Architect and maintain reliable, scalable, and secure cloud infrastructure that supports positive user experiences and measurable business growth.
- Monitor and optimize system performance: Develop and maintain monitoring, alerting, and incident response practices to ensure system reliability and performance at scale.
- Automate everything: Create automated pipelines for deployment, testing, and infrastructure management to improve speed, consistency, and reliability across the organization.
- Ensure high availability and disaster recovery: Implement and manage solutions for backup, disaster recovery, and failover processes to ensure business continuity.
- Security and compliance: Apply best practices in security, monitoring, and compliance, ensuring that systems meet necessary requirements and regulations.
- Collaboration: Work cross-functionally with development, data, product, and QA teams to improve application reliability and scalability.
- Leadership and mentorship: Provide technical leadership, mentorship, and guidance to junior DevOps team members, fostering a culture of continuous learning and improvement.
Requirements
- Experience: 5+ years of experience in DevOps, Site Reliability Engineering (SRE), or a related field.
- Cloud Environments: Deep knowledge of cloud environments, preferably AWS or GCP.
- Cloud Infrastructure: Hands-on experience with AWS (or similar cloud providers) and infrastructure as code (e.g., Terraform).
- Automation: Strong experience with automation tools.
- Containerization: Expertise in working with containerized environments like Docker and orchestration tools such as Kubernetes.
- Monitoring and Logging: Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack).
- Scripting: Proficiency in scripting languages like Python, Bash, or similar.
- CI/CD pipelines: Experience designing and optimizing Continuous Integration and Continuous Deployment (CI/CD) pipelines.
- Collaboration and Communication: Strong communication skills and ability to work cross-functionally, solving complex technical challenges in a collaborative manner.
- Problem-solving mindset: Ability to troubleshoot and resolve critical issues in high-pressure environments, maintaining composure and professionalism.
Bonus Points
- Experience with Infrastructure as Code tools like Terraform.
- Familiarity with monitoring tools like Prometheus, Grafana, or the ELK stack.
- Exposure to compliance and security best practices in cloud environments.
- Experience coding in one or more programming languages such as Go, Java, or JavaScript.
Who You Are
Successful candidates will also demonstrate characteristics aligned with our core values, including a passion for building and improving.
Company Information
- Company: Tekmetric
- Founded: 2017
- Location: Houston (officially)
- Values: Transparency, integrity, innovation, service-first mindset.