Sr Technical Lead (Support & Operations)
HCLTech
Job Summary
- Cluster Operations & Maintenance: Perform routine patching, configuration updates, and resource management for vanilla Kubernetes clusters. Execute and monitor backup and recovery tasks using Velero.
- Configuration & Delivery: Utilize and maintain application deployment configurations using tools like Helm and Kustomize. Operate and troubleshoot the continuous delivery pipeline managed by Flux (GitOps).
- Open Source Component Lifecycle (Execution): Execute planned upgrades and maintenance for core open-source components (e.g., Calico, MetalLB, Prometheus, Grafana, ELK) as directed by L3 strategy.
- Incident Response & Debugging: Serve as the primary escalation point for complex platform incidents. Expertly troubleshoot issues across Calico networking, MetalLB load balancing, Nginx, and the containerd runtime.
- Automation Execution: Execute, test, and maintain existing infrastructure-as-code. Expert-level proficiency in Ansible, Shell Scripting, and Python is mandatory for reliable deployment and management execution.
- Security Implementation: Apply and manage security policies defined by L3. Execute access changes within Dex, perform secret rotations via HashiCorp Vault, and apply pre-approved Gatekeeper, OPA, and PSP rulesets.
- Observability Management: Manage and refine alert conditions in Prometheus/Grafana. Perform advanced log analysis and correlation using the ELK stack.
- Infrastructure Support: Provide operational support for the underlying stack, including Dell physical servers, Ubuntu OS, and basic KVM virtualization. Utilize Docker commands proficiently for container debugging.
- Storage Maintenance: Execute tasks related to CSI volumes and persistent storage, ensuring operational integrity with Dell Isilon and Infinidat integrations.
- Development Literacy: Possess a basic understanding of Go language development for reading, debugging, and reviewing platform utility scripts.