Job Description
<div class="content-intro"><p><strong>Be part of the team that defends the networks the world depends on</strong></p>
<p>Corelight defends the world’s most sensitive networks—from global commerce to national defense—quietly, relentlessly, and with resolve. As cyber threats grow faster and smarter, we serve as the trusted force behind network resilience, putting elite defense within reach.</p>
<p>By transforming digital footprints from physical, virtual, and cloud networks into actionable insights, we empower defenders to illuminate blind spots and stay ahead of an evolving threat landscape. Built on open-source innovations and fueled by industry-leading agentic AI technology, Corelight helps teams detect advanced threats and close cases with unprecedented clarity and precision.</p></div><p><span style="font-family: helvetica, arial, sans-serif;">As a Lead Cloud Infrastructure Engineer / Site Reliability Engineer (SRE), you will ensure the stability, performance, and security of our Federal region’s cloud platform. You’ll manage infrastructure and operations with a focus on availability, latency, performance optimization, monitoring, incident response, and capacity planning. This role requires maintaining a FedRAMP-compliant environment and working closely with teams to meet the highest standards of security and compliance.</span></p>
<p><span style="font-family: helvetica, arial, sans-serif;">We adopt an "everything as code" approach, leveraging automation and best practices to create an efficient, reliable, and scalable infrastructure. You will be instrumental in maintaining core infrastructure services that are robust, secure, and capable of processing high volumes of data seamlessly.</span></p>
<p><span style="font-family: helvetica, arial, sans-serif;"><strong>The successful candidate must be a U.S. citizen and may need to perform work that the U.S. government has specified can only be carried out by a U.S. citizen on U.S. soil.</strong></span></p>
<h3><span style="font-family: helvetica, arial, sans-serif;"><strong>Responsibilities</strong></span></h3>
<ul>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Collaborate with software engineering teams to ensure the reliability, performance, and security of the Federal region’s infrastructure.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Design, deploy, and scale AI/ML/LLM infrastructure across cloud platforms (AWS, Azure, or GCP), ensuring high reliability and performance.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Manage and optimize Kubernetes environments (EKS, AKS, GKE) for AI services, data pipelines, and model operations.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Build and automate end-to-end data and model pipelines for fine-tuning, inference, and RAG workloads using Terraform, Python, and CI/CD tooling.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Use GitOps, CI/CD pipelines, and containerization technologies (Docker, Kubernetes) to automate ML/LLM tasks across the model lifecycle.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Implement monitoring, observability, and reliability best practices using Prometheus, Grafana, ELK/EFK, Langfuse, and SLI/SLO/SLA frameworks.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Participate in 24x7 on-call rotations, leading incident response, performance tuning, and cost optimization across the SaaS platform and production workloads.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Own infrastructure end to end, leading scaling initiatives, deployments, and automation, and providing technical leadership across the team.</span></li>
</ul>
<h3><strong><span style="font-family: helvetica, arial, sans-serif;">Qualifications/Requirements:</span></strong></h3>
<ul>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Bachelor’s or Master’s degree in Computer Science, Engineering, or related field, or equivalent experience.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">8+ years in SRE, DevOps, Platform Engineering, MLOps, or Cloud Infrastructure roles.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">4+ years of production experience with Kubernetes (EKS, GKE, AKS) and containerization tools like Docker.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Strong programming skills in Python and proficiency in Bash, Go, or PowerShell.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Proficiency with Infrastructure-as-Code tools (Terraform, CloudFormation).</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Experience with Kubernetes Operators, Helm, GitOps (ArgoCD, Flux), or Service Mesh (Istio, Linkerd).</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Exposure to serverless compute (AWS Lambda, Azure Functions).</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Experience building or automating data and model pipelines for AI/ML/LLM workloads (e.g., RAG, fine-tuning, inference).</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Strong understanding of observability and monitoring using Prometheus, Grafana, ELK/EFK, Langfuse, or similar platforms.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Familiarity with SLI/SLO/SLA practices, incident response, and reliability engineering in production environments.</span></li>
</ul>
<h3><span style="font-family: helvetica, arial, sans-serif;"><strong>Preferred Qualifications (Nice to Have):</strong></span></h3>
<ul>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Cloud certifications (AWS, Azure, or GCP – e.g., Solutions Architect, DevOps Engineer).</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Experience with agentic AI frameworks (CrewAI, LangGraph, AutoGen).</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Background in hybrid or on-prem AI deployments, including OpenShift or Rancher.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Familiarity with configuration management (Ansible, Chef, Puppet).</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Contributions to open-source AI/ML, DevOps, or platform tooling.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Experience with multimodal AI, model observability platforms (RAGAS, AgentOps, Langtrace), distributed tracing, or OpenTelemetry.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Knowledge of performance tuning, cost efficiency, or capacity planning for AI/LLM infrastructure.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Understanding of security controls and FedRAMP compliance for cloud and various workloads.</span></li>
</ul>
<h3><span style="font-family: helvetica, arial, sans-serif;"><strong>Additional Requirements</strong></span></h3>
<p><span style="font-family: helvetica, arial, sans-serif;">Due to the criteria and security levels required for Corelight’s FedRAMP program, this position requires:</span></p>
<ul>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">U.S. citizenship at the time of hire.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Residence within the contiguous United States.</span></li>
<li style="font-family: helvetica, arial, sans-serif;"><span style="font-family: helvetica, arial, sans-serif;">Willingness to undergo a Single Scope Background Investigation, if required.</span></li>
</ul><div class="content-pay-transparency"><div class="pay-input"><div class="description"><p><span>Notice of Pay Transparency:<br>The compensation for this position may vary depending on factors such as your location, skills and experience. Depending on the nature and seniority of the role, a percentage of compensation may come in the form of a commission-based or discretionary bonus. Equity and additional benefits will also be awarded.</span></p></div><div class="title">Compensation Range</div><div class="pay-range"><span>$172,000</span><span class="divider">—</span><span>$219,000 USD</span></div></div></div><div class="content-conclusion"><p><strong>Why Join Us?</strong> </p>
<p>Fueled by investments from top-tier venture capital organizations such as CrowdStrike, Accel, and Insight, Corelight is one of the fastest-growing network detection and response platforms in the industry. Our passionate team thrives in a collaborative, inclusive, and geographically distributed culture. We embrace diverse perspectives, neurodiversity, curiosity, and low-ego results, fostering an environment where every innovator can solve the toughest challenges in cybersecurity and contribute their best work.</p>
<p>We are looking forward to meeting you. Check us out at <a href="https://www.corelight.com">www.corelight.com</a>.</p></div>