Created at: January 07, 2026 00:13
Company: Federal Student Aid
Location: Atlanta, GA, 30301
Job Description:
These positions are located in the Federal Student Aid (FSA) Chief Technology Office, Technology Operations Division. FSA is modernizing the systems that serve over 17 million students and power more than $120 billion in financial aid each year. We are building a team of senior platform and reliability engineers to strengthen the technical foundation of one of the federal government's highest-impact digital ecosystems.
Selective Placement Factor
You must meet the following selective placement factor: at the time of hire and acceptance of the position, candidates must possess any combination of the following certifications from a recognized professional organization:
- Information Technology Infrastructure Library v4 (ITIL 4)
- Project Management Professional (PMP)
- AWS Certified Advanced Networking
- Certified Information Systems Security Professional (CISSP)
- Certified Cloud Security Professional (CCSP)
- F5 Networks Certified Technology Specialist

Minimum Qualification Requirements
You may meet the minimum qualifications for the GS-14 if you possess the specialized experience below.

Specialized Experience for the GS-14
One year of experience in either federal or non-federal service, equivalent to at least the GS-13 level, performing two (2) out of three (3) of the following duties or work assignments:
- Experience leading the design and deployment of scalable cloud platforms using Infrastructure as Code (IaC), CI/CD, containers, and automated security controls to accelerate engineering delivery and ensure compliance.
- Experience enhancing reliability and observability for distributed systems, including SLO/SLA development, incident response, root-cause analysis, telemetry workflows, and/or performance and automation improvements.
- Experience translating platform and reliability engineering concepts into clear documentation, technical standards, and architecture guidance for non-technical audiences, and influencing engineering practices across multiple teams.

Basic Experience Requirements
You must possess IT-related experience (paid or unpaid) and/or completion of specific, intensive training (e.g., IT certification) demonstrating each of the four competencies listed below:
- Attention to Detail – Is thorough when performing work and conscientious about attending to detail.
- Customer Service – Works with clients and customers to assess their needs, provide information or assistance, resolve problems, or satisfy expectations; knows about available products and services; is committed to providing quality products and services.
- Oral Communication – Expresses information effectively to individuals or groups, taking into account the audience and nature of the information; makes clear and convincing oral presentations; listens to others, attends to nonverbal cues, and responds appropriately.
- Problem Solving – Identifies problems; determines accuracy and relevance of information; uses sound judgment to generate and evaluate alternatives, and to make recommendations.

Knowledge, Skills, and Abilities (KSAs) – Information Technology Specialist (ENTARCH), GS-2210-14
The quality of your experience will be measured by the extent to which you possess the following KSAs. You do not need to provide separate narrative responses to these KSAs; they will be measured through your responses to the occupational questionnaire.
- Skill in applying systems engineering and Site Reliability Engineering (SRE) concepts to ensure reliability, performance, scalability, security, and maintainability across complex, multi-cloud environments.
- Knowledge of platform and reliability engineering principles and the ability to apply them through real-world implementation, debugging, optimization, and modernization of cloud environments.
- Skill in cloud automation engineering, observability tooling, testing frameworks, and continuous integration/continuous delivery (CI/CD) pipelines, including telemetry, logging, alerting, and distributed tracing.
- Ability to leverage modern cloud, data, and security technologies to design, test, and deploy resilient platform and reliability systems that support mission-critical applications.
APPLICATION LIMIT: This vacancy announcement is limited to the first 250 applications received and will close at 11:59 PM Eastern Time on the day that we receive the 250th application, or at 11:59 PM Eastern Time on the listed closing date, whichever occurs first. We encourage you to read this entire vacancy announcement before submitting your application.

As a Platform/Site Reliability Engineer, you will lead the design, development, and evolution of the cloud platforms, automation, and reliability systems that power FSA's applications. You will develop infrastructure, tooling, and observability capabilities that enable teams to deliver secure, reliable, and high-performing services at scale. You will collaborate with cross-functional partners to standardize cloud architectures, improve system reliability, and modernize FSA into a platform-driven, engineering-centric organization.

This role blends the mission of public service with the complexity of major commercial cloud and SRE organizations. Your job is to lead the creation of the platforms, guardrails, and reliability practices that let teams ship changes safely and confidently. If you enjoy designing scalable infrastructure, optimizing system reliability, and enabling engineers to move faster, this is the role.

As a Platform/Site Reliability Engineer, GS-2210-14, you will be responsible for:
- Serving as an advisor to the IOG Director and the Chief of the Network Support Division, acting as a network architect and engineer to develop and implement solutions across cloud and on-premises environments, and designing reusable platform services, container environments, identity integrations, networking patterns, and infrastructure components.
- Providing input to design and technical documentation, reviewing final deliverables, and leading efforts to ensure adherence to the enterprise network operations engineering framework, while serving as a principal-level expert in platform engineering, cloud architecture, Site Reliability Engineering (SRE) practices, and infrastructure automation.
- Engaging with technology leaders, business partners, and contractors to ensure operational requirements and needs are met, communicating technical concepts clearly to non-technical stakeholders, and producing platform standards, design documents, and technical evaluations.
- Evaluating system security plans and procedures, managing and directing office support contractors, addressing IT compliance issues, and overseeing project planning and updates, while designing and maintaining continuous integration/continuous delivery (CI/CD) pipelines that support automated testing, deployment, change control, and compliance validation.
- Driving network engineering direction and response for CISA Binding Operational Directives (BODs) that affect data center operations, developing plans and processes to strengthen security, and implementing secure cloud configurations, identity and access management (IAM) models, encryption, and zero-trust architectural patterns.