Site Reliability Engineer - CTJ - Top Secret

Microsoft
United States, Washington, Redmond
Jul 02, 2025
Overview

Do you have a passion for high-scale services and working with some of Microsoft's most critical customers? We're looking for a Site Reliability Engineer with the right mix of software development, online services experience, and passion for quality to envision, design, and deliver Office 365 government cloud service offerings.

Office 365 is at the center of Microsoft's cloud first, devices first strategy as it brings together cloud versions of our most trusted communication and collaboration products like Exchange, SharePoint, and Teams with our cross-platform desktop suites and mobile apps. The Office 365 Enterprise Cloud team works with Microsoft's largest enterprise and government customers to deliver features that meet their specific needs and enable cloud adoption. As you would expect, our customers have the highest expectations for feature quality, security, reliability, availability, and performance.

The Site Reliability Engineering (SRE) team provides leadership, direction, and accountability for application architecture, system design, and end-to-end implementation. As a Site Reliability Engineer, you will identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design. Collaboration skills will be required to work closely with other engineering teams to ensure services and systems are highly stable and performant, meeting the expectations of our government customers and users.

At Microsoft, we can offer you an exciting team, exciting challenges, and a fun place to work. The work environment empowers you to have a positive impact on millions of end users.

The right candidate for this job is:
Passionate about distributed systems and working with highly scalable services.
Someone who enjoys new technological challenges and is motivated to solve them.
Excited about making better software and continuously improving the development, integration, and deployment processes.
A smart, highly motivated self-starter who thrives in a bottom-up, fast-paced, highly technical environment.
An effective collaborator, experienced in creating technical partnerships across teams.
Driven by an unwavering passion for meeting customer demands and delivering a dial tone service.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities

Technical Knowledge and Domain-Specific Expertise
Develops a foundational understanding of distributed systems design, interactions between cloud technology layers and components, basic dependencies at scale, and the code that defines infrastructures.
Can contribute to the code base that defines components or features of systems or cloud technologies to improve the reliability and operability of supported products, with direction from other engineers.
Develops an understanding of the code, features, and operations of specific products at scale as required to contribute to incremental improvements in product availability, reliability, efficiency, observability, and/or performance; participates in on-boarding, code/design reviews, and regular meetings with the engineering teams that develop and/or manage those products.

Contributions to Development and Design
Develops and tests basic changes to optimize code and improve the observability, reliability, and operability of a defined range of platform, system, or product components or features, with direction from other engineers.
Supports ongoing engagements with product engineering teams by participating in code/design reviews, regular meetings, on-call rotations, and incident responses throughout product development and operations cycles; draws insights from engagements with product engineering teams and basic analyses of telemetry data to propose potential improvements to code and designs for a defined set of product components or features, with guidance from other engineers.

Driving Operational Excellence
Implements simple configuration and data changes across a predefined range of product components or features, with guidance from other engineers, to develop an understanding of how configurations, binaries, and data can be managed using code, tooling, and automation.
Develops an understanding of how to safely and reliably manage changes in production by using existing tools and automation to enable product engineering teams to implement changes across a defined range of components or features, with direction from other engineers.
Uses existing tools to troubleshoot problems or flaws affecting the availability, reliability, performance, and/or efficiency of components or features, with guidance from other engineers.
Suggests potential solutions to resolve and prevent recurring issues and brings them to the attention of other engineers or team leads.
Responds to incidents during regular on-call rotations by identifying the level of impact, troubleshooting basic issues, and deploying appropriate fixes to resolve root cause(s); alerts product teams or owners to major customer-impacting issues and escalates the resolution of complex issues and/or those affecting multiple components or features to other engineers as needed.
Shares details related to incidents and their resolution through post-mortem reports and during regular review meetings.
Develops an understanding of key learnings, insights, and best practices that can be applied to improve system, platform, and/or product development and operations by participating in code/design reviews, incident drills and debriefs, and regular meetings, as well as interactions with more experienced Site Reliability Engineers (SREs) and members of product engineering teams.

Additional Responsibilities
Design, develop, and deliver the required software engineering to serve and protect O365 government clouds.
Own deployment, availability, reliability, performance, and customer escalation targets for sovereign environments.
Proactively identify and reduce issues through design, testing, and implementation of software-based solutions.
Collaborate with Engineering and Program Management partners to translate customer, business, and technical requirements into architectural designs and feature releases.
Drive efficiencies through software improvement and root cause analysis resulting in service delivery, maturity, and scalability.
Work within a highly skilled team of engineers to deliver revolutionary improvements to the cloud and scale them.

Other
Embody our culture and values