Microsoft Cloud Infrastructure and Operations (CO+I) is the engine that powers Microsoft’s cloud services. The group designs, builds, and operates Microsoft’s global datacenters; manages the programmatic delivery of our critical infrastructure design, equipment procurement, construction delivery, infrastructure innovation, demand planning, and capacity utilization of our unified infrastructure; and runs all operations needed for the physical infrastructure.

We focus on smart growth with an emphasis on automation, data-driven engineering, cost‐effectiveness, and environmental sustainability. We deliver the core infrastructure and foundational technologies for Microsoft’s 200+ online businesses including Azure, Office 365, Bing, Xbox Live, Skype, and OneDrive.  Our portfolio is built and managed by a team of subject matter experts working 24x7x365 to support services for more than 1 billion customers and 20 million businesses in over 90 countries worldwide. 

Within CO+I, the Central Operations Datacenter Availability Team is responsible for managing a portfolio of strategic initiatives with a focus on datacenter infrastructure availability. The Central Operations Datacenter Availability Team is seeking a highly motivated and experienced Director, Datacenter Availability – Americas. If you are a leader with a growth mindset, we encourage you to apply for this exciting opportunity. 

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. 

Empower Billions!

Responsibilities
• Lead a team of engineers, program and project managers who perform forensic analysis, solution determination, and solution implementation for incidents which have occurred within datacenter infrastructures.
• Function as a subject matter expert in this space, able to speak to all aspects of datacenter functions and failure modes in critical environments.
• Leverage data analysis from the field to develop new design concepts in collaboration with the Design Engineering and Innovation Engineering teams.
• Develop strategic direction for all Availability Improvement initiatives in alignment with Central Operations direction.
• Manage Technical Service Bulletin (TSB) process and establish strategic roadmap for monitoring and implementation of TSBs globally.
• Budget for known risks and secure funding through CapEx and OpEx for solutions addressing systemic risks for Critical Infrastructure.
• Collaborate with Reliability Engineering on prognostics and automation to maximize infrastructure availability.
• Collaborate with Central Operations Team partners on methodologies to validate datacenter performance, system control parameters and operational efficiency against design intent and determine quantifiable deviations.
• Collaborate with the Standards Team to drive global standardization and consistency of processes, procedures, and reports with Operations teams for Quarterly Business Reviews.
• Manage the Forensic Team’s troubleshooting and root cause analysis associated with equipment failure.
• Manage the Availability Team’s efforts on reviewing equipment and system performance data to identify issues through trend analysis.
• Analyze full time employee and vendor staffing to include training, procedures, and site requirements as part of root cause analysis and solution implementation.
• Foster and promote implementation of lessons learned from analysis across multiple design, construction, and operational organizations.
• Lead the development of solutions for defects identified through trends and data analysis.
• Work with Design Engineering and Site Operations Engineers to establish visual standards, process improvement and error proofing systems to drive efficiency within the business.
• Identify and monitor the need for use of new tools to improve the quality of data and analytics.
• Other
• Embody our Culture and Values

Qualifications

Required/Minimum Qualifications:
• Bachelor’s Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 8+ years electrical, mechanical, or controls engineering experience
• OR Master’s Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 7+ years electrical, mechanical, or controls engineering experience
• OR Doctorate Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 5+ years electrical, mechanical, or controls engineering experience.

Other Requirements

Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
• Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
• Citizenship & Citizenship Verification: This position requires verification of U.S. citizenship due to citizenship-based legal restrictions. Specifically, this position supports United States federal, state, and/or local United States government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, citizenship will be verified via a valid passport.

Preferred/Additional Qualifications
• 10+ years of experience in working in critical environments.
• Proficiency in datacenter or critical environment mechanical and electrical systems.
• Communication and leadership skills to drive projects and build buy-in and support.
• Experience leading and managing a business-critical function.
• Experience leading construction, design, and process reviews assessing availability and reliability threat vectors, while partnering with various teams to design out or eliminate the potential issues.
• Experience driving resolution with an unquenchable desire to understand root causes and incident triggers.
• Understanding of datacenter topologies or equivalent mission-critical facility background.
• Analytical skills with the ability to extract and summarize large, complex data from multiple databases and systems.
• Experience in a team dynamic with the ability to influence cross-functional and leadership teams in driving process improvement, efficiencies, and best practices.
• Ability to work with people at all levels of management.

Reliability Engineering M5 – The typical base pay range for this role across the U.S. is USD $137,600 – $267,000 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $180,400 – $294,000 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until August 9, 2024.

#COICentralOpsCareers

#COICareers

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

Tagged as: Director

Director, Datacenter Availability – Americas Fulltime

Microsoft Software Publishers
