OCP Expands Open Systems for AI with New Open Data Center for AI Strategic Initiative

Expansion of Open Systems for AI Initiative

The Open Compute Project Foundation (OCP), a nonprofit international organization dedicated to bringing at-scale innovations and hyperscale best practices to all, has announced an expansion of its Open Systems for AI initiative. This expansion includes the addition of the Open Data Center for AI Strategic Initiative (SI), aimed at addressing critical data center infrastructure challenges such as power, cooling, mechanical systems, and management telemetry.

This new strategic initiative responds to a significant increase in data center physical infrastructure projects over the past year and builds on insights from the OCP Open Systems for AI SI workshop series and a recent open letter calling for collaboration. The letter was initiated by Google, Meta, and Microsoft, and with strong support from the OCP Board and stakeholders, the Foundation encourages other organizations to sign it. The move reinforces the OCP Foundation’s mission to support the entire open data center ecosystem, covering both IT and physical data center infrastructure and facilities.


Key Objectives of the Open Data Center for AI SI

The mandate of the Open Data Center for AI SI is to develop standards for data center infrastructure that allow advanced, high-density AI infrastructure to be deployed as flexibly as traditional compute. This involves building a common understanding of management telemetry and of advanced power and cooling technologies, enabling simpler deployment of a wide variety of AI solutions.
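To make "common understanding of management telemetry" concrete, the sketch below shows what a shared, vendor-neutral telemetry reading might look like. The field names are hypothetical (loosely inspired by DMTF Redfish metric reports) and are not part of any OCP specification; the point is that a common schema lets tools from any vendor consume data from any rack.

```python
# Hypothetical sketch of a standardized telemetry reading; field names are
# illustrative assumptions, not an actual OCP or Redfish schema.
import json

REQUIRED_FIELDS = {"sensor_id", "metric", "value", "unit", "timestamp"}

def validate_reading(reading: dict) -> bool:
    """Check that a reading carries every field the common schema requires."""
    return REQUIRED_FIELDS.issubset(reading)

reading = {
    "sensor_id": "rack42/cdu0/supply",
    "metric": "coolant_supply_temperature",
    "value": 30.5,
    "unit": "Cel",
    "timestamp": "2025-01-15T12:00:00Z",
}
print(json.dumps(reading, indent=2))
print("valid:", validate_reading(reading))
```

With agreement on even a small required field set like this, a co-location provider's monitoring stack could ingest readings from any compliant CDU or power shelf without per-vendor adapters.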

Data center partners, including hyperscalers, neoclouds, co-location providers, enterprise users, and technology providers, face challenges due to siloed efforts that produce competing design requirements. These issues slow down innovations and extend deployment timelines. The goal is to identify and specify requirements for AI data centers so that the physical infrastructure can provide a common ground, enabling fungibility for a diverse AI IT infrastructure.

Current Work Efforts Underway

The Open Data Center for AI SI will build on several ongoing work efforts within the OCP Community:

  • Coolant Distribution Unit (CDU) Project: This project focuses on integrating facilities’ technology cooling systems and facility water systems into IT rack liquid cooling.
  • Facilities-Level Power Distribution Project: This initiative covers the transition to a Direct Current distribution architecture that supports high-powered IT racks.

Other notable contributions include:

  • Mt Diablo (Diablo 400): A power-rack sidecar for powering AI clusters, co-authored by Google, Meta, and Microsoft.
  • Deschutes Coolant Distribution Unit (CDU): Authored by Google.
  • Clemente: A high-performance AI compute tray, authored by Meta.
  • Hyperscale CPU RAS and Debug Requirements: Standardized debug capabilities for CPUs in hyperscale environments, co-authored by AMD, Google, and Microsoft.

Technical Details of Key Specifications

The Diablo specification, developed by Google, Meta, and Microsoft, describes a disaggregated power rack, or sidecar rack. It pushes power delivery from today’s 48 volts direct current (VDC) within the rack to ±400 VDC or 800 VDC, defining power solutions for high-density AI racks with capacities ranging from 100 kilowatts up to 1 megawatt. Selecting 400 VDC as the nominal voltage leverages the supply chain established by electric vehicles, offering greater economies of scale, proven quality, and more efficient manufacturing through standardized electrical and mechanical interfaces.
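The motivation for the higher bus voltage follows directly from Ohm's law: at a fixed power, current scales inversely with voltage, and conductor size and resistive losses scale with current. A quick back-of-the-envelope calculation using the figures cited above (48 VDC today, 400/800 VDC proposed, racks up to 1 MW) illustrates the gap:

```python
# Illustrative arithmetic only, using the voltages and rack power cited in
# the Diablo specification discussion (not from the spec text itself).

def bus_current_amps(power_watts: float, voltage_volts: float) -> float:
    """Current the distribution bus must carry: I = P / V."""
    return power_watts / voltage_volts

rack_power = 1_000_000  # 1 MW, the upper end of the cited rack capacities
for voltage in (48, 400, 800):
    amps = bus_current_amps(rack_power, voltage)
    print(f"{voltage:>4} VDC -> {amps:>9,.0f} A")
# 48 VDC would require roughly 20,800 A per megawatt-class rack, versus
# about 2,500 A at 400 VDC and 1,250 A at 800 VDC.
```

That roughly 8x to 16x reduction in current is what makes busbars, connectors, and cabling for megawatt-class racks practical.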

The Deschutes CDU is designed to support heat loads of up to 2 MW, with hydraulic capacity targets of 500 GPM at 80-90 psi. This would make it one of the highest CDU thermal capacities available in the industry. The specification enables any CDU supplier to develop, manufacture, and improve upon the design. The CDU is assembled from components sourced from multiple vendors, allowing vendors to build and data center owners to purchase a CDU based on this specification. Installation and maintenance procedures are also shared to enable fast deployments of reliable equipment.
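The cited heat load and flow rate imply a coolant temperature rise via the steady-state relation Q = ṁ·cp·ΔT. The sketch below estimates that rise under the assumption of a water-like coolant (density about 1 kg/L, specific heat about 4186 J/(kg·K)); real facility coolants such as glycol mixes differ somewhat, so treat this as an order-of-magnitude check, not a figure from the specification.

```python
# Hedged estimate of coolant temperature rise across a Deschutes-class CDU
# at the cited 2 MW heat load and 500 GPM flow. Water-like coolant assumed.

GPM_TO_LPS = 3.78541 / 60.0  # US gallons/min -> litres/sec

def delta_t_kelvin(heat_w: float, flow_gpm: float,
                   density_kg_per_l: float = 1.0,
                   cp_j_per_kg_k: float = 4186.0) -> float:
    """Steady-state temperature rise from Q = m_dot * cp * dT."""
    m_dot = flow_gpm * GPM_TO_LPS * density_kg_per_l  # mass flow, kg/s
    return heat_w / (m_dot * cp_j_per_kg_k)

print(f"dT ~ {delta_t_kelvin(2_000_000, 500):.1f} K")  # about 15 K
```

A roughly 15 K rise at full load is within the range typical liquid-cooling loops are designed around, which is consistent with the 2 MW / 500 GPM pairing in the specification.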

The Clemente specification describes a 1RU compute tray that integrates two NVIDIA GB300 Host Processor Modules (HPMs) into a form factor with peripherals supporting Meta’s AI/ML training and inference use cases. It marks a milestone as the first deployment of a design using OCP ORv3 HPR (an in-progress specification contribution) with sidecar power racks. The platform mixes cooling approaches: the CPU, GPU, and switch are liquid-cooled, while the remaining components are air-cooled.

Ongoing Contributions and Future Prospects

OCP’s Open Systems for AI efforts continue to solidify the Foundation’s position as the premier open organization accelerating the deployment of AI data centers. These resources are collected in OCP’s newly opened AI portal on the OCP Marketplace, giving AI cluster designers, builders, and facility providers a centralized location for the latest available AI infrastructure products and reference material.

With the AI infrastructure market evolving rapidly, there is a risk of higher costs due to fragmentation. It is crucial for an organization like the OCP to facilitate a community that identifies commonalities in data center facilities and IT infrastructure. This can help accelerate the market for future generations of AI cluster deployments and data center facility builds.

