Inspur Information and MEGWARE Build a Leading GPU Cluster

Performance targets were exceeded in machine learning and molecular dynamics, which are essential to advancing scientific research and discovery

Inspur Information, a leading IT infrastructure solutions provider, and MEGWARE, a high-performance computing (HPC) solutions provider in Europe, combined efforts to bolster the scientific research capabilities of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) through its Erlangen National Center for High-Performance Computing (NHR@FAU). The advanced GPU cluster, powered by Inspur GPU servers, is fully operational and has far exceeded its original performance targets in machine learning and molecular dynamics.

FAU is a leading scientific research institution in Europe and ranks second among the most innovative European universities according to Reuters. It is renowned for its science and engineering in fields like materials science, chemistry, life sciences, computer science, and biomedical engineering. Machine learning (ML) has become increasingly important for many areas of FAU’s research, particularly in computer science. In addition to ML, molecular dynamics (MD) simulations have made it possible to numerically simulate many real and complex physical models at FAU, and the demand for simulating these models using HPC is growing exponentially.

To meet these massive parallel computing needs, NHR@FAU sought to build the largest computing cluster in the university’s history, significantly expanding its research and HPC capabilities. Because NHR@FAU is part of the “NHR Alliance,” a federation of nine tier-2 computing centers in Germany, the new system would also be open to researchers at other German universities. A Europe-wide tender by NHR@FAU led to the selection of Inspur Information and MEGWARE due to their combination of powerful GPU servers, system integration, and optimization expertise.

The new Inspur-powered GPU cluster “Alex” is the core component of NHR@FAU’s HPC infrastructure, built to handle the rapidly growing demand for computing resources for ML and MD in scientific research. Alex ranks on both the TOP500 and Green500 lists of the world’s most powerful and most energy-efficient HPC systems. It is composed of 32 NF5488A5 and 38 NF5468A5 Inspur GPU servers, providing a total of 256 NVIDIA A100 Tensor Core GPUs and 304 NVIDIA A40 Tensor Core GPUs. In addition to these GPU resources, there are 140 AMD EPYC 7713 CPUs, and the total memory capacity is almost 50TB. The cluster is interconnected through a high-speed HDR InfiniBand network. The result is a general-purpose system with strong MD and AI performance that runs a wide range of research-specific software across its different hardware configurations, handles massive ML datasets and molecular dynamics simulations, and improves training efficiency.

As the fundamental component of GPU cluster Alex, Inspur GPU servers provide powerful performance:

  • Inspur NF5488A5 is equipped with eight NVIDIA A100 GPUs and two 64-core AMD EPYC 7713 CPUs in a 4U chassis and uses an NVSwitch GPU interconnect. The design emphasizes performance while reducing operation and maintenance costs and simplifying installation.
  • Inspur NF5468A5 is equipped with eight NVIDIA A40 Tensor Core GPUs and two AMD EPYC 7713 CPUs in a 4U chassis. It connects the CPUs and GPUs directly over a high-speed PCIe 4.0 interface without a PCIe switch, which avoids switch-induced communication delays between the CPUs and GPUs and improves computing performance.
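
The cluster totals quoted above follow directly from the per-server figures. As a quick sanity check, the arithmetic can be sketched in a few lines of Python. The server counts and per-server GPU and CPU numbers come from this article; the dictionary layout and the function name `cluster_totals` are illustrative assumptions only and do not reflect any tooling used by NHR@FAU.

```python
# Per-server figures are taken from the article text; the data layout and
# the helper name below are hypothetical, chosen only for illustration.
SERVERS = {
    "NF5488A5": {"count": 32, "gpu": "A100", "gpus_per_server": 8, "cpus_per_server": 2},
    "NF5468A5": {"count": 38, "gpu": "A40", "gpus_per_server": 8, "cpus_per_server": 2},
}


def cluster_totals(servers):
    """Return (GPU count per GPU model, total CPU count) for the cluster."""
    gpu_totals = {}
    cpu_total = 0
    for spec in servers.values():
        gpus = spec["count"] * spec["gpus_per_server"]
        gpu_totals[spec["gpu"]] = gpu_totals.get(spec["gpu"], 0) + gpus
        cpu_total += spec["count"] * spec["cpus_per_server"]
    return gpu_totals, cpu_total


gpu_totals, cpu_total = cluster_totals(SERVERS)
print(gpu_totals)  # {'A100': 256, 'A40': 304}
print(cpu_total)   # 140
```

The results match the 256 A100 GPUs, 304 A40 GPUs, and 140 AMD EPYC 7713 CPUs stated for Alex.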

Inspur and MEGWARE’s HPC solution has greatly enhanced FAU’s scientific research capabilities. The performance for model training and inference has exceeded FAU’s original expectations by 115% after Inspur provided hardware recommendations that were better optimized for FAU’s needs, including the use of Inspur’s flagship servers NF5488A5 and NF5468A5.

NHR@FAU’s Alex cluster running on Inspur GPU servers is successfully executing ML frameworks (TensorFlow, PyTorch), chemistry applications (Quantum ESPRESSO and VASP), and scientific research software such as NAMD, LAMMPS, AMBER, and GROMACS. FAU and other German universities are now able to perform scientific research that was impossible just a few years ago, and they are now at the forefront of scientific exploration.

Inspur holds the world’s leading GPU server product portfolio, providing industry-leading performance, comprehensive products, and rapid time-to-market capabilities. Inspur GPU servers are widely used in image recognition, speech recognition, natural language processing, and other fields. Inspur has a rich selection of NVLink A100 GPU and PCIe GPU servers. Based on innovative designs and full-stack performance optimization capabilities, Inspur has been a top performer in MLPerf, a world-leading AI benchmark suite, receiving 91 top rankings in single-node performance since MLPerf Inference 0.7. According to IDC, the global AI server market reached US$6.66 billion in 1H2021, with Inspur accounting for 20.2% of the market share, maintaining Inspur’s position as the largest AI server provider in the world.
