Numascale sets record-breaking STREAM Benchmark

Record-breaking results from a shared memory system running the McCalpin STREAM Benchmark, a synthetic benchmark program that measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels, have been announced by Numascale. The company’s cache coherent shared memory system, which is targeted at big data analytics, reached 10.06 TBytes/second for the Scale function. That is 53 per cent higher than the result from the next most scalable system on the list, which achieved 6.59 TBytes/second.

Numascale’s record-breaking system is the first part of a large cloud computing installation at a North American data centre facility, where it supports the analytics and simulation of sensor data combined with historical data. The system is being used to run analytic models that simulate complex dynamic behaviour in a particular supply chain. The models use both historical data and close to real-time information to predict behaviour, and the vast size of the data sets requires a large memory with short access times so that computations can be completed within deadlines.

Running all of the calculations in a timely manner, on data compiled from disparate sources both structured and unstructured, requires significant computing power and a large shared memory. Numascale’s STREAM results indicate that the total bandwidth of the system is capable of supporting large parallel workloads. The STREAM benchmark is specifically designed to work with datasets much larger than the available cache on any given system, so its results are indicative, to some degree, of the performance of very large, vector-style applications.
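As a rough illustration of what the Scale function measures, the sketch below times a scaled copy of one array into another and converts the best time into a bandwidth figure. It is a minimal sketch in pure Python, so the number it produces reflects interpreter overhead rather than real memory bandwidth; the function names and array size are illustrative assumptions, not part of the STREAM source or of Numascale's run. The byte count follows the usual STREAM accounting for Scale of one read and one write per element.

```python
import time
from array import array


def scale_kernel(b, q):
    """Scale kernel: return a new array a with a[i] = q * b[i]."""
    return array("d", (q * x for x in b))


def scale_bytes(n, word_bytes=8):
    """Bytes counted for Scale: one read and one write per element."""
    return 2 * word_bytes * n


def measure_scale(n=200_000, q=3.0, repeats=3):
    """Time the kernel several times and report bytes/second for the best run."""
    b = array("d", [2.0]) * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a = scale_kernel(b, q)
        best = min(best, time.perf_counter() - start)
    return a, scale_bytes(n) / best


def percent_higher(result, reference):
    """How much higher one bandwidth result is than another, in per cent."""
    return (result / reference - 1.0) * 100.0


if __name__ == "__main__":
    _, bandwidth = measure_scale()
    print(f"Scale (pure Python, illustrative only): {bandwidth / 1e6:.1f} MBytes/second")
    # The two published figures from the article: 10.06 and 6.59 TBytes/second.
    print(f"Margin over next system: {percent_higher(10.06, 6.59):.0f} per cent")
```

A real STREAM run uses compiled code and arrays sized well beyond the last-level cache, which is what makes its figures a measure of memory bandwidth rather than of cache or interpreter speed.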

Numascale’s system consists of 108 Supermicro 1U servers connected in a 3D torus via the company’s NumaConnect Interconnect technology. Three cabinets with 36 servers apiece were used in a 6x6x3 topology. Each server has 48 cores in three AMD Opteron 6386 CPUs and 192 GBytes of memory, so the single system image makes 20.7 TBytes of memory available to all 5,184 cores. The system was designed to meet requirements for ‘very large memory’ hardware solutions running a standard single image Linux OS on commodity x86-based servers.
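The system totals quoted above follow directly from the per-server figures, and the short sketch below reproduces that arithmetic. It also shows how neighbours wrap around in a generic 6x6x3 torus; that part is an assumption about 3D tori in general and is not a description of how NumaConnect actually routes traffic. All names are hypothetical.

```python
DIMS = (6, 6, 3)          # torus dimensions given in the article
CORES_PER_SERVER = 48     # three AMD Opteron 6386 CPUs per server
MEM_PER_SERVER_GB = 192   # GBytes of memory per server


def totals(dims=DIMS):
    """Return (servers, cores, memory in GBytes) for the whole system."""
    servers = dims[0] * dims[1] * dims[2]
    return servers, servers * CORES_PER_SERVER, servers * MEM_PER_SERVER_GB


def torus_neighbours(node, dims=DIMS):
    """Neighbours of a node in a generic 3D torus, with wraparound on each axis."""
    result = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % size
            result.add(tuple(coords))
    return result


if __name__ == "__main__":
    servers, cores, mem_gb = totals()
    print(servers, cores, mem_gb)  # 108 servers, 5184 cores, 20736 GBytes (about 20.7 TBytes)
    print(sorted(torus_neighbours((0, 0, 0))))
```

With these dimensions every node has six distinct neighbours, including along the axis of length three, where the steps of plus one and minus one land on different nodes.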

NumaConnect enables scalable server computer systems to be built from commodity components at cluster prices, while providing high performance shared memory programming capabilities. The Interconnect technology eliminates the difficulty of MPI coding for big data problems and typically increases programmer productivity.

In Numascale’s record-breaking system, NumaConnect provides a total physical address space of up to 256 TBytes of system-wide shared memory. It does so using cache coherency logic with a directory-based protocol that scales to 4096 nodes, or 196,608 cores. For this STREAM run, Numascale did not use all of the system’s cores: memory channels are better utilised when a single core drives each memory controller, which avoids arbitration between cores and gives optimum memory bandwidth.
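The scaling limits quoted above can be checked with a couple of lines of arithmetic. The sketch assumes that the maximum configuration uses 48-core nodes like the servers in this system, and that a TByte here means 2^40 bytes; neither assumption is stated in the announcement.

```python
MAX_NODES = 4096
CORES_PER_NODE = 48            # assumption: same node type as the record system
ADDRESS_SPACE_TBYTES = 256
BYTES_PER_TBYTE = 2 ** 40      # assumption: binary TBytes

max_cores = MAX_NODES * CORES_PER_NODE
address_space_bytes = ADDRESS_SPACE_TBYTES * BYTES_PER_TBYTE
# For an exact power of two, bit_length() - 1 gives the number of address bits.
address_bits = address_space_bytes.bit_length() - 1

if __name__ == "__main__":
    print(max_cores)     # 196608
    print(address_bits)  # 48
```

Under those assumptions the 256 TBytes figure corresponds to a 48-bit physical address.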

For this installation, Numascale will deliver a training session to teach best-practice software design methods that take full advantage of its unique architecture. Furthermore, the company has signed a development agreement whereby Numascale will co-develop future software solutions with the data centre.
