Chris quits.

February 21, 2007

That made me sad. I just read that Chris, the guy behind HPC Answers, is going to stop blogging.

Chris, we’ll miss you. You’re one of those exceptional guys who really know what they’re talking about. There aren’t that many people blogging about HPC, and you have a hell of a lot of expertise.
Make sure your legacy is well archived in some safe location; your “Answers” were true and indeed very useful answers.

I wish you all the very best for your new job and I hope you drop by every now and then – virtually here or physically in Germany, if you ever happen to be here.

It was an honor to read your blog. Whoring about HPC-topics won’t be the same without you.

Good luck!

Alex.



Louisiana State to offer course on HPC in spring 2008

January 26, 2007

Via Supercomputing Online:

Thomas Sterling, who is nowadays teaching at LSU, is preparing a course on supercomputing for spring 2008. They’re going to broadcast the lectures in high-definition TV to other universities over the internet. He’s also working on a textbook on this topic, and they’ll offer the course on DVD later.

Interesting, makes me want to go back to uni again.



HPC roundup

December 27, 2006

It’s been quite some time since I last blogged much. I’m currently committed to a project where we’re deploying a couple of new components at a German telco: a new cluster, a new Network Management System and some kind of Layer-7 proxy. This keeps me busy; I apologize to my regular readers for the lack of updates. So here is a little roundup of things which happened in the last 6 weeks or so.

OK, let’s get started; first there was SC06, which was quite a happening and which I missed this year. Interesting hardware included, for example, the Intel SR1530 systems; eight bloody cores in one 1U case. Nifty! SiCortex announced a 5832-core MIPS system for the masses – the SC5832 offers 5.8 TFLOPS (peak) at just 20 kW of power consumption by using power-optimized MIPS64 cores. Nvidia showed off CUDA, a library for offloading computation to the GPU, and Dell announced systems with quad-core Opterons. And there was news about IBM’s 1350 hybrid CBE blade system.

As for the HPC/SC business, there wasn’t much in my opinion. First, NEC announced a cooperation with SUN, which I already covered earlier. Unfortunately NEC still hasn’t commented on my questions; maybe I’m not worthy enough. Cray fortified its DARPA commitments by getting another 250-million-USD contract with its Adaptive Supercomputing Initiative. Bull sold another 43 TFLOPS supercomputer to the French (CCRT). Yeah, and there was that nifty supercomputer in a chapel, Mare Nostrum in Spain. Quite some location for a supercomputer!

On the software front we had the announcement from SUN of a new SUN Grid Engine 6.1, and XenSource unveiled a couple of new virtualization products.

Conference-related stuff: ISC 2007 issued its call for papers. Top500.org released its BOF-session slides.

So how was the year 2006 for the HPC-business? HPC-wire has a round-up.

OK, that’s it for now; merry belated Christmas, a happy new year – yours truly,
Alexander Janßen.


NEC and SUN team up for hybrid supercomputers

November 21, 2006

Via NEC:

NEC and SUN are teaming up to build hybrid supercomputers; they’re going to blend SUN’s Fire servers with NEC’s SX-series supercomputers to form a “hybrid supercomputer”.

From their press-release:

“Hybrid” supercomputing solutions provide a superior benefit for customers who wish to utilize both vector and scalar computing environments based on the suitability of customer codes. This solution also provides the capability to share data between vector and scalar computing environments.

They also committed to several other agreements; NEC will also act as the integrator of those hybrid supercomputers.

I’d really like to see how they’re going to mash together those two pretty different systems. NEC has its own method of connecting SX-8 nodes, the Internode Crossbar Switch (IXS), whereas SUN Fire servers are regular SMP machines which are connected by Infiniband or, more traditionally, by Gigabit Ethernet.

If they want “to share data between vector and scalar computing environments” I’d say that they have to couple those pretty different architectures tightly. How to do that? I don’t know yet.

However, the TOP500 already mentions a machine from SUN, an Opteron x4600 cluster, which is supposed to be a NEC/SUN combination. The NEC portion is the storage subsystem; they delivered their iStorage S1800AT for use with SUN’s x4600. Interestingly, they also utilize ClearSpeed’s accelerator boards, which deliver up to 50 GFLOPS per board at only 25 Watts. The supercomputer is ranked 9th in the list.



green500.org – new supercomputing benchmark site

November 19, 2006

Via top500.org Blog:

There was a talk about HPC power consumption at SC06 during the BOF session; it pointed out that there’s a Green500 list where supercomputers are not just ranked by their peak performance, but by performance relative to the total power consumed. The whole thing isn’t new, it was already presented at HP-PAC, but it somehow slipped under my radar.

Interestingly, they not only compare FLOPS/Watt but also introduce a power benchmark known from circuit design, the ED^n metric (“E is the energy being used by a system while running a benchmark, and D is the time taken to complete that same benchmark.”), and adapt it further into the so-called ∂-metric, which lets the user put more emphasis on either energy or performance[1]. The paper[2] also has a table comparing several cluster setups with different benchmarks.
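
To get a feeling for how the exponent shifts the emphasis, here is a minimal Python sketch of FLOPS/Watt and the energy-delay product E·D^n (lower is better). The two systems "A" and "B" and all their numbers are made up for illustration, the function names are my own, and the ∂-metric from the paper is deliberately left out since I don't want to guess its exact form.

```python
# Sketch only: hypothetical systems, not measurements from the Green500 paper.

def flops_per_watt(flops, watts):
    """Plain efficiency: floating-point operations per second per watt."""
    return flops / watts

def energy_joules(avg_power_watts, runtime_seconds):
    """E: energy used while running the benchmark."""
    return avg_power_watts * runtime_seconds

def ed_n(energy, delay, n):
    """Energy-delay product E * D**n; a larger n weights runtime more heavily."""
    return energy * delay ** n

# name -> (average power in watts, benchmark runtime in seconds), all invented
SYSTEMS = {
    "A": (20_000, 100),  # fast but power hungry
    "B": (10_000, 150),  # slower but frugal
}

def best_system(n):
    """Return the name of the system with the lowest E * D**n."""
    scores = {
        name: ed_n(energy_joules(power, runtime), runtime, n)
        for name, (power, runtime) in SYSTEMS.items()
    }
    return min(scores, key=scores.get)

for n in (0, 1, 2):
    print(f"n={n}: system {best_system(n)} looks best")
```

With n=0 only the energy counts and the frugal system B wins; from n=1 on the faster system A comes out ahead, because its shorter runtime is weighted more heavily.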

It has been my opinion for a long time that clusters and supercomputers will have to be optimized for high performance/Watt and low Watt/space, since energy and space will become even more expensive in the future. I’m glad to see that serious people are looking into that topic.

The more I read about SC06, the more I regret that I couldn’t make it. Maybe next year…

[1] R. Ge, X. Feng, and K. Cameron. Improvement of power-performance efficiency for high-end computing. In The First Workshop on High-Performance, Power-Aware Computing (HP-PAC), Apr. 2005.
[2] S. Sharma, C.-H. Hsu, and W.-c. Feng. Making a case for a Green500 list.



Cray’s petascale “Hood” XT-4 released

November 14, 2006

Via cray.com:

The first computer of Cray’s Rainier line was announced today, offering a peak performance of 1 PFLOPS. Cray wants to consolidate all of its supercomputer product lines in the “Rainier” program, an effort to establish Opterons and Seastar2 technology. The XT-4, also nicknamed “Hood”, will have up to 30,000 dual-core Opteron CPUs and insane amounts of RAM.

XT-4 is now up for sale, although Cray already admitted that they have a backlog of outstanding deliveries.



SC06: 28th TOP500 List released

November 14, 2006

Long story short: The 28th TOP500 List was released at SC06 in Tampa.



NEC’s SX-8R Vector-CPU to do a 35.2 GFLOPS

November 13, 2006

Via NEC HPCE Europe:

NEC introduces the SX-8R CPU, a makeover of the SX-8 CPU. The CPU now has two add and two multiply vector pipelines, and NEC raised the clock from 2.0 to 2.2 GHz. A single CPU is supposed to do a sweet 35.2 GFLOPS. Two entry-level single-node systems will be introduced as well; the SX-8R/B will have 1-4 CPUs with a peak performance of 149.6 GFLOPS, whereas the SX-8R/A will have 4-8 CPUs with a peak performance of 281.6 GFLOPS. The single-node systems use up to 64/128 GB of shared memory.

The multi-node version SX-8R/M scales up to 512 nodes with 8 CPUs each, resulting in a total peak performance of 153.2 TFLOPS. The bandwidth is supposed to be 288,358.4 GB/s. The system can utilize up to 256 GB per node, resulting in a total of 64 TB of RAM for the maximum configuration.
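
For the curious, the peak figures can be roughly retraced by multiplying a per-CPU value with the CPU count. The sketch below assumes that peak scales linearly with the number of CPUs. Eight CPUs at 35.2 GFLOPS give exactly the 281.6 GFLOPS quoted for the SX-8R/A, whereas the 149.6 GFLOPS and 153.2 TFLOPS figures only come out with 37.4 GFLOPS per CPU; that per-CPU value is my own back-calculation (perhaps a non-vector contribution is counted in), not something taken from the press release.

```python
# Back-of-the-envelope check of the quoted peak figures.
# Assumption: peak performance scales linearly with the CPU count.

VECTOR_PEAK_GFLOPS = 35.2   # per CPU, as quoted above
ASSUMED_TOTAL_GFLOPS = 37.4  # per CPU, back-calculated guess, NOT from NEC

def peak_gflops(cpus, per_cpu=VECTOR_PEAK_GFLOPS):
    """Aggregate peak in GFLOPS for a given number of CPUs."""
    return cpus * per_cpu

def per_cpu_share(total, cpus):
    """Split an aggregate figure (e.g. bandwidth) evenly across CPUs."""
    return total / cpus

MAX_CPUS = 512 * 8  # largest SX-8R/M configuration: 512 nodes, 8 CPUs each

print(f"SX-8R/A, 8 CPUs:   {peak_gflops(8):.1f} GFLOPS")
print(f"SX-8R/B, 4 CPUs:   {peak_gflops(4, ASSUMED_TOTAL_GFLOPS):.1f} GFLOPS")
print(f"SX-8R/M, {MAX_CPUS} CPUs: "
      f"{peak_gflops(MAX_CPUS, ASSUMED_TOTAL_GFLOPS) / 1000:.1f} TFLOPS")
print(f"Bandwidth per CPU: {per_cpu_share(288_358.4, MAX_CPUS):.1f} GB/s")
```

The last line simply divides the quoted aggregate bandwidth by the 4096 CPUs of the maximum configuration.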

So, what to do with this beast? Same as we do with all vector machines: matrix manipulations. Contrary to “conventional” CPUs, which have to iterate over the individual “cells” of a matrix one calculation at a time, vector CPUs suck whole rows and columns into their registers at once and do the calculation in a single step. A matrix multiplication A x B = C with 5 rows and 5 columns has 5²=25 result cells, each of which is the product of a row and a column, i.e. 5 multiplications and additions; a traditional CPU without optimization therefore needs 5³=125 steps to get the result C. A vector CPU sucks in a whole row and column at once and handles each cell in one step, resulting in only 25 steps. To put it short: “traditional” iterative approaches run in O(n³), whereas the vector approach saves a factor of n and runs in O(n²). This also scales with the number of available pipelines and CPUs: since the SX-8R has two vector pipes, one could (naively) say that it does a matrix multiplication in about n²/2 steps. Generally speaking: about n²/C steps, where C is the number of CPUs, while iterative SMP approaches need about n³/C.

Obviously that’s a naive way of explaining what they’re doing, but that’s basically how it works.
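
To make the step counting concrete, here is a small Python sketch that multiplies two square matrices twice: once with the textbook triple loop, counting one multiply-add per step, and once under an idealised model in which a whole row-times-column product counts as a single "vector step". The function names are my own, and the model is only an illustration; it says nothing about how the SX-8R actually schedules its pipelines.

```python
# Sketch only: counts steps under a deliberately simplified model.

def matmul_scalar(a, b):
    """Textbook triple loop; every multiply-add counts as one step."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    steps = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                steps += 1
    return c, steps

def matmul_vector_model(a, b):
    """Idealised vector machine: one row-times-column product per step."""
    columns = list(zip(*b))  # transpose b so its columns are easy to grab
    c = []
    steps = 0
    for row in a:
        result_row = []
        for col in columns:
            # On the imagined vector unit this whole dot product is one step.
            result_row.append(sum(x * y for x, y in zip(row, col)))
            steps += 1
        c.append(result_row)
    return c, steps

n = 5
a = [[i + j for j in range(n)] for i in range(n)]
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

c_scalar, scalar_steps = matmul_scalar(a, identity)
c_vector, vector_steps = matmul_vector_model(a, identity)
print(f"scalar steps: {scalar_steps}, vector steps: {vector_steps}")
```

For n = 5 the triple loop needs 125 multiply-adds while the vector model needs 25 steps, one per result cell; both variants return the same matrix.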

Since I work and live in Düsseldorf I really should look into attending a freakshow in the European Supercomputing Centre. By the way, I had a very splendid freakshow at the vintage supercomputing centre of Cray-Cyber.org in Munich two weeks ago. Will put some pictures online later.

And: Sorry for not updating the blog for quite some time, I’m currently busy with a new project (deploying new clusters for a mobile-phone carrier in Germany). And I’ll miss SC06. *sigh*
Links:
SX-8R press-release



LANL: More on Roadrunner

September 11, 2006

There’s now a bit more information available about Roadrunner, the new supercomputer for the Los Alamos National Laboratory. As we remember, AMD and IBM were fighting for months to get the contract, with a surprising result in the end: Opterons and the Cell Broadband Engine (based on IBM’s Power CPU) will be used to build a hybrid supercomputer.

IBM released a bit more information (that was already five days ago; it somehow slipped under my radar). First of all, IBM won the bid. Second, Roadrunner is supposed to have a peak performance of 1.6 PFLOPS:

The machine is to be built entirely from commercially available hardware and based on the Linux® operating system. IBM® System x™ 3755 servers based on AMD Opteron technology will be deployed in conjunction with IBM BladeCenter® H systems with Cell B.E. technology. Each system used is designed specifically for high performance implementations.

Designed also with space and power consumption issues in mind, the system will employ advanced cooling and power management technologies and will occupy only 12,000 square feet of floor space, or approximately the size of three basketball courts.

That was basically everything; there was no word on how the Opterons should be connected, so we’re all still waiting for more on this topic. I still bet a fiver that they’ll use HyperTransport; AMD and IBM are both members of the HyperTransport Consortium.



LANL: Opteron + Cell = Roadrunner?

September 6, 2006

Via Heise Newsticker, El Reg:

The Register claims that IBM is proposing a new concept for Los Alamos’ upcoming supercomputer, code-named “Roadrunner”. IBM and AMD have both been bidding on the project for months, so this solution would be, well, at least very interesting. El Reg writes:

The lab will announce that IBM will build Roadrunner using a hybrid design that makes use of Opteron and Cell systems, according to a report from online rag CNET. The publication cites “sources familiar with the machine” as claiming that the National Nuclear Security Administration (NNSA), which oversees LANL, will reveal IBM’s win “in the coming days.”

So The Reg is quoting CNET but not giving any pointers, and I couldn’t find the source yet, so… I file this under “rumours” :)

There was no word yet about how the Cell and Opteron CPUs should be integrated; I bet a fiver that they’re going to use HyperTransport and install a Cell CPU as a co-processor to the Opterons, probably similar to the technology DRC Computer is using for their Virtex-FPGA integration solution.

If not, what else would be possible? PCI-Express, for sure, but probably too expensive on a large scale. Any ideas? Do you know more than Big Reg? Tell me! Or use the fancy comment function.
