Back from the ISC’10 Tutorials

May 31, 2010

I’m just back from the ISC’10 Tutorial Sessions. Getting to and from Hamburg from Düsseldorf in a single day is a pretty harsh thing to do. The A1 was basically one long concatenation of construction sites, making the drive quite a hassle – which means I got there just in time.

We arrived at the registration desk at 13:20 sharp – the tutorials would start at 13:30. Registration was smooth: give yer name and company, grab your badge, the WiFi details, a map showing how to get from the CCH to the university, and a schedule for all the tutorials.

First thing we didn’t like: the schedule was divided into tracks – a CUDA track, an InfiniBand track, and so on. Each track listed a lot of lectures, but there was no timeline for when the individual talks were supposed to start! That made it hard to decide what to do first.

We grabbed our stuff and had a really short walk of maybe 5 minutes to the venue. The building was typical: a huge, massive 1900s building with a lot of stairs – but eventually we found lecture room B, where we wanted to hear the CUDA tutorial.

Unfortunately, the CUDA talk didn’t offer anything new. I’m fairly sure the slides of this tutorial were the ones used in an NVIDIA webinar about CUDA I attended last year! It was a real déjà vu; I think they just changed the date on those slides. Gernot Ziegler started the tutorial – I thought this would be big time, because we’d get the opportunity to hear someone who’s really into CUDA; he’s with NVIDIA after all. But he handed the tutorial over to John Stratton, who did quite a decent job, but was pretty unlucky in that he had to reiterate the presentation I had already enjoyed in last year’s webinar.

Ah, what a wasted opportunity. See, I’d expect more from the ISC when it comes to CUDA than a recital of what was already told on less specific channels!

My colleague and I stuck with the talk until the long break – some cool things were said about how to abuse the texture buffer in a clever way to do really local multidimensional computations, but unfortunately it wasn’t elaborated enough. For my taste.

After the long break we decided to hit the InfiniBand talk. Since we deploy large installations, we hoped for some insight into how to deploy InfiniBand and how to draw up migration concepts for getting from Ethernet to InfiniBand. Sadly, the talk was just a roundup of the available vendors, their products and a performance comparison. That’s not what we expected – we could have looked that data up elsewhere. While the talk was still going on (I think we were on slide 78 of 145) I checked the proceedings to see what this talk would offer in the next hours. Unfortunately, it was about to continue like that.
That was when we left the hall and went over to the “Hybrid MPI & OpenMP Parallel Programming” lecture.

And wow, that was awesome. We got there pretty late – it must’ve been 16:30 – but we were basically stunned by the ideas. MPI and OpenMP both have their advantages and flaws. I had certainly thought about combining both technologies, but never did, for lazy personal reasons. And then those guys just did it: awesome. In general, if you’ve got a cluster of SMP systems with multiple sockets and multiple cores, you should run MPI across the outer computing domain and OpenMP on the socket or core layer. This ain’t new, but they gave me some insight into why you should do it and what pitfalls may arise.

In the end, I thought I should have stuck with the last tutorial in the first place. Which brings me back to my first complaint: there was no real schedule, which is sad. We couldn’t decide which tutorial to attend first since we had no idea when all those lectures were taking place. If the ISC keeps offering these tutorials, they should improve their schedule.

Then again, since my colleague and I were both attending the CUDA talk, we elaborated on the spot how we could use CUDA in our use cases. We only got rough ideas, but sitting together and listening to that talk wasn’t a waste of time.
It brought us together.

Now I’m back home and don’t feel too bad about the ISC lectures – now I feel sad that I’m not able to attend the rest of the ISC. But I’ve got to work on Monday. Which is in… omg, in about 8 hours.

Good night! I’ll be adding links to this post tomorrow evening.

Getting Started with HPC Performance Tools

February 21, 2010

The NCSA is offering a free web-based seminar on getting started with performance tools. The National Center for Supercomputing Applications, based at the University of Illinois, is renowned for its expertise in the HPC environment. Their courses and trainings are among the best in the world.

From their website:

Event Title: Getting Started with Performance Tools
Speaker: Galen Arnold, NCSA
Start Date: 2010-02-25
End Date: 2010-02-25
Start time: 13:30:00 Central Time
End time: 15:00:00 Central Time
Location: NCSA
Event Type: Webinar

Registration is required for this event. Please complete the Registration Form.

This webinar will provide an introduction to performance tools and techniques. A common application, High Performance Linpack (HPL), will be analyzed with profiling tools from a high level progressing down to how the code is mapped onto hardware. To do this, HPL will be analyzed with profiling tools for both user and system time and then a representative component of HPL (matrix multiply) from a near-the-hardware vantage point will be used to show how it can be tuned. Finally, emerging trends in performance tool development will be described.

This tutorial is intended for users with basic parallel programming experience who are new to the performance engineering process.

I signed up. Would you?

Register here.

Chris quits.

February 21, 2007

That made me sad. I just read that Chris, the guy behind HPC Answers, is going to stop blogging.

Chris, we’ll miss you. You’re one of the exceptional guys who really know what they’re talking about. There ain’t that many people blogging about HPC, and you’ve got a hell of an expertise.
Make sure your legacy is well archived in some safe location; your “Answers” were true and indeed very useful answers.

I wish you all the very best for your new job, and I hope you drop by every now and then – virtually here, or physically in Germany if you ever happen to be around.

It was an honor to read your blog. Blogging about HPC topics won’t be the same without you.

Good luck!



CUDA SDK available for free download

February 19, 2007

Via Heise Newsticker:

NVIDIA finally released their CUDA SDK for free download, meaning no mandatory registration is necessary anymore.

Unfortunately the SDK is neither free (as in speech) nor open, but it is free as in free beer. Lots of GigaFLOPS for the masses – I expect lots of distributed computing projects to take advantage of it.


[Download] CUDA Programming Guide Version 0.8 (.pdf)
[Download] CUDA Toolkit Version 0.8 Release Notes (.txt)
[Download] CUDA BLAS Library Version 0.8 Reference Documentation (.pdf)
[Download] CUDA FFT Library Version 0.8 Reference Documentation (.pdf)

Complete Install Packages Including Documentation
[Download] Installer for CUDA Toolkit Version 0.8 and CUDA SDK Version 0.8 for Linux X86 32-bit [Red Hat Enterprise Linux 4 (Nahant Update 3)]
[Download] NVIDIA Linux Display Driver Version 97.51 for CUDA Toolkit Version 0.8
[Download] Installer for CUDA Toolkit Version 0.8 and CUDA SDK Version 0.8 for Windows XP (32-bit)
[Download] NVIDIA Windows Display Driver version 97.73 for CUDA Toolkit Version 0.8


Louisiana State to offer course on HPC in spring 2008

January 26, 2007

Via Supercomputing Online:

Thomas Sterling, who nowadays teaches at LSU, is preparing a course on supercomputing for spring 2008. They’re going to broadcast the lectures in high-definition TV to other universities over the internet. He’s also working on a textbook on the topic, and they’ll offer the course on DVD later.

Interesting – makes me want to go back to uni again.


SUN announces new HPC language

January 14, 2007

Via Heise Newsticker:

SUN just announced its new programming language, “Fortress”, which is supposed to be the successor to FORTRAN. The emphasis lies on parallel computing.


Fortress is a new programming language designed for high-performance computing (HPC) with high programmability. In order to explore breakaway approaches to improving programmability, the Fortress design has not been tied to legacy language syntax or semantics; all aspects of HPC language design have been rethought from the ground up. As a result, we are able to support features in Fortress such as transactions, specification of locality, and implicit parallel computation, as integral features built into the core of the language. Features such as the Fortress component system and test framework facilitate program assembly and testing, and enable powerful compiler optimizations across library boundaries. Even the syntax and type system of Fortress are custom-tailored to modern HPC programming, supporting mathematical notation and static checking of properties such as physical units and dimensions, static type checking of multidimensional arrays and matrices, and definitions of domain-specific language syntax in libraries. Moreover, Fortress has been designed with the intent that it be a “growable” language, gracefully supporting the addition of future language features. In fact, much of the Fortress language itself (even the definition of arrays and other basic types) is encoded in libraries atop a relatively small core language.

A reference implementation (an interpreter written in Java) is available at the project’s homepage under the BSD license.

Haven’t looked into it yet, but I definitely will. Stay tuned for updates.


HPC roundup

December 27, 2006

It’s been quite some time since I last blogged. I’m currently committed to a project where we’re deploying a couple of new components at a German telco: a new cluster, a new network management system and some kind of Layer-7 proxy. This keeps me busy, so I apologize to my regular readers for the lack of updates. Here’s a little roundup of the things that happened in the last 6 weeks or so.

OK, let’s get started. First there was SC06, quite a happening, which I missed this year. Interesting hardware included the Intel SR1530 systems, for example: eight bloody cores in one 1U case. Nifty! SiCortex announced a 5832-MIPS-core system for the masses – the SC5832 offers 5.8 TFLOPS (peak) at just 20 kW of power consumption through the use of power-optimized MIPS64 cores. NVIDIA showed off CUDA, a toolkit for offloading computations to the GPU, and Dell announced systems with quad-core Opterons. And there was news about IBM’s 1350 hybrid CBE blade system.

On the HPC/SC business side there wasn’t much, in my opinion. First, NEC announced a cooperation with SUN, which I already covered earlier. Unfortunately NEC still hasn’t commented on my questions – maybe I’m not worthy enough. Cray fortified its DARPA commitments by winning another 250-million-USD contract for its Adaptive Supercomputing Initiative. Bull sold another 43-TFLOPS supercomputer to the French (CCRT). Yeah, and there was that nifty supercomputer in a chapel, Mare Nostrum in Spain. Quite a location for a supercomputer!

On the software front we had SUN’s announcement of the new Sun Grid Engine 6.1, and XenSource unveiled a couple of new virtualization products.

Conference-related stuff: ISC 2007 issued its call for papers, and SC06 released its BOF-session slides.

So how was 2006 for the HPC business? HPCwire has a round-up.

OK, that’s it for now. Merry belated Christmas and a happy new year – yours truly,
Alexander Janßen.

NEC and SUN team up for hybrid supercomputers

November 21, 2006

Via NEC:

NEC and SUN are teaming up to build hybrid supercomputers: they’re going to blend SUN’s Fire servers with NEC’s SX-series supercomputers to form a “hybrid supercomputer”.

From their press-release:

“Hybrid” supercomputing solutions provide a superior benefit for customers who wish to utilize both vector and scalar computing environments based on the suitability of customer codes. This solution also provides the capability to share data between vector and scalar computing environments.

They also committed to several other agreements; NEC will also take the role of integrator for those hybrid supercomputers.

I’d really like to see how they’re going to mash those two pretty different systems together. NEC has its own method of connecting SX-8 nodes, the Internode Crossbar Switch (IXS), whereas SUN Fire servers are regular SMP machines connected by InfiniBand or, more traditionally, by Gigabit Ethernet.

If they want “to share data between vector and scalar computing environments”, I’d say they have to couple those pretty different architectures tightly. How to do that? I don’t know yet.

However, the TOP500 already mentions a machine from SUN, an Opteron x4600 cluster, which is supposed to be a NEC/SUN combination. The NEC portion is the storage subsystem: they delivered their iStorage S1800AT for use with SUN’s x4600. Interestingly, they also utilize ClearSpeed’s accelerator boards, which deliver up to 50 GFLOPS per board at only 25 watts. The supercomputer is ranked 9th in the list.

Green500 – new supercomputing benchmark site

November 19, 2006

Via Blog:

There was a talk about HPC power consumption at SC06 during the BOF session; they pointed out that there’s a Green500 list where supercomputers are not just ranked by their peak performance but also compared by the total power consumed. The whole thing isn’t new – it was already presented at HP-PAC – but it somehow stayed under my radar.

Interestingly, they don’t only compare FLOPS/Watt but also introduce a power benchmark known from circuit design, the ED^n metric (“E is the energy being used by a system while running a benchmark, and D is the time taken to complete that same benchmark.”), and adapt it even further into the so-called ∂-metric, which gives the user the possibility to put more emphasis on either energy or performance [1]. The paper [2] also has a table comparing several cluster setups under different benchmarks.
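As I understand it, the circuit-design metric is simply energy multiplied by a power of delay; my paraphrase of the definition quoted above:

```latex
% Energy-delay metric family, using the quantities quoted above:
%   E = energy consumed by the system while running the benchmark
%   D = time (delay) taken to complete that same benchmark
% A larger exponent n puts more emphasis on performance; n = 0 ranks
% by energy alone. In all cases, lower is better.
\[
  \mathrm{ED}^{n} \;=\; E \cdot D^{\,n}, \qquad n = 0, 1, 2, \dots
\]
```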

It has been my opinion for a long time that clusters and supercomputers will have to be optimized for high performance/Watt and low Watt/space, since energy and space will become even more expensive in the future. I’m glad to see that serious people are investigating that topic.

The more I read about SC06, the more I pity that I couldn’t make it. Maybe next year…

[1] R. Ge, X. Feng, and K. Cameron. Improvement of power-performance efficiency for high-end computing. In The First Workshop on High-Performance, Power-Aware Computing (HP-PAC), Apr. 2005.
[2] S. Sharma, C.-H. Hsu, and W. Feng. Making a case for a Green500 list.


IBM Cell-based blade system now available

September 12, 2006

Via Supercomputing Online:

Finally some good news, so that we can all forget about the TOR craze for a moment. Supercomputing Online reports that IBM today announced the availability of the IBM BladeCenter QS20, a Cell Broadband Engine-based addition to existing IBM infrastructure:

“The IBM BladeCenter QS20 is a Cell BE-based blade system designed for businesses that can benefit from high performance computing power and the unique capabilities of the Cell BE processor to run graphic-intensive applications and is especially suitable for computationally intense, high performance workloads across a number of industries including digital media, medical imaging, aerospace, defense and communications.

The IBM BladeCenter QS20 extends and deepens IBM Power Architecture technology and is complementary to our existing rack-optimized and blade server products based on Intel Xeon, AMD Opteron, and IBM POWER processors.”

One double-wide blade holds two Cell BE processors running at 3.2 GHz and is to be seen as an extension of IBM’s System Cluster 1350. The datasheet mentions 1 GB of RAM (512 MB per processor – does that mean both CPUs work individually rather than in SMP? I think it’s just a confusing detail in the datasheet), a 40 GB hard drive and two Gigabit Ethernet NICs. Optionally, one or two InfiniBand 4x adaptors can be connected via PCI Express. The blade runs a Fedora Core 5-based Linux for Cell.

OK, so much for the details; now let’s fire up the mighty oracle mode: is this possibly the hardware basis for the Los Alamos Roadrunner supercomputer we’re all speculating about? Would that mean the rumored CBEs of Roadrunner aren’t tightly integrated co-processors to the Opteron CPUs? Are they just going to deploy a 1350 together with a bunch of QS20s and wire them up via InfiniBand and Gigabit Ethernet?

Now that would be cheap and kind of disappointing, wouldn’t it…? And I’d lose a fiver. Ah, no I wouldn’t – nobody accepted the bet :-) Know more? Are you one of the lucky ones who already deployed a QS20? Tell me! Or use the fancy comment function.
