NASA Appoints Constellation Program Managers

June 15, 2010

This is so sad. What was once probably one of the coolest jobs on Earth, “Constellation Program Manager”, now turns out to be a title that has been deliberately pepped up. NASA News writes:

Lawrence D. Thomas has been appointed manager of NASA’s Constellation Program, which manages the effort to take humans beyond low-Earth orbit and develop the next generation launch vehicle and spacecraft.

Note the emphasis (mine). We wanted to go to the Moon and Mars, and also get back. Orion? Merely an escape vehicle for the ISS, if anything. Ares? Canceled. Altair? Who knows.

Anyway: Congratulations to Lawrence Thomas!


“AI That Picks Stocks Better Than the Pros” – and then what?

June 15, 2010

Over at Technology Review I found this article about an “AI” (you may raise your eyebrows here) that is supposed to be better at stock-market speculation than actual humans. For a brief introduction, let me quote TR:

It’s called the Arizona Financial Text system, or AZFinText, and it works by ingesting large quantities of financial news stories (in initial tests, from Yahoo Finance) along with minute-by-minute stock price data, and then using the former to figure out how to predict the latter. Then it buys, or shorts, every stock it believes will move more than 1% of its current price in the next 20 minutes – and it never holds a stock for longer.
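
To make the quoted trading rule concrete, here is a minimal Python sketch of just the decision step, assuming some model has already produced a price forecast for 20 minutes ahead. The names (`Signal`, `decide`, `must_close`) are hypothetical and this is not AZFinText’s actual code; only the 1% threshold and the 20-minute holding limit come from the quote above.

```python
from dataclasses import dataclass

# Hypothetical sketch of the quoted rule, NOT the AZFinText implementation.
THRESHOLD = 0.01    # act only on predicted moves of more than 1% of the current price
HOLD_MINUTES = 20   # never hold a position longer than this


@dataclass
class Signal:
    ticker: str
    price_now: float
    predicted_price: float  # assumed model forecast for HOLD_MINUTES from now


def decide(signal: Signal) -> str:
    """Return 'buy', 'short' or 'hold' for a single prediction."""
    change = (signal.predicted_price - signal.price_now) / signal.price_now
    if change > THRESHOLD:
        return "buy"    # expected rise of more than 1%
    if change < -THRESHOLD:
        return "short"  # expected fall of more than 1%
    return "hold"       # predicted move too small to act on


def must_close(opened_at_minute: int, now_minute: int) -> bool:
    """A position is closed once HOLD_MINUTES have passed since it was opened."""
    return now_minute - opened_at_minute >= HOLD_MINUTES


if __name__ == "__main__":
    print(decide(Signal("ABC", 100.0, 101.5)))  # buy  (+1.5%)
    print(decide(Signal("XYZ", 50.0, 49.8)))    # hold (-0.4%)
```

Everything interesting (turning news text into `predicted_price`) is deliberately left out; the point is only how mechanical the trading side becomes once a forecast exists.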

TR points out that analyses similar to the described algorithm have existed since the 1990s. However, the new system doesn’t actually parse all of the text; it concentrates on a few keywords that seem to be relevant.

I see two very odd flaws there.

  1. A good AI predicting the stock market from human-written text, which technically *anyone* who could afford it could use, would lead to a situation where stocks keep heating up. Speculation would grow rapidly, and positive feedback loops could run into overdrive. I wouldn’t argue for a ban on such software, but for full disclosure whenever it was used for a particular bid. This would help with debugging such situations and give legislators something to think about once the shit has already hit the fan.
  2. If the algorithm really concentrates on keywords in context rather than analyzing the whole text, I bet a fiver it wouldn’t take even a few weeks until some clever consulting company had analyzed the algorithm and come up with a process for tweaking your financial reports so that AZFinText favours them. Think of the stock-market equivalent of a Google bomb.
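
A toy illustration of why keyword-focused scoring invites gaming: the Python sketch below scores text purely by counting a few hypothetical keywords with made-up weights (AZFinText’s real features aren’t described in the article), so padding a report with favourable buzzwords lifts the score without changing a single fact.

```python
import re
from collections import Counter

# Made-up keyword weights for illustration only; not AZFinText's actual features.
KEYWORD_WEIGHTS = {
    "record": 1.0,
    "growth": 1.0,
    "beat": 1.5,
    "loss": -1.5,
    "lawsuit": -2.0,
}


def keyword_score(text: str) -> float:
    """Sum the weights of known keywords; every other word is ignored."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return sum(weight * words[word] for word, weight in KEYWORD_WEIGHTS.items())


plain = "Quarterly loss widened; revenue flat."
stuffed = ("Quarterly loss widened; revenue flat. Record customer growth, "
           "growth in growth markets, beat expectations on growth initiatives.")

if __name__ == "__main__":
    print(keyword_score(plain))    # -1.5
    print(keyword_score(stuffed))  # 5.0
```

With these made-up weights the honest sentence scores -1.5 and the padded one 5.0, even though the bad news is still in there. That is exactly the kind of weakness a “fiscal report tuning” consultancy would look for.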

Nobody in Technology Review’s forum seems worried about the real-life implications… I suppose I’m just pointing out the obvious, and the stock-market professionals have already made up their own minds.


Back from the ISC’10 Tutorials

May 31, 2010

I’m just back from the ISC’10 tutorial sessions. Getting from Düsseldorf to Hamburg and back in one day is a pretty harsh thing to do. For a start, the A1 was basically just a chain of construction sites, which made getting there quite a hassle. In other words, I got there just in time.

We arrived at the registration desk at 13:20 sharp; the tutorials would start at 13:30. Registration was smooth: give yer name and company, grab your badge, the WiFi details, a map showing how to get from the CCH to the university, and a schedule for all the tutorials.

The first thing we didn’t like: the schedule was divided into tracks (a CUDA track, an InfiniBand track, and so on), with a lot of lectures in each track but no timeline saying when the individual talks would start! That made it hard to decide what to attend first.

We grabbed our stuff and took a really short walk, probably 5 minutes, to the venue. The building was typical: a huge, massive 1900s building with a lot of stairs, but eventually we made it into lecture room B, where we wanted to hear the CUDA tutorial.

Unfortunately, the CUDA talk didn’t offer anything new. I’m fairly sure the slides of this tutorial had already been used in an NVIDIA webinar about CUDA that I attended last year! It was a real déjà vu, and I think they just changed the date on the slides. Gernot Ziegler started off the tutorial, and I thought this would be “big time”, since we’d get the chance to hear someone who’s really into CUDA; he’s with NVIDIA, after all. However, he handed the tutorial over to John Stratton, who did quite a decent job but was rather unlucky, because he had to reiterate a presentation I had already enjoyed in last year’s webinar.

Ah, what a wasted opportunity. See, I’d expect more from the ISC when it comes to CUDA than a rehash of what had already been presented through less specialised channels!

My colleague and I stuck with the talk until the long break. Some cool things were said about cleverly abusing the texture buffer to do really local multidimensional computations, but unfortunately it wasn’t elaborated enough for my taste.

After the long break we decided to go to the InfiniBand talk. Since we deploy large installations, we hoped it would give us some insight into how to deploy InfiniBand and how to draw up concepts for migrating from Ethernet to InfiniBand. Sadly, the talk was just a roundup of the available vendors, their products and performance comparisons. That’s not really what we expected; we could have looked that data up elsewhere. While the talk was still going on (I think we were on slide 78 of 145), I checked the proceedings to see what the talk would offer over the next few hours. Unfortunately, it was going to continue just like that.
That was when we left the hall and went to the “Hybrid MPI & OpenMP Parallel Programming” lecture.

And wow, that was awesome. We got there pretty late, it must have been 16:30, but we were stunned by the ideas. MPI and OpenMP both have their advantages and flaws. I had certainly thought about combining the two technologies but never did, for lazy personal reasons. And then those guys just did it: awesome. In general, if we have a cluster of SMP systems with multiple sockets and multiple cores, we should run MPI across the outer computing domain and OpenMP on the socket or core layer. This isn’t new, but they gave me some insight into why you should do it and what pitfalls may arise.
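
Real hybrid code would be C or Fortran with MPI and OpenMP, which can’t be run here; as a hedged stand-in, this Python sketch only computes the two-level work split described above: an outer block per MPI rank and, inside it, an inner block per OpenMP thread. The function names are hypothetical. In actual code each rank would obtain its index via `MPI_Comm_rank`, and the inner split would usually be left to something like a `#pragma omp parallel for` with static scheduling, which divides a loop into roughly equal contiguous chunks.

```python
def block_range(n: int, parts: int, index: int) -> range:
    """Split range(n) into `parts` contiguous, nearly equal blocks; return block `index`.

    The first n % parts blocks get one extra element each.
    """
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return range(start, stop)


def hybrid_layout(n: int, ranks: int, threads_per_rank: int) -> dict:
    """Map (rank, thread) -> index range for a two-level decomposition.

    Outer level: one block per MPI rank (e.g. one rank per node or socket).
    Inner level: that block is split again among the rank's OpenMP threads.
    """
    layout = {}
    for rank in range(ranks):
        outer = block_range(n, ranks, rank)
        for thread in range(threads_per_rank):
            inner = block_range(len(outer), threads_per_rank, thread)
            layout[(rank, thread)] = range(outer.start + inner.start,
                                           outer.start + inner.stop)
    return layout


if __name__ == "__main__":
    for key, r in hybrid_layout(10, ranks=2, threads_per_rank=2).items():
        print(key, r)
```

For 10 elements, 2 ranks and 2 threads per rank, this yields the end-exclusive blocks [0, 3), [3, 5), [5, 8) and [8, 10). The appeal of the hybrid model is visible even in this toy: only the outer split implies message passing between nodes, while the inner split stays within shared memory.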

In the end, I thought I should have gone to this last tutorial in the first place. Coming back to my first complaint: there wasn’t a real schedule, which is a pity. We couldn’t decide which tutorial to attend first, since we had no idea when all those lectures were taking place. If the ISC continues to offer these tutorials, they should improve their schedule.

Then again, since my colleague and I were both attending the CUDA talk, we worked out on the spot how we could use CUDA in our use cases. We only came up with rough ideas, but sitting together and listening to that talk wasn’t a waste of time.
It brought us together.

Now I’m back home and don’t feel too bad about the ISC lectures; instead, I feel sad that I can’t attend the rest of the ISC. But I’ve got to work on Monday. Which is in… omg, in about 8 hours.

Good night! I’ll add links to this post tomorrow evening.


Supercomputing from scratch + pay: Awesome job

May 27, 2010

Pacific Northwest National Laboratory

PNNL has just recently hired Adolfy Hoisie, winner of the Gordon Bell Award back in 1996. He’ll be doing pretty interesting stuff: designing supercomputing from scratch, in both hardware and software. This is an amazingly interesting task, since you can drop every burden, every hardware and software restriction you ever had. No more tailoring your software to fit the buffers into your cache lines. I mean, that’s just: wow. Not that I’m envious or something, I really like my job, but getting a grant for a task like this is like winning the lottery.
Adolfy, come on, show us how it’s done! Awesome.


A Better Rice For The World

May 19, 2010

The Nutritious Rice for the World (Rice) project, a World Community Grid BOINC project, ended a few weeks ago. BOINC (Berkeley Open Infrastructure for Network Computing) is a non-commercial program and infrastructure that allows volunteers to donate their computers’ spare computing resources to very interesting, computing-intensive scientific projects. Many people around the world contributed their CPU resources to help figure out the structure of proteins from the most common strains of rice. In the end, about 25,761 years of CPU time were contributed to the project. IBM contributed heavily to this project through its World Community Grid (WCG) program, offering Rice a massive user base and community.

Rice is one of the most common foods in many parts of the world. It’s in all our interest to find varieties and breeds of rice that are more nutritious or more resistant to pests; the project’s goal is to find out which varieties of rice to interbreed with others to give the best results, so that we’ll get new strains of rice that are harder, better, faster, stronger.

Ram Samudrala


A lot of BOINC users who contributed to the project (like me) are now asking themselves a lot of questions. Who are the people behind the scenes? How much work does it take to get a project like this into operation? What was IBM’s role? What will happen with the contributed results? And above all, who will benefit from the project?

I think no one can answer these questions better than Ram Samudrala, PhD and Principal Investigator of a computational genomics research group at the University of Washington. Rocker, scientist and Emacs admirer, he was kind enough to answer some of my questions about the project.

Tell us a little about yourself and how you got involved in the Rice-project.
Ram: I’m a professor researching computational biology at the University of Washington Seattle. My overarching interest has been to understand and model how the genome of an organism (genotype) specifies its behaviour and characteristics (phenotype). We develop computational algorithms to this end that are applied to whole genomes and we work on many organisms. Rice was specifically chosen since our collaborators at the Beijing Genomics Institute had just finished sequencing it (and we annotated the refined version) and I also got a $1.9 million grant from the US National Science Foundation (NSF) to predict the structure and functions of all proteins encoded by the rice genome. We developed algorithms to do this and we applied them to all rice proteins. Then IBM came along and offered us the means to redo some of our calculations on the most difficult proteins using the WCG and then we ported our code over to work on the Grid.

When was the first time you considered using voluntary distributed computing for your project?
Ram: Since the days of SETI@home, and since we built our own local clusters to do structural computational biology, but porting our code to BOINC was always an inertial challenge.

Did you consider using DC infrastructures other than BOINC, like distributed.net? If yes, why did you decide to use BOINC?
Ram: No, we used BOINC since it was what was supported by IBM WCG.

Have you considered asking the NCSA for computing resources?
Ram: Yep, but it’s a cumbersome process, like applying for a grant, and again, porting software to work on different architectures. The barrier is that we get grant money to do research and not develop software. I have used NIST supercomputing resources in the past.

You said you would need 200 years of computing time using your available resources. Besides voluntary distributed computing and the University of Washington, were there other universities or institutes directly contributing computing-resources to your project?
Ram: Not for this project, no.

Rice BOINC Splashscreen


You were using algorithms from the Protinfo website. Which one did you actually use, and how much effort did you put into customizing it for BOINC? Can you tell us whether those algorithms and their implementation are released under a free license?
Ram: It’s the Protinfo AB algorithm, which is our ab initio or de novo simulation protocol. IBM spent a fair amount of time porting the code to work with BOINC. The original algorithms/software are all freely available without any claim of copyright (i.e., in the public domain).

Could you explain “de novo” and “ab initio” for non-scientists, please?
Ram: “De novo” and “ab initio” are generally translated to mean “from first principles”. In the old days, this used to mean using pure physics energy potentials for protein folding. These days, to us, it means any set of general principles that is not biased to a particular protein or organism.

If the algorithms you used are under a free license, have you already published the modifications, if there are any?
Ram: The modifications involving the porting are with IBM and they are unpublished.

(Ed. note: Since the software was released in the public domain there’s no requirement to publish the modifications.)

IBM helped you customize the protein-prediction algorithms for various platforms. Can you tell us how much they contributed?
Ram: All the customisation was done by IBM engineers. We just gave them the original software and ran sanity checks on the output. I’m a strong free software and anti IP proponent, to the degree that I encourage commercial use without restrictions on the software (people can always use the public domain versions if they want to).

Rice Terraces by Flickr user ~MVI~


How much time did you save by using the World Community Grid’s infrastructure compared to setting it all up on your own, like other projects do?
Ram: IBM took about six months or so to port our software, so I presume it would’ve required that kind of an investment. Keep in mind that they had a lot of prior experience with BOINC. IBM now maintains the code and does the PR and runs the predictions for us. I’d say this would be a full time programmer/sysadm type of person and if I had that extra money, I’d rather spend it on someone doing the basic research.

If there are flaws in BOINC, which would you like to see addressed first?
Ram: I can’t think of any in the way we did it with IBM, but without IBM, the PR machine has to be powerful to get people on board. It’s more than just recruiting people, but also motivating them as IBM does with badges and giving them a sense of community and providing a support infrastructure. This is hard for a research lab to do on their own (it can be done, but whether that is really the best use of our talents is the question).

Programming and debugging is an iterative process. Looking at your source code repository, how many releases of the software were necessary until you got the cow flying?
Ram: For this case, internally we probably had about 10 or so iterations in total, but the basic science part of the software is something that has evolved over 18 years.

How did you do beta-testing, did you use the publicly available beta-projects at WCG? Or, were you actually just doing it in your lab?
Ram: It was mostly in our group. We just submitted sequences for which we knew the answers and we did a dry run initially with the same sequences.

I’m curious here: were these structures predicted by other algorithms, or was that done the hard way, using X-ray crystallography?
Ram: These were done the hard way, at the bench. These are our gold standard for when we know we’re right or wrong, so we benchmark our methods against all this. When we did the rice project, we did sequences with known answers to see how well things would work and that there was no chance of anything going wrong.
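
The interview doesn’t say which metric the team used to compare predictions against the bench-determined structures; a common choice in structure prediction is the root-mean-square deviation (RMSD) of atom positions. As a minimal sketch under that assumption, and assuming both structures have already been superimposed (a real pipeline would first align them, e.g. with the Kabsch algorithm), the calculation could look like this in Python:

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(predicted: Sequence[Coord], experimental: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally long lists of 3D coordinates.

    Assumes both structures are already superimposed; no alignment is done here.
    """
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (px - ex) ** 2 + (py - ey) ** 2 + (pz - ez) ** 2
        for (px, py, pz), (ex, ey, ez) in zip(predicted, experimental)
    )
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    known = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]  # toy "bench" structure
    guess = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]  # every atom 1 unit off in z
    print(rmsd(guess, known))  # 1.0
```

The lower the value, the closer a prediction is to the experimentally determined “gold standard”; a prediction identical to the known structure scores 0.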

Dr. Ling-Hong Hung


What was it like getting in touch with the community? Was the feedback helpful? How many people from your team were actually dealing with the community?
Ram: At its peak, we had 3 people dealing with the community, our sysadm and project lead Michal Guerquin, our programmer and scientist Ling-Hong Hung, and myself. Opening our software to the Grid and the community definitely presented some challenges, which I believe will be the focus of our first paper. An interesting tangent of that is that we’ve had to port some of our analysis software to work on GPUs so we could handle all this data. So some good technological developments here that we’ll be writing about shortly.

Michal Guerquin


A lot of people are concerned about “Frankenfood”. Your project’s website explicitly states that this is not about genetic engineering, but about finding the most nutritious rice-strains for interbreeding with other rice-crops. Is there anything you’d like to explain to people who are still concerned?
Ram: We’re simply extending what farmers have been doing for millennia in a more rational way, and also what has been going on in nature for billions of years. The problem to us is scientific and all knowledge that is produced (which from our end will be completely free and transparent) can be used in various ways according to the will of the people. But we have governments and politicians to handle the deeper societal implications. What I mean by this is that people should petition their representatives, as they are doing successfully in many parts of the world, to decide where to go with genetically modified organisms, which I see as ultimately having a socioeconomic/political solution.

Your project is one of the very few with a fixed end; almost all other projects keep handing out work units for new phases. How come you’re finished now? Has everything in the rice genome now been analyzed from a computational point of view, with nothing left to do?
Ram: Not at all. We obtained a huge amount of data and we’re now pressed to analyse it. I honestly can say that we were overwhelmed with this data. My goal as a scientist though is not just to develop technical tools and produce large tables and graphs but try to come up with something tangible that is prioritised and can be tested at the bench that really changes the make up of rice in a desired manner. The computations and the Grid are the means by which we arrived at this step, but our job now is to figure out where the best low hanging fruit is in collaboration with rice researchers (which we are doing with researchers around the world including IRRI, Philippines). [Ed. note: IRRI, International Rice Research Institute]

Focusing on the data: now that you know what those proteins really look like, where do you draw the line and say “this protein is more nutritious than others”? My basic understanding is that the nutritious parts of rice are actually carbohydrates (starch), proteins and some fat. How should I imagine this analysis?
Ram: So the proteins we’re talking about are gene products that carry out almost all the functions in rice (or any other organism). So we use the word protein to refer to a molecule that does this, rather than the nutritional use of the word “protein”, which refers to these biological molecules broken down and aggregated (see “Protein” and “Protein (nutrient)” in Wikipedia).

By nutrition we mean anything that leads to a higher range of bioavailable substances like dietary minerals and vitamins. In rice, examples include elements like iron or organics like vitamin A. Incidentally, the “golden rice” GMO is a product of Monsanto that has higher beta-carotene, a precursor to vitamin A (“Golden Rice” at Wikipedia). We’d like to get to something like that by crossbreeding without the use of genetic engineering, working on both micro and macronutrients.

So in the end, we need to be able to create a rice strain that has enriched nutrients and is perhaps better than current strains in terms of yield and/or hardiness. Before we go off and start crossing rice, there are a number of molecular biology bench experiments that can be done to say whether the predictions we make about the activity of certain proteins are correct, so we’d do those first.

Do you plan to publish all your results in an Open Access Journal?
Ram: Yep, that would be the ideal. Publishing in Open Access Journals also sometimes costs money. I’m not a big fan of the “pay to publish” model; it’s not a lot of money and some scientists have grants to do this, but it’s not a good principle.

Thank you very much for this interview!
Ram: Thanks; I enjoyed the questions!

Dr. Ram Samudrala is a tenured Professor at the University of Washington, Seattle. He’s head of the Nutritious Rice for the World project and one of the developers of the Protinfo protein structure prediction algorithms. He’s a prolific author of scientific papers and generally a very nice guy I’d like to buy a drink.


This work or content is licensed under a Creative Commons license.
The rice picture is copyrighted and CC-BY-SA by Flickr-user kadaoor.
The rice-paddy picture is copyrighted and CC-BY by Flickr-user ~MVI~.
The pictures of the team members were used by permission of the Rice team.
The BOINC splashscreen is copyrighted by IBM and the World Community Grid and was used with permission.


Virtual School of Computational Science and Engineering offering new courses

May 13, 2010

NCSA

The VSCSE, the Virtual School of Computational Science and Engineering, is once again offering courses. This time they have added quite a few new sites where you can attend the courses: 21 sites all over the US are now available as classrooms. The VSCSE is provided and funded by the Great Lakes Consortium for Petascale Computation (GLCPC), the National Science Foundation (NSF), the State of Illinois, the Committee on Institutional Cooperation (CIC), and Internet2 Commons.

Their press-release:

Want to learn how to use graphics processors for scientific computing? Scale your parallel code to tens of thousands of CPU cores? Deal with ginormous datasets? The Virtual School of Computational Science and Engineering offers these courses and more during its summer program for 2010!

Since 2008, nearly 250 students and researchers have participated in the annual Summer School offered by the Virtual School. During Summer School, students learn new techniques for applying high-performance computing systems to their work. Due to overwhelming demand for courses in previous Summer Schools, we have added 15 sites (for a total of 21 sites) to the 2010 program in order to accommodate additional students. For each course, students attend on-site in one of 10 state-of-the-art, distributed high-definition (HD) classrooms, located at academic and research institutions across the country. These HD classrooms are equipped with live, high-definition videoconferencing technology that provides a high-quality learning experience.

Students attend technical sessions presented by leading researchers in computational science and engineering and use cutting edge, high-performance computing systems provided by TeraGrid resource providers. Course participants apply the techniques learned in hands-on lab sessions, assisted by skilled teaching assistants who work one-on-one and in small groups to answer questions and solve problems posed during the sessions. This summer’s courses are:

The cost for each course is only $100. To participate, prospective students must first be enrolled in the Virtual School. Enrollment is free and can be completed at https://hub.vscse.org/. After enrolling, students select their courses and indicate which of the distributed HD classrooms they would like to attend.

Snacks and an evening reception will be provided; participants are responsible for travel and lodging costs (low-cost dorm accommodations will be provided where possible). Because of the large geographic diversity of participating sites, it is likely that little travel will be required.

For no additional cost, on-site participants can take online short courses on MPI, OpenMP, and CUDA that are designed to help them meet course prerequisites.

For more information on the 2010 courses, including the sites participating in each course and details on enrollment, go to: www.vscse.org/summerschool/2010


OpenRheinRuhr 2010 – Call for Papers

April 14, 2010

The second OpenRheinRuhr, whose organization team I happen to belong to, will take place on Saturday, 13th and Sunday, 14th of November 2010. We have just published our Call for Papers, so if you’d like to give a lecture or organize a workshop, you’re invited to submit a paper.

Here is the Call for Papers in its full glory:

OpenRheinRuhr 2010 – Call for Papers

Trivia: “Pott” is short for “Ruhrpott”, a slang-term for the Ruhr-Area. Also, Pott is German slang for a cooking-pot. Hence the slogan “A pot full of software”.

When: Saturday, 13th and Sunday, 14th of November 2010
Where: Rheinisches Industriemuseum, Oberhausen, Germany

The second OpenRheinRuhr in Oberhausen will follow on from last year’s success in Bottrop. The OpenRheinRuhr is an exhibition and conference dedicated to all free software and internet policy related topics. Users, IT experts, and decision makers will get the chance to meet up and talk in workshops, lectures, and at their booths. Companies are invited to present themselves and their free-software products and services.

Call for Papers

The OpenRheinRuhr invites users, developers, administrators, IT decision makers and civil rights activists to submit papers, workshops and talks about the following topics:

  • New Developments in Free Operating Systems and Applications
  • Desktop and Graphics
    • Multimedia
    • Office Applications
    • Synchronisation with Mobile Devices
  • Internet, Web-technologies
    • Emerging Network Technologies like IPv6 and Multicasting
    • Content Management Systems
    • Workgroup Solutions
  • Community Projects
  • System Administration Topics
    • Virtualisation and Migration
    • Deployment and Configuration Management
    • Scripting and Utilities
  • Business Applications
    • Data Warehouse, ERP and CRM-solutions
    • Configuration and Asset Management Solutions
  • Security, Privacy and Anonymity
    • Anonymisation Technologies
    • VPN and IPSec Solutions
    • Secure Programming and Administration of Applications
  • Law and Licenses

Premises

The Rheinisches Industriemuseum is very close to Oberhausen’s central station. The exhibition area is 790 square meters in total. Three rooms for up to 100 people are available for talks and workshops.

Submission of papers

Please submit your papers by the 12th of September using our online form, follow its instructions, and additionally provide the following details:

  • a short abstract of your submission (one paragraph)
  • a detailed description of the topic of your talk (maximum three paragraphs)
  • the intended audience of your submission (users, sales, administrators, marketing)
  • the skill-level of your talk (easy, advanced, expert)

Language of submissions

The audience is almost exclusively German or German-speaking. Submissions in the English language are not banned, but should be limited to rare exceptions. Submissions in languages other than German or English will not be accepted.

Schedule

Talks will last 42 minutes. The organisation team will agree on the individual duration of workshops.

Formats and Licenses

The papers should be submitted as OpenOffice, TeX, DVI or PDF files. In the spirit of the Open Access movement, the OpenRheinRuhr suggests publishing your papers under the terms of the GNU Free Documentation License or a Creative Commons license.

Other details needed:

  • full name of the submitter
  • email-address of the submitter
  • a short self-description of the submitter (picture of yourself, short CV, background)
  • what equipment is needed (flipchart, video-beamer, etc.)

Anonymous submissions are also accepted.

The OpenRheinRuhr is looking forward to interesting papers. In case you need any help with your submission, get in touch with us: vortrag@openrheinruhr.de

