Vintage & unusual Computers on E@H Part II

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 742389408
RAC: 871071
Topic 196672

Reviving an old thread here.....

Speaking of unusual computers: inspired by a volunteer who had experimented with running E@H for the Raspberry Pi System-on-a-Chip computer, I did some tests as soon as I got my own RasPi.

The Raspberry Pi is an inexpensive (ca. $35), no-frills computer that fits on a roughly credit-card-sized circuit board. Its heart is a Broadcom chip that stacks together a single-core, 32-bit RISC CPU of the ARMv6 generation, 512 MB of RAM and a fairly powerful GPU. It uses an SD flash card for persistent storage, but via its two USB ports you can attach conventional hard drives as well, together with other accessories like keyboard, mouse, WiFi or 3G dongles. It also has a 100 Mbit Ethernet port and two video output ports: analog composite and digital HDMI. The HDMI port may look like overkill at first but really makes a lot of sense, since the GPU is powerful enough to play back HD content without dropping frames. Audio can be played back via an analog headphone-type jack or via the HDMI port, making the RasPi an inexpensive platform for home media centers. It also has freely programmable input/output ports that make it ideal for experiments in robotics, home automation etc.

There are several different operating systems available for the Raspberry Pi. An Android port is in prototype state, the ancient RISC OS (anyone remember the Acorn Archimedes???) runs on it, but by far the most popular OSes on RasPi are variants of Debian GNU/Linux. This means tons of software are ready to be installed on your RasPi directly from a huge repository. Web servers, firewalls, games, web camera capturing tools...you name it. And of course: BOINC !!

After installing BOINC, the first obvious thing to do is check the benchmark results:

http://einsteinathome.org/host/6209047

Oh oh... the floating point performance is 10 to 20 times worse than that of a moderate desktop CPU (per core!), and this RasPi was already overclocked to almost 1 GHz. Ah well, there are lies, big lies, and benchmarks, so let's give it a try anyway.

The BRP4 app is the easiest to port to a new platform, and indeed only a few modifications were needed to build the app for (and on) the Raspi. It can be used via an anonymous platform app_info.xml file. If enough people care about it I will put the changes into the source code distribution.

The good news: the result validated on Einstein@Home. The bad news: it took almost 10 days (running almost 24/7) to complete, just in time for the 14-day deadline.

http://einsteinathome.org/task/323133122

So it's slow. Is it at least energy efficient? I tried to measure the power consumption of the unit "Kill-A-Watt"-style at the wall socket but got no reading: it was just too low for the meter I used. Given the specs of the board, I think ca. 1.5 W at the wall socket is realistic. At that power, the roughly 10-day E@H task took about 0.4 kWh to complete. A PC with a quad-core CPU or a hyperthreaded dual core can complete 4 tasks in ca. 12 h at (say) 50 W, i.e. 0.6 kWh for 4 tasks, so it needs less than 0.2 kWh per task. GPUs play in a different league altogether, of course. Unfortunately no one has found a way yet to use the GPU on the Raspberry Pi for BOINC.
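For anyone who wants to redo the arithmetic with their own numbers, here is a minimal Python sketch of the comparison. The inputs are just the rough estimates from this post (1.5 W over about 10 days for the Pi, 50 W over 12 h for 4 parallel tasks on the PC), not measurements, and the function name is made up for illustration.

```python
# Rough energy-per-task comparison using the estimates quoted in the post.
# All inputs are guesses from the post, not measured values.

def energy_per_task_kwh(power_watts, runtime_hours, tasks_in_parallel=1):
    """Energy in kWh attributed to a single task."""
    return power_watts * runtime_hours / 1000.0 / tasks_in_parallel

# Raspberry Pi: ca. 1.5 W at the wall, one task in almost 10 days.
raspi_kwh = energy_per_task_kwh(power_watts=1.5, runtime_hours=10 * 24)

# Desktop PC: (say) 50 W, 4 tasks finished together in ca. 12 h.
pc_kwh = energy_per_task_kwh(power_watts=50, runtime_hours=12, tasks_in_parallel=4)

print(f"Raspberry Pi: {raspi_kwh:.2f} kWh per task")
print(f"Desktop PC:   {pc_kwh:.2f} kWh per task")
```

With these inputs the Pi comes out at roughly 0.36 kWh per task and the PC at 0.15 kWh, which is where the "about 0.4" and "less than 0.2" figures come from.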

So why bother? The Raspberry Pi hardware is comparable in performance to that of older smartphones like the iPhone 3. Newer smartphone models are a bit more powerful and have multiple cores, making BOINC on Android something worthwhile to look at. It would probably be possible to unbundle the BRP4 tasks (currently consisting of 8 independent sub-tasks) specifically for slow app versions to make the overall runtime per workunit acceptable, but that is not trivial to implement because for (cross-)validation you have to reunite them.

Also ARM is trying to position itself in the high performance computing world with a focus on energy efficiency with their newer CPUs. So it can't be wrong to have a look at the architecture.

As for the Raspberry Pi, it is definitely NOT (and was never meant to be) a number cruncher. Its main strength is its price tag: it's an expandable piece of hardware for experiments, e.g. people have used it as an on-board flight computer in stratospheric balloon flights, for robot projects, custom-made baby monitors, vintage arcade game emulators etc. It is fun to work with, that's for sure.

http://www.raspberrypi.org

Cheers
HB


Janus
Joined: 10 Nov 04
Posts: 27
Credit: 23862534
RAC: 21

Awesome little project!

You've got the 512MB version. How much memory was allocated to BOINC and BRP4 while they were running? Would it be possible to run this on the old version (or the A version) which only has ~256MB?

Were these speeds achieved with soft- or hardfloat?

Surprisingly the Pi is doing quite fine. I tried the same test with an ARMv7 (Snapdragon S1 in an Android) at 1.1GHz last year and it only did slightly better in the benchmarks and had almost the same run-time.

Quote:
Is it at least energy efficient?


They are using on-board linear regulators for power. This is a good way of wasting power - I think you can bypass it by doing your own 5 V to 3.3 V conversion via the expansion port (especially useful if you are down-converting from PoE anyway) or with soldering skills. The network chip is also quite power hungry at times, I wonder if unloading its driver will cause it to go into standby-mode.

Quote:
Unfortunately no one has found a way yet to use the GPU on the Raspberry Pi for BOINC


And Broadcom says they are not interested in OpenCL on this version of the VideoCore GPU. ARM, on the other hand, has something going for its Mali chip series, and NEON on the newer versions of ARM works wonders with FFTW already.

Quote:
If enough people care about it I will put the changes into the source code distribution.


Please do! Lots of people are using these things for cheap 24/7 monitoring systems etc. - they may as well be running Einstein while doing that. The additional power used by the ARM portion of the SoC is negligible.

Another random piece of info of the day: You can get around 20% extra crunching speed by disabling the frame buffer and console on the RaspPi - it frees up a portion of the GPU secret sauce which otherwise locks a piece of the shared memory to refresh the screen from memory at regular intervals.
Of course you will then have to SSH into it to see what is going on...

Also there is something about using L2 cache and it being possible to double the ARM-to-memory path speed in the GPU if you only use the thing for ARM-based crunching; it gives quite a remarkable speedup both for crunching and for file-system operations. Launching anything 3D in that setup would instantly crash it since only part of the core of the GPU-part of the SoC supports that speed. For servers and crunching, though, this is a pretty nifty trick.

And finally: On the Android it actually took some time for it to do the checkpoints - you can speed it up slightly by doing them less often, at the risk of losing crunched data when/if it crashes.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2968648550
RAC: 690528

A recent UK edition of PC Pro magazine (cover date January 2013, page 60) has an article and links for making a 64-node supercomputer running MPI: Steps to make a Raspberry Pi Supercomputer.

The same article covers installing BOINC on a solar-powered Pi: Searching for Aliens with a Raspberry Pi and the Sun (you'll gather that the sample project chosen is SETI, not Einstein - sorry about that).

But it's nice to see both projects covered in a mainstream consumer/newsstand magazine.

Bikeman (Heinz-Bernd Eggenstein)

Quote:

Awesome little project!

You've got the 512MB version. How much memory was allocated to BOINC and BRP4 while they were running? Would it be possible to run this on the old version (or the A version) which only has ~256MB?


BRP4 needed ca 205 MB, BOINC is rather negligible, so I'm afraid it won't run on the older B or future A version. It might be possible to reduce the memory footprint somewhat tho. We'll look at this in the near future.

Quote:

Were these speeds achieved with soft- or hardfloat?


Hardfloat. I've read about BOINC tests using softfloat and they were 10 times slower than hardfloat.

Quote:

Surprisingly the Pi is doing quite fine. I tried the same test with an ARMv7 (Snapdragon S1 in an Android) at 1.1GHz last year and it only did slightly better in the benchmarks and had almost the same in run-time.


That is very interesting. You might want to try the test again, using FFTW version 3.3 with ARMv7 NEON support (I don't think that one was already available last year). Oh, and I used a custom FFTW wisdom file for the Raspi; without it the BRP4 crunching performance was 20% lower.

Quote:

Quote:
The network chip is also quite power hungry at times, I wonder if unloading its driver will cause it to go into standby-mode.

Like many others I'm using a WiFi USB dongle. If you could find a way to put the Ethernet chip to sleep, that would be very welcome in the Raspi community!! The Model A w/o Ethernet has a much lower power consumption just because of this (the voltage regulator is there just for the Ethernet chip I think, the ARM SoC uses yet other voltages).

Quote:

Quote:
If enough people care about it I will put the changes into the source code distribution.

Please do! Lots of people are using these things for cheap 24/7 monitoring systems etc. - they may as well be running Einstein while doing that. The additional power used by the ARM portion of the SoC is negligible.

Ok, will do soon.

Quote:


Another random piece of info of the day: You can get around 20% extra crunching speed by disabling the frame buffer and console on the RaspPi - it frees up a portion of the GPU secret sauce which otherwise locks a piece of the shared memory to refresh the screen from memory at regular intervals.
Of course you will then have to SSH into it to see what is going on...

Also there is something about using L2 cache and it being possible to double the ARM-to-memory path speed in the GPU if you only use the thing for ARM-based crunching; it gives quite a remarkable speedup both for crunching and for file-system operations. Launching anything 3D in that setup would instantly crash it since only part of the core of the GPU-part of the SoC supports that speed. For servers and crunching, though, this is a pretty nifty trick.

And finally: On the Android it actually took some time for it to do the checkpoints - you can speed it up slightly by doing them less often, at the risk of losing crunched data when/if it crashes.

Thanks very much, I'll try to test this on my Raspi. If you have any pointers to details about those tricks, feel free to send me a PM, I would really appreciate it.

Cheers
HB

Bikeman (Heinz-Bernd Eggenstein)

Quote:

A recent UK edition of PC Pro magazine (cover date January 2013, page 60) has an article and links for making a 64-node supercomputer running MPI: Steps to make a Raspberry Pi Supercomputer.

The same article covers installing BOINC on a solar-powered Pi: Searching for Aliens with a Raspberry Pi and the Sun (you'll gather that the sample project chosen is SETI, not Einstein - sorry about that).

But it's nice to see both projects covered in a mainstream consumer/newsstand magazine.

Yeah, saw that one. Very impressive, I like the idea of solar powered BOINC crunching!!

Also not mentioned in the article is Enigma@Home, one of the few projects that offer native Raspberry Pi support at the moment. I'm not sure tho whether they have work available at the moment.

Cheers
HB

Janus

Quote:
If you have any pointers to details about those tricks, feel free to send me a PM, I would really appreciate it.


I'm afraid I don't remember where I got this from. Enabling the L2 cache seems to be a kernel level thing now, and as these tricks were picked up long ago I'm not even sure they work with the new version of the binary blobs. Anyways, try things out along the lines of this:
fbset -xres 16 -yres 16 -vxres 16 -vyres 16 -depth 16
/opt/vc/bin/tvservice -p
/opt/vc/bin/tvservice -o
I forgot how to control the polling rate for network and other I/O but it had a similar effect.
I'm guessing that the L2 cache lives on the GPU side of things, so double core_freq to 500 (up from the default of 250) but keep any other GPU-related settings at integer ratios of that (they share a clock, so anything other than 1:2, 1:3 etc. won't make much sense).
If you clock both the ARM to 1 GHz and the GPU core to >0.5 GHz you may end up in a situation where the I/O to the SD card starts getting spurious errors. Ease off a bit if that happens - or keep the file system on a USB drive.
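
To make the "integer ratio" rule of thumb concrete, here is a small Python sketch that checks whether a candidate clock divides evenly into core_freq. It assumes the ratios are meant as other clock : core clock (so 1:2 means half of core_freq) and that 1:1 is also fine. The clock names in the dictionary are hypothetical placeholders, not actual setting names, and the check itself is only an illustration of the rule, not something taken from Broadcom documentation.

```python
# Sanity check for the "keep clocks at integer ratios of core_freq" rule of thumb.
# Assumption: an acceptable clock is core_freq divided by a whole number (1:1, 1:2, 1:3, ...).

def is_integer_ratio(core_freq_mhz, other_freq_mhz):
    """True if core_freq is a whole multiple of the other clock."""
    return other_freq_mhz > 0 and core_freq_mhz % other_freq_mhz == 0

core_freq = 500  # MHz, doubled from the default of 250 as suggested above

# Hypothetical clock names, for illustration only.
candidates = {"example_clock_a": 250, "example_clock_b": 300, "example_clock_c": 125}

for name, mhz in candidates.items():
    verdict = "ok" if is_integer_ratio(core_freq, mhz) else "not an integer ratio"
    print(f"{name} = {mhz} MHz: {verdict}")
```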

Quote:
FFTW version 3.3 with ARMv7 NEON support


Yeah, back then it was just a patch - what essentially ended up becoming FFTW 3.3.1-beta1.

Quote:
BRP4 needed ca 205 MB


That's pretty nice for the 512MB version at least - leaves a good deal of free space

Bikeman (Heinz-Bernd Eggenstein)

Quote:

Quote:
BRP4 needed ca 205 MB

That's pretty nice for the 512MB version at least - leaves a good deal of free space

Yeah it looks like we can even reduce the memory footprint to under 128 MB, possibly even under 100MB in exchange for a modest performance loss.

Cheers
HB

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

I have an older B model with 256 MB. It's looking for some work, so if it's possible to get the Pi supported that would be great, although it sounds like I may need the 512 MB version.

Is it possible to get the smaller BRP4 work units? I guess that would need a separate work queue and extra plumbing on the back end to put them back into multiples of 8 but could be worthwhile.

The guys at Asteroids@home are looking at porting their app to run on the Pi. It took them quite some time to get their app ported to Windows, but hopefully this will be easier as it's a Linux app originally.

Please keep us posted with progress on the Einstein app.

Janus

Yeah the original B model (and newly released A models) just doesn't cut it:

Quote:
16-Dec-2012 10:56:32 [Einstein@Home] Binary Radio Pulsar Search (Arecibo) needs 247.96 MB RAM but only 232.25 MB is available for use.

So close though, a measly 15.71MB missing ;)
Actually I think the requirement of 248MB may be wrong - the task manager seems to indicate something closer to 205MB, occasionally jumping up to around 215MB. This would actually fit on the small RPi with no changes. Are the project memory requirements for BRP4 set to 247.96MB for a reason or was it just 205MB plus some margin? Would it be possible to set it to exactly 232.1MB or something similar to allow us to have fun with it on the RPi if the additional 40MB or so is simply a safety margin?
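
The numbers can be checked with a few lines of Python. This is only a sketch of the comparison implied by the log message (declared requirement versus RAM available for use), using the figures quoted in this post. It is not meant to reproduce how the BOINC client actually makes the decision, and the variable names are my own.

```python
# Figures from the log line and from the task manager observations above (all in MB).
declared_requirement_mb = 247.96  # what the project says BRP4 needs
available_mb = 232.25             # what the client reports as available for use
observed_peak_mb = 215.0          # highest usage seen in the task manager

# How much RAM is missing under the declared requirement.
shortfall_mb = declared_requirement_mb - available_mb

# Margin between the declared requirement and the observed peak usage.
margin_over_peak_mb = declared_requirement_mb - observed_peak_mb

print(f"Shortfall vs. declared requirement: {shortfall_mb:.2f} MB")
print(f"Declared requirement exceeds observed peak by {margin_over_peak_mb:.2f} MB")
print(f"Observed peak fits in available RAM: {observed_peak_mb <= available_mb}")
```

So the task is refused because of the declared figure, while the observed peak of about 215MB would fit with some room to spare.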

Bikeman (Heinz-Bernd Eggenstein)

Hi!

I currently have a Pi running at the office that requires around 90 MB of RAM during the main phase of the BRP4 computation. I still have to optimize the "start-up" phase, tho. But I'm quite confident that BRP4 can run below 100 MB, so that it should run on the Model A and earlier model Bs as well.

And yes, current BRP4 workunits take far too long on the Pi to be attractive. A pool of unbundled WUs (1/8th the size of current tasks) should be OK but, as was mentioned already, this takes some additional plumbing.
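
As a back-of-the-envelope estimate of what unbundling would buy, here is a short Python sketch. It assumes the 8 sub-tasks cost about the same and that per-task start-up overhead is negligible, neither of which has been verified. The 10-day runtime is the figure from the first test run reported earlier in this thread.

```python
# Estimated runtime of one unbundled workunit on the Pi.
# Assumptions: equal-cost sub-tasks, negligible start-up overhead per workunit.

FULL_TASK_DAYS = 10.0   # runtime of a bundled BRP4 task in the first test
SUBTASKS_PER_TASK = 8   # a BRP4 task currently bundles 8 independent sub-tasks

def unbundled_runtime_days(full_task_days, subtasks):
    """Runtime of a single sub-task if the bundle splits evenly."""
    return full_task_days / subtasks

per_wu_days = unbundled_runtime_days(FULL_TASK_DAYS, SUBTASKS_PER_TASK)
print(f"{per_wu_days:.2f} days ({per_wu_days * 24:.0f} h) per unbundled WU")
```

That would be about 1.25 days (30 h) per workunit instead of almost 10 days.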

Cheers
HB

telegd
Joined: 17 Apr 07
Posts: 91
Credit: 10212522
RAC: 0

I don't have anything particular to contribute to this thread except to say thanks for describing your experiments. Very interesting!
