"In addition, about 350 high-performance, specialized graphics cards (GPUs) have been added in parallel with about 2,000 existing cards for specialized applications. These additions increase Atlas' theoretical peak computing performance to more than 2 PFLOP/s."
That doesn't sound much. I have a £150 GPU that does a theoretical 8 Tflops. They only have 250 times the power of one of my GPUs, yet they say they have 2350 GPUs.
The UPS beats mine though, I only have 1.5kW. But with deep cycle leisure batteries it can last for an eternity. None of the sealed lead acid crap that comes with it. My neighbour once asked me why all my lights were on during a powercut :-)
Pah! "Each cable is rated for 10 Gb/s." I have a 40Gb/s cable between my house and garage. I can't find switches and network cards that go that fast though :-(
They seem to have neater wiring than Summit though:
If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.
"In addition, about 350 high-performance, specialized graphics cards (GPUs) have been added in parallel with about 2,000 existing cards for specialized applications. These additions increase Atlas' theoretical peak computing performance to more than 2 PFLOP/s."
That doesn't sound much. I have a £150 GPU that does a theoretical 8 Tflops. They only have 250 times the power of one of my GPUs, yet they say they have 2350 GPUs.
The UPS beats mine though, I only have 1.5kW. But with deep cycle leisure batteries it can last for an eternity. None of the sealed lead acid crap that comes with it. My neighbour once asked me why all my lights were on during a powercut :-)
Pah! "Each cable is rated for 10 Gb/s." I have a 40Gb/s cable between my house and garage. I can't find switches and network cards that go that fast though :-(
They seem to have neater wiring than Summit though:
and this is the problem. this is not a valid way to measure FLOPS at all. this increase to ~13 PFLOPS is largely from the shifting of systems from O3AS gravitational wave tasks (which awarded much less credit) when GW ran out, over to the only available GPU work, FGRPB1G, which awards ~10x more credit per unit time. so it "looks" like FLOPS increased just because people started earning much more credit with the same devices.
nope. according to the server status page, there are ~13,000 hosts with either an Nvidia or AMD GPU (last 7 days). the vast majority of those are probably slower, low end devices. and there will be some percentage that aren't even crunching or crunching other projects. there are ~1.6 million WUs that still need processed, which equates to ~3.2 million tasks that need to be completed, not even accounting for errors and invalids resulting in resends.
In a real sense we can always exceed the computing power of E@H. Or Atlas for that matter, or <*insert you favourite supercomputer here*>. The multidimensional parameter spaces for these searches can be explored in different ways to look for new signals & regularities. Nowadays there is such a mass of information available from the various detection devices, so there always a wealth of data. What is discarded as noise for one search template may constitute a detection for another. That's because all manner of radiation traverses the universe and our local space. Suppose for a given investigation the search sensitivity goes like the square root of the signal integration time, then to double your chances of finding something you need to quadruple the time. This is quite typical : you can always 'listen' for longer to access the 'quieter' sources. There will always be something to do here at E@H. ;-)
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
"In addition, about 350
)
"In addition, about 350 high-performance, specialized graphics cards (GPUs) have been added in parallel with about 2,000 existing cards for specialized applications. These additions increase Atlas' theoretical peak computing performance to more than 2 PFLOP/s."
That doesn't sound like much. I have a £150 GPU that does a theoretical 8 TFLOPS. They only have 250 times the power of one of my GPUs, yet they say they have 2,350 GPUs.
The UPS beats mine, though; I only have 1.5 kW. But with deep-cycle leisure batteries it can last for an eternity. None of the sealed lead-acid crap that comes with it. My neighbour once asked me why all my lights were on during a power cut :-)
Pah! "Each cable is rated for 10 Gb/s." I have a 40 Gb/s cable between my house and garage. I can't find switches and network cards that go that fast though :-(
They seem to have neater wiring than Summit though:
If this page takes an hour to load, reduce posts per page to 20 in your settings, so the tinpot 486 Einstein uses can handle it.
Yup, the technology curve for GPUs especially is such that by the time a card is installed it's already well out of date.
Our server status page puts E@H at 13179.3 TFLOPS ~ 13 PFLOPS (estimated from collective RAC).
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
All our GPU compute power is not enough?
Mike Hewson wrote: Our server status page puts E@H at 13179.3 TFLOPS ~ 13 PFLOPS (estimated from collective RAC).
And this is the problem: it's not a valid way to measure FLOPS at all. The increase to ~13 PFLOPS is largely from systems shifting off the O3AS gravitational-wave tasks (which awarded much less credit) when the GW work ran out, over to the only available GPU work, FGRPB1G, which awards ~10x more credit per unit time. So it "looks" like FLOPS increased just because people started earning much more credit with the same devices.
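For anyone wondering where a RAC-based figure like that comes from: BOINC's credit unit (the cobblestone) is defined so that a host sustaining 1 GFLOPS earns about 200 credits per day, so a project-wide FLOPS estimate is essentially total RAC divided by 200. Here's a rough sketch of that conversion, and of how a change in credit-per-task inflates it with no hardware change at all (all numbers below are made up for illustration, not actual E@H figures):

```python
# Back-of-envelope sketch: a RAC-based FLOPS estimate, and why a change in the
# credit awarded per task skews it. All numbers are illustrative only.

COBBLESTONES_PER_GFLOPS_DAY = 200  # BOINC credit definition: 1 GFLOPS sustained ~ 200 credits/day

def estimated_tflops(total_rac: float) -> float:
    """Convert project-wide Recent Average Credit into an estimated TFLOPS figure."""
    return total_rac / COBBLESTONES_PER_GFLOPS_DAY / 1000.0  # GFLOPS -> TFLOPS

# Same fleet of hosts, same real throughput in tasks per day...
tasks_per_day = 50_000

# ...but the credit granted per task depends on which search the tasks belong to.
rac_on_gw_tasks = tasks_per_day * 1_000     # hypothetical credit per GW task
rac_on_fgrp_tasks = tasks_per_day * 10_000  # hypothetical ~10x higher credit per FGRPB1G task

print(estimated_tflops(rac_on_gw_tasks))    # 250.0  "TFLOPS"
print(estimated_tflops(rac_on_fgrp_tasks))  # 2500.0 "TFLOPS" -- same hardware, 10x the apparent FLOPS
```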
Filipe wrote: All our GPU compute power is not enough?
Nope. According to the server status page, there are ~13,000 hosts with either an Nvidia or AMD GPU (last 7 days). The vast majority of those are probably slower, low-end devices, and some percentage of them aren't crunching at all, or are crunching other projects. There are ~1.6 million WUs that still need processing, which equates to ~3.2 million tasks that need to be completed, not even accounting for errors and invalids resulting in resends.
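As a rough sanity check on those numbers (a sketch only; the quorum factor of 2 comes from the figures above, but the per-host rate is a pure guess and real hosts vary enormously):

```python
# Crude backlog estimate from the figures quoted above. Only the quorum factor
# comes from the post; the per-host daily rate is an invented placeholder.

workunits_remaining = 1_600_000
quorum = 2                                        # valid results needed per workunit, before resends
tasks_remaining = workunits_remaining * quorum    # ~3.2 million tasks

active_gpu_hosts = 13_000        # Nvidia/AMD hosts seen in the last 7 days (server status page)
tasks_per_host_per_day = 20      # hypothetical fleet average, NOT a measured value

days_to_drain = tasks_remaining / (active_gpu_hosts * tasks_per_host_per_day)
print(f"~{days_to_drain:.0f} days at that assumed rate")  # ~12 days with these made-up inputs
```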
In a real sense we can always exceed the computing power of E@H, or Atlas for that matter, or <insert your favourite supercomputer here>. The multidimensional parameter spaces for these searches can be explored in different ways to look for new signals and regularities. Nowadays there is such a mass of information available from the various detection devices that there is always a wealth of data. What is discarded as noise for one search template may constitute a detection for another, because all manner of radiation traverses the universe and our local space. Suppose for a given investigation the search sensitivity goes like the square root of the signal integration time; then to double your chances of finding something you need to quadruple the time. This is quite typical: you can always 'listen' for longer to access the 'quieter' sources. There will always be something to do here at E@H. ;-)
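Spelled out in symbols (just restating the square-root assumption above):

```latex
% Assume search sensitivity S grows as the square root of integration time T.
S(T) \propto \sqrt{T}
\quad\Longrightarrow\quad
\frac{S(kT)}{S(T)} = \sqrt{k}
\quad\Longrightarrow\quad
\sqrt{k} = 2 \;\Leftrightarrow\; k = 4,
% i.e. doubling the sensitivity requires four times the integration time.
```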
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
Ian&Steve C. wrote: The days remaining estimate is derived from the 5-day average WU completed per day metric. That 5-day average has been dropping since it looks like the "Atlas Condor Jobs" super computer ...
I haven't looked at the stats pages for quite a while so the mention of "Atlas Condor Jobs" surprised me since (in years past) the only Atlas entry was for "Atlas AEI Hannover".
So I took a look at the top participants list and can see the Condor entry but none for AEI Hannover. The current value of total credit for Condor is way too low for everything that Atlas had accumulated so it can't just be a rename. From memory, Atlas AEI Hannover had a total credit of many tens of billions.
Realising that the table is constructed in RAC order rather than Total Credit, I clicked on the Total heading to get a reordering. It actually starts from the bottom up so I had to click twice :-). That caused Atlas AEI Hannover to appear as #2, so it's still in the list but no longer usually visible because its RAC is less than 7M these days.
That was quite a blast from the past, as several other 'high producers' appeared as well. In particular I remember Gavin, who had some high-producing machines and was active in the Forums some years ago. His current RAC is only a shadow of what it was, but he must still be around since it's still significant.
It's quite a reminder that there are people who have contributed a lot in the past but whose efforts are no longer normally seen in the default view. Perhaps the default view should be based on total credit to direct attention to past significant contributions.
Condor doesn't get a look-in (yet) if the ordering is based on Total :-).
Cheers,
Gary.
Gary Roberts wrote: From memory, Atlas AEI Hannover had a total credit of many tens of billions.
Now that you mention that: I wonder if this total can possibly be all simply due to 'burn-ins' of new nodes? There aren't that many nodes. If correct, this suggests to me that in addition to its role of pre- and post-processing E@H work, maybe it is scheduled to do some of the actual work units when it has nothing else/better to do (using nodes of any age). I'm pretty sure that once built, these supercomputers are kept busy close to 24/7 @ 100%.
Just a thought.
{ Of course, I know who #1 is. But there is a computer at the bottom of the total credit list owned by 'ballen'. I guess that 2005 era computer, probably the very first enrolled, could not now hack the pace. ;-) }
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
Atlas runs BOINC in two different ways: there is a single BOINC client, running at low CPU priority, on every node of Atlas, announcing as many CPU cores as the node has. These clients get and run only CPU jobs. The associated account on E@H is "Atlas AEI Hannover", and it has been there basically since 2008.
To make use of the growing number of GPUs on Atlas, we recently (~May 2022) developed another scheme for submitting E@H (GPU) tasks as low-priority Condor jobs (minimal priority, so as not to interfere with 'real' people using the GPUs on Atlas). The associated E@H account is "Atlas Condor Jobs". There is basically one host(id) for every GPU on Atlas (~2100). The main reason for setting this up was to help finish the "O3AS1" GW analysis. As this has now ended, the automatic submission has been turned off, and the RAC of this account should drop noticeably again.
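For the curious, a minimal sketch of how one such low-priority GPU job could be handed to HTCondor (illustrative only — the wrapper script, file names and settings here are invented for the example, not the actual Atlas submission machinery):

```python
# Illustrative only: queue one hypothetical E@H GPU task as a "nice" HTCondor job,
# so it yields to any regular user's job that wants the GPU. All names are invented.
import subprocess
import textwrap

submit_description = textwrap.dedent("""\
    # hypothetical wrapper around the E@H GPU app, plus one input file
    universe     = vanilla
    executable   = run_eah_gpu_task.sh
    arguments    = task_0001.in
    request_gpus = 1
    request_cpus = 1
    # nice_user places the job below every normal user's jobs
    nice_user    = true
    output       = task_0001.out
    error        = task_0001.err
    log          = task_0001.log
    queue
""")

with open("eah_task.sub", "w") as handle:
    handle.write(submit_description)

# condor_submit is the standard HTCondor CLI for queueing a submit description
subprocess.run(["condor_submit", "eah_task.sub"], check=True)
```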
BM
Will any of the new work types be for CPUs as well (or will old GPU types be converted to run on CPU)? Can BRP7 be done on a CPU, and if not, why not? Time to run, I suppose, will be a limiting factor; memory won't be.
The current large Arecibo work units can take up to 13 hours if a number are run together, but that is not a problem (getting less credit than Gamma-ray #5 is).
I would just like to run some new CPU work.
Conan