Parallella, Raspberry Pi, FPGA & All That Stuff

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6588

Credit: 315013245

RAC: 304775

1 Oct 2013 22:43:58 UTC

Message 111733

Yeah, I think there's quite a back story here. From the 21/08/2013 update:

Quote:

After 5 years of having to constantly “do more with less” it finally looks like our ship has come in! I can’t say more than that for now, but I will say that the stronger Adapteva is financially the more likely it is that the Parallella platform will be a long term success!

but from the 27/09/2013 forum post:

Quote:

Sorry for the lack of communication!! We have been in a pretty delicate position (nothing related to the board or the chips). Hopefully some day I can tell everyone the whole horrific story...

FWIW my guess is that they've been occupied at a business, not technical, level with a potential big backer or contract or somesuch that didn't go as well as hoped .....

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 577270218

RAC: 193530

3 Oct 2013 10:47:48 UTC

Message 111734 in response to message 111730

Quote:

OK, it's here ( my coloring ) :

Quote:
On every clock cycle, the following operations can occur:
- 64 bits of instructions can be fetched from memory to the program sequencer.
- 64 bits of data can be passed between the local memory and the CPU’s register file.
- 64 bits can be written into the local memory from the network interface.
- 64 bits can be transferred from the local memory to the network using the local DMA

Oh, I see. But being able to perform one such action per cycle only gives you the throughput. It doesn't tell you how long it will take to finish these actions, i.e. the latency. From your other post:

Quote:
Every router in the mesh is connected to the north, east, west, south, and to a mesh node. Write transactions move through the network, with a latency of 1.5 clock cycles per routing hop. A transaction traversing from the left edge to right edge of a 64-core chip would thus take 12 clock cycles.

That's what I was getting at: the latency to finish that write depends on the distance between the cores on the chip and can be much more than 5 clocks (still fast, though!). I hope we didn't just talk about different "how long"s all the time: "How long does it take the sender to send the write?" versus "How long does it take for the write to arrive?". Actually, your initial statement was "I can write into the memory (or was it register?) of another core in 5 cycles". So it's actually the total latency to finish the write.
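To put rough numbers on that, here is a small sketch of the hop arithmetic. It only uses the 1.5 cycles per hop figure from the quote; the (row, col) core coordinates and the idea that a write takes the shortest row-then-column path are my assumptions for illustration, not something stated above. Note that the quoted 12 cycles corresponds to counting 8 hops across the chip.

```python
# Rough model of mesh write latency, based only on the quoted
# "1.5 clock cycles per routing hop" figure.
CYCLES_PER_HOP = 1.5


def manhattan_hops(src, dst):
    """Router-to-router hops between two (row, col) cores.

    Assumes (hypothetically) that a write follows a shortest
    row-then-column path through the mesh.
    """
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def write_latency_cycles(hops):
    """Network latency in clock cycles for a write crossing `hops` hops."""
    return CYCLES_PER_HOP * hops


if __name__ == "__main__":
    # The quoted edge-to-edge example: 8 hops at 1.5 cycles each.
    print(write_latency_cycles(8))
    # Corner to corner on an 8x8 grid under the shortest-path assumption.
    print(write_latency_cycles(manhattan_hops((0, 0), (7, 7))))
```

So a neighbouring core is cheap, while a write to the far corner costs noticeably more, even though the sender can still issue one write per cycle.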

Quote:
In theory at least, one might 'unroll' a loop to perform the same essential calculations on several cores, with each core doing what might have been done for a single loop iteration ie. accounting for different values of whatever loop variable(s) would have otherwise been updated per round of the loop.

That's what I'm occasionally using in MATLAB with a parfor loop. The overhead there is significant, though, in that individual loops have to exceed tens or, better, hundreds of ms of runtime for this to provide any benefit. Which greatly limits its applicability.. so I'm a bit jealous about what you could do at a low level. On the other hand I'm not all that keen on spending the time to hand-tweak such details ;)
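For anyone who hasn't seen the pattern, this is roughly what "one loop iteration per worker" looks like, sketched in Python rather than MATLAB. The `body` function is a made-up stand-in for whatever one loop round computes, and the thread pool merely stands in for the cores: it shows how the iterations are handed out, not any speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def body(i):
    """Hypothetical work for one loop iteration (placeholder)."""
    return i * i


def run_serial(n):
    """Ordinary loop: iterations run one after another."""
    return [body(i) for i in range(n)]


def run_distributed(n, workers=4):
    """Same iterations, handed out to a pool of workers.

    Each worker receives a value of the loop variable and evaluates
    the body for it; map() returns results in iteration order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, range(n)))


if __name__ == "__main__":
    print(run_distributed(8) == run_serial(8))
```

This only works as-is when the iterations are independent of each other, which is the same restriction parfor imposes.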

Quote:
A subtle bit here is we are using RISC processors which by definition will/may/could have an expanded code memory footprint for a given task(s) c/w their CISC cousins ( but not necessarily ).

Considering Parallella is starting from scratch here and that the individual cores are fairly simple, I'd actually expect their instruction footprint to be less than x86. Especially if 16 bit instructions can sometimes be used.

MrS

Scanning for our furry friends since Jan 2002

Rod

Joined: 3 Jan 06

Posts: 4396

Credit: 811266

RAC: 0

3 Oct 2013 13:54:25 UTC

Message 111735 in response to message 111733

Quote:

Quote:
Sorry for the lack of communication!! We have been in a pretty delicate position (nothing related to the board or the chips). Hopefully some day I can tell everyone the whole horrific story...
my guess is that they've been occupied at a business, not technical, level with a potential big backer or contract or somesuch that didn't go as well as hoped.

I suspect a challenge on their intellectual property.

There are some who can live without wild things and some who cannot. - Aldo Leopold

MarkJ

Joined: 28 Feb 08

Posts: 437

Credit: 139002861

RAC: 0

3 Oct 2013 22:12:51 UTC

Message 111736 in response to message 111735

Quote:

Quote:

Quote:
Sorry for the lack of communication!! We have been in a pretty delicate position (nothing related to the board or the chips). Hopefully some day I can tell everyone the whole horrific story...
my guess is that they've been occupied at a business, not technical, level with a potential big backer or contract or somesuch that didn't go as well as hoped.

I suspect a challenge on their intellectual property.

Well they have a new logo but I don't think that was it.

BOINC blog

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6588

Credit: 315013245

RAC: 304775

3 Oct 2013 23:56:49 UTC

Message 111737

@Rod + @MarkJ : Intellectual property challenge, yeah there's a thought. It'd be the sort of thing a big player might do to squash a start-up, but I speculate. IMHO ( FWIW ) I reckon their design is brilliant so certainly well worth a patent, which they have.

@MrS :

Quote:

I hope we didn't just talk about different "how long"s all the time ...

Oooops, I think there may have been a tad of that. Sorry :-O :-)
Yup, throughput of one per cycle with latency of five cycles.
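The difference between the two numbers shows up once several writes are issued back to back. A minimal sketch, assuming an idealised pipeline with no stalls or network contention (that idealisation, and the function name, are mine):

```python
def total_cycles(n_writes, latency=5, issue_interval=1):
    """Cycles until the last of n back-to-back writes has completed.

    Idealised model: one write is issued every `issue_interval` cycles
    and each one completes `latency` cycles after it was issued.
    """
    if n_writes <= 0:
        return 0
    return latency + (n_writes - 1) * issue_interval


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, total_cycles(n))
```

A single write costs the full five cycles, but a long burst approaches one cycle per write, which is why the throughput figure matters for bulk transfers and the latency figure matters for fine-grained handshaking.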

Quote:

On the other hand I'm not all that keen on spending the time to hand-tweak such details ;)

One early task for me, when the card arrives, is to create a wide set of assembler macros suitably parameterised. Their implementation of the superscalar aspect is intriguing I think, and with a bit of clever ordering the CPU can really hand off a lot of stuff simultaneously. Here the dependencies within the pipeline can be mitigated by attention to the parallel scheduling rules and cycle separations, i.e. avoid stalls.

Quote:

Especially if 16 bit instructions can sometimes be used.

Yup, using the general registers 0 through 7 with short immediates is now on my list of features to ruthlessly exploit at assembler level. Within 16 bits you would, at most, get room for a signed immediate of 3 bits ( simm3 ), i.e. -4 to +3.
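The -4 to +3 range is just the usual two's complement span for 3 bits. A quick sketch of the check an assembler macro would need before choosing the short form (the function names are invented for illustration):

```python
def simm_range(bits):
    """Inclusive (min, max) of a two's complement signed immediate."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits_simm(value, bits=3):
    """True if `value` can be encoded as a signed immediate of `bits` bits."""
    lo, hi = simm_range(bits)
    return lo <= value <= hi


if __name__ == "__main__":
    print(simm_range(3))
    print([v for v in range(-6, 6) if fits_simm(v)])
```

Anything outside that range would have to fall back to a longer encoding or be loaded into a register first.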

Cheers, Mike.

( edit ) More info from Andreas : here and here .... :-)

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 577270218

RAC: 193530

6 Oct 2013 18:56:35 UTC

Message 111738 in response to message 111737

Quote:

On the other hand I'm not all that keen on spending the time to hand-tweak such details ;)

... and I'm glad that there are others who are keen to do so :D
(in a clever way, without wasting their time, of course)

MrS

Scanning for our furry friends since Jan 2002

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6588

Credit: 315013245

RAC: 304775

8 Oct 2013 0:46:37 UTC

Message 111739

Well I must say : I am getting itchy fingers for the alleged imminent Parallella delivery ! :-)

Anyway, while waiting I have produced these musings upon possible approaches to the Parallella for FFT, keeping the horrible mathematical mud in the appendices. :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 577270218

RAC: 193530

9 Oct 2013 20:41:23 UTC

Message 111740 in response to message 111739

... just be careful with your pets and such ;)

MrS

Scanning for our furry friends since Jan 2002

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6588

Credit: 315013245

RAC: 304775

10 Oct 2013 0:28:21 UTC

Message 111741 in response to message 111740

Quote:

... just be careful with your pets and such ;)

MrS

I do so love XKCD, it fills the hole left by Gary Larson when he retired. :-)

Cheers, Mike.

( edit ) Subtitle/hover is 'That cat has some serious periodic components' ....

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 577270218

RAC: 193530

12 Oct 2013 14:25:51 UTC

Message 111742 in response to message 111741

He already had over 300 posts when I discovered XKCD some time ago.. went through all of them :)
I even made myself an A0 poster with some old favorites, it's still happily hanging at the bathroom door. Too bad most guests have trouble getting the (slightly) nerdy jokes in English!

MrS

Scanning for our furry friends since Jan 2002
