What is Cell? (PS3)

siddharth2002

National Board President
Joined
Oct 29, 2004
Location
Auckland, NZ
Ok guys. I just came across an amazing post on www.ps3portal.com explaining exactly what "Cell", the heart of the coming PS3, is. I thought it was too good not to post here for our Planetcricket members. Below is a link and an exact copy of that post. Of course credit goes to "udontneed2know", a registered member at www.ps3portal.com, for this amazing information. Please note that it's quite a long read, but highly interesting and revealing, and all you I.T. professionals especially will love it. :cheers Do post your comments guys.
Link - http://www.ps3forums.com/viewtopic.php?t=7671

Post copy word by word -

Posted: Thu Oct 06, 2005 10:26 am
Surprisingly, it seems even many PS3 fans have absolutely no clue how the Cell processor actually works and what it truly is. Considering how important the Cell processor is to the overall product of the PS3, I think I'll lay it out for you guys while I'm sitting back thinking about some algorithms on this stupid program I'm writing for someone.

First of all, to make it simpler I'll break down each piece of the Cell processor with a biological analogy, the human body being the best bet.

I'm sure everyone, or at least most everyone, knows that the Cell architecture has 1 PPE (Power Processing Element, IBM's design) with 512KB of L2 cache as well as 64KB of L1 cache (32KB for instructions and 32KB for data). Then I'm sure everyone knows that it also showcases 8 SPEs (Synergistic Processing Elements), with 1 turned off to increase production yields. They did this so that if a chip comes off the line and testing shows one SPE doesn't work, they can still use that chip as long as the other 7 work. If all 8 end up working, they simply turn 1 SPE off. Each SPE incorporates a sizable 256KB of Local Storage (LS) SRAM. Each SPE has full access to a DMA (Direct Memory Access) controller, which enables it to move data to and from any of the other SPEs.

Going back to the human body analogy, let's look at it this way.

The PPE = The Brain
The SPE = The Muscles
The MFC ( Memory Flow Controller ) = The Heart
The EIB ( Element Interconnect Bus ) = The vascular system keeping the components fed with a digital lifeblood of data

That'll give you a generally better understanding of the various internal components of the Cell processor. But really, what does each of these components do? What's special about the design? Why give a crap? Well, some of you probably won't give a crap, and I suggest you stop wasting your time reading this. I'm just someone wasting time right now, and in the mood to talk technical babble.

First off we have the PPE. What is it? The PPE is a 64-bit, in-order, two-issue superscalar (superscalar means more than one instruction can issue per clock) Power core design that is optimized for the Cell processor, not a recycled core from another processor that's been built on this planet. Utilizing the AltiVec instruction set (what IBM calls VMX, highly popular), the Power core supports simultaneous multithreading with up to 2 threads. Putting all that together, you can see that the PPE not only issues up to 2 instructions per clock, it is also a multithreading design that keeps 2 threads in flight at once, with the two threads sharing those issue slots. IBM used the Power design for the PPE core because it let them put to work the many billions of dollars they had already invested in it, and it gave a strong base platform for the rest of the architecture. But they then heavily modified the Power core itself and turned it into the pint-sized little beauty that we call the PPE. It's a nice little thing, and no, it is not the same as the cores you'll find in the Xbox 360, even though you'll see both described as PPEs. They are two completely different design elements.

For more in-depth stuff about the PPE, it also has fine-grained multithreading with round-robin thread scheduling. WEEEEE I LOVE TECH STUFF HAHA! I bet a lot of people are like "F you! F YOU TO HECK!" lol. And then the others are like "huh" .. and "wtf are you blabbing about". Like I said, most will probably have stopped reading by now lol. The rest of you, hello, how are you today. What does the above mean? Well, fine-grained multithreading means the core can switch between threads from one cycle to the next, which keeps it busy during cache misses and such. Say, for example, both threads passing through the PPE are active. When one thread is stalled for whatever reason, the PPE will hand the issue slots to the 2nd thread, and that thread will issue an instruction each cycle while the other thread figures out what has gone wrong, and why its purpose for living hasn't been learned yet. It's strange that instructions wonder about their meaning of life too, isn't it? Indeed ...
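To make the round-robin idea concrete, here is a minimal Python sketch. It is a toy model, not how the PPE is actually wired: the function name `simulate_issue`, the one-instruction-per-cycle simplification and the stall pattern are all assumptions made for illustration.

```python
def simulate_issue(stalled_cycles, total_cycles):
    """Return which hardware thread (0 or 1) issues on each cycle.

    stalled_cycles maps a thread id to the set of cycle numbers on which
    that thread cannot issue (for example while it waits on a cache miss).
    An entry of None means both threads were stalled on that cycle.
    """
    schedule = []
    preferred = 0  # round-robin pointer: the thread whose turn it is
    for cycle in range(total_cycles):
        other = 1 - preferred
        if cycle not in stalled_cycles.get(preferred, set()):
            chosen = preferred
        elif cycle not in stalled_cycles.get(other, set()):
            chosen = other  # the ready thread fills the slot instead
        else:
            chosen = None   # both threads stalled, the slot is wasted
        schedule.append(chosen)
        if chosen is not None:
            preferred = 1 - chosen  # next cycle favours the other thread
    return schedule


# Thread 0 stalls on cycles 2-5 (a pretend cache miss); thread 1 never stalls.
schedule = simulate_issue({0: {2, 3, 4, 5}}, 8)
print(schedule)
```

While thread 0 is stalled, thread 1 takes every slot, and the two go back to alternating once the stall clears.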

Anyway, sorry about that. Back to the subject at hand. So why is the PPE considered the brain? Pretty much because it does exactly what your own brain does. Your brain tells your muscles to move your arms, makes sure your heart keeps pumping, makes sure the blood in your body keeps moving through all your veins, etc. This is what the PPE does, but without all the guts and blood and all that other R-rated material that your filthy brain likes to think about. Filthy brains, the FDA really needs to regulate what's inside your body .. all that blood .. all those guts and intestines. Nasty stuff ... kids don't need to be scarred mentally by all the nasties. Anyway, the PPE grabs instructions and, with the helpful advice of the programmer, hands work out to the designated SPEs (if designations have been made). The hungry little SPEs then suck in all those instructions and data and all that good stuff, begin chomping away at the problems at hand, and process them in next to no time, sending the results through the rest of the processor (which I'll get into later) and out to wherever that finished work needs to go.

Pretty simple stuff right? HA! And you thought this was going to be a short explanation. Ha Ha Ha. HAHAHAHAHA!

Well, that's the brain of the Cell architecture. The PPE. And to think, the PPE was the easiest thing to build in the Cell processor. Now we get into the muscle of Cell, the SPEs. Those amazing little creatures that at one point in time resembled space aliens that probed us poor humans for information about our deepest, darkest secrets ... and the addresses of all local hotties. Crafty aliens ...

So, what are SPEs, you ask? Well, lemme tell you. Hehe, here's the fun part.

The SPE is called the Synergistic Processing Element, but it's been called other things as well. The Systematic Processing Element. The Synergistic Processing Unit (SPU, the other popular one). The Algorithm Processing Unit. And various other things. Heck, people have even simply called them vector computers. I for one agree that at its heart the SPE is a vector computer with its own CPU and SRAM. If you programmed in the 70's and 80's, you'll feel right at home .. except you'll be in the year 2005, when those vector computers are the size of a skin cell and have many thousands of times the processing power that you remember. Weird huh?

Some people have gone as far as to say that with a few design tweaks, the Cell processor could make a very interesting network processor (by adding integrated packet parsing).

Anyway, the SPE includes the SXU (Synergistic eXecution Unit) and 256KB of LS SRAM, as I said earlier. The trick is that SOFTWARE must manage the instructions and data that move in and out of the LS (local storage, if you forgot already, shame on you), and all that movement is carried out by the MFC (if you forgot, scroll up).

Edit: You know what, I really hope theres not a character limit. Hold on lemme just send this and see .............

Ok back, looks like I'm good for now.

Now, back to the SPEs. Ever wonder why some people have been harping on about those SN Systems compilers and all that middleware Sony and IBM and Nvidia are providing? Well, it's because the design of the Cell architecture, while incredibly efficient, fast and clean, is heavily software dependent. A lot of things usually done in hardware are now done in software on the Cell processor. Most will look at that and say "IBM went back in time!" and they'd be right. I'll talk about that more later though. I'll just say that next month developers will universally breathe a long sigh of relief, and all those PS3 programmers will be allowed back outside to purchase wigs for all that missing hair they pulled out building their own compilers for Cell.

Now, all 8 SPEs within the Cell are identical, no differences whatsoever between them. The SPEs can process integers in 8-bit, 16-bit and 32-bit varieties, as well as floating point numbers. If you don't know the difference between an integer and a floating point number, then you need math lessons. Integer = 8. Floating point = 8.0. The difference being the decimal point, of course. For floating point, the SPEs can process both 32-bit (single precision) and 64-bit (double precision) data formats. A handy thing to have. To get into more of the meat of the SPE, here you go. Each SPE is a 4-way SIMD processor optimized for single precision (32-bit) floating point, but can, like I said above, perform 64-bit as well. The performance hit for double precision is around 5-6x. The other special thing about the SPEs, other than having local storage SRAM, is that they also have a very rich register file of 128 registers that are 128 bits wide and very low latency. The inclusion of so many registers was a purposeful one, and allows more data values to be held close to the actual SIMD unit instead of accessing the LS all the time. So the SPEs actually have a couple of storage choices, either the registers or the LS. But of course, not everything fits in registers, and that's when you send data to the LS for storage.
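The "4-way SIMD" and "128 registers of 128 bits" figures can be pictured with a small Python sketch. This only models the data layout with the standard `struct` module; the helper names `pack_register` and `simd_add` are made up for illustration and have nothing to do with real SPE instructions.

```python
import struct

LANES = 4            # four 32-bit lanes per 128-bit register
REGISTER_BYTES = 16  # 128 bits


def pack_register(values):
    """Pack four floats into 16 bytes, the way one 128-bit register holds them."""
    if len(values) != LANES:
        raise ValueError("a register holds exactly four 32-bit values")
    return struct.pack("<4f", *values)


def simd_add(reg_a, reg_b):
    """Add two packed registers lane by lane.

    One call stands in for one SIMD instruction working on all four lanes.
    """
    a = struct.unpack("<4f", reg_a)
    b = struct.unpack("<4f", reg_b)
    return struct.pack("<4f", *(x + y for x, y in zip(a, b)))


a = pack_register([1.0, 2.0, 3.0, 4.0])
b = pack_register([0.5, 0.5, 0.5, 0.5])
result = struct.unpack("<4f", simd_add(a, b))

# 128 registers of 16 bytes each: the whole register file is only 2KB,
# which is why larger working sets have to live in the 256KB local store.
register_file_bytes = 128 * REGISTER_BYTES
print(result, register_file_bytes)
```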

Surprisingly, the instruction set of the SPE is actually a superset of the vector processing unit instruction set of the PlayStation 2's Emotion Engine. I guess Sony did do something right after all, eh? Eh?! You with me, fellas! Building on this superset of the EE is of course purposeful. Sony wants to give developers at least SOMETHING that they remember from the PS2. That slick little console they built so long ago, for which so many developers worked many thousands of man-hours to relearn heavy assembly and vector programming. So any long-time PS2 developer will feel right at home with the SPE instruction set, with of course various optimizations and new instructions. One thing that needs to be noted is that the SPE is not a multithreading design. There are several reasons IBM chose to do this. Instead of supporting multithreading, they slapped on 128 registers, each 128 bits wide, plus local storage, so all the required data can be processed without the hint of a cache miss penalty. That's right folks, the Cell design is a very penalty-less architecture. Whereas on the Xbox 360 penalties will reach into the hundreds of cycles per miss, the PS3's highest penalty will be around the 12 cycle area. That is why local storage and registers and all that ancient stuff was used. Because back then, performance efficiency was taken more seriously than simple ease of use. And of course, allowing multithreading on each SPE would have cost die area 8 times over, once per SPE, so that also had a lot to do with it. So while programming for an SPE, the programmer is assured that all resources for the given task are at hand and not being used up by another SPE. This gives a lot of control and assurance to the programmer. And we love that. Love that lots.

For even more programming information, here are a few more inside tidbits. The floating point operation at present (I don't see it changing this late) is geared for throughput of media and 3D objects. That means, pretty much exactly like the 3DNow! SIMD extensions, that IEEE correctness is sacrificed, yet again, for speed and efficiency. For such workloads, exact rounding modes and exceptions are largely unimportant. If you do think you'll need IEEE correctness, the double precision format will provide this for you. So really there is nothing to worry about, even though you'll receive that performance hit by using double precision formats, even if only in small bursts.

The SPUs are designed for aggregation (grouping of distinct data) into an array of processing elements. The programming model for the SPU has not been predetermined, and the BPA (Broadband Processor Architecture) can support both process pipelining and parallel processing. In a pipeline implementation, a transformation on a data set performs a specific task at each SPU and then passes the intermediate results to the next SPU. The advantage of this technique is that the code in each SPU is usually quite small and the operations are easy to manage. The pipeline is also more predictable. This predictability is important to the design goals of the Cell processor and is one of the reasons the SPEs have local memory and not cache memory. It's also important to note that Cell supports the sharing of memory locations. Doing this allows for more concrete memory designs for each program and lets programmers decide how each SPE's local storage is used depending on its projected load. A very impressive design method in my eyes.
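The two programming models mentioned above, pipelining and parallel processing, can be sketched in a few lines of Python. This is only an illustration of the idea: plain functions stand in for the small programs loaded on each SPU, and the names `run_pipeline` and `run_parallel` are invented for the example.

```python
def run_pipeline(stages, items):
    """Pipeline model: every item passes through each stage in order.

    Each stage stands in for the small program loaded on one SPU, which
    hands its intermediate result to the next SPU.
    """
    results = []
    for item in items:
        for stage in stages:
            item = stage(item)
        results.append(item)
    return results


def run_parallel(stages, items, workers):
    """Parallel model: the data set is split into slices and every worker
    runs the whole chain of stages on its own slice."""
    slices = [items[i::workers] for i in range(workers)]
    partial = [run_pipeline(stages, s) for s in slices]
    # Re-interleave the slices so results come back in the original order.
    merged = [None] * len(items)
    for w, part in enumerate(partial):
        merged[w::workers] = part
    return merged


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
data = list(range(10))
pipeline_out = run_pipeline(stages, data)
parallel_out = run_parallel(stages, data, workers=7)
print(pipeline_out == parallel_out)
```

Both models give the same answers; what changes is how the work and the code are spread across the SPUs. The pipeline keeps each SPU's program tiny, while the parallel split keeps each SPU's data separate.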

In a nutshell lol, if that's possible, I'm sure you're asking. The SPE is a vector computing monster whose latency is extremely low and whose cycle penalties are incredibly small. Considering the bandwidth present, which you'll soon see, you'll realize that bottlenecks within the PS3 will be a very uncommon occurrence.

So there you have it, the PPE and the SPEs. The brains and muscle of the Cell architecture. But wait a second, while those 2 are incredibly important of course, you'll notice we still have the heart of the Cell and the vascular system to talk about. That's where everything truly comes together, and that's where you begin noticing no one talks about them. Right now it's all about the PPE and SPEs, for good reason. But those 2 pretties would be nothing but a smoke-screen if it weren't for what comes next. The heart of the PS3, the MFC.

What exactly is the MFC? Well, that and the EIB are actually a lot simpler and easier to explain. The MFC simply moves the data back and forth between the SPEs and the PPE. Simple, right? Really it is. What's really strong about this MFC though is that it supports more than 128 outstanding requests to memory. That, folks, is a crap-load. Keeping SPE utilization high requires the MFC to support many transaction flows simultaneously. The MFC also has its own memory management unit (MMU) that implements a subset of the Power core's MMU. The MFC, just like the Power core, uses a 64-bit virtual address space. What is virtual address space? It's address space that really doesn't exist, but we tell programs it does to trick them. HAR HAR HAR! No, really, a virtual address space is the set of addresses a program sees, which the MMU maps onto real memory, the same idea behind virtual memory in an OS like Windows NT.

What the MFC does is transfer data between main memory and the SPE's local storage (LS) using simple get and put commands (assembly language stuff). Each command can carry an "S" modifier that instructs the SPE to begin processing instructions from the next program counter address after the data transfer is complete. Pretty much, the commands given to the MFC keep everything moving seamlessly. SPEs process fast, so the MFC must be able to keep up effectively and actually stay ahead of the SPEs. More importantly, the MFC can take data from the SPE and load it directly into the PPE's L2 cache. This is one way to get critical data to the PPE very quickly. So, just like a heart keeps the blood pumping in your body, the MFC keeps the data pumping through the Cell processor effectively and efficiently.
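Here is a rough Python sketch of that get, compute, put rhythm. It is a toy model under loud assumptions: the class `SpeModel` and its methods are hypothetical, byte arrays stand in for main memory and the local store, and real transfers are asynchronous commands queued to the MFC rather than plain copies.

```python
LOCAL_STORE_SIZE = 256 * 1024  # 256KB of local store per SPE


class SpeModel:
    """Toy model of one SPE: a private local store that only explicit
    transfers can fill or drain."""

    def __init__(self, main_memory):
        self.main_memory = main_memory              # shared bytearray
        self.local_store = bytearray(LOCAL_STORE_SIZE)

    def _check(self, ls_offset, size):
        if ls_offset < 0 or size < 0 or ls_offset + size > LOCAL_STORE_SIZE:
            raise ValueError("transfer does not fit in the local store")

    def get(self, ls_offset, mem_offset, size):
        """Copy size bytes from main memory into the local store."""
        self._check(ls_offset, size)
        chunk = self.main_memory[mem_offset:mem_offset + size]
        self.local_store[ls_offset:ls_offset + size] = chunk

    def put(self, ls_offset, mem_offset, size):
        """Copy size bytes from the local store back to main memory."""
        self._check(ls_offset, size)
        chunk = self.local_store[ls_offset:ls_offset + size]
        self.main_memory[mem_offset:mem_offset + size] = chunk


main_memory = bytearray(b"hello cell" + bytes(1014))  # 1KB of pretend RAM
spe = SpeModel(main_memory)
spe.get(ls_offset=0, mem_offset=0, size=10)             # stage the input
spe.local_store[0:10] = spe.local_store[0:10].upper()   # "compute" on local data
spe.put(ls_offset=0, mem_offset=512, size=10)           # write the result back
print(bytes(main_memory[512:522]))
```

The point of the model is that the compute step never touches main memory directly. Everything it works on was brought into the local store first, which is exactly the bookkeeping the software has to manage.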

Ahhh, good afternoon. I'm back in the saddle again and ready to write about one of the final main points of the Cell design. The vascular system of the Cell architecture, which we like to call the EIB. The Element Interconnect Bus. Let me just remind people that the MFC and EIB do not process information, just in case anything above may have confused you into thinking they do in some way. The EIB and MFC keep all the instructions and data and all the various programming aspects of the Cell processor moving along at a brisk pace, and a very customizable pace, I must add. The Cell processor as a whole is incredibly customizable. I'm quite surprised that such a CPU design has even been built. It makes us programmers feel like kids again, where for the first time in decades we actually have almost FULL control over what we want our system to do. It's really quite extraordinary for Sony and IBM to provide such a processor. I'm still, even a year later, in quite a lot of shock to see this processor in the world.

Anyway, what is the Element Interconnect Bus and why is it so important? Well, most of you have heard the word bandwidth before and seen the numbers. That is the overall bandwidth that the EIB provides the Cell architecture, and altogether at its peak it is a simply flabbergasting 384GB/s, but in reality only about two thirds of this will likely be used in heavy practice. I've heard along the grapevine that in certain vectorized coding situations utilizing streamed processing (using all 7 SPEs to stream incredibly large data sets all at once, getting the job done up to 7x faster) you will actually see the EIB reach this peak along its slim little railroad pipeline.

The inner meat of the EIB is quite extraordinary because, unlike most buses, this thing is customizable. Yet again, another aspect of the Cell architecture that can be customized by the programmers to fit the workloads their programs will produce. The EIB consists of 4 x 16 byte "rings" which run at half the CPU clock, and each ring can allow up to a whopping 3 simultaneous transfers. So with simple math you'll see that the EIB is capable of passing 96 bytes of data PER CPU CYCLE, and with further math you reach the above number of 384GB/s of overall inner system bandwidth. That means the EIB is capable of moving an incredible 384 gigabytes of information through its pipelines per second. That, folks, is pretty incredible. I'd be amazed if that number is ever actually reached. I'm sure it will be, but still.
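The "simple math" can be written out as a short Python calculation. It uses only the figures quoted in the post and reads "3 simultaneous transfers" as three per ring, which is an assumption, though it is the reading that reproduces the 96 bytes per cycle figure.

```python
RINGS = 4
BYTES_PER_RING = 16
TRANSFERS_PER_RING = 3  # assumption: three transfers in flight on each ring

# The rings are clocked at half the CPU speed, so one bus cycle
# spans two CPU cycles.
bytes_per_bus_cycle = RINGS * BYTES_PER_RING * TRANSFERS_PER_RING
bytes_per_cpu_cycle = bytes_per_bus_cycle // 2


def peak_bandwidth_gb_per_s(cpu_clock_ghz):
    """Peak EIB bandwidth in GB/s for a given CPU clock in GHz."""
    return bytes_per_cpu_cycle * cpu_clock_ghz


# Which CPU clock does the quoted 384GB/s figure imply?
implied_clock_ghz = 384 / bytes_per_cpu_cycle

print(bytes_per_bus_cycle, bytes_per_cpu_cycle, implied_clock_ghz)
print(peak_bandwidth_gb_per_s(3.2))
```

Worked through this way, 384GB/s corresponds to a 4GHz clock. At the 3.2GHz clock mentioned elsewhere in the post, the same arithmetic gives roughly 307GB/s.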

Now going further into detail. While each ring of the EIB is 16 bytes wide running at half the system's clock speed, you can reconfigure the rings in software to run 8 bytes wide but at full system speed. During certain workloads this option may be the better one, since an EIB running at 3.2GHz is indeed clocked faster than one running at 1.6GHz, even if it's only 8 bytes compared to 16 bytes per ring per clock. The genius design decision of the EIB is this.

Like I said, the EIB consists of virtual rings, like race-tracks that run a circle around the chip. All four of these rings are unidirectional, meaning each is a one-way road. But two of these rings run counter to the direction of the other 2. This design decision makes the worst-case latency of the EIB only half the distance around the ring and not the complete ring. This ring network has the ability to handle 3 simultaneous transfers, as I said above, and these transfers are removed once they hit the destination node. So unlike many buses, a missed transfer will not cycle through the system many times over, causing what we refer to as the lost puppy effect (lost puppy runs around the neighborhood and, if never fed, dies, but it takes a while). With the EIB designed the way it is, as a ring with 2 rings running in each direction, thus cutting off missed transfers halfway, the latency of any mistake is once again incredibly low within the Cell architecture. This is something you'll see over and over and over about the Cell. The latency is absolutely incredibly low, and the cycle losses are an absolute minimum compared to today's popular CPUs, who if asked wouldn't even know how to spell Local Storage lol.
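The "half the distance" claim is easy to check with a small Python sketch. The node count of 12 is a hypothetical number picked for illustration, and the function names are made up; the point is only the comparison between one-way and two-way travel.

```python
def hops_one_way(src, dst, nodes):
    """Hops from src to dst when data can only travel in one direction."""
    return (dst - src) % nodes


def hops_two_way(src, dst, nodes):
    """Hops when the transfer may take whichever direction is shorter,
    as it can when some rings run clockwise and others counter-clockwise."""
    forward = (dst - src) % nodes
    return min(forward, nodes - forward)


NODES = 12  # hypothetical number of stops on the ring

worst_one_way = max(hops_one_way(0, d, NODES) for d in range(NODES))
worst_two_way = max(hops_two_way(0, d, NODES) for d in range(NODES))
print(worst_one_way, worst_two_way)
```

With one direction only, the unluckiest transfer has to go almost all the way around. With both directions available, no destination is ever more than half the ring away.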

Now, ever since last year Cell and Linux have been brothers. The actual tool chain for the Cell processor was built on PowerPC Linux systems. The programming of the SPE is of course based on C/C++ and assembly. Cell will also be capable of supporting languages such as Fortran and many others. It is also nice to note that IBM, Toshiba and Sony are currently all at work manufacturing Cells for SDK purposes in over 8 different factories. This is why you heard a while back that only 500 kits had been sent out in total. The design of the Cell architecture is so unique that no virtual system could be created for it, and trying would only have been a waste of developers' time and resources. Instead they waited until the first true prototypes of the Cell were out, tested and ready for actual use. The first people to actually use Cell in its final form are the lucky developers of the PlayStation 3. And of course internal testers at IBM and Sony.

So there you have it. IBM and Sony's amazing processor that some are calling the Vector Revolution Processor. The processor that will bring back heavy vector processing, which powered the most popular and highly acclaimed processors of the 70's and 80's. Why was it put aside? Because, like I said, for some reason chipmakers went after ease of use, and not efficiency. Cell is a complex design, highly customizable in almost every single internal aspect, but with a programming model that is not new to the development world. You will find some programmers putting Cell through its paces in virtually a week's time, while others will need to think a little bit about exactly what they want to do before actually getting to work. Programming for Cell will be a little bit like writing a book. The first thing you want to do is outline what you expect from your program and begin customizing your Cell tools to make Cell work at its most effective with that type of program. This is why in another thread I stated that Cell brings about the redesign of programming, and allows for at least 5,000 different programming models. Virtually, it's all up to the programmer. The programmer is the Cell's master, and the Cell will respond to virtually every single request by the programmer.

Last edited by udontneed2know on Thu Oct 06, 2005 7:20 pm; edited 5 times in total


What is Vector Processing?
One of the things that really ought to be explained is the difference between vector and scalar math done in a processor.

Lots of different types of data can be vectorized. But what is a vector, you say? A vector is simply a set of related data.

For instance, let's say that we have the numbers 30, 56, 134, 241, 479, 738. Let's say that all these numbers are in units of cents, and that the data is delivered to the user in units of dollars. In order to convert these to dollars, we must perform the same operation on all of them: dividing by 100. So what do we do? We vectorize! Coders may recognize a vector as what is called an "array" in higher level languages like PHP or Java. So our data becomes [30, 56, 134, 241, 479, 738] / 100. This is where the real power of the CELL comes in. A purely scalar processor needs one operation per number to perform this, totalling 6 operations (each of which might take several cycles). An SPE works on a whole 128-bit register at a time, which holds four 32-bit values, so it handles four numbers per operation and the six values here need only two vector operations. The larger the data set, the more operations that saves.
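The cents-to-dollars example can be written as a short Python sketch that counts operations both ways. It assumes four 32-bit values per 128-bit register, as described for the SPE earlier in the post; the function names are invented, and the inner loop in `vector_convert` merely stands in for what would be a single SIMD instruction.

```python
LANES = 4  # four 32-bit values fit in one 128-bit SPE register


def scalar_convert(cents):
    """Convert one number at a time: one divide operation per number."""
    ops = 0
    dollars = []
    for c in cents:
        dollars.append(c / 100)
        ops += 1
    return dollars, ops


def vector_convert(cents):
    """Convert in groups: one vector operation per group of up to LANES numbers."""
    ops = 0
    dollars = []
    for start in range(0, len(cents), LANES):
        chunk = cents[start:start + LANES]
        dollars.extend(c / 100 for c in chunk)  # stands in for one SIMD instruction
        ops += 1
    return dollars, ops


cents = [30, 56, 134, 241, 479, 738]
scalar_result, scalar_ops = scalar_convert(cents)
vector_result, vector_ops = vector_convert(cents)
print(scalar_ops, vector_ops)
```

Both versions produce the same dollar amounts. The scalar loop takes six operations, the vector version two, and the gap widens as the list grows.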

So across a large data set you can save hundreds, even thousands of clock cycles by using vector math. In other words, the CELL doesn't have to run at a far higher clock speed, because it gets more work done per instruction.

So what kind of data is vectorizable? All sorts. Fortunately for us, the most vectorizable types of data are streaming data, audio and video. Just the sorts of things a video game console deals with. But now you may also see why some say that coding for the PS3 is so much harder. In addition to handling more of the system operation, the coder must also look for places where data can be vectorized. It makes you appreciate the art that coding has become.
 

puddleduck

Chairman of Selectors
Joined
Aug 10, 2005
Location
Uk
Nice post mate, I heard somewhere that the PS3 "Cell" chip would essentially be able to power anything ranging from your PS3 to your microwave.
 

siddharth2002

National Board President
Joined
Oct 29, 2004
Location
Auckland, NZ
puddleduck said:
Nice post mate, I heard somewhere that the PS3 "Cell" chip would essentially be able to power anything ranging from your PS3 to your microwave.
That is very much correct, and Sony have officially said this. They want to use Cell or variants of Cell in so many things, especially in mobile devices.
 

Cowburn199

International Coach
Joined
Jul 19, 2005
Location
England
siddharth2002 said:
That is very much correct, and Sony have officially said this. They want to use Cell or variants of Cell in so many things, especially in mobile devices.
That's one thing I don't like about Sony. They don't focus just on gaming. Nintendo = Gaming. Microsoft = Gaming
 

themuel1

International Coach
Joined
Apr 9, 2004
Cowburn199 said:
That's one thing I don't like about Sony. They don't focus just on gaming. Nintendo = Gaming. Microsoft = Gaming

Microsoft also do operating systems (Windows XP, 2000, 98...) and a range of other products (Microsoft Office), so they don't focus on just gaming.
 

gold639

International Coach
Joined
Sep 30, 2004
I'm still undecided on whether to buy the 360 or the ps3......or to get myself a top of the range graphics card!
 

Cowburn199

International Coach
Joined
Jul 19, 2005
Location
England
gold639 said:
I'm still undecided on whether to buy the 360 or the ps3......or to get myself a top of the range graphics card!
Top of the range graphics card!
 

gold639

International Coach
Joined
Sep 30, 2004
Yeah that's what I was thinking, but both the consoles look awesome...
 

kmk1284

Club Cricketer
Joined
May 30, 2002
Location
outer heaven
Cowburn199 said:
That's one thing I don't like about Sony. They don't focus just on gaming. Nintendo = Gaming. Microsoft = Gaming

Microsoft = gaming is not fair. Look what they did to the consoles: they are releasing two different versions of the 360 and maybe an HD DVD add-on later. Now if Microsoft release their Halo 3 on the HD DVD format, where does that leave the casual gamer??? Also, by the time of the Xbox 4 there might be 3-4 types of Xbox with different configs and performance, and then we will have to call it the PC (personal console). Also, Sony is kind of a big bully in the console world, and hardly any Sony 1st party titles are good.
 

siddharth2002

National Board President
Joined
Oct 29, 2004
Location
Auckland, NZ
kmk1284 said:
Microsoft = gaming is not fair. Look what they did to the consoles: they are releasing two different versions of the 360 and maybe an HD DVD add-on later. Now if Microsoft release their Halo 3 on the HD DVD format, where does that leave the casual gamer??? Also, by the time of the Xbox 4 there might be 3-4 types of Xbox with different configs and performance, and then we will have to call it the PC (personal console). Also, Sony is kind of a big bully in the console world, and hardly any Sony 1st party titles are good.
No, that is not correct. Although I don't own any PlayStation console, it has got the best titles to date. Of course no one can compete with PC games. And the PS3 is going to get even more games. Watch out in 2006. You will be amazed.
 
