I have now prepared and uploaded PDF slides for Chapter 11: Floating Point Instructions.  This might be enough assembly language for many people.  With the instructions discussed so far, you could write fairly nice programs using arrays, functions, integer math, floating point math and simple I/O.
This chapter also introduces the SSE instructions, which operate on 4 packed floats or 2 packed doubles per XMM register.  This can be the start of some efficient programming.  Issuing a single instruction that performs 4 floating point operations can require some re-engineering of your algorithm, but if you do this carefully you can unroll loops and keep the SIMD pipeline fairly full.  This could yield perhaps 4 float results per machine cycle, which at clock rates of roughly 2.5 to 3.75 GHz works out to about 10-15 GFLOPS single precision.  Some of the latest CPUs can complete more than 1 SIMD instruction per cycle.
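To give a rough feel for what "4 packed floats per XMM register" means in practice, here is a minimal sketch in C using the SSE intrinsics from `<xmmintrin.h>` rather than assembly.  The function name `add_floats` is made up for illustration and is not from the slides.  Each `_mm_add_ps` call is expected to compile to a single packed add, so one instruction produces 4 float sums.  The code falls back to a plain scalar loop if the compiler does not define `__SSE__`.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper (not from the slides): c[i] = a[i] + b[i] for n floats. */
void add_floats(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;

#if defined(__SSE__)
    /* Handle 4 floats per iteration with one packed add per group. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));   /* 4 sums stored at once */
    }
#endif

    /* Scalar loop for the leftover elements (or for everything without SSE). */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Calling `add_floats(a, b, c, 6)` on 6-element arrays would handle the first 4 elements with one packed add and the last 2 in the scalar loop.  Unrolling the packed loop to process 8 or 16 floats per iteration is the kind of re-engineering that helps keep the SIMD pipeline busy.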
It will become more interesting with the later chapters on optimization.
I have also located a few more typos.  Check it all out at http://seyfarth.tv/asm
 
 