Sunaina Pai, TSEC, Contributing Editor                                                      MMX Technology
Feature Focus

Intel MMX for Multimedia PCs

The volume and complexity of data processed by today's PCs are increasing exponentially, placing incredible demands on microprocessor performance. The potential of the Internet, games, interactive video, 3D graphics, animation and virtual reality, all of which demand ever-increasing performance, motivated Intel to develop MMX technology.

MMX technology improves the performance of current and future graphics and communications applications while maintaining compatibility with the existing Intel Architecture (IA) software base of applications and operating systems. MMX technology is an extension of IA and the most significant enhancement since the Intel 386 processor, which in 1985 extended the architecture to 32 bits. It includes 57 new instructions and four new data types, and achieves increased levels of performance on the host CPU by exploiting the parallelism inherent in many of the algorithms in these applications. MMX technology can deliver 50%-100% performance gains for multimedia and communications applications compared with the same applications run on the same processor without MMX technology.

MMX Technology Concepts

The applications that MMX technology targets share several characteristics:

  • Small native data types (e.g. 8-bit pixels, 16-bit audio samples).
  • Compute-intensive recurring operations performed on these data types.
  • A lot of inherent parallelism.
These characteristics pointed the MMX technology definition team toward a Single-Instruction-Multiple-Data (SIMD) architecture, in which one instruction performs the same operation on multiple data elements in parallel. This parallel operation on relatively small data elements (8 and 16 bits) is the fundamental factor behind the MMX technology performance boost. Until now, when processing 8-bit or 16-bit data, the 32-bit or 64-bit bandwidth and processing resources of Intel CPUs were underutilized: only the low-order 8 to 16 bits were manipulated, leaving the remaining bits "unemployed". By processing independent small data elements together, MMX technology enables full utilization of the wide processing capabilities of the CPU. Packing the elements into 64 bits was also a natural choice, since the Pentium and P6 generations of processors use 64-bit wide data buses, as opposed to the 32-bit wide buses of previous generations.

It was also imperative that processors with MMX technology retain backward compatibility with existing software, including operating systems and applications, and that existing applications coexist with new applications using MMX technology. Advanced operating systems enable multiple programs to seemingly run in parallel by time-sharing the CPU among them. This time-sharing is known as multitasking. New applications using MMX instructions should be able to multitask with any other applications.

The main technique for achieving full compatibility was "hiding" MMX technology inside the existing floating-point state and registers. An operating system does not need to know whether MMX technology is present, because the MMX state is saved and restored as part of the floating-point state. Applications check for the presence of MMX technology and, if it is built into the processor, use the new instructions.

MMX Technology Data Structures

Many multimedia algorithms execute the same instructions on many pieces of data in a large data set. Standard processors process only one piece of data with each instruction. MMX technology processes several pieces of data with each instruction, a simple type of parallelism that provides a big performance boost for many multimedia algorithms.

As a result, MMX technology defines new data types, which are 64 bits in total size and are composed of independent smaller size data elements. Thus we call them "Packed Data Types".
Four data types are defined in MMX technology.
  • Packed byte: Eight bytes packed into one 64-bit quantity.
  • Packed word: Four words packed into one 64-bit quantity.
  • Packed double word: Two double words packed into one 64-bit quantity.
  • Quad word: One 64-bit quantity.

A rich set of MMX instructions is defined to perform parallel operations on multiple data elements packed into the new 64-bit data types (8*8 bits, 4*16 bits or 2*32 bits fixed-point data types). MMX technology extends the basic instructions into SIMD versions. These instructions include add, subtract, multiply, compare and shift instructions.

Data Dependent Computations

Multimedia algorithms usually exhibit data-independent control flow, meaning each operation can execute without needing to know the results of previous operations. These algorithms are the most straightforward for the new technology to optimize. But some algorithms need to know the results of previous operations before proceeding. Such algorithms need to make use of logical operations to fit into MMX technology. An example is overlaying a sprite onto a graphic.

MMX technology has a parallel "compare" instruction that generates a bit mask as its result. This instruction enables data-dependent calculations to be executed on several data elements in parallel, and eliminates the need for any branch instructions.

MMX Technology, Connected PCs and the Internet

The Internet has brought the PC user the visual impact and interactivity previously available only on the so-called multimedia PC. With a standard web browser, users can download pictures and sound and play them back locally on their PCs. In addition, users can manipulate video and 3D worlds, interacting with them both locally and across the Internet. MMX technology provides enhanced performance for some of these connected applications.

Establishing a voice phone call over the Internet (called Web telephony) requires the user's voice to be digitized and then compressed into as few bits as possible. The compressed voice data can then be transmitted over the Internet without needing much bandwidth. On the receiving end of the call, the voice data must be decompressed to reproduce the voice signal. Voice stream compression and decompression use filtering and transform techniques, both of which are multiply-intensive. MMX technology can perform four multiplies in parallel, providing a significant performance boost for audio compression. Using a Pentium processor with MMX technology, a software application can implement the phone function using only 20% of the cycles of the main processor.

Video conferencing has been adapted to standard phone lines and Internet standards. It is a step up in complexity from the voice call because video images are sent along with the voice information. The key to video is good compression, because available bandwidth is limited. A compression method that estimates the change from frame to frame, rather than transmitting every picture frame, can be sped up by using MMX technology to compute absolute differences.

Conclusion

MMX technology implements a new high-performance architectural technique that enhances the performance of IA microprocessors for multimedia and communications applications. These applications are compute-intensive, process a lot of data, use small data types, and provide lots of opportunities for parallelism. MMX technology brings more power to these algorithms by adding data types and instructions that can process data in parallel. This parallel processing is done while maintaining full compatibility with the installed base of operating systems and IA applications.



The IEEE Bombay Section Student Newsletter


Web Edition of newsletter by Rohit Mordani, TSEC