The ISA Debate
Frequently asked questions about Instruction Set Architectures
Disclaimer: Opinions shared in this, and all my posts are mine, and mine alone. They do not reflect the views of my employer(s), and are not investment advice.
If you have even remotely followed the processor industry, you would have heard about "ARM vs x86". A lot of the arguments made around this topic are flawed, and through this post I have tried to understand the debate in more detail.
What is an ISA?
In simple words, Instruction Set Architecture (ISA) is the set of rules around the vocabulary that software uses to control hardware.
If all programs were written using binary instructions (0s and 1s that the hardware understands), then we would not need the concept of an ISA. However, to make programming easier, programs are written in languages that are easy for humans to understand - called High Level Languages (HLL). Using a compiler, these HLL programs are converted into the vocabulary that the hardware understands - this new program is called an assembly program, and each instruction is called an assembly instruction. Ultimately, each word in this vocabulary maps to a binary encoding. The ISA defines the number and type of assembly instructions supported, and also how each instruction maps to binary.
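As a rough illustration (the exact output depends on the ISA, the compiler, and optimization settings), even a one-line C function gets lowered into a handful of assembly instructions, and each of those instructions has a binary encoding defined by the ISA:

    /* A simple high level function. A compiler lowers this into the target
     * ISA's vocabulary - roughly a multiply (or shift), an add, and a return.
     * Each of those assembly instructions then maps to a binary encoding
     * defined by the ISA. */
    int scale_and_add(int a, int b) {
        return a * 4 + b;
    }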
From this point forward, all uses of the word "instruction" correspond to "assembly instruction types" - like ADD, SUB, etc.
What are the different types of ISA?
As per the textbook definition, there are two types of ISA
Complex Instruction Set Computer (CISC)
Reduced Instruction Set Computer (RISC)
In simple words, a RISC ISA only supports simple instructions - the minimum needed to execute all programs. A CISC ISA also supports complex instructions, which are usually a combination of 2 or more simple instructions. Among the two most popular ISAs, x86 is considered CISC, while ARM is considered RISC.
How did this classification emerge?
In the early 1980s, David Patterson and other computer architecture researchers came up with the idea of a RISC ISA - this was actually when the name CISC emerged as well (to describe the incumbent ISAs). Basically, RISC followed some key principles that CISC did not:
All instructions have a throughput of one cycle
No variable length instructions
Do not support arithmetic directly on memory (use registers for arithmetic)
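To make the third principle concrete, here is a hedged sketch (the pseudo-assembly in the comments is illustrative, not exact mnemonics): for a statement that updates a value in memory, a CISC compiler can emit a single instruction that operates on memory directly, while a RISC compiler has to split it into a load, a register-to-register add, and a store.

    long counter = 0;   /* lives in memory */

    void bump(void) {
        /* On a CISC ISA (e.g., x86), a compiler can emit roughly one
         * instruction:      add [counter], 1
         * On a RISC ISA (e.g., ARM), the same line becomes a sequence:
         *      load  r0, [counter]
         *      add   r0, r0, #1
         *      store r0, [counter]
         * (Illustrative pseudo-assembly; real mnemonics differ.) */
        counter += 1;
    }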
Essentially, the pitch for RISC was to make the compiler work harder to simplify instructions for the hardware (Hence the saying - "RISC: Relegate Important Stuff to Compiler").
It was also observed that a lot of complex instructions supported in CISC architectures were rarely produced by the compiler. So it did not make sense to support the complex hardware structure needed for these rarely used instructions.
Finally, the practitioners of RISC also argued that although RISC increases the number of instructions, it greatly reduces the time taken to execute each instruction by enabling simpler, faster hardware, which improves overall processor performance.
Are all the initial advantages of RISC still valid?
In some of the initial proposal papers I found, the proponents of RISC talk about limited transistor availability as one of the main motivations for RISC - since simpler hardware takes fewer transistors, that is a better route. However, with Moore's law, over time this was not a consideration anymore - especially for high performance applications. Today, if you can afford to get better performance by spending more area, it is worth doing that. For example, ARM supports vector instructions in its ISA now, and high-performance cores can support these instructions using SIMD implementations (for example, multiple parallel ALUs).
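As a minimal sketch (assuming a compiler with auto-vectorization enabled), a plain loop like the one below can be compiled into ARM NEON/SVE (or x86 SSE/AVX) vector instructions, each of which processes several elements at once using those parallel ALUs:

    /* A loop that modern compilers can auto-vectorize: with vector (SIMD)
     * instructions, one instruction operates on multiple elements at a time. */
    void add_arrays(const float *a, const float *b, float *out, int n) {
        for (int i = 0; i < n; i++)
            out[i] = a[i] + b[i];
    }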
Another early concept of RISC was the idea of "delayed branches" - branch instructions that specified that the branch should take effect only after the next instruction (or few instructions) had executed; the compiler reordered instructions so that the overall function still remained the same. This reduced the impact of branch (control) hazards, so the hazard handling mechanism in hardware could remain simple. But this remained a short-lived ISA choice - compilers usually could not fill many delay slots usefully. And although it was meant to reduce hazard handling logic, handling delayed branches itself resulted in complex control logic. The approach that finally helped to reduce these hazards was hardware branch prediction.
Why did everyone go towards CISC in the first place?
During the early days of processors, memory was slow and expensive. So, if a program could be expressed using fewer assembly instructions, it needed less memory space. The idea of instruction caching was also not used widely, so the time spent reading assembly instructions from memory was also lower if there were fewer of them. Hence, multiple simple operations were combined into single, complex instructions.
Do x86 instructions still have a throughput of more than 1 cycle?
From the Pentium Pro generation of processors (in the 1990s), Intel started to use the concept of "micro-ops". The idea was to break down complex instructions into simpler, single cycle instructions on the fly. Each of these simpler instructions is called a micro-op. This approach actually allowed Intel to continue to dominate with a CISC ISA. However, this is still not as optimal as RISC, because generating these micro-ops needs additional decode hardware and consumes more power (generating micro-ops from variable length instructions is very complex, making the situation worse).
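As a purely conceptual sketch (not Intel's actual decoder or micro-op format, both of which are proprietary), a single CISC-style add-to-memory instruction might be cracked into three simple micro-ops:

    #include <stdio.h>

    /* Conceptual decode of one "add [mem], reg" instruction into simple
     * micro-ops that the execution units can each handle in one cycle.
     * The micro-op names here are made up for illustration. */
    int main(void) {
        const char *uops[] = {
            "load   temp <- [mem]",
            "add    temp <- temp + reg",
            "store  [mem] <- temp"
        };
        for (int i = 0; i < 3; i++)
            printf("uop %d: %s\n", i, uops[i]);
        return 0;
    }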
If the ARM ISA is much better than x86, why can't everyone just use the ARM ISA?
In order to change the ISA, the following has to happen:
All software written in high level languages needs to be recompiled using a compiler for the new ISA - this is a logistical nightmare, but is still the easiest step in the process
All usages of low-level language constructs (like inline assembly) need to be rewritten for the new ISA
Operating systems usually have a lot of such code - hence supporting Windows on ARM took a very long time.
Memory ordering across different threads is ISA specific (see the sketch after this list) - this is a big reason why PC games with multithreading tuned for x86 still cannot run efficiently on ARM CPUs
It needs collaboration between computer architecture companies and software companies - this will only happen if there is a good enough incentive.
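Here is the memory-ordering sketch mentioned above - a minimal example using C11 atomics (the function and variable names are made up for illustration). Code that publishes data with plain, unordered stores often happens to work on x86's relatively strong memory model, but can break on ARM's weaker model unless the ordering is requested explicitly:

    #include <stdatomic.h>
    #include <stdbool.h>

    int payload;                    /* data written by the producer thread */
    atomic_bool ready = false;      /* flag polled by the consumer thread  */

    void publish(int value) {       /* runs on the producer thread */
        payload = value;
        /* The release store makes the payload visible before the flag.
         * With a plain store instead, x86's stronger ordering often hides
         * the bug, while ARM can let the consumer see ready == true but a
         * stale payload. */
        atomic_store_explicit(&ready, true, memory_order_release);
    }

    int consume(void) {             /* runs on the consumer thread */
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                       /* spin until the flag is set */
        return payload;             /* sees the published value */
    }

Code that silently relied on x86's stronger ordering has to be found and fixed case by case, which is part of what makes this step so painful.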
Why did Apple succeed when they changed from x86 to ARM in their M1 CPU?
Apple owns the full computing stack - from software to transistor. This allowed them to get everyone on board to make a very significant architecture change. No other company enjoys this benefit.
Apple's Operating System has two unique advantages
It is a Unix based operating system, written largely in portable C, so it could be recompiled for a different ISA more easily.
Apple already ran iOS on the ARM ISA on the iPhone, which gave them experience with the ISA. Also, iOS was originally derived from the x86 based macOS, so they already had experience moving between ISAs.
For applications that still do not have native ARM support, an emulator is used to convert x86 instructions to ARM instructions on the fly in software. Apple developed a really effective emulator - Rosetta 2.
Emulators are still not as effective for performance critical applications like gaming. However, gaming on Macs is not very common.
Having said all this, the move from x86 to ARM was still a phenomenal achievement by Apple.
Is RISC meant for low power applications and CISC for high performance applications?
This is a common misconception. The reason for it is the way processors evolved. During the PC era, Intel dominated the processor market, which made x86, a CISC ISA, the dominant platform. However, during the smartphone era, ARM-based chips became dominant because they consumed less power.
But, in both eras, we can see counterexamples. For example, IBM has consistently built some of the fastest supercomputers, and the Power ISA that IBM uses is generally classified as a RISC architecture. More recently, Fujitsu's Fugaku, the fastest supercomputer in 2020, is based on the ARM ISA (also RISC). And hyperscalers like Google, Amazon, and Microsoft, and upstarts like Ampere, have shown that ARM CPUs are effective for high performance applications in datacenters.
However, there is some merit to this assumption. It is easier to make a low power general purpose processor using a RISC ISA than a CISC ISA, because complex hardware has more transistors - which means more switching between 0 and 1 states, and also more leakage current.
But for very specific applications, defining an ISA with complex instructions can enable a hardware design that is extremely power efficient. A classic example is the Google TPU, which uses one instruction to perform multi-step computations (like matrix multiplication) in a power efficient manner. (I must add that traditionally CISC/RISC applies only to general purpose CPUs - but the ISAs used by application specific processors fit well with the textbook definition of CISC.)
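To make the contrast concrete, here is the many-instruction view of the same computation (a plain C sketch, not TPU code): on a general purpose CPU this triple loop compiles into a long stream of loads, multiplies, adds, and stores, whereas a TPU-style ISA exposes roughly one matrix-multiply instruction that drives a dedicated hardware array.

    /* Naive matrix multiply: C = A * B for n x n matrices stored row-major.
     * On a general purpose CPU this becomes many simple instructions; an
     * accelerator ISA can express the whole thing as one complex instruction. */
    void matmul(int n, const float *a, const float *b, float *c) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                float sum = 0.0f;
                for (int k = 0; k < n; k++)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
    }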
How many instruction types do we really need in an ISA?
There are a few hobby projects that have attempted to build an ISA with just one instruction. These are called OISCs (One Instruction Set Computers). An example is the SUBLEQ CPU, which only uses the SUBLEQ instruction - subtract and branch if the result is less than or equal to zero. This CPU does not have registers, and the ISA does not need bits to indicate opcodes (since there is only one!). All arithmetic and branch operations can be performed using just this one instruction. Although this ISA would be sub-par on real workloads, it shows that we can have all kinds of ISAs as long as the hardware is built to support them.
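To show how little is strictly needed, here is a minimal SUBLEQ interpreter sketch (the memory layout and the tiny program inside it are made up for illustration). Each instruction is just three memory addresses a, b, c, meaning: mem[b] -= mem[a]; if the result is <= 0, jump to c.

    #include <stdio.h>

    /* Minimal SUBLEQ machine. Every instruction is three addresses a, b, c:
     *   mem[b] -= mem[a]; if (mem[b] <= 0) jump to c; else fall through.
     * A negative jump target halts the machine (a common convention). */
    int main(void) {
        long mem[] = {
            9, 11, 3,    /* addr 0: scratch -= x  -> scratch = -x           */
            11, 10, 6,   /* addr 3: y -= scratch  -> y = y + x              */
            11, 11, -1,  /* addr 6: scratch -= scratch (= 0), jump -1: halt */
            5,           /* addr 9:  x                                      */
            10,          /* addr 10: y (the result ends up here)            */
            0            /* addr 11: scratch                                */
        };

        long pc = 0;
        while (pc >= 0) {
            long a = mem[pc], b = mem[pc + 1], c = mem[pc + 2];
            mem[b] -= mem[a];
            pc = (mem[b] <= 0) ? c : pc + 3;
        }
        printf("x + y = %ld\n", mem[10]);   /* prints 15 */
        return 0;
    }

Note how the tiny program is written so that the taken and not-taken paths land on the same address wherever the branch outcome does not matter - the compiler (or programmer) has to do all the work, taking the RISC philosophy to its extreme.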
Did Intel ever try to change their ISA?
Multiple times.
In 2001, Intel, to their credit, launched a new ISA called Itanium (IA-64, co-developed with HP), which took the idea of relying on the compiler even further than RISC. However, the ISA never really caught on, primarily due to backward compatibility issues.
Later, in 1997, Intel settled a lawsuit with DEC, and as part of the settlement acquired DEC's StrongARM processor IP along with an ARM architecture license. Intel used this IP to develop its own high performance ARM processor line called XScale. However, they decided to sell this division to Marvell in 2006.
But for the various reasons covered above, changing an ISA as significant as x86 is not an easy decision.
A little about RISC-V:
The same group that started the RISC revolution also pioneered an open-source ISA called RISC-V (there were earlier versions, but this was the one that attracted real-life applications). The big selling point of RISC-V is that it is an open ISA - anyone can build a chip based on it, without paying a licensing fee. So far, only embedded applications like microcontrollers have adopted this architecture at scale (there are some RISC-V Linux PCs, but they are not widely deployed). Like all open-source projects, RISC-V has both advantages and disadvantages. The public ecosystem allows more contributors, which could unlock solutions that everyone can use (and potentially make the CPU design process faster). However, due to its open nature, multiple non-ratified extensions are allowed - so every company can have its own version of RISC-V. This could eventually lead to a very fragmented ISA - so every RISC-V processor may not work with every RISC-V compiler. It is still very early days, but this is the hardware industry's first major open-source attempt - so it will be interesting to see where it ends up.
Some closing thoughts:
Although RISC has some advantages over CISC, I think its role in deciding the fate of an ISA is overplayed - around 80% of executed CPU instructions use just 6 opcodes common to both RISC and CISC: LOAD, STORE, ADD, SUB, CMP, BRANCH.
The reality is that a lot of x86 code was developed during the early PC era and is not well optimized. Most ARM programs are newer, usually have no hand-written assembly, and use better compilers - which together results in much better performance.
Tomorrow, if the workloads change significantly, then a different ISA might produce better results. But the ISA alone does not decide the fate of a CPU. The major bottleneck for computing today is predictability:
at a macro level, can we predict the kind of workloads we will see in the future
at a micro level, instruction and data predictability (branch prediction, data caching, etc.)
Groups that are able to excel at both these aspects will have the better processor, irrespective of the ISA.