One of the most interesting things about those old cartridge-based systems is that by putting a game cartridge in the slot, you are in fact plugging a PCB directly into your console. A modern equivalent would be swapping PCI devices, such as video and audio cards, on the fly whenever you want to run a piece of software on your computer. This means a cartridge can add extra co-processors to the console, because they are often built into the game cartridge itself. Most users never know this, but game manufacturers definitely do. By building co-processors and digital signal processors into their cartridges, they can gain an advantage over competitors’ titles by making their own games run a little better. Among the most commonly used co-processors for the Super Nintendo Entertainment System were the Super FX, which was used in Super Mario World 2 to help with rotating sprites and rendering polygon shapes, and the DSP-1, used in games such as Super Mario Kart for three-dimensional mathematical calculations.
It’s often the case that these additional processors are so detached from the main processing unit that it’s actually possible to emulate them at the level of what they do rather than how they do it, an approach known as high-level emulation. This isn’t the case just for the co-processors used in the Super Nintendo. The situation is similar when it comes to Nintendo 64 video code emulation, and in some other areas as well.
With high-level emulation, it’s important not to think in terms of individual hardware instructions. You need to look at the big picture and focus on replicating the behavior of entire subsystems. With this approach, it’s possible to emulate specific behaviors and operations with very little overhead. But the approach also has its drawbacks. The main one is that it’s difficult to retain the timing information that would come from executing each individual instruction. What’s worse, the emulation is still far from perfect, since the minor flaws and bugs that make each system unique are lost in translation.
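To make the idea more concrete, here is a minimal sketch of what a high-level approach might look like. It assumes a hypothetical co-processor with a single "multiply" command that works on signed 16-bit fixed-point values. The command number, the function names and the exact arithmetic are illustrative assumptions, not the real DSP-1 command set.

```python
# High-level emulation sketch for a HYPOTHETICAL co-processor.
# The command number and arithmetic are assumptions made for illustration.

def to_signed16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def hle_multiply(a, b):
    """Fixed-point multiply, computed in a single step on the host."""
    return to_signed16((to_signed16(a) * to_signed16(b)) >> 15)


# Each command maps straight to a host function; no co-processor
# instructions are decoded and no cycles are counted.
HLE_COMMANDS = {0x00: hle_multiply}


def hle_execute(command, *operands):
    """Run one co-processor command by calling its host implementation."""
    return HLE_COMMANDS[command](*operands)
```

Calling `hle_execute(0x00, 0x4000, 0x4000)` returns the product directly. Note what is missing: nothing in the sketch records how long the real chip would have taken, which is exactly the timing information that gets lost.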
That was the high-level emulation approach, or HLE. The low-level approach to emulation, or LLE, is to treat those co-processors just like you would treat the main processing unit, and to execute every instruction every time. This results in accurate replication of the timings, so games emulated this way don’t run faster than they were intended to. But it puts much more strain on the hardware that’s running the emulator. With low-level emulation, a game like Super Mario Kart will run almost a third slower than a game like Super Mario World, simply because the former uses a co-processor in its original form. With high-level emulation, both games run just as smoothly.
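The low-level idea can be sketched as a tiny interpreter. Everything here is an assumption made for illustration: the instruction names, the cycle costs and the little multiplication program are invented and do not correspond to any real co-processor. The point is only that every instruction is fetched, executed and charged for.

```python
# Low-level emulation sketch for a HYPOTHETICAL co-processor.
# Instruction names and cycle costs are invented for illustration.
CYCLES = {"LOAD": 2, "ADD": 1, "DEC": 1, "JNZ": 2, "HALT": 1}

# Multiply register "a" by register "b" through repeated addition.
# Assumes b >= 1; the result ends up in "acc".
MULTIPLY = [
    ("LOAD", "acc", 0),   # 0: acc = 0
    ("ADD", "acc", "a"),  # 1: acc += a
    ("DEC", "b"),         # 2: b -= 1
    ("JNZ", "b", 1),      # 3: jump back to 1 while b != 0
    ("HALT",),            # 4: stop
]


def lle_run(program, registers):
    """Execute every instruction one by one and return the cycles spent."""
    pc = 0
    cycles = 0
    while True:
        op, *args = program[pc]
        cycles += CYCLES[op]  # charge the cost of this instruction
        pc += 1
        if op == "LOAD":
            registers[args[0]] = args[1]
        elif op == "ADD":
            registers[args[0]] += registers[args[1]]
        elif op == "DEC":
            registers[args[0]] -= 1
        elif op == "JNZ":
            if registers[args[0]] != 0:
                pc = args[1]
        elif op == "HALT":
            return cycles


registers = {"a": 6, "b": 7, "acc": 0}
spent = lle_run(MULTIPLY, registers)
print(registers["acc"], spent)
```

A high-level version of the same operation would simply compute `a * b` on the host in one step. The interpreter instead walks through the loop seven times, which is where the extra strain comes from, but it also knows how many cycles the operation took, so the emulator can keep the co-processor in step with the rest of the system.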
This shows that low-level emulation is an intensive process, and it’s no wonder that most commercial and free emulators today use high-level emulation to make things run smoothly across a variety of modern hardware.