Abstract
It is well known that single-rail, bundled-delay circuits provide good area efficiency but it can be difficult to match them with appropriate delay models. Conversely delay insensitive circuits such as those employing dual-rail codes are larger but it is easier to ensure timing correctness. In terms of speed, bundled-delay circuits need conservative timing but dual-rail circuits can require an appreciable completion detection overhead.
This paper compares designs in both of these styles and also a delay-insensitive 1-of-4 coded circuit using the practical example of an ARM Thumb instruction decode,: The results show that, through the application of careful optimizations, the 1-of-4 circuits out-performed single-rail circuits and reduced the power compared to dual-rail circuits.