Though it didn’t attract a ton of attention at the time, back in 2013 ARM announced the ARMv8-R architecture.
Now, just under 3 years later, ARM is announcing their first ARMv8-R CPU design this evening with the Cortex-R52. An upgrade of sorts to ARM’s existing Cortex-R5, the R52 makes specific use of many of the new features enabled by the architecture, while improving performance at the same time. ARM is pitching the new CPU core at markets that need a safety-critical CPU – a market that the Cortex-R series has been in for a while – where the deterministic nature of the CPU’s execution model is critical to ensuring quick and accurate execution.
While the focus of today’s CPU design announcement is on functionality and utility over microarchitecture, ARM has revealed a bit about how the Cortex-R52 is organized under the hood. The microarchitecture is a direct evolution of the previous Cortex-R5. This means we’re looking at a dual-issue in-order execution pipeline, with a pipeline length of 8 stages. Broadly speaking, this description is very similar to that of the better-known Cortex-A7/A53 cores, which implies that this is a real-time optimized version of the basic elements in that design.
As the Cortex-R series is focused on determinism and real-time responsiveness over total performance, ARM doesn’t heavily promote these cores on the basis of performance. But at least within the Cortex-R family, they are talking about a performance increase of upwards of 35% in common CPU benchmarks. More important for this market than throughput, however, is responsiveness: for the R52, ARM has done some specific work to improve interrupt entry and context switching performance, doubling the former and achieving a staggering 14-fold increase in the latter.
The big deal here, of course, is the deterministic nature of the CPU. The entire microarchitecture is optimized to avoid variable-time, non-deterministic operations, which is why it’s an in-order processor to begin with. This design extends to how memory is managed as well, with ARM avoiding a virtual memory system and its associated TLB misses in favor of a model they call the Protected Memory System Architecture (PMSA), which relies on an MPU to handle memory operations without address translation.
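To make the contrast with a translating MMU concrete, here is a minimal sketch of an MPU-style access check. It is only a conceptual model, not ARM's register layout: the `Region` fields, the `check_access` helper, and the two example regions are all hypothetical. The point it illustrates is that the address is compared against a small, fixed set of regions and is never rewritten, so there is no page-table walk or TLB miss to introduce variable latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One protection region (hypothetical layout, for illustration only)."""
    base: int        # first byte covered by the region
    limit: int       # last byte covered by the region (inclusive)
    readable: bool
    writable: bool


def check_access(regions, addr, write):
    """Return True if the access is permitted; the address is never translated.

    The loop over a small fixed table stands in for the comparison against
    every region that protection hardware would perform on each access.
    """
    for region in regions:
        if region.base <= addr <= region.limit:
            return region.writable if write else region.readable
    return False  # no region matches: the access would fault


# Hypothetical memory map: read-only code, read/write data.
REGIONS = (
    Region(0x0000_0000, 0x0003_FFFF, readable=True, writable=False),
    Region(0x2000_0000, 0x2000_FFFF, readable=True, writable=True),
)
```

Because the table has a fixed size, the worst-case cost of a check is known ahead of time, which is the property a real-time system cares about.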
On the safety side of matters, the R52 has a few different error-resiliency features to ensure accuracy. Multi-core lock step returns for this design, allowing two R52 cores to execute the same task in parallel for redundancy. And on the memory side of matters, ECC is offered across both the memory buses and the memory itself, in order to guard against random bitflips.
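The lock step idea can be sketched in a few lines. This is a software analogy under stated assumptions, not how the hardware is built: `run_lockstep`, `LockstepMismatch`, the `checksum` task, and the `fault` injection hook are all hypothetical names introduced for illustration. The same task is executed twice, and the result is only accepted if both executions agree.

```python
class LockstepMismatch(Exception):
    """Raised when the two redundant executions disagree."""


def run_lockstep(task, args, fault=None):
    """Run `task` twice on the same inputs and compare the results.

    `fault` is a hypothetical hook that corrupts the second result, used
    here only to demonstrate what happens when the executions diverge.
    """
    primary = task(*args)
    shadow = task(*args)
    if fault is not None:
        shadow = fault(shadow)
    if primary != shadow:
        raise LockstepMismatch(f"results differ: {primary!r} vs {shadow!r}")
    return primary


def checksum(values):
    """Example task: a 16-bit additive checksum."""
    return sum(values) & 0xFFFF
```

Calling `run_lockstep(checksum, ([1, 2, 3],))` returns the agreed result, while passing something like `fault=lambda v: v ^ 1` simulates a single flipped bit and raises `LockstepMismatch` instead of silently returning a wrong value.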
Meanwhile in terms of new functionality for hardware developers, as part of ARMv8-R, Cortex-R52 implements support for hardware virtualization. Like virtually everything else in R52, this is deterministic as well, with the hypervisor working with the MPU to offer each guest OS its own section of the physical memory space. According to ARM this is a particularly important advancement, as previous means of separating tasks on real-time CPUs were non-deterministic, which is an obvious problem for the target market.
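As a rough sketch of what "its own section of the physical memory space" implies, the model below gives each guest a fixed physical address window and rejects any configuration where two windows overlap. The guest names, address ranges, and the `build_partition_table` and `guest_may_access` helpers are hypothetical, and the real hypervisor/MPU interaction is certainly more involved than this.

```python
def partitions_overlap(a, b):
    """True if two inclusive (base, limit) ranges share any address."""
    return a[0] <= b[1] and b[0] <= a[1]


def build_partition_table(partitions):
    """Validate that no two guests share physical memory, then return the table."""
    names = list(partitions)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if partitions_overlap(partitions[first], partitions[second]):
                raise ValueError(f"{first} and {second} overlap")
    return dict(partitions)


def guest_may_access(table, guest, addr):
    """A guest may only touch addresses inside its own physical window."""
    base, limit = table[guest]
    return base <= addr <= limit


# Hypothetical layout: one safety-critical guest, one non-critical guest.
TABLE = build_partition_table({
    "safety_guest": (0x1000_0000, 0x1FFF_FFFF),
    "noncritical_guest": (0x2000_0000, 0x2FFF_FFFF),
})
```

With this table, `guest_may_access(TABLE, "noncritical_guest", 0x1000_0040)` is false: the non-critical guest has no way to reach the safety-critical guest's memory, and the check itself is a pair of comparisons with a fixed cost.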
The significance of virtualization in a real-time processor is that it allows for multiple tasks to be executed on the R52 without interfering with each other. In large, complex devices (e.g. cars), this allows for fewer processors within the device, as these tasks can be consolidated onto a smaller number of processors. At the same time, the rigid separation between the tasks means that it’s possible to run both safety-critical and non-critical (but still real-time) tasks on an R52 together, knowing that the latter will not interrupt the safety-critical tasks. For cars and other devices where there is stringent safety certification, this is especially useful as it means that other tasks can be added (via their own guest OS) without invalidating the certifications of the safety-critical tasks.
This is also why ARM’s earlier context switching and interrupt entry improvements are so important. With a hypervisor now in play and multiple tasks executing on a single processor, the vastly improved ability to switch between tasks is critical for allowing multi-tasking without a major performance hit from context switching overhead.
ARM is particularly interested in the Advanced Driver Assistance Systems (ADAS) market, where the Cortex-R is part of a full system of ARM IP. A full ADAS setup from start to end would utilize all three processor types – M, R, and A – with the Cortex-R handling the real-time decision making and executing on those decisions, while Cortex-A would be used to handle sensor perception/interpretation, and Cortex-M would be in many of the individual sensors.
Wrapping things up, as with most other ARM IP announcements, the announcement of the Cortex-R52 is setting the stage for future products. ARM isn’t talking about specific customers at this time, but they already have a number of companies who have licensed ARMv8-R and will be in need of a CPU design to go with it. To that end, we should be seeing Cortex-R52 start appearing under the hood of various devices in the coming years.