ARM has introduced a new processor in the Cortex-M series: the Cortex-M7. It is the most recent and highest-performance member of the energy-efficient Cortex-M processor family. In ARM’s words, “The versatility and new memory features of the Cortex-M7 enable more powerful, smarter and reliable microcontrollers that can be used across a multitude of embedded applications.”
The primary focus of the Cortex-M7 is improved performance. ARM’s goal was to raise M-series performance to a level not previously seen while keeping the family’s signature traits: small die size, very low power consumption, excellent responsiveness, and the ease of use of the ARMv7-M architecture. There are at least two reasons ARM focused on performance for the M7. First, it wants to widen the gap between Cortex-M parts and traditional 8- and 16-bit microcontrollers, giving ARM a more differentiated market position. Second, the M7 will help support the Internet of Things and wearable device markets. With its enhanced DSP capabilities, the Cortex-M7 is better suited to audio and visual sensor hub processing than any previous M-series design.

The Cortex-M7 has roughly twice the DSP performance of the M4 because it can execute two instructions at once, and it can also run at a higher clock frequency than the M4. It is backed by the Keil CMSIS DSP library, and the core can include a single- and double-precision FPU.
It was developed to provide a low-cost platform that meets the needs of MCU implementations, with a reduced pin count and low power consumption, while delivering outstanding computational performance and low interrupt latency. Two M7 cores can also run the same code in lockstep, one following two cycles behind the other, so that external electronics can detect a glitch if the two CPUs suddenly behave differently.


The optional Floating Point Unit (FPU) provides:
Lazy stacking: automated stacking of floating-point state is delayed until the ISR attempts to execute a floating-point instruction. This reduces the latency of ISR entry and avoids the floating-point context save altogether for ISRs that do not use floating point.
Instructions for single-precision data-processing operations, and optional instructions for double-precision data-processing operations.
Combined multiply-and-accumulate instructions for increased precision, and hardware support for conversion, addition, subtraction, multiplication with optional accumulate, division, and square root.
The Nested Vectored Interrupt Controller (NVIC) is closely integrated with the core to achieve low-latency interrupt processing.
The NVIC supports 1 to 240 external interrupts; the number is fixed at implementation.
It also supports 8 to 256 levels of interrupt priority, again fixed at implementation, and interrupts can be reprioritized dynamically.
The NVIC supports tail-chaining and late arrival of interrupts, which enables back-to-back interrupt processing without the overhead of saving and restoring state between interrupts.
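The range of 8 to 256 priority levels corresponds to an implementation providing between 3 and 8 priority bits. On ARMv7-M each interrupt has an 8-bit priority field, and the implemented bits sit at the top of that byte. The following is a minimal host-side sketch of that arithmetic; the function names are hypothetical and it does not touch any real NVIC register.

```c
#include <stdint.h>

/* Number of distinct priority levels for a given count of implemented
   priority bits (3 bits -> 8 levels, 8 bits -> 256 levels). */
unsigned nvic_priority_levels(unsigned prio_bits)
{
    return 1u << prio_bits;
}

/* Value to place in the 8-bit priority field for a logical priority.
   Implemented bits occupy the most significant end of the byte; the
   unimplemented low bits are left as zero.
   Assumes 3 <= prio_bits <= 8 and priority < nvic_priority_levels(). */
uint8_t nvic_encode_priority(unsigned priority, unsigned prio_bits)
{
    return (uint8_t)((priority << (8u - prio_bits)) & 0xFFu);
}
```

With 4 implemented bits, for instance, logical priority 5 is stored as 0x50. A lower numeric value means a more urgent interrupt.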


The memory protection unit (MPU) manages CPU accesses to memory so that one task cannot accidentally corrupt the memory or resources used by another active task. Memory is organized into up to 8 protected regions, each of which can in turn be divided into 8 subregions. Region sizes range from 32 bytes to the whole 4 gigabytes of addressable memory.
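Region sizes in the ARMv7-M MPU are powers of two, and the size is programmed as a small field where the region covers 2^(SIZE+1) bytes. As a rough sketch of that arithmetic (the helper names are invented for illustration, and this runs on a host rather than programming a real MPU):

```c
#include <stdint.h>

/* Returns the SIZE field value for a region of region_bytes, where the
   region covers 2^(SIZE+1) bytes. Returns -1 if the size is not a power
   of two between 32 bytes and 4 GB. */
int mpu_size_field(uint64_t region_bytes)
{
    if (region_bytes < 32u || region_bytes > (1ull << 32)) {
        return -1;
    }
    if ((region_bytes & (region_bytes - 1u)) != 0u) {
        return -1; /* not a power of two */
    }
    int exponent = 0;
    while ((region_bytes >> exponent) != 1u) {
        exponent++;
    }
    return exponent - 1;
}

/* Size of each of the 8 equal subregions. ARMv7-M only supports
   subregions for regions of 256 bytes or more, so smaller regions
   return 0 here. */
uint64_t mpu_subregion_bytes(uint64_t region_bytes)
{
    return (region_bytes >= 256u) ? region_bytes / 8u : 0u;
}
```

A 32-byte region therefore uses SIZE = 4 and the full 4 GB region uses SIZE = 31, while a 1 KB region splits into eight 128-byte subregions.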
Tightly coupled memory (TCM) is a technology that ARM’s partners can use to extend the effective caching of a single M7 processor; it had previously been seen only in A- and R-series designs. In use, it can have the performance of a cache but, unlike a cache, its contents are directly controlled by the developer. Developers can place critical code and data in TCM, where they can be accessed deterministically and with high performance in routines such as interrupt service routines. The M7 supports up to 16 MB of tightly coupled memory.
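One common way to put code and data under developer control like this is to tag them with a linker section and map that section onto the TCM address range in the linker script. The sketch below assumes a GCC-style toolchain; the section names `.itcm_text` and `.dtcm_data`, the macros, and the handler are all hypothetical, and a real project would need a matching linker script and startup code that initializes those sections.

```c
#include <stdint.h>

/* Hypothetical section names: they must match output sections that the
   project's linker script places in the instruction and data TCM. */
#if defined(__arm__) && defined(__GNUC__)
#define IN_ITCM __attribute__((section(".itcm_text")))
#define IN_DTCM __attribute__((section(".dtcm_data")))
#else
#define IN_ITCM /* host build: no special placement */
#define IN_DTCM
#endif

/* Time-critical state kept in data TCM. */
static volatile uint32_t sample_count IN_DTCM;
static volatile int32_t last_sample IN_DTCM;

/* Time-critical routine kept in instruction TCM; intended to be called
   from an interrupt handler with the newly read sample. */
IN_ITCM void on_adc_sample(int32_t sample)
{
    last_sample = sample;
    sample_count++;
}

uint32_t get_sample_count(void) { return sample_count; }
int32_t get_last_sample(void) { return last_sample; }
```

On a non-ARM host the macros expand to nothing, so the same file can be compiled and unit-tested off target.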
The AHB-Lite peripheral (AHBP) interface provides access suitable for low-latency system peripherals. It supports unaligned memory accesses, a write buffer for buffering write data, and exclusive access transfers for multiprocessor systems.


The ARM Cortex-M7 features a six-stage, dual-issue superscalar pipeline with single- and double-precision floating-point units, and can execute two instructions at a time, whereas the Cortex-M4 can execute just one. This is where most of the speed-up comes from. The M7 can also run at a higher clock frequency than the M4, and together these give, on average, a twofold uplift in DSP performance over the M4.
By doubling performance, ARM calculates that appliances and gadgets using the M7 can more quickly perform the complex mathematics required to finely control motor movement in robots and to analyse data from microphones, touchscreens, and other sensors.