# **Engineer-to-Engineer Note**

Technical notes on using Analog Devices products and development tools Visit our Web resources http://www.analog.com/ee-notes and http://www.analog.com/processors or e-mail processor.support@analog.com or processor.tools.support@analog.com for technical support.

# Using the ADSP-CM41x MATH Unit for Clarke and Park Transforms

Contributed by Punarva Katte

Rev 2 - April 2, 2018

## Introduction

The ADSP-CM41x family of mixed-signal control processors provides an on-chip MATH accelerator unit, which can be used to offload most of the common transcendental functions such as  $e^x$ , sin(x), cos(x). atan2(y/x), etc., from the Cortex-M4F core. The accelerator is tightly coupled to the Cortex-M4F core and is within the core clock domain. The unit is operated with a simple and flexible store-load mechanism by storing operands to registers and reading results from other registers.

Clarke and Park transformations are matrices of transformation to convert the current/voltage system of any ac-machine from one base to another. The Clarke transform converts a three-phase system into a two-phase system in a stationary frame. The Park transform converts a two-phase system from a stationary frame to a rotating frame. By changing the reference frame, it is possible to considerably simplify the complexity of the mathematical machine model. These techniques are invaluable tools in the digital control of acmachines.

The purpose of this EE-note is to introduce users to the MATH accelerator unit and techniques that may be used to reduce the computation time of mathematical calculations, such as the Clarke and Park transforms.

# **MATH Accelerator Unit**

The MATH unit provides accelerated functions such as reciprocal, square root, trigonometric functions, exponential functions, and their inverses. It also provides accelerated functions to convert between rectangular and polar coordinates. Most operations by this tightly-coupled accelerator complete in a deterministic number of core clock cycles, faster than the Cortex-M4F core could accomplish the same task. <u>Table 1</u> provides the cycles taken to execute each of the functions using the MATH unit. The two columns, *Full Domain* and *Normal Domain*, correspond to different ranges of input to the function. Refer to the *ADSP-CM41x Hardware Reference Manual*<sup>[1]</sup> for more details.

The MATH unit provides an easy-to-use function calculator for general programming operations. Its operands, results, and functions adhere to the *IEEE 754-2008 Single-Precision Floating-Point Arithmetic Standard*<sup>[2]</sup>. In general, results returned are accurate to within a standard relative error of 23.5 bits compared to the infinitely precise mathematical result.

Copyright 2018, Analog Devices, Inc. All rights reserved. Analog Devices assumes no responsibility for customer product design or the use or application of customers' products or for any infringements of patents or rights of others which may result from Analog Devices assistance. All trademarks and logos are property of their respective holders. Information furnished by Analog Devices applications and development tools engineers is believed to be accurate and reliable, however no responsibility is assumed by Analog Devices regarding technical accuracy and topicality of the content provided in Analog Devices Engineer-to-Engineer Notes.



| Function                                | Full Domain                                                   | Normal Domain          |  |  |
|-----------------------------------------|---------------------------------------------------------------|------------------------|--|--|
| adi_recipf(x) or $1/x$                  | 9 cycles                                                      | 9 cycles               |  |  |
| adi_sqrtf(x) or $\sqrt{x}$              | 9 cycles                                                      | 9 cycles               |  |  |
| $adi_expf(x)$ or $e^x$                  | 9 cycles                                                      | 9 cycles               |  |  |
| adi_exp2f(x) or $2^x$                   | 8 cycles                                                      | 8 cycles               |  |  |
| adi_log2f(x)                            | 10 cycles                                                     | 10 cycles              |  |  |
| adi_lnf(x)                              | 11 cycles                                                     | 11 cycles              |  |  |
| adi_sinf(x)                             | $x = [-\infty, +\infty]$ : 14 cycles                          | x = (-8,+8): 9 cycles  |  |  |
| adi_cosf(x)                             | $x = [-\infty, +\infty]$ : 14 cycles                          | x = (-8,+8): 9 cycles  |  |  |
| adi_tanf(x)                             | $x = [-\infty, +\infty]$ : 20 cycles                          | x = (-8,+8): 13 cycles |  |  |
| adi_asinf(x)                            | x  ≤0.5: 11 cycles                                            |                        |  |  |
|                                         | $0.5 <  \mathbf{x}  \le 0.75$ : 12 cycles                     |                        |  |  |
|                                         | $0.75 <  x  \le 1$ : 15 cycles                                |                        |  |  |
| adi_acosf(x)                            | x  ≤0.5: 12 cycles                                            |                        |  |  |
|                                         | $0.5 <  \mathbf{x}  \le 0.75$ : 13 cycles                     |                        |  |  |
|                                         | $0.75 <  x  \le 1$ : 14 cycles (and 15 cycles for negative x) |                        |  |  |
| adi_atanf(x)                            | x  ≤0.00325: 8 cycles                                         |                        |  |  |
|                                         | x  >0.00325: 20 cycles                                        |                        |  |  |
| adi_atan2f (x,y)                        | $X: [-\infty, +\infty]$                                       |                        |  |  |
|                                         | y: $[-\infty, +\infty]$ : 22 cycles                           |                        |  |  |
| adi_hypotf(x,y) or $\sqrt{(x^2 + y^2)}$ | x  ≤0.00325: 10 cycles                                        |                        |  |  |
|                                         | x  >0.00325: 22 cycles                                        |                        |  |  |
| adi_rtopf(x,y)                          | $x: [-\infty, +\infty]$                                       |                        |  |  |
|                                         | y: $[-\infty, +\infty]$ : 33 cycles                           |                        |  |  |
| adi_ptorf(r,a)                          | r:[0,+∞]                                                      | r:[0,+∞]               |  |  |
|                                         | a: $(-\infty, +\infty)$ : 20 cycles                           | a:(-8, +8): 15 cycles  |  |  |

Table 1. Core Clock Cycles taken by MATH Unit Operations

Note:

- 1. adi\_tanf requires an additional 8 cycles if the input operand (after normalizing by  $2\pi$ ) falls in the range  $\left(\frac{\pi}{4}, \frac{3\pi}{4}\right)$  or  $\left(-\frac{3\pi}{4}, -\frac{\pi}{4}\right)$
- 2. Cycle counts do not include latencies associated with loading and storing from/to the MATH accelerator registers. With code optimization, it is possible to achieve a MMR latency of:
  - a. 4-5 cycles, for single-operand functions
  - b. 6 cycles, for double-operand, single-output functions like adi\_atan2f and adi\_hypotf
  - c. 7 cycles, for double-operand, double-output functions like adi\_rtopf and adi\_ptorf

The MATH unit is operated using stores and loads for operands and results. All hardware synchronization is handled automatically. Refer to the *Math Programming Model* in the *ADSP-CM41x Mixed-Signal Control Processor Hardware Reference Manual*<sup>[1]</sup> for a guide to the optimal usage of the MATH functions using C or assembly.



## **FPMark**<sup>TM</sup>

Similar to Coremark, Dhrystone, Whetstone, etc., FPMark<sup>[3]</sup> is an EEMBC benchmark used to evaluate embedded processors or processing units. Unlike other benchmarks, it is used to evaluate the floating-point computation capability of a processor. The FPMark suite consists of algorithms like Fast Fourier Transform, Horner's Method, Linear Algebra, ArcTan, Neural Net, Black Scholes, Enhanced Livermore Loops, LU Decomposition, and Ray Tracer.

Both the Cortex-M4F core and the ADSP-CM41x MATH unit can perform floating-point operations. Hence, FPMark is a good benchmark to compare the performance between the two. <u>Table 2</u> shows the performance improvement (in %) when using the MATH unit as compared to the Cortex-M4F core, for each algorithm.

| FPMark Algorithm         | Performance<br>Improvement |  |  |
|--------------------------|----------------------------|--|--|
| ArcTan                   | None                       |  |  |
| Black Scholes            | 24.68%                     |  |  |
| Horner's Method          | None                       |  |  |
| Linear Algebra           | None                       |  |  |
| Enhanced Livermore Loops | 15.44%                     |  |  |
| Neural Net               | 22.08%                     |  |  |

Table 2. Performance Comparison between Cortex-M4F Core and ADSP-CM41x MATH Unit

# **Clarke and Park Transforms**

A significant breakthrough in the analysis of three-phase ac machines was the development of the *Reference Frame Theory*<sup>[4]</sup>. These techniques are invaluable for analysis, simulation and digital control (like current, torque and flux) of AC machines. Over the years, many different reference frames have been proposed for the analysis of ac machines, with the most commonly used being the *Stationary Reference Frame* and the *Rotor Reference Frame*.

## Clarke Transform (Three-Phase to Two-Phase)

Three-phase ac machines are conventionally modeled using phase variable notation, though it is possible to transform the system to an equivalent two-phase representation. The transformation from three-phase to two-phase quantities is written in matrix form as Equation 1:

$$\begin{bmatrix} i_{\alpha}(t) \\ i_{\beta}(t) \end{bmatrix} = \frac{2}{3} \begin{bmatrix} 1 & \cos(\gamma) & \cos(2\gamma) \\ 0 & \sin(\gamma) & \sin(2\gamma) \end{bmatrix} \begin{bmatrix} i_{A}(t) \\ i_{B}(t) \\ i_{C}(t) \end{bmatrix}$$

- $\gamma$  is the separation between axes of the three-phase machine, which is conventionally  $\frac{2\pi}{2}$
- $i_A$ ,  $i_B$  and  $i_C$  are three-phase stator currents
- $i_{\alpha}$  and  $i_{\beta}$  are two-phase stator currents



Note that the transformation is equally valid for the voltages and flux linkages.

Substituting  $\gamma = \frac{2\pi}{3}$ , this becomes Equation 2:

$$i_{\alpha} = \frac{1}{3}(2i_A - i_B - i_C)$$
  
 $i_{\beta} = \frac{1}{\sqrt{3}}(i_B - i_C)$ 

In a balanced three-phase ac-machine,  $i_A + i_B + i_C = 0$ , which simplifies to Equation 3:

$$i_{\alpha} = i_{A}$$
$$i_{\beta} = \frac{1}{\sqrt{3}}(i_{A} + 2i_{B})$$

#### Park/Inverse Park Transform (Vector Rotation)

In the analysis of electrical machines, it is generally necessary to adopt a common reference frame for both the rotor and the stator. For this reason, a second transformation, known as a *vector rotation*, is formulated that rotates space vector quantities through a known angle. The transformation can be written in matrix form as Equation 4:

$$\begin{bmatrix} i_d \\ i_q \end{bmatrix} = \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} i_\alpha \\ i_\beta \end{bmatrix}$$

- $\theta$  is the angle of rotor from stator
- $i_d$  and  $i_q$  are direct and quadrature axis components of the rotated current space vector, respectively

This transformation is referred to as the Park Transformation.

Similarly, the transformation used to rotate from rotor frame to stator frame is the *Inverse Park Transformation*, and the matrix form of this transform is shown in Equation 5:

$$\begin{bmatrix} i_{\alpha} \\ i_{\beta} \end{bmatrix} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} i_{d} \\ i_{q} \end{bmatrix}$$

The elimination of position dependency from the machine electrical variables is the main advantage of vector rotation.

Figure 1 illustrates the Clarke and Park Transforms between different Reference Frames.





Figure 1. Current Space Vectors Illustrating Clarke-Park Transforms

## **Transform Implementation on ADSP-CM41x Devices**

Equation 2 (Clarke Transform), Equation 3 (Balanced Clarke Transform), Equation 4 (Park Transform), and Equation 5 (Inverse Park Transform) are implemented on the ADSP-CM41x device using simple C code <sup>[6]</sup> as a part of one of four implementations:

- I. the math.h library, which runs on the Cortex-M4F core
- II. the CMSIS library <sup>[5]</sup>, which runs on the Cortex-M4F core
- III. the MATH Unit Accelerator
- IV. the MATH Unit Accelerator operations interleaved with Cortex-M4F core

All four are run with code in SRAM or flash memory, and the number of core cycles required are measured. <u>Table 3</u> shows the performance of each function averaged over 128 samples of input data with each setting.

| Motor Control Kernels |             | Clarke<br>Transform |        | Clarke Transform<br>(Balanced) |        | Park Transform |         | Inverse Park<br>Transform |         |
|-----------------------|-------------|---------------------|--------|--------------------------------|--------|----------------|---------|---------------------------|---------|
|                       |             | SRAM                | Flash  | SRAM                           | Flash  | SRAM           | Flash   | SRAM                      | Flash   |
| Cortex-M4<br>Core c   |             | 21.078              | 21.078 | 13.055                         | 13.055 | 200.547        | 238.484 | 201.547                   | 238.484 |
| Cortex-M4<br>Core c   | . ,         | NA                  | NA     | 11.813                         | 11.813 | 153.266        | 161.117 | 153.266                   | 161.117 |
| MATH                  | Core cycles | NA                  | NA     | NA                             | NA     | 35.055         | 35.070  | 35.055                    | 35.070  |
| Accelerator           | ns          |                     |        |                                |        | 146.061        | 146.126 | 146.061                   | 146.126 |
| MATH +                | Core cycles | NA                  | NA     | NA                             | NA     | 32.063         | 32.086  | 32.063                    | 32.086  |
| Cortex-M4F            | ns          |                     |        |                                |        | 133.594        | 133.691 | 133.594                   | 133.691 |

Table 3. Performance of Motor Control Functions



The results in <u>Table 3</u> were obtained using a core clock of 240 MHz and a system clock of 96 MHz with random values chosen for the input currents and theta.



Along with measuring the performance, the code also verifies the accuracy by comparing the output currents with current implementation and standard implementation (with same set of inputs).

## Techniques for Optimal Use of the MATH Unit

This section describes some optimization techniques for getting the best performance out of the MATH unit.

#### adi\_math.h

When developing a C/C++-based project using the MATH unit functions, include the adi\_math.h header file that is included in the ADSP-CM419F EZ-KIT® Board Support Package. This file defines the MATH unit operations as inline functions, and including it will directly replace the corresponding math.h library functions so that the code need not be changed when moving across platforms. An example is shown in Listing 1.

```
#define sinf(x) adi_sinf(x)
inline
float32_t adi_sinf (float32_t x ) {pADI_MATH0->SINF = x; return pADI_MATH0->RES1;}
```

Listing 1. sinf() implementation in adi\_math.h

With the appropriate compiler optimizations, the inline functions are disassembled to the most optimal assembly instructions, as described in the *Math Assembler Programming Model* in the *ADSP-CM41x Mixed-Signal Control Processor Hardware Reference Manual*<sup>[1]</sup>.

#### **Compiler Optimization**

The IAR tools provide four levels of compiler optimization – none, low, medium, and high - and seven different optimization techniques – common subexpression elimination, loop unrolling, function inlining, code motion, type-based alias analysis, static clustering, and instruction scheduling. For efficient use of the adi\_math.h library, choose the high (with speed) optimization level and enable at least the common subexpression elimination and function inlining optimization techniques.

## **Code Interleaving**

The MATH unit does not require the Cortex-M4F core to remain idle while it performs calculations. To optimize code for performance, it is useful to move unrelated code between the operand-store and the result-load operations. In this manner, the effective time of a function call is eliminated (except for the MMR read/write latency).

For instance, referring to park.h in the attached project files <sup>[6]</sup>, the function park\_xform\_interl() interleaves the computation of sin() and cos() on the Math Unit with other code (memory access, ALU and MAC operations) on Cortex-M4F. In <u>Table 3</u> we see that this reduces the computation of 128 Park transforms (when running on SRAM) by 383 cycles (=128\*(35.055-32.063)).

Refer to the *Math Programming Model* in the *ADSP-CM41x Mixed-Signal Control Processor Hardware Reference Manual* <sup>[1]</sup> for more information on interleaving code with MATH unit operations.



## References

- [1] ADSP-CM41x Mixed-Signal Control Processor with ARM Cortex-M4/ARM Cortex-M0 and 16-bit ADCs Hardware Reference Manual. Rev 0.3, October 2017. Analog Devices, Inc.
- [2] *IEEE Standard for Floating-Point Arithmetic (IEEE Std 754-2008).* August 29, 2008. IEEE Computer Society (Sponsored by the Microprocessor Standards Committee).
- [3] *EEMBC*® *FPMark*<sup>TM</sup>: *The Embedded Industry's First Standardized Floating-Point Benchmark Suite*. August, 2013. The Embedded Microprocessor Benchmark Consortium.
- [4] Clarke and Park Transform. In: Electric Power Quality. Power Systems. 2011. Chattopadhyay S., Mitra M., Sengupta S. Springer, Dordrecht.
- [5] CMSIS-DSP Controller Functions (<u>https://www.keil.com/pack/doc/CMSIS/DSP/html/group\_groupController.html</u>)
- [6] Associated ZIP File for EE-396: EE396\_v02. April 2018. Analog Devices, Inc.

| Revision                                      | Description                                                                                                                                                                                                                               |  |
|-----------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Rev 2 – April 2, 2018<br>by Punarva Katte     | <ul> <li>Added performance numbers for Clarke and Park transforms using:</li> <li>CMSIS library</li> <li>MATH Unit Accelerator operations interleaved with Cortex-M4F core</li> <li>Attached code for performance measurement.</li> </ul> |  |
| Rev 1 – January 3rd, 2017<br>by Punarva Katte | Initial Release                                                                                                                                                                                                                           |  |

## **Document History**