AI Engine-ML v2 Intrinsics User Guide: Shift-Round-Saturate

AI Engine-ML v2 Intrinsics User Guide v2025.1

Intrinsics for moving values from accumulator data-types to vector data-types.

Topics
	AIE interface

	Floating-point interface

	Size interface

Detailed Description

Moving data from accumulator data-types back to standard vector data-types requires a reduction in precision. For fixed-point arithmetic, an appropriate transformation involving shifting out lower-order bits, rounding and/or saturation can be applied using the SRS family of intrinsics. The shift amount is specified as a parameter (in the range -4 to 59), while rounding and saturation are applied according to the global mode registers of the processor (see Mode Settings).

There are three main variants of the SRS intrinsics, distinguished by the width of the input and output data-types:

ssrs is used to convert integer:
- 32-bit accumulator data into a corresponding 8-bit vector
- 64-bit accumulator data into a corresponding 16-bit vector
lsrs is used to convert integer:
- 32-bit accumulator data into a corresponding 16-bit vector
- 64-bit accumulator data into a corresponding 32-bit vector
srs is used to convert floating-point accumulators into a corresponding bfloat16 vector.

Both ssrs and lsrs can be prefixed with 'u' (for example ussrs), in which case the resulting data-type is unsigned.

Example

Using the ssrs intrinsic, the 32 accumulator lanes of a v32acc32 are converted directly to the 32 output lanes of a v32int8. Each lane is shifted, rounded and saturated independently (depending on the parameters):

v32int8 o0 = ssrs(acc0, 0);
v32uint8 o1 = ussrs(acc0, 0);

Declarations referenced in the example:

v32int8 ssrs(v32acc32 acc, int shft, int sign)    (me_srs.h:256)
v32uint8 ussrs(v32acc32 acc, int shft, int sign)    (me_srs.h:257)
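
To make the difference between the signed and unsigned variants in the example concrete, the following is a scalar sketch in plain C++ (not AI Engine code) of what a single lane yields at a shift of 0. It assumes that saturation is enabled in the mode registers. The helper names model_ssrs_lane_shift0 and model_ussrs_lane_shift0 are hypothetical and exist only for this illustration; they are not part of the intrinsics API. With a shift of 0 no bits are discarded, so only the clamp to the output range matters.

```cpp
#include <cstdint>

// Hypothetical scalar model of one ssrs lane at shift 0, saturation enabled:
// the 32-bit accumulator value is clamped to the signed 8-bit range.
std::int8_t model_ssrs_lane_shift0(std::int32_t acc) {
    if (acc > 127) return 127;
    if (acc < -128) return -128;
    return static_cast<std::int8_t>(acc);
}

// Hypothetical scalar model of one ussrs lane at shift 0, saturation enabled:
// the 32-bit accumulator value is clamped to the unsigned 8-bit range.
std::uint8_t model_ussrs_lane_shift0(std::int32_t acc) {
    if (acc > 255) return 255;
    if (acc < 0) return 0;
    return static_cast<std::uint8_t>(acc);
}
```

Under these assumptions an accumulator lane holding 200 becomes 127 in the signed result and stays 200 in the unsigned result, while a lane holding -7 stays -7 in the signed result and becomes 0 in the unsigned result.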

As the name indicates, each SRS intrinsic performs three operations: shifting (down, i.e. to the right), rounding and saturation. The first step is to compute saturation:

input_datatype saturation ( input_datatype ival , int shift , bool & has_sat )
{
  input_datatype oval
  input_datatype max
  input_datatype min
 
  if ( get_sat() ) // Please see set_sat() and get_sat()
  {
    min = - 2^( output_precision - 1 )
    max =   2^( output_precision - 1 ) - 1
 
    if ( is_unsigned( output_datatype ) )
    {
      min = 0
      max = 2 ^ output_precision - 1
    }
    else if ( get_symsat() ) // Please see set_symsat() and get_symsat()
      min = - 2 ^( output_precision - 1 ) + 1
 
    max = max << shift
    min = min << shift
 
    if ( ival > max )
    {
      oval    = max
      has_sat = True // See set_srs_sat()
    }
    else if ( ival < min )
    {
      oval    = min
      has_sat = True // See set_srs_sat()
    }
    else
    {
      oval = ival
    }
  }
  else
    oval = ival
  return oval
}
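
The pseudocode above compares the input against the output range scaled up by the shift amount, rather than against the output range itself. A minimal scalar sketch of that step in standard C++ follows. The SatMode struct and the function name model_saturation are hypothetical stand-ins for the mode registers and are not part of the intrinsics API; the sketch also assumes a non-negative shift and that out_bits + shift stays below 63 so the scaled bounds fit in 64 bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for the global mode registers (illustration only).
struct SatMode {
    bool sat;     // saturation enabled, cf. get_sat()
    bool symsat;  // symmetric saturation enabled, cf. get_symsat()
};

// Scalar sketch of the saturation step: clamps ival against the output
// range scaled by 2^shift and reports whether clamping happened.
// Assumes shift >= 0 and out_bits + shift < 63.
std::int64_t model_saturation(std::int64_t ival, int shift, int out_bits,
                              bool out_unsigned, SatMode mode, bool& has_sat) {
    has_sat = false;
    if (!mode.sat) return ival;  // saturation disabled: value passes through

    std::int64_t min = -(std::int64_t{1} << (out_bits - 1));
    std::int64_t max = (std::int64_t{1} << (out_bits - 1)) - 1;
    if (out_unsigned) {
        min = 0;
        max = (std::int64_t{1} << out_bits) - 1;
    } else if (mode.symsat) {
        min += 1;  // symmetric range: -max .. max
    }

    // Scale the bounds by 2^shift. Multiplication is used instead of <<
    // because left-shifting a negative value is undefined before C++20.
    const std::int64_t scale = std::int64_t{1} << shift;
    max *= scale;
    min *= scale;

    if (ival > max) { has_sat = true; return max; }
    if (ival < min) { has_sat = true; return min; }
    return ival;
}
```

For a signed 8-bit output and a shift of 4, the scaled bounds are -2048 and 2032, so an input of 5000 is clamped to 2032 and an input of -5000 to -2048 (or to -2032 when symmetric saturation is selected).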

The rounding decision is then computed according to the selected rounding mode (see Rounding modes). Finally, the shift is performed and the rounding increment is applied:

output_datatype lane_srs ( input_datatype ival , int shift, bool & sat)
{
  input_datatype  oval_aux
  output_datatype oval
  bool round = False
  sat        = False
 
  oval_aux = saturation( ival, shift, sat )
  round    = rounding  ( ival, shift      ) // Please see the rounding modes available
 
  oval = oval_aux >> shift
 
  if ( round )
    oval += 1
 
  return oval
}
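
As a worked illustration of the per-lane sequence, the sketch below models a single 32-bit accumulator lane converted to a signed 8-bit value in standard C++. Several things here are assumptions and not taken from this guide: saturation is enabled, symmetric saturation is off, the shift lies between 0 and 31, and the rounding mode is round-to-nearest with ties toward positive infinity (the round bit is the most significant discarded bit). The function name model_lane_srs is hypothetical. One deliberate difference from the pseudocode: the sketch derives the round bit from the already-clamped value, so that the increment cannot push a saturated result outside the output range.

```cpp
#include <cstdint>

// Scalar sketch: one 32-bit accumulator lane -> one signed 8-bit lane.
// Assumed (not from the guide): saturation on, symmetric saturation off,
// 0 <= shift <= 31, rounding = nearest with ties toward +infinity.
// Relies on >> of a negative value being an arithmetic shift, which is
// guaranteed since C++20 and is the behaviour of mainstream compilers.
std::int8_t model_lane_srs(std::int32_t ival, int shift, bool& sat) {
    const std::int64_t scale = std::int64_t{1} << shift;
    const std::int64_t max = 127 * scale;   // upper bound before the shift
    const std::int64_t min = -128 * scale;  // lower bound before the shift

    // Step 1: saturation against the scaled bounds.
    std::int64_t aux = ival;
    sat = false;
    if (aux > max) { aux = max; sat = true; }
    else if (aux < min) { aux = min; sat = true; }

    // Step 2: rounding decision from the most significant discarded bit.
    // Taken from the clamped value (a choice made for this sketch).
    const bool round = shift > 0 && ((aux >> (shift - 1)) & 1) != 0;

    // Step 3: shift down (floor), then apply the rounding increment.
    std::int64_t oval = aux >> shift;
    if (round) oval += 1;
    return static_cast<std::int8_t>(oval);
}
```

With a shift of 4 (division by 16) under these assumptions, 24 (1.5) yields 2, 23 (1.4375) yields 1, -24 (-1.5) yields -1, and 5000 saturates to 127 with the flag set.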

The full srs call then applies the above algorithm to every lane of a vector and sets the saturation status bit if saturation is triggered in any lane:

vec_output_datatype srs ( vec_input_datatype ival , int shift )
{
  vec_output_datatype out
  bool sat     = False
  bool sat_aux = False

  for i in 0 .. lanes(ival) - 1
  {
    r    = lane_srs( ival[i], shift, sat_aux )
    sat |= sat_aux
    out  = upd_elem( out, i, r )
  }

  if ( sat )
    set_srs_sat()
  return out
}

Note: Saturation status is not cleared automatically. If set, it will remain set until the user clears the status bit.
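
The sticky behaviour of the status bit can be sketched with a small model in standard C++. The SrsStatus struct and the model_ssrs function are hypothetical and only imitate the behaviour described in this section; they are not the intrinsics API. To keep the example short it assumes saturation is enabled, a shift between 0 and 31, signed 8-bit output lanes, and a truncating (floor) rounding mode, so no rounding increment is applied.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the sticky SRS saturation status bit.
struct SrsStatus {
    bool srs_sat = false;
    void clear() { srs_sat = false; }  // explicit clear by the user
};

// Sketch of a vector-wide conversion: 32-bit lanes -> signed 8-bit lanes.
// Assumed: saturation on, 0 <= shift <= 31, floor rounding (truncation).
// Relies on arithmetic >> for negative values (guaranteed since C++20).
template <std::size_t N>
std::array<std::int8_t, N> model_ssrs(const std::array<std::int32_t, N>& acc,
                                      int shift, SrsStatus& status) {
    const std::int64_t scale = std::int64_t{1} << shift;
    const std::int64_t max = 127 * scale;
    const std::int64_t min = -128 * scale;

    std::array<std::int8_t, N> out{};
    bool sat = false;
    for (std::size_t i = 0; i < N; ++i) {
        std::int64_t v = acc[i];
        if (v > max) { v = max; sat = true; }
        else if (v < min) { v = min; sat = true; }
        out[i] = static_cast<std::int8_t>(v >> shift);
    }

    // The call only ever sets the bit; it never clears it.
    if (sat) status.srs_sat = true;
    return out;
}
```

In this model, a call in which one lane saturates leaves srs_sat set, and a later call without any saturation does not reset it. Only an explicit clear() does, which mirrors the note above.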

See also: 'ups' intrinsics (Upshift)

UG1639 © 2025 Advanced Micro Devices, Inc. All rights reserved.