Intrinsics for moving values from accumulator data-types to vector data-types.
Moving data from accumulator data-types back to standard vector data-types requires a reduction in precision. For fixed-point arithmetic, the SRS family of intrinsics applies an appropriate transformation: shifting out the lower-order bits, rounding, and/or saturating. The shift amount is passed as a parameter (in the range -4 to 59), while rounding and saturation are applied according to the processor's global mode registers (see Mode Settings).
There are three main variants of the SRS intrinsics, distinguished by the width of the input and output data-types:
- ssrs is used to convert integer
  - 32-bit accumulator data into a corresponding 8-bit vector
  - 64-bit accumulator data into a corresponding 16-bit vector
- lsrs is used to convert integer
  - 32-bit accumulator data into a corresponding 16-bit vector
  - 64-bit accumulator data into a corresponding 32-bit vector
- srs is used to convert floating-point accumulators into a corresponding bfloat16 vector
Both ssrs and lsrs can be prefixed with 'u', in which case the resulting data-type is unsigned.
Example
Using the ssrs intrinsic, the 32 accumulator lanes of a v32acc32 are shifted directly into the 32 output lanes of a v32int8. Each lane is shifted, rounded and saturated independently (depending on the parameters):
v32int8 ssrs(v32acc32 acc, int shft, int sign)
v32uint8 ussrs(v32acc32 acc, int shft, int sign)
As the name indicates, each SRS intrinsic performs three operations: shifting (down, i.e. to the right), saturation and rounding. The first step is to compute saturation:
input_datatype saturation ( input_datatype ival ,
                            int shift ,
                            bool & has_sat )
{
    input_datatype oval
    if ( get_sat() )                         // saturation mode enabled
    {
        min = - 2^( output_precision - 1 )
        max = 2^( output_precision - 1 ) - 1
        if ( is_unsigned( output_datatype ) )
        {
            min = 0
            max = 2 ^ output_precision - 1
        }
        else if ( get_symsat() )             // symmetric saturation mode
            min = - 2 ^( output_precision - 1 ) + 1
        if ( ival > ( max << shift ) )       // too large to fit after the shift
        {
            oval = max << shift
            has_sat = True
        }
        else if ( ival < ( min << shift ) )  // too small to fit after the shift
        {
            oval = min << shift
            has_sat = True
        }
        else
        {
            oval = ival
        }
    }
    else
        oval = ival
    return oval
}
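The saturation step can also be modelled in plain C++ on a host machine. The following is a minimal sketch, not the vendor implementation: `SatMode`, `Range`, `output_range` and `saturate_before_shift` are hypothetical names introduced here, the mode registers are passed in as a struct instead of being read with get_sat() and get_symsat(), and only right shifts in the range 0 to 30 are handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the global mode registers that the real
// intrinsics query through get_sat() and get_symsat().
struct SatMode {
    bool saturate;   // saturation enabled
    bool symmetric;  // symmetric saturation enabled (signed outputs only)
};

struct Range {
    int64_t lo;
    int64_t hi;
};

// Representable range of an out_bits-wide output lane.
Range output_range(int out_bits, bool is_unsigned, SatMode mode) {
    assert(out_bits >= 1 && out_bits <= 32);
    if (is_unsigned) {
        return {0, (int64_t{1} << out_bits) - 1};
    }
    const int64_t half = int64_t{1} << (out_bits - 1);
    // Symmetric mode drops the most negative value, e.g. [-127, 127] for 8 bits.
    return {mode.symmetric ? -half + 1 : -half, half - 1};
}

// Clamp an accumulator value so that it still fits in the output lane
// after being shifted right by 'shift'. Sets has_sat when clamping occurs.
int64_t saturate_before_shift(int64_t ival, int shift, int out_bits,
                              bool is_unsigned, SatMode mode, bool& has_sat) {
    assert(shift >= 0 && shift <= 30);  // assumption: sketch covers right shifts only
    has_sat = false;
    if (!mode.saturate) return ival;    // saturation disabled: pass through
    const Range r = output_range(out_bits, is_unsigned, mode);
    const int64_t scale = int64_t{1} << shift;
    const int64_t lo = r.lo * scale;    // multiply instead of shifting a negative value
    const int64_t hi = r.hi * scale;
    if (ival > hi) { has_sat = true; return hi; }
    if (ival < lo) { has_sat = true; return lo; }
    return ival;
}
```

For an 8-bit signed output and a shift of 4, the bounds are scaled to -2048 and 2032 before the comparison, so an accumulator value of 5000 is clamped to 2032 and later shifts down to 127.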
The rounding decision is then made according to the selected rounding mode (see Rounding modes). Finally, the shift is performed and the rounding increment is applied:
output_datatype lane_srs ( input_datatype ival ,
                           int shift,
                           bool & sat)
{
    input_datatype oval_aux
    output_datatype oval
    bool round = False
    sat = False
    oval_aux = saturation( ival, shift, sat )
    round = rounding ( ival, shift )
    oval = oval_aux >> shift
    if ( round )
        oval += 1
    return oval
}
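As an illustration of the per-lane flow, here is a minimal host-side C++ sketch that narrows one 32-bit accumulator lane to int8. Several things in it are assumptions rather than facts from this page: saturation is taken to be enabled and non-symmetric, the rounding mode is taken to be round-half-up (add 1 when the highest discarded bit is set), the shift is limited to 0 to 30, and the names `LaneResult`, `floor_div_pow2` and `lane_srs_i8` are hypothetical. The sketch also reads the rounding bit from the saturated value, so that a clamped lane cannot be pushed out of range; the actual rounding rules are defined by the selected rounding mode.

```cpp
#include <cassert>
#include <cstdint>

struct LaneResult {
    int8_t value;    // narrowed output lane
    bool saturated;  // true if the input had to be clamped
};

// Floor division by 2^s, written without relying on the right shift
// of negative values.
static int64_t floor_div_pow2(int64_t v, int s) {
    const int64_t d = int64_t{1} << s;
    int64_t q = v / d;               // truncates toward zero
    if (v % d != 0 && v < 0) --q;    // adjust to floor for negative values
    return q;
}

LaneResult lane_srs_i8(int32_t acc, int shift) {
    assert(shift >= 0 && shift <= 30);  // assumption: right shifts only
    const int64_t scale = int64_t{1} << shift;
    const int64_t lo = -128 * scale;    // int8 range, scaled by 2^shift
    const int64_t hi = 127 * scale;

    // Step 1: saturate against the scaled bounds.
    int64_t v = acc;
    bool sat = false;
    if (v > hi) { v = hi; sat = true; }
    else if (v < lo) { v = lo; sat = true; }

    // Step 2: shift down (floor).
    int64_t q = floor_div_pow2(v, shift);

    // Step 3: round half up using the discarded bits.
    const int64_t rem = v - q * scale;  // always in [0, scale)
    if (shift > 0 && rem >= scale / 2) q += 1;

    return {static_cast<int8_t>(q), sat};
}
```

With a shift of 4, an input of 104 (6.5 after scaling) becomes 7, an input of -104 becomes -6, and an input of 5000 saturates to 127 with the flag set.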
The full srs call then applies the above algorithm to every lane of a vector and sets the saturation status bit if saturation is triggered in any lane:
vec_output_datatype srs ( vec_input_datatype ival ,
                          int shift,
                          bool & sat)
{
    vec_output_datatype out
    bool sat_aux = False
    sat = False
    for i in lanes(ival)                     // i is the lane index
    {
        r = lane_srs( ival[i], shift, sat_aux )
        out = upd_elem( out, i, r )
        sat |= sat_aux
    }
    if sat
        set_srs_sat()
    return out
}
Note
Saturation status is not cleared automatically. Once set, it remains set until the user clears the status bit.
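The sticky behaviour of the status bit can be illustrated with a small host-side C++ model. This is a sketch under stated assumptions, not the hardware interface: `SrsStatus` and `srs_to_i8` are hypothetical names, the output is fixed to int8 with non-symmetric saturation, rounding is omitted (the shift simply floors), and the shift is assumed to lie between 0 and 30.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the saturation status bit.
struct SrsStatus {
    bool srs_sat = false;
    void clear() { srs_sat = false; }  // only an explicit clear resets the bit
};

// Narrow N 32-bit accumulator lanes to int8. The status bit is set when
// any lane saturates and is never cleared by this function.
template <std::size_t N>
std::array<int8_t, N> srs_to_i8(const std::array<int32_t, N>& acc, int shift,
                                SrsStatus& status) {
    const int64_t scale = int64_t{1} << shift;  // assumes 0 <= shift <= 30
    const int64_t lo = -128 * scale;
    const int64_t hi = 127 * scale;
    std::array<int8_t, N> out{};
    bool any_sat = false;
    for (std::size_t i = 0; i < N; ++i) {
        int64_t v = acc[i];
        if (v > hi) { v = hi; any_sat = true; }
        else if (v < lo) { v = lo; any_sat = true; }
        int64_t q = v / scale;                  // truncating division
        if (v % scale != 0 && v < 0) --q;       // floor for negative values
        out[i] = static_cast<int8_t>(q);
    }
    if (any_sat) status.srs_sat = true;         // set, never reset here
    return out;
}
```

In this model, a call that saturates any lane leaves `status.srs_sat` set, and a later call whose lanes are all in range does not reset it; only `clear()` does.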
See also
'ups' intrinsics (Upshift)