- add: Optimize implementation on AIE-ML
- add_reduce: Implement on AIE-ML
- bit/or/xor: Implement scalar x vector variants of bit operations
- equal/not_equal: Fix a bug in which not all lanes were compared for certain vector sizes.
- fft: Interface change to enhance portability across AIE/AIE-ML
- fft: Add initial support on AIE-ML
- fft: Add alignment checks for x86sim in FFT iterators
- fft: Make FFT output interface uniform for radix 2 cint16 upscale version on AIE
- filter_even/filter_odd: Functional fixes
- filter_even/filter_odd: Performance improvement for 4b/8b/16b implementations
- filter_even/filter_odd: Performance optimization on AIE-ML
- filter_even/filter_odd: Do not require step argument to be a compile-time constant
- interleave_zip/interleave_unzip: Improve performance when configuration is a run-time value
- interleave_*: Do not require step argument to be a compile-time constant
- load_floor_v/load_floor_bytes_v: New functions that floor the pointer to a requested boundary before performing the load.
- load_unaligned_v/store_unaligned_v: Performance optimization on AIE-ML
- lut/parallel_lookup/linear_approx: First implementation of lookup-based linear functions on AIE-ML.
- max_reduce/min_reduce: Add 8b implementation
- max_reduce/min_reduce: Implement on AIE-ML
- mmul: Implement new shapes for AIE-ML
- mmul: Initial support for 4b multiplication
- mmul: Add support for 80b accumulation for 16b x 32b / 32b x 16b cases
- mmul: Change dimension names from MxNxK to MxKxN
- mmul: Add size_A/size_B/size_C data members
- mul: Merge mul+conj operations into a single intrinsic call on AIE-ML
- sin/cos/sincos: Fix to avoid int -> unsigned conversions that reduce the range
- sin/cos/sincos: Use a compile-time division to compute 1/PI
- sin/cos/sincos: Fix floating-point range
- sin/cos/sincos: Optimized implementation for float vectors
- shuffle_up/shuffle_down: Elements no longer wrap around. Instead, new elements are undefined.
- shuffle_up_rotate/shuffle_down_rotate: New variants added for the cases in which elements need to wrap around
- shuffle_up_replicate: Variant added which replicates the first element.
- shuffle_up_fill: Variant added which fills new elements with elements from another vector.
- shuffle_*: Optimization in shuffle primitives on AIE, especially for 8b/16b cases
- sliding_mul: Fixes to handle larger Step values for cfloat variants
- sliding_mul: Initial implementation for 16b x 16b and cint16 x cint16 on AIE-ML
- sliding_mul: Merge mul+conj operations into a single intrinsic call on AIE-ML
- sliding_mul_sym: Fix start computation for filters with DataStepX > 1
- sliding_mul_sym: Add missing int32 x int16 / int16 x int32 type combinations
- sliding_mul_sym: Fix two-buffer sliding_mul_sym acc80
- sliding_mul_sym: Add support for separate left/right start arguments
- store_v: Support pointers annotated with storage attributes
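The shuffle entries change behavior in ways that are easy to miss. The following is a scalar reference model of those entries, written as plain C++ over `std::array`. It is not the AIE API, and every `model_*` name is hypothetical. Undefined lanes are represented as `std::nullopt`. Only the "up" direction is sketched. For `shuffle_up_fill`, the lane ordering (the vacated lanes take the top elements of the fill vector) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical model type: a lane is either a value or undefined (nullopt).
template <typename T, std::size_t N>
using lanes = std::array<std::optional<T>, N>;

// shuffle_up: lane i receives v[i - n]; the n lowest lanes are undefined
// instead of wrapping around.
template <typename T, std::size_t N>
lanes<T, N> model_shuffle_up(const std::array<T, N>& v, std::size_t n) {
    assert(n <= N);
    lanes<T, N> out{};  // every lane starts as std::nullopt (undefined)
    for (std::size_t i = n; i < N; ++i) out[i] = v[i - n];
    return out;
}

// shuffle_up_rotate: elements shifted out at the top wrap around to the bottom.
template <typename T, std::size_t N>
std::array<T, N> model_shuffle_up_rotate(const std::array<T, N>& v, std::size_t n) {
    assert(n <= N);
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = v[(i + N - n) % N];
    return out;
}

// shuffle_up_replicate: vacated lanes repeat the first element v[0].
template <typename T, std::size_t N>
std::array<T, N> model_shuffle_up_replicate(const std::array<T, N>& v, std::size_t n) {
    assert(n <= N);
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = (i < n) ? v[0] : v[i - n];
    return out;
}

// shuffle_up_fill: vacated lanes take the n highest elements of `fill`
// (assumed ordering: as if `fill` sat directly below `v`).
template <typename T, std::size_t N>
std::array<T, N> model_shuffle_up_fill(const std::array<T, N>& v,
                                       const std::array<T, N>& fill,
                                       std::size_t n) {
    assert(n <= N);
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = (i < n) ? fill[N - n + i] : v[i - n];
    return out;
}
```

For example, shifting `{1, 2, 3, 4}` up by one gives an undefined lane 0 with `model_shuffle_up`, `{4, 1, 2, 3}` with the rotate model, and `{1, 1, 2, 3}` with the replicate model. Code that relied on the old wrap-around behavior corresponds to the rotate variant.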
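The `load_floor_v`/`load_floor_bytes_v` entry says the pointer is floored to a requested boundary before loading. The following is a minimal sketch of that pointer adjustment only. `floor_to_boundary` is a hypothetical helper, not part of the library. The boundary is taken in bytes and assumed to be a non-zero power of two. The real functions' parameters and units may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: rounds p down to the previous multiple of `boundary`
// bytes by clearing the low address bits. Assumes `boundary` is a power of two.
template <typename T>
const T* floor_to_boundary(const T* p, std::size_t boundary) {
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    addr &= ~static_cast<std::uintptr_t>(boundary - 1);  // drop the misaligned part
    return reinterpret_cast<const T*>(addr);
}
```

A pointer that is already on the boundary is returned unchanged. Any other pointer moves down to the start of the enclosing aligned block, so a subsequent load covers that whole block.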
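Under the MxKxN naming from the mmul entries, A is an M x K tile, B is a K x N tile, and C = A * B is M x N. K is the shared inner dimension. The following is a hypothetical reference model, `mmul_model`, that illustrates the convention with row-major tiles. It assumes `size_A`/`size_B`/`size_C` hold the element counts of the three tiles. That interpretation of the new data members is an assumption, not taken from the library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of the MxKxN convention (not the AIE API mmul class).
template <std::size_t M, std::size_t K, std::size_t N>
struct mmul_model {
    static constexpr std::size_t size_A = M * K;  // elements in A (M x K)
    static constexpr std::size_t size_B = K * N;  // elements in B (K x N)
    static constexpr std::size_t size_C = M * N;  // elements in C (M x N)

    // Row-major C[m][n] = sum over k of A[m][k] * B[k][n].
    static std::array<int, size_C> mul(const std::array<int, size_A>& a,
                                       const std::array<int, size_B>& b) {
        std::array<int, size_C> c{};
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t n = 0; n < N; ++n)
                for (std::size_t k = 0; k < K; ++k)
                    c[m * N + n] += a[m * K + k] * b[k * N + n];
        return c;
    }
};
```

With the old MxNxK order, the second template argument named the inner dimension. Under MxKxN, that role moves to the middle position and N names the output columns.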