
* **[AOCL-BLAS](#aocl-blas)**
* **[Documentation](#documentation)**
* **[Contacts](#contacts)**
* **[Acknowledgments](#acknowledgments)**

AOCL-BLAS
------------------

AOCL-BLAS is AMD's optimized version of BLAS, targeted at AMD EPYC and Ryzen CPUs. It is developed as a fork of BLIS (FLAME/BLIS). All features and functionality of FLAME/BLIS are retained and supported as is in this library.

AOCL-BLAS includes optimizations such as the following:

*	Reduced framework overheads
*	Vectorized kernels for AMD EPYC<sup>TM</sup> CPUs
*	Optimal selection of the algorithm/code paths

Starting from AOCL-BLAS version 3.1, AOCL-BLAS has added support for the following:

*	Automatic selection of number of threads using AOCL dynamic feature
*	Windows version of AOCL-BLAS with Microsoft Visual Studio integration

Starting from AOCL-BLAS version 3.2, AOCL-BLAS has added support for the following:

*	AOCL Progress support feature
*	Runtime thread control using the OpenMP API

Starting from AOCL-BLAS version 4.0, AOCL-BLAS has added support for the following:

*	Zen4 support
*	Dynamic dispatcher support for the zen4 configuration
*	LPGEMM support

Starting from AOCL-BLAS version 4.1, AOCL-BLAS has added support for the following:

*	Dynamic dispatch and amdzen configuration support in the aocl_gemm addon
*	AVX-512 based optimizations for the Zen4 platform
	- SGEMM, DGEMM, ZGEMM
	- DTRSM
	- DAXPY & ZAXPY, ZGEMV, DDOTV, SCALV & ZSCALV
*	Improved support for OpenMP nested parallelism

Starting from AOCL-BLAS version 4.2, AOCL-BLAS has added support for the following:

*	BLAS extension APIs
*	Support for the AOCL_ENABLE_INSTRUCTIONS environment variable
*	Performance optimizations for the following APIs:
	*	DGEMM for tiny sizes
	*	SGEMM, ZGEMM, DTRSM, ZTRSM, XGEMV, ZAXPBYV, Z/ZDSCALV
*	CMake build system update for Windows

Starting from AOCL-BLAS version 5.0, AOCL-BLAS has added support for the following:

*	Zen5 configuration support on Turin
*	Turin optimizations for D/ZGEMM, DTRSM, and DNRM2 APIs
*	AVX-512 improvements:
	*	ZGEMV, D/ZAXPYF, D/ZDOTXF, ZDOTV, C/ZSCALV, DNRM2, S/D/ZCOPY
	*	S/D/C/ZAXPBYV, DTRSV, DGEMMT, D/ZTRSM, and D/ZGEMM
*	Additional APIs and post-ops support in the aocl_gemm addon, along with improved performance for its existing APIs

Starting from AOCL-BLAS version 5.1, AOCL-BLAS has added support for the following:

*	DGEMM and DTRSM block tuning for Zen5
*	Performance optimizations:
	*	DGEMM, DGEMV, ZGEMM, DTRSV, DCOPYV on Zen4/5
	*	DSCALV, DDOTV on Zen3
*	LPGEMM:
	*	AOCL_ENABLE_INSTRUCTIONS support
	*	Threading framework optimizations
	*	WOQ with/without group quantization


The upstream repository (FLAME/BLIS) contains further information on AOCL-BLAS, including background information on AOCL-BLAS design, usage examples, and a complete AOCL-BLAS API reference.
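
As a rough illustration of the GEMM operation named throughout the release notes above, the sketch below wraps a double-precision matrix multiply, C = alpha\*A\*B + beta\*C, for column-major, non-transposed matrices. It assumes that AOCL-BLAS was built with the CBLAS interface enabled and that its `cblas.h` header is on the include path. The wrapper name `gemm_colmajor` and the `USE_AOCL_BLAS` macro are hypothetical names introduced only for this example. The fallback loop is a plain reference version for comparison and is not how AOCL-BLAS implements DGEMM.

```c
#include <stddef.h>

#ifdef USE_AOCL_BLAS          /* hypothetical macro: define it when linking AOCL-BLAS */
#include <cblas.h>            /* assumed to be provided by a CBLAS-enabled build */
#endif

/* Computes C = alpha*A*B + beta*C.
 * A is m x k, B is k x n, C is m x n, all stored column-major
 * with leading dimensions lda, ldb, ldc. */
void gemm_colmajor(int m, int n, int k,
                   double alpha, const double *a, int lda,
                   const double *b, int ldb,
                   double beta, double *c, int ldc)
{
#ifdef USE_AOCL_BLAS
    /* Standard CBLAS call; the library selects the optimized kernel. */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
#else
    /* Unoptimized reference loop, useful for checking results. */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p) {
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            }
            double *cij = &c[i + (size_t)j * ldc];
            /* As in BLAS, C is not read when beta is zero. */
            *cij = (beta == 0.0) ? alpha * acc : alpha * acc + beta * (*cij);
        }
    }
#endif
}
```

Compiled without `USE_AOCL_BLAS`, the function runs the reference loop. Compiled with the macro defined and linked against the library, the same call sites go through `cblas_dgemm`, which makes it easy to compare the two results on small inputs.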

Documentation
-------------

The AOCL user guide contains detailed information on the following:

*	Installation
*	Debugging
*	Performance optimization
*	Examples

For more information, refer to the AOCL user guide (https://developer.amd.com/amd-aocl/#userguide).

You can find the developer documents in the docs folder of this repo.


Contacts
--------

AOCL-BLAS is developed and maintained by AMD. For support or queries, you can email us at toolchainsupport@amd.com.

You can also raise issues or suggestions on the GitHub repository (https://github.com/amd/amd-blis/issues).


