Metadata-Version: 2.4
Name: waic
Version: 0.7.0
Summary: Windows AI Compiler (WAIC) - build and run MLOPs on AIESim
Home-page: https://gitenterprise.xilinx.com/IPSP/WAIC
Classifier: Programming Language :: Python :: 3
Classifier: License :: Other/Proprietary License :: TBD
Classifier: Operating System :: Microsoft :: Windows
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: beartype>=0.22.0
Requires-Dist: cachetools
Requires-Dist: dataclass_wizard==0.36.2
Requires-Dist: dotenv
Requires-Dist: json5
Requires-Dist: pydantic
Requires-Dist: pytest
Requires-Dist: pyyaml
Requires-Dist: ml_dtypes
Requires-Dist: onnx>=1.17.0
Requires-Dist: onnxruntime
Requires-Dist: ortools
Requires-Dist: numpy==1.26.4
Requires-Dist: scipy
Requires-Dist: argparse
Requires-Dist: tabulate
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Windows AI Compiler (WAIC)
## Setup
### Install miniforge (if on Windows)
Download and install the latest installer from https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Windows-x86_64.exe

Open the Miniforge Prompt and run the initialization for the shell(s) you use:
```
# To use in Git Bash
conda init bash
# To use in Powershell
conda init powershell
```

### Git LFS setup
This project uses [Git Large File Storage (LFS)](https://git-lfs.com/) to manage binary files and large files.
Please make sure Git LFS is installed and initialized before cloning or pulling the repository.

For more details, see: https://amd.atlassian.net/wiki/spaces/AIG/pages/1132472126/Git+LFS+in+the+RyzenAI+context

### Clone repo
```sh
git clone --recurse-submodules <repo_url>
# if you get error that git-lfs is not present, run this first
# conda install -c conda-forge git-lfs
```
### Setup virtual env

*Launch LSF server (needed for **installing onnx runtime and launching AICompiler/AISim**)*
```sh
source /group/xsjfarm/lsf/conf/cshrc.lsf    # if in csh
source /group/xsjfarm/lsf/conf/profile.lsf  # if in bash

# Launch LSF server
# OPTION 1: wait for xterm terminal to showup
bsub -Is -q long -R "select[osdistro=rhel && (osver=ws8)]" -R "rusage[mem=32768]" xterm -fa 'Monospace' -fs 12&
# OPTION 2: starts LSF job in current terminal
bsub -Is -q long -R "select[osdistro=rhel && (osver=ws8)]" -R "rusage[mem=32768]" env -i TERM=$TERM bash -l
```
*Setup conda in XSJ server*
```
bash
source /tool/pandora64/etc/modules/INIT/bash
module load miniforge3/3.13.2
```
*Setup and Activate conda env*
```
#first time: conda env create -f env.yml
#    use --prefix /everest/psdv_cases_bkup/<user>/.conda/env/WAIC to specify custom conda env location
#    then link env: ln -s /everest/psdv_cases_bkup/<userid>/.conda/env/WAIC ~/.conda/envs/WAIC
#    if you want to update existing env: conda env update -f env.yml
conda activate WAIC
#first time: conda init | Restart terminal
#cd go/to/WAIC
#first time: git-hooks/install-hooks.sh
# for aie2p flow use this
source settings.sh
# for aie4 flow use this
source settings_aie4.sh
```

## Running WAIC on Windows

### Activate Conda on Windows

1. Download miniforge: https://conda-forge.org/miniforge/
2. Add the miniforge paths to the system `PATH`:
   - `C:\ProgramData\miniforge3\Scripts`
   - `C:\ProgramData\miniforge3\Lib`
   - `C:\ProgramData\miniforge3`
3. Open PowerShell and type `conda`; the conda help text should appear.
4. Run `conda init powershell` (first time only).
5. Go to the WAIC root folder and run `conda env create -f env.yml`.
6. Run `conda env list` to verify that `WAIC` is listed.
7. Run `conda activate WAIC`.

### WAIC Windows bin generation

Run the following in PowerShell:
1. `git clone --recurse-submodules https://gitenterprise.xilinx.com/IPSP/WAIC.git`
2. `settings.ps1`  (builds the xaiengine DLL and sets the env path)

On the very first `git pull`, you might hit an error that the aie-rt commit is not found. To unblock:
1. Run `git submodule update --init --recursive`
2. Run `settings.ps1` again to build the xaiengine DLL and set the env path.


## WAIC Runner script
```
usage: use "WAIC.py --help" for more info

Windows AI Compiler (WAIC) - build and run MLOPs on AIESim

options:
  -h, --help            show this help message and exit
  -mp MODEL_PATH, --model_path MODEL_PATH
                        Path to onnx model (or JSON) and output destination
  -to_i8 INT4_TO_INT8, --int4_to_int8 INT4_TO_INT8
                        path to additional model data file for large models. Optional Field. Default value = 0
  -o {4x4,8x4}, --overlay {4x4,8x4}
                        Name of overlay to run
  -clean, --delete_dir  delete output directory if it already exists
  -ck COMBINE_KERNELS, --combine_kernels COMBINE_KERNELS
                        Use combine kernel file
  -txn BUILD_TXN, --build_txn BUILD_TXN
                        Generate bin for each OP, 'all', 'none', '<layer number>'
  -sim RUN_SIM, --run_sim RUN_SIM
                        run sim for each Op, 'all', 'none', '<layer number>'
  -t {build_run,run,none,build,hw_run}, --test {build_run,run,none,build,hw_run}
                        Run the test flow on HW if new binaries are generated
  -d {strix,med,swv}, --device {strix,med,swv}
                        Name of device to run; e.g., strix or phoenix etc. Default = 'strix'
  -skip SKIP_STEP [SKIP_STEP ...], --skip_step SKIP_STEP [SKIP_STEP ...]
                        Skip WAIC step, none, 1.0: skip L1 shape inference, 2+: skip everything after L1 fusion
  --cpp_me              call CPP ME instead of Python ME
  --fusion_seq FUSION_SEQ
                        Force a specific fusion sequence file to be used instead of the default one
  -O {0,1,2,3}, --optimization_level {0,1,2,3}
                        Set an optimization level
  -lsf, --lsf           use lsf
  -HW_IP HW_IP, --HW_IP HW_IP
                        Set HW IP address
  --perf_testing        Enable performance testing mode.
  -golden_io [GOLDEN_IO ...], --golden_io [GOLDEN_IO ...]
                        Enable golden IO testing mode. Specify subfolders (e.g., 'conv', 'psmu', 'mha'). Include 'update' to replace golden files using DES -> SRC. If no subfolders are given, all available subfolders will be used.
  -rename, --rename     Rename layers folder in WAIC_Outputs.
  -profile_perf, --profile_perf
                        xrt record_timer profiling
  -rel_err_pc, --rel_err_pc
                        Use average relative error for HW test
  -bfm, --tiler_bfm     Use tiler bfm instead of actual tiler
  -bfm_mode {M4K1N8,M1K1N32}, --tiler_bfm_mode {M4K1N8,M1K1N32}
                        Tensor Split
  -vcd, --dump_waves    Dump vcd trace from AIESIM run
  -kdbg, --kernel_debug
                        Enable kernel debug print and large program memory
  --frontend_only       Runs Front end only (till DMA Compiler stage)
  --cpp_fe CPP_FE       Path to the shared library interface to compile with flexml
  -dbg, --debug         Dump dbg log to 'dbg_log.txt'
  -df DEBUG_FILE_NAME, --debug_file_name DEBUG_FILE_NAME
                        Debug log file name
  -v {debug,info,error}, --verbose {debug,info,error}
                        Verbosity for debug logs
  --call_DMAC           Call DMAC directly for OGOAT OPs instead of dumping .py files
  -p, --profile         Profile auto scheduler
  -pf PROFILE_GRAPH_NAME, --profile_graph_name PROFILE_GRAPH_NAME
                        Profile graph file name
  -output OUTPUT_DIR, --output_dir OUTPUT_DIR
                        output directory
  --local               Don't use bsub to build for HW on LSF cluster, build on local machine
  -shape_params IN_SHAPE_PARAMS, --in_shape_params IN_SHAPE_PARAMS
                        Dynamic shape parameters for inputs as a JSON string. Optional Field. Default value = '{}'
  --disable_fast_pm     To disable fast pm load, Default = False
  --fixed_input_values FIXED_INPUT_VALUES
                        Fixed input values to the neural network. JSON syntax: input name -> value. Optional Field. Default value = '{}'
  --default_shape_params_values DEFAULT_SHAPE_PARAMS_VALUES
                        YML file specifying default shape parameters and graph input values. Default: /scratch/Project/npu/WAIC/OGOAT/src/L1_fusion/default_shape_params_values.yml
  --assert_on_error     Error out if there's an assertion
  -j J                  Number of workers for parallel Tiler, Scheduler execution (default: auto-detect CPU cores, use -j 1 for sequential execution)
  --qdq_optimization {0,1}
                        Enable QDQ optimization at end of L1 fusion. Default is 0 (disabled).
  -m {dev,release}, --mode {dev,release}
                        dev mode will generate all 6 waic bins [ ctrl.bin, ifm.bin, ofm.bin, param.bin, txn.bin, wgt.bin ] and release mode will generate 3 bins [ctrl.bin, param.bin, txn.bin ]
  -infer_batch SHAPE_INFERENCE_OUTPUTS, --shape_inference_outputs SHAPE_INFERENCE_OUTPUTS
                        max batch size during onnx runtime inferencing
  -no_dtype_downcast, --no_dtype_downcast
                        Disable dtype downcasting during L1 fusion
aie4_options:
  --qhw4_runner         Enable qhw4 flow
  -dmp {wgt, ort}, --data_dump {wgt, ort}
                        Data dump option for run_ort. Default value = wgt
  -workers AIE4_NUM_WORKERS, --aie4_num_workers AIE4_NUM_WORKERS
                        AIE4 int number of workers for parallel subgraph compilation.
  -include_op AIE4_INCLUDE_OP, --aie4_include_op AIE4_INCLUDE_OP
                        AIE4 Comma Separated List of Operators that should be included while compiling.
  -skip_op AIE4_SKIP_OP, --aie4_skip_op AIE4_SKIP_OP
                        AIE4 Comma Separated List of Operators that should be skipped while compiling.
  --aie4_layer_ids AIE4_LAYER_IDS
                        AIE4 Key of block in JSON to compile. Compiles all blocks if not set.
  -fp16 {true, 1, yes, false, 0, no}, --aie4_is_qdq_fp16 {true, 1, yes, false, 0, no}
                        AIE4: is the QDQ datatype FP16 or BF16? (Default -> True (QDQ dtype is BF16))
```
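Several of the options above (`-shape_params`, `--fixed_input_values`) take JSON strings. A minimal sketch of building such a command line from Python; the input names `batch` and `input_ids` are hypothetical placeholders, not names from any shipped model:

```python
import json
import shlex

# Hypothetical dynamic-shape parameters and fixed input values.
shape_params = {"batch": 1}
fixed_inputs = {"input_ids": 0}

# json.dumps produces exactly the JSON strings the CLI expects.
cmd = [
    "python", "WAIC.py",
    "-mp", "OGOAT/models/conv_model.onnx",
    "-o", "8x4",
    "-shape_params", json.dumps(shape_params),
    "--fixed_input_values", json.dumps(fixed_inputs),
]
print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal
```

Using `shlex.join` avoids quoting mistakes when the JSON contains spaces.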

### Example cmd line

To call WAIC with the CPP front end, the path to the vaiml shared library must be provided. It can come from the TA (e.g., /proj/aiebuilds/ryzen-ai/ryzen-ai-TA/release_rai_1_6/ryzenai_release_daily_latest/lnx64/lib/python3.10/site-packages/flexml/flexml_extras/lib/libvaiml.so) or from a local build.
```
python WAIC.py -mp OGOAT/models/conv_model.onnx -o 8x4 -clean --cpp_fe /path/to/vaiml/shared_lib
```

Using ONNX model as input:
```
python WAIC.py -mp OGOAT/models/conv_model.onnx -o 8x4 -clean
python WAIC.py -mp OGOAT/models/silu_model.onnx -o 8x4 -txn all -clean
python WAIC.py -mp OGOAT/models/PSR_MHA_self_attn_1head.onnx -o 4x4 -txn all -dbg -clean
python WAIC.py -mp OGOAT/models/matmul_model.onnx -o 4x4 -txn all -t build_run -clean
python WAIC.py -mp OGOAT/models/silu_model.onnx -o 8x4 -sim all -clean
python WAIC.py -mp OGOAT/models/PSMU_ST0.onnx -o 8x4 -clean; python WAIC.py -mp OGOAT/models/silu_model.onnx -o 8x4 -sim MatMul_4
python WAIC.py -mp OGOAT/models/silu_model.onnx -o 8x4 -sim all -clean --frontend_only
```
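The invocations above can also be driven from a short Python loop, e.g. for batch experiments. A sketch, assuming the repo layout and activated env from Setup; the actual run line is left commented out:

```python
import subprocess  # used by the commented-out run line below

# Model/overlay pairs taken from the example commands above.
models = [
    ("OGOAT/models/conv_model.onnx", "8x4"),
    ("OGOAT/models/silu_model.onnx", "8x4"),
    ("OGOAT/models/matmul_model.onnx", "4x4"),
]

cmds = [
    ["python", "WAIC.py", "-mp", mp, "-o", overlay, "-txn", "all", "-clean"]
    for mp, overlay in models
]

for cmd in cmds:
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run each build
```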
Using Tiler output (.json) as input:
```
python WAIC.py -mp OGOAT/input/unit_test/scheduling_engine/conv_M4N8_qdq_bias_uint16xuint8xuint16.json -txn all -o 8x4 -clean
python WAIC.py -mp OGOAT/input/unit_test/scheduling_engine/silu_8x4_cstm_input.json -txn all -o 8x4 -clean
```
* `-txn all` generates binaries.
* For testing generated binaries on silicon, add the `-t` argument.
  * Option `build_run` generates an xclbin in the WAIC_Outputs folder and uses it for the XRT test on HW.
  * Option `run` uses an existing xclbin (xclbin_path can be modified in WAIC.py) to run the XRT test on HW.
  * Example: `python WAIC.py -mp OGOAT/models/matmul_model.onnx -o 4x4 -txn all -t build_run` generates the binaries (txn.bin, ctrl.bin, param.bin) and out.xclbin in the WAIC_Outputs dir.
  * Successful test output looks like the snapshot below; a summary of the test is copied to the `WAIC_Outputs` dir as `output_timestamp.json`.

![test_output_sample](https://media.gitenterprise.xilinx.com/user/2464/files/34fe3a4e-26c1-4edd-b573-c33d43cd82e7)
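The summary JSON can be inspected programmatically. A sketch that picks up the newest `output_*.json` from an output directory; the `test`/`status` fields below are hypothetical placeholders, not the tool's actual schema:

```python
import json
import tempfile
from pathlib import Path

# Stand-in for WAIC_Outputs: create a sample summary file so the snippet is
# self-contained. A real run writes output_<timestamp>.json there instead.
outdir = Path(tempfile.mkdtemp())
sample = {"test": "matmul_model", "status": "PASS"}  # hypothetical fields
(outdir / "output_20240101.json").write_text(json.dumps(sample))

# Pick the lexically newest summary (timestamped names sort chronologically).
latest = max(outdir.glob("output_*.json"))
summary = json.loads(latest.read_text())
print(summary["status"])
```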

## Steps to Build the Wheel

### Ensure the Project Structure
Your project directory should look like this:

```
WAIC/
├── OGOAT/
│   ├── src/
│   │   ├── L1_fusion/
│   │   ├── Tiler/
│   │   ├── Scheduling_Engine/
│   │   └── ...
├── dataflow/
├── dmacompiler/
├── HW_requirements/
├── kernels/
├── WAIC.py
├── setup.py
├── README.md
```
### Pre-compile the weight formatting Python modules
See the "Build formatting libraries" subsection under "Runtime".

### Add XAIENGINE and CDO
Source the settings script, then copy the generated files into waic_deps:

```
mkdir waic_deps
cp -r aie-rt/driver/src/build/include/* waic_deps
cp -r aie-rt/driver/src/build/Debug/* waic_deps
cp -r Win_dependency/cdo_library/* waic_deps
cp __init__.py waic_deps ## empty init file to include waic_deps in wheel
```

### Install the build tool:
Run the following command to install the build tool:
```pip install build```

### Build the Wheel: 
Navigate to the project directory and run:

```python build_waic.py dev``` or ```python build_waic.py rel```

This will generate a .whl file and an sdist (e.g., a .tar.gz) in the dist/ directory. Pass the ```--wheel``` flag to build only the .whl file.

### Install the Wheel: 
To test the wheel, create a new conda env and activate it (in a new terminal):
```
conda env create -f env_rel.yml
conda activate WAIC_rel
```

Install it using:
```pip install dist/waic-1.0.0-py3-none-any.whl```

### Run the CLI Tool: 
After installation, you can run the waic CLI tool:
```waic --help```

### Set library paths
```
set XAIENGINE_PATH=%CONDA_PREFIX%\Lib\site-packages\waic_deps
set LIBRARY_PATH_XAIENGINE=%CONDA_PREFIX%\Lib\site-packages\waic_deps
set LIBRARY_PATH_CDO=%CONDA_PREFIX%\Lib\site-packages\waic_deps
set PATH=%LIBRARY_PATH_CDO%;%LIBRARY_PATH_XAIENGINE%;%PATH%
```

### Example Commands for Running WAIC
Using an ONNX Model as Input
```
waic -mp OGOAT/models/conv_model.onnx -o 8x4 -clean
waic -mp OGOAT/models/silu_model.onnx -o 8x4 -txn all -clean
waic -mp OGOAT/models/PSR_MHA_self_attn_1head.onnx -o 4x4 -txn all -dbg -clean
waic -mp OGOAT/models/matmul_model.onnx -o 4x4 -txn all -t build_run -clean
waic -mp OGOAT/models/silu_model.onnx -o 8x4 -sim all -clean
waic -mp OGOAT/models/PSMU_ST0.onnx -o 8x4 -clean
waic -mp OGOAT/models/silu_model.onnx -o 8x4 -sim MatMul_4
waic -mp OGOAT/models/silu_model.onnx -o 8x4 -sim all -clean --frontend_only
```

Using Tiler Output (.json) as Input
```
waic -mp OGOAT/input/unit_test/scheduling_engine/conv_M4N8_qdq_bias_uint16xuint8xuint16.json -txn all -o 8x4 -clean
waic -mp OGOAT/input/unit_test/scheduling_engine/silu_8x4_cstm_input.json -txn all -o 8x4 -clean
```

## For Developers
### Install git pre-commit hooks
Linux:
```
source git-hooks/install-hooks.sh
```
Windows (use Git Bash):
```
bash git-hooks/install-hooks.sh
```
*To run pylint on all the files in the repo*
```
pylint --recursive=y . > lint.txt
```
*To run regression script **locally** (use "-u" to update golden file)*
```
python OGOAT/misc_tools/regression_script.py -test misc_tools/test_list.csv
```

### Automatic format (check) of Python files

Automatic format checking for a Python file is activated by inserting
`# fmt: on` at the top of the file.
The pre-commit hook will then check that those files are formatted correctly.

To manually format a Python file, please run:
```
black <filename.py>
```

To manually format all staged (`git add`ed) Python files that have the
`# fmt: on` marker to activate automatic format checks, please run:
```
python OGOAT/misc_tools/python-automation.py --staged --format
```

To configure formatting of all Python files in VSCode, please install the
`ms-python.black-formatter` extension and add the following section to
`.vscode/settings.json`:
```
 "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter"
  }
```
To enable automatic formatting of all Python files on saving, please add:
```
  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnSave": true
  }
```

## Runtime
### Build formatting libraries

1. `conda env create --file=env.yml`
2. `conda activate WAIC`
3. `cmake -B ./runtime_build/ -S .`
4. `cmake --build ./runtime_build/ --config Release`
5. `cp runtime_build/runtime/Release/* prebuilt/runtime`

### Run ORT
*Refer to DATA_GEN_README.md for more detail.*

1. Data structure (paths under OGOAT/):
   1. Create a `data/<model>` folder, e.g. `data/PSW`.
   2. Create the input and output folders, e.g. `attention_files`, `embeddings`, `msft_output`.
   3. Put the data into the corresponding folders.
2. Run ORT: `python ./OGOAT/src/ORT/run_ort.py --model_name OGOAT/models/Model-PSW-QDQ-v1.onnx`
   This step creates the DataGen output folder under WAIC_Outputs, which is needed for ifm, wgt and ofm formatting.
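The folder layout from step 1 can be created with a short script. A sketch; the subfolder names are taken from the example above, and `PSW` is just the example model name:

```python
import tempfile
from pathlib import Path

def make_data_dirs(ogoat_root, model):
    """Create an OGOAT-style data/<model> tree with input/output subfolders."""
    base = Path(ogoat_root) / "data" / model
    for sub in ("attention_files", "embeddings", "msft_output"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    return base

# Demo in a scratch dir; point ogoat_root at the real OGOAT/ folder in practice.
base = make_data_dirs(tempfile.mkdtemp(), "PSW")
print(sorted(p.name for p in base.iterdir()))
```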


### Run weight formatting

For all nodes with a generated tiling JSON:
```
python ./OGOAT/src/Ort/wgt_formatter.py --ir_json <ir json> --tiling_json <tiling json>
```

### Run Ifm/Ofm formatting

For all nodes with a generated tiling JSON:
```
python ./OGOAT/src/Ort/act_formatter.py --ir_json <ir json> --tiling_json <tiling json>
```

### Run transaction bin update
For all nodes with a generated tiling JSON:
```
python ./OGOAT/src/Ort/txn_update.py --tiling_json <tiling json> --xrt True
```
For a single PSV Conv:
```
python ./OGOAT/src/Ort/txn_update.py --tiling_json <tiling json> --conv 1 --txn_bin <txn.bin>
```
### Building xclbin
Launch LSF for a RHEL terminal, then set up the environment:
```sh
bsub -R "select[osdistro=rhel && osver=ws8]" xterm &
bash
source /tool/pandora64/etc/modules/INIT/bash
module load miniforge3/3.11.5
source settings.sh
```
Input arguments:
```
-o OVERLAY, --overlay OVERLAY
                      Name of overlay to run, e.g., 4x4 or 8x4
-k KERNEL_NAME, --kernel_names KERNEL_NAME
                      Comma-separated list of kernel names
-i KERNEL_INCLUDES, --kernel_includes KERNEL_INCLUDES
                      Comma-separated list of kernel includes
-d OUTPUT_DIR, --output_dir OUTPUT_DIR
                      Output directory
-f KERNEL_FILE, --kernel_file KERNEL_FILE
                      Path to file.json which has kernel names and kernel includes
```
Example for matmul using kernel names and kernel includes:
```sh
python dataflow/xclbin/xclbin_build.py -o 4x4 -k run_a16w8_gemm_tdm,run_a16w8_gemm_qdq -i super.hh,conv/direct_conv_int16x8_generic/direct_conv_int16x8_generic_gemm_wrapper.cc -d WAIC_Outputs
```
Example using a JSON file:
```sh
python dataflow/xclbin/xclbin_build.py -o 4x4 -f <kernel_list.json> -d .\
```
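The `-f` option reads kernel names and includes from a JSON file. A hypothetical sketch of generating one; the key names `kernel_names` and `kernel_includes` are assumptions for illustration and have not been verified against xclbin_build.py:

```python
import json
import tempfile
from pathlib import Path

# Values copied from the -k / -i matmul example above.
# NOTE: the JSON key names here are hypothetical, not a confirmed schema.
kernel_list = {
    "kernel_names": ["run_a16w8_gemm_tdm", "run_a16w8_gemm_qdq"],
    "kernel_includes": [
        "super.hh",
        "conv/direct_conv_int16x8_generic/direct_conv_int16x8_generic_gemm_wrapper.cc",
    ],
}

# Write to a scratch location; pass this path to xclbin_build.py -f.
path = Path(tempfile.mkdtemp()) / "kernel_list.json"
path.write_text(json.dumps(kernel_list, indent=2))
print(path)
```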

# Operator Regression
---
Refer to dataflow/README.md and OGOAT/README.md

# Continuous Integration (CI) Overview

We have two CI workflows, based on how often tests need to run.

---

## 🛠 Pull Request CI (`ci.yml`)

Runs on **every pull request**. It includes fast, essential tests to quickly check core functionality.

### Included Tests:
- ✅ FrontEnd tests  
- ✅ Conv 
- ✅ MatMul (1 shape)  

These tests are executed **in parallel** to minimize PR validation time.

---

## 🌙 Nightly CI (`ci_nightly.yml`)

This workflow runs **nightly**, covering a more extensive set of tests to detect regressions.

### Included Test:
- 🧪 Conv 
- 🧪 Gelu_qdq  
- 🧪 MatMul_qdq_biasgelu  
- 🧪 Add_qdq_BroadCast   
- 🧪 MatMul tests (6 shapes)  
- 🧪 Multi-Head Attention (MHA) unit tests  
- 🧪 Additional regression tests (via `run_tests` and `run_add_tests` scripts from tools directory)

---
