# **AIE-4 Models**

This repository contains **AIE-4 operator and model build flows**. It allows you to:

* Build individual operators
* Compile fused ONNX models using the L2/L3 allocator
* Run simulation / CERT compilation
* Perform regression tests and validation


---

## **Repository Structure**

The structure of the codebase is as follows.

1. `dmacompiler/` - submodule containing all dmacompiler source code
2. `kernel/` - One folder per operator with a `common.hh`. Contains all kernel source
   code and defines the top-level super kernel interface. Kernels should operate on data
   resident in L1 without any locking. Also contains Kernel Param generators.
3. `scheduler/` - Dataflow schedules per operator with a `common.py`. Compiles the dataflow
   from shape and kernel granularity information. Each dataflow must have a default
   sub-volume and split mode for test. The overlay is defined in `common.py`.
4. `buildscripts/` - Each Op has a `*.py` file to interface with the front end. `common.py`
   contains utility code.
5. `buildtest/` - Each Op has a `test_op.py` file to call the buildscript and test the Op.
   All stress/regression test code should also live here.
6. `host/` - Simulation test code in one `*.cpp` file per operator with a `common.hpp`.
   Defines the DI model, with formatting code in a `*.hpp` file. Bin files are dumped out for
   debugging purposes.
7. `graph/` - Generates the L3 and L2 address locations from the ONNX graph. The final
   output is a common JSON format understood by `build_aie4.py` which contains all shapes to build.
8. `kerneltest/` - Contains any single-core tests to validate kernel code and wrappers.
   Organized as one folder per test suite with independent build automation (Makefile, etc.)
9. `build_aie4.py` - Single step to build either an operator with a specific shape OR all operators
   associated with a particular model. All options are controlled with command line arguments.
10. `dma_check.sh` - Regression script to check all model and operator builds.
11. `lint.sh` - Linting script to run a python and C++ linter on all files.
12. `settings.sh` - Shell script to initialize all environment variables for the build.

---

## **Windows Setup**

### 1. Create WAIC Conda Env

```bash
git clone --recurse-submodules https://gitenterprise.xilinx.com/IPSP/WAIC.git
conda env create -f env.yml
```

### 2. Activate

```bash
conda activate WAIC
```

### 3. Enable long paths if needed

If you see this error:

```
ERROR: [WinError 206] The filename or extension is too long
```

Enable Win32 long paths:
`gpedit.msc` → Computer Configuration → Administrative Templates → System → Filesystem → **Enable Win32 long paths**

### 4. Load AIE settings

```powershell
.\settings.ps1
```

---

## **Linux Setup**

### 1. Clone repo and submodules

```bash
git clone --recurse-submodules <repo-url>
```

### 2. LSF setup for AIE4

```bash
source /group/xsjfarm/lsf/conf/profile.lsf
bsub -R "select[(osver=ws8)]" -Is -q medium xterm &
```

### 3. Create & activate venv

```bash
bash
source settings.sh
/tool/pandora64/bin/python3.10 -m venv env
source env/bin/activate
pip install -r requirements.txt
```

### 4. Everyday setup

```bash
source settings.sh
source env/bin/activate
```

This sets paths and tools necessary for builds.

---

## **Optional Tools: HTML Diff Support**

To get colored HTML diffs from `dma_check.sh`, install Node.js through nvm:

```bash
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
nvm install --lts
nvm use --lts
```

---

## **Compiling an Operator**

### Initialize environment

```bash
cd aie4_models
bash
source settings.sh
source env/bin/activate
```

### Discover operator build options

```bash
cd buildtest
python test_<operator>.py --help
```

Provide flags like `--target` and `--shape-index` to build one configuration.

---

## **Model Compilation**

`build_aie4.py` supports two flows:

### 1. ONNX-Based Compilation

Compile the fused model package from WAIC using the following steps.

**First, comment out these three entries in the WAIC repo:**

* `float32: "bfloat16"` in `dtype_downcast_map` in
  [*DequantizeLinear_kernel_metadata.yaml*](https://gitenterprise.xilinx.com/IPSP/WAIC/blob/main/OGOAT/Collaterals/DequantizeLinear_kernel_metadata.yaml)

* `float32: "bfloat16"` in `dtype_downcast_map` in
  [*QuantizeLinear_kernel_metadata.yaml*](https://gitenterprise.xilinx.com/IPSP/WAIC/blob/main/OGOAT/Collaterals/QuantizeLinear_kernel_metadata.yaml)

* `LinearPlusLut` fusion sequence in
  [*fusion_seq.yml*](https://gitenterprise.xilinx.com/IPSP/WAIC/blob/main/OGOAT/src/L1_fusion/fusion_seq.yml)

**Then run WAIC:**

```bash
python WAIC.py -mp <model_path> --skip 2+ --qdq_optimization 0 -txn none -clean
```

**Use the generated files in `aie4_models`:**

```bash
python build_aie4.py \
  -um /path/unfused.onnx \
  -fm /path/fused.onnx \
  -ir /path/fused_IR.json \
  -un /path/fused_unique_nodes.json \
  -tm /path/tensor_map.json \
  -mdp /path/model/data \
  -rmd true \
  --target cert \
  --clean
```

Common arguments:

* `--unfused_model (-um)` → original ONNX
* `--fused_model (-fm)` → fused ONNX
* `--ir_json (-ir)` → fused IR metadata
* `--unique_nodes (-un)` → unique-nodes metadata
* `--tensor_map (-tm)` → tensor-map metadata
* `--model_data_path (-mdp)` → weight data
* `--read_model_data (-rmd)` → read & format weights
* `--layer_ids` → compile only selected blocks

Layer selection options:

```
--layer_ids 3
--layer_ids 1,2,3
--layer_ids 1-5
```

The `sim` and `cert_sim` targets accept only one layer at a time.
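
The three `--layer_ids` forms can be read as a single value, a comma-separated list, and an inclusive range. The sketch below shows one way such a string could be expanded; `parse_layer_ids` and `check_target` are hypothetical helper names used for illustration and are not taken from `build_aie4.py`, whose actual parsing may differ.

```python
def parse_layer_ids(spec: str) -> list[int]:
    """Expand a layer-id string such as "3", "1,2,3", or "1-5" into a sorted list."""
    ids: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            ids.update(range(start, end + 1))  # both ends of the range are included
        else:
            ids.add(int(part))
    return sorted(ids)


def check_target(target: str, layer_ids: list[int]) -> None:
    """Reject multi-layer selections for the single-layer targets (assumed rule)."""
    if target in ("sim", "cert_sim") and len(layer_ids) > 1:
        raise ValueError(f"target {target!r} accepts one layer at a time, got {layer_ids}")


if __name__ == "__main__":
    for spec in ("3", "1,2,3", "1-5"):
        print(spec, "->", parse_layer_ids(spec))
```

A check like `check_target` is a cheap way to fail early in a wrapper script before a long build starts with an unsupported combination.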


### YAML-Based Model Compilation (Recommended)

Instead of long CLI commands, use `model_cfg.yaml`:

```yaml
fused_model: /path/fused.onnx
ir_json: /path/fused_IR.json
unfused_model: /path/unfused.onnx
unique_nodes: /path/fused_unique_nodes.json
tensor_map: /path/tensor_map.json
target: cert
clean: true
```

Run with:

```bash
python build_aie4.py --cfg model_cfg.yaml
```

Override at runtime (example):

```bash
python build_aie4.py --cfg model_cfg.yaml -t dataflow
```
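
The override behavior can be pictured as a three-layer merge: built-in defaults, then values from the YAML file, then any option passed explicitly on the command line. The sketch below illustrates that precedence under the assumption that this is how the merge works; `merge_settings` is a hypothetical name, only a subset of options is modeled, and a plain dict stands in for the parsed `model_cfg.yaml`.

```python
import argparse

TARGETS = ("dataflow", "sim", "cert_sim", "cert")


def merge_settings(cfg: dict, argv: list[str]) -> dict:
    """Overlay explicitly passed CLI options on top of values loaded from YAML."""
    parser = argparse.ArgumentParser()
    # default=None lets us tell "not passed" apart from a real value.
    parser.add_argument("-t", "--target", choices=TARGETS, default=None)
    parser.add_argument("--layer_ids", default=None)
    cli = vars(parser.parse_args(argv))

    merged = {"target": "dataflow"}  # documented default target
    merged.update(cfg)               # YAML values replace defaults
    merged.update({key: value for key, value in cli.items() if value is not None})
    return merged


if __name__ == "__main__":
    # Stand-in for the dict a YAML loader would return for model_cfg.yaml.
    cfg = {"fused_model": "/path/fused.onnx", "target": "cert", "clean": True}
    print(merge_settings(cfg, ["-t", "dataflow"]))
```

Using `None` as the parser default is what makes the precedence work: an option that was not typed on the command line never overwrites the value from the config file.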



### 2. JSON-Based Compilation

```bash
python build_aie4.py \
  --json graph/resnet_tiling_L2_fused.json \
  --target cert \
  --clean
```


### Other Useful Flags

```
--clean         # removes output folder before build
--target        # dataflow | sim | cert_sim | cert (default dataflow)
-o              # output folder
-L2L3           # place tensors in both L2 & L3
-skip_op        # list of operators to skip during compilation
-include_op     # list of operators to be included during compilation
```


### Quick Examples

#### **Compile fused ONNX to Graph only**

```bash
python build_aie4.py -fm models/ResNet50_fused.onnx -ir models/ResNet50_IR.json -skip --clean
```

#### **Compile fused ONNX to Node List only**
Generate per-subgraph node lists without running the full compilation flow.

Enable this mode by either:
- Setting `gen_node_list: true` in `model_cfg.yaml`, **or**
- Passing the flag via CLI:

```bash
python build_aie4.py --cfg model_cfg.yaml --gen_node_list True
```

#### **Compile fused ONNX to CERT (Full Model)**

```bash
python build_aie4.py \
  -um models/ResNet50_unfused.onnx \
  -fm models/ResNet50_fused.onnx \
  -ir models/ResNet50_IR.json \
  -un models/ResNet50_unique_nodes.json \
  -tm models/ResNet50_tensor_map.json \
  -mdp /path/to/model/data \
  -rmd true \
  --target cert \
  --clean
```

#### **Compile fused ONNX to CERT (One Layer)**

```bash
python build_aie4.py \
  -um models/ResNet50_unfused.onnx \
  -fm models/ResNet50_fused.onnx \
  -ir models/ResNet50_IR.json \
  -un models/ResNet50_unique_nodes.json \
  -tm models/ResNet50_tensor_map.json \
  --layer_ids 3 \
  --target cert \
  --clean
```

#### **Compile JSON to CERT**

```bash
python build_aie4.py --json graph/resnet_tiling_L2_fused.json --target cert --clean
```

#### **Compile only block 3 from JSON**

```bash
python build_aie4.py --json graph/resnet_tiling_L2_fused.json --layer_ids 3 --target cert
```

#### **Compile blocks 1,2,3 from JSON**

```bash
python build_aie4.py --json graph/resnet_tiling_L2_fused.json --layer_ids 1,2,3 --target cert
```

#### **Compile blocks 1–5 (inclusive) from JSON**

```bash
python build_aie4.py --json graph/resnet_tiling_L2_fused.json --layer_ids 1-5 --target cert
```

---

## ML Timeline Profiling

This repo supports **ML Timeline Profiling** across multiple entrypoints. Use the section that matches how you run builds/tests.


### 1) Enable via `build_aie4.py` (model or subgraph tests)

You can enable ML timeline logging either through YAML config or the CLI.

**Option A — YAML**

Set via YAML:

```yaml
# model_cfg.yaml (or equivalent)
ml_timeline: true
```

Then run as usual (example):

```bash
python build_aie4.py --cfg model_cfg.yaml
```

**Option B — CLI flag**

Pass the flag directly:

```bash
python build_aie4.py ... --ml_timeline true
```


### 2) Enable via `run_hw.py` or `debug_scripts/qhw4_op_level_debug.py` (model or subgraph tests)

Enable profiling by passing the ML timeline flag on the command line:

```bash
python run_hw.py ... --ml_timeline
```

or

```bash
python debug_scripts/qhw4_op_level_debug.py ... --ml_timeline
```

> Note: This flag is intended for both **model-level** and **subgraph-level** runs through these scripts.


### 3) Enable via `buildtest` script runs (e.g., `buildtest/test_op.py`)

For `buildtest`-driven flows, ML timeline logging is controlled via an environment variable.

Set `ML_TIMER_LOG_LEVEL` to `"1"` when invoking the script:

```bash
ML_TIMER_LOG_LEVEL=1 python buildtest/test_op.py --target cert --hwtest
```
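
The inline `VAR=1 command` form above is Bash syntax and does not work in PowerShell or `cmd`. One portable alternative is to launch the test from a small Python wrapper that sets the variable in the child environment. The sketch below assumes the script only looks for the string `"1"`; both helper names are hypothetical.

```python
import os


def ml_timeline_enabled(environ=None) -> bool:
    """Return True when ML_TIMER_LOG_LEVEL is set to "1" (assumed check)."""
    env = os.environ if environ is None else environ
    return env.get("ML_TIMER_LOG_LEVEL", "").strip() == "1"


def env_with_ml_timeline(base=None) -> dict:
    """Copy an environment mapping and switch ML timeline logging on."""
    env = dict(os.environ if base is None else base)
    env["ML_TIMER_LOG_LEVEL"] = "1"
    return env


if __name__ == "__main__":
    child_env = env_with_ml_timeline()
    print("enabled in child env:", ml_timeline_enabled(child_env))
    # To launch the test with this environment, something like:
    # subprocess.run([sys.executable, "buildtest/test_op.py",
    #                 "--target", "cert", "--hwtest"], env=child_env, check=True)
```

Copying the environment instead of mutating `os.environ` keeps the setting scoped to the one child process.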

---


## **Build AIEBU**

The `update_aiebu.py` script automates updating and rebuilding the **AIEBU** tool (`aiebu-asm`) inside this repository.
It ensures that your environment always uses the latest version of the AIEBU compiler from the official repository.


### Overview

`update_aiebu.py` performs the following actions automatically:

1. **Creates a temporary folder** inside the repository.

2. **Clones** the `main-ge` branch of the official AIEBU repository
   (including all submodules):

   ```
   https://github.com/Xilinx/aiebu
   ```

3. **Builds AIEBU** depending on the target platform:

   * **Linux** → `build.sh` (Pandora modules, Boost, LDFLAGS fix)
   * **Windows** → `build22.bat -opt` (MSVC build flow)

4. Handles build failures gracefully:

   * Some unit tests may fail in AIEBU builds
   * The script continues as long as `aiebu-asm` is generated

5. **Copies the built executable** into the correct location:

   * On Linux → `cert_sim/aiebu-asm`
   * On Windows → `prebuilt/aiebu-asm.exe`

6. **Writes the commit hash** of the cloned AIEBU repository into:

   * `cert_sim/aiebu_linux_commit.txt`
   * OR
   * `prebuilt/aiebu_windows_commit.txt`

7. **Deletes** the temporary directory.

8. **Stages and commits only the relevant files** (no debug or map files).
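
As a rough illustration of steps 4 to 6 and 8, the per-platform paths and the "binary exists" success rule could be modeled as below. This is a sketch, not the contents of `update_aiebu.py`: the paths and commit messages are the ones given in this README, while the `ARTIFACTS` table, `plan`, and `build_usable` are hypothetical names.

```python
from pathlib import Path

# Paths and messages come from this README; the table layout is illustrative.
ARTIFACTS = {
    "linux": {
        "built": "temp/aiebu/build/Debug/opt/xilinx/aiebu/bin/aiebu-asm",
        "dest": "cert_sim/aiebu-asm",
        "commit_file": "cert_sim/aiebu_linux_commit.txt",
        "message": "Update aiebu-asm Linux build",
    },
    "windows": {
        "built": "temp/aiebu/build/WBuild/Release/xilinx/aiebu/aiebu-asm.exe",
        "dest": "prebuilt/aiebu-asm.exe",
        "commit_file": "prebuilt/aiebu_windows_commit.txt",
        "message": "Update aiebu-asm Windows build",
    },
}


def plan(platform: str, repo_root: Path) -> dict:
    """Resolve the per-platform paths relative to the repository root."""
    entry = ARTIFACTS[platform]
    return {
        "built": repo_root / entry["built"],
        "dest": repo_root / entry["dest"],
        "commit_file": repo_root / entry["commit_file"],
        "message": entry["message"],
    }


def build_usable(returncode: int, built: Path) -> bool:
    """Accept the build when the binary exists, whatever the exit code was."""
    return built.is_file()


def files_to_stage(platform: str, repo_root: Path) -> list:
    """Only the binary and the commit-hash file are staged, never whole folders."""
    resolved = plan(platform, repo_root)
    return [resolved["dest"], resolved["commit_file"]]
```

Keeping the platform differences in one table means the rest of such a script can stay platform-agnostic.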

### Requirements

#### General

* Python **3.8+**
* `git` available in PATH

#### Linux Build Requirements

* Access to the internal Pandora environment:

  ```bash
  source /tool/pandora64/etc/modules/INIT/bash
  module load boost
  ```
* `build.sh` requires:

  * Boost module
  * GNU toolchain
  * LDFLAGS fix (`-ldl`)
* Bash shell

#### Windows Build Requirements

* Visual Studio 2022 / MSVC Build Tools
* `build22.bat` available in PATH
* PowerShell or cmd


### Usage

#### Linux

Build using the Pandora toolchain:

```bash
python update_aiebu.py linux
```

#### Windows

Build using MSVC:

```powershell
python update_aiebu.py windows
```



### Output

#### Linux Output

Generated files:

```
cert_sim/aiebu-asm
cert_sim/aiebu_linux_commit.txt
```

Where:

* `aiebu-asm` is copied from:

  ```
  temp/aiebu/build/Debug/opt/xilinx/aiebu/bin/aiebu-asm
  ```

* `aiebu_linux_commit.txt` contains the cloned commit hash.


#### Windows Output

Generated files:

```
prebuilt/aiebu-asm.exe
prebuilt/aiebu_windows_commit.txt
```

Where:

* `aiebu-asm.exe` is copied from:

  ```
  temp/aiebu/build/WBuild/Release/xilinx/aiebu/aiebu-asm.exe
  ```

* `aiebu_windows_commit.txt` contains the cloned commit hash.


### Notes

* You can run the script from **any directory**; it always operates relative to the folder where the script resides.
* The build may return a non-zero exit code if unit tests fail — **this is expected** for AIEBU.

  * The script only cares that `aiebu-asm` (or `.exe`) exists.
* The script **does not commit entire folders**:

  * Only the following are staged:

    * `aiebu-asm`
    * `aiebu_*_commit.txt`
* `temp/` is always deleted automatically.
* After the script finishes, you only need to push the commit.


### Commit Messages

The script automatically generates one of the following messages:

* `Update aiebu-asm Linux build`
* `Update aiebu-asm Windows build`


## **Hardware Test Flow (`HW_test`)**

The hardware test framework supports:

- Copying subfolders to the remote Windows DUT
- Running `dolphin_test_aie4.py` for each op/subgraph
- Collecting logs, results, and summaries
- Optional golden IO update/copy mode
- Operator mode (`--op`) and subgraph mode (`--sg`)
- Filtering folders via wildcard patterns
- Local Windows execution or remote execution via SSH + SMB
- Optional debug mode
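
To make the wildcard filtering concrete, here is a minimal sketch of shell-style pattern matching over folder names. The function name and the example folder names are hypothetical, and the real framework's matching rules may differ.

```python
from fnmatch import fnmatchcase


def filter_folders(names: list[str], patterns: list[str]) -> list[str]:
    """Keep folder names that match at least one shell-style wildcard pattern."""
    if not patterns:
        return list(names)  # no pattern means no filtering
    return [
        name
        for name in names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    folders = ["conv_0", "conv_1", "matmul_0"]  # made-up folder names
    print(filter_folders(folders, ["conv_*"]))
```

`fnmatchcase` is used instead of `fnmatch` so that matching is case-sensitive on every platform, which matters when the same pattern is applied on both Linux and Windows hosts.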

Please refer to `README_hwtests.md` for details.

---

