# CI/CD Automation for AIE4 Models

Automated regression testing system for AIE4 models using the LSF job scheduler.

## Overview

This directory contains scripts for automated testing and continuous integration:

- **`ci_cron_job.sh`**: Automation wrapper (git branching, email, output management)
- **`run_lsf_tests.sh`**: Environment setup wrapper (LSF, venv, dependencies)
- **`run_lsf_tests.py`**: Core test orchestrator (submission, monitoring, reporting)

These scripts work together to provide a complete CI/CD pipeline for running regression tests on LSF clusters.

---

## Architecture

### File Hierarchy

```text
ci/
├── ci_cron_job.sh          # Outer orchestrator
├── run_lsf_tests.sh        # Environment setup wrapper
├── run_lsf_tests.py        # Core Python orchestrator
├── report_template.md      # Markdown template for reports
└── report_template.html    # HTML template for reports

buildtest/
├── pytest_lsf.py           # Interactive CLI (separate tool)
└── pytest_lsf_wrapper.sh   # Job executor (runs inside LSF)
```

### Data Flow

```text
Cron/Manual Trigger
       ↓
ci_cron_job.sh          # Job naming, git switching, email setup
       ↓
run_lsf_tests.sh        # Source LSF profile, activate venv
       ↓
run_lsf_tests.py        # Collect & submit tests, monitor, report
       ↓
    LSF Jobs            # Each test runs independently
       ↓
pytest_lsf_wrapper.sh   # Setup environment inside LSF job
       ↓
    pytest              # Execute test
```

### Script Responsibilities

| Script                  | Language | Runs On          | Purpose                  |
| ----------------------- | -------- | ---------------- | ------------------------ |
| `ci_cron_job.sh`        | Bash     | Login node       | Automation orchestration |
| `run_lsf_tests.sh`      | Bash     | Login node       | Environment setup        |
| `run_lsf_tests.py`      | Python   | Login node       | Test logic & monitoring  |
| `pytest_lsf_wrapper.sh` | Bash     | LSF compute node | Job executor             |

---

## Quick Start

### Manual Regression Run

```bash
# Simple run on current branch
cd ci
bash ci_cron_job.sh -k "test_conv"

# With custom output directory
bash ci_cron_job.sh --output-dir /path/to/logs -k "test_binary"

# Switch to remote branch first
bash ci_cron_job.sh --remote-branch origin/main

# Full featured run with email
bash ci_cron_job.sh \
  --remote-branch origin/main \
  --output-dir /everest/logs/$(date +%Y%m%d) \
  --emails "team@amd.com" \
  -k "test_conv"
```

### Scheduled Cron Job

```bash
# Nightly regression at 2 AM
0 2 * * * cd /path/to/aie4_models/ci && bash ci_cron_job.sh --remote-branch origin/main --emails "team@amd.com"

# Multiple scheduled runs
0 2 * * * cd /path/to/aie4_models/ci && bash ci_cron_job.sh --remote-branch origin/main --emails "team@amd.com"
0 4 * * * cd /path/to/aie4_models/ci && bash ci_cron_job.sh --remote-branch origin/develop --emails "dev-team@amd.com"
```

---

## Scripts Reference

### ci_cron_job.sh

**Purpose**: Outer automation wrapper for scheduled and manual regression runs.

**Responsibilities**:

- Generate unique job prefixes
- Manage output directories
- Optional git branch switching (local or remote)
- Email notifications with HTML reports
- Cleanup and error handling

**Command-Line Options**:

```bash
--output-dir <path>              # Custom output directory (default: /tmp/aie4_cron_<timestamp>)
--local-branch <branch>          # Switch to existing local branch
--remote-branch <remote/branch>  # Fetch and switch to remote branch
--emails <comma-separated>       # Email recipients for notifications
--sendmail <path>                # Custom sendmail path (default: /usr/sbin/sendmail)
--target <target>                # Build target: dataflow, sim, cert_sim, cert (default: sim)
--buildtest-output-root <path>   # Custom output root for pytest (supports {{worker_id}}, {{uuid}} templates)
--hwtest                         # Run hardware validation after builds
-k <expression>                  # Test filter (passed to pytest)
```
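
The `{{worker_id}}` and `{{uuid}}` placeholders let parallel jobs write to separate build directories. This README does not say where the substitution happens, so the following is only a minimal sketch assuming plain string replacement, a fresh hex UUID per expansion, and a pytest-xdist-style worker name such as `gw0`. `expand_output_root` is a hypothetical helper, not part of the scripts.

```python
import uuid


def expand_output_root(template: str, worker_id: str) -> str:
    """Fill {{worker_id}} and {{uuid}} placeholders in an output-root template (sketch)."""
    # A fresh UUID per call gives every job its own directory, even on the same worker.
    return template.replace("{{worker_id}}", worker_id).replace("{{uuid}}", uuid.uuid4().hex)


# Prints something like /tmp/test_gw0_<32 hex characters>; the UUID differs on every call.
print(expand_output_root("/tmp/test_{{worker_id}}_{{uuid}}", "gw0"))
```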

**Examples**:

```bash
# Default: /tmp output, no email, current branch
bash ci_cron_job.sh -k "test_gemm"

# Custom output directory
bash ci_cron_job.sh --output-dir /everest/logs/regression

# Test on remote branch
bash ci_cron_job.sh --remote-branch origin/main

# Full automation with email
bash ci_cron_job.sh \
  --remote-branch origin/main \
  --output-dir /everest/logs/$(date +%Y%m%d) \
  --emails "user1@amd.com,user2@amd.com"

# Test local changes
bash ci_cron_job.sh --local-branch my_feature -k "test_conv"

# Run hardware tests (CERT target)
bash ci_cron_job.sh --target cert --hwtest -k "(test_conv or test_gemm) and ([0] or -0])"

# Hardware tests with email notification
bash ci_cron_job.sh \
  --target cert \
  --hwtest \
  --output-dir /everest/logs/hw_tests_$(date +%Y%m%d) \
  --emails "team@amd.com" \
  -k "(test_conv or test_gemm) and ([0] or -0])"

# Use custom output directory with template variables
bash ci_cron_job.sh \
  --target sim \
  --buildtest-output-root "/tmp/test_{{worker_id}}_{{uuid}}" \
  -k "test_conv"

# Simulation tests with unique output per worker
bash ci_cron_job.sh \
  --target sim \
  --buildtest-output-root "/everest/builds/sim_{{uuid}}" \
  --emails "team@amd.com" \
  -k "test_binary"
```

**Important Notes**:

- Cannot use both `--local-branch` and `--remote-branch` together
- Checks for uncommitted changes before switching branches
- Returns to original branch on exit (cleanup)
- Email is disabled by default (requires `--emails` flag)
- Creates unique output directory if not specified

---

### run_lsf_tests.sh

**Purpose**: Environment setup wrapper before Python execution.

**Responsibilities**:

- Check LSF availability (`bjobs` permissions)
- Source LSF profile (`/group/xsjfarm/lsf/conf/profile.lsf`)
- Source repository settings (`settings.sh`)
- Create/activate virtual environment
- Install Python dependencies
- Execute `run_lsf_tests.py`

**Usage**:

```bash
# Standalone usage (no git switching, no email)
bash run_lsf_tests.sh -k "test_conv"

# With custom output directory
bash run_lsf_tests.sh --output-dir /path/to/logs -k "."

# Custom job prefix
bash run_lsf_tests.sh --job-prefix "manual_test_run" -k "test_binary"
```

**When to Use**:

- Manual testing without git branch switching
- Quick regression runs without email notifications
- Debugging CI scripts

**Environment Setup**:

1. Sources LSF profile (enables `bjobs`, `bsub`, etc.)
2. Sources `settings.sh` (sets project-specific variables)
3. Creates virtual environment at `../env` if it does not exist
4. Activates virtual environment
5. Installs dependencies from `../requirements.txt`
6. Sets `AIE4_ROOT_DIR` environment variable

---

### run_lsf_tests.py

**Purpose**: Core Python orchestrator for test submission, monitoring, and reporting.

**Responsibilities**:

- Collect tests using pytest API
- Submit each test as separate LSF job
- Monitor job status with efficient batching
- Capture logs before killing timed-out jobs
- Check for LSF exit codes and DI_FAIL errors
- Generate HTML/Markdown reports
- Handle interrupts (Ctrl+C) gracefully

**Command-Line Options**:

```bash
--job-prefix <prefix>    # Unique job identifier (auto-generated if not provided)
--output-dir <path>      # Directory for LSF logs (default: buildtest/lsf_logs)
--target <target>        # Build target: dataflow, sim, cert_sim, cert (default: sim)
--output-root <path>     # Output root for pytest --output-root (supports {{worker_id}}, {{uuid}} templates)
--hwtest                 # Run hardware validation after builds
-k <expression>          # Pytest test filter expression
```
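
For reference, the documented options map naturally onto `argparse`. This is a sketch of an equivalent parser, not the script's actual one; the `keyword` destination for `-k` and the help texts are assumptions, and the defaults are taken from the option list above.

```python
import argparse

TARGETS = ["dataflow", "sim", "cert_sim", "cert"]


def build_parser() -> argparse.ArgumentParser:
    """Approximate parser for the documented run_lsf_tests.py options (sketch)."""
    parser = argparse.ArgumentParser(description="Submit pytest tests as LSF jobs")
    parser.add_argument("--job-prefix", default=None,
                        help="Unique job identifier (auto-generated when omitted)")
    parser.add_argument("--output-dir", default="buildtest/lsf_logs",
                        help="Directory for LSF .out/.err logs")
    parser.add_argument("--target", choices=TARGETS, default="sim",
                        help="Build target")
    parser.add_argument("--output-root", default=None,
                        help="Forwarded to pytest --output-root; may contain {{worker_id}}/{{uuid}}")
    parser.add_argument("--hwtest", action="store_true",
                        help="Run hardware validation after builds")
    parser.add_argument("-k", dest="keyword", default=None,
                        help="Pytest -k expression")
    return parser


args = build_parser().parse_args(["--target", "cert", "--hwtest", "-k", "test_conv"])
```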

**Examples**:

```bash
# Direct Python execution (requires manual env setup)
python run_lsf_tests.py -k "test_conv"

# Custom job prefix and output with specific target
python run_lsf_tests.py \
  --job-prefix "regression_$(date +%Y%m%d)" \
  --output-dir /tmp/logs \
  --target sim \
  -k "test_binary"

# Hardware testing with CERT target
python run_lsf_tests.py \
  --target cert \
  --hwtest \
  --job-prefix "hw_test_$(date +%Y%m%d)" \
  -k "(test_conv or test_gemm) and ([0] or -0])"

# Simulation with custom output root using template variables
python run_lsf_tests.py \
  --target sim \
  --output-root "/tmp/test_{{worker_id}}_{{uuid}}" \
  -k "test_conv"
```

**Key Features**:

1. **Efficient Job Monitoring**:

   - Single `bjobs` call for all jobs (not one per job)
   - Polls every 10 seconds
   - Tracks jobs by ID (not pattern matching)

2. **Timeout Handling**:

   - Default: 3 hours (`MAX_WAIT_SECONDS`)
   - Captures logs BEFORE killing jobs
   - Avoids LSF termination messages obscuring actual output

3. **Memory-Efficient Log Reading**:

   - Handles 100+ MB log files
   - Reads last N lines without loading entire file
   - Uses binary seek for large files

4. **Comprehensive Failure Detection**:

   - Checks LSF exit codes ("Successfully completed" vs "Exited with exit code")
   - Searches for `DI_FAIL:` in output files (line-by-line)
   - Scoped to current run (no false positives from old logs)

5. **Interrupt Handling**:

   - Catches Ctrl+C
   - Kills all submitted jobs with `bkill`
   - Generates failure report
   - Exits with code 130
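
The interrupt path can be sketched as a guard around the main routine. This is an illustration under assumptions, not the script's handler: `run_with_interrupt_guard`, `bkill_all`, and the idea that `run` appends each job ID to a shared list as it submits are hypothetical. Exit code 130 follows the shell convention of 128 plus the SIGINT signal number (2).

```python
import subprocess
from typing import Callable, List

EXIT_SIGINT = 130  # Shell convention: 128 + SIGINT (signal 2)


def bkill_all(job_ids: List[str]) -> None:
    """Kill every submitted job with a single bkill call."""
    if job_ids:
        subprocess.run(["bkill", *job_ids], check=False)


def run_with_interrupt_guard(
    run: Callable[[List[str]], int],
    kill: Callable[[List[str]], None] = bkill_all,
) -> int:
    """Run the orchestration; on Ctrl+C, kill this run's jobs and return 130.

    `run` appends job IDs to the list it receives as it submits them, so the
    handler kills only jobs that belong to this run.
    """
    submitted: List[str] = []
    try:
        return run(submitted)
    except KeyboardInterrupt:
        kill(submitted)
        # The real script also generates a failure report at this point.
        return EXIT_SIGINT
```

A caller would typically finish with `sys.exit(run_with_interrupt_guard(main))`.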

**Execution Flow**:

The script follows this sequence when running tests:

1. **Test Collection**: Uses pytest API to collect all matching tests
2. **Job Submission**: Submits each test as a separate LSF job via `bsub`
3. **Status Monitoring**:
   - Polls `bjobs` every 10 seconds with batch calls
   - Tracks job status (RUN, PEND, DONE, EXIT)
   - Detects when all jobs complete or the timeout expires
4. **Failure Analysis**:
   - Reads LSF `.out` files for exit codes
   - Searches for "Successfully completed" or "Exited with exit code X"
   - Scans for `DI_FAIL:` errors line-by-line
5. **Report Generation**: Creates HTML/Markdown report with:
   - Test summary (passed/failed counts)
   - Failed job details with exit codes
   - Log file excerpts
   - Timestamp and job prefix
6. **Notification**: When run via `ci_cron_job.sh` with `--emails`, the wrapper (not `run_lsf_tests.py`) emails the report

---

## Configuration

### Output Directory

The output directory stores LSF log files and reports.

**Default Behavior**:

- `ci_cron_job.sh`: `/tmp/aie4_cron_<timestamp>`
- `run_lsf_tests.sh`: `buildtest/lsf_logs`
- `run_lsf_tests.py`: `buildtest/lsf_logs`

**Custom Directory**:

```bash
# Via ci_cron_job.sh
bash ci_cron_job.sh --output-dir /everest/logs/$(date +%Y%m%d)

# Via run_lsf_tests.sh
bash run_lsf_tests.sh --output-dir /tmp/my_test_run

# Via run_lsf_tests.py
python run_lsf_tests.py --output-dir /path/to/logs
```

**Output Structure**:

```text
<output-dir>/
├── cron_job.log                              # Wrapper script log
├── cron_<timestamp>_<pid>_<rand>_<test>.out  # LSF stdout (per test)
├── cron_<timestamp>_<pid>_<rand>_<test>.err  # LSF stderr (per test)
└── cron_<timestamp>_<pid>_<rand>_report.html # HTML report
```
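
A minimal sketch of how a prefix of the shape `cron_<timestamp>_<pid>_<rand>` and the matching per-test log paths could be built. The timestamp format, the random range, and the replacement of characters such as `[` and `]` in pytest IDs are all assumptions; `make_job_prefix` and `log_paths` are hypothetical names, not taken from `ci_cron_job.sh`.

```python
import os
import random
import re
from datetime import datetime
from typing import Optional, Tuple


def make_job_prefix(now: Optional[datetime] = None, pid: Optional[int] = None,
                    rng: Optional[random.Random] = None) -> str:
    """Build a prefix shaped like cron_<timestamp>_<pid>_<rand> (format details assumed)."""
    now = datetime.now() if now is None else now
    pid = os.getpid() if pid is None else pid
    rng = random.Random() if rng is None else rng
    return f"cron_{now:%Y%m%d%H%M%S}_{pid}_{rng.randint(1000, 9999)}"


def log_paths(output_dir: str, prefix: str, test_name: str) -> Tuple[str, str]:
    """Per-test .out/.err paths following the <prefix>_<test> layout above."""
    # Hypothetical sanitizing: pytest IDs like test_binary[add_8-0] contain '[' and ']'.
    safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", test_name)
    base = os.path.join(output_dir, f"{prefix}_{safe_name}")
    return base + ".out", base + ".err"
```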

---

### Email Notifications

Email notifications are sent via `sendmail` with HTML reports.

**Configuration**:

```bash
# Single recipient
bash ci_cron_job.sh --emails "user@amd.com"

# Multiple recipients (comma-separated)
bash ci_cron_job.sh --emails "user1@amd.com,user2@amd.com,user3@amd.com"

# Custom sendmail path
bash ci_cron_job.sh \
  --emails "user@amd.com" \
  --sendmail "/usr/local/bin/sendmail"
```

**Email Content**:

- Subject: "AIE4 Cron Job SUCCESS/FAILED - <timestamp>"
- Content-Type: text/html
- Body: Full HTML report with:
  - Test summary
  - Failed job tables
  - Log excerpts
  - DI_FAIL errors

**Important**:

- Email is **disabled by default** (requires `--emails` flag)
- Subject line reflects report status (not bash exit code)
- Report location parsed from `cron_job.log`
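
`ci_cron_job.sh` builds the message in Bash. For readers who prefer Python, this sketch assembles an equivalent HTML message with the standard `email` package and pipes it to `sendmail -t`, which reads recipients from the `To:` header. The function names are hypothetical, and the `status` value is assumed to come from the report (`SUCCESS`/`FAILED`).

```python
import subprocess
from email.message import EmailMessage


def build_report_email(recipients: str, status: str, timestamp: str, html: str) -> EmailMessage:
    """Assemble an HTML report email; recipients is the comma-separated --emails value."""
    msg = EmailMessage()
    msg["To"] = ", ".join(r.strip() for r in recipients.split(",") if r.strip())
    # status should reflect the report (SUCCESS/FAILED), not the shell exit code.
    msg["Subject"] = f"AIE4 Cron Job {status} - {timestamp}"
    msg.set_content(html, subtype="html")  # Content-Type: text/html
    return msg


def send_with_sendmail(msg: EmailMessage, sendmail: str = "/usr/sbin/sendmail") -> None:
    # -t tells sendmail to take the recipients from the message headers.
    subprocess.run([sendmail, "-t"], input=msg.as_bytes(), check=True)
```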

**Testing Email**:

```bash
# Test sendmail: headers, a blank line, then the body
/usr/sbin/sendmail -t <<EOF
To: your.email@amd.com
Subject: Test Email

Test email body
EOF
```

---

### Git Branch Switching

Optional feature for testing different branches without manual checkout.

**Local Branch**:

```bash
# Switch to existing local branch
bash ci_cron_job.sh --local-branch my_feature

# Example: Test changes on feature branch
bash ci_cron_job.sh --local-branch dev_branch -k "test_conv"
```

**Remote Branch**:

```bash
# Fetch and switch to remote branch
bash ci_cron_job.sh --remote-branch origin/main

# Test on a colleague's fork
bash ci_cron_job.sh --remote-branch upstream/feature_x

# Cron job: always use remote branch
bash ci_cron_job.sh --remote-branch origin/ci_cron_runner
```

**How It Works**:

1. **Validation**:

   - Checks for uncommitted changes (`git status --porcelain`)
   - Cannot use both `--local-branch` and `--remote-branch`

2. **Local Branch**:

   - Verifies branch exists locally
   - Checks out branch
   - Saves original branch for cleanup

3. **Remote Branch**:

   - Parses `remote/branch` format
   - Fetches from remote
   - Creates/updates local tracking branch
   - Updates submodules

4. **Cleanup**:

   - Returns to original branch on exit
   - Runs even if script fails (trap handler)

**Best Practices**:

- For cron jobs: Always use `--remote-branch` (ensures latest code)
- For manual testing: Use `--local-branch` or omit (test local changes)
- For production: Use dedicated CI branch (e.g., `origin/ci_cron_runner`)
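
The validation and remote-branch steps can be sketched as follows. The exact git commands the script runs are not documented here, so the command list is an assumption: `git checkout -B` creates the local branch or resets it to the remote tip, and the submodule update mirrors the "Updates submodules" step. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_remote_branch(spec: str) -> Tuple[str, str]:
    """Split 'remote/branch' at the first slash; the branch may itself contain slashes."""
    remote, sep, branch = spec.partition("/")
    if not sep or not remote or not branch:
        raise ValueError(f"expected <remote>/<branch>, got {spec!r}")
    return remote, branch


def validate_branch_args(local: Optional[str], remote: Optional[str], porcelain: str) -> None:
    """Reject conflicting flags, and refuse to switch with a dirty working tree."""
    if local and remote:
        raise ValueError("Cannot specify both --local-branch and --remote-branch")
    # porcelain is the output of `git status --porcelain`; empty means clean.
    if (local or remote) and porcelain.strip():
        raise RuntimeError("You have uncommitted changes")


def remote_switch_commands(spec: str) -> List[List[str]]:
    """Commands to fetch and check out a remote branch (assumed sequence)."""
    remote, branch = parse_remote_branch(spec)
    return [
        ["git", "fetch", remote, branch],
        ["git", "checkout", "-B", branch, f"{remote}/{branch}"],
        ["git", "submodule", "update", "--init", "--recursive"],
    ]
```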

---

## Advanced Topics

### Performance Optimizations

#### 1. Batch bjobs Calls

**Problem**: With 193 jobs, calling `bjobs <job_id>` once per job stretched each 10-second polling iteration to 14-15 seconds.

**Solution**: Single batch call for all jobs:

```python
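import subprocess

job_ids = ["101", "102", "103"]  # LSF job IDs, as strings

# Before (slow): one subprocess call per job (193 calls per iteration)
for job_id in job_ids:
    subprocess.run(["bjobs", "-w", job_id], capture_output=True, text=True)

# After (fast): a single subprocess call covering every job
result = subprocess.run(["bjobs", "-w"] + job_ids, capture_output=True, text=True)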
```

**Impact**: Reduced iteration time from 14-15s to ~10s, saving ~50 minutes on 2-hour runs.
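
The batched output still has to be mapped back to individual jobs. A sketch of one way to do it, assuming the standard `bjobs -w` column layout (`JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME`). Only the first and third columns are read, because `EXEC_HOST` is blank for pending jobs and later columns shift. Treating IDs missing from the output as finished (for example, after LSF has purged old records) is an assumption; the function names are hypothetical.

```python
from typing import Dict, List

FINISHED = {"DONE", "EXIT"}


def parse_bjobs_status(output: str) -> Dict[str, str]:
    """Map job ID -> STAT from `bjobs -w <id> <id> ...` output."""
    statuses: Dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # header row, blank line, or a message line
        statuses[fields[0]] = fields[2]
    return statuses


def unfinished(job_ids: List[str], statuses: Dict[str, str]) -> List[str]:
    # Assumption: an ID absent from the output is treated as finished.
    return [j for j in job_ids if statuses.get(j, "DONE") not in FINISHED]
```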

#### 2. Log Capture Before Killing

**Problem**: After `bkill`, LSF appends termination information to the `.out` files, so the last 20 lines showed only LSF messages, not the job's actual output.

**Solution**: Capture logs BEFORE killing jobs:

```python
def handle_timeout_by_ids(...):
    # Capture logs BEFORE killing
    captured_logs = get_incomplete_job_logs(incomplete_jobs)

    # Then kill jobs
    subprocess.run(["bkill"] + still_running_ids)

    return incomplete_table, incomplete_jobs, captured_logs
```

**Impact**: Reports now show actual job output instead of LSF termination messages.
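
The ordering is the whole fix, so here it is in isolation. This is a sketch rather than `handle_timeout_by_ids()` itself: `capture_then_kill` is a hypothetical helper that takes the tail reader (for example `read_last_lines` below) and the kill function as parameters, and the job-ID-to-`.out`-path mapping is assumed.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List


def bkill(job_ids: List[str]) -> None:
    subprocess.run(["bkill", *job_ids], check=False)


def capture_then_kill(
    out_files: Dict[str, Path],
    read_tail: Callable[[Path], str],
    kill: Callable[[List[str]], None] = bkill,
) -> Dict[str, str]:
    """Snapshot each job's log tail, then kill the jobs."""
    # 1. Capture first, while each log still ends with the job's own output.
    captured = {job_id: read_tail(path) if path.exists() else ""
                for job_id, path in out_files.items()}
    # 2. Only then kill; LSF appends its termination summary after this point.
    kill(list(out_files))
    return captured
```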

#### 3. Memory-Efficient File Reading

**Problem**: Loading 100+ MB log files into memory with `f.read()`.

**Solution**: Read last N lines without loading entire file:

```python
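from pathlib import Path


def read_last_lines(file_path: Path, num_lines: int = 20) -> str:
    """Return the last num_lines lines of file_path without reading a large file whole."""
    file_size = file_path.stat().st_size

    if file_size < 1_000_000:
        # Small files: read normally
        with open(file_path, 'r', errors='ignore') as f:
            lines = f.readlines()
        return "".join(lines[-num_lines:])

    # Large files: seek near the end and read only the final buffer
    buffer_size = 100_000
    with open(file_path, 'rb') as f:
        f.seek(max(0, file_size - buffer_size))
        data = f.read()

    # The first line in the buffer may be partial; it is dropped unless
    # fewer than num_lines lines fit in the buffer.
    text = data.decode('utf-8', errors='ignore')
    lines = text.splitlines(keepends=True)
    return "".join(lines[-num_lines:])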
```

**Impact**: Handles 100+ MB files without memory issues.

#### 4. Line-by-Line DI_FAIL Detection

**Problem**: `find Output/ -name AIESimulator.log -exec grep DI_FAIL {} +` was slow and also matched logs from old runs.

**Solution**: Line-by-line reading scoped to current run:

```python
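def check_di_failures(submitted_jobs):
    """Collect DI_FAIL lines from this run's LSF .out files only."""
    di_fail_lines = []
    for job_info in submitted_jobs:
        with open(job_info.out_file, 'r', errors='ignore') as f:
            for line in f:  # Streaming read: one line in memory at a time
                if 'DI_FAIL:' in line:
                    di_fail_lines.append(line.strip())
    return di_fail_lines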

**Impact**: Faster, more reliable, no false positives from old logs.
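
The exit-code check and the DI_FAIL scan combine into one verdict per job. This sketch assumes the standard LSF summary lines quoted earlier ("Successfully completed" and "Exited with exit code N"); `job_verdict` and the PASS/FAIL/INCOMPLETE labels are hypothetical. For brevity it takes the file's text as a string, whereas `check_di_failures()` above streams the file.

```python
import re
from typing import List, Optional, Tuple

_EXIT_RE = re.compile(r"Exited with exit code (\d+)")


def job_verdict(out_text: str) -> Tuple[str, Optional[int], List[str]]:
    """Return (status, exit_code, di_fail_lines) for one LSF .out file's text."""
    di_fails = [line.strip() for line in out_text.splitlines() if "DI_FAIL:" in line]
    if "Successfully completed" in out_text:
        # LSF reports success, but a DI_FAIL line still marks the test as failed.
        return ("FAIL" if di_fails else "PASS"), 0, di_fails
    match = _EXIT_RE.search(out_text)
    if match:
        return "FAIL", int(match.group(1)), di_fails
    # No LSF summary line yet (for example, a log captured before bkill).
    return "INCOMPLETE", None, di_fails
```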

---

### Customization

#### Changing Timeout

Edit `run_lsf_tests.py`:

```python
# Default: 3 hours
MAX_WAIT_SECONDS = 3 * 60 * 60

# Change to 4 hours
MAX_WAIT_SECONDS = 4 * 60 * 60
```
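
`MAX_WAIT_SECONDS` acts as a wall-clock deadline around the polling loop. The sketch below is not the script's `wait_for_jobs()`; `poll_until_done` is a hypothetical name, `poll` is assumed to return the set of finished job IDs (for example, built on one batched `bjobs` call), and the injectable `clock`/`sleep` exist only to make the loop testable.

```python
import time
from typing import Callable, List, Set

MAX_WAIT_SECONDS = 3 * 60 * 60
POLL_INTERVAL_SECONDS = 10


def poll_until_done(
    job_ids: List[str],
    poll: Callable[[List[str]], Set[str]],
    max_wait: float = MAX_WAIT_SECONDS,
    interval: float = POLL_INTERVAL_SECONDS,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until all jobs finish or max_wait elapses; return IDs still unfinished."""
    deadline = clock() + max_wait
    remaining = list(job_ids)
    while remaining:
        finished = poll(remaining)  # one batched status query per iteration
        remaining = [j for j in remaining if j not in finished]
        if not remaining or clock() >= deadline:
            break
        sleep(interval)
    return remaining  # anything left here goes to timeout handling
```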

#### Custom LSF Queue

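The queue is currently hardcoded to `medium` in the `submit_lsf_job()` call and is not exposed as a command-line option. To change it, edit the `queue` argument in `run_lsf_tests.py` (see Resource Requirements below), or pass a queue to `pytest_lsf.py` when submitting tests interactively.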

#### Resource Requirements

Edit `run_lsf_tests.py` in `submit_all_tests()`:

```python
job_id = submit_lsf_job(
    test_id=test_id,
    target=BuildTarget.SIM,
    job_name=job_name,
    queue="medium",        # Change queue
    mem_limit="16GB",      # Change memory
    output_dir=output_dir,
    dry_run=False
)
```
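
How `submit_lsf_job()` turns these arguments into a `bsub` call is not shown here. As a hedged illustration, a submission could look like the sketch below: `-q`, `-J`, `-o`, `-e`, and `-M` are standard `bsub` options, but mapping `mem_limit` onto `-M` is an assumption, and the accepted memory units depend on the site's LSF configuration. `build_bsub_command` and `parse_job_id` are hypothetical; the parser matches bsub's usual `Job <id> is submitted to queue <name>.` line.

```python
import re
from typing import List, Optional

_SUBMITTED_RE = re.compile(r"Job <(\d+)> is submitted to queue <([^>]+)>")


def build_bsub_command(job_name: str, command: List[str], out_file: str, err_file: str,
                       queue: str = "medium", mem_limit: str = "16GB") -> List[str]:
    """Assemble a bsub argv for one test (sketch; flag mapping is assumed)."""
    return [
        "bsub",
        "-q", queue,       # LSF queue
        "-J", job_name,    # job name
        "-o", out_file,    # stdout file; LSF also writes its job summary here
        "-e", err_file,    # stderr file
        "-M", mem_limit,   # memory limit; accepted units depend on site configuration
        *command,          # the job command, e.g. the pytest wrapper and its arguments
    ]


def parse_job_id(bsub_stdout: str) -> Optional[str]:
    """Extract the job ID from bsub's submission message."""
    match = _SUBMITTED_RE.search(bsub_stdout)
    return match.group(1) if match else None
```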

---

### Integration

#### Crontab

**Setup**:

```bash
# Edit crontab
crontab -e

# Add nightly run at 2 AM
0 2 * * * cd /path/to/aie4_models/ci && bash ci_cron_job.sh --remote-branch origin/main --emails "team@amd.com" >> /tmp/cron.log 2>&1

# Multiple runs
0 2 * * * cd /path/to/aie4_models/ci && bash ci_cron_job.sh --remote-branch origin/main --emails "team@amd.com"
0 6 * * * cd /path/to/aie4_models/ci && bash ci_cron_job.sh --remote-branch origin/develop --emails "dev-team@amd.com"

# Weekend long runs
0 0 * * 6 cd /path/to/aie4_models/ci && bash ci_cron_job.sh --remote-branch origin/main --emails "team@amd.com" -k "."
```

**Best Practices**:

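- Always use absolute paths in cron
- Always use `--remote-branch` (ensures latest code)
- Redirect output to a log file
- Escape `%` as `\%` in crontab entries (for example `$(date +\%Y\%m\%d)`); cron treats an unescaped `%` as a newline
- Test the cron command manually first
- Use a dedicated email address for cron notifications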

#### GitHub Actions

**Example Workflow**:

```yaml
name: Regression Tests

on:
  schedule:
    - cron: "0 2 * * *" # Daily at 02:00 UTC (GitHub schedules use UTC)
  workflow_dispatch: # Manual trigger

jobs:
  regression:
    runs-on: [aie4-runner, lsf] # LSF-enabled runner

    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          submodules: recursive

      - name: Run Regression
        run: |
          set -euo pipefail
          cd ci
          bash ci_cron_job.sh \
            --remote-branch origin/${{ github.ref_name }} \
            --output-dir /tmp/gh_actions_${{ github.run_id }} \
            --emails "team@amd.com"

      - name: Upload Report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-report
          path: /tmp/gh_actions_${{ github.run_id }}/*.html
```

**Hardware Tests Workflow** (`.github/workflows/hw-tests.yml`):

The repository includes an automated hardware testing workflow:

```yaml
name: Hardware Tests

on:
  workflow_call: # Called from main CI pipeline

jobs:
  hw-tests:
    runs-on: [aie4-runner, lsf]
    timeout-minutes: 180

    steps:
      - name: Run hardware tests
        run: |
          source env/bin/activate
          mkdir -p ci-artifacts/logs
          source settings.sh
          bash hw_tests.sh 2>&1 | tee ci-artifacts/logs/hw_tests.log

      - name: Parse log
        id: logcheck
        run: |
          grep -q "DI_PASS" "ci-artifacts/logs/hw_tests.log" && score=0 || score=$?
          echo "score=${score}" >> "$GITHUB_OUTPUT"

      - name: Post PR comment with results
        if: github.event_name == 'pull_request' && steps.logcheck.outputs.score != '0'
        uses: ./.github/actions/post-pr-comment
```

**Key Features**:

- Runs on every PR via `workflow_call` from main CI pipeline
- Uses `hw_tests.sh` script at repository root for simplified execution
- Tests hardware with default filter: `(test_conv or test_gemm) and ([0] or -0])`
- Parses log for `DI_PASS` to determine success/failure
- Posts PR comment on failures
- Uses 180-minute timeout for long-running hardware tests

**Alternative LSF-based Workflow** (`.github/workflows/hw-tests-lsf.yml`):

For more complex scenarios with LSF job submission and custom output directories:

```yaml
name: Hardware Tests (LSF)

on:
  workflow_dispatch: # Manual trigger only

jobs:
  hw-tests:
    runs-on: [aie4-runner, lsf]

    steps:
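      - name: Create output directory
        run: |
          OUTPUT_DIR="/everest/ppsdv_cases_nobkup/shashank/logs/hw_tests_$(date +%Y%m%d_%H%M%S)"
          BUILDTEST_OUTPUT_DIR="$OUTPUT_DIR/Output"
          mkdir -p "$BUILDTEST_OUTPUT_DIR"
          # Shell variables do not persist across steps; export them for later steps
          echo "OUTPUT_DIR=$OUTPUT_DIR" >> "$GITHUB_ENV"
          echo "BUILDTEST_OUTPUT_DIR=$BUILDTEST_OUTPUT_DIR" >> "$GITHUB_ENV"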

      - name: Run hardware tests
        run: |
          cd ci
          bash ci_cron_job.sh \
            --output-dir "$OUTPUT_DIR" \
            -k "(test_conv or test_gemm) and ([0] or -0])" \
            --target cert \
            --buildtest-output-root "$BUILDTEST_OUTPUT_DIR/Output_{{uuid}}"

      - name: Flatten and validate
        run: |
          # Flatten Output_* directories
          for worker_dir in "$BUILDTEST_OUTPUT_DIR"/Output_*/; do
              mv "$worker_dir"* "$BUILDTEST_OUTPUT_DIR/" 2>/dev/null || true
              rm -rf "$worker_dir" 2>/dev/null || true
          done

          # Run hardware validation
          python -c "
          from buildtest.common import run_hw_validation
          run_hw_validation(out_dir='$BUILDTEST_OUTPUT_DIR')
          "
```
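
The flatten step can also be written in Python, which makes clashes explicit. In this sketch (`flatten_worker_dirs` is a hypothetical helper), a name that already exists at the top level is left in its worker directory and reported, whereas the shell loop above deletes anything `mv` could not move when it runs `rm -rf`. Unlike the shell glob, `iterdir()` also picks up dotfiles.

```python
import shutil
from pathlib import Path
from typing import List


def flatten_worker_dirs(root: Path, pattern: str = "Output_*") -> List[Path]:
    """Move each root/Output_* directory's contents into root; return dirs left behind."""
    leftovers: List[Path] = []
    # Materialize the glob before moving anything into root.
    for worker_dir in sorted(p for p in root.glob(pattern) if p.is_dir()):
        clashed = False
        for child in list(worker_dir.iterdir()):
            target = root / child.name
            if target.exists():
                clashed = True  # keep it rather than overwrite or delete
                continue
            shutil.move(str(child), str(target))
        if clashed:
            leftovers.append(worker_dir)
        else:
            worker_dir.rmdir()  # now empty
    return leftovers
```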

**Self-Hosted Runner Setup**:

```bash
# On LSF-enabled machine
mkdir -p ~/actions-runner && cd ~/actions-runner

# Download and configure GitHub Actions runner
# Follow GitHub's instructions for your repo

# Start runner in background with tmux
tmux new -d -s github-runner './run.sh'

# Or as systemd service (see GitHub docs)
```

---

## Troubleshooting

### LSF Issues

**"No unfinished job found" or jobs not appearing**:

```bash
# Check LSF permissions
bjobs -u $USER

# Verify LSF profile is sourced
source /group/xsjfarm/lsf/conf/profile.lsf

# Check if bsub works
echo "echo test" | bsub -q medium -J test_job
```

**Jobs submitted but script doesn't wait**:

- Check `wait_for_jobs()` logic in `run_lsf_tests.py`
- Verify job IDs are being captured correctly
- Check bjobs output format hasn't changed

**Jobs time out after 3 hours**:

- Increase `MAX_WAIT_SECONDS` in `run_lsf_tests.py`
- Use a faster LSF queue
- Check whether jobs are actually stuck (use `bjobs -l <job_id>`)

### Git Issues

**"Cannot specify both --local-branch and --remote-branch"**:

- Use only one branch option
- For remote testing, use `--remote-branch`
- For local testing, omit both flags

**"You have uncommitted changes"**:

```bash
# Stash changes
git stash

# Or commit them
git commit -am "WIP"

# Then run script
bash ci_cron_job.sh --local-branch main
```

**Remote branch not found**:

```bash
# Check remote exists
git remote -v

# List remote branches
git ls-remote --heads origin

# Fetch latest
git fetch origin

# Use correct format: remote/branch
bash ci_cron_job.sh --remote-branch origin/main
```

### Email Issues

**Email not received**:

```bash
# Test sendmail: headers, a blank line, then the body
/usr/sbin/sendmail -t <<EOF
To: your.email@amd.com
Subject: Test

Test
EOF

# Check mail logs (/var/log/mail.log on Debian/Ubuntu, /var/log/maillog on RHEL)
tail -f /var/log/mail.log

# Verify sendmail path
which sendmail

# Use custom sendmail
bash ci_cron_job.sh --emails "user@amd.com" --sendmail "/usr/bin/sendmail"
```

**Email sent but report empty**:

- Check if report file exists
- Verify report parsing in `ci_cron_job.sh` cleanup()
- Check for "Report saved to:" in `cron_job.log`

### Report Issues

**Script reports success but tests failed**:

- Check HTML report for individual test failures
- Look for DI_FAIL errors in report
- Examine LSF `.out` files directly

**Report shows LSF termination messages instead of test output**:

- This was fixed by capturing logs before killing
- Ensure you're using latest version of `run_lsf_tests.py`
- Verify `handle_timeout_by_ids()` captures logs before `bkill`

**Report missing log excerpts**:

- Check if log files exist
- Verify `get_incomplete_job_logs()` is called
- Check file permissions on log files

### Environment Issues

**"Virtual environment not found"**:

```bash
# Create manually
cd /path/to/aie4_models
/tool/pandora64/bin/python3.10 -m venv env

# Or let script create it
bash ci/run_lsf_tests.sh -k "test_binary"
```

**"settings.sh not found"**:

- Ensure you're in the correct repository
- Check file exists: `ls -la settings.sh`

**Pip install fails**:

```bash
# Update pip
env/bin/pip install --upgrade pip

# Install dependencies manually
env/bin/pip install -r requirements.txt
```

---

## Development

### Testing CI Scripts Locally

```bash
# Test environment setup only
bash run_lsf_tests.sh --help

# Test with dry-run (collects tests but doesn't submit)
python run_lsf_tests.py -k "test_binary[add_8-0]" --dry-run

# Test single job submission
bash run_lsf_tests.sh -k "test_binary[add_8-0]"

# Test timeout handling (set MAX_WAIT_SECONDS=60 for quick test)
# Edit run_lsf_tests.py temporarily, then:
bash run_lsf_tests.sh -k "test_conv"

# Test email notification using a single small test
bash ci_cron_job.sh --emails "your.email@amd.com" -k "test_binary[add_8-0]"
```

### Debugging Tips

**Enable verbose logging**:

```python
# In run_lsf_tests.py
import logging
logging.basicConfig(level=logging.DEBUG)
```

**Check intermediate files**:

```bash
# LSF output files
ls -lh <output-dir>/*.out
ls -lh <output-dir>/*.err

# Cron job log
tail -f <output-dir>/cron_job.log

# HTML report
cat <output-dir>/*_report.html
```

**Monitor LSF jobs**:

```bash
# Watch jobs
watch -n 5 bjobs

# Check specific job
bjobs -l <job_id>

# View job output while running
bpeek <job_id>

# Check job history
bhist -l <job_id>
```

### Contributing

When modifying CI scripts:

1. **Test locally first**:

   ```bash
   bash run_lsf_tests.sh -k "test_binary[add_8-0]"
   ```

2. **Check for errors**:

   ```bash
   shellcheck ci_cron_job.sh run_lsf_tests.sh
   pylint run_lsf_tests.py
   ```

3. **Update documentation**:

   - Update this README if changing behavior
   - Update comments in code
   - Update examples if adding options

4. **Test email notifications**:

   ```bash
   bash ci_cron_job.sh --emails "your.email@amd.com" -k "test_binary[add_8-0]"
   ```

5. **Test git branch switching**:

   ```bash
   # Test local branch
   bash ci_cron_job.sh --local-branch main -k "test_binary[add_8-0]"

   # Test remote branch
   bash ci_cron_job.sh --remote-branch origin/main -k "test_binary[add_8-0]"
   ```

---

## See Also

- **[README_pytests.md](../README_pytests.md)**: Basic pytest usage and interactive LSF submission
- **[buildtest/pytest_lsf.py](../buildtest/pytest_lsf.py)**: Interactive CLI for manual test submission
- **pytest documentation**: <https://docs.pytest.org/>

---

## Summary

| Use Case                   | Command                                                                                                                      |
|----------------------------|------------------------------------------------------------------------------------------------------------------------------|
| Quick manual test          | `bash run_lsf_tests.sh -k "test_conv"`                                                                                       |
| Full regression with email | `bash ci_cron_job.sh --remote-branch origin/main --emails "team@amd.com"`                                                    |
| Test local changes         | `bash ci_cron_job.sh -k "test_binary"`                                                                                       |
| Test remote branch         | `bash ci_cron_job.sh --remote-branch origin/develop`                                                                         |
| Hardware tests (simple)    | `bash hw_tests.sh` (at repository root)                                                                                      |
| Hardware tests (LSF)       | `bash ci_cron_job.sh --target cert --hwtest -k "(test_conv or test_gemm) and ([0] or -0])"`                                  |
| Custom output directory    | `bash ci_cron_job.sh --target sim --buildtest-output-root "/tmp/test_{{uuid}}" -k "test_conv"`                               |
| Scheduled cron job         | `0 2 * * * cd /path/to/aie4_models/ci && bash ci_cron_job.sh --remote-branch origin/main --emails "team@amd.com"`            |
| GitHub Actions             | See [Integration](#github-actions) section                                                                                   |

For basic pytest usage and interactive testing, see [README_pytests.md](../README_pytests.md).
