NVDLA Synthesis – Zoneofee

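Implemented with the TSMC28 process library. All materials come from publicly available sources and are for learning and exchange only; please contact me for removal in case of infringement.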

RTL Code Generation

After cloning the NVDLA repository, check out the nv_small branch and run make:

git checkout nv_small
make

Configure as prompted. The generated tree.make is:

##=======================
## Project Name Setup, multiple projects supported
##=======================
PROJECTS := nv_small

##=======================
##Linux Environment Setup
##=======================

USE_DESIGNWARE  := 1
DESIGNWARE_DIR  := /iccad/synopsys/syn/U-2022.12-SP1/dw/sim_ver
CPP  := /usr/bin/cpp
GCC  := /usr/bin/gcc
CXX  := /usr/bin/g++
PERL := /usr/bin/perl
JAVA := /usr/bin/java
SYSTEMC := /usr/local/systemc-2.3.4
PYTHON := /usr/bin/python3.6
VCS_HOME := /iccad/synopsys/vcs/T-2022.06
NOVAS_HOME := /iccad/synopsys/verdi/T-2022.06
VERDI_HOME := /iccad/synopsys/verdi/T-2022.06
VERILATOR := verilator
CLANG := /home/utils/llvm-4.0.1/bin/clang
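Because tree.make is just a list of `KEY := value` assignments, a wrong tool path tends to go unnoticed until the build fails. A minimal pre-flight check is sketched below. It is a hypothetical helper, not part of NVDLA's tmake flow, and it only inspects values that are absolute paths.

```python
import os
import re

# Matches "KEY := value" assignments like the ones in the tree.make above.
ASSIGN_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*:=\s*(.*?)\s*$")


def parse_tree_make(text):
    """Return {KEY: value} for every ':=' assignment, skipping '#' comment lines."""
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        m = ASSIGN_RE.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings


def missing_paths(settings):
    """Names of settings whose value is an absolute path that does not exist here."""
    return sorted(k for k, v in settings.items()
                  if v.startswith("/") and not os.path.exists(v))


demo = "## comment\nPROJECTS := nv_small\nUSE_DESIGNWARE  := 1\nVERILATOR := verilator\n"
print(parse_tree_make(demo))
```

For example, `missing_paths(parse_tree_make(open("tree.make").read()))` lists entries such as DESIGNWARE_DIR or SYSTEMC whose paths are absent. Relative values like `VERILATOR := verilator` are skipped, since presumably they are looked up on PATH.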

Run tmake to generate the RTL:

./tools/bin/tmake -build vmod

tmake may report that some Perl modules are missing. If so, install them with CPAN as prompted.

SRAM Replacement

The required SRAMs are in hw/outdir/nv_small/vmod/rams/synth/. Every one of them uses a single clock port, i.e. they are all pseudo dual-port SRAMs.

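NVDLA wraps each SRAM in two layers of shell. Take nv_ram_rwsp_20x289.v as an example. This is the outermost shell: it contains some MBIST-related logic and instantiates nv_ram_rwsp_20x289_logic.v. The inner layer mainly holds the logic that stitches physical SRAMs together. For example, if the physical SRAM supports a width of only 288, nv_ram_rwsp_20x289_logic instantiates 20 registers to stand in for the missing bit.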
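The 20 registers in the 289-bit example come from simple arithmetic: 289 = 1 × 288 + 1, and the leftover bit has to be stored for each of the 20 words. The bookkeeping is sketched below. The function name is hypothetical, and the 288-bit macro limit is only the example's assumption.

```python
def split_for_macro(depth, width, macro_width):
    """Split a depth x width RAM into full-width macros plus leftover flop bits.

    Returns (num_macros, leftover_width, leftover_flops); the leftover bits are
    assumed to be kept in registers, one per word per bit.
    """
    num_macros, leftover_width = divmod(width, macro_width)
    return num_macros, leftover_width, depth * leftover_width


print(split_for_macro(20, 289, 288))  # (1, 1, 20): one 288-bit macro plus 20 one-bit registers
```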

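The physical SRAMs that are actually instantiated are in hw/outdir/nv_small/vmod/rams/model/. Every NVDLA RAM wrapper uses only one clock port (i.e. a pseudo dual-port SRAM), yet some dual-port SRAMs are instantiated. This may be due to size restrictions in the SRAM IP.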

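The rest of the flow does not need this MBIST logic, and our process library supports different size configurations from the library used officially. We therefore replace NVDLA's outermost SRAM wrapper directly (in the example above, nv_ram_rwsp_20x289 itself). Pseudo dual-port SRAMs are preferred as replacements, and dual-port SRAMs are used only when the required size cannot be met.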

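The TSMC28 MC2 libraries contain no pseudo dual-port SRAM, so the 2PRF (two-port register file) is used in its place. Three compilers are available: tsn28hpcp2prf_20120200_130a, tsn28hpcpuhddpsram_20120200_170a and tsn28hpcpdpsram_20120200_130a. The first is a 2PRF and the other two are dual-port SRAMs. Their supported sizes (depth x width) are: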

SEG Option	Mux (CM)	Word Depth (W)	Word Width (I/O) (N)
F	2	16,24,32, 48,56…96, 112,120…160, 176,184…224, 240,248…288, 304,312…352, 368,376…416, 432,440…480, 496,504…512	2,3…144
F	4	32,48,64, 96,112…192, 224,240…320, 352,368…448, 480,496…576, 608,624…704, 736,752…832, 864,880…960, 992,1008…1024	2,3…72
F	8	64,96,128, 192,224…384, 448,480…640, 704,736…896, 960,992…1152, 1216,1248…1408, 1472,1504…1664, 1728,1760…1920, 1984,2016…2048	2,3…36
S	2	64, 80,88…192, 208,216…320, 336,344…448, 464,472…512	2,3…144
S	4	128, 160,176…384, 416,432…640, 672,688…896, 928,944…1024	2,3…72
S	8	256, 320,352…768, 832,864…1280, 1344,1376…1792, 1856,1888…2048	2,3…36

tsn28hpcp2prf_20120200_130a size configuration table
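The depth lists in this table are easy to misread by eye. One way to check a candidate size is to expand each row literally. The sketch below is my own helper, and all of its names are hypothetical. It copies the six rows above with "…" written as "..." and takes each range's step from the number written just before it. It reads a config name such as 248x144m2f as depth 248, width 144, mux 2, segment F. That reading is inferred from how the config.txt list further down lines up with this table, not from TSMC documentation.

```python
import re

# Rows of the tsn28hpcp2prf_20120200_130a table: (segment, mux) -> (depth list, max width).
# The table's "…" is written as "..." here.
TABLE_2PRF = {
    ("F", 2): ("16,24,32, 48,56...96, 112,120...160, 176,184...224, 240,248...288, "
               "304,312...352, 368,376...416, 432,440...480, 496,504...512", 144),
    ("F", 4): ("32,48,64, 96,112...192, 224,240...320, 352,368...448, 480,496...576, "
               "608,624...704, 736,752...832, 864,880...960, 992,1008...1024", 72),
    ("F", 8): ("64,96,128, 192,224...384, 448,480...640, 704,736...896, 960,992...1152, "
               "1216,1248...1408, 1472,1504...1664, 1728,1760...1920, 1984,2016...2048", 36),
    ("S", 2): ("64, 80,88...192, 208,216...320, 336,344...448, 464,472...512", 144),
    ("S", 4): ("128, 160,176...384, 416,432...640, 672,688...896, 928,944...1024", 72),
    ("S", 8): ("256, 320,352...768, 832,864...1280, 1344,1376...1792, 1856,1888...2048", 36),
}
MIN_WIDTH = 2  # every row's width range starts at 2


def expand_depths(spec):
    """Expand e.g. '16,24,32, 48,56...96' into a set of depths.

    The step of a range 'a...b' is a minus the number written just before it
    (56 - 48 = 8 above), which is how the table encodes its gaps.
    """
    depths, prev = set(), None
    for tok in spec.replace(" ", "").split(","):
        if "..." in tok:
            lo, hi = map(int, tok.split("..."))
            depths.update(range(lo, hi + 1, lo - prev))
            prev = hi
        else:
            prev = int(tok)
            depths.add(prev)
    return depths


def is_legal_2prf(depth, width, seg, mux):
    """True if depth x width appears in the (seg, mux) row of the table."""
    spec, max_width = TABLE_2PRF[(seg, mux)]
    return depth in expand_depths(spec) and MIN_WIDTH <= width <= max_width


NAME_RE = re.compile(r"(\d+)x(\d+)m(\d+)([fs])")


def is_legal_config_name(name):
    """Check an MC2 config entry such as '248x144m2f' (assumed: depth x width, mux, segment)."""
    m = NAME_RE.fullmatch(name.strip())
    if not m:
        return False
    depth, width, mux, seg = m.groups()
    key = (seg.upper(), int(mux))
    return key in TABLE_2PRF and is_legal_2prf(int(depth), int(width), *key)


print(is_legal_config_name("248x144m2f"), is_legal_config_name("40x16m2f"))  # True False
```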

SEG Option	Mux (CM)	Word Depth (W)	Word Width (I/O) (N)
M	4	32,48…4096	10,11…144

tsn28hpcpuhddpsram_20120200_170a size configuration table

SEG Option	Mux (CM)	Word Depth (W)	Word Width (I/O) (N)
F	4	32,48…1024	4,5…72
F	8	64,96…2048	4,5…36
F	16	128,192…4096	4,5…18
M	4	32,48…2048	4,5…72
M	8	64,96…4096	4,5…36
M	16	128,192…8192	4,5…18

tsn28hpcpdpsram_20120200_130a size configuration table
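With the 2PRF limits known, the depth padding and width stitching in the next step can be planned mechanically. The sketch below assumes the F segment with column mux 2, which most config.txt entries use, and the 2 to 144 width range of that row. The greedy width split is my own rule, not the author's hand choice. For 20x289 it gives 144 + 143 + 2 instead of the 128 × 2 + 33 used in config.txt.

```python
# Legal word depths for tsn28hpcp2prf with segment F and column mux 2,
# taken from the "F 2" row of the 2prf table.
F_CM2_DEPTHS = [16, 24, 32] + [
    d
    for lo, hi in [(48, 96), (112, 160), (176, 224), (240, 288),
                   (304, 352), (368, 416), (432, 480), (496, 512)]
    for d in range(lo, hi + 1, 8)
]
MIN_WIDTH, MAX_WIDTH = 2, 144  # width range of the same row


def round_up_depth(depth):
    """Smallest legal macro depth >= depth; the extra words are left unused."""
    for d in F_CM2_DEPTHS:
        if d >= depth:
            return d
    raise ValueError(f"depth {depth} exceeds {F_CM2_DEPTHS[-1]}")


def split_width(width, max_width=MAX_WIDTH, min_width=MIN_WIDTH):
    """Greedy split into max_width slices; a too-narrow last slice borrows bits."""
    full, rest = divmod(width, max_width)
    slices = [max_width] * full
    if rest:
        if rest < min_width and slices:
            slices[-1] -= min_width - rest  # e.g. 144 + 144 + 1 -> 144 + 143 + 2
            rest = min_width
        slices.append(rest)
    return slices


def plan(depth, width):
    """Return (macro depth, list of macro widths) for one logical RAM."""
    return round_up_depth(depth), split_width(width)


for d, w in [(20, 289), (245, 514), (61, 64), (8, 65)]:
    print(f"{d}x{w} ->", plan(d, w))
```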

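Using the SRAM sizes in hw/outdir/nv_small/vmod/rams/synth/, pick suitable configurations and write the config.txt that MC2 needs. Some SRAMs must be stitched together from several macros. For others no legal depth matches exactly, so a few redundant words are added. The comment after each # explains the case. All SRAMs are generated with tsn28hpcp2prf_20120200_130a, and config.txt reads: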

128x18m2f
128x128m2f # 128x128 * 2 = 128x256
128x64m2f
16x128m2f  # 16x128 * 2 = 16x256
16x136m2f  # 16x136 * 2 = 16x272
16x64m2f
256x3m2f
256x128m2f # 256x128 * 4 = 256x512
256x64m2f
256x7m2f
32x16m2f
32x128m2f  # 32x128 * 2 = 32x256; 32x128 * 4 = 32x512; 32x128 * 6 = 32x768
32x136m2f  # 32x136 * 2 = 32x272; 32x136 * 4 = 32x544
512x128m2f # 512x128 * 2 = 512x256
512x64m2f
64x10m2f
64x128m2f  # 64x128 * 8 = 64x1024
64x136m2f  # 64x136 * 8 = 64x1088
64x116m2f
64x18m2f
128x11m2f
128x6m2f
160x16m2f
160x144m2f # 160x144 * 3 + 160x82 = 160x514
160x82m2f
160x65m2f
24x128m2f  # 24x128 * 2 + 24x33 = 24x289 (use as 20x289)
248x144m2f # 248x144 * 3 + 248x82 = 248x514 (use as 245x514)
248x82m2f
256x11m2f
32x32m2f
64x144m2f  # 64x144 * 3 + 64x82 = 64x514 (use as 61x514)
64x82m2f
64x64m2f   # use as 61x64
64x65m2f   # use as 61x65
80x14m2s
80x16m2s
80x128m2s # 80x128 * 2 = 80x256
80x144m2s # 80x144 * 3 + 80x82 = 80x514
80x82m2s
80x65m2s
16x65m2f  # use as 8x65
256x8m2f
24x32m2f  # use as 19x32
24x4m2f   # use as 19x4
24x80m2f  # use as 19x80
64x84m2f  # 64x84 * 2 = 64x168, use as 60x168
64x21m2f  # use as 60x21
80x15m2s
80x72m2s
80x9m2s
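Each stitching comment depends on every macro it names also having its own entry in the list. A small cross-check is sketched below. It is a hypothetical helper, and it assumes comments follow the "<segments> = <result>" and "use as DxW" forms used above. Applied to the list above, it would report that the 24x33 segment in the 24x128m2f comment has no entry of its own.

```python
import re

SIZE_RE = re.compile(r"(\d+)x(\d+)")
USE_AS_RE = re.compile(r"use as\s+\d+x\d+")


def defined_and_referenced(config_text):
    """Collect macro sizes defined by entries and sizes referenced in '#' comments."""
    defined, referenced = set(), set()
    for line in config_text.splitlines():
        entry, _, comment = line.partition("#")
        m = SIZE_RE.match(entry.strip())
        if not m:
            continue  # blank or unexpected line
        defined.add(f"{m.group(1)}x{m.group(2)}")
        for clause in comment.split(";"):
            clause = USE_AS_RE.sub("", clause)  # 'use as 61x64' names the logical RAM, not a macro
            lhs = clause.split("=")[0]          # only the building blocks left of '='
            referenced.update(f"{d}x{w}" for d, w in SIZE_RE.findall(lhs))
    return defined, referenced


def missing_segments(config_text):
    """Sizes that some comment relies on but that no config entry generates."""
    defined, referenced = defined_and_referenced(config_text)
    return sorted(referenced - defined)


sample = """\
24x128m2f  # 24x128 * 2 + 24x33 = 24x289 (use as 20x289)
248x144m2f # 248x144 * 3 + 248x82 = 248x514 (use as 245x514)
248x82m2f
64x64m2f   # use as 61x64
"""
print(missing_segments(sample))  # ['24x33']
```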

Run MC2 to generate the SRAMs with the following script:

#!/bin/bash
# Expand config.txt into per-macro *.cfg files, then run MC2 on each one
./tsn28hpcp2prf_130a.pl -NonBWEB -file config.txt 2>&1 | tee "cfg.log"
for cfg_file in *.cfg; do
    base_name=$(basename "$cfg_file" .cfg)
    mkdir -p "$base_name"
    log_file="$base_name/${base_name}.log"
    mc2-eu -eu -c tsn28hpcp2prf_20120200_130a.mco -cfg "$cfg_file" -ui textual -v -p tsmceva -d "$base_name" 2>&1 | tee "$log_file"
done

Some of these SRAMs are never used. The following gen_sram_inst.sh script collects the SRAM wrappers that are actually instantiated:

#!/bin/bash

# Usage: ./gen_sram_inst.sh <nvdla_sram_synth_dir> <prj_dir>
# Writes sram_inst.f, listing every SRAM wrapper that is instantiated under <prj_dir>

nvdla_sram_synth_dir="$1"
prj_dir="$2"
output_file="sram_inst.f"

# Collect all SRAM wrapper module names (files named like *_<depth>x<width>.v)
modules=$(ls "$nvdla_sram_synth_dir" | grep -E '_[0-9]+[xX][0-9]+\.v$' | sed 's/\.v$//')

# Initialize (truncate) the output file
> "$output_file"

for module in $modules; do
    # Search .v and .sv files under prj_dir, skipping the *logic.v inner wrappers and non-source files
    # The regex matches both "module inst (" and "module #(...) inst (" instantiation styles
    echo "Searching $module"
    found=$(find "$prj_dir" -type f \( -name "*.v" -o -name "*.sv" \) \
        ! -name "*logic.v" ! -name "*.vcp" ! -name "*.log" ! -name "*.f" \
        -exec grep -E -m1 "\b${module}\b(\s*#\s*\([^)]*\))?\s+\w+\s*\(" {} + 2>/dev/null)

    if [ -n "$found" ]; then
        echo "$found"
        echo "$module" >> "$output_file"
    fi
done

echo "SRAM instance search complete, saved to: $output_file"

Usage example:

# Outputs sram_inst.f
./gen_sram_inst.sh nvdla/hw/outdir/nv_small/vmod/rams/synth/ nvdla/hw/outdir/nv_small/
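The grep expression in gen_sram_inst.sh has to tell an instantiation apart from the module's own declaration and from longer names such as the *_logic inner wrapper. The sketch below replays it with Python's re module, assuming GNU grep's `\b`, `\s` and `\w` behave the same way for ASCII Verilog identifiers. It also shows one limitation: a parameter list with nested parentheses is missed, because `[^)]*` stops at the first `)`.

```python
import re


def inst_pattern(module):
    """Python version of the grep -E expression used in gen_sram_inst.sh."""
    return re.compile(rf"\b{re.escape(module)}\b(\s*#\s*\([^)]*\))?\s+\w+\s*\(")


pat = inst_pattern("nv_ram_rws_128x18")
examples = [
    "nv_ram_rws_128x18 u_ram (",           # plain instance: True
    "nv_ram_rws_128x18 #(16) u_ram (",     # simple parameter override: True
    "module nv_ram_rws_128x18 (",          # the module declaration itself: False
    "nv_ram_rws_128x18_logic u_logic (",   # longer module name, rejected by \b: False
    "nv_ram_rws_128x18 #(.W(1)) u_ram (",  # nested parentheses, missed by [^)]*: False
]
for line in examples:
    print(bool(pat.search(line)), line)
```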

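Next, generate the SRAM wrappers, which contain the SRAM IP instantiations and the bit-concatenation logic, with the following script (gen_sram_wrapper.py):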

import os
import re
import math
import argparse

def parse_fcp_line(line):
    line = line.strip()
    if not line:
        return None
    parts = line.split('#', 1)
    wrapper_name = parts[0].strip()
    comment = parts[1].strip() if len(parts) > 1 else None

    # Parse wrapper's depth and width
    size_part = wrapper_name.split('_')[-1]
    depth, width = map(int, size_part.split('x'))

    # Parse comment to get base segments
    segments = []
    if comment:
        segment_pattern = re.compile(r'\s*(\d+)x(\d+)(?:\s*\*\s*(\d+))?')
        for part in comment.split('+'):
            part = part.strip()
            match = segment_pattern.match(part)
            if not match:
                raise ValueError(f"Invalid comment segment: {part}")
            seg_depth = int(match.group(1))
            seg_width = int(match.group(2))
            seg_count = int(match.group(3)) if match.group(3) else 1
            segments.append({
                'depth': seg_depth,
                'width': seg_width,
                'count': seg_count
            })
    else:
        segments.append({
            'depth': depth,
            'width': width,
            'count': 1
        })
    return {
        'wrapper_name': wrapper_name,
        'depth': depth,
        'width': width,
        'segments': segments
    }

def find_sram_module(base_dir, target_depth, target_width):
    target_size = f"{target_depth}x{target_width}"
    candidates = []
    for dir_name in os.listdir(base_dir):
        dir_path = os.path.join(base_dir, dir_name)
        if os.path.isdir(dir_path):
            matches = re.findall(r'(\d+)x(\d+)', dir_name)
            for d, w in matches:
                if f"{d}x{w}" == target_size:
                    candidates.append(dir_name)
                    break
    if not candidates:
        raise ValueError(f"No module found for {target_size} in {base_dir}")
    elif len(candidates) > 1:
        raise ValueError(f"Multiple modules found for {target_size}: {candidates}")
    else:
        return candidates[0]

def generate_wrapper(wrapper_info, sram_dir, output_dir):
    wrapper_name = wrapper_info['wrapper_name']
    depth = wrapper_info['depth']
    width = wrapper_info['width']
    segments = wrapper_info['segments']

    address_width = math.ceil(math.log2(depth)) if depth > 0 else 0

    instances = []
    current_bit = 0
    for seg in segments:
        seg_depth = seg['depth']
        seg_width = seg['width']
        seg_count = seg['count']
        module_name = find_sram_module(sram_dir, seg_depth, seg_width)
        if not module_name:
            raise ValueError(f"No module found for {seg_depth}x{seg_width}")

        for _ in range(seg_count):
            start_bit = current_bit
            end_bit = current_bit + seg_width - 1
            current_bit += seg_width
            instances.append({
                'module_name': module_name,
                'start_bit': start_bit,
                'end_bit': end_bit,
                'width': seg_width,
                'instance_name': f"sram_{len(instances)}",
                'dout_wire': f"dout{len(instances)}"
            })

    total_width = sum(inst['width'] for inst in instances)
    if total_width != width:
        raise ValueError(f"Total width {total_width} != {width} for {wrapper_name}")

    sorted_instances = sorted(instances, key=lambda x: -x['start_bit'])

    verilog_code = []
    verilog_code.append(f"module {wrapper_name} (")
    verilog_code.append(f"   input clk,")
    verilog_code.append(f"   input [{address_width-1}:0] ra,")
    verilog_code.append(f"   input re,")
    verilog_code.append(f"   output [{width-1}:0] dout,")
    verilog_code.append(f"   input [{address_width-1}:0] wa,")
    verilog_code.append(f"   input we,")
    verilog_code.append(f"   input [{width-1}:0] di,")
    verilog_code.append(f"   input pwrbus_ram_pd")
    verilog_code.append(f");\n")

    for inst in instances:
        verilog_code.append(f"wire [{inst['width']-1}:0] {inst['dout_wire']};")

    for inst in instances:
        verilog_code.append(f"{inst['module_name']} {inst['instance_name']} (")
        verilog_code.append(f"   .AA(wa),")
        verilog_code.append(f"   .D(di[{inst['end_bit']}:{inst['start_bit']}]),")
        verilog_code.append(f"   .WEB(~we),")
        verilog_code.append(f"   .CLKW(clk),")
        verilog_code.append(f"   .AB(ra),")
        verilog_code.append(f"   .Q({inst['dout_wire']}),")
        verilog_code.append(f"   .REB(~re),")
        verilog_code.append(f"   .CLKR(clk)")
        verilog_code.append(f");\n")

    dout_parts = [inst['dout_wire'] for inst in sorted_instances]
    verilog_code.append(f"assign dout = {{ {', '.join(dout_parts)} }};")
    verilog_code.append("endmodule")

    output_file = os.path.join(output_dir, f"{wrapper_name}.v")
    with open(output_file, 'w') as f:
        f.write('\n'.join(verilog_code))

def main():
    parser = argparse.ArgumentParser(description='Generate SRAM wrappers.')
    parser.add_argument('--input', required=True, help='Path to sram_inst.fcp file')
    parser.add_argument('--sram_dir', required=True, help='Directory containing SRAM modules')
    parser.add_argument('--output_dir', required=True, help='Output directory for wrappers')
    args = parser.parse_args()

    with open(args.input, 'r') as f:
        lines = f.readlines()

    os.makedirs(args.output_dir, exist_ok=True)

    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            wrapper_info = parse_fcp_line(line)
            if wrapper_info:
                generate_wrapper(wrapper_info, args.sram_dir, args.output_dir)
        except Exception as e:
            print(f"Error processing line '{line}': {e}")

if __name__ == '__main__':
    main()
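The sketch below re-implements only the slicing step of generate_wrapper() for a single sram_inst.fcp line, to make it concrete. It is a simplified stand-in, not the script itself. Segments are laid out from bit 0 upward in the order written in the comment. For `nv_ram_rwsp_20x289 # 24x128 * 2 + 24x33`, the script therefore feeds di[127:0], di[255:128] and di[288:256] into sram_0, sram_1 and sram_2, and assembles dout as { dout2, dout1, dout0 }.

```python
import re

# Same segment syntax as parse_fcp_line(): "<depth>x<width>[ * <count>]"
SEG_RE = re.compile(r"\s*(\d+)x(\d+)(?:\s*\*\s*(\d+))?")


def bit_slices(fcp_line):
    """Return [(macro size, msb, lsb), ...] for one sram_inst.fcp line, LSB slice first."""
    name, _, comment = fcp_line.partition("#")
    depth, width = map(int, name.strip().split("_")[-1].split("x"))
    parts = comment.split("+") if comment.strip() else [f"{depth}x{width}"]
    slices, lsb = [], 0
    for part in parts:
        m = SEG_RE.match(part)
        if not m:
            raise ValueError(f"bad segment: {part!r}")
        seg_depth, seg_width, count = m.group(1), int(m.group(2)), int(m.group(3) or 1)
        for _ in range(count):
            slices.append((f"{seg_depth}x{seg_width}", lsb + seg_width - 1, lsb))
            lsb += seg_width
    if lsb != width:
        raise ValueError(f"segments cover {lsb} bits, wrapper needs {width}")
    return slices


for size, msb, lsb in bit_slices("nv_ram_rwsp_20x289 # 24x128 * 2 + 24x33"):
    print(f"{size}: di[{msb}:{lsb}]")
```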

Usage example:

python3 gen_sram_wrapper.py --input sram_inst.fcp --sram_dir /iccad/lib/sram/nvdla_sram/TSMCHOME/sram/Compiler/tsn28hpcp2prf_20120200_130a --output_dir .

--input: a sram_inst.fcp file. Start from the sram_inst.f produced by gen_sram_inst.sh and add the stitching information by hand after a #, for example:

nv_ram_rws_128x18
nv_ram_rws_16x256 # 16x128 * 2
nv_ram_rws_16x272 # 16x136 * 2
nv_ram_rws_16x64
nv_ram_rws_256x3
nv_ram_rws_256x64
nv_ram_rws_256x7
nv_ram_rws_32x16
nv_ram_rws_64x10
nv_ram_rwsp_128x11
nv_ram_rwsp_128x6
nv_ram_rwsp_160x16
nv_ram_rwsp_160x65
nv_ram_rwsp_20x289 # 24x128 * 2 + 24x33
nv_ram_rwsp_245x514 # 248x144 * 3 + 248x82

--sram_dir: the location of the generated SRAM IPs. Its directory structure is:

❯ tree -d /iccad/lib/sram/nvdla_sram/TSMCHOME/sram/Compiler/tsn28hpcp2prf_20120200_130a
/iccad/lib/sram/nvdla_sram/TSMCHOME/sram/Compiler/tsn28hpcp2prf_20120200_130a
├── ts6n28hpcphvta128x11m2fbso
│   ├── DATASHEET
│   ├── DFT
│   │   ├── ATPG
│   │   └── MBIST
│   ├── GDSII
│   ├── LEF
│   ├── LOG
│   ├── NLDM
│   ├── SPICE
│   └── VERILOG
├── ts6n28hpcphvta128x128m2fbso
│   ├── DATASHEET
│   ├── DFT
│   │   ├── ATPG
│   │   └── MBIST
│   ├── GDSII
│   ├── LEF
│   ├── LOG
│   ├── NLDM
│   ├── SPICE
│   └── VERILOG
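find_sram_module() matches directory names on depth x width only. It would therefore raise "Multiple modules found" if, for instance, both an m2f and an m2s macro of the same size were generated. A hedged alternative, sketched below, also parses the mux and segment. The naming pattern `<depth>x<width>m<mux><segment>` is inferred from the directory names in this listing and the config.txt entries, not from TSMC documentation.

```python
import re

# Assumed directory-name layout, inferred from the listing above and the config.txt
# entries: <prefix><depth>x<width>m<mux><segment><suffix>, e.g. ts6n28hpcphvta128x11m2fbso.
DIR_RE = re.compile(r"(\d+)x(\d+)m(\d+)([fs])")


def parse_macro_dir(name):
    """Return (depth, width, mux, segment), or None for names that don't follow the pattern."""
    m = DIR_RE.search(name)
    if not m:
        return None
    depth, width, mux, seg = m.groups()
    return int(depth), int(width), int(mux), seg


def find_macro(dir_names, depth, width, mux=2, seg="f"):
    """Pick exactly one macro directory matching size, mux and segment."""
    key = (depth, width, mux, seg)
    matches = [n for n in dir_names if parse_macro_dir(n) == key]
    if len(matches) != 1:
        raise ValueError(f"expected one macro for {depth}x{width}m{mux}{seg}, found {matches}")
    return matches[0]


names = ["ts6n28hpcphvta128x11m2fbso", "ts6n28hpcphvta128x128m2fbso"]
print(find_macro(names, 128, 11))
```

In practice the names would come from `os.listdir(sram_dir)`. To wire this into gen_sram_wrapper.py, the fcp comments would also have to record each macro's mux and segment, which the current format does not do.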
