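Implemented with the TSMC28 process library. All material comes from public sources on the internet and is intended for study and discussion only; please contact me for removal in case of infringement.
RTL code generation
After cloning the NVDLA repository, check out the nv_small branch and run make: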
git checkout nv_small
make
Answer the configuration prompts. The generated tree.make looks like this:
##=======================
## Project Name Setup, multiple projects supported
##=======================
PROJECTS := nv_small
##=======================
##Linux Environment Setup
##=======================
USE_DESIGNWARE := 1
DESIGNWARE_DIR := /iccad/synopsys/syn/U-2022.12-SP1/dw/sim_ver
CPP := /usr/bin/cpp
GCC := /usr/bin/gcc
CXX := /usr/bin/g++
PERL := /usr/bin/perl
JAVA := /usr/bin/java
SYSTEMC := /usr/local/systemc-2.3.4
PYTHON := /usr/bin/python3.6
VCS_HOME := /iccad/synopsys/vcs/T-2022.06
NOVAS_HOME := /iccad/synopsys/verdi/T-2022.06
VERDI_HOME := /iccad/synopsys/verdi/T-2022.06
VERILATOR := verilator
CLANG := /home/utils/llvm-4.0.1/bin/clang
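A quick sanity check of these paths before running the build can save a round trip. The following is a minimal sketch and not part of the NVDLA flow: the helper names parse_tree_make and missing_tools are hypothetical, and the sample text is just a few entries copied from the file above.

```python
import os
import re
import shutil


def parse_tree_make(text):
    """Return {NAME: value} for every 'NAME := value' line, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r"^(\w+)\s*:=\s*(.*)$", line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings


def missing_tools(settings, skip=("PROJECTS", "USE_DESIGNWARE")):
    """List (name, value) pairs whose value is neither an existing path nor on PATH."""
    missing = []
    for name, value in settings.items():
        if name in skip:
            continue  # these entries are plain settings, not paths
        if not (os.path.exists(value) or shutil.which(value)):
            missing.append((name, value))
    return missing


if __name__ == "__main__":
    sample = """
    ## Project Name Setup
    PROJECTS := nv_small
    PERL := /usr/bin/perl
    VERILATOR := verilator
    """
    for name, value in missing_tools(parse_tree_make(sample)):
        print(f"{name}: {value} not found")
```

In practice the text would be read from the generated tree.make instead of the inline sample.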
Run tmake to generate the RTL:
./tools/bin/tmake -build vmod
tmake may complain that some Perl modules are missing; install them with CPAN as prompted.
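SRAM replacement
The required SRAMs are in hw/outdir/nv_small/vmod/rams/synth/. All of them use a single clock port, i.e. they are all pseudo dual-port SRAMs.
NVDLA wraps each SRAM in two layers. Take nv_ram_rwsp_20x289.v as an example: it is the outermost wrapper, contains some MBIST-related logic, and instantiates nv_ram_rwsp_20x289_logic.v. That inner layer mainly holds the logic that stitches physical SRAMs together. For example, if the physical SRAM only supports a width of 288, the nv_ram_rwsp_20x289_logic layer instantiates 20 regs to stand in for the missing 1 bit.
The physical SRAMs that are actually instantiated are in hw/outdir/nv_small/vmod/rams/model/. Possibly because the SRAM IP restricts the available sizes, some true dual-port SRAMs are instantiated even though every NVDLA RAM wrapper uses only one clk port (pseudo dual-port SRAM).
The later flow does not need the MBIST logic, and the size options differ from those of the process library used upstream, so the outermost NVDLA SRAM wrapper is replaced directly; in the example above, that means replacing nv_ram_rwsp_20x289 itself. Pseudo dual-port SRAMs are preferred as replacements, and dual-port SRAMs are used only when the required size cannot be met.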
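To see which outer wrappers need replacing, it helps to separate them from their _logic counterparts and pull the depth and width out of the file names. The following is a minimal sketch assuming the naming pattern seen in the example above (nv_ram_<kind>_<depth>x<width>.v and the same name with a _logic suffix); the function name classify is hypothetical.

```python
import re

# Assumed naming pattern, based on nv_ram_rwsp_20x289.v and
# nv_ram_rwsp_20x289_logic.v mentioned above.
WRAPPER_RE = re.compile(r"^nv_ram_([a-z]+)_(\d+)[xX](\d+)(_logic)?\.v$")


def classify(file_names):
    """Split file names into outer wrappers and logic layers.

    Each entry is returned as (kind, depth, width); unrelated files are ignored.
    """
    outer, logic = [], []
    for name in file_names:
        match = WRAPPER_RE.match(name)
        if not match:
            continue
        entry = (match.group(1), int(match.group(2)), int(match.group(3)))
        if match.group(4):
            logic.append(entry)
        else:
            outer.append(entry)
    return outer, logic


if __name__ == "__main__":
    names = ["nv_ram_rwsp_20x289.v", "nv_ram_rwsp_20x289_logic.v", "notes.txt"]
    outer, logic = classify(names)
    print("outer wrappers:", outer)
    print("logic layers:  ", logic)
```

Feeding it os.listdir() of the synth directory would give the full list of sizes that the replacement SRAMs have to cover.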
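The TSMC28 MC2 library has no pseudo dual-port SRAM, so 2prf is used in its place. Three compilers are available: tsn28hpcp2prf_20120200_130a, tsn28hpcpuhddpsram_20120200_170a and tsn28hpcpdpsram_20120200_130a. The first is 2prf and the other two are dpsram. The sizes (depth*width) they support are listed in the three tables below, in the same order: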
| SEG Option | Mux (CM) | Word Depth (W) | Word Width (I/O) (N) |
|---|---|---|---|
| F | 2 | 16,24,32, 48,56…96, 112,120…160, 176,184…224, 240,248…288, 304,312…352, 368,376…416, 432,440…480, 496,504…512 | 2,3…144 |
| F | 4 | 32,48,64, 96,112…192, 224,240…320, 352,368…448, 480,496…576, 608,624…704, 736,752…832, 864,880…960, 992,1008…1024 | 2,3…72 |
| F | 8 | 64,96,128, 192,224…384, 448,480…640, 704,736…896, 960,992…1152, 1216,1248…1408, 1472,1504…1664, 1728,1760…1920, 1984,2016…2048 | 2,3…36 |
| S | 2 | 64, 80,88…192, 208,216…320, 336,344…448, 464,472…512 | 2,3…144 |
| S | 4 | 128, 160,176…384, 416,432…640, 672,688…896, 928,944…1024 | 2,3…72 |
| S | 8 | 256, 320,352…768, 832,864…1280, 1344,1376…1792, 1856,1888…2048 | 2,3…36 |

tsn28hpcpuhddpsram_20120200_170a:

| Segment Option (SEG) | Mux Option (CM) | Word Depth (W) | Word Width (I/O) (N) |
|---|---|---|---|
| M | 4 | 32,48…4096 | 10,11…144 |

tsn28hpcpdpsram_20120200_130a:

| SEG Option | Mux | Word Depth | Word Width (I/O) |
|---|---|---|---|
| F | 4 | 32,48…1024 | 4,5…72 |
| F | 8 | 64,96…2048 | 4,5…36 |
| F | 16 | 128,192…4096 | 4,5…18 |
| M | 4 | 32,48…2048 | 4,5…72 |
| M | 8 | 64,96…4096 | 4,5…36 |
| M | 16 | 128,192…8192 | 4,5…18 |
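
When an NVDLA RAM depth is not offered by the compiler, the next larger legal depth has to be picked and the extra words left unused. The following is a minimal sketch of that lookup for the 2prf compiler with Mux = 2. The depth ranges are transcribed from the F and S rows of the first table (the one listing Mux 2, 4 and 8), assuming each "a,b…c" entry steps by 8; the names DEPTH_RANGES, legal_depths and round_up_depth are hypothetical.

```python
# (first, last) depth ranges for Mux = 2, transcribed from the first table.
# Assumption: every range steps by 8, as in "48,56…96".
DEPTH_RANGES = {
    "f": [(16, 32), (48, 96), (112, 160), (176, 224), (240, 288),
          (304, 352), (368, 416), (432, 480), (496, 512)],
    "s": [(64, 64), (80, 192), (208, 320), (336, 448), (464, 512)],
}
STEP = 8


def legal_depths(seg):
    """Expand the ranges of one SEG option into a sorted list of depths."""
    depths = []
    for first, last in DEPTH_RANGES[seg]:
        depths.extend(range(first, last + 1, STEP))
    return depths


def round_up_depth(depth, seg="f"):
    """Return the smallest legal depth that can hold `depth` words."""
    for candidate in legal_depths(seg):
        if candidate >= depth:
            return candidate
    raise ValueError(f"depth {depth} exceeds the largest option for seg '{seg}'")


if __name__ == "__main__":
    for needed in (20, 61, 245, 40):
        print(needed, "->", round_up_depth(needed))
    # prints: 20 -> 24, 61 -> 64, 245 -> 248, 40 -> 48
```

Note that 40 rounds up to 48 rather than staying at 40, because the table skips some multiples of 8.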
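Based on the SRAM sizes in hw/outdir/nv_small/vmod/rams/synth/, pick suitable configurations and write them into the config.txt that MC2 expects. Some SRAMs have to be stitched together from several macros, and where no matching depth exists a few redundant words are added; the comments after # explain each case. All SRAMs are generated with tsn28hpcp2prf_20120200_130a, using the following config.txt: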
128x18m2f
128x128m2f # 128x128 * 2 = 128x256
128x64m2f
16x128m2f # 16x128 * 2 = 16x256
16x136m2f # 16x136 * 2 = 16x272
16x64m2f
256x3m2f
256x128m2f # 256x128 * 4 = 256x512
256x64m2f
256x7m2f
32x16m2f
32x128m2f # 32x128 * 2 = 32x256; 32x128 * 4 = 32x512; 32x128 * 6 = 32x768
32x136m2f # 32x136 * 2 = 32x272; 32x136 * 4 = 32x544
512x128m2f # 512x128 * 2 = 512x256
512x64m2f
64x10m2f
64x128m2f # 64x128 * 8 = 64x1024
64x136m2f # 64x136 * 8 = 64x1088
64x116m2f
64x18m2f
128x11m2f
128x6m2f
160x16m2f
160x144m2f # 160x144 * 3 + 160x82 = 160x514
160x82m2f
160x65m2f
24x128m2f # 24x128 * 2 + 24x33 = 24x289 (use as 20x289)
248x144m2f # 248x144 * 3 + 248x82 = 248x514 (use as 245x514)
248x82m2f
256x11m2f
32x32m2f
64x144m2f # 64x144 * 3 + 64x82 = 64x514 (use as 61x514)
64x82m2f
64x64m2f # use as 61x64
64x65m2f # use as 61x65
80x14m2s
80x16m2s
80x128m2s # 80x128 * 2 = 80x256
80x144m2s # 80x144 * 3 + 80x82 = 80x514
80x82m2s
80x65m2s
16x65m2f # use as 8x65
256x8m2f
24x32m2f # use as 19x32
24x4m2f # use as 19x4
24x80m2f # use as 19x80
64x84m2f # 64x84 * 2 = 64x168, use as 60x168
64x21m2f # use as 60x21
80x15m2s
80x72m2s
80x9m2s
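Most of the width splits in the comments above follow a simple pattern: use as few macros as possible and split evenly when the width divides cleanly, otherwise fill with the maximum width and put the remainder in one last macro. The following is a minimal sketch of that rule, not the procedure actually used to write the list. It assumes the 2prf Mux = 2 limits from the first table (width 2 to 144); split_width is a hypothetical helper.

```python
import math


def split_width(width, max_width=144, min_width=2):
    """Split a RAM width into macro widths that each fit [min_width, max_width]."""
    if width < min_width:
        raise ValueError(f"width {width} is below the minimum macro width")
    count = math.ceil(width / max_width)  # fewest macros that can cover the width
    if width % count == 0:
        return [width // count] * count  # even split, e.g. 272 -> 136 + 136
    rest = width - max_width * (count - 1)
    if rest < min_width:
        # The remainder is too narrow for a macro, so this needs a manual choice.
        raise ValueError(f"remainder {rest} is too narrow; pick the split by hand")
    return [max_width] * (count - 1) + [rest]


if __name__ == "__main__":
    for width in (256, 272, 514, 1088):
        print(width, "->", split_width(width))
    # prints:
    # 256 -> [128, 128]
    # 272 -> [136, 136]
    # 514 -> [144, 144, 144, 82]
    # 1088 -> [136, 136, 136, 136, 136, 136, 136, 136]
```

A width of 289 falls outside this rule, because 144 * 2 leaves a 1-bit remainder; the list above handles it by hand as 128 * 2 + 33.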
Run MC2 to generate the SRAMs with the following script:
#!/bin/bash
# Expand config.txt into one .cfg file per SRAM
./tsn28hpcp2prf_130a.pl -NonBWEB -file config.txt 2>&1 | tee "cfg.log"
# Run the compiler once per .cfg, writing each SRAM into its own directory
for cfg_file in *.cfg; do
    base_name=$(basename "$cfg_file" .cfg)
    mkdir -p "$base_name"
    log_file="$base_name/${base_name}.log"
    mc2-eu -eu -c tsn28hpcp2prf_20120200_130a.mco -cfg "$cfg_file" -ui textual -v -p tsmceva -d "$base_name" 2>&1 | tee "$log_file"
done
Some of these SRAMs are never actually used. The following gen_sram_inst.sh script collects the SRAM wrappers that are instantiated:
#!/bin/bash
# Usage: ./gen_sram_inst.sh <NVDLA SRAM synth directory> <project directory>
# Writes sram_inst.f, listing every SRAM module that is instantiated
nvdla_sram_synth_dir="$1"
prj_dir="$2"
output_file="sram_inst.f"
# Collect all SRAM module names (files named *_<depth>x<width>.v)
modules=$(ls "$nvdla_sram_synth_dir" | grep -E '_[0-9]+[xX][0-9]+\.v$' | sed 's/\.v$//')
# Start with an empty output file
> "$output_file"
for module in $modules; do
    # Search the .v and .sv files in the project directory (excluding the listed types).
    # The regex matches instantiations both with and without a #(...) parameter list.
    echo "Searching $module"
    found=$(find "$prj_dir" -type f \( -name "*.v" -o -name "*.sv" \) \
        ! -name "*logic.v" ! -name "*.vcp" ! -name "*.log" ! -name "*.f" \
        -exec grep -E -m1 "\b${module}\b(\s*#\s*\([^)]*\))?\s+\w+\s*\(" {} + 2>/dev/null)
    if [ -n "$found" ]; then
        echo "$found"
        echo "$module" >> "$output_file"
    fi
done
echo "SRAM instance search complete, saved to: $output_file"
Usage example:
# writes sram_inst.f
./gen_sram_inst.sh nvdla/hw/outdir/nv_small/vmod/rams/synth/ nvdla/hw/outdir/nv_small/
Next, generate the SRAM wrappers, which contain the SRAM IP instances and the bit-concatenation logic. The script is:
import os
import re
import math
import argparse


def parse_fcp_line(line):
    """Parse one line of sram_inst.fcp: '<wrapper_name> [# DxW [* N] + ...]'."""
    line = line.strip()
    if not line:
        return None
    parts = line.split('#', 1)
    wrapper_name = parts[0].strip()
    comment = parts[1].strip() if len(parts) > 1 else None
    # Parse wrapper's depth and width
    size_part = wrapper_name.split('_')[-1]
    depth, width = map(int, size_part.split('x'))
    # Parse comment to get base segments
    segments = []
    if comment:
        segment_pattern = re.compile(r'\s*(\d+)x(\d+)(?:\s*\*\s*(\d+))?')
        for part in comment.split('+'):
            part = part.strip()
            match = segment_pattern.match(part)
            if not match:
                raise ValueError(f"Invalid comment segment: {part}")
            seg_depth = int(match.group(1))
            seg_width = int(match.group(2))
            seg_count = int(match.group(3)) if match.group(3) else 1
            segments.append({
                'depth': seg_depth,
                'width': seg_width,
                'count': seg_count
            })
    else:
        # No comment: a single macro with the wrapper's own size
        segments.append({
            'depth': depth,
            'width': width,
            'count': 1
        })
    return {
        'wrapper_name': wrapper_name,
        'depth': depth,
        'width': width,
        'segments': segments
    }


def find_sram_module(base_dir, target_depth, target_width):
    """Return the single directory in base_dir whose name contains '<depth>x<width>'."""
    target_size = f"{target_depth}x{target_width}"
    candidates = []
    for dir_name in os.listdir(base_dir):
        dir_path = os.path.join(base_dir, dir_name)
        if os.path.isdir(dir_path):
            matches = re.findall(r'(\d+)x(\d+)', dir_name)
            for d, w in matches:
                if f"{d}x{w}" == target_size:
                    candidates.append(dir_name)
                    break
    if not candidates:
        raise ValueError(f"No module found for {target_size}")
    elif len(candidates) > 1:
        raise ValueError(f"Multiple modules found for {target_size}: {candidates}")
    else:
        return candidates[0]


def generate_wrapper(wrapper_info, sram_dir, output_dir):
    """Write <wrapper_name>.v, instantiating the macros and concatenating their outputs."""
    wrapper_name = wrapper_info['wrapper_name']
    depth = wrapper_info['depth']
    width = wrapper_info['width']
    segments = wrapper_info['segments']
    address_width = math.ceil(math.log2(depth)) if depth > 0 else 0
    instances = []
    current_bit = 0
    # Assign each macro instance a contiguous bit range, starting from bit 0
    for seg in segments:
        seg_depth = seg['depth']
        seg_width = seg['width']
        seg_count = seg['count']
        module_name = find_sram_module(sram_dir, seg_depth, seg_width)
        if not module_name:
            raise ValueError(f"No module found for {seg_depth}x{seg_width}")
        for _ in range(seg_count):
            start_bit = current_bit
            end_bit = current_bit + seg_width - 1
            current_bit += seg_width
            instances.append({
                'module_name': module_name,
                'start_bit': start_bit,
                'end_bit': end_bit,
                'width': seg_width,
                'instance_name': f"sram_{len(instances)}",
                'dout_wire': f"dout{len(instances)}"
            })
    total_width = sum(inst['width'] for inst in instances)
    if total_width != width:
        raise ValueError(f"Total width {total_width} != {width} for {wrapper_name}")
    # Highest bits first, so the concatenation below is in MSB-to-LSB order
    sorted_instances = sorted(instances, key=lambda x: -x['start_bit'])
    verilog_code = []
    verilog_code.append(f"module {wrapper_name} (")
    verilog_code.append(f"    input clk,")
    verilog_code.append(f"    input [{address_width-1}:0] ra,")
    verilog_code.append(f"    input re,")
    verilog_code.append(f"    output [{width-1}:0] dout,")
    verilog_code.append(f"    input [{address_width-1}:0] wa,")
    verilog_code.append(f"    input we,")
    verilog_code.append(f"    input [{width-1}:0] di,")
    verilog_code.append(f"    input pwrbus_ram_pd")
    verilog_code.append(f");\n")
    for inst in instances:
        verilog_code.append(f"wire [{inst['width']-1}:0] {inst['dout_wire']};")
    for inst in instances:
        verilog_code.append(f"{inst['module_name']} {inst['instance_name']} (")
        verilog_code.append(f"    .AA(wa),")
        verilog_code.append(f"    .D(di[{inst['end_bit']}:{inst['start_bit']}]),")
        verilog_code.append(f"    .WEB(~we),")
        verilog_code.append(f"    .CLKW(clk),")
        verilog_code.append(f"    .AB(ra),")
        verilog_code.append(f"    .Q({inst['dout_wire']}),")
        verilog_code.append(f"    .REB(~re),")
        verilog_code.append(f"    .CLKR(clk)")
        verilog_code.append(f");\n")
    dout_parts = [inst['dout_wire'] for inst in sorted_instances]
    verilog_code.append(f"assign dout = {{ {', '.join(dout_parts)} }};")
    verilog_code.append("endmodule")
    output_file = os.path.join(output_dir, f"{wrapper_name}.v")
    with open(output_file, 'w') as f:
        f.write('\n'.join(verilog_code))


def main():
    parser = argparse.ArgumentParser(description='Generate SRAM wrappers.')
    parser.add_argument('--input', required=True, help='Path to sram_inst.fcp file')
    parser.add_argument('--sram_dir', required=True, help='Directory containing SRAM modules')
    parser.add_argument('--output_dir', required=True, help='Output directory for wrappers')
    args = parser.parse_args()
    with open(args.input, 'r') as f:
        lines = f.readlines()
    os.makedirs(args.output_dir, exist_ok=True)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            wrapper_info = parse_fcp_line(line)
            if wrapper_info:
                generate_wrapper(wrapper_info, args.sram_dir, args.output_dir)
        except Exception as e:
            # Report the failing line and continue with the remaining wrappers
            print(f"Error processing line '{line}': {e}")


if __name__ == '__main__':
    main()
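To make the bit assignment in generate_wrapper concrete, the sketch below recomputes the address width and the per-instance data slices on their own, without touching any files. It mirrors the arithmetic of the script above; the function names address_width and bit_slices are hypothetical.

```python
import math


def address_width(depth):
    """Same formula as generate_wrapper: ceil(log2(depth)) address bits."""
    return math.ceil(math.log2(depth)) if depth > 0 else 0


def bit_slices(segments):
    """Map (width, count) segments to (msb, lsb) slices, lowest bits first."""
    slices = []
    lsb = 0
    for width, count in segments:
        for _ in range(count):
            slices.append((lsb + width - 1, lsb))
            lsb += width
    return slices


if __name__ == "__main__":
    # nv_ram_rwsp_20x289 built from 24x128 * 2 + 24x33
    print(address_width(20))               # 5
    print(bit_slices([(128, 2), (33, 1)]))  # [(127, 0), (255, 128), (288, 256)]
```

One thing worth checking in the generated Verilog: the address width is derived from the wrapper depth, not the macro depth. For most entries the two agree (20 and 24 both need 5 bits), but a wrapper of depth 8 built on a 16-word macro gets a 3-bit address, one bit narrower than what a 16-word macro would presumably expect on AA/AB. Verilog pads such a connection, though lint or synthesis tools may warn about the width mismatch.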
Usage example:
python3 gen_sram_wrapper.py --input sram_inst.fcp --sram_dir /iccad/lib/sram/nvdla_sram/TSMCHOME/sram/Compiler/tsn28hpcp2prf_20120200_130a --output_dir .
--input: takes an sram_inst.fcp file. Start from the sram_inst.f produced by gen_sram_inst.sh and add the stitching information by hand, for example:
nv_ram_rws_128x18
nv_ram_rws_16x256 # 16x128 * 2
nv_ram_rws_16x272 # 16x136 * 2
nv_ram_rws_16x64
nv_ram_rws_256x3
nv_ram_rws_256x64
nv_ram_rws_256x7
nv_ram_rws_32x16
nv_ram_rws_64x10
nv_ram_rwsp_128x11
nv_ram_rwsp_128x6
nv_ram_rwsp_160x16
nv_ram_rwsp_160x65
nv_ram_rwsp_20x289 # 24x128 * 2 + 24x33
nv_ram_rwsp_245x514 # 248x144 * 3 + 248x82
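Because the stitching comments are added by hand, a quick consistency check before generating wrappers can catch typos. The following is a minimal sketch, separate from gen_sram_wrapper.py, that checks three things per line: the segment widths sum to the wrapper width, every segment is at least as deep as the wrapper, and all segments share one depth. The function name check_fcp_line and the messages are hypothetical.

```python
import re

SEG_RE = re.compile(r"(\d+)x(\d+)(?:\s*\*\s*(\d+))?")


def check_fcp_line(line):
    """Return a list of problems found in one sram_inst.fcp line (empty if fine)."""
    name, _, comment = line.partition("#")
    name = name.strip()
    depth, width = map(int, name.rsplit("_", 1)[1].split("x"))
    segments = []  # (depth, width, count)
    if comment.strip():
        for part in comment.split("+"):
            match = SEG_RE.fullmatch(part.strip())
            if match is None:
                return [f"cannot parse segment '{part.strip()}'"]
            segments.append((int(match[1]), int(match[2]), int(match[3] or 1)))
    else:
        segments.append((depth, width, 1))  # no comment: one macro of the same size
    problems = []
    total = sum(w * c for _, w, c in segments)
    if total != width:
        problems.append(f"segment widths sum to {total}, expected {width}")
    if any(d < depth for d, _, _ in segments):
        problems.append(f"a segment is shallower than the wrapper depth {depth}")
    if len({d for d, _, _ in segments}) > 1:
        problems.append("segments do not share one depth")
    return problems


if __name__ == "__main__":
    for text in ("nv_ram_rwsp_20x289 # 24x128 * 2 + 24x33",
                 "nv_ram_rws_16x256 # 16x128"):
        print(text, "->", check_fcp_line(text) or "ok")
```

The first line passes; the second is reported because a single 16x128 macro covers only 128 of the 256 bits.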
--sram_dir: the location of the SRAM IP, which has the following directory structure:
❯ tree -d /iccad/lib/sram/nvdla_sram/TSMCHOME/sram/Compiler/tsn28hpcp2prf_20120200_130a
/iccad/lib/sram/nvdla_sram/TSMCHOME/sram/Compiler/tsn28hpcp2prf_20120200_130a
├── ts6n28hpcphvta128x11m2fbso
│ ├── DATASHEET
│ ├── DFT
│ │ ├── ATPG
│ │ └── MBIST
│ ├── GDSII
│ ├── LEF
│ ├── LOG
│ ├── NLDM
│ ├── SPICE
│ └── VERILOG
├── ts6n28hpcphvta128x128m2fbso
│ ├── DATASHEET
│ ├── DFT
│ │ ├── ATPG
│ │ └── MBIST
│ ├── GDSII
│ ├── LEF
│ ├── LOG
│ ├── NLDM
│ ├── SPICE
│ └── VERILOG