1 Introduction
1.1 Research Background and Significance
1.1.1 Development Trends of Neural Network
1.1.2 Requirements of NN Processor
1.1.3 Energy-Efficient NN Processors
1.2 Summary of the Research Work
1.2.1 Overall Framework of the Research Work
1.2.2 Main Contributions of This Book
1.3 Overall Structure of This Book
References
2 Basics and Research Status of Neural Network Processors
2.1 Basics of Neural Network Algorithms
2.2 Basics of Neural Network Processors
2.3 Research Status of Digital-Circuits-Based NN Processors
2.3.1 Data Reuse
2.3.2 Low-Bit Quantization
2.3.3 NN Model Compression and Sparsity
2.3.4 Summary of Digital-Circuits-Based NN Processors
2.4 Research Status of CIM NN Processors
2.4.1 CIM Principle
2.4.2 CIM Devices
2.4.3 CIM Circuits
2.4.4 CIM Macro
2.4.5 Summary of CIM NN Processors
2.5 Summary of This Chapter
References
3 Energy-Efficient NN Processor by Optimizing Data Reuse for Specific Convolutional Kernels
3.1 Introduction
3.2 Previous Data Reuse Methods and the Constraints
3.3 The KOP3 Processor Optimized for Specific Convolutional Kernels
3.4 Processing Array Optimized for Specific Convolutional Kernels
3.5 Local Memory Cyclic Access Architecture and Scheduling Strategy
3.6 Module-Level Parallel Instruction Set and the Control Circuits
3.7 Experimental Results
3.8 Conclusion
References
4 Optimized Neural Network Processor Based on Frequency-Domain Compression Algorithm
4.1 Introduction
4.2 The Limitations of Irregular Sparse Optimization and CirCNN Frequency-Domain Compression Algorithm
4.3 Frequency-Domain NN Processor STICKER-T
4.4 Global-Parallel Bit-Serial FFT Circuits
4.5 Frequency-Domain 2D Data-Reuse MAC Array
4.6 Small-Area Low-Power Block-Wise TRAM
4.7 Chip Measurement Results and Comparison
4.8 Summary of This Chapter
References
5 Digital Circuits and CIM Integrated NN Processor
5.1 Introduction
5.2 The Advantage of CIM Over Pure Digital Circuits
5.3 Design Challenges for System-Level CIM Chips
5.4 Sparse CIM Processor STICKER-IM
5.5 Structural Block-Wise Weight Sparsity and Dynamic Activation Sparsity
5.6 Flexible Mapping and Scheduling and Intra/Inter-Macro Data Reuse
5.7 Energy-Efficient CIM Macro with Dynamic ADC Power-Off
5.8 Chip Measurement Results and Comparison
5.9 Summary of This Chapter
References
6 A “Digital CIM” Processor Supporting Large-Scale NN Models
6.1 Introduction
6.2 The Challenges of System-Level CIM Chips to Support Large-Scale NN Models
6.3 “Digital CIM” NN Processor STICKER-IM2
6.4 Set-Associate Block-Wise Sparse Zero-Skipping Circuits
6.5 Ping-Pong CIM and Weight Update Architecture
6.6 Ping-Pong CIM Macro with Dynamic ADC Precision
6.7 Chip Measurement Results and Comparison
6.8 Summary of This Chapter
References
7 Summary and Prospect
7.1 Summary of This Book
7.2 Prospect of This Book
Content preview:
[...]ment of modern artificial [...] medical [...] challenge [...]

This [...] digital [...] Three [...] energy [...]:

[...] on specific convolutional kernel size is proposed with improved energy efficiency.
(2) A frequency-domain NN processor is designed to address the significant [...]tional hardware overhead of irregular sparse optimization. It [...] efficient [...] computation [...]

[...] skipping [...]

This research topic has important theoretical significance [...] Several [...] conferences/journals [...] rapidly [...] energy [...]

Yongpan [...]
December [...]