# An Innovative Radiation Hardened By Design Flip-Flop
by
Bradley Matush

A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science

Approved November 2010 by the Graduate Supervisory Committee:

Lawrence T. Clark, Chair
David Allee
Bertan Bakkaloglu

ARIZONA STATE UNIVERSITY

December 2010


#### Abstract

Radiation hardening by design (RHBD) has become a necessary practice when creating circuits to operate within radiation environments. While employing RHBD techniques involves tradeoffs between size, speed, and power, novel designs help to minimize these penalties. Space radiation is the primary source of radiation errors in circuits, and two types of single event effects, single event upsets (SEUs) and single event transients (SETs), are increasingly becoming a concern. While numerous methods currently exist to nullify SEUs and SETs, this thesis gives special consideration to the techniques of temporal hardening and interlocking. Temporal hardening mitigates both SETs and SEUs by spacing critical nodes through the use of delay elements, thus allowing collected charge to be removed. Interlocking creates redundant nodes to rectify charge collection on a single node.

This thesis presents an innovative, temporally hardened D flip-flop (TFF). The TFF physical design is laid out in the 130 nm TSMC process in the form of an interleaved multi-bit cell, and the circuitry necessary for the flip-flop to be hardened against SETs and SEUs is analyzed, with simulations verifying these claims. Comparisons are made to an unhardened D flip-flop in speed, size, and power consumption, showing how the RHBD technique used increases all three relative to an unhardened flip-flop. Blocks built from both the hardened and the unhardened flip-flops are then placed in synthesis and auto-place and route (APR) design flows and compared in size and speed to show the effects of using the high density multi-bit layout.


Finally, the TFF presented in this thesis is compared to two other flip-flops, the majority voter temporal/DICE flip-flop (MTDFF) and the C-element temporal/DICE flip-flop (CTDFF). These circuits are built on the same 130 nm TSMC process as the TFF and then analyzed by the same methods through speed, size, and power consumption and compared to the TFF and unhardened flip-flops. Simulations are completed on the MTDFF and CTDFF to show their strengths against D node SETs and SEUs as well as their weakness against CLK node SETs. Results show that the TFF is faster and harder than both the MTDFF and CTDFF.

## ACKNOWLEDGEMENT

First, I'd like to thank my parents for their unwavering support and guidance throughout my entire academic career. To Dr. Clark, thank you for your mentorship in both research and the classroom. Your aptitude at your profession continues to amaze me. Thanks to Tom Mozdzen for his contributions with the APR and Synthesis simulations in my research. Finally, I'd like to give a special thanks to SpaceMicro for funding the research.

## TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER
I. Introduction
A. Sequential Circuits
1) Latches
2) D Flip-Flops
3) Synchronous Timing
B. Space Radiation
C. Radiation Effects
1) Single Event Upsets
2) Single Event Transients
D. Hard Errors
E. Conclusions
II. Radiation Hardening Techniques
A. Introduction
B. Process Hardening Techniques
1) RC Hardening
2) Magnetic Tunnel Junction Hardening
C. RHBD Techniques
1) Redundant Latches
2) Temporal Hardening
3) Dual Interlock Storage Cells
4) Charge Sharing, Schmitt Trigger, and Other Methods
D. Conclusions
III. Verilog-A Model Simulating SET and SEU
A. Introduction
B. Verilog-A Code
C. Implementation And Simulation
D. Conclusions
IV. Temporal Flip-Flop
A. Introduction
B. Circuit Design
C. Simulation
D. Physical Design
E. Power Consumption Analysis
F. Size Analysis
G. Synthesis and APR Implementation
H. Conclusions
V. Radiation Hardened Flip-Flop Comparison
A. Introduction
B. Delay Element Comparison
C. Timing Comparison
1) Majority Voter TDFF
2) C-Element TDFF
3) Summary
D. Size Comparison
1) Majority Voter TDFF
2) C-Element TDFF
3) Summary
E. Power Comparison
F. Hardness Comparison
G. Conclusions
VI. Conclusion
REFERENCES

## LIST OF TABLES

I. Majority voter truth table.
II. State table for the Muller C-Elements.
III. Specific data for APR results comparing hardened and unhardened blocks.
IV. Energy consumption comparison for two delay element designs.
V. Timing comparison of four D Flip-Flops.
VI. Size comparison of RHBD flip-flops, per bit.
VII. MTDFF power dissipation analysis with comparison to an unhardened D flip-flop.
VIII. CTDFF power dissipation analysis with comparison to an unhardened D flip-flop.

## LIST OF FIGURES

I-1. Timing diagrams for combinational logic (a), flip-flops (b), and latches (c). (After [1])
I-2. Basic pass gate and latch design schematics. (After [1])
I-3. Noise affecting a bi-stable memory element and being rejected by a protected storage node.
I-4. Basic D flip-flop schematics using transmission gates (a) and bi-stable memory (b). (After [1])
I-5. Setup and hold time curves for a latch. (After [1])
I-6. Ionized particle flux by atomic number relative to Si. (After [4])
I-7. Van Allen belt equatorial trapped particle flux vs. altitude. (After [4])
I-8. Charge collection on a node due to an incident ionized particle. (After [5])
I-9. Simulated SEU on the storage node of a basic transparent high latch.
I-10. SET pulse on a node due to an ionized particle strike. (After [6])
I-11. Simulated SET on the input node, D, of a basic transparent high latch.
II-1. Schematic showing a storage node with resistances separating driving devices and capacitances attached to the transistor gates. (After [13])
II-2. Schematic for a dual-MJT latch cell. (After [16])
II-3. Simulated SET on dual-MJT bi-stable memory cell. (After [16])
II-4. Majority voter schematic with inputs A, B and C, along with output pin Y.
II-5. Block Diagram showing TMR setup.
II-6. Temporal latch depicting temporal redundancy with majority voters. Delay elements are marked by $\delta$. (After [19])
II-7. Simulated SET on D mitigated by a triple redundant temporal latch.
II-8. Current starved inverter schematic. (After [19])
II-9. Delays provided by a $\delta$ delay element while varying drive strength through transistor width.
II-10. Delays provided by a $\delta$ delay element while varying capacitor size by transistor length.
II-11. DICE latch schematics showing the simplified 8 transistor version (b) with PMOS pass-gate inputs. (After [25])
II-12. SEU prevention in a DICE latch.
II-13. Unhardened charge sharing flip-flop schematic. (After [31])
II-14. Schematic portion of current-sharing hardened flip-flop. (After [31])
II-15. Schmitt Trigger based latch. (After [34])
III-1. Schematic setup showing the Verilog-A model's peripheral circuitry.
III-2. Simulated SET propagating through chained inverters.
III-3. Simulated SEU in a bi-stable memory cell.
IV-1. (a) expresses the schematic of the Muller C-element and (b) shows the symbol designating C-element placement in a schematic.
IV-2. Schematic of delta delay element used in the temporal flip-flop.
IV-3. Full schematic of master/slave temporal flip-flop.
IV-4. Simulated SEU if the TFF latch feedback loops were missing the delay element.
IV-5. Simulation of an SET creating a charge feedback error.
IV-6. Simulation of operation with temporal flip-flop at 250 MHz.
IV-7. Simulated SET on input node D, showing mitigation potential of the temporal flip-flop.
IV-8. Simulated SET on SSetup.
IV-9. Simulation of an SET on the CLK node.
IV-10. Temporal flip-flop schematic expressing divisions of interleaving.
IV-11. Multi-bit cell layout with the interleaved nature of one flip-flop progressing across the cell.
IV-12. Plot of power consumption by activity factor for the unhardened flip-flop and the temporal flip-flop.
IV-13. Transient plot depicting the worst case setup scenario where an SET occurs within two $\mathrm{t}_{\delta}$ from the rising clock edge. (After [43])
IV-14. APR results of the hardened block (a) and unhardened version (b). (After [43])
V-1. Two delay element designs. (a) being the design described in Chapter 4 and (b) showing an inverter chain providing the same delay.
V-2. Simulated ionized particle strike on the first node of (a) the delay element used in the TFF and (b) an 18 chained inverter delay element.
V-3. Majority voter temporal/DICE flip-flop. (After [Knud-06])
V-4. Proper MTDFF operation at a 250 MHz clock frequency.
V-5. C-Element temporal/DICE flip-flop schematic. (After [43])
V-6. Proper CTDFF operation at a 250 MHz clock frequency.
V-7. MTDFF layout covering two cell heights.
V-8. Complete CTDFF layout.
V-9. Plot comparing the power consumption of an unhardened flip-flop to three temporally hardened flip-flops across a range of activity factors.
V-10. Simulated SET affecting the D input of the MTDFF.
V-11. SET seen on the CLK node of a MTDFF causing an upset.
V-12. Simulated SET on the D input of the CTDFF.
V-13. Simulated SET on the CLK node of a CTDFF.

## I. Introduction

D latches and flip-flops have become the most widely used circuits in modern CMOS chip design. This is due to the ability these circuits have to provide both data synchronization and storage. This chapter will discuss latch design and use along with the effects radiation can have on these circuits. Single event effects, or SEE, will be explained in relation to their impact on transient operation of bulk CMOS circuits, along with an explanation regarding an important parameter called linear energy transfer, or LET.

## A. Sequential Circuits

## 1) Latches

Latches are the basic building block for synchronous designs in CMOS VLSI. These circuits are controlled by the clock signal in a chip and can operate in two states, transparent and closed. When a latch is transparent, data passes through the circuit from the input to the output. Conversely, when the clock closes the latch, data is stopped at the input and the last value to pass freely through the latch is stored until the latch reopens. Latches can be designed to open for either clock $=1$ or clock $=0$ states and are referred to as transparent high or transparent low latches, respectively. Operation for a standard, transparent high, D latch is shown in Fig. I-1(c). Sections (a) and (b) of Fig. I-1 will be discussed below.

Fig. I-1. Timing diagrams for combinational logic (a), flip-flops (b), and latches (c). (After [1])

When the clock is high, the value at the input, D, is passed freely through the latch, and the storage node captures the input value when the clock goes low. At this point, any changes in D are not recognized by the latch output until the clock goes high again.
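The transparent and closed states described above can be sketched as a small behavioral model. This is only an illustrative sketch, not a circuit from this thesis: the class and function names are hypothetical, and the model ignores all timing and analog effects.

```python
class DLatch:
    """Behavioral model of a transparent-high D latch (no timing, no analog effects)."""

    def __init__(self, initial_q=0):
        self.q = initial_q  # value held on the storage node

    def update(self, clk, d):
        # Transparent phase: the input passes straight through to the output.
        if clk == 1:
            self.q = d
        # Closed phase: D is ignored and the last value passed is held.
        return self.q


def simulate_latch(clk_wave, d_wave, initial_q=0):
    """Apply paired clock/data samples and return Q after each sample."""
    latch = DLatch(initial_q)
    return [latch.update(clk, d) for clk, d in zip(clk_wave, d_wave)]


# Hypothetical stimulus: D toggles both while the latch is open and while it is closed.
clk = [1, 1, 0, 0, 1, 1, 0, 0]
d = [0, 1, 0, 0, 0, 1, 1, 0]
print(simulate_latch(clk, d))
```

In this stimulus, the changes in D that arrive while the clock is low do not reach Q until the clock goes high again.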

Fig. I-2 shows the evolution of synchronous timing circuits from the simplest, single transistor pass gates up to complex latch designs. Basic synchronous switches consist of pass gates and transmission gates, shown in Fig. I-2(a)(b).

Fig. I-2. Basic pass gate and latch design schematics. (After [1])

The pass gate implementation provides a compact and fast solution for synchronous timing, but these switches suffer from a couple of limitations. For example, in the design of pass gates, only one NMOS or PMOS transistor is used. This limits the output voltage range across the device and will not allow the output to easily swing rail to rail. Also, both the pass gate and transmission gate implementations are dynamic latches, i.e., the output Q floats when the switch is closed, thus exposing the state node to feedback noise and sub-threshold leakage corruption. The circuit in Fig. I-2(c) implements a simple solution to the noise corruption issue and protects the state node by adding a buffering inverter to the output. Conversely, Fig. I-2(d) buffers the input node but leaves the state node exposed. In both of these designs, the additional inverters create inverting latches that operate equivalently to a low logical effort tri-state inverter.

In order to rectify the leakage corruption issues that persist with floating storage nodes, current latch designs use bi-stable memory by adding feedback inverters, or tri-state inverters in the case of D-latches, to create static storage. The tri-state inverters are designed to pass logic when the latch is closed to prevent the feedback path from competing with the input pass-gate logic during the same clock phase. These circuits improve on an inverter/transmission gate implementation by maintaining a high drive strength that the transmission gates lack on their own. The two latches shown in Fig. I-2(e)(f) demonstrate this technique through inverting and non-inverting latch configurations. However, the storage node is still susceptible to possible noise feedback in both of these designs. Figs. I-2(g)(h) protect the storage node by driving the output inverter from node n1, mitigating any possible feedback corruption on the output node. Fig. I-3 shows how the latches shown in Fig. I-2(f)(g) will react to output noise. The node N1 is a path in close proximity to both Q outputs (Qopen for (f) and Qprotect for (g)).


Fig. I-3. Noise affecting a bi-stable memory element and being rejected by a protected storage node.

When N1 switches high, the nodes Qopen and Qprotect react due to capacitive coupling. Since this noise passes $\mathrm{V}_{\mathrm{DD}} / 2$, the bi-stable memory in the unprotected latch switches state while the protected latch returns to the proper value. The schematic shown in Fig. I-2(g) depicts the most commonly used D-latch due to its fast clock to Q value, which is derived from driving the transmission gate with an inverter and high drive strength from the unloaded output inverter.

In addition to the standard D latch, enables (such as set and reset) can be added to latch designs to further control the outputs of the latch. Set and reset control signals force the latch output to high and low logic levels, respectively. These enables can set latch values either synchronously or asynchronously depending on the configuration.

## 2) D Flip-Flops

Flip-flops, like latches, provide synchronous data transfer and storage. However, unlike latch elements, a flip-flop only copies the data from the input pin to the output once per clock period and does not allow multiple logic values to be passed in a clock cycle. Data is transferred at either the rising or the falling clock edge, depending on the flip-flop configuration. Basic operation of a rising edge triggered flip-flop is shown in Fig. I-1(b). The flip-flop only changes state by capturing D values at the two rising clock edges shown in the chart. This is compared to the combinational timing shown in Fig. I-1(a), where data can pass freely through the block regardless of clock phase.

In a master slave flip-flop, this behavior is produced by a circuit combining two latches in series with opposite clock polarities. For example, a transparent high master latch followed by a transparent low slave latch will create a falling edge triggered flip-flop. Examples of this using first a transmission gate and then a D-latch implementation are shown in Fig. I-4. Complementary clock signals are needed in all flip-flop designs to ensure that the master and slave latches are not transparent at the same moment, and they are usually generated locally within the cell. In the event that clock edges do not rise or fall quickly, flip-flops have the possibility of failing to regulate data flow if both latches are transparent at the same time.

Fig. I-4. Basic D flip-flop schematics using transmission gates (a) and bi-stable memory (b). (After [1])
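The master/slave arrangement described above can be sketched behaviorally by chaining two latches of opposite polarity. This is a minimal sketch with hypothetical class names. It assumes ideal, non-overlapping clock phases, so it cannot reproduce the failure that occurs when both latches are transparent at once.

```python
class Latch:
    """Level-sensitive latch that is transparent when clk equals transparent_level."""

    def __init__(self, transparent_level, q=0):
        self.transparent_level = transparent_level
        self.q = q

    def update(self, clk, d):
        if clk == self.transparent_level:
            self.q = d  # transparent: follow the input
        return self.q   # closed: hold the stored value


class MasterSlaveFlipFlop:
    """Transparent-high master followed by a transparent-low slave.

    The master captures D when the clock falls and the slave then passes it
    to Q, so the pair behaves as a falling edge triggered flip-flop.
    """

    def __init__(self):
        self.master = Latch(transparent_level=1)
        self.slave = Latch(transparent_level=0)

    def update(self, clk, d):
        master_q = self.master.update(clk, d)
        return self.slave.update(clk, master_q)


def simulate_flip_flop(clk_wave, d_wave):
    ff = MasterSlaveFlipFlop()
    return [ff.update(clk, d) for clk, d in zip(clk_wave, d_wave)]


# Hypothetical stimulus with two falling clock edges.
clk = [1, 1, 0, 0, 1, 1, 0, 0]
d = [1, 1, 0, 0, 0, 0, 1, 1]
print(simulate_flip_flop(clk, d))
```

Q only changes at the two falling edges. The transitions on D that occur while the clock is low are not seen at the output during that phase.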

## 3) Synchronous Timing

Three nodes, D, CLK, and Q, must be considered to properly characterize the timing of sequential circuits. Through analysis, three values are generated that define how quickly latches and flip-flops are able to properly operate: $\mathrm{t}_{\mathrm{SETUP}}$, $\mathrm{t}_{\mathrm{HOLD}}$, and $\mathrm{t}_{\mathrm{PCQ}}$. $\mathrm{t}_{\mathrm{SETUP}}$ and $\mathrm{t}_{\mathrm{HOLD}}$ refer to the time a logic value must be stable at D before and after a clock edge, respectively. $\mathrm{t}_{\mathrm{PCQ}}$ describes the amount of time data takes to propagate through the slave latch before Q stabilizes after an activating clock edge. The value $\mathrm{t}_{\mathrm{PDQ}}$ is specific only to latches and describes the time required for a change in data to propagate from D to Q when a latch is transparent. Fig. I-1(b) shows a visual, transient representation of these values along with the contamination delay, $\mathrm{t}_{\mathrm{CCQ}}$, which will be described later.

While propagation times can easily be measured by asserting a clock edge or proper clock state and measuring the temporal difference between changes, $\mathrm{t}_{\mathrm{SETUP}}$ and $\mathrm{t}_{\mathrm{HOLD}}$ require a bit more analysis. A sequential element will retain a proper logic state in the event that data arrives preceding a clock edge by a sufficient amount of time. However, as data arrives closer and closer to the clock edge, the clock to Q delay will increase towards infinity. Let's define $\mathrm{t}_{\mathrm{CQ}}$ as the measured clock to Q time and $\mathrm{t}_{\mathrm{DC}}$ as the actual difference between the data change and the clock edge. We can then define $\mathrm{t}_{\mathrm{SETUP}}$ as the smallest $\mathrm{t}_{\mathrm{DC}}$ value where $\mathrm{t}_{\mathrm{CQ}} \leq \mathrm{t}_{\mathrm{PCQ}}$, i.e., the smallest time by which a data change can precede the clock edge such that the new data will still be properly stored after the latch closes. Similarly, changing data before $\mathrm{t}_{\mathrm{HOLD}}$ has elapsed will also increase $\mathrm{t}_{\mathrm{CQ}}$. This leads to the inequality expressing a worst case $\mathrm{t}_{\mathrm{HOLD}}$ as the highest $\mathrm{t}_{\mathrm{DC}}$ value where $\mathrm{t}_{\mathrm{CQ}} \leq \mathrm{t}_{\mathrm{PCQ}}$. Setup and hold times also vary depending on whether the input is switching low to high or high to low, because of the relative PMOS and NMOS sizing in CMOS logic. Sample setup and hold time analysis curves on a $\mathrm{t}_{\mathrm{CQ}}$ vs. $\mathrm{t}_{\mathrm{DC}}$ plot are shown below.



Fig. I-5. Setup and hold time curves for a latch. (After [1])

For hold times, the 0 and 1 subscripts refer to whether D is rising or falling, respectively. With setup times, the same nomenclature refers to the rising and falling of Q. This occurs in the event that D switches much earlier than the clock edge. Setup and hold times vary when measured with respect to rising and falling clock edges. This is due to NMOS and PMOS transistor sizing.
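The setup time definition given earlier (the smallest data-to-clock separation for which the measured clock to Q time stays within the allowed clock to Q limit) can be sketched as a search over simulated sweep points. The function name and the sweep data are hypothetical and are not taken from the simulations in this thesis. A failed capture is represented here by `None`.

```python
def setup_time_from_sweep(sweep, t_pcq_limit):
    """Return the smallest t_dc whose measured t_cq satisfies t_cq <= t_pcq_limit.

    sweep: iterable of (t_dc, t_cq) pairs in the same time unit, where t_cq is
           None when the new data was not captured at all.
    """
    passing = [t_dc for t_dc, t_cq in sweep
               if t_cq is not None and t_cq <= t_pcq_limit]
    if not passing:
        raise ValueError("no sweep point meets the clock to Q limit")
    return min(passing)


# Hypothetical sweep in picoseconds: clock to Q grows as data nears the clock edge.
sweep_ps = [(200, 100), (150, 100), (100, 102), (80, 108), (60, 125), (40, None)]
print(setup_time_from_sweep(sweep_ps, t_pcq_limit=110))  # prints 80
```

This sketch assumes the sweep is dense enough and roughly monotonic near the clock edge. A real characterization flow would typically refine the search between the last passing and first failing points.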

Inspecting Fig. I-5 introduces a new value called aperture width, or $\mathrm{t}_{\mathrm{a}}$. Aperture width refers to a $\mathrm{t}_{\mathrm{DC}}$ range spanning across the clock edge, during which the flip-flop will not produce correct outputs should the input state transition within this window. This value differs for rising and falling inputs and can be calculated by the equations

$$
\begin{align*}
& \mathrm{t}_{\mathrm{ar}}=\mathrm{t}_{\mathrm{SETUP1}}+\mathrm{t}_{\mathrm{HOLD0}} \tag{I-1}\\
& \mathrm{t}_{\mathrm{af}}=\mathrm{t}_{\mathrm{SETUP0}}+\mathrm{t}_{\mathrm{HOLD1}} \tag{I-2}
\end{align*}
$$

where r and f designate rising and falling inputs, respectively. Data transitions that occur within the aperture width will result in the storage cell becoming metastable, or in an indeterminate state, and will not settle until the node discharges due to leakage current or the next data transition meets required timing conditions. Similarly to the setup and hold times, this value will vary for rising and falling clock edges.

These timing constraints become important when designing sequential circuits to work with combinational logic. The minimum available clock period, or $\mathrm{T}_{\mathrm{C}}$, must be defined by adding the overhead of the sequential circuits, $\mathrm{t}_{\mathrm{SETUP}}+\mathrm{t}_{\mathrm{PCQ}}$, and any delays from combinational logic, $t_{\text {PD }}$, providing the equation

$$
\begin{equation*}
\mathrm{T}_{\mathrm{C}} \geq \mathrm{t}_{\mathrm{SETUP}}+\mathrm{t}_{\mathrm{PCQ}}+\mathrm{t}_{\mathrm{PD}} . \tag{I-3}
\end{equation*}
$$

This allows data to enter the combinational logic after $\mathrm{t}_{\mathrm{PCQ}}$ and then have ample time to pass through combinational logic and reach the second flip-flop before the setup time is reached. Violating the equation shown above will result in a setup time failure. This particular failure type can be rectified by decreasing the clock speed, thus allowing more time for data to propagate through combinational logic. Additionally, the minimum allowed pulse width is set by the sum of the setup and hold times, or

$$
\begin{equation*}
\mathrm{t}_{\mathrm{PW}}=\mathrm{t}_{\mathrm{SETUP}}+\mathrm{t}_{\mathrm{HOLD}} . \tag{I-4}
\end{equation*}
$$

Inversely, a hold time failure, or race condition, occurs when combinational logic does not provide sufficient delay between two flip-flops. In the situation where a flip-flop has a large hold time, a possibility exists that after a triggering clock edge, data can quickly be passed from one flip-flop to the next before the hold time expires thus corrupting the captured state of a following flip-flop or latch. This error relies on timing called contamination delay which describes the time it takes for an element to begin changing state once activated, by either a clock edge or transparent state, and has the variables $\mathrm{t}_{\mathrm{CD}}$ for logic contamination delay and $\mathrm{t}_{\mathrm{CCQ}}$ for flip-flop/latch clock to Q contamination. The lower limit of $\mathrm{t}_{\mathrm{CD}}$ is shown by the equation

$$
\begin{equation*}
\mathrm{t}_{\mathrm{CD}} \geq \mathrm{t}_{\mathrm{HOLD}}-\mathrm{t}_{\mathrm{CCQ}} \tag{I-5}
\end{equation*}
$$

From this equation, it can be seen that if $\mathrm{t}_{\mathrm{CCQ}}$ is greater than $\mathrm{t}_{\mathrm{HOLD}}$, no race conditions will occur. $\mathrm{t}_{\mathrm{HOLD}}$ will be negative for many cases, allowing for the condition shown above to always be met. This type of error cannot be rectified by slowing clock speed and must be addressed by redefining logic either within the flip-flop or between the two sequential elements. Simply put, sufficient use of buffers will increase contamination delays until hold time failures are corrected [1].
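The setup constraint (I-3) and the hold constraint (I-5) can be checked together for a single register-to-register path. The sketch below is illustrative only: the function names are hypothetical and the delay values are made-up numbers in picoseconds, not results from this thesis.

```python
def min_clock_period(t_setup, t_pcq, t_pd):
    """Smallest clock period allowed by Eq. (I-3): T_C >= t_SETUP + t_PCQ + t_PD."""
    return t_setup + t_pcq + t_pd


def check_path(t_clk, t_setup, t_hold, t_pcq, t_ccq, t_pd, t_cd):
    """Return (setup_ok, hold_ok) for one flip-flop to flip-flop path."""
    setup_ok = t_clk >= t_setup + t_pcq + t_pd  # Eq. (I-3)
    hold_ok = t_cd >= t_hold - t_ccq            # Eq. (I-5)
    return setup_ok, hold_ok


# Hypothetical path delays in picoseconds.
path = dict(t_setup=60, t_pcq=90, t_ccq=40, t_pd=800, t_cd=10)

print(min_clock_period(path["t_setup"], path["t_pcq"], path["t_pd"]))  # prints 950
print(check_path(t_clk=1000, t_hold=20, **path))  # both constraints met
print(check_path(t_clk=900, t_hold=20, **path))   # setup failure: clock too fast
print(check_path(t_clk=5000, t_hold=80, **path))  # hold failure despite a slow clock
```

The last case mirrors the point made above: the hold check does not involve the clock period, so slowing the clock cannot repair it. Only added contamination delay or a smaller hold time can.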

## B. Space Radiation

Radiation is defined as "the process in which energy is emitted as particles or waves." [3] When an energized particle strikes circuitry, any reaction within the circuit that is caused by the strike is referred to as a single event effect, or SEE. These effects are classified as either soft errors, where circuitry has the ability to continue with proper operation after a period of time, or hard errors, where there is permanent damage or a circuit must be powered down to be corrected. There are three main SEE sources due to space radiation: cosmic rays, gamma rays, and solar flares. In addition to these, plasma has the potential to affect integrated circuits, but due to the lower energy ranges of this source compared to the first three, plasma is not considered a high risk. Radiation causes soft errors in circuits through strikes by ionized incident particles on a sensitive node within a circuit [4].

The effects cosmic rays have on integrated circuits have been observed in space and aircraft electronics, and cosmic rays are considered the most important form of deep space radiation for circuits designed for high orbit applications. These particles are both very high energy and very highly ionized. The primary sources of cosmic rays are deep space novas and solar wind. As seen in Fig. I-6, heavy nuclei with atomic numbers less than 25 are important in relation to an SEE type called a single event upset, or SEU, due to their relatively high abundance.


Fig. I-6. Ionized particle flux by atomic number relative to Si. (After [4])

Elements with atomic numbers greater than 25 are not able to persist in space environments, unlike smaller elements, and dissipate before they reach Earth. The four most important elements are hydrogen, helium, carbon, and oxygen, with hydrogen and helium making up $94 \%$ and $5 \%$ of the total high-energy heavy ions found in space, respectively.

The Earth is protected from cosmic rays by a region in the Earth's magnetic field called the magnetosphere, which lies about 10 Earth radii from the Earth's center towards the sun side of the planet. The shape of the magnetosphere is defined by solar wind, or plasma moving in the Earth's magnetic field and the interplanetary magnetic field.

Within the magnetosphere shape, two belts of high SEU danger are formed at the Earth's atmosphere edge and extend 40,000 miles into space. These regions were originally found by J. Van Allen and consequently named Van Allen Belts. The inner and outer belts consist of high energy protons and electrons, respectively, from trapped cosmic rays and solar wind. The belts' particle flux is depicted in Fig. I-7. Stronger magnetic fields closer to the Earth trap charged particles within the inner Van Allen belt for longer durations than in the outer belt [4].


Fig. I-7. Van Allen belt equatorial trapped particle flux vs. altitude. (After [4])

Gamma rays have the shortest wavelength and the highest energy of any wave on the electromagnetic spectrum. Because of this, there is the possibility that electrons are ejected from gamma ray reactions with alpha, proton, and neutron particles in substrates, causing SEUs in integrated circuits. These rays are found in radiation bursts that can last anywhere from seconds to minutes, and they either originate in interstellar space or are given off by some radioactive substances [4].

Solar flares cause radiation in the form of solar particle events (SPEs) that eject electrons, alpha particles and heavier particles into space. While the particles have the ability to pierce the Earth's polar regions to low altitudes, there is a small probability that a significant number will be injected into the magnetosphere. Most solar flares do not pose a threat to spacecraft circuitry because of this. The X-rays that are released by solar flares do not pose a threat to spacecraft circuitry due to their relatively low flux levels [4].

## C. Radiation Effects

Single event effects, or SEE, are caused by ionized particles striking a circuit. When an incident particle passes through a circuit substrate, there is a charge generated due to the holes and electrons that drift onto the node, as shown in Fig. I-8. SEEs occur when this parasitic charge exceeds the node's critical charge threshold $\left(\mathrm{Q}_{\mathrm{CRIT}}=\mathrm{C}_{\mathrm{NODE}} \cdot \mathrm{V}_{\mathrm{NODE}}\right)$. This thesis will address two types of soft error SEEs, single event upsets (SEU) and single event transients (SET). SEUs are caused by direct ion strikes inside a latch storage element, while SETs are caused by transient, temporary voltage shifts from preceding logic. Unlike soft errors, hard errors (single event latchup, single event burnout, and single event gate rupture) created by SEEs can cause unrecoverable failures in CMOS circuitry. While the circuits presented in this thesis will not address solutions to mitigating these effects, they will be briefly discussed at the end of this section.


Fig. I-8. Charge collection on a node due to an incident ionized particle. (After [5])
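The critical charge relation $\mathrm{Q}_{\mathrm{CRIT}}=\mathrm{C}_{\mathrm{NODE}} \cdot \mathrm{V}_{\mathrm{NODE}}$ can be illustrated with a short calculation. The node capacitance, node voltage, and collected charge values below are hypothetical round numbers chosen only for illustration. The function names are likewise assumptions.

```python
def critical_charge(c_node_farads, v_node_volts):
    """Q_CRIT = C_NODE * V_NODE, in coulombs."""
    return c_node_farads * v_node_volts


def strike_exceeds_qcrit(q_collected_coulombs, c_node_farads, v_node_volts):
    """True when the collected charge exceeds the node's critical charge."""
    return q_collected_coulombs > critical_charge(c_node_farads, v_node_volts)


# Hypothetical node: 2 fF of capacitance held at 1.2 V.
c_node = 2e-15
v_node = 1.2
q_crit = critical_charge(c_node, v_node)
print(f"Q_CRIT = {q_crit * 1e15:.2f} fC")

# Hypothetical strikes collecting 1 fC and 10 fC on that node.
print(strike_exceeds_qcrit(1e-15, c_node, v_node))
print(strike_exceeds_qcrit(10e-15, c_node, v_node))
```

Under these assumed values the smaller strike stays below the threshold while the larger one exceeds it.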

## 1) Single Event Upsets

When ions strike an integrated circuit, charge is deposited as the particle travels through the substrate. The particle's stopping power is measured in energy loss per unit path length, or linear energy transfer (LET), with units of $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$, and plays a significant role in determining the ionization energy deposited along the incident ionizing particle track. It is possible for high and low energy particles to have the same LET value [4].
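To give a feel for the LET unit, the sketch below converts an LET value into charge deposited per micron of track length in silicon. The material constants are commonly quoted textbook values and are assumptions here, not figures from this thesis: a silicon density of about 2.33 g/cm³ and about 3.6 eV per electron-hole pair. The function name is hypothetical.

```python
SI_DENSITY_MG_PER_CM3 = 2330.0  # assumed silicon density (2.33 g/cm^3)
EV_PER_EH_PAIR = 3.6            # assumed energy to create one electron-hole pair in Si
ELECTRON_CHARGE_C = 1.602e-19   # elementary charge in coulombs


def charge_per_micron_fc(let_mev_cm2_per_mg):
    """Charge deposited per micron of track in silicon, in femtocoulombs."""
    mev_per_cm = let_mev_cm2_per_mg * SI_DENSITY_MG_PER_CM3
    ev_per_um = mev_per_cm * 1e6 / 1e4  # MeV -> eV, then per cm -> per um
    pairs_per_um = ev_per_um / EV_PER_EH_PAIR
    return pairs_per_um * ELECTRON_CHARGE_C * 1e15


print(f"{charge_per_micron_fc(1.0):.2f} fC/um at LET = 1")
print(f"{charge_per_micron_fc(97.0) / 1000:.2f} pC/um at LET = 97")
```

With these constants, an LET of 1 MeV·cm²/mg corresponds to roughly 10 fC per micron, and an LET near 97 MeV·cm²/mg to roughly 1 pC per micron. How much of that charge a node actually collects depends on the collection depth and geometry, which this sketch does not model.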

SEUs are caused by particles striking an integrated circuit if the charge collected in the substrate during the strike exceeds the critical charge threshold of nodes electrically connected to the incident area. Any voltage shifts may potentially be restored to their original value by circuitry driving incident nodes. However in some situations, most notably storage nodes, the charge will not be absorbed and an upset will occur.

As mentioned earlier, collected charge affects nodes at or near the strike, creating SEUs in storage cells in memories or latches. Fig. I-9 shows the effect of an ion strike within a basic latch storage node, n2, such as the one in Fig. I-2(g).


Fig. I-9. Simulated SEU on the storage node of a basic transparent high latch.

When the storage node n2 collects negative charge at 11 ns, the node voltage shifts down to a logic level of 0. Since the clock is low and the latch is closed, this fault causes the output of the second inverter, which is driven by n2, to switch state before the collected charge is removed from the incident node. The inverter change drives n1 high, thus permanently flipping the value captured by the storage cell and disrupting the operation of any circuit subsequent to the faulted latch. Since ion strikes that cause this error type can occur at any point in time, SEUs are independent of clock speed. Also, the methods through which SEUs affect latches apply to memories, since the storage mechanisms in sequential circuits and many memories are identical at a schematic level.

## 2) Single Event Transients

Single event transients, or SETs, are a type of SEE that is gaining importance as feature size decreases, due to both their causes and the methods used to repair their effects. SETs stem from ionizing particles striking combinational logic and activating devices that are in an off state, which produces a voltage pulse on the affected node. The node returns to its proper state once the circuitry driving it removes any collected charge. Fig. I-10 shows this pulse by separating charge collection and diffusion sections with a reference time scale. The recovery depends on the node capacitance, since charge is the capacitance times the voltage, and on the current drive strength of the preceding circuitry. Because of this relationship, as drive strength and node capacitance decrease with new fabrication processes, collected charge is becoming more and more dangerous to proper circuit operation. Generated voltage pulses propagate through logic until they either reach a closed latch, preventing any further transmission, or dissipate due to attenuation, which is explained below.


Fig. I-10. SET pulse on a node due to an ionized particle strike. (After [6])
SET pulse width, or $\mathrm{t}_{\mathrm{SET}}$, depends on the drive strength of the incident node, the capacitance of the affected node, and the amount of charge collected. The last of those three is related to the LET of the impinging ionizing particles. As mentioned earlier, drive strength directly affects the time it takes for collected charge to be removed from a node: the weaker the drive, the longer the removal takes. Since drive strength is fixed for a given circuit, higher LETs will create a longer $\mathrm{t}_{\mathrm{SET}}$. $\mathrm{t}_{\mathrm{SET}}$ has been shown to increase with decreasing process sizes [7][8], thus increasing the importance of SETs in ICs as technology processes progress.

Attenuation also plays a significant role in SET propagation. While CMOS technology in combinational logic has the potential to decrease pulse widths to an inconsequential level, certain circuitry, such as transmission gates, will increase $\mathrm{t}_{\mathrm{SET}}$ because of the lower drive strength [9]. Pulses that are shorter than the clock pulse width and propagate through sufficiently long chains of combinational logic will decrease in width after each subsequent gate. The opposite is true for pass and transmission gates. Since the drive strengths of this type of logic are significantly lower than those seen in CMOS designs, charge pulses created on nodes driven by transmission gates will take longer to mitigate, thus increasing $\mathrm{t}_{\mathrm{SET}}$ for the preceding combinational logic. Longer $\mathrm{t}_{\mathrm{SET}}$ values will render certain hardening techniques, such as the temporal hardening to be described in Chapter 2, useless. This issue can be bypassed by replacing transmission gates with tristate inverters to maintain strong drive strengths throughout the circuit.

SETs permanently impact circuitry when they reach a storage node around a closing clock edge. If an SET spans the latch setup and hold time at a clock edge, the incorrect value will be captured, creating an upset. Because of this, both longer pulse widths and higher clock speeds increase the probability of SETs causing upsets in sequential logic. An example of SET capture at a high clock speed is shown in Fig. I-11, where an SET occurs in logic preceding a transparent high latch with a schematic similar to that shown in Fig. I-2(g).


Fig. I-11. Simulated SET on the input node, D, of a basic transparent high latch.
The transient plot shows the SET reaching the latch input pin, D, at 9.6 ns and persisting across the falling clock edge. Since the storage node closes while capturing the incorrect SET value, the latch drives the output at an incorrect value for half a clock cycle before the rising clock edge re-opens the latch and the next value is passed to the storage node.
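The dependence of capture probability on pulse width and clock speed can be illustrated with a first-order window model. The sketch below is an assumption made for illustration and is not taken from the thesis: it treats the SET arrival time as uniformly distributed over the clock period and counts a capture only when the pulse covers the entire setup-plus-hold window. The function name, the 50 ps window, and the clock periods are hypothetical placeholders.

```python
def capture_probability(t_set, t_window, t_clk):
    """Probability that a randomly timed SET covers the whole capture window.

    t_set    -- SET pulse width in seconds
    t_window -- setup-plus-hold window around the closing clock edge (assumed)
    t_clk    -- clock period in seconds
    """
    # The pulse must start before the window opens and end after it closes,
    # which leaves (t_set - t_window) of usable arrival times per clock cycle.
    usable = max(0.0, t_set - t_window)
    return min(1.0, usable / t_clk)


# Hypothetical numbers: 400 ps SET, 50 ps setup-plus-hold window.
p_100mhz = capture_probability(400e-12, 50e-12, 10e-9)
p_1ghz = capture_probability(400e-12, 50e-12, 1e-9)
print(f"100 MHz: {p_100mhz:.3f}, 1 GHz: {p_1ghz:.3f}")
```

Under this model a tenfold increase in clock frequency gives a tenfold increase in capture probability, and a pulse narrower than the window can never be captured.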

## D. Hard Errors

As with SEUs and SETs, hard errors originate from a single ion strike and subsequent charge collection at various locations in a circuit. Single event latchup (SEL) is caused when an incident ion turns on the cross coupled, parasitic bipolar transistors that are inherent in any CMOS configuration due to the PNPN and NPNP setup. While these parasitic BJTs are in a high impedance mode when the circuit is in a normal operating mode, once the parasitics are turned on, high current flow can thermally destroy the transistors unless power is removed from the device [10]. The final two SEEs discussed, single event burnout (SEB) and single event gate rupture (SEGR), are usually found in power devices, since both require high current levels passing through the devices, but have also been seen in CMOS designs. SEB occurs when a heavy ion causes a FET to enter second breakdown and, as with SEL, the device can be thermally crippled if the event is not quickly stopped. SEGR is often seen simultaneously with SEB and also results in transistor failure. This event occurs when conduction between the gate and channel regions causes the insulating gate dielectric to fail [11].

## E. Conclusions

In this chapter, the basics of CMOS latch and flip-flop implementations were explained, along with an introduction to space radiation and the effects that ionizing particles have on integrated circuits. In the remainder of this thesis, techniques to mitigate these SEE radiation effects and implementations of these techniques will be described. Chapter 2 will review multiple radiation mitigation techniques, describing viable applications for each one. Chapter 4 depicts the usage of two of these methods in an innovative flip-flop design. Finally, this flip-flop will be compared to an unhardened D flip-flop and two other hardened flip-flop designs for a visualization of how radiation hardening affects circuit operation, size, and power consumption.

## II. Radiation Hardening Techniques

## A. Introduction

Radiation hardening techniques used in chip design fall into two main categories: process hardening and radiation hardening by design (RHBD). Process hardening techniques allow for a more compact design when compared to RHBD on equivalent process sizes. However, current hardened processes are substantially larger than current industry standard processes, while RHBD techniques allow current designs to scale with future process sizes. For this reason, RHBD implementations are necessary for modern processes to be utilized in hardened circuits.

All hardening techniques have their individual pros and cons and should be selected depending on the application. This chapter will review various hardening techniques, focusing specifically on two techniques called temporal hardening and node interlocking.

## B. Process Hardening Techniques

## 1) RC Hardening

Process hardening is proving to be the most effective method of minimizing certain single event effects. However, the technologies that utilize these methods are still large, power hungry, and slow compared to the current industry standards for circuit design.

Fig. II-1 shows resistance between nodes in a bi-stable memory cell and illustrates the gate capacitances provided by the transistors as independent capacitors.


Fig. II-1. Schematic showing a storage node with resistances separating driving devices and capacitances attached to the transistor gates. (After [13])

In this figure, the resistances are thin film resistors that have minimal area impact in the cell layout. This keeps the diffusion area low to minimize the locations where charge collection can occur. This method's validity has been shown through the analysis made by Hoang on this cell type [13].

In addition to the gate capacitances, metal-insulator-metal capacitors (MIMCAPs) can be integrated into designs to create better RC decoupling (for the minimization of current spikes [1]) and to increase a circuit's resistance to SEUs. These MIMCAPs are becoming more influential as the latest technologies decrease the node capacitances. Similar to other RC circuits, this hardening setup creates a low pass filter that nullifies high frequency pulses, such as those seen during SETs. This enables RC hardening in the technology and eliminates any sensitivity to low LET levels [13].
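The low pass behavior can be sketched with a single-pole RC model. This is a simplified illustration under stated assumptions, not the analysis from [13]: the SET is treated as an ideal rectangular pulse, the filtered node is considered upset only if it crosses a threshold of half the supply, and the 50 kΩ and 20 fF values are hypothetical.

```python
import math


def rc_peak_voltage(v_pulse, t_pulse, r_ohm, c_farad):
    """Peak voltage reached at the output of a single-pole RC filter
    driven by a rectangular pulse of height v_pulse and width t_pulse."""
    tau = r_ohm * c_farad
    return v_pulse * (1.0 - math.exp(-t_pulse / tau))


def is_filtered(v_pulse, t_pulse, r_ohm, c_farad, v_threshold):
    """True when the filtered pulse never reaches the switching threshold."""
    return rc_peak_voltage(v_pulse, t_pulse, r_ohm, c_farad) < v_threshold


# Hypothetical values: 1.2 V supply, 50 kOhm and 20 fF (tau = 1 ns).
VDD, R, C = 1.2, 50e3, 20e-15
for width in (400e-12, 2e-9):
    peak = rc_peak_voltage(VDD, width, R, C)
    print(f"{width * 1e12:.0f} ps pulse -> peak {peak:.2f} V, "
          f"filtered: {is_filtered(VDD, width, R, C, VDD / 2)}")
```

The same time constant that blocks a short transient also slows legitimate signal edges, which is why this form of hardening trades away bandwidth.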

## 2) Magnetic Tunnel Junction Hardening

Similar to RC hardening, the more complex magnetic tunnel junction (MTJ) hardening spaces critical nodes through micro-scale discrete components. The junction is created by separating two ferromagnetic metals with a dielectric layer [14]. In this configuration, the insulating dielectric is so thin that electrons and holes are able to create a tunneling current between the two metals. The current direction depends on the magnetization orientation in the metals and can be modified by applying a magnetic field, creating a tunneling magneto-resistance across the junction [15].

When applying this technology to radiation hardening, the tunneling current quickly removes any collected holes or electrons due to ion strikes from nodes, allowing for immediate SET and SEU mitigation. An example magnetic hardened latch schematic and the particle strike simulation corresponding to the schematic are shown below.


Fig. II-2. Schematic for a dual-MTJ latch cell. (After [16])


Fig. II-3. Simulated SET on dual-MTJ bi-stable memory cell. (After [16])
In Fig. II-2, the varistor-like symbols, labeled MTJ1 and MTJ2, represent the two MTJs needed to harden the latch. However, if the dielectric layer is damaged by an incident particle, the magneto-resistance value drops and any hardening benefit provided by the MTJs is nullified [16]. Also, like the resistors used in RC hardening, MTJs create a low pass filter that limits magnetic hardening use to low bandwidth applications.

## C. RHBD Techniques

## 1) Redundant Latches

Triple modular redundant, or TMR, latches and flip-flops mitigate SETs and SEUs through spatial hardening: multiple critical nodes are created and physically separated in layout. This requires the desired circuitry to be placed in triplicate and the sequential logic outputs to be voted on by circuitry such as a majority voter. In this system, if an ion strike affects one of the three circuits, the proper values from the other two circuits will remove the incorrect logic level through the use of a majority voter, shown in Fig. II-4.


Fig. II-4. Majority voter schematic with inputs A, B and C, along with output pin Y.
The majority voter is a CMOS gate that compares three input logic values and resolves them to the value that two or more inputs agree on. The truth table for this element is shown in Table I. As the table shows, the gate passes the inverse of whichever logic value holds the majority of the inputs, thus providing hysteresis for any circuits preceding the gate.

TABLE I
Majority Voter Truth Table

| A | B | C | Y |
| :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 0 |
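As a quick check of Table I, the inverting majority function can be written out and enumerated. This is a behavioral sketch only; the function name is illustrative and nothing here models the transistor-level gate.

```python
from itertools import product


def inverting_majority(a, b, c):
    """Return the inverse of the value held by at least two of the inputs."""
    majority = (a & b) | (a & c) | (b & c)
    return 1 - majority


# Enumerate all eight input combinations in the same order as Table I.
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, inverting_majority(a, b, c))
```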

There are significant size and power penalties when utilizing TMR techniques. The obvious increases from an unhardened flip-flop are seen with the three parallel unhardened circuits that are required to properly function in a hardened state, plus three additional majority voters, one for each output. A block schematic displaying a triple redundant flip-flop implementation is shown below.


Fig. II-5. Block Diagram showing TMR setup.
Every input signal is triplicated before the unhardened flip-flops to keep a single SET from reaching multiple D inputs. Note that the same clock signal drives all three latches. This implies that the design is not hardened against clock SETs, since a glitch on the clock can cause all three flip-flops to pass a logic value prematurely, causing an upset. TMR designs can rectify this fault by generating three separate clocks, one for each logic copy.
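The difference between a shared clock and three separate clocks can be shown with a small behavioral model. The sketch assumes a plain, non-inverting two-of-three vote and represents a clock glitch as a spurious capture event; the helper names are hypothetical.

```python
def majority(a, b, c):
    """Plain (non-inverting) two-of-three vote."""
    return (a & b) | (a & c) | (b & c)


def capture(copies, d_inputs, clock_events):
    """Each copy takes its D input only if its own clock sees an edge."""
    return [d if clk else q for q, d, clk in zip(copies, d_inputs, clock_events)]


stored = [1, 1, 1]        # three copies holding the correct value
next_data = [0, 0, 0]     # data that should not be sampled yet

# Shared clock: one glitch reaches all three copies at once.
shared = majority(*capture(stored, next_data, [1, 1, 1]))

# Separate clocks: the same glitch reaches only one copy.
separate = majority(*capture(stored, next_data, [1, 0, 0]))

print("shared clock output:", shared)
print("separate clocks output:", separate)
```

With a shared clock the voter sees three identical premature values and has nothing to correct, while with separate clocks the single disturbed copy is outvoted.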

While dual modular redundancy has been tested as a lower power and compact size alternative to TMR, additional techniques, such as temporal hardening or interlocking, are required to make dual redundancy effective [17]. Because of this, dual redundancy is primarily used as an error detection method while correction circuits are designed in TMR.

## 2) Temporal Hardening

Temporal hardening is also an extremely popular RHBD approach. Instead of physically separating critical nodes and creating multiple circuits, as done in TMR, this technique creates temporal redundancy by separating nodes through the use of delay elements [18]. Delay elements provide a $\mathrm{t}_{\delta}$ propagation delay from the input, A, to the output, Y. Temporal redundancy can be used in both dual and triple redundant forms. A triple temporally redundant latch schematic is shown below in Fig. II-6 and the flip-flop described as the CTDFF in Chapter 5 shows a dual temporally redundant implementation.


Fig. II-6. Temporal latch depicting temporal redundancy with majority voters. Delay elements are marked by $\delta$. (After [19])

The bi-stable memory cell consists of an inverter and a feedback majority voter whose inputs are temporally separated by delays of $0$, $\mathrm{t}_{\delta}$, and $2\mathrm{t}_{\delta}$. The $\mathrm{t}_{\delta}$ value is chosen to exceed the maximum SET duration that the circuit is expected to encounter. This ensures that any pulse shorter than $\mathrm{t}_{\delta}$ seen by the nodes Mb, MDb, and MDDb will not be present on more than one majority voter input at the same moment and, consequently, the latch will mitigate SETs shorter than that length.

Fig. II-7 depicts this technique's SET mitigation process. In this simulation, an SET occurs at the input node, D, while the latch is transparent. The nodes MD, MDb and MDDb pulse for a 400 ps duration after $0$, $\mathrm{t}_{\delta}$, and $2\mathrm{t}_{\delta}$, respectively. At no point does the SET value occur on two voter input nodes, so the latch output, Q, and the storage cell never change state.


Fig. II-7. Simulated SET on D mitigated by a triple redundant temporal latch.
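The filtering condition can be reproduced with a discrete-time behavioral sketch. It is an idealized illustration rather than the circuit simulation of Fig. II-7: delays are exact, the vote is a plain two-of-three majority (the inversion in Table I does not change the filtering argument), times are integers in picoseconds, and the 500 ps value for $\mathrm{t}_{\delta}$ is a hypothetical choice that exceeds the 400 ps pulse.

```python
def pulse(t, start, width):
    """1 while a transient that begins at `start` and lasts `width` is present."""
    return 1 if start <= t < start + width else 0


def voted_output(t, start, width, t_delta):
    """Two-of-three vote over taps delayed by 0, t_delta and 2 * t_delta."""
    taps = [pulse(t - k * t_delta, start, width) for k in range(3)]
    return 1 if sum(taps) >= 2 else 0


def output_glitches(width, t_delta, start=1000, step=10, span=5000):
    """True if the voted output ever leaves its quiescent value of 0."""
    return any(voted_output(t, start, width, t_delta)
               for t in range(0, span, step))


print("400 ps pulse, 500 ps delay:", output_glitches(400, 500))
print("600 ps pulse, 500 ps delay:", output_glitches(600, 500))
```

A pulse shorter than the delay never appears on two taps at once, so the vote is undisturbed; a pulse longer than the delay overlaps on adjacent taps and passes through.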
To improve on both size and speed, the majority voters can be replaced by Muller C-elements. The C-element is also a hysteresis device and will be more thoroughly described at a schematic level in Chapter 4. This two input device has the same function as the majority voter, in that the inputs must agree in order for the element to change state. This allows the C-element to provide hysteresis when the input nodes are temporally separated [20]. The state table for a C-element is shown in Table II, where the output state "X" denotes an instance when the C-element is tri-stated and the output is floating in the previous logic state.

TABLE II
State Table For Muller C-Elements

| A | B | Y |
| :---: | :---: | :---: |
| 0 | 0 | 1 |
| 0 | 1 | X |
| 1 | 0 | X |
| 1 | 1 | 0 |
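Table II can be expressed as a small state function in which the "X" rows return the previously held output. This is a behavioral sketch with an illustrative function name; the floating output is modeled as ideal charge retention.

```python
def c_element(a, b, previous_y):
    """Inverting Muller C-element following Table II.

    When the inputs agree, the output is driven to their inverse.
    When they disagree, the gate is tri-stated and holds previous_y.
    """
    if a == b:
        return 1 - a
    return previous_y


# Walk through an input sequence and show the held state on disagreement.
y = 1
for a, b in [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]:
    y = c_element(a, b, y)
    print(a, b, y)
```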

Only one $\mathrm{t}_{\delta}$ delay element is required to temporally separate the gate inputs since there are two inputs on the C-element. This allows a size decrease of two delay elements and eight transistors when using a C-element configuration in the same storage cell described with the majority voter. The temporal flip-flop design proposed in this thesis will use a C-element configuration for the storage nodes in both the master and slave latches. Driving two C-element inputs with two other C-elements has also been shown to provide an effective method to incorporate redundancy into temporal designs [21].

In most cases, temporal designs are only hardened to single SETs or SEUs. If multiple pulses simultaneously affect critical storage nodes, such as the two C-element inputs, upsets can occur. Additionally, in the event that multiple transient pulses combine to create a pulse with duration exceeding $\mathrm{t}_{\delta}$, the input nodes for the C-element will capture an incorrect value and the memory cell will switch states, causing this hardening technique to fail [22]. Also, a fault will occur if two pulses reach the C-element/delay element combination with a $\mathrm{t}_{\delta}$ separation. This error is shown via a transient simulation in Chapter 4.
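Both failure cases can be illustrated with a behavioral sketch of a C-element whose second input is a delayed copy of the first. The model is an idealization and not the Chapter 4 simulation: it is open loop, uses exact delays and integer picosecond time steps, and the pulse timings and the 500 ps $\mathrm{t}_{\delta}$ are hypothetical. In a real storage cell the feedback would latch the disturbed value.

```python
def c_element(a, b, previous_y):
    """Inverting C-element: drive on agreement, hold on disagreement."""
    return 1 - a if a == b else previous_y


def data(t, pulses):
    """1 while any (start, width) transient in `pulses` is present at time t."""
    return 1 if any(s <= t < s + w for s, w in pulses) else 0


def output_disturbed(pulses, t_delta, step=10, span=5000):
    """True if the output ever leaves 1, its correct value for a quiet 0 input."""
    y = 1
    for t in range(0, span, step):
        y = c_element(data(t, pulses), data(t - t_delta, pulses), y)
        if y != 1:
            return True
    return False


T_DELTA = 500
print("single 400 ps pulse:", output_disturbed([(1000, 400)], T_DELTA))
print("single 600 ps pulse:", output_disturbed([(1000, 600)], T_DELTA))
print("two 400 ps pulses, t_delta apart:",
      output_disturbed([(1000, 400), (1500, 400)], T_DELTA))
```

In the last case the second pulse arrives on the direct input at the same moment the first pulse emerges from the delay element, so both inputs agree on the wrong value.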

As mentioned in the previous chapter, as fabrication processes scale down to deep sub-micron levels, e.g. $0.13\ \mu\mathrm{m}$ feature size, the amount of charge needed to switch logic states decreases with the lower node capacitances found on smaller transistors. Coupling this effect with lower drive strengths, which increase the amount of time a CMOS element takes to remove collected charge, further increases the SET duration. Smaller node capacitances and lower operating voltages increase the SET pulse width [23]. Thus, smaller processes and lower operating voltages will require larger $\mathrm{t}_{\delta}$ separation on critical nodes.

Similar to TMR, temporal hardening implementations have a severe penalty in both size and power. A majority of these costs stem from the delay elements providing the temporal separation. Therefore, to create a low power, compact, temporally hardened circuit, special attention must be given to the design of the delay elements. Since current fabrication processes are built for increasingly high clock speeds, it is necessary to deviate from normal CMOS design. To create an ideal delay element, the internal circuitry should have low drive strength with high capacitance nodes. Ideally the circuit should also be non-inverting from the input to the output. The low drive strength inverters are generally created through two methods: either using current starved inverters, whose schematic is shown in Fig. II-8, or by decreasing the transistor width.


Fig. II-8. Current starved inverter schematic. (After [19])
The simulation shown in Fig. II-9 sweeps drive strength, by stepping transistor width from 185 nm to $1\ \mu\mathrm{m}$, and plots the resulting $\mathrm{t}_{\delta}$ delay for the four inverter combination described in Chapter 4. Notice that as the transistor drive strength decreases, $\mathrm{t}_{\delta}$ increases at a non-linear rate.


Fig. II-9. Delays provided by a $\delta$ delay element while varying drive strength through transistor width.

Placing capacitors between inverters with minimum drive strengths maximizes the delaying effect of the minimized transistor sizing. Fig. II-10 shows the delay element $\mathrm{t}_{\delta}$ over a capacitance range, in transistor gate lengths, from 130 nm to 500 nm. While sizing the capacitors at various gate lengths does affect $\mathrm{t}_{\delta}$, the delay varies only linearly with this parameter, which therefore has a smaller impact on propagation times than decreasing the driving inverter width.


Fig. II-10. Delays provided by a $\delta$ delay element while varying capacitor size by transistor length.
While these trends can be used to significantly increase the delays provided by $\delta$ delay elements, there are limits to how small inverter drive strength should be. Since the time it takes to remove collected charge is inversely related to drive strength, decreasing the transistor size in the delay elements will increase the induced $\mathrm{t}_{\mathrm{SET}}$ from any ion strike.
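A first-order delay estimate shows why width and capacitance affect $\mathrm{t}_{\delta}$ differently. The sketch below is an assumption for illustration and not the simulation behind Figs. II-9 and II-10: it takes the stage delay as the time for a constant drive current to move the load through half the supply, and it takes the drive current as proportional to transistor width. The current density constant and the 5 fF load are hypothetical.

```python
def stage_delay(c_load, width, vdd=1.2, amps_per_meter=500.0):
    """Approximate stage delay: time to swing c_load by vdd / 2.

    amps_per_meter is a hypothetical drive current per unit gate width.
    """
    i_drive = amps_per_meter * width
    return c_load * vdd / (2.0 * i_drive)


C_LOAD = 5e-15  # hypothetical 5 fF load

# Width sweep: delay follows 1 / width, so equal width steps give unequal delay steps.
for width in (185e-9, 385e-9, 585e-9, 1e-6):
    print(f"W = {width * 1e9:.0f} nm -> {stage_delay(C_LOAD, width) * 1e12:.1f} ps")

# Capacitance sweep at minimum width: delay scales linearly with the load.
for scale in (1, 2, 3):
    print(f"C = {scale} x -> {stage_delay(scale * C_LOAD, 185e-9) * 1e12:.1f} ps")
```

The same inverse dependence on width is what makes a very weak delay stage slow to remove collected charge after a strike.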

Delay elements should not be the weakest link in a CMOS logic chain when it comes to collected charge removal and therefore drive strengths should have lower limits equal to the lowest drive strength in a process library. For the design feature size presented in this thesis, that level is equivalent to a NAND4 gate, or 185 nm NMOS width.

## 3) Dual Interlock Storage Cells

Local redundancy, or interlocking, utilizes feedback storage nodes to mitigate SEUs. A dual interlock storage cell, or DICE, latch consists of eight interlocked inverters but can be simplified down to eight transistors, four PMOS and four NMOS, as shown in Fig. II-11 [2]. The feedback paths in this design ensure that single node upsets are quickly corrected. At least two storage nodes in the latch must be driven by inputs in order for the latch to write properly. This prevents the interlocking feedback paths from fighting the latch input signals. All four storage nodes can be written at once to improve write speed [24]. Implementing DICE latches in layout provides a compact, low power design.


Fig. II-11. DICE latch schematics showing the simplified 8 transistor version (b) with PMOS passgate inputs. (After [25])

Fig. II-12 shows a simulated charge collected on the DICE storage node X0 while the latch is closed. The node goes low for $\mathrm{t}_{\mathrm{SET}}$ and is then driven high by the value from X3. X3 is not affected by the SET because it is driven by the nodes X2 and X0. In this example, when X0 goes to a low logic value, X3 is floating and does not change state. The charge stored on the capacitance connected to X3 by transistors MP1 and MN3 is enough to keep X3 at its proper value for $\mathrm{t}_{\mathrm{SET}}$.


Fig. II-12. SEU prevention in a DICE latch.
The DICE latch is not immune to upsets via multiple node charge collection. This drawback in the design also requires that the storage nodes X0-X3 are protected from charge sharing at both the inputs and the latch outputs. This can be achieved by simply adding inverters at the input and output pins. Additional hardening can be applied by spatially separating the storage nodes to decrease the probability of multiple node charge collection from a single ion strike, which has been shown to cause upsets even at low LET [26].

A second common failure with this latch design arises when an SET occurs on the input nodes and spans across a clock edge that closes the latch. This event causes the latch to store the incorrect SET logic value until the latch becomes transparent at the following clock edge. The time around the clock edge where an SET seen on the input has the potential to upset the latch is referred to as the "window of vulnerability," and this window is fixed for a given DICE flip-flop design.

While the most basic version of the latch is constructed using CMOS inverter topologies, other derivations, such as a NAND or C-element design, have been shown to provide similar hardening [20][27]. Local redundancy can also be modified to supply both hardened set and reset signals as well as hardened scan options [28]. However, charge back writing and SET issues persist with all these added input pins. Designs that simplify the four storage node interlocking down to two interlocked storage nodes and a third state node have also been shown to mitigate SEUs efficiently [29].

## 4) Charge Sharing, Schmitt Trigger, and Other Methods

Other designs combine the TMR, temporal and DICE latch hardening techniques in more exotic methods, such as charge sharing and sense amplifier (SA) hardening [30]. Schmitt triggers are also introduced in a few new designs for additional hardening methods. Each scheme has its own pros and cons, making them amenable for varying applications.

Unhardened charge sharing flip-flops provide smaller designs than a CMOS implementation and are based on charge being stored on the small capacitances from transistor sources and drains [31]. Fig. II-13 shows a simple schematic of this type of flip-flop.


Fig. II-13. Unhardened charge sharing flip-flop schematic. (After [31])
However, hardening by this technique requires redundant charge sharing flip-flop implementations, up to five iterations, creating a very large and fast circuit. The cross coupled differential inputs and outputs found in charge sharing flip-flops increase the design vulnerability to SEUs due to strong positive feedback. Since this design relies on node capacitance for data storage, the charge sharing flip-flop's effectiveness will not hold with future process sizes.

Fig. II-14 shows a schematic representing the redundancy required for a portion of a charge sharing flip-flop.


Fig. II-14. Schematic portion of current-sharing hardened flip-flop. (After [31])
The schematic shown only depicts about $20 \%$ of the entire hardened flip-flop, thus implying how large circuits hardened by this technique can become.

SA latch designs have also been integrated into high speed radiation hardened flip-flops. While the unhardened version provides a compact design, hardening with this technique requires redundancy. This substantially increases the circuit size while maintaining the high circuit speed. A second option is to integrate an SA master latch with a hardened slave latch, such as a DICE topology [30]. This method does not rely on device size or capacitance and will scale with future technologies [32].

A third exotic RHBD hardening technique example integrates a Schmitt trigger into a latch's storage cell. The large hysteresis provided by these CMOS elements helps mitigate transient pulses of a specific voltage [33]. Additionally, Schmitt triggers harden cells by adding both capacitance and drive strength to storage nodes, decreasing the effect that collected charge has on a circuit. The schematic shown in Fig. II-15 displays a method to incorporate the Schmitt trigger into a latch.


Fig. II-15. Schmitt Trigger based latch. (After [34])
With this design, the hysteresis and increased node capacitance will only provide protection against limited SET pulse heights. Additional hardening techniques must be implemented for higher collected charge levels [34]. This design is also vulnerable to clock node ion strikes.

There are two additional commonly used hardening techniques. The first is high capacitance hardening. In this method, large nets, such as the clock, have been found to be immune to ion strikes because of the high charge levels already stored on the connected nodes in addition to the high $\mathrm{Q}_{\text {CRIT }}$ intrinsic to large capacitances [35]. The amount of charge collected during a strike will be negligible and no SETs or SEUs will affect the circuit's operation. This technique, however, does not allow for buffers to be used that would lower the node capacitances.

Secondly, implementing a layout technique called spatial hardening can decrease the probability that SETs and SEUs occur in circuits. By physically spacing critical nodes, the chance of a single ion strike affecting multiple critical nodes decreases substantially. Minimizing multiple node charge collection will in turn reduce the likelihood of upsets occurring.

## D. Conclusions

While all the RHBD techniques described independently solve various issues that circuit designers are faced with when addressing hardened designs, none of them provide a perfect RHBD solution when compared to unhardened circuit design through power, size and speed. For example, the complete SET and SEU hardness of a temporal latch requires high power, low speed, and large circuit applications, while a DICE latch sacrifices the SET hardness for a low power, compact design. Technique combinations have been created for specialized applications. The delay-filtered DICE latches, one proposed by Naseer and Draper and another proposed by Blum and Delgado-Frias, combine temporal and interlocked hardening in one latch to create a latch that is smaller and faster than a strictly temporal design while increasing the size and hardening from a solely DICE latch implementation [36][37]. Similarly, Mavis and Eaton combined TMR techniques with temporal delays to create a temporal sampling latch that mitigates SETs and SEUs at high clock speeds [18].

Hardened processes must also be fit to specific applications since they generally sacrifice size and power consumption for SET and SEU immunity.

Resistance hardening offers a rugged magnetic hardening version, but is not available on current process sizes, thus limiting this technique's uses.

## III. Verilog-A Model Simulating SET and SEU

## A. Introduction

Accurately simulating the effects of ionized particles striking a circuit is critical when verifying a circuit's radiation hardness before beginning physical production. In many cases, the use of standard CAD simulation elements does not sufficiently emulate how a circuit reacts in a radiated environment. While complex models have been developed to depict radiation effects involving variables such as semiconductor defects [38], charge cloud shape vs. time [39], and formulas for the drain currents initiated by charge collection [40], circuit simulations do not usually require this level of detail.

Conversely, simply modeling charge collection through the use of a current source does not suffice. As shown in Fig. I-10, the transient charge collection due to an ion strike is not constant or linear, but creates a peak and then decays at an exponential rate. Also, the idealities of a perfect current source, such as infinite internal impedance, make it impossible to properly model charge collection. For example, if a current source is connected between one inverter's output and a second inverter's input, there will not be a continuous voltage value across the node connecting the two inverters.

For these reasons, a unique Verilog-A model was created to ensure an accurate representation of charge collection on nodes during an ion strike. An ideal simulation would show a specified amount of either positive or negative charge being quickly ejected onto a node. This can be done through either modulation of voltage or current.

The model described in this chapter mirrors the load from a charged capacitor onto a node through current at a time specified by the user. Similar to the SPICE model presented by Fjeldly [41], this model is current based. As mentioned in the previous chapter, SET length is determined by the circuitry driving the node and the amount of charge collected on the node. In this model, the amount of charge collected is set by $Q = C \cdot V$, where $C$ is the capacitor value and $V$ is the initial condition voltage across the capacitor. As an example, a minimum sized inverter in the 130 nm process used for the design of the flip-flop described in the following chapter requires 300 ps to remove the charge from a 25 fF capacitor charged to 1.2 V.
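The numbers in this example can be worked through directly. The sketch below computes the stored charge from $Q = C \cdot V$ and then, as a simplifying assumption that is not stated in the thesis, treats the inverter as removing charge at a constant average current. Under that assumption the same current can be used to estimate the removal time for the 33 fF reservoir used in Section C, where a $\mathrm{t}_{\mathrm{SET}}$ of about 400 ps is reported.

```python
def collected_charge(capacitance, voltage):
    """Charge stored on the reservoir capacitor, Q = C * V."""
    return capacitance * voltage


def average_removal_current(charge, removal_time):
    """Average current implied by removing `charge` in `removal_time`."""
    return charge / removal_time


# Values quoted in the text: 25 fF charged to 1.2 V, removed in 300 ps.
q_25 = collected_charge(25e-15, 1.2)
i_avg = average_removal_current(q_25, 300e-12)

# Constant-current estimate for the 33 fF reservoir (simplifying assumption).
q_33 = collected_charge(33e-15, 1.2)
t_33 = q_33 / i_avg

print(f"Q(25 fF) = {q_25 * 1e15:.1f} fC, average current = {i_avg * 1e6:.0f} uA")
print(f"Q(33 fF) = {q_33 * 1e15:.1f} fC, estimated removal time = {t_33 * 1e12:.0f} ps")
```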

## B. Verilog-A Code

The code for a negative charge ion strike model is as follows:

```
// VerilogA for Temporal_FF_v2, SETLowSim, veriloga
`include "constants.vams"
`include "disciplines.vams"

// Negative charge SET model: mirrors the discharge current of an external
// reservoir capacitor (cp, cgnd) onto the target branch (p, n).
module SET(p, n, cp, cgnd, vtime);
    parameter real R = 1.0 from (0:inf);  // gain in A/V^2: iout = vin * vout * R
    parameter real iout_min = 0;
    parameter real iout_max = 1;          // declared but not used below
    electrical p, n, cp, cgnd, vtime;
    real vin, vout, iout;

    analog begin
        vin  = V(p, n);       // voltage on the struck node
        vout = V(cp, cgnd);   // voltage remaining on the reservoir capacitor

        // SET begins when vtime is set to 1
        if (V(vtime, cgnd) == 1) begin
            iout = vin * vout * R;

            // limit the current to be positive
            if (iout < iout_min)
                iout = 0;

            // inject the current to the target node
            I(p, n) <+ iout;
            // subtract the same current from the capacitor
            // acting as the charge reservoir
            I(cp, cgnd) <+ iout;
            // current should end when the reservoir runs out
            // of charge
        end
    end
endmodule
```

The operation of this model is straightforward. When the value of vtime (set by an external voltage source) equals 1, an external charged capacitor attached across cp and cgnd begins to discharge. The current created by the discharging capacitor is mirrored between the terminals p and n. Once the capacitor has fully discharged, the current coming out of node p also ceases to flow and the simulated charge collection has ended.
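The behavior described above can be reproduced outside a circuit simulator with a simple numerical sketch. It applies the same current law as the model (the product of the node voltage, the reservoir voltage, and a gain, clamped at zero) and integrates it with forward Euler. Everything else is an assumption made for illustration: the restoring inverter is replaced by a fixed pull-up resistance to $\mathrm{V}_{\mathrm{DD}}$, and the gain, node capacitance, resistance, and time step are hypothetical placeholders rather than values from the thesis.

```python
def simulate_negative_set(c_res=33e-15, v_res0=1.2, gain=1e-2,
                          vdd=1.2, r_pullup=12e3, c_node=2e-15,
                          dt=1e-14, t_end=1e-9):
    """Forward-Euler sketch of the reservoir capacitor discharging through
    the struck node while a resistive pull-up tries to restore it."""
    vin, vout = vdd, v_res0      # node starts high, reservoir starts charged
    v_min = vin
    time_low = 0.0               # time the node spends below vdd / 2
    injected = 0.0               # total charge pulled out of the node
    for _ in range(int(round(t_end / dt))):
        iout = max(0.0, vin * vout * gain)    # same product law and clamp
        i_restore = (vdd - vin) / r_pullup    # assumed restoring driver
        vin += (i_restore - iout) * dt / c_node
        vout -= iout * dt / c_res             # reservoir loses the same current
        injected += iout * dt
        v_min = min(v_min, vin)
        if vin < vdd / 2:
            time_low += dt
    return {"v_min": v_min, "v_final": vin, "v_res_final": vout,
            "pulse_width": time_low, "injected_charge": injected}


result = simulate_negative_set()
print(f"minimum node voltage: {result['v_min']:.3f} V")
print(f"pulse width below VDD/2: {result['pulse_width'] * 1e12:.0f} ps")
print(f"charge injected: {result['injected_charge'] * 1e15:.1f} fC")
```

The node is pulled low almost immediately, stays low while the reservoir drains at roughly the current the pull-up can supply, and recovers once the reservoir is empty, which is the self-terminating behavior the model is intended to provide.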

A couple of modifications are needed in order to generate a positive SET pulse that does not extend past $\mathrm{V}_{\mathrm{DD}}$. This can be done by setting the output currents with the lines:

```
I(p, n) <+ ((vin-vdd)*vout*R);
I(cp, cgnd) <+ -((vin-vdd)*vout*R);
```

where "vdd" is a parameter set at the $\mathrm{V}_{\mathrm{DD}}$ voltage for the process being simulated. This current will be negative until vin reaches $\mathrm{V}_{\mathrm{DD}}$, at which time the current is 0 A. Similarly, the current drawn from the capacitor must be set as the negative of $\mathrm{I}(\mathrm{p}, \mathrm{n})$ so that charge is removed instead of added.

For positive charge SET simulations, the positive pulse can also be emulated by adding an inverter to the pin being tested and simulating a negative SET. This creates a positive pulse on the input pin of the circuit and creates the waveform shape that would be generated by logic preceding the flip-flop inputs. The simulations requiring a positive voltage SET pulse in this thesis were conducted using this method. An example of this simulation is shown in the next chapter.

## C. Implementation And Simulation

The model is not designed to be a standalone simulation element and two peripheral circuit elements must be used. The first is the pre-charged reservoir capacitor that is attached across the cp and cgnd terminals. This capacitor determines the amount of charge being injected into the node, and consequently the SET duration. Secondly, a $\mathrm{V}_{\text {PWL }}$ source was used to set the vtime pin to easily adjust the point in time when the pin reaches 1 V and the SET begins. The p terminal drives the only output pin, P , which is connected to the circuit being simulated. The full SET circuit used to run the Verilog-A model is shown below in Fig. III-1.


Fig. III-1. Schematic setup showing the Verilog-A model's peripheral circuitry.

For the following simulations, the capacitor value used is 33 fF, creating a $\mathrm{t}_{\text{SET}}$ of about 400 ps. Vtime is a parameter set in simulation that determines when the simulated SET begins. The first simulation, in Fig. III-2, depicts an ion strike in a chain of inverters originally affecting the node n1, pulling the node down to 0 V for 400 ps. Nodes n3 and n5 are separated from n1 and from each other by two inverters and show how the induced negative voltage pulse propagates through the chain. Note that in this instance, once the collected charge is removed, the values of n1, n3, and n5 return to their original state. This pulse will continue to flow through combinational logic until a closed sequential element stops the pulse, or attenuation decreases the effect of the SET to inconsequential levels.


Fig. III-2. Simulated SET propagating through chained inverters.

Fig. III-3 shows the effects of charge collection in a bi-stable memory element. The node n2 collects negative charge at 9 ns and drives the node n1 high. After one gate delay, or about 25 ps, n1 begins to maintain 0 V. Since the inverter drive strengths and charge well capacitor size are the same as in the SET example, we can see from Fig. III-2 that it would take the inverter driving n2 about 400 ps to remove the collected charge. However, an upset occurs since the cell flips after only one gate delay. At this point the memory cell will continue to supply the incorrect value; thus an SEU has occurred.


Fig. III-3. Simulated SEU in a bi-stable memory cell.

## D. Conclusions

As mentioned earlier, it is imperative that circuits undergo proper testing to verify radiation hardness through simulation. The Verilog-A model described in this chapter provides a quick simulation of how charge collected on specific nodes affects circuit simulation. This model will be used to accurately simulate the charge collected on nodes during ionized particle strikes for all the examples in Chapters 4 and 5.

## IV. Temporal Flip-Flop

## A. Introduction

SETs on clock and control nodes, e.g., reset, have been shown to cause issues for RHBD circuits [20]. The C-gate DICE flip-flop presented by Matush, et al. [19] is an example of this drawback since the design presented in that paper is not hard to SETs on the clock node. Issues that arise when hardening set and reset signals focus mostly around asynchronous controls. In the event that an SET propagates to an asynchronous enable node, the effect has the potential to bypass any hardening in the flip-flop and continue to logic following the sequential element. The temporal flip-flop (TFF) presented in this chapter combines two temporal latches and provides hardness against SETs on the control signals and input nodes, as well as SEUs on the internal nodes. The storage cells for both the master and slave latches in this design have identical configurations. This chapter will show that the design has been comprehensively simulated to justify the validity of the circuit configuration used to harden both latches. This flip-flop is then compared through power and size analysis to an unhardened D flip-flop provided by the foundry. Finally, results from the TFF being placed in the synthesis and APR design flows will be presented and explained.

## B. Circuit Design

The C-element was created by David E. Muller in 1959. This circuit has n inputs and, as mentioned before, does not change state until all inputs supply equal logic levels, thus providing hysteresis until all inputs agree. The CMOS implementation for a two-input C-element is shown in Fig. IV-1(a). While the hardening benefits of this device were discussed in Chapter 2, the C-element is more regularly found in asynchronous design as a stabilizing element and a synchronizer for propagating logic.

The C-element used in the temporal flip-flop uses this two-input configuration and has the symbol shown in Fig. IV-1(b). When the input pins, A and B, both have the logic value of 0 (1), the output pin, Y, will be at a logic state of 1 (0). In the event that A and B do not provide the same value, the C-element will be in tri-state mode and Y will float. In this state, the voltage level of Y has the possibility to shift slightly down from logic 1, or up from logic 0, due to charge sharing between the stacked transistors. This phenomenon is shown in the simulation section below. However, the magnitude of this shift is not great enough to switch the logic state of the C-element.

Fig. IV-1. (a) Schematic of the Muller C-element and (b) the symbol designating C-element placement in a schematic.
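The logical behavior of the two-input C-element described above can be summarized with a small behavioral model. This is only an illustrative sketch: the function name `c_element` and the `prev` argument, which stands in for the charge held on the floating output Y, are assumptions introduced here, and the charge-sharing voltage shift is not modeled.

```python
def c_element(a: int, b: int, prev: int) -> int:
    """Inverting two-input Muller C-element (behavioral sketch).

    When A and B agree, Y is driven to the inverse of that value.
    When they disagree, the gate is tri-stated and Y keeps its
    previous value, modeled here by returning `prev`.
    """
    if a == b:
        return 1 - a
    return prev


# Enumerate every input combination for both previous output values.
truth_table = {
    (a, b, prev): c_element(a, b, prev)
    for a in (0, 1)
    for b in (0, 1)
    for prev in (0, 1)
}
```

In this model the output only changes when both inputs agree, which is the hysteresis property that the temporal latch relies on.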

The temporal flip-flop utilizes one C-element in the storage nodes of both the master and slave latches. As mentioned before, temporal hardening provides separated nodes by creating a minimum pulse width of $t_{\delta}$ that can affect the latch state. In the TFF, this hardening is provided by separating the C-element inputs, MHold and dMHold, using a $\delta$ delay element. This creates a dual redundant hardening and directly prevents SETs from propagating on the input node or control signals.
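The filtering action of the delay element and C-element pair can be illustrated with an idealized time-stepped model. The sketch below is an assumption-laden simplification, not the circuit simulation used in this thesis: transitions are instantaneous, the delay element is a pure 412 ps delay, and the name `set_changes_latch` is hypothetical.

```python
T_DELTA_PS = 412  # delay element value reported in this chapter (ps)


def set_changes_latch(pulse_ps: int, t_delta_ps: int = T_DELTA_PS) -> bool:
    """Return True if a high-going pulse on the hold node flips the
    output of an ideal inverting C-element fed by the hold node and
    its delayed copy."""
    start = 1000  # arbitrary pulse start time (ps), assumption
    stop = start + pulse_ps + 2 * t_delta_ps

    def hold(t: int) -> int:
        # Hold node: low except during the injected pulse.
        return 1 if start <= t < start + pulse_ps else 0

    y = 1  # quiescent state: both inputs low, so the output is driven high
    for t in range(stop):
        a, b = hold(t), hold(t - t_delta_ps)
        if a == b:
            y = 1 - a  # inputs agree: output is actively driven
        # inputs disagree: output floats and keeps its value
        if y != 1:
            return True
    return False
```

Under these assumptions a 400 ps pulse never appears on both inputs at once, so `set_changes_latch(400)` is false, while any pulse longer than $t_{\delta}$ overlaps its delayed copy and reaches the output.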

The $\delta$ delay element employed in this design consists of four inverters separated by two large capacitances, as shown in Fig. IV-2. The inverters were sized to provide the minimum drive strength in the fabrication process's standard cell library. This is done to ensure that the drive strengths within a delay element will not create SETs longer than those generated anywhere else in the design, since the current driving capability of circuitry directly affects the rate at which any collected charge is removed after an ion strike. As previously mentioned, in this process this drive strength is the equivalent of a NAND4 gate, or an NMOS width of 175 nm. Since this drive strength requires the transistor size of the inverters to be less than the minimum width allowed by the process, each inverter consists of two stacked NMOS and PMOS transistors of 380 nm and 760 nm width respectively. These inverters use the same device length, 130 nm, as the rest of the circuit to allow for scalability with process corners, i.e. fast-fast or slow-slow. Capacitance sizes were calculated per the explanation of delay element design in Chapter 2. The large capacitances consist of NMOS and PMOS transistors, each with a gate length of 400 nm. The widths of these two devices are at the maximum width allowed in the standard cell height of $3.69 \mu \mathrm{~m}$, which is $1.04 \mu \mathrm{~m}$ for PMOS and 855 nm for NMOS. The capacitances, coupled with the low drive strength of the inverters, increase the amount of time it takes to change the state of the $\delta$ delay element. At $\mathrm{V}_{\mathrm{DD}}=1.2 \mathrm{~V}$, this configuration provided a $\mathrm{t}_{\delta}$ of 412 ps, or approximately the same delay as 18 minimum-sized inverters.


Fig. IV-2. Schematic of delta delay element used in the temporal flip-flop.

The full master/slave TFF schematic is shown in Fig. IV-3. The master and slave latches are noted by the dashed lines. Both latches are nearly identical at the schematic level with the outputs being driven from different nodes, i.e. the slave node for the master latch and the hold node for the slave latch.


Fig. IV-3. Full schematic of master/slave temporal flip-flop.

The feedback loop for each latch consists of an inverter, to complete the feedback loop, followed by a feedback $\delta$ delay element, to protect the latch from SEUs on the storage node. In the event that the latch is closed and an SEU occurs on the setup nodes, labeled MSetup and SSetup in the full TFF schematic shown in Fig. IV-3, the feedback delay allows the C-element to recover the setup node to its original value before it begins to tri-state.

A simulated example of the SEU that would be caused if the feedback delay element was not present is shown in Fig. IV-4. When charge is collected on MSetup, and the latch is closed, the nodes MHold and dMHold switch with a separation of $\mathrm{t}_{\delta}$. Since the C-element tri-states while MHold and dMHold represent opposite logic values, the charge collected on MSetup is not removed. When the incorrect high logic value propagates through the hold node delay element, an upset occurs.


Fig. IV-4. Simulated SEU if the TFF latch feedback loops were missing the delay element.

The inverter INVBW, located between node MSetup and the slave latch input node SD2, prevents charge sharing failures due to back-writing from the slave latch to the master. This back-writing can occur when a high logic level on the slave node SHold is connected directly via a transmission gate to a low logic level on MSetup in the event that the clock changes while the master latch C-element is tri-stated. For example, consider the case where the inverter INVBW is not included. If an SET disturbs either MHold or dMHold, the C-element in the master latch tri-states, floating MSetup. While the clock is high, the voltage on SHold would write back to MSetup, potentially flipping the master storage node before the SET pulse is mitigated and the C-element begins to drive node MSetup once again.

An example of the back-writing fault in a design missing INVBW is shown in Fig. IV-5. In this simulation, an SET of $\mathrm{t}_{\text{SET}}=400 \mathrm{ps}$ reaches the node MHold about 600 ps before the rising clock edge. Initial conditions of the simulation set SHold high and MSetup low. Since the pulse passes through the first delay element before the clock edge, the master latch C-element is tri-stating when the slave latch becomes transparent. As mentioned before, when the C-elements are tri-stated, the setup nodes are floating. This allows the charge from SHold to write back to MSetup, flipping the logic level of the node. From the transient plot, you can see that after one $\mathrm{t}_{\delta}$, MFdbk goes low, and since MFdbk drives MHold when the clock is high, MHold also goes low. This causes the master latch C-element to tri-state for another $\mathrm{t}_{\delta}$ instead of driving MSetup back low. Since the feedback path of the master latch has stabilized in the incorrect state, MHold will stay low and the incorrect value is captured, causing an upset.


Fig. IV-5. Simulation of an SET creating a charge feedback error.

Additional circuitry in the temporal flip-flop consists of an input inverter, 2:1 transmission gate multiplexers in both latches, and output inverters to generate complementary outputs Q and QN. The output inverters for this design have a drive strength four times that of a minimum-sized inverter for the process. The multiple of four stems from the maximum amount of capacitance a driven node can attach to CMOS gates and still quickly switch state. The increased output drive strength minimizes the loading effects of high capacitance nodes that the flip-flop may be driving.

## C. Simulation

The TFF was comprehensively simulated to confirm the validity of the circuit's hardening against SETs and SEUs. The Verilog-A model described in Chapter 3 was used for all the simulations in order to ensure that an accurate representation of an ion strike's effects was applied to the circuit.

Fig. IV-6 shows the standard operation for the temporal flip-flop in a rising edge triggered configuration. When the clock is low, the nodes MHold and dMHold switch, following the inverse of D, with a separation of $\mathrm{t}_{\delta}$. While these nodes are not equal, the master C-element tri-states, displaying the slight voltage shift on MSetup mentioned above. MSetup transitions once MHold and dMHold agree. When the clock goes high, the slave latch becomes transparent, the slave latch storage node is written, and Q switches after the delays of one transmission gate and one inverter. Similarly to the master latch, the node SSetup will not stabilize until the nodes SHold and dSHold agree after one $\mathrm{t}_{\delta}$. This simulation was completed with a 250 MHz clock frequency at $\mathrm{V}_{\mathrm{DD}}=1.2 \mathrm{~V}$. Using the Synopsys tool NCX, which will be described below, hardened $\mathrm{t}_{\text{SETUP}}$ and $\mathrm{t}_{\text{HOLD}}$ times were found to be 853 ps and -59.9 ps respectively. $\mathrm{t}_{\mathrm{CLK2Q}}$ for the TFF was found to be 133 ps.
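To put these characterized values in context, the sketch below estimates how much of a clock period remains for combinational logic between two TFFs. It assumes the conventional single-cycle constraint (period at least clock-to-Q plus logic delay plus setup time) with no clock skew, which is an assumption added here for illustration; the names `logic_budget_ps` and `data_window_ps` are hypothetical.

```python
T_SETUP_PS = 853.0   # hardened setup time reported above
T_HOLD_PS = -59.9    # hardened hold time reported above
T_CLK2Q_PS = 133.0   # clock-to-Q delay reported above


def logic_budget_ps(f_clk_mhz: float) -> float:
    """Time left for combinational logic in one clock period,
    assuming zero clock skew (illustrative assumption)."""
    period_ps = 1e6 / f_clk_mhz
    return period_ps - T_CLK2Q_PS - T_SETUP_PS


# Window around the clock edge during which D must be stable.
data_window_ps = T_SETUP_PS + T_HOLD_PS

# At the 250 MHz simulation frequency the period is 4000 ps.
budget_250 = logic_budget_ps(250.0)
```

Under these assumptions roughly 3 ns of the 4 ns period at 250 MHz remains available for logic, with just under 1 ns consumed by the flip-flop itself.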


Fig. IV-6. Simulation of operation with temporal flip-flop at 250 MHz.

Fig. IV-7 shows the flip-flop operation in the event an SET occurs on the input node, D. In this simulation, D is pulled low by an SET 400 ps in pulse width, or $\mathrm{t}_{\text{SET}}$, when the master latch is open (clock is low). MHold goes high for 400 ps and drives the delay element to switch dMHold high after $\mathrm{t}_{\delta}$. However, since 400 ps is less than $\mathrm{t}_{\delta}$ in this design, the nodes MHold and dMHold will not agree within the duration of the SET induced high value. The pulse is then mitigated by the C-element/delay element combination and the master storage cell does not flip. For the duration of the disagreement between MHold and dMHold, the node MSetup decreases in voltage while the C-element tri-states but does not switch the logic level. This simulation was also run at $\mathrm{V}_{\mathrm{DD}}=1.2 \mathrm{~V}$.


Fig. IV-7. Simulated SET on input node D, showing mitigation potential of the temporal flip-flop.

The third simulation, shown in Fig. IV-8, demonstrates the necessity of the feedback delay element by creating a pulse on SSetup. At $\mathrm{t}=5 \mathrm{~ns}$, a pulse of 400 ps is created on SSetup. As this pulse travels to SHold and dSHold, there is a delay of one $\mathrm{t}_{\delta}$, allowing the deposited charge to be removed from SSetup. It is imperative for collected charge to be removed from SSetup before SHold and dSHold switch logic values and cause the C-element to tri-state, in order to prevent propagation of the incorrect state to both sides of the latch and thereby avoid an SEU.


Fig. IV-8. Simulated SET on SSetup.

Finally, a simulation depicting the TFF's hardness to SETs on the clock node is shown in Fig. IV-9. In this simulation, the slave latch is originally closed until an SET of $\mathrm{t}_{\text{SET}}=400 \mathrm{ps}$ glitches the clock high, temporarily making the slave transparent. Since the value stored in the slave is the opposite of what is stored in the master latch in this simulation, a pulse is passed from the master latch to the slave latch for the duration of $\mathrm{t}_{\text{SET}}$. The SET will be less than $\mathrm{t}_{\delta}$ and the value passed prematurely from the master to the slave will have a pulse width of $\mathrm{t}_{\text{SET}}$. SHold and dSHold will never have this pulse's value at the same moment in time and therefore the C-element will not switch SSetup, thus preventing an SEU. However, because Q and QN are generated directly from SHold, the outputs will glitch for the duration of the SET.


Fig. IV-9. Simulation of an SET on the CLK node.

## D. Physical Design

The physical design of this flip-flop was implemented in the TSMC 130 nm fabrication process. Following [19], vertical interleaving was employed between four flip-flops to create a multi-bit cell. The use of vertical interleaving spaces critical nodes to decrease the probability of simultaneous, multiple node charge collection while maintaining high transistor density across the cell. Four interleavings take place in one temporal flip-flop bit, two in each latch. This splits the flip-flop into the five sub cells A-E. These interleavings space the hold and delayed hold nodes driving the C-elements by placing the $\delta$ delay element between them. If both the hold and delayed hold nodes in a latch collected the same type of charge in the same strike, holes or electrons, the C-element would switch states and an SEU would occur. However, in the event that only one of these nodes collects charge, the C-element will tri-state but the storage cell will not flip. Fig. IV-10 and Fig. IV-11 show the interleaved constituent cells of the single flip-flop implemented across the four-bit cell. The schematic shown in Fig. IV-10 maps the sub cells to the actual circuit divisions of a single temporal flip-flop. The red boxes shown in Fig. IV-11 highlight the interleaving path of one flip-flop.


Fig. IV-10. Temporal flip-flop schematic expressing divisions of interleaving.


Fig. IV-11. Multi-bit cell layout with the interleaved nature of one flip-flop progressing across the cell.

In the foundry process used, the power rails on the top and bottom of standard cells separate rows through the use of metal 1 and diffusion routes. This means that vertical interleaving had to be completed using vertical metal 2 and some horizontal metal 3 routes. While the limited amount of interleaving does not expend all the available routing tracks for metal 2, considerations for power grid routing must be taken into account. These considerations must space metal 2 tracks to allow room for vias from metal 8 to be placed without difficulty. This topic is discussed further in the Synthesis and APR section later in the chapter.

The cell physical design matches the standard cell height and intermediate cell layers to a commercially available, unhardened, fully tapped standard cell library available from the foundry.

## E. Power Consumption Analysis

Simulations were run on the temporal flip-flop to determine the effect that temporal hardening has on power consumption. The circuit simulated consisted of ten temporal flip-flops chained in a shift register configuration. The outputs of these FFs were loaded with a fan-out of four minimum-sized inverters. The simulation was run at a temperature of $25^{\circ} \mathrm{C}$, $\mathrm{V}_{\mathrm{DD}}=1.2 \mathrm{~V}$, and at the typical process corner. To accurately assess power consumption in realistic operating conditions, power was measured on a per flip-flop bit basis for activity factors ranging from $\alpha=0$ to $40 \%$. At $\alpha=0$, only the clock power for the temporal flip-flop is present and was measured at 14.60 fJ; at $\alpha=40 \%$, the circuit dissipates 103.8 fJ.

For comparison, the same simulation was run on the single bit, unhardened D flip-flop, whose schematic is similar to that in Chapter 1, Fig. I-3(b). For this circuit, the energy consumption was measured at 21.96 fJ and 36.83 fJ for activity factors of 0 and $40 \%$ respectively. The results of both these simulations are shown in Fig. IV-12, along with the energy consumption of the four delay elements in a single temporal flip-flop. At $0 \%$ activity factor, the clock power of the temporal flip-flop is $33 \%$ less than that of the unhardened version. However, as the activity factor increases, the delay elements begin to greatly affect the power consumption. At $40 \%$ activity factor, the temporal flip-flop dissipates 2.8 times that of the unhardened flip-flop and the delay elements account for $75 \%$ of the total consumption at this level. Other notable points on the chart are where the temporal flip-flop and unhardened version dissipate the same amount of energy (approximately $\alpha=4 \%$), and where the temporal flip-flop's power consumption is twice that of the unhardened flip-flop (approximately $\alpha=18 \%$).


Fig. IV-12. Plot of power consumption by activity factor for the unhardened flip-flop and the temporal flip-flop.
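The ratios quoted above follow directly from the four measured endpoint energies, and the crossover points can be approximated from them. The sketch below assumes that energy varies linearly with activity factor between $\alpha=0$ and $\alpha=40\%$, which is an illustrative simplification and not a property established by the simulations; the helper names are hypothetical.

```python
# Measured energies per flip-flop bit (fJ) at activity factors of 0% and 40%.
E_TFF = {0: 14.60, 40: 103.8}   # temporal flip-flop
E_DFF = {0: 21.96, 40: 36.83}   # unhardened D flip-flop

clock_saving = 1 - E_TFF[0] / E_DFF[0]   # fraction saved at alpha = 0
ratio_at_40 = E_TFF[40] / E_DFF[40]      # TFF / DFF at alpha = 40%


def slope(e: dict) -> float:
    """Energy increase per percent of activity factor (linear assumption)."""
    return (e[40] - e[0]) / 40.0


def crossover_alpha(k: float) -> float:
    """Activity factor (%) at which E_TFF = k * E_DFF, assuming both
    curves are straight lines through their two measured endpoints."""
    return (k * E_DFF[0] - E_TFF[0]) / (slope(E_TFF) - k * slope(E_DFF))


alpha_equal = crossover_alpha(1.0)   # energies equal
alpha_double = crossover_alpha(2.0)  # TFF dissipates twice the DFF energy
```

With this two-point straight-line fit the equal-energy point lands close to $\alpha=4\%$, and the 2x point lands slightly above the approximately $18\%$ read from the plot, which is the kind of deviation to be expected if the measured curves are not exactly linear.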

## F. Size Analysis

The unhardened flip-flop cell in the foundry provided library is $8.74 \mu \mathrm{~m}$ long with a height of one standard cell, or $3.69 \mu \mathrm{~m}$, for a total area of $32.25 \mu \mathrm{~m}^{2}$. Comparatively, the hardened, multi-bit cell has a length of $36.80 \mu \mathrm{~m}$ and is four standard cell rows high, or $14.76 \mu \mathrm{~m}$ total height. This cell has an area of $543.17 \mu \mathrm{~m}^{2}$, or $135.79 \mu \mathrm{~m}^{2}$ per bit. When comparing these two cells, it is obvious that the hardened multi-bit flip-flop is significantly larger (4.2x) than the unhardened version. Most of this difference is due to the temporal hardening technique utilized in the form of the four delay elements. Each delay element measures $6.00 \mu \mathrm{~m}$ in length and $3.69 \mu \mathrm{~m}$ high, adding $22.14 \mu \mathrm{~m}^{2}$ to the area, about $68 \%$ of the foundry flip-flop size. The four delay elements together account for $85.5 \%$ of the size difference between these two flip-flops. Also, $8.5 \%$ is due to the inverter INVBW and the use of the C-elements, as opposed to standard inverters, in the storage cells for both latches. The final $6 \%$ of size discrepancy can be attributed to optimizations not made in the hardened flip-flop due to the interleaved nature of the multi-bit cell and the unhardened flip-flop being efficiently designed by automated tools.
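The area figures in this section can be reproduced from the quoted cell dimensions. The following sketch is only a check of that arithmetic; the variable names are chosen here for illustration.

```python
CELL_HEIGHT_UM = 3.69  # standard cell height

# Unhardened foundry flip-flop: one row high.
dff_area = 8.74 * CELL_HEIGHT_UM

# Hardened multi-bit cell: four rows high, four bits per cell.
tff_cell_area = 36.80 * (4 * CELL_HEIGHT_UM)
tff_bit_area = tff_cell_area / 4

size_ratio = tff_bit_area / dff_area

# Four delay elements per flip-flop bit, each 6.00 um long and one row high.
delay_area = 6.00 * CELL_HEIGHT_UM
delay_vs_dff = delay_area / dff_area
delay_share_of_difference = 4 * delay_area / (tff_bit_area - dff_area)
```

The results agree with the text: about $32.25 \mu \mathrm{~m}^{2}$ and $135.79 \mu \mathrm{~m}^{2}$ per bit, a ratio of roughly 4.2, and the four delay elements covering about $85.5 \%$ of the area difference.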

## G. Synthesis and APR Implementation

To further assess the substantial size and speed penalties, the hardened, multi-bit cell was used in a practical application through synthesis and auto place and route (APR) design flows. These flows were completed using Synopsys Design Compiler (DC Shell) for the synthesis step and Cadence Encounter for the APR.

To properly complete these methodologies, the individual TFF had to be characterized and a liberty file, or .lib, had to be generated and then formatted for a multi-bit implementation. The characterization of the flip-flop was completed using Synopsys NCX, an automated characterization program that analyzes the setup, hold, and propagation delays for both combinational and sequential logic. In this case, a sample .lib was formatted to fit the input and output terminals of the single bit TFF. This .lib worked in conjunction with a configuration file to comprise the two input files for NCX. A key step in the configuration file is to instruct NCX to run multiple iterations of timing analysis at varying D to CLK edge times using the commands:

```
set constraint true
```

```
set shpr_constraint true
```

With this feature enabled, NCX runs multiple iterations of the setup and hold time analysis without the assumption that there are infinite hold and setup times respectively. The hold times are originally set at the unhardened levels and then shifted by approximately unhardened $\mathrm{t}_{\text{SETUP}} / 2$ and $\mathrm{t}_{\text{SETUP}}$ for iterations 2 and 3 respectively to calculate the new setup times. Hold times are then calculated using these setup times. This setting is necessary to provide a "hardened" setup time for the circuit. The configuration file also points the program to the netlist to simulate.

When calculating setup times for a flip-flop or latch, the tool assumes an infinite hold time, and the inverse while calculating hold times. For unhardened sequential logic, this provides an accurate calculation, and if the TFF is characterized with this convention, a setup time of about $\mathrm{t}_{\delta}$ will be reported. However, the temporal hardening requires that an additional $\mathrm{t}_{\delta}$ be added to the setup time in order for a flip-flop to properly operate in hardened conditions. Fig. IV-13 is a transient plot showing the worst case setup time for the TFF. In this example, D switches just over $2 \mathrm{t}_{\delta}$ before the clock edge and an SET quickly occurs, pulling D low for about 300 ps. Since the SET is mitigated at least one $\mathrm{t}_{\delta}$ before the clock edge, the master latch is able to capture the proper D value. In the event that D switches closer to the clock edge, an SET of sufficient $\mathrm{t}_{\text{SET}}$ can keep the C-element tri-stated long enough for the previous, and incorrect, D value to be captured, causing an upset.


Fig. IV-13. Transient plot depicting the worst case setup scenario for the TFF where an SET occurs within two $\mathrm{t}_{\delta}$ from the rising clock edge. (After [43])
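As a rough worked estimate of the hardened setup time described above, take the conventionally characterized setup time to be about one $\mathrm{t}_{\delta}$ and add the additional $\mathrm{t}_{\delta}$ required for hardened operation. The subscripts "conv" and "hard" are labels introduced here for illustration only, and $\mathrm{t}_{\delta}=412$ ps is the delay element value reported earlier in this chapter:

$$
\mathrm{t}_{\text{SETUP,hard}} \approx \mathrm{t}_{\text{SETUP,conv}} + \mathrm{t}_{\delta} \approx 2\,\mathrm{t}_{\delta} = 2 \times 412 \text{ ps} = 824 \text{ ps}
$$

This estimate is within about 30 ps of the 853 ps hardened setup time reported by NCX in the Simulation section.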

Running multiple iterations of NCX, one assuming a setup time of $\mathrm{t}_{\delta}$ and one assuming a hold time of $-\mathrm{t}_{\delta}$, allows the tool to report hardened characterization times for the TFF. NCX creates and runs HSPICE simulations for every timing value required by the input .lib and then outputs a .lib with updated timing information specific to the circuit netlist that the configuration file points to. Since this program does not handle multi-bit cells, the simulation had to be run on a single bit TFF. According to the Liberty User Manual, a specific format for multi-bit cell netlists must be followed to designate their use in .lib files, through either bus or bundle command lines. After an analysis of how DC Shell works with both formats, it was decided that bundles would be used for the .lib formatting. An excerpt from the Liberty User Manual shows a general format when using bundles:

```
cell(inv) {
    area : 16 ;
    cell_leakage_power : 8 ;
    bundle(Z) {
        members(Z0, Z1, Z2, Z3) ;
        direction : output ;
        function : "D" ; }
    bundle(D) {
        members(D0, D1, D2, D3) ;
        direction : input ;
        capacitance : 1 ; } }
```

This format was manually applied to the NCX output .lib. Additionally, the data from the multiple iterations run by NCX that provide hardened setup and hold times was moved to replace the unhardened timing since the NCX output .lib does not automatically complete this step. This final .lib was placed into LC Shell, another Synopsys tool, to generate the second file DC Shell needs, a .db.

Once the .lib and .db files were generated, the synthesis and APR design flows could be run using the hardened, multi-bit cell. Since the large cell contained four TFFs, a second multi-bit cell containing three TFFs was also created to accommodate any number of flip-flops in a design that is not a multiple of four. This cell was quickly created by tying the input of the fourth TFF in the multi-bit cell to ground and floating the output. Through a combination of the 3-bit and 4-bit cells, any design with six or more flip-flops can be accommodated. For example, if 30 flip-flops are in a design, six 4-bit cells and two 3-bit cells provide the 30 hardened flip-flops. However, in the block generated, only 4-bit cells were needed.
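The way a flip-flop count can be covered exactly by 3-bit and 4-bit cells is easy to check programmatically. The following is a minimal sketch, not part of the synthesis flow used in this work; the function name `split_into_cells` and the preference for as many 4-bit cells as possible are assumptions made for illustration.

```python
def split_into_cells(n_flops: int) -> tuple:
    """Return (four_bit_cells, three_bit_cells) covering exactly n_flops bits.

    The search prefers as many 4-bit cells as possible (an assumption,
    chosen because the 4-bit cell wastes no area on an unused bit).
    """
    for four in range(n_flops // 4, -1, -1):
        rest = n_flops - 4 * four
        if rest % 3 == 0:
            return four, rest // 3
    raise ValueError(f"{n_flops} flip-flops cannot be built from 3- and 4-bit cells")


# The 30 flip-flop example from the text.
cells_for_30 = split_into_cells(30)
```

For 30 flip-flops this returns six 4-bit cells and two 3-bit cells, matching the example above. Only counts of 1, 2, and 5 cannot be formed exactly from these two cell sizes.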

Synthesis and APR design flows were completed using the TFF for a stand-alone block from an actual set of control logic with a frequency target of 125 MHz. For comparison, the block was also generated using the unhardened D flip-flops used for comparison earlier. As expected, the hardened block was significantly larger than the unhardened version. Fig. IV-14 depicts both the hardened and unhardened blocks. The multi-bit cells are noticeable as the large grey boxes in (a) while the unhardened DFF is not visible in (b).

Fig. IV-14. APR results of the hardened block (a) and unhardened version (b). (After [43])

While the hardened version is noticeably larger, it is not 4.2x the size of the unhardened version, as the original size discrepancy between the TFF and unhardened D flip-flop would suggest. The actual size difference is 1.92x. A quick analysis of the cell density across the block explains part of this difference. The unhardened block has a density of $77 \%$ while the block generated using the TFF uses $87 \%$ of the space provided [43]. The increased density is due to the tight layout of the multi-bit cells. Cadence Encounter did not place all the DFF cells as close as the TFF multi-bit cells, therefore increasing the amount of white space left over between cells. Both blocks met the frequency requirement, and Table III provides more specific details about the results of the two generated blocks.

TABLE III
Specific Data For APR Results Comparing Hardened And Unhardened Blocks

| Flip-Flop | X Dimension <br> $(\mu \mathrm{m})$ | Y Dimension <br> $(\mu \mathrm{m})$ | Area $\left(\mathrm{mm}^{2}\right)$ | Timing <br> $(\mathrm{MHz})$ | Density (\%) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Hardened | 257.6 |  |  |  |  |
| Unhardened |  |  |  |  |  |

Comparatively, the 125 MHz timing constraint limited Encounter from creating a smaller, even denser hardened block.
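The interaction between block area and density can be made explicit with a short calculation. The sketch below assumes that density is simply placed cell area divided by block area, which may differ from the exact definition used by the APR tool; under that assumption it converts the 1.92x block area ratio into a ratio of placed cell area. The function name is hypothetical.

```python
BLOCK_AREA_RATIO = 1.92   # hardened block area / unhardened block area
DENSITY_HARD = 0.87       # hardened block density
DENSITY_UNHARD = 0.77     # unhardened block density


def placed_cell_area_ratio(block_ratio: float, d_hard: float, d_unhard: float) -> float:
    """Ratio of placed cell area, assuming cell area = density * block area."""
    return block_ratio * d_hard / d_unhard


cell_area_ratio = placed_cell_area_ratio(
    BLOCK_AREA_RATIO, DENSITY_HARD, DENSITY_UNHARD
)
```

Under this assumption the hardened block contains roughly 2.2 times the placed cell area of the unhardened block. That is still well below 4.2x, which is consistent with only the flip-flops, and not the surrounding combinational logic, growing in the hardened version.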

## H. Conclusions

The inspiration for this temporal flip-flop provided a strong base to start the design. The final TFF circuit solved the issue of hardness on control and clock nodes found in the C-gate/DICE flip-flop proposed by Knudsen by introducing the temporal slave latch in favor of the DICE slave. The multi-bit layout implementation increases SET and SEU hardness by spacing critical nodes. However, the use of delay elements to mitigate SETs and SEUs creates a substantial penalty in both size and power consumption. This shows the importance of efficient delay element design when employing the technique of temporal hardening, which was discussed in Chapter 2.

## V. Radiation Hardened Flip-Flop Comparison

## A. Introduction

As mentioned in Chapter 2, there are numerous methods to nullify SETs and SEUs in CMOS circuits. Two of the most widely used RHBD techniques are temporal redundancy and interlocking storage nodes. This chapter will compare two radiation hardened flip-flops to the TFF described in the previous chapter through speed, size and power consumption. Finally, these designs will be compared via SET and SEU hardness to the TFF through simulation. The two comparison flip-flops were designed by Knudsen and employ very similar schematics by combining a temporal master latch with a DICE slave latch [19][43]. One flip-flop uses majority voters in the master latch storage cell while the second improves on both size and power by replacing the majority voters with C-elements. At the end of each section, the data collected will be compared to the TFF and the unhardened D flip-flop analysis from Chapter 4. First, it is necessary to briefly compare the delay element used in the TFF to a standard inverter chain through size and power consumption. All three of the RHBD flip-flops compared in this thesis use the TFF delay element.

## B. Delay Element Comparison

As mentioned in the previous chapter, the delay element design used with the TFF consists of two inverter/capacitor combinations followed by two more inverters, creating a $\mathrm{t}_{\delta}=412 \mathrm{ps}$. This timing equates to eighteen chained minimum-sized inverters. The layout for the TFF delay element is shown below in Fig. V-1(a) and stretches for $6.090 \mu \mathrm{~m}$ in length. The 18 chained inverter layout is shown in Fig. V-1(b) and is $15.615 \mu \mathrm{~m}$ long. Both layouts are one standard cell height, $3.69 \mu \mathrm{~m}$.

Fig. V-1. Two delay element designs: (a) the design described in Chapter 4 and (b) an inverter chain providing the same delay.

These designs show that the chained inverters are over 2.5x the TFF delay element size. Next, energy consumption analysis was run on both chains at varying activity factors $(\alpha)$. Table IV shows these results.

TABLE IV
Energy Consumption Comparison For Two Delay Element Designs

| Activity factor $\alpha$ (%) | 0 | 10 | 20 | 30 | 40 |
| :---: | :---: | :---: | :---: | :---: | :---: |
| TFF energy (fJ) | 0.007 | 3.56 | 10.8 | 14.3 | 17.9 |
| INV Chain energy (fJ) | 0.006 | 4.65 | 13.5 | 17.7 | 22.3 |
| TFF normalized <br> to INV Chain | 1.20 | 0.766 | 0.797 | 0.810 | 0.804 |

From this table, it is noticeable that the TFF delay element consumes about $20 \%$ less power than the inverter chain between activity factors of $10-40 \%$. Even though the TFF design has large capacitors that need to be charged, the 18 inverters combine to surpass the large capacitor energy dissipation penalty. This shows that the design used in the TFF is more efficient in both size and power consumption than a standard inverter chain providing a comparable $\mathrm{t}_{\delta}$.
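The size and energy comparisons above can be recomputed from the quoted layout lengths and the Table IV entries. This sketch is only an arithmetic check with illustrative variable names. Because it works from the rounded table entries, the recomputed ratios can differ slightly from the normalized row, which was presumably derived from unrounded data.

```python
ALPHA = [0, 10, 20, 30, 40]                      # activity factor (%)
E_TFF_FJ = [0.007, 3.56, 10.8, 14.3, 17.9]       # TFF delay element
E_INV_FJ = [0.006, 4.65, 13.5, 17.7, 22.3]       # 18-inverter chain
TABLE_RATIO = [1.20, 0.766, 0.797, 0.810, 0.804] # normalized row of Table IV

# Both layouts are one standard cell high, so the length ratio is the area ratio.
length_ratio = 15.615 / 6.090

# Recompute TFF / INV chain energy ratios from the rounded table entries.
recomputed = [t / i for t, i in zip(E_TFF_FJ, E_INV_FJ)]

# Fractional energy saving of the TFF delay element for alpha of 10% to 40%.
savings = {a: 1 - r for a, r in zip(ALPHA, recomputed) if a > 0}
```

The length ratio comes out just above 2.5, and the savings for activity factors of 10% to 40% all fall near 20%, in line with the statements above.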

A final comparison is the effect of ion strikes incident to nodes within each delay element. Fig. V-2 shows charge collection on the first node within the TFF delay element (a) and the inverter chain (b).

Fig. V-2. Simulated ionized particle strike on the first node of (a) the delay element used in the TFF and (b) an 18-inverter chain delay element.

The charge collected is equal for both simulations. However, the negative pulse created in the TFF delay element is 365 ps while the pulse in the inverter chain is 275 ps. This difference is due to the lower inverter drive strength in the TFF delay element. The smaller inverters induce a lower current that in turn increases the time it takes to remove any collected charge. An example of how this increased SET length from the weak delay elements can be hazardous is seen in the TFF feedback delay element. If a large amount of charge is collected in this circuit element, as opposed to the setup nodes, there is a higher danger of $\mathrm{t}_{\text {SET }}$ exceeding $\mathrm{t}_{\delta}$ and an upset occurring.

## C. Timing Comparison

Timing data for each flip-flop was collected using the Synopsys characterization tool NCX. As previously described, this program characterizes both sequential and combinational logic, and these results provide hardened setup, hold, and clock-to-Q times. Each flip-flop's speed will be compared as a combination of the minimum clock period allowed ($\mathrm{t}_{\text{SETUP}}+\mathrm{t}_{\text{HOLD}}$) and the propagation delay after a clock edge $\left(\mathrm{t}_{\mathrm{CLK2Q}}\right)$. Table V summarizes the data collected for this section. Basic operation of the two Knudsen flip-flop circuits will be described in this section as well.

## 1) Majority Voter TDFF

The schematic for the majority voter temporal/DICE flip-flop (MTDFF) is shown in Fig. V-3.


Fig. V-3. Majority voter temporal/DICE flip-flop. (After [19])

Unlike the TFF, the temporal master latch in this design uses a delay element/majority voter combination in the feedback path and setup nodes, with delays of $0 \delta$, $1 \delta$, and $2 \delta$ temporally separating N2, MDb, and MDDb, respectively. The three inverters preceding and succeeding the delay elements decrease the loading on the input inverter as well as the delay elements. As shown in Chapter 2, majority voter inputs add large capacitances to nodes and these driving inverters reduce the loading effects.

The slave latch draws its four inputs from the $1 \delta$ and $2 \delta$ nodes, MDb and MDDb, as well as the two identical majority voter outputs driven by the setup nodes. The PMOS pass gates separating the two latches are driven by CLKb and show that this flip-flop is in a rising-edge triggered configuration. The output inverters are redundant to limit the effect that a disagreement between interlocked DICE nodes can have on the output through contention. Standard operation for this circuit is shown below.


Fig. V-4. Proper MTDFF operation at a 250 MHz clock frequency.

In this simulation, when D switches while the clock is low, the nodes MDb and MDDb switch after $1 \mathrm{t}_{\delta}$ and $2 \mathrm{t}_{\delta}$. M0 and M1 represent the same value in this circuit and switch following MDb, since the node N2 also drives the majority gates. Once the clock signal goes high, the output changes one gate delay after the DICE latch is written. There is a dip on M0 at the first clock edge due to some charge back-writing between the master and slave latches. Similarly, MDb and MDDb dip after the second rising clock edge for the same reason.

The NCX data collected shows that the hardened setup and hold times for this design are 1193 ps and -215 ps, respectively. The setup time is about $3 \mathrm{t}_{\delta}$, since the timing tool measures the time it takes for both the $1 \mathrm{t}_{\delta}$ and the $2 \mathrm{t}_{\delta}$ delay elements to stabilize. The measured CLK-to-Q time for the circuit is 145 ps. These times were measured with an inverter preceding the CLKb pin in order to create an input CLK pin similar to the other three flip-flops analyzed. Because of this, $\mathrm{t}_{\text{CLK2Q}}$ represents the propagation times of the CLK inverter, the output inverters, and the DICE latch being written.

## 2) C-Element TDFF

The final RHBD circuit being compared is derived from the MTDFF shown above and replaces the majority voters with C-elements in the temporal master latch. The DICE slave latch is identical to that of the MTDFF. Fig. V-5 shows the schematic for the C-element TDFF (CTDFF).


Fig. V-5. C-Element temporal/DICE flip-flop schematic. (After [43])

In this case, the master latch consists of delay element/C-element combinations in both the feed forward and feedback nodes of the bi-stable memory cell. The four DICE latch inputs are driven by the temporally separated inputs connected to each C-element, shown as nodes N1-N4. Charge back-writing from the slave latch to the master latch, as described in the previous chapter when addressing the necessity of INVBW, is an issue with this design. The inverters separating the master and slave latches prevent failures that could be caused by this back-writing. These inverters are not needed in the MTDFF design since majority voters do not enter a tri-state mode that would cause the output nodes to float.


Fig. V-6. Proper CTDFF operation at a 250 MHz clock frequency.

Fig. V-6 shows basic CTDFF operation in a falling-edge configuration. When the clock is low, the nodes N1 and N2 switch according to D with a $\mathrm{t}_{\delta}$ separation. Once N1 and N2 are equal, the first C-element begins to drive N3, and N4 switches one $\mathrm{t}_{\delta}$ later. When the latch is closed, the inverted values of N1, N2, N3, and N4 are passed to the DICE slave latch nodes X0, X2, X1, and X3, respectively. Q switches after one gate delay, about 25 ps.

Measured values for the CTDFF give a hardened $\mathrm{t}_{\text{SETUP}}$ of 1371 ps and a hardened $\mathrm{t}_{\mathrm{HOLD}}$ of -370 ps. The N1-N4 values must all be stabilized at the falling clock edge to keep the interlocked nodes X0-X3 from fighting each other and to write the DICE slave latch quickly. Thus, both delay elements must be stabilized for the setup time to be met. For the CTDFF, $\mathrm{t}_{\text{CLK2Q}}$ was found to be 120 ps, which is simply the DICE slave latch write time plus the output inverter propagation time.

## 3) Summary

When analyzed side by side, the MTDFF and CTDFF have very similar timing characteristics. The hardened setup times are approximately $3 \mathrm{t}_{\delta}$ for both designs, with the MTDFF reporting times a bit larger due to the inverters preceding and succeeding the delay elements in the temporal hardening circuitry. In both designs, a majority of the four temporally separated master latch outputs must be stabilized at the clock edge in order for the slave latch to be quickly written. The TFF only requires a hardened setup time of about $2 \mathrm{t}_{\delta}$ since there is no temporal hardening on the bi-stable memory cell feedback path and the slave latch input is drawn from the node SSetup.

The hardened hold times for the MTDFF and CTDFF are very close as well. Again, both of these times are much more negative than the TFF and unhardened flip-flop hold times, and this decrease can be attributed to the larger amount of master latch circuitry a logic value must propagate through before affecting the slave latch input.

Finally, the clock-to-Q times for the three hardened circuits are very similar. The 25 ps separation between the measured MTDFF and CTDFF times can be attributed to the inversion on the MTDFF CLK input. After the appropriate CLK level has been reached, the CTDFF and MTDFF $\mathrm{t}_{\text{CLK2Q}}$ times require the DICE latch to be written and the output inverters to switch. In the TFF, the transmission gate on the slave latch and the two output inverters must stabilize before Q switches. The timing data for all four flip-flops are shown in Table V.

TABLE V
Timing Comparison of Four D Flip-Flops

| Design | MTDFF | CTDFF | TFF | Unhardened |
| :---: | :---: | :---: | :---: | :---: |
| $\mathrm{T}_{\text{SETUP}}$ (ps) | 1193 | 1371 | 853 | 132 |
| $\mathrm{T}_{\text{HOLD}}$ (ps) | -215 | -370 | -59.9 | -54.7 |
| Max CLK (GHz) | 1.02 | 0.999 | 1.26 | 773 |
| $\mathrm{T}_{\text{CLK2Q}}$ (ps) | 145 | 120 | 133 | 96.5 |
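As a rough cross-check, the maximum clock entries for the three hardened designs can be reproduced by treating the sum of the hardened setup and hold times as the minimum clock period. The sketch below assumes that relationship, which is inferred from the description in this section rather than taken from the NCX tool; the function name is hypothetical.

```python
# Hardened (setup, hold) times in ps for the three RHBD designs, from Table V.
TIMING_PS = {
    "MTDFF": (1193.0, -215.0),
    "CTDFF": (1371.0, -370.0),
    "TFF": (853.0, -59.9),
}


def max_clock_ghz(t_setup_ps, t_hold_ps):
    """Assumed relation: f_max = 1 / (t_setup + t_hold).

    A period in ps corresponds to a frequency of 1000 / period in GHz.
    """
    window_ps = t_setup_ps + t_hold_ps
    return 1000.0 / window_ps


for name, (t_setup, t_hold) in TIMING_PS.items():
    print(f"{name}: {max_clock_ghz(t_setup, t_hold):.3f} GHz")
```

A negative hold time shortens the window during which D must be stable, which is why the CTDFF reaches nearly the same maximum frequency as the MTDFF despite its longer setup time.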

## D. Size Comparison

## 1) Majority Voter TDFF

Temporal hardening is known to be a high area impact solution for SET and SEU mitigation. The use of $\delta$ delay elements significantly increases the size of a design. Since the MTDFF requires three delay elements, the size penalty is substantial. Each delay element is $6.09 \mu \mathrm{~m}$ long and $3.69 \mu \mathrm{~m}$ high, giving a $67.4 \mu \mathrm{~m}^{2}$ penalty from the three delay elements alone. In contrast to the delay elements' size penalty, the DICE latch provides a compact layout of only $3.800 \mu \mathrm{~m} \times 3.690 \mu \mathrm{~m}$, or $14.02 \mu \mathrm{~m}^{2}$. The full layout for the MTDFF is shown in Fig. V-7.


Fig. V-7. MTDFF layout covering two cell heights.

The final layout spans two cell rows, extending $22.020 \mu \mathrm{~m}$ on the top row and $17.725 \mu \mathrm{~m}$ on the bottom row for $39.745 \mu \mathrm{~m}$ in total length. This creates an area penalty of $146.66 \mu \mathrm{~m}^{2}$. The three delay elements are noticeable by the large capacitors, two on the top row and one on the bottom. Interconnect tracks on layers higher than metal 1 are not shown in any layouts to improve image clarity.

## 2) C-element TDFF

The CTDFF is a significant size improvement over the MTDFF, reducing the 12-transistor majority gates to 4-transistor C-elements and removing one delay element. However, the addition of four inverters used to mitigate charge sharing between the master and slave latches creates an area penalty not found in the MTDFF. Again, the DICE slave latch only requires $14.0 \mu \mathrm{~m}^{2}$. Fig. V-8 shows the layout for this flip-flop.


Fig. V-8. Complete CTDFF layout.

The final cell length is $28.625 \mu \mathrm{~m}$ and the height is $3.690 \mu \mathrm{~m}$, creating a $105.6 \mu \mathrm{~m}^{2}$ footprint. This design is by far the most compact of the three temporal flip-flops. Two delay elements can be seen on the left side of the layout and the DICE latch is placed on the far right.

## 3) Summary

As expected, all three RHBD flip-flops are significantly larger than the unhardened D flip-flop. The CTDFF provided the most compact hardened solution at only 3.27x the unhardened flip-flop size. The TFF does not take advantage of a DICE slave latch and consequently shows a significant size increase. Finally, the MTDFF is the largest of the three RHBD designs because of the majority gates and three delay elements needed in the master latch hardening circuitry. It should be noted that the four delay elements used in the TFF comprise a much larger percentage of area, $86 \%$, than the two and three delay elements used in the CTDFF and MTDFF, $42.6 \%$ and $46.0 \%$ respectively. Table VI displays these results.

TABLE VI
Size Comparison of RHBD Flip-Flops, Per Bit

|  | MTDFF | CTDFF | TFF | Unhardened |
| :---: | :---: | :---: | :---: | :---: |
| $\mathrm{X}(\mu \mathrm{m})$ | 3.69 | 3.69 | 3.69 | 3.69 |
| $\mathrm{Y}(\mu \mathrm{m})$ | 39.745 | 28.625 | 36.80 | 8.75 |
| Area $\left(\mu \mathrm{m}^{2}\right)$ | 146.66 | 105.62 | 135.8 | 32.3 |
| Area from delay elements (%) | 46.0 | 42.6 | 86 | 0 |
| Size normalized to unhardened | 4.54 | 3.27 | 4.20 | 1 |
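The area figures for the Knudsen designs follow directly from the cell dimensions and the delay element footprint given earlier in this section. The sketch below is one way to re-derive them, assuming every cell and delay element shares the $3.69 \mu \mathrm{~m}$ cell height; the names are illustrative, and the delay element share is computed only for the MTDFF and CTDFF.

```python
CELL_HEIGHT_UM = 3.69
DELAY_ELEMENT_LENGTH_UM = 6.09

# Per-bit cell lengths from Table VI.
CELL_LENGTH_UM = {
    "MTDFF": 39.745,
    "CTDFF": 28.625,
    "TFF": 36.80,
    "Unhardened": 8.75,
}

# Delay element counts stated in the text for the two Knudsen designs.
DELAY_ELEMENTS = {"MTDFF": 3, "CTDFF": 2}


def area_um2(name):
    """Cell area as length times the common cell height."""
    return CELL_LENGTH_UM[name] * CELL_HEIGHT_UM


def normalized_size(name):
    """Cell area relative to the unhardened flip-flop."""
    return area_um2(name) / area_um2("Unhardened")


def delay_area_percent(name):
    """Share of the cell area occupied by delay elements, in percent."""
    delay_area = DELAY_ELEMENTS[name] * DELAY_ELEMENT_LENGTH_UM * CELL_HEIGHT_UM
    return 100.0 * delay_area / area_um2(name)


for name in ("MTDFF", "CTDFF"):
    print(name, round(area_um2(name), 2), round(delay_area_percent(name), 1))
```

The results agree with the tabulated values to within rounding of the published dimensions.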

## E. Power Comparison

The energy consumption simulations run on the MTDFF and CTDFF are identical to those used to analyze the TFF and unhardened flip-flop in the previous chapter. Tables VII and VIII show the energy dissipation results for $\alpha=0$ to $40 \%$, and Fig. V-9 plots the results for all four flip-flops.

TABLE VII
MTDFF POWER DISSIPATION ANALYSIS WITH COMPARISON TO AN UNHARDENED D FLIP-FLOP

| $\alpha(\%)$ | 0 | 10 | 20 | 30 | 40 |
| :---: | :---: | :---: | :---: | :---: | :---: |
| MTDFF energy (fJ) | 24.71 | 44.92 | 80.17 | 97.50 | 115.3 |
| Normalized to Unhardened | 1.13 | 1.85 | 2.62 | 2.84 | 3.13 |
| % from Delay Element | 0.0 | 24 | 40 | 44 | 47 |

TABLE VIII
CTDFF POWER DISSIPATION ANALYSIS WITH COMPARISON TO AN UNHARDENED D FLIP-FLOP

| $\alpha(\%)$ | 0 | 10 | 20 | 30 | 40 |
| :---: | :---: | :---: | :---: | :---: | :---: |
| CTDFF energy (fJ) | 5.69 | 16.73 | 41.22 | 53.88 | 64.92 |
| Normalized to Unhardened | 0.26 | 0.69 | 1.34 | 1.57 | 1.76 |
| % from Delay Element | 0.0 | 43 | 52 | 53 | 55 |

It is obvious that the MTDFF is the most power hungry of the three temporal designs for all of the activity factors tested. The temporal slave latch in the TFF substantially increases the design's power consumption over the CTDFF, which is by far the most efficient temporal flip-flop.

A $0 \%$ activity factor shows the power consumed when only the clock pin is switching. At this level, the CTDFF consumes substantially less than the other three flip-flops compared. One possibility is that this is due to the flip-flop's falling-edge configuration. In the MTDFF schematic, it can be seen that the inverted clock node drives substantially more transistors than in the CTDFF. This increases the capacitance that must be charged and discharged every clock phase, thus increasing the clock power.

At a $40 \%$ activity factor, the difference between the TFF and CTDFF can be approximated by the power consumption of two delay elements at that activity level. Similarly, the two majority voters and one additional delay element account for the increased power consumption of the MTDFF when compared to the CTDFF.
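The "% from Delay Element" rows of Tables VII and VIII appear consistent with multiplying the standalone TFF delay element energy from Table IV by the number of delay elements in each design. The sketch below works under that assumption, namely that each embedded delay element dissipates the same energy as the standalone element at the same activity factor; it is a cross-check, not the simulation method used to produce the tables.

```python
# Standalone TFF delay element energy (fJ) at alpha = 0..40%, from Table IV.
DELAY_ENERGY_FJ = [0.007, 3.56, 10.8, 14.3, 17.9]

# Flip-flop energy (fJ) at the same activity factors, from Tables VII and VIII.
FLOP_ENERGY_FJ = {
    "MTDFF": [24.71, 44.92, 80.17, 97.50, 115.3],
    "CTDFF": [5.69, 16.73, 41.22, 53.88, 64.92],
}

# Number of delay elements in each design, as stated in the size comparison.
DELAY_COUNT = {"MTDFF": 3, "CTDFF": 2}


def delay_share_percent(name):
    """Estimated share of flip-flop energy dissipated in the delay elements."""
    count = DELAY_COUNT[name]
    return [
        100.0 * count * delay / total
        for delay, total in zip(DELAY_ENERGY_FJ, FLOP_ENERGY_FJ[name])
    ]


for name in FLOP_ENERGY_FJ:
    print(name, [round(share) for share in delay_share_percent(name)])
```

Under this assumption, the delay elements account for roughly half of the total energy of both designs at higher activity factors, which is in line with the tabulated rows.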


Fig. V-9. Plot comparing the power consumption of an unhardened flip-flop to three temporally hardened flip-flops across a range of activity factors.

In most cases, power consumption and size are directly proportional. When the delay elements are introduced, this relationship is compounded by the two large capacitors from which a majority of the delay is drawn. Fig. V-9 reiterates the substantial power penalty of hardening through temporal methods, compared to unhardened circuitry, at high activity factors.

## F. Hardness Comparison

Unlike the TFF, the Knudsen flip-flops do not address SETs on every input node. The following simulations show how the two designs mitigate SETs on the input node, D. There will also be simulations displaying how a DICE latch nullifies SEUs and how the Knudsen latches can fail if an SET reaches the clock input node. As with the TFF, multiple node charge collection is not considered due to the utilization of spatial hardening in the final design's layout by implementing a multi-bit cell in the same fashion as the TFF.

As mentioned previously, all three flip-flops analyzed use the TFF delay element design, providing just over 400 ps for $\mathrm{t}_{\delta}$. Since temporal hardening relies on the temporal separation of nodes, these three flip-flops should have similar "hardness" levels. However, certain nodes on the MTDFF and CTDFF have increased transistor sizing which provides additional drive strength, such as N2 in the MTDFF, which drives the majority voters and an inverter. Simulations will be run at $\mathrm{t}_{\text{SET}}$ times below $\mathrm{t}_{\delta}$ for each design since, as described in Chapter 2, SETs greater than $\mathrm{t}_{\delta}$ will cause temporally hardened circuits to fail.


Fig. V-10. Simulated SET affecting the D input of the MTDFF.

In the simulation above, an SET duration of 400 ps is seen on the D input. The nodes N2, MDb, and MDDb switch and then reset with a $\mathrm{t}_{\delta}$ separation between each one. Since no two of those three nodes sustain a high logic value at the same moment, the nodes M0 (shown above) and M1 do not capture the incorrect SET value. Because of this, the stored master latch value does not deviate from the proper value and an upset is averted.
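The voting argument can be illustrated with an idealized timing check. The sketch below assumes rectangular pulses, zero gate delay, and three voter inputs that see the same SET at offsets of 0, $\mathrm{t}_{\delta}$, and $2 \mathrm{t}_{\delta}$; it is a behavioral illustration with hypothetical function names, not a substitute for the circuit simulation in Fig. V-10.

```python
def taps_disturbed(t_ps, t_set_ps, t_delta_ps, n_taps=3):
    """Count voter inputs carrying the SET value at time t_ps.

    Tap k sees the pulse during [k * t_delta, k * t_delta + t_set).
    """
    return sum(
        1
        for k in range(n_taps)
        if k * t_delta_ps <= t_ps < k * t_delta_ps + t_set_ps
    )


def majority_upset(t_set_ps, t_delta_ps, n_taps=3):
    """Return True if at least two taps ever carry the SET value at once."""
    end_ps = (n_taps - 1) * t_delta_ps + t_set_ps
    return any(
        taps_disturbed(t, t_set_ps, t_delta_ps, n_taps) >= 2
        for t in range(end_ps)
    )


# A 400 ps SET with the 412 ps delay element never overlaps on two taps.
print(majority_upset(400, 412))
# A pulse longer than the delay overlaps on adjacent taps and wins the vote.
print(majority_upset(500, 412))
```

In this simplified model the voter output is disturbed only when $\mathrm{t}_{\text{SET}}$ exceeds $\mathrm{t}_{\delta}$, matching the failure condition described in Chapter 2.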


Fig. V-11. SET seen on the CLK node of an MTDFF causing an upset.

However, when an SET occurs on the CLK node, there is a possibility of an upset occurring. An example of this is shown in Fig. V-11, where an SET of $\mathrm{t}_{\mathrm{SET}}=260$ ps brings the clock node high. In this example, the value stabilized in the master latch is the opposite of that stored in the DICE slave latch. When the clock goes high, the DICE latch is written and the output switches after one gate delay. This simulation shows that any sustained positive SET on the CLK node can cause an upset in this temporal master/DICE slave design.

The CTDFF mitigates SETs and SEUs in an almost identical fashion to the MTDFF. When an SET with $\mathrm{t}_{\text{SET}}=400 \mathrm{ps}$ occurs on the D input, the nodes N1 and N2 pulse with a $\mathrm{t}_{\delta}$ separation and a width of $\mathrm{t}_{\text{SET}}$. At no time do they both carry the SET value, thus keeping N3 from changing state and preventing the SET from propagating through the latch. This keeps an upset from occurring in the master latch that would be passed to the slave at the falling clock edge.
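The filtering behavior of a C-element fed by an input and its delayed copy can be sketched with a simple time-stepped model. The code below assumes a non-inverting C-element, 1 ps time steps, ideal rectangular waveforms, and zero gate delay; the function names are hypothetical and the model ignores the inversions and drive strengths of the actual CTDFF.

```python
def c_element(a, b, previous):
    """Ideal C-element: follow the inputs when they agree, otherwise hold."""
    return a if a == b else previous


def simulate(d_wave, t_delta_ps):
    """Drive a C-element with d_wave and a copy delayed by t_delta_ps.

    d_wave is a list of 0/1 samples taken every 1 ps.
    """
    output = []
    state = d_wave[0]
    for t, direct in enumerate(d_wave):
        delayed = d_wave[t - t_delta_ps] if t >= t_delta_ps else d_wave[0]
        state = c_element(direct, delayed, state)
        output.append(state)
    return output


def pulse(width_ps, start_ps=100, total_ps=2000):
    """Rectangular 0-1-0 pulse used to model an SET on the input."""
    return [1 if start_ps <= t < start_ps + width_ps else 0 for t in range(total_ps)]


# A 400 ps SET is filtered by a 412 ps delay; a 500 ps SET is not.
print(max(simulate(pulse(400), 412)))
print(max(simulate(pulse(500), 412)))
```

The same model shows the cost of this hardening: a legitimate input transition reaches the output only after $\mathrm{t}_{\delta}$ has elapsed, which is the origin of the setup time penalty discussed in the timing comparison.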


Fig. V-12. Simulated SET on the D input of the CTDFF.


Fig. V-13. Simulated SET on the CLK node of a CTDFF.

As with the MTDFF, this design can also upset if an SET occurs on the CLK node. The simulation in Fig. V-13 depicts such an event when the value stored in the master latch is opposite that of the slave latch. When the clock goes low (since this latch is in a falling-edge triggered configuration) for the duration of the propagating SET, 180 ps, the slave latch is written and the output switches one gate delay later. In this situation, if the CLK SET has a duration longer than the time it takes to write the DICE latch, an upset will occur.

While all three hardened flip-flops analyzed in this thesis mitigate D-input SETs and SEUs at the same level, only the TFF is hardened against all control signal and CLK node SETs. This allows the TFF design to be integrated with standard CAD tool generated clock trees.

## G. Conclusions

Through the three comparisons provided in this chapter, a benchmark for temporally hardened flip-flops was derived. The CTDFF was shown to be an improvement over the MTDFF in size and power consumption without sacrificing speed or temporal hardness. When compared to the TFF, the CTDFF is more compact and power efficient but is outperformed in speed and "hardness" measurements.

Using the C-element/delay element combination in only the feed forward path of the latch memory cell greatly increases the speed of the TFF over the CTDFF by approximately one $\mathrm{t}_{\delta}$. The temporal slave latch found in the TFF surpasses the hardening capabilities of a DICE latch when paired with a temporal master. All three of the temporal designs display the large size and power consumption along with the slow operating speeds that are expected with temporal RHBD techniques.

## VI. CONCLUSION

An innovative RHBD D flip-flop has been presented. This design combines two temporally hardened latches to mitigate SETs on all input nodes as well as SEUs on any internal nodes and requires only one C-element/delay element combination per latch. The delay elements used in the temporal hardening schematic consist of low current drive inverters followed by large capacitances to maximize the propagation time of pulses passing through the circuitry.

By comparing the TFF to an unhardened D flip-flop, it was found that the delay elements caused a severe penalty in size, power consumption and speed. The TFF was found to be 4.2x the size of the unhardened flip-flop per bit and, while it consumed less power than the unhardened version at low activity factors, an activity factor of $40 \%$ results in 2.8x the power dissipation of the unhardened flip-flop. The delay elements also increase the setup time of the flip-flop by $1 \mathrm{t}_{\delta}$ and an additional $1 \mathrm{t}_{\delta}$ was shown to be required for a "hardened" setup time. This creates a total speed penalty of $2 \mathrm{t}_{\delta}$ for the TFF over the unhardened flip-flop. Finally, a multi-bit cell was created and placed in both the Synthesis and APR methodologies. The resulting block was 1.75x the size of an unhardened version and $24 \%$ slower. However, the transistor density provided by the multi-bit cell translated to the generated blocks, allowing for a $13 \%$ increase in cell density.

Two temporal/DICE master-slave flip-flops designed by Knudsen [19] were analyzed in the same fashion to compare the TFF design with other hardened flip-flops. The MTDFF was found to be slower, larger, and more power consuming than the TFF. However, the CTDFF used the compact layout of a DICE latch and only two delay elements to provide a total area impact that is $22 \%$ less than the TFF. Additionally, this design is $47 \%$ more efficient than the TFF. However, the C-element/delay element combinations in both storage cell paths increase the setup time for the CTDFF by another $\mathrm{t}_{\delta}$, making it about $1 \mathrm{t}_{\delta}$ larger than that of the TFF. Finally, while the MTDFF and CTDFF mitigate SEUs and SETs on the D node effectively, both are subject to failures if an SET occurs on the clock node. The TFF was shown not to fail in these situations.

While the TFF is limited in application to relatively large, high power designs, the complete SET and SEU hardness ensures proper operation when addressing soft errors. Since the weakness of the temporal hardening employed in both the master and slave latches lies in the delay elements, further research should focus on this area to create a low power, compact delay element.

## REFERENCES

[1] N. H. E. Weste, D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, Boston, MA; Pearson Education, Inc, 2005.
[2] R. J. Baker, CMOS Circuit Design, Layout, and Simulation, Hoboken, New Jersey; John Wiley \& Sons, Inc., 2004.
[3] "Radiation," Def. 1a. Webster's Encyclopedic Unabridged Dictionary of the English Language, 1995.
[4] G. C. Messenger and M. S. Ash, Single Event Phenomena, New York, NY; Chapman \& Hall, 1997.
[5] V. Liberali, "A radiation-hardened-by-design SRAM memory in commercial CMOS technology," presented at CMOS Emerging Technologies, May, 2010
[6] F. W. Agrawal, "Single Event Upset: An Embedded Tutorial," International Conference on VLSI Design, pp. 429-434, Feb. 2008.
[7] M. J. Gadlage, et al., "Single Event Transient Pulse Widths in Digital Microcircuits," Trans. Nuc. Sci., pp. 3285-3290, vol. 51, no. 6, Dec. 2004.
[8] B. Narasimham, et al., "Characterization of Digital Single Event Transient Pulse-Widths in $130-\mathrm{nm}$ and $90-\mathrm{nm}$ CMOS Technologies," Trans. Nuc. Sci., pp. 2506-2511, vol. 54, no. 6, Dec. 2007.
[9] P. E. Dodd, et al., "Production and Propagation of Single-Event Transients in High-Speed Digital Logic ICs," Trans. Nuc. Sci., pp. 3278-3284, vol. 51, no. 6, Dec. 2004.
[10] G. Bruguier, J. M. Palau, "Single Particle-Induced Latchup," Trans. Nuc. Sci., pp. 522-532, vol. 43, no. 2, April, 1996.
[11] F. W. Sexton, "Destructive Single-Event Effects in Semiconductor Devices and ICs," Trans. Nuc. Sci., pp. 603-621, vol. 50, no. 3, June, 2003.
[12] Yu, F.X., et al., "Overview of radiation hardening techniques for IC design," Inform. Technol. Jour., pp. 1068-1080, vol. 6, 2010.
[13] T. Hoang, et al., "A Radiation Hardened 16-Mb SRAM for Space Applications," IEEE Aerospace Conference, pp. 1-6, March, 2007.
[14] J.M. Slaughter, et al., "Magnetic Tunnel Junction Materials for Electronic Applications," JOM, vol. 52, no. 6, June, 2000.
[15] S. Kim, et al., "Thermal Stability of Magnetic Tunnel Junctions with $\mathrm{FeO}_{\mathrm{x}}$ Doped Tunnel Barrier," IEEE Trans. on Mag., pp. 2284-2286, vol. 40, no. 4, July, 2004.
[16] K. J. Hass, "Magnetic Flip Flops for Space Applications," Trans. Nuc. Sci., pp. 2751-2753, vol. 42, no. 10, Dec. 2006.
[17] H. Zhenfeng, L. Huanguo, "A Novel Radiation Hardened by Design Latch," Journal of Semiconductors, vol. 30, no. 3, March, 2009.
[18] D. Mavis, P. Eaton, "Soft error rate mitigation techniques for modern microcircuits," Proc. IEEE Reliability Physics Symposium, pp. 216-225, 2002
[19] J. Knudsen, L. T. Clark, "Area and Power Efficient Radiation Hardened by Design Flip-Flop," Trans. Nuc. Sci., pp. 3392-3399, vol. 53, no. 6, Dec. 2006.
[20] R. L. Shuler, et al., "The Effectiveness of TAG or Guard-Gates in SET Suppression Using Delay and Dual-Rail Configurations at $0.35 \mu \mathrm{~m}$," Trans. Nuc. Sci., pp. 3428-3431, vol. 53, no. 6, Dec. 2006.
[21] R. L. Shuler, et al., "SEU Performance of TAG Based Flip-Flops," Trans. Nuc. Sci., pp. 2550-2553, vol. 52, no. 6, Dec. 2005.
[22] B. Narasimham, et al., "Extended SET Pulses in Sequential Circuits Leading to Increased SE Vulnerability," Trans. Nuc. Sci., pp. 3077-3081, vol. 55, no. 6, Dec. 2008.
[23] J. Benedetto, et al., "Digital Single Event Transient Trends with Technology Node Scaling," Trans. Nuc. Sci., pp. 3462-3465, vol. 53, no. 6, Dec. 2006.
[24] J. Benedetto, et al. "Heavy ion induced digital single-event transients in deep submicron processes" Trans. Nuc. Sci. vol. 53, no. 6, Dec. 2006.
[25] T. Calin, et al., "Upset hardened memory design for submicron CMOS technology," IEEE Trans. Nuc. Sci., pp. 2874-2878, vol. 43, Dec. 1996.
[26] O. A. Amusan, et al. "Single Event Upsets in a 130 nm Hardened Latch Design Due To Charge Sharing" $45^{\text {th }}$ Annual International Reliability Physics Symposium, Phoenix, AZ, April, 2007.
[27] K. Warren, et al. "Heavy Ion Testing and Single Event Upset Rate Prediction Considerations for a DICE Flip-Flop," Trans. Nuc. Sci., vol. 56, no. 6, Dec. 2009.
[28] S. H. Lin, H. Z. Yang, "Reliable SR latches design using local redundancy," Electronic Letters, vol. 43, no. 2, Jan. 2007.
[29] L. Wang, et al., "Low-Overhead SEU-Tolerant Latches," International Conference on Microwave and Millimeter Wave Technology, pp. 1-4, April, 2007.
[30] W. Wang, "High Performance Radiation Hardened Register Cell, Design on Standard CMOS Process," Conference on Electron Devices and Solid-State Circuits, pp. 513-515, Dec. 2003.
[31] G. Niu, et al. "A Comparison of SEU Tolerance in High-Speed SiGe HBT Digital Logic Designed With Multiple Circuit Architectures," Trans. Nuc. Sci., pp. 3107-3114, vol. 49, no. 6, Dec. 2002.
[32] W. Wang, H. Gong, "Sense Amplifier Based RADHARD Flip Flop Design," Trans. Nuc. Sci., pp. 3811-3815, vol. 51, no. 6, Dec. 2004.
[33] Y. Sasaki, et al., "Soft Error Masking Circuit and Latch Using Schmitt Trigger Circuit," Int. Symp. on DFT, pp. 327-335, Dec. 2006.
[34] S. Lin, et al. "Soft-Error Hardening Designs of Nanoscale CMOS Latches," VLSI Test Symposium, pp. 41-46, May, 2009.
[35] C. Carmichael, et al., "SEU Mitigation Techniques for Virtex FPGAs in Space Applications," Xilinx, Inc. [Online], Available : http://www.xilinx.com/esp/mil_aero/collateral/presentations/SEU_mitigation_technique.pdf .
[36] R. Naseer, J. Draper, "DF-DICE: A Scalable Solution for Soft Error Tolerant Circuit Design," ISCAS, pp. 4, Sept. 2006.
[37] D. Blum, et al. "Delay and Energy Analysis of SEU and SET-Tolerant Pipeline Latches and Flip-Flops," Trans. Nuc. Sci., pp. 1618-1628, vol. 56, no. 3, June 2009.
[38] H. J. Barnaby, et al., "Modeling Ionizing Radiation Effects in Solid State Materials and CMOS Devices," Trans. Nuc. Sci., pp. 1870-1883, vol. 56, no. 8, Aug. 2009.
[39] D. Fulkerson, et al., "Modeling Ion-Induced Pulses in Radiation-Hard SOI Integrated Circuits" Trans. Nuc. Sci., pp. 1406-1415, vol. 54, no. 4, Aug. 2007.
[40] D. Kobayashi, et al. "Analytical Expression for Temporal Width Characterization of Radiation-Induced Pulse Noise in SOI CMOS Logic Gates," Reliability Physics Symp., pp. 165-169, April, 2009
[41] T. A. Fjeldly, et al. "Modeling of High-Dose-Rate Transient Ionizing Radiation Effects in Bipolar Devices," Trans. Nuc. Sci., pp. 1721-1730, vol. 48, no. 5, Oct. 2001.
[42] D. Hansen, et al., "Clock, Flip-Flop, and Combinatorial Logic Contributions to the SEU Cross Section in 90 nm ASIC Technology," Trans. Nuc. Sci., pp. 3542-3550, vol. 56, no. 6, Dec. 2009.
[43] B. Matush, T. Mozdzen, L. T. Clark, "Area Efficient Temporally Hardened by Design FlipFlop Circuits," Presented at the 2010 NSREC, Denver, CO, July 2010.

