

**ITU-T**

TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU



## SERIES K: PROTECTION AGAINST INTERFERENCE

# ITU-T K.131 – Soft error measures for field programmable gate arrays

ITU-T K-series Recommendations - Supplement 11



#### **Supplement 11 to ITU-T K-series Recommendations**

#### ITU-T K.131 – Soft error measures for field programmable gate arrays

#### **Summary**

Supplement 11 to ITU-T K-series Recommendations describes soft error mitigation for field programmable gate arrays (FPGAs). FPGAs are a mainstream type of recent large-scale integrated circuit (LSI), and many FPGAs are used as the main component in communication equipment. First, this Supplement describes trends in soft error rates that accompany the miniaturization of semiconductor manufacturing process rules, and outlines the mitigation techniques, such as materials, physical layout and design tools, that FPGA vendors provide to users. Second, this Supplement discusses the design methodology of communication equipment, including consideration of reliability specifications when these mitigation measures are used. Finally, this Supplement discusses recent trends in mitigation measures for FPGAs.

#### History

| Edition | Recommendation    | Approval   | Study Group | Unique ID*         |
|---------|-------------------|------------|-------------|--------------------|
| 1.0     | ITU-T K Suppl. 11 | 2017-11-22 | 5           | 11.1002/1000/13475 |
| 2.0     | ITU-T K Suppl. 11 | 2018-09-21 | 5           | 11.1002/1000/13793 |

#### Keywords

Error correction, FPGA, soft error.

<sup>\*</sup> To access the Recommendation, type the URL http://handle.itu.int/ in the address field of your web browser, followed by the Recommendation's unique ID. For example, <u>http://handle.itu.int/11.1002/1000/11830-en</u>.

#### FOREWORD

The International Telecommunication Union (ITU) is the United Nations specialized agency in the field of telecommunications, information and communication technologies (ICTs). The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Assembly (WTSA), which meets every four years, establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on these topics.

The approval of ITU-T Recommendations is covered by the procedure laid down in WTSA Resolution 1.

In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

#### NOTE

In this publication, the expression "Administration" is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

Compliance with this publication is voluntary. However, the publication may contain certain mandatory provisions (to ensure, e.g., interoperability or applicability) and compliance with the publication is achieved when all of these mandatory provisions are met. The words "shall" or some other obligatory language such as "must" and the negative equivalents are used to express requirements. The use of such words does not suggest that compliance with the publication is required of any party.

#### INTELLECTUAL PROPERTY RIGHTS

ITU draws attention to the possibility that the practice or implementation of this publication may involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the publication development process.

As of the date of approval of this publication, ITU had not received notice of intellectual property, protected by patents, which may be required to implement this publication. However, implementers are cautioned that this may not represent the latest information and are therefore strongly urged to consult the TSB patent database at <u>http://www.itu.int/ITU-T/ipr/</u>.

#### © ITU 2018

All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior written permission of ITU.

#### **Table of Contents**

| Clause | Title                                                                                   |
|--------|-----------------------------------------------------------------------------------------|
| 1      | Scope                                                                                   |
| 2      | References                                                                              |
| 3      | Definitions                                                                             |
| 3.1    | Terms defined elsewhere                                                                 |
| 3.2    | Terms defined in this Supplement                                                        |
| 4      | Abbreviations and acronyms                                                              |
| 5      | Conventions                                                                             |
| 6      | Mitigation measures of FPGAs                                                            |
| 6.1    | Soft error mitigation measures and transition of improvement contents and their effects |
| 6.2    | Soft error mitigation measures                                                          |
| 6.3    | Example of execution for improving SR/MR                                                |
| 6.4    | Technology trends                                                                       |
|        | Bibliography                                                                            |

#### Introduction

The field programmable gate array (FPGA) is a mainstream type of recent large-scale integrated circuit (LSI). The configuration data that determines the circuit configuration of an FPGA is stored in static random access memory (SRAM) so that arbitrary functions can be implemented. For that reason, an FPGA may be susceptible to soft errors, which can have a large influence on the system. It is therefore important to incorporate mitigation measures in the circuit design stage for an FPGA. This Supplement summarizes design examples of soft error mitigation measures for SRAM-type FPGAs.


#### 1 Scope

This Supplement describes recent trends in field programmable gate arrays (FPGAs) that show increasing soft error rates, outlines device-level mitigation measures such as physical layout and software tools, and provides guidance on mitigation measures to be taken when communication equipment is designed. This Supplement provides additional information to support understanding of [ITU-T K.131].

#### 2 References

[ITU-T K.131] Recommendation ITU-T K.131 (2018), Design methodologies for telecommunication systems applying soft error measures.

#### 3 Definitions

#### 3.1 Terms defined elsewhere

None.

#### 3.2 Terms defined in this Supplement

None.

#### 4 Abbreviations and acronyms

This Supplement uses the following abbreviations and acronyms:

| Abbreviation | Meaning                             |
|--------------|-------------------------------------|
| BRAM         | Block Random Access Memory          |
| CRAM         | Configuration Random Access Memory  |
| DRAM         | Dynamic Random Access Memory        |
| ECC          | Error Correction Code               |
| FET          | Field Effect Transistor             |
| FinFET       | Fin Field Effect Transistor         |
| FIT          | Failure in Time                     |
| FPGA         | Field Programmable Gate Array       |
| IC           | Integrated Circuit                  |
| LSI          | Large-Scale Integrated Circuit      |
| MR           | Maintenance Reliability             |
| RoHS         | Restriction of Hazardous Substances |
| ROM          | Read Only Memory                    |
| SEU          | Single Event Upset                  |
| SR           | Service Reliability                 |
| SRAM         | Static Random Access Memory         |
| ULA          | Ultra-Low Alpha                     |

#### 5 Conventions

None.

#### 6 Mitigation measures of FPGAs

#### 6.1 Soft error mitigation measures and transition of improvement contents and their effects

In typical FPGAs, the static random access memory (SRAM) cell structure has been used for both the configuration random access memory (CRAM), i.e., the configuration memory, and the block random access memory (BRAM), i.e., the user memory. Because SRAM cells are susceptible to soft errors, mitigation measures against soft errors have been required.



Figure 6.1-1 – Example of CRAM soft error rate vs. feature size (technology node) (See [b-Xilinx-UG116])



Figure 6.1-2 – Example of configuration memory capacity and device soft error rate trend (See [b-Xilinx-UG116] and [b-Xilinx-Site])

Figure 6.1-1 shows an example of the CRAM soft error rate vs. feature size (technology node). As shown in Figure 6.1-2, the FPGA vendor implemented various mitigation measures, and the soft error rate has improved since the 150 nm feature size. Products using the 20 nm and 16 nm feature sizes achieved a greater improvement in soft error rate than the conventional trend curve. Although the 20 nm product uses planar field effect transistors (FETs), the circuit countermeasures adopted for it had a remarkable effect. The adoption of fin field effect transistors (FinFETs) in the 16 nm product also appears to have had a significant effect.

These improvements are due to progress in investigating the influence of soft errors on FPGAs, as well as to the expanding application areas of FPGAs, which include not only various terrestrial systems but also aerospace systems such as the Mars Rover.

Figure 6.1-2 shows, based on published information, the configuration memory capacity of the mainstream product at each feature size (design rule) and the resulting device soft error rate trend. The device soft error rate tends to increase because the number of CRAM bits grows as the FPGA circuit scale gradually increases. In 20 nm and 16 nm FPGAs, however, the improvement achieved by circuit countermeasures and by the change in transistor structure (from planar FET to FinFET) outweighed the increase in CRAM bits, and this is why the device soft error rate improved. The effect is most conspicuous in the 16 nm product, whose improvement is greater than that of the 20 nm product.
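As a rough illustration of the trend described above, the device soft error rate can be approximated as the per-megabit CRAM rate [FIT/Mb] multiplied by the configuration memory capacity [Mb]. The Python sketch below assumes that all CRAM bits are equally susceptible; the function name and the numbers are hypothetical and are not taken from [b-Xilinx-UG116].

```python
def device_fit(fit_per_mbit: float, cram_mbit: float) -> float:
    """Device CRAM soft error rate in FIT (failures per 10^9 device-hours).

    First-order model: every CRAM bit is assumed equally susceptible, so the
    device rate grows linearly with configuration memory capacity.
    """
    return fit_per_mbit * cram_mbit


# Hypothetical values (NOT vendor data): the newer node doubles the CRAM
# capacity but cuts the per-Mb rate by a factor of four.
older = device_fit(fit_per_mbit=100.0, cram_mbit=200.0)
newer = device_fit(fit_per_mbit=25.0, cram_mbit=400.0)
print(f"older node: {older:.0f} FIT, newer node: {newer:.0f} FIT")
```

In this hypothetical case the per-bit improvement (a factor of four) outweighs the capacity growth (a factor of two), so the device rate falls, as it did for the 20 nm and 16 nm products. When capacity grows faster than the per-bit rate improves, the device rate rises, which is the conventional trend.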

However, as telecommunication systems require higher performance each year, the FPGA has become a core unit of telecommunication systems and the number of FPGAs used has increased. Although the device soft error rate has been improved by the FPGA vendors' mitigation measures, the influence of soft errors on telecommunication systems may not have decreased. Therefore, soft error mitigation measures are still required.
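The system-level concern can be made concrete with the definition of FIT (one failure per 10<sup>9</sup> device-hours). The system rate is the sum of the device rates, so using more FPGAs per system can offset a lower per-device rate. The sketch below assumes independent events and uses hypothetical device rates and counts.

```python
from dataclasses import dataclass


@dataclass
class FpgaType:
    name: str
    device_fit: float  # CRAM soft error rate per device, in FIT
    count: int         # number of such FPGAs in the system


def system_fit(parts: list[FpgaType]) -> float:
    """Sum of device rates; assumes soft error events are independent."""
    return sum(p.device_fit * p.count for p in parts)


def mtbf_hours(fit: float) -> float:
    """Mean time between soft error events in hours (1 FIT = 1 event per 1e9 h)."""
    return 1e9 / fit


# Hypothetical system: each new device has half the rate, but three times as many are used.
before = [FpgaType("gen-A", device_fit=20_000.0, count=4)]
after = [FpgaType("gen-B", device_fit=10_000.0, count=12)]

for label, parts in (("before", before), ("after", after)):
    fit = system_fit(parts)
    print(f"{label}: {fit:.0f} FIT, MTBF {mtbf_hours(fit):.0f} h")
```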

The major measures of soft error rate reduction/mitigation that have been applied are discussed in the following clauses.

#### 6.1.1 Improvement example on semiconductor IC wafer process material

Boron compounds, which have been used as semiconductor integrated circuit (IC) wafer process materials, contain the isotopes <sup>11</sup>B and <sup>10</sup>B. When a thermal neutron strikes <sup>10</sup>B, an alpha ray is generated, which can cause soft errors. Soft error mitigation measures for semiconductor ICs have been developed, such as minimizing the <sup>10</sup>B content by purifying the material, or introducing semiconductor IC manufacturing techniques that do not use boron compounds. Similar measures have been attempted in FPGA products.
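The alpha-emitting reaction referred to above is the capture of a thermal neutron by <sup>10</sup>B. This capture splits the nucleus into a lithium-7 nucleus and an alpha particle (a helium-4 nucleus). Both mass number (10 + 1 = 7 + 4) and charge (5 = 3 + 2) balance:

$$
{}^{10}\mathrm{B} + n_{\text{th}} \rightarrow {}^{7}\mathrm{Li} + {}^{4}\mathrm{He}\,(\alpha)
$$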

#### 6.1.2 Improvement in semiconductor IC packaging materials

Soft errors in semiconductor ICs were first recognized as a problem in dynamic random access memory (DRAM). The main cause was alpha rays emitted by trace amounts of radioactive isotopes contained in the IC packaging materials. As a mitigation measure, ceramic materials with a reduced radioactive isotope content were introduced. Other measures have included changing the filler in mold compounds from natural raw materials to materials produced from radioactive-isotope-free sources, and using polyimides for alpha ray shielding. Similar mitigation measures have been taken for the IC packaging materials used in FPGAs. However, alpha rays from alpha ray sources contained in the solder bump and underfill materials used in flip chip packages must also be considered. Mitigation measures therefore reduce the alpha ray sources in these materials, and ultra-low alpha ray (ULA) materials are currently used for advanced FPGAs.

When soft error mitigation measures are considered in equipment development, the soft error rates caused by neutron rays and by alpha rays should be combined at the design stage. However, because irradiating equipment with accelerated alpha rays is difficult to implement for soft error evaluation, only neutron irradiation testing using an accelerator-driven neutron source is performed.
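One common way to turn an accelerated neutron test into a field rate has two steps. First, compute the device cross-section (observed upsets divided by test fluence). Second, multiply it by the neutron flux at the installation site. The sketch below follows that approach under several assumptions. The default flux of 13 n/cm<sup>2</sup>/h is a commonly quoted sea-level reference value for neutrons above 10 MeV. The alpha ray contribution is assumed to come from another source, such as vendor data, because alpha irradiation testing is not performed. All numbers are hypothetical.

```python
def neutron_fit(upsets: int, fluence_n_per_cm2: float,
                field_flux_n_per_cm2_h: float = 13.0) -> float:
    """Scale an accelerated neutron test result to a field FIT rate.

    cross-section [cm^2] = observed upsets / test fluence [n/cm^2]
    FIT = cross-section * field flux [n/(cm^2 h)] * 1e9 h
    The default flux is an assumed sea-level reference value; replace it
    with the flux for the actual installation site.
    """
    if fluence_n_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    cross_section_cm2 = upsets / fluence_n_per_cm2
    return cross_section_cm2 * field_flux_n_per_cm2_h * 1e9


def combined_fit(neutron: float, alpha: float) -> float:
    """Total rate, assuming neutron and alpha contributions are independent."""
    return neutron + alpha


# Hypothetical numbers: 50 upsets at a fluence of 1e11 n/cm^2, plus an
# alpha-ray contribution of 30 FIT assumed to come from vendor data.
n_fit = neutron_fit(upsets=50, fluence_n_per_cm2=1e11)
print(f"neutron: {n_fit:.1f} FIT, total: {combined_fit(n_fit, 30.0):.1f} FIT")
```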

#### 6.1.3 SRAM layout improvement

#### Soft error of SRAM



Figure 6.1-3 – Soft error mechanism due to neutron ray in SRAM (See [b-Xilinx-WP395])

As shown in Figure 6.1-3, soft errors in SRAM occur when charged particles, produced by the collision of a neutron with silicon atoms, reach the nodes of the SRAM cell. The soft error tolerance of SRAM has been improved by methods that suppress logical inversion, such as increasing the critical charge and increasing the load capacitance of the corresponding node.
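To a first approximation, the critical charge of a node is the product of its capacitance and the supply voltage. A strike upsets the cell when the collected charge exceeds this critical charge. The simplified model below ignores the restoring current of the cell and other second-order effects, and shows why adding load capacitance suppresses logical inversion. All values are hypothetical.

```python
def critical_charge_fc(node_capacitance_ff: float, supply_v: float) -> float:
    """First-order critical charge estimate, Q_crit ~= C_node * V_DD.

    With capacitance in femtofarads and voltage in volts, the result is in
    femtocoulombs (1 fF * 1 V = 1 fC).
    """
    return node_capacitance_ff * supply_v


def upsets(collected_fc: float, q_crit_fc: float) -> bool:
    """A strike flips the stored value if it deposits more than Q_crit."""
    return collected_fc > q_crit_fc


# Hypothetical values: adding 0.5 fF of load capacitance to a 1.0 fF node at
# 0.8 V raises Q_crit from 0.8 fC to 1.2 fC. As a result, a 1.0 fC strike that
# upsets the original cell no longer upsets the hardened one.
base = critical_charge_fc(1.0, 0.8)
hardened = critical_charge_fc(1.5, 0.8)
print(upsets(1.0, base), upsets(1.0, hardened))  # True False
```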

#### 6.1.4 SRAM circuit layout improvement

The miniaturization of semiconductor ICs has increased the possibility that several adjacent SRAM cells will suffer soft errors at the same time (a multi-bit error). If a multi-bit error occurs among bits that are read simultaneously, the error detection and correction code may not be able to detect and correct it properly. For this reason, it has become more common to use a more powerful error detection and correction code that can correct a multi-bit error directly. It has also become more common to adopt a bit-interleaved configuration, which distributes the bits that are read simultaneously across multiple, different words.
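The effect of bit interleaving can be shown with a simple mapping in which physically adjacent cells in a row belong to different words. The sketch below assumes a column-modulo mapping, which is one of several possible interleaving schemes, and a hypothetical three-cell multi-bit upset.

```python
def interleaved_location(physical_col: int, words_per_row: int) -> tuple[int, int]:
    """Map a physical column to (word index, bit index) in a bit-interleaved row.

    Adjacent physical cells belong to different words: cell 0 -> word 0,
    cell 1 -> word 1, ..., cell k -> word 0 again.
    """
    return physical_col % words_per_row, physical_col // words_per_row


def errors_per_word(upset_cols: list[int], words_per_row: int) -> dict[int, int]:
    """Count how many upset bits land in each logical word."""
    counts: dict[int, int] = {}
    for col in upset_cols:
        word, _bit = interleaved_location(col, words_per_row)
        counts[word] = counts.get(word, 0) + 1
    return counts


burst = [10, 11, 12]  # hypothetical multi-bit upset of three adjacent cells

# One word per row (no interleaving): the whole burst hits a single word.
print(errors_per_word(burst, words_per_row=1))  # {0: 3}
# 4-way interleaving: each affected word sees only a single-bit error.
print(errors_per_word(burst, words_per_row=4))  # {2: 1, 3: 1, 0: 1}
```

With interleaving, each affected word sees a single-bit error that a single-error-correcting code can correct. This holds as long as the upset spans no more cells than the interleave degree.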

#### 6.2 Soft error mitigation measures

FPGA vendors provide various mitigation measures to reduce the influence of soft errors. The mitigation measures for BRAM and CRAM are explained in this clause.

#### 6.2.1 Soft error mitigation measures for BRAM

For BRAM used as user memory, data errors from soft errors can be mitigated by using error correction code (ECC) functions, which correct errors in read data.



Figure 6.2-1 – Example of soft error mitigation measures for BRAM by ECC (See [b-Altera] and [b-Xilinx-UG473])

NOTE – The error correction function implemented in the BRAM corrects only the data that is read out. The erroneous bit stored in the BRAM memory itself is not corrected by this function.
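The ECC principle and the limitation in the NOTE can be illustrated with a toy single-error-correct, double-error-detect (SECDED) extended Hamming code over 4 data bits. Real BRAM ECC protects wider words (see [b-Xilinx-UG473]). The encoding layout, the function names and the dictionary that stands in for BRAM below are assumptions for illustration only.

```python
from typing import Optional


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword.

    Index 0 holds the overall parity bit, indices 1, 2 and 4 hold the
    Hamming check bits, and indices 3, 5, 6 and 7 hold data bits d0..d3.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = sum(c[1:]) % 2
    return c


def decode(word: list[int]) -> tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'.

    The stored word is NOT modified: only the returned data is corrected.
    """
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos           # XOR of the positions holding a 1
    parity_error = sum(word) % 2      # 1 if an odd number of bits flipped
    fixed = list(word)                # work on a copy of the read data
    if syndrome == 0 and parity_error == 0:
        status = "ok"
    elif parity_error == 1:
        fixed[syndrome] ^= 1          # single error (syndrome 0: parity bit itself)
        status = "corrected"
    else:
        return None, "uncorrectable"  # double error: detected, not corrected
    data = fixed[3] | fixed[5] << 1 | fixed[6] << 2 | fixed[7] << 3
    return data, status


memory = {0x10: encode(0b1011)}        # one stored word at a hypothetical address

memory[0x10][6] ^= 1                   # first soft error in the stored word
print(decode(memory[0x10]))            # (11, 'corrected'): read data is fixed...
print(memory[0x10] == encode(0b1011))  # False: ...but the stored bit is still wrong

memory[0x10][3] ^= 1                   # second upset before the word is rewritten
print(decode(memory[0x10]))            # (None, 'uncorrectable')
```

The read path returns corrected data. Because the stored word is not rewritten, however, a second upset in the same word turns a correctable error into an uncorrectable one.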

#### 6.2.2 Soft error mitigation measures for CRAM

#### 6.2.2.1 Variety of soft error reduction macro feature

FPGA vendors develop macros to reduce the impact of CRAM soft errors and provide them as part of their FPGA development tools. These soft error reduction macros have the following four functions:

- 1) CRAM soft error detection function;
- 2) CRAM soft error correction function;
- 3) CRAM classification function;
- 4) CRAM error injection function.



Figure 6.2-2 – Example of soft error reduction macro configuration (See [b-Altera] and [b-Xilinx-WP395])

#### 6.2.2.2 CRAM soft error detection/correction function

The soft error detection/correction function detects and corrects CRAM bit errors. It operates in the background of the circuit created by the user.

The CRAM soft error detection function detects whether an error has occurred in any of the CRAM bits. Because the number of CRAM bits can be very large, error detection cannot be performed on all CRAM bits simultaneously. Instead, it is performed by scanning the CRAM frame by frame (see Figure 6.2-2).

The CRAM soft error correction function automatically overwrites the detected error bit with restoration data, returning the bit to its original state.

Because error correction is completed within a short time immediately after an error is detected, the correction time is negligible compared with the detection scan time.
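The frame-by-frame scan described above can be sketched as a loop that checks each frame against a reference value stored at configuration time and restores any frame that fails the check. This is a hypothetical simplification. It uses a per-frame CRC-32 (from Python's `zlib` module) for detection and a golden copy of the configuration image as restoration data, whereas real devices use vendor-specific ECC/CRC schemes.

```python
import zlib


class CramModel:
    """Toy CRAM: frames plus a CRC-32 per frame recorded at configuration time.

    Hypothetical simplification: detection uses a per-frame CRC-32, and
    correction rewrites the whole frame from a golden copy of the
    configuration image. Real devices use vendor-specific ECC/CRC schemes.
    """

    def __init__(self, frames: list[bytes]):
        self.golden = [bytes(f) for f in frames]      # restoration data
        self.frames = [bytearray(f) for f in frames]  # live CRAM contents
        self.crcs = [zlib.crc32(f) for f in frames]   # reference check values

    def flip_bit(self, frame: int, bit: int) -> None:
        """Emulate a single-bit soft error in the live CRAM."""
        self.frames[frame][bit // 8] ^= 1 << (bit % 8)

    def scan_once(self) -> list[int]:
        """One background pass over all frames; returns the frames corrected."""
        corrected = []
        for idx, frame in enumerate(self.frames):
            if zlib.crc32(frame) != self.crcs[idx]:  # detection
                frame[:] = self.golden[idx]          # correction
                corrected.append(idx)
        return corrected


cram = CramModel([bytes(16) for _ in range(8)])  # 8 hypothetical 16-byte frames
cram.flip_bit(frame=5, bit=42)
print(cram.scan_once())  # [5]: the error is detected and corrected
print(cram.scan_once())  # []: the next pass finds no errors
```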

#### [Expected effect]

Suppose the soft error detection/correction function is enabled and the error detection scan period (T) is less than or equal to the client signal interruption duration defined in the service reliability (SR) requirements. In that case, the soft error can be excluded from the failure in time (FIT) count for the SR requirements (see Figure 6.2-3).


NOTE 1 – Because the number of CRAM bits in an FPGA depends on the amount of logic in the FPGA, the scan period differs from one FPGA to another. As the amount of logic increases, the scan period may become longer.

For current FPGAs with 28 nm and 20 nm feature sizes, the nominal error detection scan period (T) is less than 100 ms. Because the scan period differs for each FPGA vendor and device, it must be checked in advance, when the functional specification is examined.

NOTE 2 – The CRAM soft error correction function corrects only CRAM bit errors. User circuit errors may not be recovered even after the CRAM error is corrected.
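The condition for excluding a CRAM soft error from the SR FIT count can be written as a small check. It assumes two things. First, in the worst case an upset occurs just after its frame has been scanned, so it is detected about one scan period (T) later. Second, the correction time is negligible, as stated above. The function name and the time limits are hypothetical, because the actual limit comes from the SR requirements. Per NOTE 2, the check covers only the CRAM bit, not the state of the user circuit.

```python
def excluded_from_sr_fit(scan_period_ms: float,
                         allowed_interruption_ms: float,
                         correction_time_ms: float = 0.0) -> bool:
    """Return True if a corrected CRAM soft error may be left out of the SR FIT count.

    Worst case: the upset occurs just after its frame was scanned, so it is
    detected about one scan period later. Correction time defaults to zero
    because it is negligible compared with the scan period. This covers the
    CRAM bit only; user circuit state may still need separate recovery.
    """
    worst_case_ms = scan_period_ms + correction_time_ms
    return worst_case_ms <= allowed_interruption_ms


# Hypothetical figures: T = 80 ms, compared with two example interruption limits.
print(excluded_from_sr_fit(80.0, 100.0))  # True
print(excluded_from_sr_fit(80.0, 50.0))   # False
```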



Figure 6.2-3 – Example of an operation when a soft error occurs

#### 6.2.2.3 CRAM classification function

The CRAM classification function uses the vendor's own rules to classify as "used" CRAM the CRAM used by the whole user circuit or by some blocks in the user circuit. It can also notify the user of the classification result.

Using the FPGA vendor's design tool, the user can check CRAM error bit information against the user circuit usage information and relate the two in advance. When an error bit occurs, the function judges whether the bit belongs to the "used" CRAM, and the user is notified of the result.
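The design-time and run-time steps described above amount to a set-membership lookup. In the sketch below, the (frame, bit) address format and the usage list are hypothetical. Each vendor defines its own format for the usage information and its own classification rules.

```python
from typing import NamedTuple


class CramAddress(NamedTuple):
    frame: int
    bit: int


def build_used_set(usage_report: list[tuple[int, int]]) -> frozenset[CramAddress]:
    """Design-time step: collect the addresses reported as "used" CRAM."""
    return frozenset(CramAddress(f, b) for f, b in usage_report)


def classify(error: CramAddress, used: frozenset[CramAddress]) -> str:
    """Run-time step: judge a detected error bit against the "used" CRAM set."""
    return "used" if error in used else "unused"


# Hypothetical usage information exported from a design tool.
used_bits = build_used_set([(3, 17), (3, 18), (7, 2)])
print(classify(CramAddress(3, 17), used_bits))  # used: notify, may affect the circuit
print(classify(CramAddress(5, 99), used_bits))  # unused: can be filtered out
```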

#### [Expected effect]

Enabling the classification function makes it possible to distinguish whether a soft error has occurred in a "used" CRAM bit, so that unnecessary error information can be reduced. Effective use of this detection notification can therefore be one of the SR/maintenance reliability (MR) measures at the system level.

NOTE – The classification rules of CRAM classification functions differ between FPGA vendors. Therefore, any conversion of soft errors into a FIT rate must follow the vendor's classification specifications.

In addition to the CRAM used by the user circuit, CRAM classified as "used" may include unused CRAM that does not affect user circuit operation. Therefore, a soft error in CRAM classified as "used" does not necessarily indicate a functional failure of the user circuit. In some cases, it may be effective to rate faults by also taking other fault detection results into account.
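Where the vendor's classification specification permits it, one simple way to use the classification result is to scale the device CRAM rate by the fraction of bits classified as "used". The sketch below assumes that all CRAM bits are equally susceptible. Because "used" CRAM may include bits that do not affect the user circuit, the result is an upper bound on the rate of functionally relevant upsets. The values are hypothetical.

```python
def classified_fit(device_fit: float, used_bits: int, total_bits: int) -> float:
    """Scale a device CRAM FIT rate by the fraction of bits classified as "used".

    Assumes every CRAM bit is equally susceptible. Because "used" CRAM may
    include bits that do not affect the user circuit, the result is an upper
    bound on the rate of functionally relevant upsets.
    """
    if total_bits <= 0 or not 0 <= used_bits <= total_bits:
        raise ValueError("need total_bits > 0 and 0 <= used_bits <= total_bits")
    return device_fit * used_bits / total_bits


# Hypothetical device: 10,000 FIT in total, 25 % of CRAM bits classified as "used".
print(classified_fit(10_000.0, used_bits=25_000_000, total_bits=100_000_000))  # 2500.0
```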

When the classification function is used, an external read only memory (ROM) may be necessary in some cases. It should be used as required by the SR requirements.

#### 6.2.2.4 CRAM error injection function

The CRAM error injection function can inject errors into the actual CRAM.

Using the CRAM error injection function, the CRAM error detection, correction and notification functions can be verified at the system level.
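A system-level check using error injection can follow an inject, scan and verify pattern. The sketch below runs that pattern against a hypothetical software model of a device; the class and method names are assumptions, not a vendor API. It injects random single-bit errors and checks that each one is notified and repaired.

```python
import random
import zlib


class SimulatedCram:
    """Hypothetical software model of CRAM with detection, correction and notification."""

    def __init__(self, n_frames: int, frame_bytes: int, seed: int = 0):
        rng = random.Random(seed)
        self.golden = [bytes(rng.randrange(256) for _ in range(frame_bytes))
                       for _ in range(n_frames)]           # restoration data
        self.frames = [bytearray(f) for f in self.golden]  # live CRAM contents
        self.crcs = [zlib.crc32(f) for f in self.golden]   # per-frame check values
        self.notifications: list[int] = []                 # frames reported to the system

    def inject_error(self, frame: int, bit: int) -> None:
        """Error injection: flip one bit in the live CRAM."""
        self.frames[frame][bit // 8] ^= 1 << (bit % 8)

    def background_scan(self) -> None:
        """One full scan: detect, correct and notify, frame by frame."""
        for idx, frame in enumerate(self.frames):
            if zlib.crc32(frame) != self.crcs[idx]:
                frame[:] = self.golden[idx]
                self.notifications.append(idx)


def injection_test(dut: SimulatedCram, trials: int, seed: int = 1) -> bool:
    """Inject random single-bit errors; check that each is notified and repaired."""
    rng = random.Random(seed)
    n_frames, n_bits = len(dut.frames), len(dut.frames[0]) * 8
    for _ in range(trials):
        frame, bit = rng.randrange(n_frames), rng.randrange(n_bits)
        before = len(dut.notifications)
        dut.inject_error(frame, bit)
        dut.background_scan()
        if dut.notifications[before:] != [frame] or dut.frames[frame] != dut.golden[frame]:
            return False
    return True


print(injection_test(SimulatedCram(n_frames=16, frame_bytes=32), trials=100))  # True
```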

#### 6.3 Example of execution for improving SR/MR

Examples of methods to improve SR/MR toward a desired reliability class are discussed in the following clauses.

#### 6.3.1 Improvement in BRAM

By using the ECC circuit provided by the FPGA vendor, the BRAM soft error rate can be reduced to nearly zero FIT. Use of the ECC circuit is strongly recommended.

#### 6.3.2 Improvement in CRAM

By using the soft error mitigation measures provided by the FPGA vendor, it is possible to reduce the soft error rate. Examples of methods for reducing the soft error rates are discussed in the following clauses.

#### 6.3.2.1 Relationship on SR/MR system and direction of improvement

Soft errors affect both network services and facility maintenance. Their influence on SR/MR in system operation is as follows:

- a soft error is counted in the SR class when the network signal remains interrupted;
- it is not counted in the SR class when the network signal is recovered within the time specified for the SR class;
- it is counted in the MR class if any circuit pack and/or IC device is not recovered in the network system;
- it is not counted in the MR class when all circuit packs and/or devices are recovered automatically.
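The four rules above can be expressed as a small decision function. The sketch below is illustrative only. The SR time limit is a hypothetical parameter whose real value comes from the applicable reliability requirements, and a signal that is never recovered is modelled as an infinite interruption.

```python
from dataclasses import dataclass


@dataclass
class SoftErrorEvent:
    interruption_ms: float  # how long the network signal was interrupted
    auto_recovered: bool    # all circuit packs/devices recovered automatically


def classify_event(ev: SoftErrorEvent, sr_time_limit_ms: float) -> dict[str, bool]:
    """Apply the SR/MR rules: is the event counted in SR and/or MR?

    sr_time_limit_ms stands for the recovery time allowed by the SR class;
    its value comes from the applicable reliability requirements.
    """
    return {
        "SR": ev.interruption_ms > sr_time_limit_ms,
        "MR": not ev.auto_recovered,
    }


limit = 50.0  # hypothetical SR time limit
print(classify_event(SoftErrorEvent(20.0, True), limit))           # counted in neither
print(classify_event(SoftErrorEvent(float("inf"), False), limit))  # counted in both
```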

#### 6.3.2.2 Examples of SR/MR improvement measures

1) Examples of methods for improving SR:

The network signal interruption time can be reduced by using the following measures:

- error detection/correction function provided by the FPGA vendor;
- classification function provided by the FPGA vendor;
- strengthening of SEU tolerance in user logic (redundancy, etc.).

2) Examples of methods for improving MR:

The following methods can improve MR when a circuit pack and/or IC device is not recovered in the network system:

- automatic FPGA reconfiguration and reinitialization;
- automatic circuit pack reinitialization.
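Triple modular redundancy (TMR) is one common form of the user-logic redundancy mentioned under 1). The logic is triplicated, and a 2-out-of-3 voter masks an upset in any one copy. The sketch below shows only the voting principle in software. It is not a vendor-provided function; in an FPGA, the voter is implemented in user logic.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote, as used in triple modular redundancy."""
    return (a & b) | (b & c) | (a & c)


def tmr_read(copies: tuple[int, int, int]) -> tuple[int, bool]:
    """Vote across three copies and report whether any copy disagreed."""
    voted = majority(*copies)
    mismatch = any(c != voted for c in copies)
    return voted, mismatch


# One copy of a hypothetical 8-bit register is hit by an upset (bit 2 flipped).
value = 0b1010_0101
print(tmr_read((value, value ^ 0b100, value)))  # (165, True): voted value is unaffected
```

The mismatch flag can also be reported to the system, so that the upset copy is recovered, for example by the reinitialization methods listed under 2).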

#### 6.3.2.3 Example SR/MR mitigation procedure

An example of a mitigation procedure for improving the SR/MR FIT numbers is shown in Figure 6.3-1.



Figure 6.3-1 – Example of a method for improving SR/MR

#### 6.4 Technology trends

#### 6.4.1 Introduction of FinFET

The use of fin field effect transistors (FinFETs) started at the 22 nm or 16 nm technology node.



Figure 6.4-1 – Schematic of Planar FET and FinFET and schematic of drain – substrate contact part (See [b-Xilinx-WP472])

The planar field effect transistor (FET) has conventionally been used. As semiconductor technology is miniaturized, however, the drain – substrate junction contact area of the planar FET tends to become relatively large compared with the capability of the FET, and the relative influence of soft errors tends to increase. Using the FinFET greatly reduces the relative junction contact area (see Figure 6.4-1). As a result, FPGAs using FinFETs were expected to greatly improve the soft error rate [FIT/Mb], together with various other improvements.

It was confirmed that FPGA products using 16 nm FinFET transistors have a significantly better soft error rate [FIT/Mb] than FPGA products using 20 nm planar FETs. However, with next-generation FinFETs, the soft error rate [FIT/Mb] is expected to increase again, as it did with miniaturization in the past. It is also predicted that the number of CRAM bits in a device will continue to increase. Therefore, the influence of soft errors on telecommunication systems may increase in the future.

#### 6.4.2 Semiconductor IC package trend

To comply with the Restriction of Hazardous Substances (RoHS) directive, lead-free IC packaging materials are being promoted. Lead-free solder bumps are also beginning to be applied at the mass production level in flip chip packages. Since solder bumps are positioned close to the transistor circuits, removing lead, which can be a source of alpha ray emission, is a step towards reducing soft errors.

#### 6.4.3 Trend of FPGA vendor provided circuits

As FPGA products grow in scale, their SRAM memory capacity increases. This lengthens the time a soft error detection circuit needs to detect a CRAM soft error by scanning the CRAM area. However, the detection time for the whole device can be reduced by mounting multiple detection circuits and operating the detection function in parallel.
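The benefit of parallel detection circuits can be estimated by dividing the frames among the detectors. The sketch below assumes equal per-frame check times and an even split of frames. The frame count and per-frame time are hypothetical and are not taken from any device data sheet.

```python
import math


def scan_period_ms(n_frames: int, frame_time_us: float, n_detectors: int = 1) -> float:
    """Estimate the time for one full CRAM scan, in milliseconds.

    Assumes the frames are split as evenly as possible between detectors
    running in parallel, and that each frame takes the same time to check.
    """
    if n_frames <= 0 or n_detectors <= 0:
        raise ValueError("n_frames and n_detectors must be positive")
    frames_per_detector = math.ceil(n_frames / n_detectors)
    return frames_per_detector * frame_time_us / 1000.0


# Hypothetical device: 40,000 frames at 2 us per frame.
for n in (1, 2, 4):
    print(n, "detector(s):", scan_period_ms(40_000, 2.0, n), "ms")  # 80.0, 40.0, 20.0
```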

In the error correction circuit that operates after detection, the correction time has been reduced and the multi-bit error correction capability has been improved. There is concern that multi-bit errors will become more frequent as transistors are further miniaturized. Efforts are being made to address this concern by improving multi-bit error correction capability in combination with memory layout interleaving.

Improvements to the error injection circuit are expected in the future, including easier use and the ability to inject multi-bit errors rather than only single-bit errors. These improvements will make it possible to emulate neutron-induced soft error events more realistically.

### Bibliography

| Reference        | Document |
|------------------|----------|
| [b-Altera]       | Altera White Paper WP-01135-1.0 (2010), *Enhancing Robust SEU Mitigation with 28-nm FPGAs*. https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/wp/wp-01135-stxv-seu-mitigation.pdf |
| [b-Xilinx-Site]  | Xilinx Documentation Website, *Data Sheets, Application Notes and User Guides for each product family*. https://www.xilinx.com/support.html#documentation |
| [b-Xilinx-UG116] | Xilinx Device Reliability Report UG116 v10.9 (2018). https://www.xilinx.com/support/documentation/user_guides/ug116.pdf |
| [b-Xilinx-UG473] | Xilinx User Guide UG473 v1.12 (2016), *7 Series FPGAs Memory Resources*. https://www.xilinx.com/support/documentation/user_guides/ug473_7Series_Memory_Resources.pdf |
| [b-Xilinx-WP395] | Xilinx White Paper WP395 v1.1 (2015), *Mitigating Single-Event Upsets*. https://www.xilinx.com/support/documentation/white_papers/wp395-Mitigating-SEUs.pdf |
| [b-Xilinx-WP472] | Xilinx White Paper WP472 v1.0 (December 2015), *Xilinx Multi-node Technology Leadership Continues with UltraScale+ Portfolio "3D on 3D" Solutions*. https://www.xilinx.com/support/documentation/white_papers/wp472-3D-on-3D.pdf |

#### SERIES OF ITU-T RECOMMENDATIONS

| Series   | Subject |
|----------|---------|
| Series A | Organization of the work of ITU-T |
| Series D | Tariff and accounting principles and international telecommunication/ICT economic and policy issues |
| Series E | Overall network operation, telephone service, service operation and human factors |
| Series F | Non-telephone telecommunication services |
| Series G | Transmission systems and media, digital systems and networks |
| Series H | Audiovisual and multimedia systems |
| Series I | Integrated services digital network |
| Series J | Cable networks and transmission of television, sound programme and other multimedia signals |
| Series K | Protection against interference |
| Series L | Environment and ICTs, climate change, e-waste, energy efficiency; construction, installation and protection of cables and other elements of outside plant |
| Series M | Telecommunication management, including TMN and network maintenance |
| Series N | Maintenance: international sound programme and television transmission circuits |
| Series O | Specifications of measuring equipment |
| Series P | Telephone transmission quality, telephone installations, local line networks |
| Series Q | Switching and signalling, and associated measurements and tests |
| Series R | Telegraph transmission |
| Series S | Telegraph services terminal equipment |
| Series T | Terminals for telematic services |
| Series U | Telegraph switching |
| Series V | Data communication over the telephone network |
| Series X | Data networks, open system communications and security |
| Series Y | Global information infrastructure, Internet protocol aspects, next-generation networks, Internet of Things and smart cities |
| Series Z | Languages and general software aspects for telecommunication systems |