A fault-tolerant computer architecture for space vehicle applications

Date

2012

Publisher

Montana State University - Bozeman, College of Engineering

Abstract

The discovery of new methods to protect electronics from the harsh radiation environments outside Earth's atmosphere is important to the future of space exploration. Reconfigurable, SRAM-based Field Programmable Gate Arrays (FPGAs) are especially promising candidates for future spacecraft computing platforms; however, their susceptibility to radiation-induced faults in their configuration memory makes their use a challenge. This thesis presents the design and testing of a redundant fault-tolerant architecture targeted at the Xilinx Virtex-6 FPGA. The architecture is based on a combination of triple modular redundancy (TMR), numerous spare units, repair (scrubbing), and environmental awareness. By using the spares and the partial reconfiguration capabilities of the FPGA, the system can remain operational while repair of damaged modules proceeds in the background. The environmental awareness is supplied by a multi-pixel radiation sensor designed to rest above the FPGA chip, providing information about which areas of the chip have received radiation strikes. The system places these potentially damaged areas first in the queue for scrubbing. Four implementations of the architecture, with different types of computing modules and numbers of spares, demonstrate its versatility and scalability. These four demonstration systems were modeled with theoretical Markov calculations to determine their reliability. They were also implemented on Xilinx hardware and tested by injecting simulated faults, based on realistic orbital fault rate data from the Cosmic Ray Effects on Micro-Electronics Code (CREME96) tool. These results confirm that the systems will be highly reliable under typical Earth orbit conditions. The results also demonstrate that the inclusion of numerous spares and the sensor both lead to substantial improvements in the Mean Time Before Failure over a traditional TMR system with only three modules and scrubbing.
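
To give a feel for the kind of Markov reliability calculation the abstract mentions, the following is a minimal sketch and not the model used in the thesis. It assumes a simplified birth-death chain in which every module (active or spare) faults independently at a constant rate, a single scrubber repairs one faulty module at a time at a constant rate, and the system fails once fewer than two healthy modules remain, so no majority can be formed. The function name `mean_time_to_failure` and its parameters are hypothetical, and the rates in the usage lines are illustrative values, not CREME96 data.

```python
def mean_time_to_failure(total_modules, fault_rate, repair_rate):
    """Expected time until fewer than two healthy modules remain.

    Assumed (illustrative) model: state f is the number of faulty modules.
    From state f the chain moves to f + 1 at rate (total_modules - f) * fault_rate
    and, if f > 0, back to f - 1 at rate repair_rate (one scrubber).
    The state f = total_modules - 1 is absorbing (system failure).
    """
    if total_modules < 3:
        raise ValueError("a TMR-based system needs at least three modules")

    n = total_modules - 1  # transient states f = 0 .. total_modules - 2

    # Expected absorption times T satisfy, for each transient state f:
    #   (fail + repair) * T[f] - fail * T[f + 1] - repair * T[f - 1] = 1
    a = [[0.0] * n for _ in range(n)]
    b = [1.0] * n
    for f in range(n):
        fail = (total_modules - f) * fault_rate
        repair = repair_rate if f > 0 else 0.0
        a[f][f] = fail + repair
        if f + 1 < n:          # T of the absorbing state is zero, so it drops out
            a[f][f + 1] = -fail
        if f > 0:
            a[f][f - 1] = -repair

    # Solve the small linear system by Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    t = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * t[k] for k in range(row + 1, n))
        t[row] = (b[row] - tail) / a[row][row]
    return t[0]  # start with all modules healthy


# Illustrative comparison: plain TMR with scrubbing versus TMR plus six spares.
tmr_only = mean_time_to_failure(3, fault_rate=1.0, repair_rate=10.0)
with_spares = mean_time_to_failure(9, fault_rate=1.0, repair_rate=10.0)
```

With three modules and no repair, this model reduces to the textbook TMR result of 5/(6λ), which is a useful sanity check. Under these assumptions, adding spares lengthens the computed mean time to failure, which is consistent in direction with the improvement the abstract reports; the magnitudes here carry no meaning beyond illustration.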
