Click to learn more about author Dev Kakde.
Fifty years ago, astronauts Neil Armstrong, Buzz Aldrin and Michael Collins lifted off from Launch Pad 39A at NASA’s Kennedy Space Center and into history. Armstrong and Aldrin became the first humans to set foot on the Moon.
Spacecraft reliability ensured the success of Apollo 11, and it will continue to do so in future exploration of the Moon and Mars. A Mars mission is expected to take between two and a half and three years. Per NASA, a key contributor to spacecraft reliability on such extended missions will be the on-board Integrated System Health Management (ISHM) capability.
The input to ISHM is sensor data measuring the health of vital spacecraft systems, such as propulsion, power supply, telecommunication and landing. ISHM analyzes this data with different algorithms to detect abnormal conditions or faults and provide early warning about impending failures or performance degradation. In future extended-duration missions, astronauts will maintain the spacecraft based on the analytical insights provided by ISHM. Machine learning algorithms are increasingly popular in ISHM for fault detection.
Many machine learning algorithms have been developed to analyze streams of sensor data generated by equipment, from big-rig trucks to high-performance aircraft to smart energy grids, and more. Event Stream Processing software can be installed in the equipment itself and analyze sensor data at their source using machine learning algorithms. Technologies used in space shuttles are now available across multiple industries, thanks to advances in sensor technology, computing power and machine learning.
This article outlines two machine learning techniques that can be particularly useful for analyzing sensor data for fault detection. Given the 50th anniversary of the Moon landing, I’ll use public data sets from NASA to illustrate these techniques.
The first technique illustrates sensor-data visualization using Kernel Principal Component Analysis (KPCA), performed on space shuttle data.
The second shows analysis of a turbofan-degradation data set, from NASA’s Prognostics Center of Excellence (PCoE) at Ames Research Center [4,5], using Support Vector Data Description (SVDD).
Sensor Data Visualization using Kernel PCA
Most equipment using sensor technology for fault detection has more than one sensor, measuring key health parameters like temperature, pressure, etc. Hence such sensor data is multivariate. Often, the relationships between different sensor variables are nonlinear. A good sensor data visualization can help scientists and engineers understand the structure of the multivariate data, such as clusters in the data, to help detect when machines are starting to deviate from stable, normal operation. Kernel Principal Component Analysis (KPCA) is a useful technique to visualize multivariate data exhibiting non-linear relationships.
This example illustrates the use of KPCA to visualize sensor data from space shuttles. The NASA data set describes seven radiator positions with nine variables. About 80% of the data belongs to the normal “radiator flow” class and the rest relates to abnormal situations.
Figure 1 visualizes the shuttle data using the first three principal components obtained from the KPCA analysis. The marker color indicates the radiator flow class, with blue markers indicating normal “radiator flow” data. The figure shows that the first three principal components are enough to give a good visualization of high-dimensional (eight dimensions in this case), multivariate data. The different radiator positions are well separated, and the normal “radiator flow” data is clearly separated from the other classes.
Visualization of sensor data can offer many important insights. For example, can my sensor data separate faulty conditions from normal, stable ones? If there are clusters in the data corresponding to different equipment operating modes, then low-dimensional visualization can reveal those modes in the form of disjoint clusters in low-dimensional space.
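The projection step behind such a visualization can be sketched in a few lines. This is a minimal sketch, assuming NumPy and a Gaussian (RBF) kernel; the article does not state which kernel or software produced the figure. The data is synthetic (a hypothetical inner “normal” shell and an outer “abnormal” shell in nine dimensions), not the NASA shuttle data, and the names `kernel_pca`, `shell` and `gamma` are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_pca(X, n_components=3, gamma=1.0):
    """Project the rows of X onto the leading kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Center the kernel matrix, which centers the data in feature space.
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scores of the training points: eigenvectors scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Synthetic stand-in data (an assumption, not the shuttle data):
# 80% "normal" points on an inner shell, 20% "abnormal" on an outer shell.
rng = np.random.default_rng(0)


def shell(n, radius, dim=9):
    directions = rng.normal(size=(n, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return directions * (radius + 0.1 * rng.normal(size=(n, 1)))


X = np.vstack([shell(160, 1.0), shell(40, 3.0)])
labels = np.array([0] * 160 + [1] * 40)

scores = kernel_pca(X, n_components=3, gamma=0.5)

# Compare the two classes along the first kernel principal component.
print("normal mean:  ", scores[labels == 0, 0].mean())
print("abnormal mean:", scores[labels == 1, 0].mean())
```

Plotting the three columns of `scores` as a 3-D scatter, colored by `labels`, gives the kind of low-dimensional view described above. The kernel width `gamma` strongly affects how well clusters separate, so it usually needs tuning for real sensor data.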
Turbofan Degradation Data Analysis using SVDD
Turbofan engines are widely used by commercial airlines for their high thrust and fuel efficiency. Capturing the onset of degradation in these engines is important for carrying out appropriate maintenance and prolonging engine life.
This example illustrates the use of a machine-learning technique called Support Vector Data Description (SVDD) to model turbofan engine degradation. The data set used is from NASA [4,5]. SVDD is a useful technique for detecting anomalies, such as the onset of degradation, using multivariate sensor data collected from equipment.
Additionally, SVDD is a one-class classification technique, meaning it requires observations from only one class to train the model. In this example, that means data coming from normal or stable operations of a turbofan engine. This is an advantage, since most industrial equipment is inherently reliable. Sensor data on stable operations is abundant and often the only data available.
SVDD takes such one-class data as an input and builds a geometric description of the data, using a kernel function. This description closely matches the geometric features of the training data.
SVDD training produces two key outputs: a threshold value and a distance function. For each new observation, the distance function is evaluated to compute that observation’s distance value, which is then compared against the threshold value. If the distance value is greater than the threshold value, the observation is designated as an outlier; otherwise, it is designated as an inlier.
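For readers who want the scoring rule in symbols, the commonly used SVDD formulation can be written as follows. The notation is an assumption, since the article does not define any: $x_i$ are the training observations, $\alpha_i$ the weights learned in training, $K$ the kernel function, $C$ the upper bound on the weights, and $z$ the new observation.

```latex
\begin{align}
  \operatorname{dist}^2(z) &= K(z, z)
      - 2 \sum_{i} \alpha_i K(x_i, z)
      + \sum_{i} \sum_{j} \alpha_i \alpha_j K(x_i, x_j) \\
  R^2 &= \operatorname{dist}^2(x_k)
      \quad \text{for any support vector } x_k \text{ with } 0 < \alpha_k < C \\
  z \text{ is an outlier} &\iff \operatorname{dist}^2(z) > R^2
\end{align}
```

In this form the threshold $R^2$ is the squared radius of the data description, and only observations with nonzero $\alpha_i$ (the support vectors) contribute to the sums.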
The turbofan data set made available by NASA contains the flight history of 216 engines. This example uses a subset of 15 engines selected from the original data.
For each flight, three variables describe the engine’s operating conditions and 21 variables are sensor measurements. Because each engine degrades at a different rate, the number of flights until the end of life differs from engine to engine. The analysis assumes that the first 25% of observations represent stable operations of the turbofan engine, with little or no performance degradation. Such data from five engines is selected to train the SVDD model. The model is then used to score engines that were not part of the training data set: each observation for an engine is scored to compute its distance value. Figure 2 illustrates the distance value for each cycle for three engines, with unique identifier values 10, 20 and 30.
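The train-then-score workflow described above can be sketched as follows. This is a rough illustration under stated assumptions, not the analysis behind the figure: the engines are simulated with a hypothetical `simulate_engine` helper (four sensors instead of 21, no operating-condition variables), and the model is a simplified hard-margin SVDD with a Gaussian kernel, solved approximately with Frank-Wolfe iterations in NumPy. It has no outlier-fraction parameter, so the threshold is simply the largest distance seen in the training data. All function and parameter names are illustrative.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def svdd_distance(model, Z):
    """Squared feature-space distance of each row of Z from the ball center."""
    k = rbf_kernel(Z, model["X"], model["gamma"])
    # K(z, z) = 1 for the RBF kernel.
    return 1.0 - 2.0 * k @ model["alpha"] + model["center_norm"]


def fit_svdd(X_train, gamma=5.0, n_iter=500):
    """Hard-margin SVDD: approximate the smallest ball around X_train."""
    K = rbf_kernel(X_train, X_train, gamma)
    alpha = np.full(len(X_train), 1.0 / len(X_train))
    for t in range(n_iter):
        # Frank-Wolfe step: shift weight toward the training point farthest
        # from the current center (the smallest entry of K @ alpha).
        farthest = np.argmin(K @ alpha)
        step = 2.0 / (t + 2.0)
        alpha *= 1.0 - step
        alpha[farthest] += step
    model = {"X": X_train, "alpha": alpha, "gamma": gamma,
             "center_norm": alpha @ K @ alpha}
    # Threshold: squared radius that encloses every training observation.
    model["threshold"] = svdd_distance(model, X_train).max()
    return model


def simulate_engine(rng, n_cycles, n_sensors=4):
    """Hypothetical engine: stable early on, then a growing sensor drift."""
    life_fraction = np.arange(n_cycles) / n_cycles
    wear = np.clip(life_fraction - 0.4, 0.0, None) ** 2
    noise = rng.normal(scale=0.1, size=(n_cycles, n_sensors))
    return noise + 5.0 * wear[:, None]


rng = np.random.default_rng(42)

# Train on the first 25% of cycles from five engines (assumed stable).
train_engines = [simulate_engine(rng, n) for n in (180, 220, 260, 200, 240)]
stable = np.vstack([engine[: len(engine) // 4] for engine in train_engines])
model = fit_svdd(stable)

# Score every cycle of an engine that was not used for training.
test_engine = simulate_engine(rng, 230)
dist = svdd_distance(model, test_engine)
is_outlier = dist > model["threshold"]

quarter = len(test_engine) // 4
early_rate = is_outlier[:quarter].mean()
late_rate = is_outlier[-quarter:].mean()
print("outlier rate, first quarter of life:", early_rate)
print("outlier rate, last quarter of life: ", late_rate)
```

Plotting `dist` against the cycle number, with `model["threshold"]` as a horizontal line, gives a per-engine view of when the distance value starts to exceed the threshold. A hard-margin model is sensitive to unusual points in the training data, which is one reason full SVDD implementations typically let a fraction of training observations fall outside the description.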
Machine learning algorithms are increasingly popular for fault detection and predictive maintenance, using data to identify normal patterns and outliers that may need investigation or repair. As mankind takes its next giant leap into space, powerful advanced analytics will continue to aid the rockets, capsules and rovers that take us back to the Moon and beyond.
Schwabacher, Mark, and Kai Goebel. “A Survey of Artificial Intelligence for Prognostics.” AAAI Fall Symposium: Artificial Intelligence for Prognostics. 2007.
Saxena, A., Goebel, K., Simon, D., and Eklund, N. (2008). “Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation.” In Proceedings of the International Conference on Prognostics and Health Management, 2008, 1–9. Piscataway, NJ: IEEE.
Goldstein, Markus, and Seiichi Uchida. “A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data.” PLoS ONE 11.4 (2016): e0152173.