Federal agencies understand that the health of their large-scale equipment is mission-critical. From fleets of machinery to military installations to aerospace, success often depends on avoiding costly asset downtime and extending the life of the assets that support the mission.

Asset health management is costly and complex

An ecosystem of sensor data from equipment can give engineers and maintainers insight into asset health, and the Internet of Things (IoT) has increased the volume and velocity of that data. The expectation was that more data would improve asset health management programs. In reality, the sheer amount of data can dilute the efficacy of asset health management, and many agencies and industries still face high sustainment costs driven by the increasing complexity of component and system repairs.

Move from passive to proactive

Prognostics and health management (PHM) is an asset lifecycle concept that takes an integrated approach to the health management of a system. PHM enables real-time health assessment of a system under its actual operating conditions, as well as prediction of its future state, by combining disciplines that include sensing technologies, failure physics, machine learning, modern statistics, and reliability engineering. Because PHM can predict the system’s remaining life during operation, it enables condition-based maintenance (CBM), a strategy that repairs or replaces parts shortly before they are expected to fail and so reduces total asset lifecycle costs.

Asset Health Management Maturity
Evolution of maintenance strategy: a shift away from corrective (unscheduled, passive) and preventive (scheduled, active) maintenance toward condition-based (predictive, proactive) maintenance.

Fixing something before it breaks is more efficient and cost-effective than fixing it after it breaks. PHM helps:

  • Predict when a failure may happen by estimating remaining useful life (RUL)
  • Schedule repairs ahead of time, not after the failure, thus avoiding downtime and improving productivity
  • Extend the life of assets and defer new purchases, reducing the cost and complexity of repairs
  • Continuously monitor the asset condition

These benefits are driving federal agencies and asset-intensive industries to adopt the proactive approach of predictive maintenance technologies and practices in their asset health management programs.

Data is the underlying challenge

For highly complex systems, it can be difficult to establish a PHM program with predictive and sustainment maintenance capabilities. These platforms often monitor and interpret thousands of sensor statuses and error codes. A human operator of such a system quickly reaches information overload, and applying analytical models without understanding the details of the underlying health monitoring and reporting system usually will not yield accurate health predictions.

A successful PHM program addresses four main challenges: data acquisition, diagnostics, prognostics, and health management.

DATA ACQUISITION collects measurement data from SCM, maintenance systems, and asset sensors and processes it to extract useful features for diagnosis.

The Challenges:

  • Data has been discarded; maintenance and repair records are handwritten or stored in a separate system; and the data that remains is uncalibrated, unlabeled, or of poor quality.
  • Sensor log data is too voluminous and arrives in a format not readily consumable for decision support.
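
As a rough illustration of the "extract useful features" step in data acquisition, the sketch below condenses a raw sensor trace into per-window summary statistics. The function name `window_features`, the choice of features (mean, RMS, peak), and the sample values are assumptions made for this example; they are not taken from any specific PHM system.

```python
import math
from statistics import mean


def window_features(samples, window):
    """Summarize a raw sensor trace into per-window features.

    Hypothetical helper: uses non-overlapping windows and drops a trailing
    partial window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    features = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        features.append({
            "start": start,                                         # index of first sample
            "mean": mean(chunk),                                    # average level
            "rms": math.sqrt(sum(x * x for x in chunk) / window),   # signal energy
            "peak": max(abs(x) for x in chunk),                     # largest excursion
        })
    return features


# Made-up vibration readings: the second window is visibly more energetic.
vibration = [0.1, -0.2, 0.1, 0.0, 0.9, -1.1, 1.0, -0.8]
for row in window_features(vibration, window=4):
    print(row)
```

Reducing each window to a handful of numbers is one way to turn a large log into something a diagnostic model or a maintainer can consume; which features matter depends on the asset and the failure modes of interest.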

DIAGNOSTICS detects a fault from an anomaly and isolates it to determine which component is failing and how severe the fault is relative to the failure threshold.

The Challenges:

  • Failures are unpredictable; error codes from various sensors are often reported simultaneously, in bursts or in sequences, and it is hard to identify the root cause or detect the symptoms of failure due to sympathetic part failures (i.e., multiple failure indications for one issue).
  • Assets and systems are frequently updated or may be reconfigured for different mission assignments, resulting in different failure scenarios.
  • Multiple sensors often measure the health condition of the same part or component, resulting in repetitive warnings or faults.
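
To make the burst and duplicate-warning problems above concrete, here is a minimal sketch that groups time-stamped error events into bursts and collapses repeated indications for the same component. The `Event` record, the `max_gap` parameter, and the component and code names are all hypothetical; real error logs will differ in structure and need domain-specific rules.

```python
from collections import namedtuple

# Hypothetical event record; real logs will carry more fields.
Event = namedtuple("Event", ["timestamp", "component", "code"])


def group_bursts(events, max_gap):
    """Group events into bursts: a new burst starts when the gap between
    consecutive events exceeds max_gap (same time unit as the timestamps)."""
    bursts = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if bursts and event.timestamp - bursts[-1][-1].timestamp <= max_gap:
            bursts[-1].append(event)
        else:
            bursts.append([event])
    return bursts


def distinct_indications(burst):
    """Collapse a burst to its distinct (component, code) pairs, first-seen order."""
    seen = []
    for event in burst:
        key = (event.component, event.code)
        if key not in seen:
            seen.append(key)
    return seen


events = [
    Event(0.0, "pump_a", "E101"),
    Event(0.4, "pump_a", "E101"),   # same fault reported by a second sensor
    Event(0.9, "valve_b", "E230"),
    Event(60.0, "pump_a", "E117"),
]
for burst in group_bursts(events, max_gap=5.0):
    print(distinct_indications(burst))
```

Grouping by time gap is only a first pass: it reduces noise for a human reviewer, but it does not by itself say which code in a burst is the root cause.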

PROGNOSTICS predicts how long it will take until failure under the current operating condition.

The Challenges:

  • When using a model-based approach, physical failure models for many components do not exist.
  • When using a data-driven approach, sensors might not measure the right variables at the ideal locations to give machine learning algorithms the information they need to build an effective and efficient model for a component.
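
The idea of predicting time until failure under the current operating condition can be illustrated with a deliberately simple trend extrapolation: fit a straight line to a health indicator and project when it crosses a failure threshold. This is a sketch under strong assumptions (a single indicator that rises roughly linearly as the part degrades, a known threshold); the function names and numbers are invented for the example, and real prognostic models are usually far richer.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def estimate_rul(times, values, failure_threshold):
    """Estimate remaining useful life by extrapolating the fitted trend.

    Returns None when the indicator shows no upward (degrading) trend,
    because the line would never reach the threshold.
    """
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    time_of_failure = (failure_threshold - intercept) / slope
    return max(0.0, time_of_failure - times[-1])


# Made-up wear indicator sampled once per operating hour.
hours = [0, 1, 2, 3, 4]
wear = [1.0, 1.5, 2.0, 2.5, 3.0]
print(estimate_rul(hours, wear, failure_threshold=5.0))  # 4.0 hours remaining
```

The estimate is only as good as the assumed trend shape and threshold, which is why missing physical failure models and poorly placed sensors are such significant obstacles.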

HEALTH MANAGEMENT determines the optimal approach to maintenance scheduling and logistics support.

The Challenges:

  • The underlying data, systems, processes, and prognostic capabilities must be in place before you can achieve optimal health management.

Among these, prognostics is the key enabler that permits the reliability of a system to be evaluated in its actual life cycle conditions. In other words, prognostics predicts the time at which a system or a component will no longer perform its intended function, thus enabling agencies and asset owners to mitigate system-level risks while extending remaining useful life.

A mission-ready machine learning solution

PHM is a broad domain, and within it, modern complex assets operate within an IoT construct, generating voluminous sensor, warning, and fault data. Our expert data science team has developed a mission-ready machine learning approach to solve the inherent data, diagnostic, and prognostic challenges of PHM. It includes the following components:

Data Aggregation: The system’s sensor, warning, and error code data are acquired, initial data engineering and ETL are performed, and the dataset is prepared for analysis.

Pattern Markov Chains (PMC) Exploitation and Bayesian Network Inference (BNI) Framework: This solution framework contains the required algorithms to detect and identify error correlations, errors that are likely to occur together, and candidates of likely root cause errors in a given dataset.

  • PMCs provide a greater visual understanding of the underlying error codes and correlate sequences of codes across platforms, configurations, and updates. Machine learning is then intelligently employed to assess the state of the equipment and to predict Remaining Useful Life (RUL).
  • Using the correlated data generated by PMC, we apply BNI to validate error causality and to identify the root cause of errors systematically and automatically, end to end.
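
The sequence-correlation idea behind the bullets above can be illustrated with a plain first-order Markov chain estimated from error-code sequences. This is a generic textbook sketch, not the PMC or BNI framework itself; the function name `transition_probabilities` and the error codes are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next code | current code) from observed code sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each adjacent pair (current code, following code).
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        code: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for code, nexts in counts.items()
    }


# Made-up error-code sequences, one per maintenance episode.
logs = [
    ["E1", "E2", "E3"],
    ["E1", "E2", "E4"],
    ["E1", "E3"],
]
probs = transition_probabilities(logs)
for code, nexts in probs.items():
    print(code, nexts)
```

Codes that reliably follow one another show up as high-probability transitions, which makes them candidates for a shared root cause. Confirming causality needs a further step such as the Bayesian inference described above.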

RUL Modeling and Prediction Using a Data-Driven Machine Learning Approach: Our approach constructs an effective and efficient model for a component or system directly from the sensor data by combining algorithms such as long short-term memory (LSTM), regression, and Multivariate Auto-regressive State Space (MARSS).
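
Data-driven RUL models, whether regression or a sequence model such as an LSTM, are typically trained on windows of sensor history labeled with the time remaining until failure. The sketch below shows one common way to build those training pairs from a single run-to-failure series. The function name `make_rul_windows`, the window length, and the readings are assumptions for illustration only and do not describe the training pipeline used in this solution.

```python
def make_rul_windows(run, window):
    """Turn one run-to-failure series into (window_of_readings, rul) pairs.

    RUL is counted in time steps from the end of each window to the final
    reading of the run, which is assumed to be the point of failure.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    pairs = []
    last_index = len(run) - 1
    for end in range(window - 1, len(run)):
        readings = run[end - window + 1:end + 1]
        pairs.append((readings, last_index - end))
    return pairs


# Made-up degradation readings from a single unit that failed at the last step.
run_to_failure = [0.10, 0.12, 0.15, 0.21, 0.30]
for readings, rul in make_rul_windows(run_to_failure, window=3):
    print(readings, "-> RUL", rul)
```

Pairs from many units would be pooled to train the model. A run shorter than the window yields no pairs, so the window length has to be chosen with the shortest available history in mind.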

By integrating machine learning prediction of the RUL with PMCs, our solution identifies sympathetic failures by discovering root causes, calculates time-to-failure for system components, and progressively establishes accurate predictive and sustainment maintenance capabilities.

Implementation for increased mission readiness

Our machine learning approach can be implemented as part of an integrated PHM system that supports processing data at the edge or in a datacenter. The agile solution can leverage an agency’s existing systems and capabilities, or we can recommend new advanced platforms, such as a combination of NVIDIA Tensor Core GPUs, the Cisco Unified Computing System (UCS), and Cisco Intersight. These network and compute recommendations are based on the location and volume of the data and on the mission decision-support needs it must serve.

By combining scalable technology and compute with machine learning software stacks like TensorFlow and scikit-learn, we can deliver a powerful, user-centric, data-driven visualization layer (e.g., OmniSci). The resulting solution allows asset managers, engineers, and maintenance support teams to see the failure predictions, target identified problem areas, make necessary adjustments, and increase the asset’s availability and mission effectiveness. With our machine learning PHM solution, agencies can reduce maintenance costs and machinery out-of-service time, supporting high availability and increased mission readiness.

NT Concepts data scientists, data engineers, and software developers collectively employ our Machine Learning Life Cycle for both deep learning tasks and more traditional, unsupervised statistical methods. As AI/machine learning tasks and the resulting models continuously evolve, the ability to develop quickly, pivot when necessary, and update frequently is paramount. For this reason, we use an Agile AI development method, inserting feedback loops into the Machine Learning Life Cycle to allow for continuous iteration and rapid development sprints. To learn more about this proven approach, see our blog post The [Hidden] Challenges of Machine Learning: A Five-Part Series.