
Background: What is a Failure?

A Failure is Temporal's representation of various types of errors that occur in the system.

There are several types of Failure. Each one is represented by its own type in the SDKs and carries different information in the protobuf messages, which are used to communicate with the Temporal Cluster and appear in Event History.
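As a rough illustration of "different types carrying different information", the following is a minimal sketch in plain Python. It is not the Temporal SDK or protobuf API: the `Failure` class, its `kind` and `details` fields, the helper functions, and the sample values are all hypothetical names chosen for this example. It assumes only that a Failure has a message and may wrap another Failure as its cause.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Failure:
    """Simplified, hypothetical stand-in for a Failure (not the real protobuf message)."""
    kind: str                           # hypothetical label, e.g. "ActivityFailure"
    message: str                        # human-readable description
    cause: Optional["Failure"] = None   # the Failure that triggered this one, if any
    details: Optional[dict] = None      # hypothetical bag of type-specific information


def cause_chain(failure: Failure) -> Iterator[Failure]:
    """Yield the failure itself, then each nested cause, outermost first."""
    current: Optional[Failure] = failure
    while current is not None:
        yield current
        current = current.cause


def root_cause(failure: Failure) -> Failure:
    """Return the innermost Failure in the chain."""
    *_, last = cause_chain(failure)
    return last


# Sample data (invented for illustration): an application-level error
# wrapped by the failure of the activity that raised it.
app = Failure(
    kind="ApplicationFailure",
    message="card declined",
    details={"type": "PaymentError", "non_retryable": True},
)
activity = Failure(
    kind="ActivityFailure",
    message="activity failed",
    cause=app,
    details={"activity_type": "ChargeCard"},
)

kinds = [f.kind for f in cause_chain(activity)]
```

Here `kinds` lists the chain from the outer wrapper down to the original error, and `root_cause(activity)` returns the `app` object. In a real SDK, each kind of Failure would be a distinct class with typed fields; the generic `details` dictionary is used here only to keep the sketch short.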

The Temporal Cluster and SDKs emit metrics that can be used to monitor performance and troubleshoot issues. To collect and aggregate these metrics, you can use one of the following tools:

  • Prometheus
  • StatsD
  • M3

After you enable your monitoring tool, you can relay these metrics to any monitoring and observability platform.