camunda BPM / CAM-14727

Serve engine metrics in Prometheus format


    • Type: Feature Request
    • Resolution: Unresolved
    • Priority: L3 - Default
    • Component/s: engine

      User Story (Required on creation): 

      As an operator, I can monitor and alert on engine metrics in the operations monitoring tool of my choice.

      Functional Requirements (Required before implementation):

      Technical Requirements (Required before implementation):

      • Create an internal representation of tagged metric samples (counters and gauges; a sample can have multiple tags)
      • Enable the job executor implementations to expose thread performance (active threads, available threads, blocked threads) where possible
      • Create an internal metrics collector that tracks samples
        • Should be pluggable in the engine configuration
        • Collects reported usage metrics from ACT_RU_METER_LOG
        • Collects performance metrics from the job executor (access via engine configuration)
        • Fetches number of jobs in backlog from the ACT_RU_JOB table
        • (optional) Makes the metrics collection interval configurable in the engine configuration
          • The interval is used by the DbMetricsReporter to fetch the known metrics and store them in ACT_RU_METER_LOG
          • The reporter could be optimized to store only non-zero metrics, so that no unnecessary rows are written to the database
        • Makes the list of collected usage and performance metrics configurable in the engine configuration
      • Documentation
        • What metrics do we collect (usage vs. performance)?
        • How do we collect metrics?
        • How can you influence this with configuration?
        • How can you provide your own metrics collector?
      • Create a REST endpoint (preferably under “/metrics/prometheus”) that exposes collected metrics in Prometheus format
        • Normalize metric names so they conform to Prometheus naming conventions (snake_case)
        • Serve Prometheus format based on requested result format (see MetricsServlet and Exporter)
        • Fetch metrics from configured engine collector via API
        • Transform internal metrics samples to Prometheus samples
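
The sample representation, name normalization, and Prometheus transformation listed above could be sketched roughly as follows. This is a minimal illustration, not an engine API: `PrometheusExportSketch`, `Sample`, `Kind`, `normalizeName`, and `render` are hypothetical names invented for this example, and the output follows the Prometheus text exposition format (a `# TYPE` line followed by `name{label="value"} value` lines).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; none of these types exist in the engine today. */
final class PrometheusExportSketch {

  enum Kind { COUNTER, GAUGE }

  /** One tagged sample: a named counter or gauge value with zero or more tags. */
  static final class Sample {
    final String name;
    final Kind kind;
    final Map<String, String> tags;
    final double value;

    Sample(String name, Kind kind, Map<String, String> tags, double value) {
      this.name = name;
      this.kind = kind;
      this.tags = new TreeMap<>(tags); // sorted copy keeps the output deterministic
      this.value = value;
    }
  }

  /** Converts names such as "job-successful" or "activeThreads" to snake_case. */
  static String normalizeName(String raw) {
    String s = raw.replaceAll("([a-z0-9])([A-Z])", "$1_$2") // split camelCase
        .replaceAll("[^A-Za-z0-9_]+", "_")                  // dashes, dots, spaces
        .toLowerCase(Locale.ROOT);
    if (s.isEmpty() || Character.isDigit(s.charAt(0))) {
      s = "_" + s; // names must not be empty or start with a digit
    }
    return s;
  }

  /** Escapes backslash, double quote, and newline inside a label value. */
  static String escapeLabelValue(String v) {
    return v.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
  }

  /** Renders samples as text, one "# TYPE" line per metric name. */
  static String render(List<Sample> samples) {
    // Group by normalized name so all samples of a metric appear together.
    Map<String, List<Sample>> byName = new LinkedHashMap<>();
    for (Sample s : samples) {
      byName.computeIfAbsent(normalizeName(s.name), k -> new ArrayList<>()).add(s);
    }
    StringBuilder out = new StringBuilder();
    for (Map.Entry<String, List<Sample>> e : byName.entrySet()) {
      Kind kind = e.getValue().get(0).kind; // assumes one kind per metric name
      out.append("# TYPE ").append(e.getKey()).append(' ')
          .append(kind.name().toLowerCase(Locale.ROOT)).append('\n');
      for (Sample s : e.getValue()) {
        out.append(e.getKey());
        if (!s.tags.isEmpty()) {
          List<String> labels = new ArrayList<>();
          for (Map.Entry<String, String> t : s.tags.entrySet()) {
            labels.add(normalizeName(t.getKey()) + "=\"" + escapeLabelValue(t.getValue()) + "\"");
          }
          out.append('{').append(String.join(",", labels)).append('}');
        }
        out.append(' ').append(s.value).append('\n');
      }
    }
    return out.toString();
  }
}
```

A REST resource under the proposed path could then return `render(...)` for the samples fetched from the configured collector. The tag name `engine` used when trying this out is only an example; the ticket does not prescribe which tags a sample carries.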

      Limitations of Scope (Optional):

      Metrics not in scope

      • Job configuration (can be part of the diagnostics interface)
        • core-pool-size
        • max-pool-size
        • queue-capacity
        • wait-time-in-millis
        • max-wait
        • max-jobs-per-acquisition
      • Process metrics (should be monitored in Optimize)
        • Process Instances Started (by process definition)
        • Process Instances Ended (by process definition)
        • Process Instance Cycle Time
        • Number of incidents
      • Troubleshooting metrics
        • Collecting them can decrease performance
        • They can be implemented as needed as a custom extension of the metrics collector
      • Health metrics (SUPPORT-10327)
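
To illustrate how out-of-scope metrics could still be added through a custom extension of a pluggable collector, here is a rough sketch. Everything in it is an assumption made for illustration: `MetricsCollector`, `GaugeCollector`, `CompositeCollector`, `forThreadPool`, and the metric names are hypothetical and not part of the engine. The only real API used is `java.util.concurrent.ThreadPoolExecutor`, which stands in for a job executor's thread pool.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.function.DoubleSupplier;

/** Hypothetical sketch of a pluggable collector; not an engine API. */
final class CollectorSketch {

  /** Assumed extension point: returns the current value per metric name. */
  interface MetricsCollector {
    Map<String, Double> collect();
  }

  /** Collects gauges by polling registered suppliers on each call. */
  static final class GaugeCollector implements MetricsCollector {
    private final Map<String, DoubleSupplier> gauges = new LinkedHashMap<>();

    GaugeCollector register(String name, DoubleSupplier supplier) {
      gauges.put(name, supplier);
      return this;
    }

    @Override
    public Map<String, Double> collect() {
      Map<String, Double> result = new LinkedHashMap<>();
      gauges.forEach((name, supplier) -> result.put(name, supplier.getAsDouble()));
      return result;
    }
  }

  /** Merges several collectors and keeps only the configured metric names. */
  static final class CompositeCollector implements MetricsCollector {
    private final List<? extends MetricsCollector> delegates;
    private final Set<String> enabledNames;

    CompositeCollector(List<? extends MetricsCollector> delegates, Set<String> enabledNames) {
      this.delegates = delegates;
      this.enabledNames = enabledNames;
    }

    @Override
    public Map<String, Double> collect() {
      Map<String, Double> result = new LinkedHashMap<>();
      for (MetricsCollector delegate : delegates) {
        delegate.collect().forEach((name, value) -> {
          if (enabledNames.contains(name)) {
            result.put(name, value);
          }
        });
      }
      return result;
    }
  }

  /** Example source of performance gauges backed by a standard thread pool. */
  static GaugeCollector forThreadPool(ThreadPoolExecutor pool) {
    return new GaugeCollector()
        .register("job_executor_active_threads", pool::getActiveCount)
        .register("job_executor_queue_size", () -> pool.getQueue().size());
  }
}
```

In this sketch, a troubleshooting metric is just another `GaugeCollector` registered alongside the default ones, and the set of enabled names plays the role of the configurable metric list. Because suppliers are polled on every `collect()` call, an expensive supplier would slow down each collection run, which matches the performance concern noted above.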

      Hints (optional):


              Assignee: Unassigned
              Reporter: Tobias Metzke-Bernstein (tobias.metzke)
              Votes: 0
              Watchers: 2

                Created:
                Updated: