This reference document describes the metrics that are automatically derived from logs collected by Loki in the NAIS platform.

loki:service:loglevel:count1m

This metric represents the count of logs aggregated over a 1-minute window, categorized by service and log level.

Description

The loki:service:loglevel:count1m metric provides a pre-aggregated count of log entries for each 1-minute interval, grouped by service, namespace, cluster, and log level. This metric is particularly useful for:

  • Monitoring the volume of logs at different severity levels
  • Setting up alerts for unusual increases in error or warning logs
  • Creating dashboards to visualize logging patterns across services
  • Identifying services with excessive logging
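
As an illustration of the last point, the following is a minimal sketch of a query that ranks services by total log volume over the past hour. The limit of 10 and the one-hour window are arbitrary choices, and the query assumes the metric has one sample per minute for each label combination.

```promql
# Ten noisiest services over the last hour, across all log levels.
# The subquery [60m:1m] takes one sample per minute, and
# sum_over_time() adds those per-minute counts together.
topk(
  10,
  sum by (service_name, service_namespace) (
    sum_over_time(loki:service:loglevel:count1m[60m:1m])
  )
)
```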

Labels

| Label | Description | Example Values |
|---|---|---|
| service_name | The name of the service or application that generated the logs | my-app, user-api |
| service_namespace | The Kubernetes namespace where the service is running | team-a, default |
| k8s_cluster_name | The name of the Kubernetes cluster | dev-gcp, prod-gcp |
| detected_level | The log level or severity of the log entries | error, warn, info, debug, trace |

Usage Examples

Prometheus Query Examples

Count of error logs for a specific service in the last hour:
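
A minimal sketch of such a query, assuming the metric has one sample per minute. The values `my-app` and `team-a` are placeholders taken from the label table, and the outer `sum()` collapses the result to a single number if the service runs in more than one cluster.

```promql
# Total error log entries for one service over the last hour.
# [60m:1m] is a subquery that samples the metric once per minute;
# sum_over_time() adds the sixty 1-minute counts together.
sum(
  sum_over_time(
    loki:service:loglevel:count1m{
      service_name="my-app",
      service_namespace="team-a",
      detected_level="error"
    }[60m:1m]
  )
)
```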


Ratio of errors to total logs for all services in a namespace:
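
One way this ratio could be written, with `team-a` as a placeholder namespace. The result is a fraction between 0 and 1 per service for the most recent 1-minute window. Services with no error-level series in that window are absent from the result rather than reported as 0.

```promql
# Error share of all log entries, per service, in one namespace.
sum by (service_name) (
  loki:service:loglevel:count1m{service_namespace="team-a", detected_level="error"}
)
/
sum by (service_name) (
  loki:service:loglevel:count1m{service_namespace="team-a"}
)
```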


Alert Examples

Alert on high number of error logs:
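
A sketch of an alert expression under assumed values: the 5-minute window and the threshold of 100 are illustrative only and should be tuned to the normal error volume of the service. The expression returns a series, and so fires, only for services above the threshold.

```promql
# Fires for any service that logged more than 100 error entries
# in the last 5 minutes (window and threshold are assumptions).
sum by (service_name, service_namespace) (
  sum_over_time(
    loki:service:loglevel:count1m{detected_level="error"}[5m:1m]
  )
) > 100
```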


Best Practices

  • Do not use increase() or rate() with this metric: each sample is already the log count for one 1-minute window, not a cumulative counter
  • For longer time ranges, use a subquery such as [60m:1m] to sample at 1-minute intervals, combined with an aggregation like sum_over_time()
  • Consider setting appropriate thresholds for alerts based on your application's normal logging behavior
  • Combine with other metrics (like HTTP status codes) for more comprehensive service health monitoring