∑ Observability (Logs, Monitoring, Alerting)

This Map of Contents collects notes on three areas of observability: logs, monitoring, and alerting.

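Observability is not just a matter of installing Prometheus, Loki, and an alert bot; it is about answering three consecutive questions:

  • Monitoring: how is the system doing right now?
  • Logs / Traces: what exactly just happened?
  • Alerting: when does someone need to be woken up, and who?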

Current Backbone

Task Threads

Overview

📊 Monitoring

Collecting and analyzing metrics to understand the state of the system.

Strategy & Configuration

Specific Monitors

📝 Logs

Recording events to understand what happened.

Implementation & Practice

Troubleshooting & Experience

Solution Specific (SLS etc)

🔔 Alerting

Notifying relevant parties when metrics cross thresholds or specific events occur.
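
As a rough illustration of "metrics cross thresholds" and of deciding who gets notified, here is a minimal Python sketch. The `Rule` class, the `evaluate` function, the metric names, and the receiver names are all hypothetical and only for illustration; in practice a tool such as Prometheus Alertmanager would handle this, and its configuration looks different.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    metric: str       # name of the metric to watch (hypothetical)
    threshold: float  # fire when the sampled value is strictly above this
    notify: str       # who gets notified (hypothetical receiver name)
    page: bool        # True = wake someone up, False = just post a message


def evaluate(rules, samples):
    """Return (notify, metric, value, page) for every rule whose threshold is crossed."""
    fired = []
    for rule in rules:
        value = samples.get(rule.metric)
        # A metric with no sample is skipped rather than treated as zero.
        if value is not None and value > rule.threshold:
            fired.append((rule.notify, rule.metric, value, rule.page))
    return fired


# Assumed example rules: one that pages on-call, one that only posts to a channel.
rules = [
    Rule("error_rate", 0.05, "oncall-backend", page=True),
    Rule("disk_used_ratio", 0.80, "ops-channel", page=False),
]

samples = {"error_rate": 0.12, "disk_used_ratio": 0.42}
alerts = evaluate(rules, samples)
for notify, metric, value, page in alerts:
    action = "PAGE" if page else "notify"
    print(f"{action} {notify}: {metric}={value}")
```

Keeping the `page` flag separate from the threshold makes the "does this need to wake someone up" decision explicit for each rule instead of leaving it implicit in the receiver name.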

🛠 Tools Stack

ELK & Loki

Prometheus & Grafana

Cloud Services

up:: ∑ 项目与工作管理