Monitoring & Logging
Security and stability can only be achieved through transparency.
I implement monitoring, logging, and audit systems that map the entire lifecycle of an infrastructure—from hardware sensors and network traffic to application and security events.
This allows failures, security incidents, and performance bottlenecks to be detected early, evaluated automatically, and documented for resolution. It also creates a solid data foundation for root cause analysis, capacity planning, and audit-proof evidence for internal or external requirements.

Architecture & Goals
I develop monitoring concepts that combine technical monitoring, security analyses, and compliance control in an integrated system.
The goal is a complete, correlatable picture of the infrastructure, without breaks between tools or data formats and without proprietary dependencies.
- Centralized metrics, log, and alerting architecture
- Integration of network, storage, and application monitoring
- Uniform data collection via exporters, agents, and APIs
- Multi-tenant dashboards for infrastructure, clusters, and services
Tools & Technologies
I consistently work with open, extensible tools that can be integrated into any environment.
They enable a high degree of flexibility while also ensuring reproducibility and versioning.
- Prometheus for metric collection and time series analysis
- Grafana for visualization, dashboards, and reporting
- Alertmanager for escalation and notification
- systemd-journald for local log collection and Loki for central log aggregation
- Node Exporter, Ceph Exporter, Postgres Exporter for host and service metrics
Security & Auditing
Monitoring is also a security tool. I combine technical measurement data with security audits to provide verifiable adherence to compliance and hardening guidelines.
- OpenSCAP scans for security and compliance audits
- ClamAV, Rspamd, and Fail2ban for detecting malware, spam, and brute-force attempts
- Syslog-based security monitoring with anomaly detection
- Archiving of audit logs in accordance with data protection and audit requirements
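Syslog-based anomaly detection can take many forms. The following is a simple sketch, assuming sshd-style "Failed password" lines and a plain z-score against a baseline of earlier intervals. The regular expression, the threshold of 3.0, and the function names are assumptions for illustration, not a description of a specific product.

```python
import re
import statistics
from collections import Counter

# Assumed log shape: "... sshd[123]: Failed password for <user> from <ip> port ..."
FAILED_LOGIN = re.compile(
    r"Failed password for .* from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def count_failures_by_source(lines: list[str]) -> Counter:
    """Count failed login attempts per source IP address."""
    counts: Counter = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


def is_anomalous(current: int, baseline: list[int], threshold: float = 3.0) -> bool:
    """Flag `current` if it lies more than `threshold` standard deviations
    above the mean of the baseline counts."""
    if len(baseline) < 2:
        return False  # not enough history to judge
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        return current > mean
    return (current - mean) / stdev > threshold


# Usage: compare the current interval with failure counts of earlier intervals.
sample = [
    "Jan 10 10:00:01 host sshd[1]: Failed password for root from 203.0.113.5 port 1 ssh2",
    "Jan 10 10:00:02 host sshd[1]: Failed password for invalid user admin from 203.0.113.5 port 2 ssh2",
]
current_total = sum(count_failures_by_source(sample).values())
print(is_anomalous(current_total, baseline=[2, 3, 2, 4, 3]))
```

A z-score is only one possible rule; seasonal baselines or rate-based limits as used by Fail2ban may fit better depending on the traffic pattern.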
Automation & Integration

I automate monitoring and testing processes so that maintenance and security are no longer manual tasks.
Alerts, reports, and dashboards are automatically generated and versioned.
- Automated deployment of monitoring stacks (Ansible)
- Regular self-checks and recovery tests
- Reporting workflows for management or customers
- Integration into ChatOps or ticket systems
Evaluation & Optimization

The collected data is used not only for security purposes, but also to optimize performance and stability.
I use monitoring results to plan capacity, energy use, and costs on the basis of measured data.
- Trend and load analyses over time
- Performance comparisons between releases or environments
- Documented recommendations for action from audit reports
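A trend analysis for capacity planning can be sketched as a least-squares line through daily usage samples, extrapolated to the point where the volume is full. The sample values and the function names are hypothetical, and a linear trend is an assumption that will not hold for every workload.

```python
from typing import Optional


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(
    days: list[float], used_gb: list[float], capacity_gb: float
) -> Optional[float]:
    """Estimate the days remaining after the last sample until usage
    reaches capacity. Returns None if usage is flat or shrinking."""
    slope, intercept = fit_line(days, used_gb)
    if slope <= 0:
        return None
    day_full = (capacity_gb - intercept) / slope
    return max(0.0, day_full - days[-1])


# Usage with hypothetical samples: usage grows by 10 GB per day.
print(days_until_full([0, 1, 2, 3], [100, 110, 120, 130], capacity_gb=200))
```

In a Prometheus setup, the PromQL function `predict_linear` covers a similar extrapolation directly on the time series.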
Compliance & Documentation

Security does not end with firewalls or log files—it must be comprehensively documented and verifiable.
I create structured security documentation that permanently records technical measures, authorization concepts, and audit results.
- Documentation of security guidelines, roles, and processes
- Markdown/BookStack-based audit reports and manuals
- Evidence management for ISO 27001, BSI IT-Grundschutz, or internal policies
- Integration of audit results into monitoring and reporting systems
- Handover documentation and lessons learned processes

Trainings
You can find specific training courses and current topics in the Comelio GmbH training catalog.
Available in-house at your company, as a webinar, or as an open course, tailored to different requirements.
Frequently asked questions about Monitoring & Logging
In this FAQ, you will find the topics that come up most frequently in consultations and training sessions. Each answer is kept brief and refers to further content where necessary. Can’t find your question? Feel free to contact me.

Prometheus vs. OpenTelemetry – which one should you use for what?
Prometheus (pull model, exporters, recording rules) is well suited to metrics and alerting. OpenTelemetry collects metrics, logs, and traces and forwards them via the Collector. In practice: Prometheus plus Alertmanager for metrics and alerting; Loki for logs and Tempo or Jaeger for traces; the OTel Collector as a bridge where necessary.
How can I avoid alert floods and “blind” dashboards?
Use SLI/SLO-based alerting, multi-level routes (page → ticket → report), inhibition and silences in Alertmanager, a dead man’s switch, and clear runbooks. Page only on user impact (error rate or latency); system details remain “ticket-only.”
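The page → ticket → report idea can be sketched with an error-budget burn rate. This is an illustration under stated assumptions: the paging threshold of 14.4 is borrowed from common multi-window burn-rate practice for a 30-day SLO window, and the function names and the two-window check are hypothetical, not an Alertmanager feature.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is consumed: 1.0 means exactly on budget."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def route_alert(
    user_facing: bool,
    burn_short: float,
    burn_long: float,
    page_threshold: float = 14.4,  # assumed value, see lead-in
    ticket_threshold: float = 1.0,
) -> str:
    """Decide between 'page', 'ticket', and 'report'.

    Only user-facing signals may page, and only if both the short and the
    long window burn fast, which filters out brief spikes.
    """
    if user_facing and burn_short >= page_threshold and burn_long >= page_threshold:
        return "page"
    if burn_long >= ticket_threshold:
        return "ticket"
    return "report"


# Usage: 2 % errors against a 99.9 % SLO in both windows.
rate = burn_rate(error_ratio=0.02, slo_target=0.999)
print(route_alert(user_facing=True, burn_short=rate, burn_long=rate))
```

In practice the same decision is usually encoded as alert rule labels (for example a severity label) that Alertmanager routes to different receivers.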
What makes monitoring audit-proof?
A verifiable time and identity chain (NTP/PTP, host IDs), tamper-proof storage (e.g., WORM/Object Lock), a gap-free pipeline (journald/syslog → Loki/archive), traceable policies and retention, and regular self-checks and reports (aligned with ISO/BSI requirements).
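One way to make tampering detectable during a self-check is a hash chain over the archived records. This is a complementary technique shown for illustration, not a replacement for WORM storage or Object Lock; the record fields and function names are hypothetical.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # fixed starting value for the first link


def _digest(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def chain_records(records: list[dict]) -> list[dict]:
    """Link each record to its predecessor via the predecessor's hash."""
    prev = GENESIS
    chained = []
    for record in records:
        current = _digest(prev, record)
        chained.append({"record": record, "prev": prev, "hash": current})
        prev = current
    return chained


def verify_chain(chained: list[dict]) -> Optional[int]:
    """Return the index of the first broken link, or None if the chain is intact."""
    prev = GENESIS
    for index, entry in enumerate(chained):
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
            return index
        prev = entry["hash"]
    return None


# Usage with hypothetical audit records.
log = chain_records([
    {"ts": "2024-01-10T10:00:00Z", "host": "node-1", "msg": "login ok"},
    {"ts": "2024-01-10T10:05:00Z", "host": "node-1", "msg": "sudo used"},
])
print(verify_chain(log))
```

Changing or removing any archived record invalidates every later hash, so a periodic verification run reports the position of the first manipulation.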
