SO-Log Troubleshooting: Common Issues and Fixes
Overview
This guide lists common SO-Log problems and step-by-step fixes so you can restore reliable logging and observability quickly.
1. No Logs Appearing
- Cause: Agent/service not running or misconfigured destination.
- Fixes:
- Check agent status: confirm the SO-Log agent/service is running (systemctl status so-log or equivalent) and restart it if it is stopped or failed (systemctl restart so-log).
- Confirm configuration file path and syntax: validate with the built-in config check (so-log --check-config) or lint the YAML/JSON.
- Verify output destination: ensure host, port, and protocol (TCP/UDP/HTTP) in config match the collector/endpoint.
- Inspect permissions: ensure the agent has read access to log files and network permissions.
- Tail the agent logs: use journalctl -u so-log or the agent log file to see startup errors.
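The "verify output destination" step can be checked from the agent host with a quick connectivity probe. The following is a minimal Python sketch, not part of SO-Log itself: the function name can_connect is hypothetical, and it only covers TCP-based outputs (a UDP destination cannot be verified this way).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example call (hypothetical collector address; use the host and port
# from your own configuration):
#   can_connect("collector.example.internal", 5044)
```

A False result points at DNS, firewall, or a collector that is not listening, rather than at the agent's parsing or file permissions.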
2. Missing Fields or Incorrect Parsing
- Cause: Incorrect parser or ingestion rules, or timestamp format mismatches.
- Fixes:
- Review parsing rules: compare sample log lines to parser regex or grok patterns; adjust patterns to match.
- Set correct timestamp format: configure time zone and format (e.g., ISO8601) to avoid timestamp parsing failures.
- Enable debug parsing: run the parser against sample logs to see captured fields.
- Add fallback/optional fields: make noncritical fields optional to avoid drop-on-parse errors.
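One way to try the steps above outside the agent is to run a candidate pattern against sample lines and inspect the captured fields. This Python sketch assumes a hypothetical line format (ISO8601 timestamp, level, message, and an optional request_id field); the pattern and field names are illustrative and are not SO-Log's own parser syntax.

```python
import re
from datetime import datetime

# Hypothetical format: "<ISO8601 timestamp> <LEVEL> <message> [request_id=<id>]"
# The trailing request_id group is optional, so lines without it still parse.
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*?)"
    r"(?:\s+request_id=(?P<request_id>\S+))?$"
)


def parse_line(line: str):
    """Return a dict of captured fields, or None if the line does not parse."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        # Explicit format with a UTC offset avoids silent time zone guesses.
        fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%dT%H:%M:%S%z")
    except ValueError:
        return None
    return fields


sample = "2024-05-01T12:00:00+00:00 INFO user logged in request_id=abc123"
print(parse_line(sample))
```

Lines that return None are the ones worth comparing against the pattern by hand; a missing optional field shows up as a None value instead of causing the whole line to be dropped.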
3. High Latency or Backpressure
- Cause: Network saturation, a slow collector, or event throughput higher than the pipeline can absorb.
- Fixes:
- Increase buffer sizes: raise agent memory buffers and spool limits temporarily.
- Enable batching and compression: reduce network calls by batching events and using gzip.
- Scale collectors: add replicas or increase resources for the receiving service.
- Apply rate limiting or sampling: reduce volume for noisy sources.
- Monitor network and I/O: check NIC, router, and disk metrics for bottlenecks.
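To illustrate why batching and compression reduce network load, here is a small Python sketch that groups events into fixed-size batches and gzips each batch as newline-delimited JSON. The function names and the wire format are assumptions made for illustration; the actual SO-Log settings for batching and compression will differ.

```python
import gzip
import json
from typing import Iterable, Iterator, List


def batched(events: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size events, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


def encode_batch(batch: List[dict]) -> bytes:
    """Serialize a batch as newline-delimited JSON and gzip it."""
    body = "\n".join(json.dumps(event, separators=(",", ":")) for event in batch)
    return gzip.compress(body.encode("utf-8"))


events = [{"seq": i, "msg": "request served"} for i in range(1000)]
payloads = [encode_batch(b) for b in batched(events, 250)]
print(len(payloads), "requests instead of", len(events))
```

Larger batches mean fewer round trips but more data at risk if a send fails, so batch size is usually tuned together with the buffer and spool limits mentioned above.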
4. Duplicate Log Entries
- Cause: Multiple agents shipping same files, retry logic without deduplication, or collector misconfiguration.
- Fixes:
- Ensure a single source of truth: make sure each log file is watched by only one agent, and disable duplicate file watchers across hosts.
- Enable idempotent delivery: configure unique event IDs or sequence numbers.
- Tune retry/backoff: avoid immediate retries that produce duplicates on transient failures.
- Use deduplication at ingestion: configure the collector to drop duplicates based on ID or checksum.
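Checksum-based deduplication can be sketched as follows. This Python example derives an ID from the SHA-256 of each raw line and drops repeats; the function names are hypothetical, and the in-memory set is unbounded, so a real collector would more likely use a bounded window or TTL.

```python
import hashlib
from typing import Iterable, Iterator


def event_id(line: str) -> str:
    """Derive a stable ID from the event content (SHA-256 hex digest)."""
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


def drop_duplicates(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in first-seen order."""
    seen = set()
    for line in lines:
        key = event_id(line)
        if key in seen:
            continue  # Already delivered; skip the retry or second shipper copy.
        seen.add(key)
        yield line


incoming = ["a: start", "b: work", "a: start", "c: done", "b: work"]
print(list(drop_duplicates(incoming)))
```

Hashing only the message text would also drop legitimately repeated events, so the hashed content should include whatever makes an event unique, such as its timestamp and source.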
5. Authentication/Authorization Failures
- Cause: Expired tokens, wrong credentials, or ACLs preventing ingestion.
- Fixes:
- Rotate or reissue credentials: update tokens and restart the agent.
- Verify TLS and certs: check expiry and chain; ensure the agent trusts the CA.
- Validate ACLs: confirm IP allowlists and firewall rules include agent hosts.
- Check scopes/roles: ensure service account has required write permissions.
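The certificate expiry check can be done from the agent host with Python's standard ssl module. This is a sketch under assumptions: the helper names are hypothetical, the endpoint is assumed to speak TLS directly on the given port, and the default context uses the system CA store, which may differ from the CA bundle the agent is configured to trust.

```python
import socket
import ssl
import time


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a certificate's notAfter string."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400.0


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect with TLS and return the peer certificate's notAfter field.

    The default context verifies the chain and hostname, so an untrusted CA
    or an already expired certificate raises ssl.SSLCertVerificationError.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Example call (hypothetical endpoint; substitute your collector):
#   not_after = fetch_not_after("collector.example.internal", 443)
#   print(days_until_expiry(not_after, time.time()))
```

A verification error here usually means the chain or CA trust is the problem, while a small or negative number of days points at expiry.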
6. Logs Truncated or Corrupted
- Cause: Line-length limits, binary data, or improper encoding.
- Fixes:
- Increase max line length: adjust agent parser limits.
- Handle binary payloads: base64-encode binary fields before shipping.
- Set correct character encoding: ensure UTF-8 is used consistently.
- Preprocess large events: truncate nonessential fields or store large blobs separately.
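Two of the fixes above, base64-encoding binary fields and truncating oversized text, can be sketched in Python as below. The function names are hypothetical. The truncation helper cuts on a byte budget without leaving a partial multi-byte UTF-8 character at the end, which is a common source of "corrupted" final characters.

```python
import base64


def encode_binary_field(payload: bytes) -> str:
    """Represent arbitrary bytes as ASCII-safe base64 text."""
    return base64.b64encode(payload).decode("ascii")


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Truncate text to at most max_bytes of UTF-8 without splitting a character."""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    # errors="ignore" drops the incomplete trailing sequence, if any.
    return raw[:max_bytes].decode("utf-8", errors="ignore")


print(encode_binary_field(b"\x00\xff"))
print(truncate_utf8("héllo", 2))
```

Base64 increases payload size by roughly a third, so for large blobs it may be preferable to store the data separately and ship only a reference.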
7. Indexing or Search Failures at Ingest
- Cause: Schema changes, mapping conflicts, or oversized documents.
- Fixes:
- Check index mappings: update mappings to accept new