SO-Log: A Complete Guide to Setup and Best Practices

SO-Log Troubleshooting: Common Issues and Fixes

Overview

This guide lists common SO-Log problems and step-by-step fixes so you can restore reliable logging and observability quickly.

1. No Logs Appearing

  • Cause: Agent/service not running or misconfigured destination.
  • Fixes:
    1. Check agent status: confirm the SO-Log agent/service is running (systemctl status so-log or equivalent) and restart it if it is not (systemctl restart so-log).
    2. Confirm configuration file path and syntax: validate with the built-in config check (so-log --check-config) or lint the YAML/JSON.
    3. Verify output destination: ensure host, port, and protocol (TCP/UDP/HTTP) in config match the collector/endpoint.
    4. Inspect permissions: ensure the agent user can read the log files and is allowed to open outbound connections to the destination.
    5. Tail the agent logs: use journalctl -u so-log or the agent log file to see startup errors.
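
For step 3, a quick reachability probe run from the agent host can rule out network problems before you dig into the agent itself. The following is a minimal Python sketch and not part of SO-Log: the function name can_connect is an assumption made for illustration, and it only covers TCP destinations (a UDP endpoint cannot be verified this way).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Call it with the host and port from your output configuration, for example can_connect("collector.example.internal", 5514), where both values are placeholders. A False result points at DNS, firewall, or collector availability rather than at the agent's parsing or file permissions.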

2. Missing Fields or Incorrect Parsing

  • Cause: Incorrect parser/ingestion rules or timestamp format mismatches.
  • Fixes:
    1. Review parsing rules: compare sample log lines to parser regex or grok patterns; adjust patterns to match.
    2. Set correct timestamp format: configure time zone and format (e.g., ISO8601) to avoid timestamp parsing failures.
    3. Enable debug parsing: run the parser against sample logs to see captured fields.
    4. Add fallback/optional fields: make noncritical fields optional to avoid drop-on-parse errors.
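
The fixes above can be tried offline before touching the live pipeline. The sketch below is a hypothetical stand-alone parser, not SO-Log's own rule syntax: the line layout (ISO 8601 timestamp, level, optional bracketed request ID, message) and the field names are assumptions chosen for illustration. It shows an optional field that does not cause the line to be dropped and an explicit timestamp parse that fails loudly.

```python
import re
from datetime import datetime

# Assumed layout: "<ISO 8601 timestamp> <LEVEL> [<request id>] <message>",
# where the bracketed request id is optional.
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"(?:\[(?P<request_id>[^\]]+)\]\s+)?"
    r"(?P<message>.*)$"
)


def parse_line(line: str):
    """Return a dict of captured fields, or None if the line cannot be parsed."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        # fromisoformat accepts offsets such as +00:00; a bare "Z" suffix
        # is only accepted on Python 3.11 and later.
        fields["ts"] = datetime.fromisoformat(fields["ts"])
    except ValueError:
        return None
    return fields
```

Running parse_line over a handful of real sample lines and printing the result makes it easy to see which fields are captured and which lines fall through.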

3. High Latency or Backpressure

  • Cause: Network saturation, a slow collector, or throughput beyond what the pipeline can absorb.
  • Fixes:
    1. Increase buffer sizes: raise agent memory buffers and spool limits temporarily.
    2. Enable batching and compression: reduce network calls by batching events and using gzip.
    3. Scale collectors: add replicas or increase resources for the receiving service.
    4. Apply rate limiting or sampling: reduce volume for noisy sources.
    5. Monitor network and I/O: check NIC, router, and disk metrics for bottlenecks.
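
To illustrate step 2, here is a minimal Python sketch of batching and gzip compression. It is not SO-Log's implementation: the helper names batches and encode_batch and the newline-delimited JSON payload format are assumptions, and a real agent would also flush partial batches on a timer.

```python
import gzip
import json
from typing import Iterable, Iterator, List


def batches(events: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group events into lists of at most batch_size items."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def encode_batch(batch: List[dict]) -> bytes:
    """Serialize a batch as newline-delimited JSON and gzip it."""
    payload = "\n".join(json.dumps(e, separators=(",", ":")) for e in batch)
    return gzip.compress(payload.encode("utf-8"))
```

Sending one compressed payload per batch instead of one request per event reduces both the number of network calls and the bytes on the wire, at the cost of a small delay while each batch fills.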

4. Duplicate Log Entries

  • Cause: Multiple agents shipping the same files, retry logic without deduplication, or collector misconfiguration.
  • Fixes:
    1. Ensure a single source of truth: make sure each log file is watched by only one agent, and disable duplicate file watchers across hosts.
    2. Enable idempotent delivery: configure unique event IDs or sequence numbers.
    3. Tune retry/backoff: avoid immediate retries that produce duplicates on transient failures.
    4. Use deduplication at ingestion: configure the collector to drop duplicates based on ID or checksum.
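
Steps 2 and 4 can be sketched together: derive a stable key for each event (an explicit ID if present, otherwise a checksum of the content) and drop events whose key was seen recently. This is a hedged Python illustration, not a SO-Log feature: the event_id field name, the Deduplicator class, and the bounded in-memory cache are all assumptions.

```python
import hashlib
import json
from collections import OrderedDict


def event_key(event: dict) -> str:
    """Use the event's own ID if it has one, else a SHA-256 of its content."""
    if "event_id" in event:  # hypothetical field name
        return str(event["event_id"])
    # sort_keys makes the checksum independent of key order.
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Deduplicator:
    """Remember the most recent max_keys event keys and flag repeats."""

    def __init__(self, max_keys: int = 10000):
        self._seen = OrderedDict()
        self._max_keys = max_keys

    def is_new(self, event: dict) -> bool:
        key = event_key(event)
        if key in self._seen:
            self._seen.move_to_end(key)  # refresh recency
            return False
        self._seen[key] = None
        if len(self._seen) > self._max_keys:
            self._seen.popitem(last=False)  # evict the oldest key
        return True
```

The cache is bounded, so a duplicate that arrives after its key has been evicted will pass through; size max_keys to cover the longest retry window you expect.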

5. Authentication/Authorization Failures

  • Cause: Expired tokens, wrong credentials, or ACLs preventing ingestion.
  • Fixes:
    1. Rotate or reissue credentials: update tokens and restart the agent.
    2. Verify TLS and certs: check expiry and chain; ensure the agent trusts the CA.
    3. Validate ACLs: confirm IP allowlists and firewall rules include agent hosts.
    4. Check scopes/roles: ensure the service account has the required write permissions.
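
For step 2, certificate expiry can be checked from the agent host with Python's standard ssl module. This is a minimal sketch under assumptions: the function names are made up for illustration, and the endpoint is assumed to speak TLS directly on the given port.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative if expired).

    not_after uses the format returned by getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400.0


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect with default trust settings and return the peer cert's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Because fetch_not_after uses the default verifying context, an untrusted chain or an already expired certificate raises ssl.SSLCertVerificationError during the handshake, which is itself a useful signal that the agent host does not trust the endpoint as configured.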

6. Logs Truncated or Corrupted

  • Cause: Line-length limits, binary data, or improper encoding.
  • Fixes:
    1. Increase max line length: adjust agent parser limits.
    2. Handle binary payloads: base64-encode binary fields before shipping.
    3. Set correct character encoding: ensure UTF-8 is used consistently.
    4. Preprocess large events: truncate nonessential fields or store large blobs separately.
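
Steps 2 to 4 can be combined in a small preprocessing helper applied to each field before shipping. The following Python sketch is illustrative only: the function name sanitize_field, the "base64:" prefix, and the truncation marker are assumptions, not SO-Log conventions.

```python
import base64


def sanitize_field(value, max_chars: int = 1024):
    """Make a field value safe to ship as UTF-8 text of bounded length."""
    if isinstance(value, (bytes, bytearray)):
        raw = bytes(value)
        try:
            # Bytes that are valid UTF-8 are shipped as ordinary text.
            value = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Anything else is treated as binary and base64-encoded.
            return "base64:" + base64.b64encode(raw).decode("ascii")
    if isinstance(value, str) and len(value) > max_chars:
        # Keep the head of oversized text and mark that it was cut.
        return value[:max_chars] + "...[truncated]"
    return value
```

The prefix lets a downstream consumer tell encoded binary apart from plain text. Base64 output is not truncated here, since cutting it would make it undecodable; very large blobs are better stored separately, as step 4 suggests.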

7. Indexing or Search Failures at Ingest

  • Cause: Schema changes, mapping conflicts, or oversized documents.
  • Fixes:
    1. Check index mappings: update mappings to accept new
