Exorcising Code Demons: Debugging Horror Stories


Photo by Jim Grieco


Posted: October 23, 2025 to Announcements.

Tags: Design, Search, Calendar, Support


How to Exorcise the Ghosts of Bad Code: Debugging Horror Stories

Enter the Haunted House: Why Bad Code Comes Back at Night

Every engineer eventually meets the same ghastly cast: the bug that vanishes when you add a log, the fix that awakens a different monster, the hotfix that seemed harmless until the full moon of peak traffic. These “ghosts of bad code” aren’t supernatural—they’re the natural result of complex systems, time pressures, and our imperfect mental models. Still, they behave like hauntings because they defy the easy rules we expect software to obey. They slip between layers, pass through module walls, and whisper through configuration files. Debugging them requires more than tools; it requires rituals, good habits, and the willingness to look under the floorboards.

This tour through debugging horror stories is a practical field guide: how the hauntings begin, how veterans chase them down, and how to salt the earth so they don’t return. Along the way, you’ll see real-world examples and the exorcisms that tamped them down for good.

The Anatomy of a Haunting: Symptoms vs. Sources

When code is haunted, the symptom rarely appears near the source:

  • Symptom: A sporadic crash on request timeout. Source: a hidden global that retains state across retries.
  • Symptom: A customer is billed twice. Source: a scheduler clock crosses a daylight saving boundary.
  • Symptom: A container won’t die. Source: your app runs as PID 1, where SIGTERM is ignored unless a handler is installed, and it never reaps its children.

Effective exorcism separates the two. Instead of chasing the symptom across layers, you tighten a loop around candidate causes, shrinking the search space. The key is to create an environment where cause-and-effect is visible: high-signal logs, minimal reproduction, and timestamps you can trust. That’s the guiding pattern throughout these stories.

Case File #1: The Heisenbug in the Cache

A global API cache reduced traffic by 80%, and then support tickets spiked. Occasionally, a request returned stale data that should have been invalidated minutes earlier. Adding logs made the problem disappear; removing them brought it back. When the team raised the log level, CPU rose, throughput fell, and, maddeningly, the bug vanished again.

Root cause: a racy read-write pattern in a multi-threaded cache. The invalidator swapped in a new value while readers in other threads were still iterating the previous structure, with no memory barriers to order the two. The added logging slowed access enough to serialize the timing, masking the race. Running production without the extra logs brought the ghost back.

How the Exorcism Worked

  • Reproduction: A test harness simulated high concurrency with randomized delays around cache read/write. Repro rate rose from 1 in 10,000 to 1 in 50.
  • Instrumentation: Metrics tracked cache version IDs and monotonic timestamps. A trace attribute carried the version seen per request.
  • Reasoning: A happens-before diagram revealed that an invalidation and its visibility to readers were not a single atomic step. The fix published an immutable snapshot structure through a compare-and-swap, so readers always saw a consistent snapshot.
  • Verification: A chaos test injected variable CPU load to ensure increased jitter couldn’t slip past the fix.
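
The snapshot idea can be sketched in a few lines. This is an illustration under stated assumptions, not the team's actual fix: Python has no compare-and-swap primitive, so the hypothetical SnapshotCache below emulates one with a writer-side lock, and all class and method names are invented for the example.

```python
import threading
from types import MappingProxyType


class SnapshotCache:
    """Readers see one immutable snapshot; writers publish a whole new one."""

    def __init__(self, initial=None):
        self._lock = threading.Lock()  # taken by writers only
        self._state = (0, MappingProxyType(dict(initial or {})))

    def snapshot(self):
        # A single attribute read returns a (version, mapping) pair that is
        # never mutated afterwards, so a reader can iterate it safely.
        return self._state

    def compare_and_swap(self, expected_version, new_items):
        # Emulated CAS: publish only if nobody else published in between.
        with self._lock:
            version, _ = self._state
            if version != expected_version:
                return False
            self._state = (version + 1, MappingProxyType(dict(new_items)))
            return True

    def invalidate(self, key):
        # Typical CAS retry loop: rebuild from the latest snapshot and retry
        # if another writer got there first.
        while True:
            version, items = self._state
            remaining = {k: v for k, v in items.items() if k != key}
            if self.compare_and_swap(version, remaining):
                return version + 1
```

A reader that grabbed a snapshot before an invalidation keeps iterating the old, complete view; it never observes a half-updated structure. Carrying the version number in a trace attribute is what makes stale reads visible per request.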

Lessons: Logging that changes timing is a Schrödinger’s probe. Prefer non-blocking counters, sampling traces, and deterministic reproducibility under synthetic load to avoid masking the very ghost you’re chasing.

Case File #2: The Leak That Only Appeared on Tuesdays

Every Tuesday night, a batch job slowed to a crawl and the service started returning 502s. By Wednesday morning, all was well. Monitoring showed file descriptor usage climbing, but only after the job ran. Restarting the app cleared it… until next Tuesday.

Root cause: A weekly CSV import opened millions of small files on network storage. In an error branch, the code returned early without closing the handle. Garbage collection didn’t reclaim the descriptors before the process hit its limit. Tuesdays were special because the import dataset changed weekly; that batch had more malformed files, raising the error rate and leaking descriptors faster.

How the Exorcism Worked

  • Signal discovery: Alerted on the derivative of open file descriptors rather than an absolute threshold.
  • Forensics: Used lsof to identify thousands of .csv files stuck in deleted state. Correlated with job logs to find the error branch.
  • Fixes: Added try/finally style guarantees, enforced with a static analyzer, and instrumented counters for “opened,” “closed,” and “leaked” (opened but never closed, including close failures).
  • Guardrails: Lowered ulimit -n in staging to surface descriptor exhaustion early, and added a chaos test that injects file open failures to ensure cleanup paths are robust.
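
A minimal sketch of the try/finally guarantee with the counters alongside it. The function name, the fd_stats counter, and the rule that an empty row counts as malformed are assumptions made for the example, not details from the incident.

```python
import csv
from collections import Counter

fd_stats = Counter()  # hypothetical in-process counters; export them as metrics


def count_rows(path):
    """Count CSV rows; the handle is closed on every path, including errors."""
    handle = open(path, newline="")
    fd_stats["opened"] += 1
    try:
        rows = 0
        for row in csv.reader(handle):
            if not row:
                # The error branch: previously an early return here leaked.
                raise ValueError(f"malformed (empty) row in {path}")
            rows += 1
        return rows
    finally:
        handle.close()
        fd_stats["closed"] += 1


def leaked():
    # Opened but not (successfully) closed.
    return fd_stats["opened"] - fd_stats["closed"]
```

In everyday Python a with block does the same job more compactly; the explicit finally is shown here because it makes the cleanup path, and the place to count it, impossible to miss.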

Lessons: Leaks often hide in error paths. Test the unhappy paths with failure injection; make resource acquisition visible via counters; and monitor rates, not just baselines.

Case File #3: Phantom Latency from a Harmless Feature Flag

A seemingly trivial feature flag toggled a “precompute suggestions” step. Turned on, latency rose by 30% even for requests that did not use the feature. Rollbacks helped in one environment but not another. Confusion ensued.

Root cause: Configuration drift. The flag was wired to an initializer that populated an in-memory structure during cold start. In one environment, autoscaling created additional cold starts that contended for a shared datastore, causing thundering-herd behavior. In another, the same flag path was a no-op because a different config file omitted a “workers” setting, so the impact never appeared, which masked the risk in staging.

How the Exorcism Worked

  • Visibility: Added a per-request “effective config” hash to logs to correlate behavior with inputs. Also included a boot-time “build plan” metric to show which initializers ran.
  • Stabilization: Warmed instances behind the load balancer to eliminate simultaneous initializations during scale-out.
  • Hygiene: Introduced a “flag review checklist” including default values, blast radius, and dependency mapping. Flags now expire with owner and sunset date metadata.
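
One way to produce an “effective config” hash is to fingerprint a canonical serialization of whatever the process actually resolved. The sketch below assumes the config is a JSON-serializable dict; the function name and the 12-character truncation are arbitrary choices for illustration.

```python
import hashlib
import json


def effective_config_hash(config):
    """Short, stable fingerprint of the config a process actually resolved."""
    # sort_keys makes the output independent of dict insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

Two environments that differ only by a missing “workers” setting now log different hashes on every request, so drift shows up in a simple group-by instead of a week of confusion.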

Lessons: Feature flags can be poltergeists—touching things you don’t expect. Always document and surface their dependencies, and treat environment parity as a first-class requirement.

Case File #4: The Zombie Process Tree and the Unkillable Deploy

Blue-green deploys occasionally stalled: new containers came up, but old ones wouldn’t terminate gracefully. Operations resorted to docker kill, which sometimes led to corrupted temp data and a cascade of retries.

Root cause: The service process ran as PID 1 in the container, installed no SIGTERM handler, and did not reap child processes. The kernel delivers a signal to PID 1 only if the process has installed a handler for it, so SIGTERM was silently ignored while zombies accumulated. The container’s “graceful shutdown” was a fiction because the orchestrator awaited a clean exit that never arrived.

How the Exorcism Worked

  • Observation: Ran ps inside the container and observed defunct children. Added metrics for child process counts.
  • Fix: Introduced a tiny init (e.g., tini) to run as PID 1, forward SIGTERM to the service, and reap exited children. Implemented a shutdown hook with deadlines for in-flight requests and an early stop for admission of new work.
  • Validation: Injected synthetic SIGTERM during load tests to ensure latency budgets were honored.
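
The shutdown hook and the reaping loop might look roughly like this on a POSIX system. It is a sketch, not the service's code: the Shutdown class, its method names, and the polling interval are all hypothetical, and a real server would call admit() and done() from its request path.

```python
import os
import signal
import threading
import time


class Shutdown:
    """Stops admitting work once asked to stop, then drains in-flight requests."""

    def __init__(self):
        self._stopping = threading.Event()
        self._lock = threading.Lock()
        self._in_flight = 0

    def begin(self, signum=None, frame=None):
        # Same signature signal.signal() uses for handlers; takes no lock,
        # so it is safe to run while the main thread is inside admit().
        self._stopping.set()

    def admit(self):
        with self._lock:
            if self._stopping.is_set():
                return False  # early stop: reject new work
            self._in_flight += 1
            return True

    def done(self):
        with self._lock:
            self._in_flight -= 1

    def drain(self, deadline_seconds, poll=0.01):
        # True if all in-flight work finished before the deadline.
        end = time.monotonic() + deadline_seconds
        while time.monotonic() < end:
            with self._lock:
                if self._in_flight == 0:
                    return True
            time.sleep(poll)
        with self._lock:
            return self._in_flight == 0


def reap_children():
    """Collect exited children without blocking, so none linger as zombies."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append(pid)
    return reaped


def install(shutdown):
    # Must be called from the main thread of the process.
    signal.signal(signal.SIGTERM, shutdown.begin)
```

Installing the handler is the part that matters most when the process is PID 1: without it, SIGTERM simply never arrives. An init such as tini takes over the reaping and signal forwarding, but the drain-with-deadline logic still belongs to the application.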

Lessons: Containers lie if you ignore Unix realities. Handle signals, drain traffic, and reap children. If you don’t, your next deploy will turn into a ghost story at 2 a.m.

Case File #5: The Off-by-One That Charged Customers Twice at DST

A subscription service billed monthly at 02:30 local time. On the night daylight saving time ended, some customers were charged twice; on the spring transition, others missed a bill. The code looked careful, even elegant, with a well-tested scheduler.

Root cause: Time folded like a haunted hallway. The job keyed off “2:30 local time” and used local timestamps as idempotency keys. When clocks rolled back, the same local time occurred twice with different absolute instants, producing two bills; when time sprang forward, 02:30 never occurred, so bills were skipped.

How the Exorcism Worked

  • Normalization: Moved all scheduling and keys to absolute instants (UTC) and converted local time only in user-facing displays.
  • Idempotency: Added per-subscription deduplication keys based on a period identifier (year-month pair) rather than a wall-clock time.
  • Test matrix: Ran calendar fuzz tests across time zones, leap years, leap seconds (simulated), and historical zone changes.
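
To see the fold concretely, take Central European time on 26 October 2025, when clocks went back from 03:00 to 02:00. The sketch below uses fixed UTC offsets so it needs no time zone database; the subscription ID, the in-memory ledger, and the choice to derive the period from the UTC month are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2))  # Central European Summer Time
CET = timezone(timedelta(hours=1))   # Central European (standard) Time

# Two different absolute instants, one hour apart, on the night clocks go back.
first = datetime(2025, 10, 26, 0, 30, tzinfo=timezone.utc)
second = datetime(2025, 10, 26, 1, 30, tzinfo=timezone.utc)


def wall_clock(instant, tz):
    return instant.astimezone(tz).strftime("%Y-%m-%d %H:%M")


def period_key(subscription_id, instant):
    # Key by billing period (year-month, taken in UTC), not by wall-clock time.
    utc = instant.astimezone(timezone.utc)
    return f"{subscription_id}:{utc.year:04d}-{utc.month:02d}"


def charge_once(ledger, subscription_id, instant):
    """Return True if this call created the charge, False if it was a repeat."""
    key = period_key(subscription_id, instant)
    if key in ledger:
        return False
    ledger.add(key)
    return True
```

Both instants read “02:30” on a local clock (the first in CEST, the second in CET), which is exactly how a wall-clock idempotency key lets two charges through. The period key collapses them into one. It does not help with the spring gap, though; that half of the fix comes from scheduling on absolute instants.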

Lessons: Time zones are not bugs; they are boss-level enemies. Store UTC, key by domain periods, and test with pathological calendars.

Tools of the Exorcist: Seeing the Invisible

Observability That Doesn’t Lie

  • Structured logs: Emit JSON with fields for request IDs, user IDs (if permissible), version, and feature flags.
  • Correlation and trace context: Propagate a trace ID across services and attach exemplar metrics that reference a sample trace for slow paths.
  • High-cardinality discipline: Be careful with labels like user_id in metrics; use bucketing or sampling to avoid cardinality explosions.
  • Monotonic time: Measure durations with monotonic clocks to avoid wall-clock jumps that hide or fabricate latency.
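
A small sketch that combines two of these points: one JSON record per event, and a duration taken from the monotonic clock. The field names (trace_id, version, flags, duration_ms) and both helper functions are illustrative choices, not a standard schema.

```python
import json
import time
import uuid


def log_line(event, **fields):
    """Render one structured log record as a JSON string."""
    record = {"event": event, **fields}
    return json.dumps(record, sort_keys=True)


def timed_call(fn, *, trace_id=None, version="unknown", flags=()):
    """Run fn and return (result, log line) with a monotonic duration."""
    trace_id = trace_id or uuid.uuid4().hex
    start = time.monotonic()  # unaffected by wall-clock adjustments
    result = fn()
    duration_ms = (time.monotonic() - start) * 1000.0
    line = log_line(
        "request_done",
        trace_id=trace_id,
        version=version,
        flags=sorted(flags),
        duration_ms=round(duration_ms, 3),
    )
    return result, line
```

Because the duration never touches the wall clock, an NTP step in the middle of a request cannot produce a negative or wildly inflated latency.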

Profilers and Flame Graphs

  • Sampling profilers: Low overhead, great for CPU hotspots and identifying unexpected stack paths.
  • Allocation profilers: Surface churn that triggers GC pauses and memory pressure that can masquerade as I/O issues.
  • Continuous profiling: Historical profiles let you bisect performance regressions like any other bug.

System-Level Probes

  • strace or ktrace: Reveal syscalls—timeouts often trace back to DNS lookups, blocked reads, or missing non-blocking flags.
  • tcpdump or capture tools: Validate whether timeouts are network-level, TLS-level, or app-level.
  • eBPF-based tools: Dynamic tracing of kernel and user events; invaluable for production-only hauntings.

Sanitizers and Static Analysis

  • Address/Thread/Undefined sanitizers: Catch memory misuses and data races that pass tests but explode under load.
  • Static analyzers and linters: Enforce cleanup of resources, nullability, and dangerous patterns such as broad catch without rethrow.
  • Fuzzers: Property-based or random inputs expose edge cases that humans don’t imagine.

Rituals and Safeguards: Process That Keeps Spirits at Bay

Code Archeology and Commit Forensics

  • git bisect: Binary-search the history between a known-good and a known-bad commit; traffic replay in staging makes each “good” vs “bad” check quick.
  • Blame with empathy: Understand why a decision was made; many ghosts were summoned for a reason that no longer applies.
  • Change journaling: Encourage commit messages that explain the “why,” not just the “what.”

Invariant-Driven Development

  • Design invariants: Clearly state contracts like “balances never negative,” “idempotency keys unique per period,” “cache readers see immutable snapshots.”
  • Assertions: Fail fast on invariant breaks in non-production; log and alert in production.
  • Property tests: Define behaviors that must hold across wide input ranges rather than narrow examples.
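
As an illustration of the “balances never negative” invariant, here is a sketch that pairs an assertion with a seeded, randomized property check using only the standard library. The function names and value ranges are made up for the example; a dedicated property-testing library would also shrink failing cases, which this does not.

```python
import random


class InsufficientFunds(Exception):
    pass


def apply_debit(balance_cents, amount_cents):
    """Return the new balance; refuse anything that would break the invariant."""
    if amount_cents < 0:
        raise ValueError("amount must be non-negative")
    if amount_cents > balance_cents:
        raise InsufficientFunds(f"{amount_cents} > {balance_cents}")
    new_balance = balance_cents - amount_cents
    assert new_balance >= 0, "invariant violated: balance went negative"
    return new_balance


def check_balance_never_negative(trials=1000, seed=1234):
    """Property check: no sequence of debits drives the balance below zero."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        balance = rng.randint(0, 10_000)
        for _ in range(20):
            amount = rng.randint(0, 5_000)
            try:
                balance = apply_debit(balance, amount)
            except InsufficientFunds:
                pass  # a rejected debit must leave the balance untouched
            if balance < 0:
                return False
    return True
```

The example-based test says “debiting 30 from 100 leaves 70”; the property says “nothing you throw at this function ends below zero.” The second catches the cases nobody thought to write down.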

Chaos, Failure Injection, and Drills

  • Introduce controlled failures: Disk full, DNS slow, network partitions, clock skew.
  • Game days: Practice runbooks under pressure to convert panic into muscle memory.
  • Killswitches: One-click revert or feature disable to limit hauntings’ spread.

Review Checklists that Catch Ghosts Before They Form

  • Time handling: Is it UTC internally? Are period identifiers explicit?
  • Resource safety: Are files, sockets, and transactions closed in all paths?
  • Concurrency: Are shared structures immutable or synchronized? Is memory visibility guaranteed?
  • Observability: Are we emitting structured logs with correlation IDs? Are metrics meaningful and bounded?
  • Config safety: Are defaults safe? Is the blast radius documented?

Practical Patterns to Prevent Hauntings

Keep State Tame: Minimize Globals and Hidden Singletons

  • Dependency injection: Pass in dependencies explicitly so tests can swap them and so you can see what the code touches.
  • Pure functions where possible: Determinism reduces spookiness; side effects belong at the edges.
  • Configuration access: Centralize reads and validate at startup; fail fast on missing or conflicted configs.
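
Centralized, fail-fast config loading can be as small as the sketch below. The required keys (DATABASE_URL, WORKERS) and the ConfigError type are hypothetical; the point is that every setting is read once, converted once, and all problems are reported together at startup.

```python
class ConfigError(Exception):
    pass


REQUIRED = {"DATABASE_URL": str, "WORKERS": int}  # hypothetical settings


def load_config(env):
    """Read every setting once, convert types, and report all problems together."""
    config, problems = {}, []
    for key, cast in REQUIRED.items():
        raw = env.get(key)
        if raw is None or raw == "":
            problems.append(f"{key} is missing")
            continue
        try:
            config[key] = cast(raw)
        except ValueError:
            problems.append(f"{key}={raw!r} is not a valid {cast.__name__}")
    if problems:
        raise ConfigError("; ".join(problems))
    return config
```

At process start this would typically be called as load_config(os.environ). Taking the mapping as a parameter is itself a small act of dependency injection: tests pass a plain dict and never touch the real environment.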

Idempotency, Retries, and Backoff That Don’t Double-Charge

  • Idempotent endpoints: Use idempotency keys tied to the business entity and time period.
  • Retry budgets: Cap retries and use exponential backoff with jitter to avoid synchronized storms.
  • Deduplication: Store processed keys for a TTL when backend guarantees are weak.
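
A capped retry loop with exponential backoff might be sketched as follows. This uses the “full jitter” variant, where each delay is drawn uniformly between zero and the exponential ceiling; the base, cap, and the choice to retry only on ConnectionError are assumptions for the example.

```python
import random
import time


def backoff_delays(max_retries, base=0.1, cap=5.0, rng=None):
    """Yield one sleep per retry, uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, ceiling)


def call_with_retries(operation, max_retries=4, sleep=time.sleep, rng=None):
    """Try once, then retry up to max_retries times on ConnectionError."""
    delays = backoff_delays(max_retries, rng=rng)
    while True:
        try:
            return operation()
        except ConnectionError:
            delay = next(delays, None)
            if delay is None:
                raise  # retry budget spent: surface the original failure
            sleep(delay)
```

The jitter is what prevents a thousand clients that failed together from retrying together. Retries are only safe to add when the operation is idempotent, which is why the idempotency key comes first in the list above.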

Rules for Handling Time Without Summoning Demons

  • Store in UTC; convert at the edges.
  • Never schedule by “local wall time” for critical jobs; use absolute instants or well-defined periods.
  • Treat clocks as unreliable: Use monotonic for durations, NTP monitoring for drift, and tolerances in SLAs.

Data Migrations and Compatibility

  • Write-new/read-old: Deploy readers that handle both schemas before writing the new format.
  • Backfills with checkpoints: Resume safely, and record progress to avoid ghost writes.
  • Contracts in protobuf/Avro: Use explicit evolution rules, defaults, and field deprecation.
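
A checkpointed backfill can be sketched like this. The dict standing in for a durable checkpoint store, the row shape with an ascending "id", and the function name are all assumptions for illustration.

```python
def backfill(rows, transform, write, checkpoint, batch_size=100):
    """Process rows with id > checkpoint["last_id"], saving progress per batch.

    rows must be sorted by ascending "id"; checkpoint is a plain dict standing
    in for a durable store.
    """
    pending = [r for r in rows if r["id"] > checkpoint.get("last_id", 0)]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for row in batch:
            write(transform(row))
        # Record progress only after the whole batch has been written.
        checkpoint["last_id"] = batch[-1]["id"]
    return checkpoint.get("last_id", 0)
```

If the job dies mid-batch, the next run replays that batch from its first row, so the write itself needs to be idempotent (an upsert keyed by id, for example). That replay is the price of never skipping a row.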

Feature Flag Hygiene

  • Expiry dates and owners: Stale flags turn into dead code shrouds.
  • Targeted rollouts and canaries: Contain the blast to a slice you can understand.
  • Config snapshots: Persist the effective flag state for post-incident reconstruction.

A Debugger’s Mindset: Heuristics When You’re Being Haunted

Tighten the Feedback Loop

  • Minimal reproduction: Reproduce locally or in a sandbox with the fewest moving parts. Shrink it until the bug disappears, then add back one dimension at a time.
  • Divide and conquer: Binary-search the space—disable half the system or revert half the commits to find the boundary.
  • Control the environment: Freeze versions, seed inputs, and lock clock behavior when possible.
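
The divide-and-conquer step is an ordinary binary search. The sketch below assumes the first commit is known good, the last is known bad, and badness persists once introduced; the function name and the is_bad callback (a stand-in for “build, replay traffic, check the symptom”) are hypothetical.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that once a commit is
    bad every later one is too.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return bad
```

Each check halves the remaining range, so a thousand suspect commits need about ten checks. A flaky reproduction breaks the assumption that is_bad is reliable, which is one more reason to stabilize the repro before bisecting.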

Hypothesize, Don’t Guess

  • Write down a falsifiable hypothesis: “The cache returns stale data because readers see partial writes when the invalidation overlaps.”
  • Design the smallest experiment to prove it wrong. Surviving hypotheses earn more attention.
  • Keep a lab notebook: Timestamp observations; note what changed and what did not. Memory is a trickster.

Follow the Evidence Across Layers

  • Application: Logs with request IDs and payload sizes.
  • Runtime: GC pauses, thread pools, queue depths.
  • OS: Syscalls, open files, sockets, process states.
  • Network: RTTs, retransmits, TLS handshakes.
  • Storage: IOPS, latency percentiles, compaction events.

Hauntings often cross boundaries: an app-level timeout triggered by a resolver-level DNS cache miss and exacerbated by an infrastructure-level security group change. Instrument all layers and correlate.

A Compact Field Kit Checklist

  • Repro harness with fixed seeds and traffic replay.
  • Toggle-able feature flags with effective config logging.
  • Request/trace IDs end-to-end.
  • Sampling profiler and a way to capture flame graphs in production safely.
  • Resource counters: open FDs, threads, goroutines, heap, GC, event loop lag.
  • Monotonic timestamps and a time abstraction for tests.
  • Chaos switches: inject latency, failure, clock skew.
  • One-command rollback and a well-practiced playbook.

Haunted Smells: Patterns That Predict Future Nightmares

  • “TODO: handle errors” left in production paths.
  • Broad catch/except that swallows stack traces and returns generic failures.
  • Mutable global caches without versioning, synchronization, or immutable snapshots.
  • Tests that assert logs instead of behavior, making logging changes dangerous.
  • Multiple time libraries with mixed assumptions (UTC vs local vs device time).
  • Configuration scattered across environment variables, YAML, and code defaults with no single source of truth.
  • Silent fallbacks: “If config missing, assume default” without emitting a prominent event.

When to Call an Exorcist: Escalation and Pairing

Some ghosts thrive on isolation. If you’ve burned hours with no movement, pair up. Describe the problem aloud. A second set of eyes can spot false assumptions: the “reliable” DNS cache, the hidden retry, the container networking mode that isn’t what you think. Escalate with context:

  • What is the observed symptom? Include examples with trace IDs.
  • What changed recently? Code, infra, data volume, workload shape, time of day.
  • What have you ruled out? Share experiments and their results.
  • What is the blast radius? Which customers or systems are affected and how badly.

When you treat debugging as a team sport with crisp communication and shared rituals, hauntings become manageable. Bugs still arrive at inconvenient hours, but they don’t own the night.

Real-World Mini-Catalog of Ghosts and Quick Exorcisms

  • Heisen-logging: Adding debug logs “fixes” the bug. Quick check: switch to counters or trace sampling; reduce log I/O; use deterministic sleeps in the repro harness to stabilize timing.
  • Phantom CPU spikes: Every hour on the hour. Quick check: cron or scheduler alignment, compactions, or cache expirations. Stagger schedules; add jitter.
  • Socket timeouts under load but not in tests: Quick check: ephemeral port exhaustion. Observe TIME_WAIT counts; enable connection pooling and keep-alives; increase port range.
  • Slow SQL after deploy with no schema change: Quick check: plan cache invalidation or parameter sniffing. Force recompile or add plan guides; review indexes with actual runtime stats.
  • Memory “leak” that disappears on forced GC: Quick check: object pooling growth or caches without size limits. Cap pool sizes; attach eviction policies; monitor hit/miss ratios.
  • Message dedup systems that still double-process: Quick check: dedup window too small under retry storms; clock skew between producers and consumers. Use server-assigned timestamps and extend windows during incidents.
  • Edge-only failures: Works from office, fails for customer. Quick check: MTU mismatch causing fragmentation or silently dropped packets. Lower MSS or enable PMTUD; on Linux, test with ping -M do (fragmentation prohibited) and a payload size set via -s.

From Horror to Habits: Living with Complex Systems

The line between a debugging horror story and a good engineering tale is usually an invariant, a log line, or a test you wish you had. The ghosts don’t vanish; they move on when the house is well lit. You light it by designing for observability, rehearsing graceful failures, writing contracts into code, and refusing cleverness where clarity will do. Then, when a haunting does arise, you’ll have the tools, the rituals, and the calm to face it—torch in one hand, profiler in the other, and a pocket full of idempotency keys.

 