New ask Hacker News story: Security breaks during partial failures – design notes from distributed systems
3 by sandhyavinjam | 0 comments on Hacker News.
TL;DR: Many security mechanisms fail not during attacks, but during partial outages. This post documents early design notes for a failure-aware security framework for distributed systems.

The problem

In production distributed systems, security often breaks when things are half working:

- auth services degrade → retries explode
- fallback paths widen access
- recovery logic becomes the attack surface

Nothing is “exploited”, yet the system becomes unsafe. Most security models assume stable components and clean failures. Real systems don’t behave that way.

Design assumptions

We assume:

- correlated failures
- retries are adversarial
- timeouts are unsafe defaults
- recovery paths matter as much as steady-state logic

We don’t assume:

- global consistency
- perfect identity
- reliable clocks
- centralized enforcement

Framework ideas (high level)

This work explores four ideas:

1. Failure-aware trust
- Trust degrades under failure, not just compromise
- Access narrows automatically during partial outages

2. Security invariants at runtime
- Invariants are continuously enforced
- Violations trigger containment, not alerts

3. Retry-safe security primitives
- Idempotent, monotonic, side-effect bounded
- Retries can’t escalate privilege

4. Security as observable state
- Trust level, degradation, and containment are visible
- If you can’t observe it, you can’t secure it

What this is not

- Not zero trust marketing
- Not compliance
- Not a finished system

It’s an attempt to treat failure as the normal case, not an exception.

Why publish this early?

Because many real failures:

- don’t fit clean research papers
- happen during incidents, not attacks
- are invisible outside production systems

We’re sharing design notes to get feedback before formalizing or evaluating further.

Feedback welcome

If you’ve seen security regressions during outages or retries causing unsafe behavior, I’d like to hear about it.

This is ongoing work. No claims of novelty or completeness.
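
To make ideas 1, 3, and 4 a little more concrete, here is a minimal Python sketch. It is an illustration only, not part of the framework described above: the names `Trust`, `ALLOWED`, `trust_from_health`, and `Gate`, the two health flags, the action names, and the request IDs are all hypothetical. It assumes trust is derived from dependency health, that an unhealthy auth service fails closed, and that a retried request can keep or narrow its earlier trust level but never widen it.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical trust levels; a higher value means broader access."""
    CONTAINED = 0  # nothing is permitted
    DEGRADED = 1   # read-only access
    FULL = 2       # normal steady-state access


# Hypothetical mapping from trust level to permitted actions.
ALLOWED = {
    Trust.CONTAINED: frozenset(),
    Trust.DEGRADED: frozenset({"read"}),
    Trust.FULL: frozenset({"read", "write"}),
}


def trust_from_health(auth_ok: bool, policy_ok: bool) -> Trust:
    """Derive trust from dependency health (idea 1)."""
    if not auth_ok:
        # Fail closed: a degraded auth service never widens access.
        return Trust.CONTAINED
    if not policy_ok:
        return Trust.DEGRADED
    return Trust.FULL


@dataclass
class Gate:
    """Authorization gate whose per-request decisions are monotonic."""
    # Lowest trust level observed so far for each request ID.
    _floor: dict = field(default_factory=dict)

    def authorize(self, request_id: str, action: str,
                  auth_ok: bool, policy_ok: bool) -> bool:
        current = trust_from_health(auth_ok, policy_ok)
        # A retry may keep or narrow the earlier level, never widen it (idea 3).
        level = min(current, self._floor.get(request_id, current))
        self._floor[request_id] = level
        return action in ALLOWED[level]

    def state(self) -> dict:
        """Expose the recorded trust level per request (idea 4)."""
        return {rid: level.name for rid, level in self._floor.items()}
```

A possible usage, under the same assumptions:

```python
gate = Gate()
gate.authorize("req-1", "write", auth_ok=True, policy_ok=False)  # denied: degraded
gate.authorize("req-1", "write", auth_ok=True, policy_ok=True)   # retry stays denied
gate.authorize("req-2", "write", auth_ok=True, policy_ok=True)   # fresh request: allowed
```

The per-request floor is kept forever in this sketch. A real system would need some way to expire it, and deciding when trust is allowed to recover is exactly the kind of recovery path the notes above flag as risky.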