In recent years, researchers have turned to provenance-based intrusion detection systems (PIDSs) as a promising way to spot attacks that slip past traditional defenses. At their core, these systems build provenance graphs, which act like detailed maps of how information flows through a computer. Such graphs treat system entities (e.g., processes, files, and network connections) as nodes, and the interactions between them (e.g., system calls) as edges. By analyzing these graphs, anomaly-based PIDSs learn what normal behavior looks like and then flag deviations from it, making them well suited for catching stealthy attacks such as advanced persistent threats (APTs) and previously unknown zero-day exploits. Despite claims of near-perfect detection rates, today’s PIDSs are nowhere near ready for real-world use. Their biggest flaw is how they report results: most state-of-the-art PIDSs emit coarse-grained alerts spanning tens of thousands of nodes or events, burying analysts under mountains of noise. This is not merely an engineering oversight; it is a direct consequence of prevailing evaluation practices. By optimizing for specific evaluation metrics rather than for usable outputs, the community has built detectors that look impressive on paper but fall short of helping an actual security team.
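To make the graph representation concrete, the following is a minimal sketch of how a provenance graph might be assembled from audit events. The simplified event format and the specific processes, files, and socket shown are illustrative assumptions, not part of any particular PIDS; real systems parse much richer logs (e.g., auditd or ETW records).

```python
# Minimal sketch of provenance-graph construction under a simplified,
# hypothetical event format: (source entity, interaction, target entity).
import networkx as nx

events = [
    (("process", "bash"),            "fork",    ("process", "curl")),
    (("process", "curl"),            "connect", ("socket", "203.0.113.7:443")),
    (("socket", "203.0.113.7:443"),  "recv",    ("process", "curl")),
    (("process", "curl"),            "write",   ("file", "/tmp/payload")),
    (("process", "bash"),            "exec",    ("file", "/tmp/payload")),
]

# System entities become nodes; each observed interaction becomes a
# directed edge labeled with the system call that produced it.
G = nx.MultiDiGraph()
for src, syscall, dst in events:
    G.add_node(src, kind=src[0])
    G.add_node(dst, kind=dst[0])
    G.add_edge(src, dst, syscall=syscall)

# An anomaly-based PIDS would learn a baseline over graphs like this one
# and flag structures that deviate from benign behavior.
print(f"{G.number_of_nodes()} nodes, {G.number_of_edges()} edges")
for u, v, data in G.edges(data=True):
    print(f"{u} --{data['syscall']}--> {v}")
```

Even this toy graph hints at the reporting problem described above: a detector that flags the whole neighborhood of a suspicious node hands the analyst far more entities than the handful of edges that actually matter.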