AutoResearch
Stale · Kept · Low band · Family Manager

Same-flyer reuse to skip duplicate extraction

Baseline: 55%
Final: 88%
Delta: +33 pts
Variants: 3
Objective

What we set out to improve

Detect when a flyer has already been forwarded and reuse its prior extraction instead of re-running structured extraction on the near-duplicate.

Kept · Promoted to a template · Wrote to a KB

Kept. Combining a content hash with a layout-similarity check caught re-forwarded flyers at 0.88 precision, skipping redundant extraction at a low resource cost. The heuristic was promoted to the family-documents knowledge base.
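
The kept heuristic can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the shipped pipeline: the `Flyer` type, the grid-based layout signature, the Jaccard similarity, the 0.8 threshold, and the choice to require both signals to agree.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Block = Tuple[float, float, float, float]  # (x, y, w, h) as page fractions


@dataclass(frozen=True)
class Flyer:  # hypothetical container, not the pipeline's own type
    text: str
    blocks: Tuple[Block, ...]


def content_hash(text: str) -> str:
    """SHA-256 of the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def layout_signature(flyer: Flyer, grid: int = 4) -> FrozenSet[Tuple[int, int]]:
    """Set of grid cells that contain the center of at least one block."""
    cells = set()
    for x, y, w, h in flyer.blocks:
        col = min(int((x + w / 2) * grid), grid - 1)
        row = min(int((y + h / 2) * grid), grid - 1)
        cells.add((row, col))
    return frozenset(cells)


def layout_similarity(a: Flyer, b: Flyer) -> float:
    """Jaccard overlap of the two layout signatures, in [0, 1]."""
    sa, sb = layout_signature(a), layout_signature(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def is_duplicate(new: Flyer, prior: Flyer, threshold: float = 0.8) -> bool:
    """Assumed rule: the hashes must match AND the layouts must be similar."""
    same_text = content_hash(new.text) == content_hash(prior.text)
    return same_text and layout_similarity(new, prior) >= threshold
```

Requiring both signals is one plausible reading of "combining": a second check can only reject candidates the hash alone would accept, which is the direction that raises precision.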

Iterations

Variants we tried

Each variant and its coarse objective metric. The kept variant is marked; bars are relative to the best run.

  • 1. Baseline: re-extract every forward · Medium · 55%
  • 2. Variant A: content-hash match only · Low · 74%
  • 3. Variant B: content-hash + layout similarity · Winner · Low · 88%
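
The headline delta and the relative bar lengths follow directly from the three scores above. A small sketch of that arithmetic, where the dictionary layout and variable names are illustrative only:

```python
# Scores in percent, copied from the variant list above.
scores = {
    "Baseline": 55,
    "Variant A": 74,
    "Variant B": 88,
}

best_name = max(scores, key=scores.get)      # variant with the highest score
best = scores[best_name]

# Delta in percentage points between the best run and the baseline.
delta_pts = best - scores["Baseline"]

# Bar length for each variant as a fraction of the best run.
bars = {name: value / best for name, value in scores.items()}

for name, value in scores.items():
    print(f"{name:10s} {value:3d}%  bar={bars[name]:.3f}")
print(f"winner={best_name}  delta=+{delta_pts} pts")
```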
Run

Stages

  1. baseline

    Succeeded · 1.6s

  2. variant run

    Succeeded · 5.4s

  3. eval

    Succeeded · 950ms

  4. promote

    Succeeded · 240ms

Output

Artifacts and what shipped

Redaction-safe artifact previews, diffs, metric tables, and prompt variants with sensitive text removed.

  • Metric table

    Duplicate-detection precision by variant (0.55 → 0.88)

  • Diff summary

    Pipeline diff: add a reuse gate before extraction

  • KB write

    Promoted the reuse heuristic to the family-docs KB
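
A reuse gate placed before extraction might look like the sketch below. The actual diff is hidden on this page, so every name here (`make_reuse_gate`, `extract`, `fingerprint`) is hypothetical. For brevity the gate keys on an exact fingerprint only; a layout-similarity check like the one in the kept variant would need a near-match lookup in place of the dictionary.

```python
from typing import Callable, Dict, Hashable, Tuple, TypeVar

Doc = TypeVar("Doc")
Result = TypeVar("Result")


def make_reuse_gate(
    extract: Callable[[Doc], Result],
    fingerprint: Callable[[Doc], Hashable],
) -> Callable[[Doc], Tuple[Result, bool]]:
    """Wrap `extract` so documents with a known fingerprint skip extraction.

    The returned function yields (result, reused), where `reused` is True
    when a prior extraction was returned instead of running `extract`.
    """
    cache: Dict[Hashable, Result] = {}

    def gated(doc: Doc) -> Tuple[Result, bool]:
        key = fingerprint(doc)
        if key in cache:
            return cache[key], True      # reuse the prior extraction
        result = extract(doc)            # first sighting: run extraction
        cache[key] = result
        return result, False

    return gated


# Usage with stand-in functions (both are placeholders for illustration).
calls = []


def fake_extract(text: str) -> dict:
    calls.append(text)
    return {"length": len(text)}


gate = make_reuse_gate(fake_extract, fingerprint=lambda t: t.strip().lower())
first = gate("Spring Fair")
second = gate("  spring fair ")
```

Returning the `reused` flag alongside the result keeps the gate observable, so skipped extractions can be counted without inspecting the cache.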

What you can see, and what is hidden

Every projection on this page is redaction-safe by construction. Redaction level: Sample content (curated, public-safe excerpts only).

Shown

  • Identifiers & counts
  • Closed-enum statuses
  • Coarse quality / resource bands
  • Timestamps & freshness

Intentionally hidden

  • Raw prompts
  • Raw documents
  • Raw tool logs
  • Raw trace spans
  • Embedding vectors
  • Free-text feedback
  • Auth internals & secrets
