BorisovAI
Tags: New Feature, trend-analysis, Git Commit

Five Gates That Caught What Code Missed

I was deep in trend extraction when the first problem emerged: garbage data passing all our filters. Oil prices, orange juice futures, and insurance claims—completely unrelated events—somehow clustered together as a “trend.” Our code logic looked solid, but something was systematically wrong.

The real issue was that we were checking individual facts but ignoring whether they actually belonged together. We’d validate each event, calculate relevance scores, link entities—all by the book. Then we’d ship a trend built on noise.

That’s when I started adding gates. Not one, but five layers.

The coherence gate came first. I computed embedding vectors for all evidence events and measured each one's cosine similarity to the cluster centroid. Anything below 0.35 got rejected. Simple, but brutal: 56 out of 56 garbage trends from our backlog got filtered immediately. Oil and oranges finally stopped meeting.
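
A minimal sketch of that gate, assuming numpy and a list of precomputed event embeddings; the names and structure are illustrative, not the actual implementation:

```python
import numpy as np

COHERENCE_THRESHOLD = 0.35  # evidence below this similarity to the centroid is dropped

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def coherence_gate(event_embeddings: list[np.ndarray]) -> list[int]:
    """Return indices of evidence events that are coherent with the cluster centroid."""
    centroid = np.mean(event_embeddings, axis=0)
    return [
        i for i, emb in enumerate(event_embeddings)
        if cosine(emb, centroid) >= COHERENCE_THRESHOLD
    ]
```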

The relevance score came next. Instead of a hardcoded 1.0 for every event-trend pair, I made it actual cosine similarity to the centroid. Now you could see why an event was part of a trend, not just whether it was. The transparency mattered more than I expected.
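
In the same spirit, the relevance score is just the per-event similarity to that centroid, kept instead of discarded. A self-contained sketch, with hypothetical names:

```python
import numpy as np

def relevance_scores(event_embeddings: dict[str, np.ndarray]) -> dict[str, float]:
    """Score each event against the trend centroid instead of hardcoding 1.0."""
    centroid = np.mean(list(event_embeddings.values()), axis=0)
    return {
        event_id: float(
            np.dot(emb, centroid)
            / (np.linalg.norm(emb) * np.linalg.norm(centroid) + 1e-12)
        )
        for event_id, emb in event_embeddings.items()
    }
```

The score is what ends up on the event-trend link, so anyone reviewing a trend can see which evidence carries it and which barely qualifies.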

Then the entity blacklist. Generic entities like Russia, China, AI—they’re everywhere, so they were matching everything. I marked them as non-discriminative. If “AI” was your only link between two events, they weren’t actually related.
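
A sketch of the check; the blacklist contents beyond the three entities mentioned above, and the function names, are purely illustrative:

```python
# Entities too generic to establish a link on their own (illustrative list).
NON_DISCRIMINATIVE = {"russia", "china", "ai"}

def entities_link(entities_a: set[str], entities_b: set[str]) -> bool:
    """Two events count as related only if they share a discriminative entity."""
    shared = {e.lower() for e in entities_a} & {e.lower() for e in entities_b}
    return bool(shared - NON_DISCRIMINATIVE)
```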

The LLM confidence gate was practical. Some extraction calls returned low confidence scores. No point materializing weak trends. We filter at ≥0.5 and save compute.
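
The gate itself is trivial; a sketch assuming the extraction result carries its own confidence field (the field name is an assumption):

```python
MIN_EXTRACTION_CONFIDENCE = 0.5

def passes_confidence_gate(extraction: dict) -> bool:
    """Don't materialize a trend when the extraction call itself was unsure."""
    # "confidence" as a key is an assumption about the extraction payload shape.
    return extraction.get("confidence", 0.0) >= MIN_EXTRACTION_CONFIDENCE
```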

The final gate was the cheapest and most effective. I added a second LLM call—just one or two candidates per cluster—asking: “Is this actually a trend or just a recurring situation?” You’d be surprised how many things that look like trends are just background noise that never resolves. The LLM catches the semantic false positives our metrics miss.
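
Roughly what that second call looks like; `complete` stands in for whatever LLM client is actually used, and the prompt wording is an approximation rather than the production one:

```python
VERIFY_PROMPT = (
    "Candidate trend and its evidence:\n{summary}\n\n"
    "Is this an actual trend (a development that builds toward something), "
    "or just a recurring situation that never resolves? "
    "Answer with one word: TREND or NOISE."
)

def verify_trend(candidate_summary: str, complete) -> bool:
    """Second-opinion gate: ask the model whether the cluster is a real trend.

    `complete` is any callable that takes a prompt string and returns the
    model's text response.
    """
    answer = complete(VERIFY_PROMPT.format(summary=candidate_summary))
    return answer.strip().upper().startswith("TREND")
```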

Five gates, each catching different failure modes. The system stopped being a filter and started being a validator.

Testing this felt like debugging a long-running service: each gate removed a class of problems, but you only discovered the next problem once the previous one was fixed. By the end, trend quality stopped being “good enough” and started being defensible.

Here’s a tech fact: even rigorous mathematical filters can’t detect semantic incoherence. You need multiple validation layers, some statistical, some linguistic, some logical. It’s the difference between catching typos and catching conceptual errors.

So now when someone asks why we need five gates instead of one comprehensive metric, I have a simple answer: because garbage whispers different languages, and we learned to listen in five of them. 😄

Metadata

Branch:
feat/trend-quality-gates
Dev Joke
"We're like Uber, only for..." is how 90% of startups begin, and 90% of failures.
