BorisovAI

Blog

Posts about the development process, problems solved, and technologies learned

New Feature · llm-analisis

When Your Self-Teaching Model Eats Its Own Homework

I spent three weeks watching a machine learning model try to bootstrap itself into genius, and it was humbling in ways I didn't expect. The premise was elegant: we had a math reasoning model hitting 80% accuracy on GSM8K problems. Good, but stuck. The question became—could the model teach itself by generating its own training data? Not just solving problems, but creating them. Self-augmentation. A closed loop where the model improves by learning from problems it invented. It didn't work the way I thought it would.

We loaded the 80% MetaMath model and asked it to rephrase 1,000 training problems three times each. Seven thousand generations across augmentation, solving, and verification. The math was sound. The idea was sound. Then we trained on the output. The model got worse. Minus 3.5 percentage points.

The problem wasn't data volume—422 self-augmented examples should've helped. The problem was that the model had learned to rephrase *like itself*, which meant it was essentially training on its own mistakes. A weak teacher produces weak students. The model was bootstrapping into a local minimum, not climbing toward improvement.

That's when I realized we'd been strengthening the wrong thing. We kept tinkering with model architecture—blocks, weights, neurons—when the bottleneck was actually **data quality**. The model wasn't hungry for new neurons. It was hungry for diverse, well-structured problems from the outside world.

So we pivoted. Instead of self-generation, we built a pipeline that *searched* for external data: SearXNG queries like "grade school math word problem with solution" or "multi-step arithmetic for grade 5." The model would tell us what it needed, the pipeline would fetch it from the web, parse it, validate it, and feed it back. It sounds simple. It wasn't. Web extraction is noisy. HTML is messy. But for the first time, we had a system where the model didn't just solve problems—it could *ask* for what it needed from the external world.

Did it work? The loss curve started improving. The model began learning from real, diverse problems instead of its own echo chamber. We haven't hit 85% yet, but we're moving in the right direction.

The joke writes itself: a byte walks into a bar looking miserable. The bartender asks what's wrong. "Parity error," it says. "Ah, I thought you looked a bit off." 😄 Our model had the same problem—it looked fine from the outside, but its internal reasoning was hopelessly corrupted. The fix wasn't better weights. It was better data.
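For the curious, the fetch step boils down to something like this. A minimal sketch, assuming a local SearXNG instance on port 8888 with its JSON API enabled: the query parameters are SearXNG's own, but the function name and the crude validation heuristic are illustrative, not our production code.

```python
# Sketch of the external-data fetch step against a local SearXNG instance.
# The "keep things that look like worked problems" filter is a stand-in for
# the real parse/validate stage described above.
import requests

SEARX_URL = "http://localhost:8888/search"  # assumed local instance

def fetch_candidate_problems(query: str, pages: int = 2) -> list[dict]:
    """Query SearXNG and keep results that look like math problems."""
    candidates = []
    for page in range(1, pages + 1):
        resp = requests.get(
            SEARX_URL,
            params={"q": query, "format": "json", "pageno": page},
            timeout=10,
        )
        resp.raise_for_status()
        for hit in resp.json().get("results", []):
            snippet = hit.get("content", "")
            # Crude filter: require a question mark and at least one digit,
            # so pure navigation pages are dropped early.
            if "?" in snippet and any(ch.isdigit() for ch in snippet):
                candidates.append({"url": hit["url"], "snippet": snippet})
    return candidates

problems = fetch_candidate_problems(
    "grade school math word problem with solution"
)
print(f"{len(problems)} candidate pages to parse and validate")
```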

Apr 20, 2026
New Feature · trend-analisis

Five Gates That Caught What Code Missed

I was deep in trend extraction when the first problem emerged: garbage data passing all our filters. Oil prices, orange juice futures, and insurance claims—completely unrelated events—somehow clustered together as a "trend." Our code logic looked solid, but something was systematically wrong. The real issue was that we were checking *individual* facts but ignoring whether they actually belonged together. We'd validate each event, calculate relevance scores, link entities—all by the book. Then we'd ship a trend built on noise. That's when I started adding gates. Not one, but five layers.

**The coherence gate came first.** I computed embedding vectors for all evidence events and measured their distance to the cluster centroid. Anything below 0.35 similarity got rejected. Simple, but brutal—56 out of 56 garbage trends from our backlog got filtered immediately. Oil and oranges finally stopped meeting.

**The relevance score came next.** Instead of a hardcoded 1.0 for every event-trend pair, I made it actual cosine similarity to the centroid. Now you could see *why* an event was part of a trend, not just whether it was. The transparency mattered more than I expected.

**Then the entity blacklist.** Generic entities like Russia, China, AI—they're everywhere, so they were matching everything. I marked them as non-discriminative. If "AI" was your only link between two events, they weren't actually related.

**The LLM confidence gate was practical.** Some extraction calls returned low confidence scores. No point materializing weak trends. We filter at ≥0.5 and save compute.

**The final gate was the cheapest and most effective.** I added a second LLM call—just one or two candidates per cluster—asking: "Is this actually a trend or just a recurring situation?" You'd be surprised how many things that look like trends are just background noise that never resolves. The LLM catches the semantic false positives our metrics miss.

Five gates, each catching different failure modes. The system stopped being a filter and started being a validator. Testing this felt like debugging a long-running service: each gate removed a class of problems, but you only discovered the next problem once the previous one was fixed. By the end, trend quality stopped being "good enough" and started being defensible.

Here's a tech fact: even rigorous mathematical filters can't detect semantic incoherence. You need multiple validation layers, some statistical, some linguistic, some logical. It's the difference between catching typos and catching conceptual errors.

So now when someone asks why we need five gates instead of one comprehensive metric, I have a simple answer: because garbage whispers different languages, and we learned to listen in five of them. 😄
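The coherence gate itself is only a few lines. Here's a minimal sketch, assuming evidence events are already embedded as vectors; the function name and toy data are illustrative, but the centroid-similarity logic and the 0.35 cutoff match the gate described above.

```python
# Coherence gate sketch: keep only events whose cosine similarity to the
# cluster centroid clears the threshold.
import numpy as np

def coherence_gate(embeddings: np.ndarray, threshold: float = 0.35):
    """Return indices of events that pass the gate, plus all scores."""
    centroid = embeddings.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = (embeddings / norms) @ centroid  # cosine similarity per event
    keep = np.where(sims >= threshold)[0]
    return keep, sims

# Toy usage: three mutually similar events and one outlier.
vecs = np.array([[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [-0.8, 0.9]])
kept, scores = coherence_gate(vecs)
print(kept, scores.round(2))  # the outlier falls below 0.35 and is dropped
```

The same `sims` array doubles as gate two: each event's cosine similarity gets stored as its relevance score instead of a hardcoded 1.0.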

Apr 20, 2026
New Feature · trend-analisis

Hunting a Silent Crash in the Trend Pipeline

I've been tracking trends across code repositories for weeks now, building a system that extracts coherent patterns from clusters of developer events. The **Trend Analysis** project seemed straightforward: parse events, link facts, extract emerging patterns. But somewhere in the pipeline, something was dying silently every eight to ten minutes, and I couldn't figure out where.

The setup was solid. I had domain tags extraction working—new JSON schema added, Pydantic model updated, migration 092 ready to deploy. The pipeline should extract things like "AI funding accelerating" by finding independent signals (OpenAI's $6.6B, Anthropic's $4B, Mistral's $600M) inside thematic clusters. Three separate events, one unmistakable direction. Clean concept.

Then came the weirdness. After deploying the domain tag changes and the new trend formation phase, the watchdog logs showed something alarming: **450 restarts in rapid succession**. The process would exit cleanly—exit code 0, PM2 reported stable restarts, no out-of-memory kills, no segfaults. Just... gone. Eight minutes of work, then silence.

I started adding debug markers everywhere. "PHASE_DEBUG" before the cluster extraction. "Extraction done" right before phase 3a. I waited through cycles, watching the logs. "Crawled 80 items" would appear, extraction would start, and then—nothing. The debug marker never showed up. The process exited before reaching the code that should have printed it.

That's when I realized: the crash wasn't in the main pipeline code. All the obvious loops caught exceptions. The real culprit had to be in `asyncio.create_task()`. Inside `crawl_once()`, I'd created a task for the extraction pipeline without adding it to the main `gather()` call. In Python 3.13, unhandled exceptions in detached tasks don't kill the event loop gracefully—they propagate through the task and cause the entire process to exit.

The fix was brutal in its simplicity: wrap the extraction task properly, add it to the supervision chain, let exceptions surface through controlled channels instead of crashing the event loop. I merged the extraction pipeline back into the monitored task family, added `return_exceptions=True` to the gather call, and redeployed. The restarts stopped.

What struck me most was how invisible the problem had been. No traceback, no error log, just a process that kept dying cleanly. The lesson: **in async Python, detached tasks are ticking bombs**. Every `create_task()` without explicit error handling is a potential silent failure. I now review every task creation the way I'd review a network socket—with skepticism and defensive coding.

The pipeline now runs stable. Trends extract properly. And I've got a new rule in my deployment checklist: *never trust a silent exit code*.

---

*Why didn't the Python programmer respond to the foreign emails he got? Because his interpreter was busy collecting garbage.* 😄
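The shape of the fix, reduced to a sketch: the coroutine names are placeholders for the real pipeline, but the pattern is the one described above—every long-running task lives inside the supervised `gather()`, and `return_exceptions=True` turns failures into values you can log instead of a dead process.

```python
# Supervised-task sketch: background work joins the main gather() so
# failures surface as values rather than silent exits.
import asyncio

async def crawl_once():
    ...  # placeholder for the real crawl loop

async def extract_facts():
    raise RuntimeError("simulated extraction failure")

async def main():
    results = await asyncio.gather(
        crawl_once(),
        extract_facts(),          # supervised, no longer a detached task
        return_exceptions=True,   # exceptions come back as values
    )
    for r in results:
        if isinstance(r, Exception):
            print(f"task failed: {r!r}")  # logged, loop keeps running

asyncio.run(main())
```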

Apr 18, 2026
New Feature · C--projects-bot-social-publisher

How Silent Task Deaths Nearly Broke the Pipeline

I was hunting for a bug that didn't exist—or rather, a bug that existed everywhere and nowhere at once. The **Trend Analysis** system I'd been building was supposed to extract real patterns from event clusters. Simple enough: feed in grouped events, extract directional trends. Instead, it kept crashing silently every 8–10 minutes with exit code 0, as if nothing had gone wrong.

The migration to track trends properly had gone smoothly. Three new tables, domain tags for context, event-trend linkage. Tests passed: 740 green checkmarks. I deployed the first cycle. Then the phantom crashes began. PM2 would restart the process like it was scheduled maintenance. Logs showed nothing suspicious—no exceptions, no stack traces. Just... silence.

I added debug markers at critical points: before cluster formation, after extraction, before linking. The markers appeared right up to a certain moment, then stopped. The system was crashing in an async task that I'd created with `asyncio.create_task()` instead of wrapping it in `asyncio.gather()`.

That's the trap. In Python, when you spin up a task with `create_task()` and don't directly await it, an unhandled exception won't propagate to your main loop. The task just dies silently—and in our case it took the whole process down with it. No error, no traceback—just gone.

The culprit was `_extract_facts_pipeline`, a background worker spawned inside `crawl_once()` with no exception handling. When it failed—and it was failing whenever the translation loop also ran—there was nothing to catch it.

I refactored the critical path: every long-running task now either handles its own exceptions or gets registered in the main `gather()` call. No more orphaned tasks. I also noticed that `_extract_facts_pipeline` and the translation loop were both hitting the same Ollama instance, causing contention on a single port. Dual-port routing wasn't working as expected, so I split them across different endpoints.

After the fixes, uptime stretched to 5+ minutes, then longer. The system stabilized. Trends hadn't started accumulating yet—domain tags needed time to build up—but the **pipeline held**.

The lesson hit hard: asynchronous architecture demands as much attention to failure modes as synchronous code does. Maybe more. Silent failures are worse than loud ones.

And here's the kicker: the object-oriented way to become wealthy? Inheritance. 😄
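If a task genuinely has to stay detached, there's a second guard worth keeping around. This is a sketch, not the project's actual code: keep a strong reference so the task can't be garbage-collected mid-flight, and attach a done-callback that logs any exception.

```python
# Done-callback sketch for tasks that must run detached from the main gather.
import asyncio
import logging

_background_tasks: set[asyncio.Task] = set()

def _log_failure(task: asyncio.Task) -> None:
    _background_tasks.discard(task)
    if not task.cancelled() and task.exception() is not None:
        logging.error("background task died", exc_info=task.exception())

def spawn_supervised(coro) -> asyncio.Task:
    task = asyncio.create_task(coro)
    _background_tasks.add(task)        # prevent premature garbage collection
    task.add_done_callback(_log_failure)
    return task

async def main():
    async def boom():
        raise ValueError("simulated pipeline failure")
    spawn_supervised(boom())
    await asyncio.sleep(0.1)           # failure is logged, process survives

asyncio.run(main())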

Apr 18, 2026
New Feature · llm-analisis

Building the Self-Augmentation Loop: When Your Model Becomes Its Own Data Generator

I was staring at the MetaMath results—82% accuracy on GSM8K with voting, and the loss curve still declining at 3,000 training steps. The problem hit me: we had only scratched the surface of one dataset. The model was learning fast, but we were feeding it the same curated problems over and over. What if, instead of hunting for new external datasets, the model could generate its own training data?

The idea crystallized during a code review session. We had 7,473 problems in GSM8K's training split. With simple augmentation—rephrasing, backward reasoning, changing numerical values (what the MetaMath team calls FOBAR)—we could multiply that into 36,000 diverse problems. The beauty was that we didn't need SearXNG or any web scraper running on port 8888. We had everything already.

The plan became a three-stage closed loop. First, push the current MetaMath model further. We'd been training for 3K steps; the loss curve suggested we hadn't hit diminishing returns yet. I scheduled a full run with 395K problems from MetaMathQA (not just GSM8K, but also MATH for diversity) across 10,000 steps. That's 3.3 times longer. The target was straightforward: break 80% with greedy decoding, then test voting with N=8 and aim for 88-91%. Record territory.

But the real work was the second stage. I sketched out the self-augmentation pipeline: take each training problem, have the model rephrase it three ways, generate the backward reasoning (what mathematical path led to this problem), and vary the numbers while preserving the structure. No external API calls. No dataset downloads. Just the model and its own problems, recursively improving itself.

The third stage—the SearXNG agent—would wait. That was for unlimited data acquisition, feeding the loop continuously. But stages one and two? Those were self-contained. Closed. Independent of infrastructure.

While the training runs spun up, I kept thinking about why this matters. Most ML teams chase bigger, richer datasets. We were doing something different: proving that a focused model could bootstrap its own curriculum. MetaMath had shown the way with their augmentation pipeline. We were taking it inward, making it part of the learning cycle itself.

The voting layer alone was compelling. Eight different sampling passes over the same problem, then a majority vote. It's not elegant, but it works—trading inference cost for accuracy. With a self-augmented training set running in parallel, the model wouldn't just get better at reasoning; it would learn to reason about reasoning.

And somewhere in that loop, there's a joke waiting: why are machine learning engineers always drowning in their own data? Because they built the pump themselves. 😄
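The voting layer fits in a few lines. A minimal sketch: `sample_answer` stands in for one temperature-sampled generation pass against the model, and the noisy stub below exists only to make the example runnable.

```python
# Majority voting sketch: sample N answers per problem and take the mode.
from collections import Counter
import random

def vote(problem: str, sample_answer, n: int = 8) -> str:
    """Majority vote over n stochastic generations; ties go to first seen."""
    answers = [sample_answer(problem) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in: a noisy "model" that answers 72 most of the time.
random.seed(0)
noisy = lambda p: random.choice(["72", "72", "72", "68", "70"])
print(vote("Natalia sold 48 clips in April...", noisy))  # usually "72"
```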

Apr 18, 2026
New Feature · trend-analisis

How We Finally Stopped Treating Trends Like Stray Events

I was staring at our trend detection system when something clicked: we'd been treating outliers like patterns. A single spike in deployment frequency, a one-off refactor, a random config change—our old pipeline grabbed these and labeled them "trends." We weren't detecting patterns. We were collecting noise.

The fix came during the Trend Analysis project overhaul. We needed to stop extracting trends from individual events and start identifying *structural patterns* from event clusters instead. Here's what actually happened: I sat down with the HDBSCAN clustering output and realized we had real clusters—groups of related events that actually meant something. A cluster of "config changes" across multiple services. A cluster of "security patches." A cluster of "database optimization attempts." These clusters deserved analysis, not the random single events we'd been fishing out before.

The new approach—ADR v5—extracts 0 to 3 structural patterns *per cluster*. Each pattern gets evidence: which events support it, whether the change is up or down, what type of signal it is, metrics, the key players involved. We also started assigning **domain tags** to events (3-5 broad categories like "infrastructure," "performance," "security") without any extra LLM calls—they come free from the extraction prompt itself.

The tricky part was matching new incoming events to existing trends. We went hybrid: check embedding similarity (threshold 0.55) *and* look for entity/tag overlap. It's not perfect, but it catches the real patterns and ignores the noise. We also killed Level 1 entity-based trend extraction entirely. It was generating false positives like a broken smoke detector. Sometimes less is more.

The migration was thorough—new tables for `event_domain_tags`, `trend_events`, plus extra columns in the trends table. We had to be careful with Ollama routing: dual-port setup, mutex locks, keep-alive set to "999h" to avoid connection thrashing, chunk sizes tuned to 5.

Testing on production data gave us 14 legitimate trends extracted from 5 clusters, with 56 events linked back to those trends. Not a massive number, but every single one made sense. No ghost patterns. No random events masquerading as trends.

What do you call a group of 8 Hobbits? A Hobbyte. 😄
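A sketch of the hybrid matcher, assuming each trend carries a centroid vector plus an entity set. The exact combination rule here (similarity gate plus at least one discriminative overlap) is my reading of the description above, not the verbatim production logic.

```python
# Hybrid event-to-trend matching: embedding similarity AND entity overlap,
# with blacklisted generic entities excluded from the overlap check.
import numpy as np

NON_DISCRIMINATIVE = {"russia", "china", "ai"}  # generic, match everything

def matches_trend(event_vec, event_entities, trend_vec, trend_entities,
                  sim_threshold: float = 0.55) -> bool:
    sim = float(np.dot(event_vec, trend_vec) /
                (np.linalg.norm(event_vec) * np.linalg.norm(trend_vec)))
    overlap = (set(event_entities) & set(trend_entities)) - NON_DISCRIMINATIVE
    return sim >= sim_threshold and bool(overlap)
```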

Apr 18, 2026
Bug Fix · trend-analisis

Taming the Chaos: How Dual-Port Routing Saved Our Ollama Pipeline

I was debugging why our Trend Analysis pipeline kept crashing at the worst possible moments. The team reported that during peak enrichment cycles, Ollama would simply close connections without warning. "Remote end closed connection"—the error message that haunts every infrastructure engineer. I spent two hours staring at logs before the pattern clicked: we were hammering a single Ollama instance with concurrent requests to two different models, hermes3:8b and gemma4:e2b, and the poor thing was drowning in VRAM pressure.

Here's where it got interesting. The fix wasn't complicated, but it required touching the entire request pipeline. I split Ollama across two ports—11435 for gemma4:e2b (our workhorse), 11436 for hermes3:8b (the memory hog). But splitting the ports meant nothing if requests could still collide. So I added a global `_ollama_mutex` to serialize all requests, preventing the concurrent calls that were triggering Ollama's connection resets. Two doors on the same building, but only one person could enter at a time.

The mutex was half the battle. The other half was buried in configuration. Someone had set `keep_alive="-1"`, which is invalid Go duration syntax—Ollama literally rejected every request coming through. I changed it to `"999h"`, which keeps models pinned in VRAM without expiring. A tiny string, massive difference.

But there was more. Our translation pipeline was chunking content into 50-character pieces and sending them to the LLM with 16K+ character prompts—pure context overflow. I dropped chunk_size from 50 to 5. Smaller prompts, fewer timeouts, cleaner responses. Separately, the SQLite `busy_timeout` was 15 seconds; under load, transactions were getting murdered. Bumped it to 60 seconds, and lock contention dropped noticeably.

The enrichment cycle itself was blocking during watchdog checks. I restructured it so enrichment runs *before* cluster detection and skips (doesn't wait) when extraction is active. The crawler and watchdog now use a `_crawl_active` flag to serialize access. Small changes, but they eliminated deadlocks. One last detail: citation enrichment now caps at 50 items, WAL checkpoints run every 5 minutes, and event times get normalized to UTC. Nothing flashy, but each fix removed a source of silent failure.

The result? Pipeline stability jumped from "crashes every hour" to "runs for days." Concurrent requests no longer kill Ollama. Models stay in VRAM. Transactions complete without stalling. It's the kind of fix where half the work is plumbing, half is understanding why the original design cracked under load.

Hey, here's one for the engineers: I wish this Ollama instance was asynchronous... so it would finally give me a callback instead of closing the connection 😄
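A sketch of the serialization fix: one asyncio lock in front of Ollama plus a per-model port map. The model names, ports, and `keep_alive` value mirror this post; the HTTP call is a plain request to Ollama's standard `/api/generate` endpoint, simplified from the real client.

```python
# Dual-port Ollama routing with a global mutex to serialize requests.
import asyncio
import json
import urllib.request

_ollama_mutex = asyncio.Lock()
PORT_BY_MODEL = {"gemma4:e2b": 11435, "hermes3:8b": 11436}

async def ollama_generate(model: str, prompt: str) -> str:
    url = f"http://localhost:{PORT_BY_MODEL[model]}/api/generate"
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": "999h",  # valid Go duration; pins the model in VRAM
    }).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})

    def _post() -> bytes:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return resp.read()

    async with _ollama_mutex:  # one request in flight at a time
        data = await asyncio.to_thread(_post)  # keep the event loop free
    return json.loads(data)["response"]
```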

Apr 17, 2026
Bug Fix · borisovai-site

When Root Processes Steal Your Production Ports

I was staring at two 502 errors on my screen—both **borisovai.tech** and **api.borisovai.tech** returning Bad Gateway. The reverse proxy was responding, which meant Traefik was alive and well. But the backend services on ports 4001 and 4002 had simply vanished.

I'd been working on a CI/CD fix that afternoon, tweaking PM2 deployment logic to ensure processes ran under the `gitlab-runner` user instead of root. Clean separation, proper permissions, standard practice. The branch was `fix/ci-pm2-selective-delete`, sitting in review but not yet merged to master. I figured the issue was unrelated—maybe just a crashed service that needed a restart.

Then I SSH'd into the server and ran `pm2 list`. The `frontend` and `strapi` processes weren't there. Not "stopped"—completely absent. But something *was* holding ports 4001 and 4002. I checked what was listening:

```
PID 450215
```

Owner: **root**. That's when it clicked. The previous deployment had launched frontend and strapi under the root PM2 daemon. When my CI pipeline tried to deploy the new version under `gitlab-runner`, it couldn't bind to those ports—they were already taken by the root-owned processes. The services failed to start, the old processes eventually crashed from some unrelated issue, and now we had a gap: no one owned the ports anymore, but the PM2 config under `gitlab-runner` was still broken.

I had two options: kill the root PM2 daemon and let the new deployment take over, or patch around it. Going halfway would create conflicts. I chose clean. First, I deleted `frontend` and `strapi` from the root PM2:

```bash
pm2 delete frontend strapi --uid root
```

Then I fixed ownership:

```bash
chown -R gitlab-runner:gitlab-runner /var/www/borisovai-site
```

A final restart of the processes under `gitlab-runner` and the ports were free to bind. Both services came up with zero restarts, both showed "online" status. **borisovai.tech** loaded. **api.borisovai.tech** responded.

The real lesson wasn't about PM2 or permissions—it was about process isolation. When different deploy mechanisms (root manual, CI automation) both try to manage the same service, you get subtle races where crashed processes leave ghosts in the port table, and your logs stop telling the true story. From now on, every deployment goes through the same owner, the same PM2 daemon, the same path. No shortcuts.

The site stayed down for maybe fifteen minutes. Not a disaster, but enough to remind me: **permission conflicts don't raise alarms—they just silently break things** 😄

Apr 6, 2026
New Feature · llm-analisis

How Inspiration Saves a Project: A Lesson from Nemotron-3-Nano

When you've spent months building your LLM Orchestra—a model with a modular architecture based on Qwen 2.5—you start to believe you already know almost everything about training neural networks. Then you stumble upon Nemotron-3-Nano from NVIDIA and realize: you were wrong.

It all started with a simple question. We were preparing to insert MoE (Mixture of Experts) modules into the transformer's FFN blocks, and it made sense to look at the competition first: what's happening in 4B models? Maybe they've already solved everything there?

Nemotron-3-Nano turned out to be a shocking discovery. On the MATH500 benchmark, this 3.97B model shows **95.4%** solvability. Our Qwen 2.5, roughly the same size (3.09B), barely reaches 65% on similar tasks. The difference isn't in architecture—both use transformers. The difference is in how, and on what, they were trained.

NVIDIA didn't hide the secret. They used **distillation from DeepSeek R1**—knowledge from a stronger model transferred to a smaller one. And not naively: they took Chain-of-Thought solutions from DeepSeek (97%+ on MATH) and trained Nemotron to predict those reasoning steps. On top of that came multi-stage reinforcement learning with an increasing KL-penalty, and synthetic data at the scale of 10+ trillion tokens.

We did self-distillation: the model learned from itself. Qwen 2.5 with a 74% solve rate is a weak teacher for itself. That's where the mistake was.

The turning point came as an idea: what if, instead of self-distillation, we applied **cross-model distillation**? Take ready-made CoT solutions from DeepSeek R1 distill 7B (available free on HuggingFace) and train our Orchestra-MoE on them. This preserves the core principle of growth—we still add new expert modules to the base architecture—but changes the source of knowledge from self-prediction to external exemplars.

Now that's inspiration. Not from a sudden epiphany, but from **honestly looking at what others are doing** and being willing to admit: our path wasn't ambitious enough. Model size is not destiny. Quality of training data is destiny. Phase 40d, it turns out, should be about cross-model distillation.

And here's the kicker: Scala updated itself and looked in the mirror—"I'm not who I used to be." Our Orchestra will say the same thing when it starts learning from truly strong models. 😄
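The data side of cross-model distillation is less exotic than it sounds: teacher CoT solutions become ordinary SFT pairs for the student. A minimal sketch, with an illustrative prompt template that is not the project's exact one:

```python
# Turning teacher (e.g., DeepSeek-R1-distill) chain-of-thought solutions
# into plain prompt/completion pairs for student fine-tuning.
def to_sft_example(problem: str, teacher_cot: str, answer: str) -> dict:
    return {
        "prompt": f"Problem: {problem}\nLet's think step by step.\n",
        # The student learns to reproduce the stronger teacher's reasoning,
        # not its own — the whole point of dropping self-distillation.
        "completion": f"{teacher_cot}\nFinal answer: {answer}",
    }

example = to_sft_example(
    "A train travels 60 km in 1.5 hours. What is its average speed?",
    "Speed = distance / time = 60 / 1.5 = 40 km/h.",
    "40 km/h",
)
print(example["prompt"] + example["completion"])
```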

Mar 20, 2026
General · llm-analisis

How I Caught the Best Seed in Neural Network Search

Got up from the couch, coffee in hand, and realized: I need to find the optimal seed for LLM Analysis. The project demanded a breakthrough—the current baseline was giving 72.86% accuracy, and that wasn't good enough for production.

The task seemed straightforward at first glance: test 20 different seeds, each generating its own model initialization. But beneath that simplicity lay an uncomfortable truth—each seed required roughly 100 minutes of computation. About 30 hours of pure runtime for the search.

I launched *seed_search.py* and sent it to the background via nohup—let it work on its own while I handled everything else. The first result surprised me: **seed 1 showed 76.5% at the 200th checkpoint**, a 3.64 percentage point improvement. Not revolutionary, but movement in the right direction. The script ran stably, results accumulating in *results_seed_search.json* with resume support—if the process crashed, I could just restart it and it would continue from where it left off.

While the seeds were computing, I got to parallel work. I wrote *augment_problems.py*, which transformed 6,604 original problems into 39,582 variations—the foundation for model self-distillation. Simultaneously I prepared *majority_voting.py* for voting between Orchestra and baseline, and *dual_orchestra.py* for a two-stage architecture with intermediate layers.

The plan crystallized in my head. After the seed search finishes (another three days), I will:

1. Analyze the distribution of 20 results and pick the best seed
2. Run majority voting on the best checkpoint
3. Build Dual Orchestra Stage 1, using the best seed as the foundation
4. Train self-distillation on 39K augmented problems

The technology behind all this is simple but stubborn. Claude as the primary LLM—fast, accurate enough for analysis. Python for process orchestration, JavaScript somewhere in the neighboring services. But the main thing is patience and systematic work. In a month, if everything works out, this model will perform better. For now, I'm waiting for results, sipping cold coffee.

**Fun fact:** Kafka and my black cat have one thing in common—both do only what they want and actively ignore instructions. 😄
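The resume mechanism boils down to a pattern like this. A sketch under assumptions: `train_and_eval` stands in for the real ~100-minute run, and the actual structure of *seed_search.py* may differ; only the file name comes from the post.

```python
# Resumable seed search: checkpoint the results dict to JSON after every
# seed, so a crashed process can be restarted and skip finished seeds.
import json
import os

RESULTS_FILE = "results_seed_search.json"

def run_seed_search(train_and_eval, seeds=range(20)):
    results = {}
    if os.path.exists(RESULTS_FILE):          # resume after a crash
        with open(RESULTS_FILE) as f:
            results = json.load(f)
    for seed in seeds:
        if str(seed) in results:              # already computed, skip
            continue
        results[str(seed)] = train_and_eval(seed)
        with open(RESULTS_FILE, "w") as f:    # checkpoint after every seed
            json.dump(results, f, indent=2)
    best = max(results, key=results.get)
    return int(best), results[best]
```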

Mar 20, 2026
Learning · llm-analisis

Training Seed 0: When Your GPU Burns and Your Model Learns

I've been staring at this training run for the past hour, watching the GPU meter sit stubbornly at 100% while 15.7GB of VRAM fills with the weight updates for Seed 0. We're at step 400 out of 500, and honestly, it's working. That might sound anticlimactic, but in machine learning, "working" is a victory worth documenting.

This whole Phase 39 experiment started because we hit a wall. After Phase 38's catastrophic failures with unfreezing the backbone—we tried QLoRA, we tried GRPO, everything just collapsed into catastrophic forgetting—I realized we were swinging at shadows. The quest for that elusive +20 percentage points toward 94% on GSM8K wasn't going to come from tweaking the same approach. So instead of one big bet, we decided to hedge: run 20 different seeds through the same pipeline, let the data speak louder than our intuitions.

The **LLM Analysis** project forced me to confront something uncomfortable: I'd been overthinking this. My colleague sent over that MiniMax M2.7 paper about "self-evolution," and I spent two hours reading about their agent-level meta-optimization—automatically analyzing errors, modifying configs, evaluating, accepting or reverting. Beautiful work, but it was the wrong kind of self-improvement. They're optimizing prompts and scaffolding; we're trying to optimize weights. Different game entirely.

What struck me hardest was realizing how little separates a breakthrough from a dead end. The **test-time compute scaling** path—chain-of-thought sampling plus verifier—sits right there in our notes, untouched. We obsessed over weight-level unfreezing because it *felt* like the answer, but we never actually tested whether letting the model think harder before answering might push us past that 94% threshold. Sometimes the tool you need is hiding in the decisions you haven't made yet.

So here's Seed 0, grinding through iterations while my GPU sweats. If this seed hits higher eval metrics than the baseline, we'll know something. If it doesn't, we'll know something else. That's the whole point of the search—not genius intuition, just *signal* from the data.

The panel of experts keeps asking, "How do we build a self-improving architecture *and* hit 94% on Qwen 2.5 3B?" Maybe the answer isn't choosing one or the other. Maybe it's admitting that sometimes your GPU does the thinking while you take notes.

*And if ASCII silly questions get silly ANSI answers, at least my training curves are deterministic.* 😄
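For reference, the untested sample-then-verify idea fits in one function. A sketch only: both callables are placeholders, and nothing here comes from the project's code.

```python
# Test-time compute sketch: draw k chain-of-thought samples and keep the
# candidate the verifier scores highest.
def best_of_k(problem: str, sample_cot, verifier_score, k: int = 16) -> str:
    candidates = [sample_cot(problem) for _ in range(k)]
    return max(candidates, key=lambda cot: verifier_score(problem, cot))
```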

Mar 20, 2026
Bug Fix · llm-analisis

Choosing the Right Seed: When Initialization Becomes Strategy

We'd hit a wall. After weeks of pushing the **LLM Analysis** project forward, our attempts to improve model performance had stalled. Every tweak to the architecture seemed to plateau around 76%, and we couldn't figure out why. Then one of our experts suggested something counterintuitive: *maybe the initialization dependency wasn't a bug—maybe it was a feature we hadn't learned to exploit yet*.

The turning point came when we stopped treating seed selection as noise and started treating it as a first-class optimization problem. **Claude** was helping us orchestrate the experiments, and we realized we could systematically test different initialization seeds across our **Orchestra-MoE** model. The theory was compelling: if we ran 20 independent training runs with different seeds, the variance in performance would give us a window into what was actually happening inside the network.

Our panelists—researchers specializing in initialization theory and practical deep learning—all agreed on the same direction. One pointed to the statistical insight that the expected maximum performance across N runs follows E[max(N)] ≈ mean + std × √(2 ln N). For 20 runs, this predicted we could push performance to roughly **77.3%**, nearly 1.4 percentage points above the baseline. It wasn't revolutionary, but it was real.

What sold us on the approach, though, was the *practical math*. We'd spent over 85 hours experimenting with different architectural phases without meaningful gains. Running 20 seeds would take only 10 hours on GPU. The ROI was undeniable.

The strategy had layers. First, we'd select the best seed based on validation performance, then validate it honestly on our full test set—1,319 problems—rather than cherry-picking. Second, we'd combine the top three seeds using ensemble voting; different initializations make different mistakes, and majority voting would smooth out the quirks. Third, we could layer this with data-dependent initialization techniques like SVD-based seed selection, potentially reducing variance even further.

We also discovered synergies with other work in progress: combining seed selection with our routing mechanism gave us an extra 0.2 percentage points, and curriculum learning with the best seed had already reached 79% in earlier experiments.

The lesson wasn't just about statistics or architecture. It was about **perspective shift**. What looked like a limitation—that results depended heavily on how we started the model—turned out to be a lever we hadn't pulled. By embracing the variance instead of fighting it, we'd found a path forward that was both theoretically sound and practically efficient. We wrote the batch script that night, set it running across 20 seeds, and finally felt that familiar sensation: *momentum*.
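The estimate is easy to sanity-check numerically. In the sketch below, the mean and standard deviation are back-solved from the numbers in this post, so treat them as assumptions rather than measurements:

```python
# Expected best-of-N check: E[max(N)] ≈ mean + std * sqrt(2 ln N).
import math

n, mean, std = 20, 75.9, 0.57            # accuracy in percentage points
lift = std * math.sqrt(2 * math.log(n))  # ≈ 0.57 * 2.45 ≈ 1.4 pp
print(f"E[max of {n} runs] ≈ {mean + lift:.1f}%")  # ≈ 77.3%
```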

Mar 20, 2026
New Feature · scada-coating

Building the Open SCADA Revolution: From Tagat to Independence

When I finished my two-year tenure as the lead developer at Tagat, one thought consumed me: **why does the electroplating industry remain locked into proprietary SCADA systems?** Thousands of coating lines across the globe run on closed-source software, each facility dependent on a single vendor for updates, support, and innovation. That frustration became the fuel for BorisovAI.

I assembled a team with the same hunger for change. Together, we didn't just talk about an alternative—we **built one**. Our SCADA system for electroplating is production-ready, battle-tested, and fundamentally different. It runs on open standards, which means manufacturers gain something they've never had: *independence from vendor lock-in*.

The technical challenge was immense. Electroplating requires real-time control of temperature, current density, pH levels, and chemical composition across multiple tanks. One miscalibration cascades into waste and equipment damage. We engineered redundancy into every layer—from sensor input validation to fail-safe switching protocols. The system communicates via standard APIs, integrates with existing PLCs, and logs everything in a transparent database. No black boxes. No mystery bugs that only the vendor understands.

But building the software solved only half the puzzle. The real bottleneck? **We needed a manufacturing partner willing to take a risk on open-source SCADA.** That's where the partnership proposal came in. We approached leading electroplating equipment manufacturers with a simple offer: *your facility becomes our proof of concept*. You get a turnkey system that's already proven. We get the real-world validation and deployment case study we desperately need.

The economics are compelling. Traditional vendors charge licensing fees and lock customers into service contracts. Our model flips that—the software is free and open. Manufacturers profit through independence, customization freedom, and the knowledge that their investment in process optimization stays *their* investment, not licensed intellectual property they'll lose if the vendor goes under.

What we're proposing isn't just a technical upgrade; it's a structural shift. One coating line becomes two. Two become ten. Suddenly, the electroplating industry has options. That's the revolution we're building.

---

*The glass isn't half-full or half-empty—it's twice as big as it needs to be. Same with proprietary SCADA: oversized prices for undercapacity innovation.* 😄

Mar 18, 2026