BorisovAI

Blog

Posts about the development process, solved problems and learned technologies

Found 10 notes
Bug Fix · trend-analisis

From Phantom Signals to Real Insights: How We Fixed the Trend Analysis Pipeline

I was staring at the dashboard when I noticed something deeply wrong. Eighteen out of nineteen signals from our analyses were simply vanishing into thin air. Here I was, working on **Trend Analysis**, trying to build a system that could detect emerging tech trends across thousands of sources, and the core mechanism—the signal detection—was silently failing. The bug was hiding in plain sight: we'd marked trend phases as `'new'`, but our system was looking for `'emerging'`. A simple string mismatch that cascaded through the entire recommendation engine. When I traced it back, I realized this wasn't just a typo—it revealed how fragile the pipeline had become as we scaled from collecting data to actually *understanding* it. That same sprint, another issue surfaced in our database joins. The `recommendations` table was linking to trends via `tr.id = t.id`, but it should have been `tr.object_id = t.id`. Suddenly, all the momentum calculations we'd carefully built returned NULL. Weeks of analysis work were getting thrown away because two tables weren't talking to each other properly. I decided it was time to fortify the entire system. We added **15 new database indices** (migration 020), which immediately cut query times in half for the most common analysis operations. We remapped **SearXNG** results back to native sources—GitHub, Hacker News, arXiv—so the trends we detected actually pointed to real, traceable origins. The shared report feature had been linking to phantom signals that no longer existed; we cleaned that up too. By v0.14.0, we'd rebuilt the reporting layer from the ground up. Server-side pagination, filtering, and sorting meant users could finally navigate thousands of signals without the frontend melting. We even added a **Saved Products** feature with localStorage persistence, so researchers could bookmark trends they cared about. The real lesson wasn't technical—it was about complexity. 
Every new feature (dynamic role translation, trend name localization, React hook ordering fixes) added another place where things could break silently. The glass wasn't half-empty; it was twice as big as we needed it to be. 😄 But now it actually holds water.
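One way to harden against the `'new'` vs `'emerging'` class of bug is to give phase labels a single source of truth, so a typo fails loudly instead of silently matching nothing. A minimal sketch in Python; the enum values beyond `emerging` and the `emerging_signals` helper are illustrative, not the project's actual code:

```python
from enum import Enum

class TrendPhase(str, Enum):
    """Single source of truth for phase labels."""
    EMERGING = "emerging"
    GROWING = "growing"      # illustrative extra phases
    DECLINING = "declining"

def emerging_signals(signals):
    # Comparing against the enum member, not a free-floating string literal,
    # means 'new' vs 'emerging' drift is impossible: TrendPhase("new")
    # raises ValueError instead of quietly filtering everything out.
    return [s for s in signals if TrendPhase(s["phase"]) is TrendPhase.EMERGING]

signals = [{"id": 1, "phase": "emerging"}, {"id": 2, "phase": "declining"}]
print([s["id"] for s in emerging_signals(signals)])  # [1]
```

A bad label now surfaces at the first lookup rather than eighteen silent drops later.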

Mar 4, 2026
Code Change · llm-analisis

The Narrow Path: Why Perfect Optimization Crumbles

I've been chasing the golden number for weeks now. **Phase 24a** delivered **76.8% accuracy on GSM8K**—a solid baseline for mathematical reasoning in large language models. The team was excited. I was cautious. In my experience, when a result feels *too clean*, it's usually balanced on a knife's edge. So I decided to push further with **Phase 29a and 29b**, two experiments designed to improve what we already had. The strategy seemed sound: inject curriculum data to guide the model toward harder problems, and extend training from 500 to 1,000 steps to capture finer pattern recognition. Standard moves in the playbook. Phase 29a involved adding **89 borderline solutions**—answers sampled at higher temperatures, intentionally less deterministic. I thought diversity would help. Instead, I watched accuracy *plummet* to **73.0%, a 3.8 percentage point drop**. The perplexity exploded to 2.16, compared to the baseline's 1.60. The model was struggling, not learning. Those temperature-sampled solutions weren't diverse training signal—they were noise wearing a training label. Then came **Phase 29b**: double the training steps. Surely more iterations would converge to something better? The loss hit 0.004—nearly zero. The model was memorizing, not generalizing. Accuracy barely limped to **74.4%**, still 2.4 points underwater. The lesson hit hard: *we'd already found the optimum at 500 steps*. Beyond that, we weren't learning—we were overfitting. What struck me most wasn't the failed experiments themselves. It was how *fragile* the baseline turned out to be. **Phase 24a wasn't a robust solution—it was a brittle peak**. The moment I changed the data composition or training duration, the whole structure collapsed. The algorithm had found a narrow channel where everything aligned perfectly: the right data distribution, the right training length, the right balance. Wiggle anything, and you tumble out. 
This is the hard truth about optimization in machine learning: **sometimes the best result isn't a foundation—it's a lucky intersection**. You can't always scale it. You can't always improve it by adding more of what worked before. We still have **Phase 29c** (multi-expert routing) and **29d** (MATH domain data) queued up. But I'm approaching them differently now. Not as simple extensions of success, but as careful explorations of *why* the baseline works at all. The irony? This mirrors something I read once: *"Programming is like sex. Make one mistake and you end up supporting it for the rest of your life."* 😄 In optimization, it's worse—you might be supporting someone else's lucky mistake, and have no idea where the luck ends and the skill begins.
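For context on those perplexity figures: perplexity is just the exponential of the mean per-token cross-entropy loss, which is why a near-zero training loss is a red flag rather than a victory. A quick sketch:

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    # Perplexity is the exponentiated mean cross-entropy (in nats) per token.
    return math.exp(mean_ce_loss)

# A near-zero training loss like Phase 29b's 0.004 implies training-set
# perplexity of ~1.004 -- the model is reproducing its data almost verbatim,
# the classic signature of memorization rather than generalization.
print(round(perplexity(0.004), 3))  # 1.004
```

Meanwhile the baseline's eval perplexity of 1.60 versus 29a's 2.16 corresponds to a visibly higher average loss on held-out data, which is exactly the gap overfitting produces.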

Mar 4, 2026
New Feature · trend-analisis

How AI Assistants Flipped Our Hiring Strategy: Why We Stopped Chasing Junior Developers

I was sitting in our quarterly planning meeting when the pattern finally clicked. We'd built a sprawling engineering team—five junior developers, three mid-level folks, and two architects buried under code review requests. Our burn rate was brutal, and our velocity? Surprisingly flat. Then we started experimenting with Claude AI assistants on real implementation tasks. The results were jarring. Our two senior architects, paired with AI-powered implementation assistants, were shipping features faster than our entire junior cohort combined. Not because the juniors weren't trying—they were. But the math was broken. We were paying entry-level salaries for months-long ramp-up periods while our AI tools could generate solid, production-ready implementations in hours. The hidden costs of junior hiring—code reviews, mentorship overhead, bug fixes in hastily written code—suddenly felt like a luxury we couldn't afford. **Here's where it got uncomfortable:** we had to admit that some junior developer roles weren't stepping stones anymore. They were sunk costs. So we pivoted hard. Instead of hiring five juniors this year, we recruited three senior architects and two tech leads who could shape strategy, not just execute tasks. We redeployed that saved budget into product validation and customer research—places where AI still struggles and human judgment creates real differentiation. Our junior developers? We created internal mobility programs, helping the sharp ones transition into code review, architecture design, and technical mentorship roles before the market compressed those positions further. The tradeoff wasn't clean. Our diversity pipeline took a hit in year one. Some institutional knowledge walked out the door with departing mid-level engineers who saw the writing on the wall. Competitors with clearer hiring strategies started stealing senior talent while we were still reorganizing. But the unit economics shifted. Our per-engineer output tripled. 
Code quality improved because senior architects weren't drowning in pull requests. And when we evaluated new candidates, we stopped asking "Can you code faster?" and started asking "Can you design systems and teach others?" The uncomfortable truth? **AI didn't replace developers—it replaced the hiring model that sustained them.** The juniors who survived were the ones hungry to become architects, not the ones content to grind through CRUD operations. And honestly, that's probably healthier for everyone. Lesson learned: when your tools change the economics of work, your hiring strategy has to change faster than your competitors'. Or you'll end up with an expensive roster of people doing work that machines do better. ASCII silly question? Get a silly ANSI. 😄

Mar 4, 2026
New Feature · trend-analisis

Building a Unified Filter System Across Four Frontend Pages

I'm sitting here on a Sunday evening, staring at the Trend Analysis codebase, and I realize we've just completed something that felt impossible two weeks ago: **unified filters that finally work the same way everywhere**. Let me walk you through how we got here. The problem was classic scaling chaos. We had four different pages—Explore, Radar, Objects, and Recommendations—each with their own filter implementation. Different layouts, different behaviors, different bugs. When the product team asked for consistent filtering across all of them, my first instinct was dread. But then I remembered: sometimes constraints breed innovation. We started with the Recommendations page, which had the most complex requirements. The backend needed **server-side pagination with limit/offset**, a priority matrix derived from P4 reports, and dynamic role extraction. I rewrote the `recommendation_store` module to handle this, ensuring that pagination wouldn't explode our API calls. The frontend team simultaneously built a new popover layout with horizontal rule dividers—simple, but visually clean. We replaced horizontal tabs with **role chips**, which turned out to be far more intuitive than I expected. But here's where it got interesting: the **Vite proxy rewrite**. Our backend routes didn't have the `/api` prefix, but the frontend was making requests to `/api/*`. Rather than refactoring the backend, we configured Vite to rewrite requests on the fly, stripping `/api` before forwarding. It felt like a hack at first, but it saved us weeks of backend changes and made the architecture cleaner overall. The i18n work was tedious but necessary—new keys for filters, pagination, tooltips. Nothing glamorous, but the multilingual user base depends on it. We also fixed a subtle bug in Trend Detail where source URLs were being duplicated; switching to `domainOf` for display eliminated that redundancy. 
On the Lab side, we optimized prompts for structured extraction, built an `llm_helpers` module, and improved the scoring display in Product Detail. The new table columns across Lab components gave us better visibility into the pipeline, which is always valuable when you're trying to debug why a particular trend got labeled wrong. One tiny thing that made me smile: we added `html.unescape` to both the signal mapper and the StackOverflow adapter. Those HTML entities in titles were driving everyone crazy. By the time we tagged v0.12.0, the unified filter system was live. Four pages, one design language, consistent behavior. The product team smiled. The users stopped complaining about inconsistency. And yes, I'd tell you a joke about NAT but I would have to translate. 😄
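The `html.unescape` fix is a one-liner worth showing; `clean_title` is an illustrative name, not the adapter's actual function:

```python
import html

def clean_title(raw: str) -> str:
    # StackOverflow-style feeds deliver titles with HTML entities baked in;
    # html.unescape turns them back into readable text before display.
    return html.unescape(raw).strip()

print(clean_title("Why doesn&#39;t &lt;canvas&gt; resize?"))  # Why doesn't <canvas> resize?
```

Applying this in both the signal mapper and the adapter means entities get normalized once, at ingestion, instead of leaking into every downstream view.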

Mar 2, 2026
New Feature · speech-to-text

Why Python's the Right Choice When C++ Seems Obvious

I stood in front of a performance profile that made me uncomfortable. My Speech-to-Text project was running inference at 660 milliseconds per clip, and someone on Habré had just asked the question I'd been dreading: *"Why not use a real language?"* The implication stung a little. Python felt like the scaffolding, not the real thing. So I dug deeper, determined to prove whether we should rewrite the inference engine in C++ or Rust—languages where performance isn't a question mark. **The investigation revealed something unexpected.** I profiled the entire pipeline with surgical precision. The audio came in, flowed through the system, and hit the ONNX Runtime inference engine. That's where the work happened—660 milliseconds of pure computation. And Python? My Python wrapper accounted for less than 5 milliseconds. Input handling, output parsing, the whole glue layer between my code and the optimized runtime: *under 1% of the total time*. The runtime itself wasn't Python anyway. ONNX Runtime compiles to C++ with CUDA kernels for GPU paths. I wasn't betting on Python for heavy lifting; I was using it as the interface layer, the way you'd use a control panel in front of a steel machine. Rewriting the wrapper in C++ or Rust would save those 5 milliseconds. Maybe. If I optimized perfectly. That's a 0.7% improvement. **But here's what I'd lose.** Python's ecosystem is where speech recognition actually lives right now. Silero VAD, faster-whisper, HuggingFace Hub integration—these tools are Python-first. The moment I needed to add a pretrained voice activity detector or swap models, I'd either rewrite more code in C++ or build a bridge back to Python anyway. The entire chain would become brittle. I sat with that realization for a while. The "real language" argument assumes the bottleneck is what you control. In this case, it isn't. The bottleneck is the mathematical computation, already offloaded to optimized C++ underneath. Python is just the thoughtful routing system. 
**So I wrote back:** The narrow spot isn't in the wrapper. If it ever moves from the model to the orchestration layer, that's the day to consider C++. Until then, Python gives me velocity, ecosystem access, and honest measurement. That's not settling—that's *engineering*. The commenter never replied, but I stopped feeling defensive about it.
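The measurement itself needs nothing fancier than `time.perf_counter`. A minimal sketch of the approach; `fake_inference` is a stand-in for the real ONNX Runtime session call, not the project's code:

```python
import time

def timed(fn, *args):
    """Return (result, elapsed_ms) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0

def fake_inference(audio):
    # Stand-in for session.run(); in the real pipeline this is where
    # the ~660 ms of native C++/CUDA computation happens.
    time.sleep(0.01)
    return "transcript"

def wrapper(audio):
    # The Python "glue": trivial pre-processing around the native runtime.
    prepared = audio.lower()
    text, ms = timed(fake_inference, prepared)
    return text, ms

text, inference_ms = wrapper("AUDIO BYTES")
print(text)  # transcript
```

Wrapping each stage the same way is how the 660 ms inside the runtime versus under 5 ms of Python glue became an honest, reproducible number rather than a gut feeling.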

Mar 2, 2026
New Feature · C--projects-bot-social-publisher

When a Monorepo Refuses to Boot on the First Try

I closed Cursor IDE and decided to finally debug why **Bot Social Publisher**—my sprawling autonomous content pipeline with collectors, processors, enrichers, and multi-channel publishers—refused to start cleanly. The architecture looked beautiful on paper: six async collectors pulling from Git, Clipboard, Cursor, Claude, VSCode, and VS; a processing layer with filtering and deduplication; enrichment via Claude CLI (no paid API, just the subscription model); and publishers targeting websites, VK, and Telegram. Everything was modular, clean, structured. And completely broken. The first shock came when I tried importing `src/enrichment/`. Python screamed about missing dependencies. I checked `requirements.txt`—it was incomplete. Somewhere in the codebase, someone had installed `structlog` for JSON logging and `pydantic` for data models, but never updated the requirements file. On Windows in Git Bash, I had to navigate to the venv carefully: `venv/Scripts/pip install structlog pydantic`. The path matters—backslashes don't work in Bash. Once installed, I added them to `requirements.txt` so the next person wouldn't hit the same wall. Then came the Claude CLI integration check. The pipeline was supposed to make up to 6 LLM calls per note (content in Russian and English, titles in both languages, plus proofreading). With a daily limit of 100 queries and 3-concurrent throttling, this was unsustainable. I realized the system was trying to generate full content twice—once in Russian, once in English—when it could extract titles from the generated content instead. That alone would cut calls from 6 to 3 per note. The real puzzle was ContentSelector, the module responsible for reducing 100+ line developer logs down to 40–60 informative lines. It was scoring based on positive signals (implemented, fixed, technology names, problems, solutions) and negative signals (empty markers, long hashes, bare imports). Elegant in theory. 
But when I tested it on actual Git commit logs, it was pulling in junk: IDE meta-tags like `<ide_selection>` and fallback titles like "Activity in...". The filter was too permissive. I spent an afternoon refactoring the scoring function, adding a junk-removal step before deduplication. Now the ContentSelector actually worked. By the time I pushed everything to the `main` branch (after fixing Cyrillic encoding issues—never use `curl -d` with Russian text on Windows; use Python's `urllib.request` instead), the monorepo finally booted cleanly. `npm run dev` on the web layer. Python async collectors spinning up. API endpoints responding. Enrichment pipeline humming. As the old developers say: **ASCII silly question, get a silly ANSI.** 😄
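A stripped-down sketch of that scoring-plus-cleanup shape; the signal lists and regexes here are illustrative and far smaller than the real ContentSelector's:

```python
import re

POSITIVE = ("implemented", "fixed", "solution", "problem")
JUNK = (
    re.compile(r"<ide_selection>"),     # IDE meta-tags
    re.compile(r"\b[0-9a-f]{40}\b"),    # long git hashes
    re.compile(r"^import \w+$"),        # bare imports
)

def is_junk(line: str) -> bool:
    return any(p.search(line) for p in JUNK) or not line.strip()

def score(line: str) -> int:
    low = line.lower()
    return sum(word in low for word in POSITIVE)

def select(lines, keep=60):
    # Junk removal happens *before* dedup and scoring -- the ordering
    # change that fixed the too-permissive filter.
    cleaned = [l for l in lines if not is_junk(l)]
    deduped = list(dict.fromkeys(cleaned))  # order-preserving dedup
    ranked = sorted(deduped, key=score, reverse=True)
    return ranked[:keep]

log = ["<ide_selection>foo</ide_selection>", "Fixed the encoding bug",
       "Fixed the encoding bug", "import os"]
print(select(log))  # ['Fixed the encoding bug']
```

Running the junk pass first matters: otherwise duplicated junk lines dilute the dedup step and occasionally out-score genuinely informative ones.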

Feb 25, 2026
New Feature · trend-analisis

Reconciling Data Models: When Your API Speaks a Different Language

I was deep in the **Trend Analysis** project when I hit one of those frustrating moments that every developer knows too well: the database schema and the API endpoints were talking past each other. The problem was straightforward but annoying. Our **DATA-MODEL.md** file had renamed the columns to something clean and semantic—`signal_id`, `trend_id`—following proper naming conventions. Meanwhile, **ENDPOINTS.md** was still using the legacy API field names: `trend_id`, `trend_class_id`. On paper, they seemed compatible. In practice? A nightmare waiting to happen. I realized this inconsistency would eventually bite us. Either some team member would write a database query using the old names while another was building an API consumer expecting the new ones, or we'd silently corrupt data during migrations. The kind of bug that whispers until it screams in production. The real challenge wasn't just renaming—it was maintaining backward compatibility while we transitioned. We couldn't just flip a switch and break existing integrations. I had to think through the migration strategy: should we add aliases to the database schema? Create a translation layer in the API? Or version the endpoints? After sketching out the architecture, I opted for a pragmatic approach: update the canonical **DATA-MODEL.md** to be the source of truth, then create a mapping document that explicitly shows the relationship between internal schema names and external API contracts. This meant the API layer would handle the translation transparently—consumers would still see the familiar field names they depend on, but internally we'd operate with the cleaner model. **Here's a fascinating fact:** The concept of mapping between internal and external data representations comes from **domain-driven design**. What we call a "bounded context" in DDD—the idea that different parts of a system can have different models of the same concept—is exactly what we were dealing with. 
The database lives in one context, the API in another. They need a bridge, not a merger. The work took longer than I'd anticipated, but the payoff was clear. Now when new team members join and look at the code, they see consistency. The mental overhead drops. Future refactoring becomes possible without fear. And honestly? Getting this right early saved us from the kind of technical debt that quietly multiplies. As a programmer, I've learned to worry about consistency errors as much as runtime ones—because one *becomes* the other, just with a time delay. *A man walks into a code review and sees a messy schema. "Why isn't this documented?" he asks. The developer replies, "I am a programmer. We don't worry about documentation—we only worry about errors." The reviewer sighs: "That's the problem."* 😄
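The translation layer itself can be tiny. A sketch of the idea; `FIELD_MAP` shows the shape of the mapping, not the project's actual contract:

```python
# Illustrative mapping: internal schema name -> external (legacy) API field name.
FIELD_MAP = {"signal_id": "trend_id", "trend_id": "trend_class_id"}

def to_api(row: dict) -> dict:
    # Translate at the API boundary so consumers keep the field names they
    # depend on while the database operates on the cleaner internal model.
    return {FIELD_MAP.get(k, k): v for k, v in row.items()}

def from_api(payload: dict) -> dict:
    # Reverse direction for incoming requests.
    reverse = {v: k for k, v in FIELD_MAP.items()}
    return {reverse.get(k, k): v for k, v in payload.items()}

row = {"signal_id": 42, "score": 0.9}
print(to_api(row))  # {'trend_id': 42, 'score': 0.9}
```

Keeping the map in one module is what makes **DATA-MODEL.md** enforceable: the mapping document and this dictionary can be diffed against each other in review.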

Feb 25, 2026
New Feature · trend-analisis

Building Smarter Documentation: When Your Tech Debt Map Becomes Your Roadmap

I spent the last few days staring at a tangled mess of outdated documentation—the kind that grows like weeds when your codebase evolves faster than your docs can follow. The project was **Trend Analysis**, built with **Claude, JavaScript, and Git APIs**, and the problem was deceptively simple: our technical documentation had drifted so far from reality that it was useless. Here's what happened. Our INDEX.md still referenced `frontend-cascade/` while we'd renamed it to `frontend/` months ago. The TECH-DEBT.md file claimed we'd resolved a database refactoring issue (BE-2), but poking into MEMORY.md revealed the truth—`_row_to_item` was *still* using positional mapping instead of the promised named parameters. Meanwhile, ENDPOINTS.md had endpoint numbering that jumped from `8a` directly to `10`, skipping `9` entirely like some kind of digital superstition. The real insight hit when I realized this wasn't just sloppiness—it was **decision debt**. Every divergence between docs and code represented a moment where someone (probably me, if I'm honest) chose "ship first, document later" over keeping things in sync. The cost? Hours of my time, confusion for collaborators, and a growing sense that maybe our documentation process was fundamentally broken. So I rebuilt it systematically. I mapped the actual project structure, traced through the real implementation across multiple files, verified each claim against the codebase, and created a coherent narrative. The ADR (Architecture Decision Record) count went from vague to concrete. The endpoint numbering actually flowed logically. The tech debt table now accurately reflected what was *actually* resolved versus what was just *claimed* to be resolved. I even added notes about deprecated table names in the older implementation phases so future developers wouldn't get confused by ghost references. The hardest part wasn't the technical work—it was resisting the urge to over-document. 
**You can document everything, but that's not the same as documenting well.** I focused on the decisions that actually mattered, the gotchas we'd hit, and the exact state of things *right now*, not some idealized version from the README we wrote last year. Here's the lesson I'm taking away: documentation debt compounds faster than code debt because nobody's monitoring it. You can run a linter on your code, but who's checking if your architecture docs match your actual architecture? Treat documentation like you treat your test suite—make it part of the build process, not an afterthought. And yeah, why do they call it **hyper terminal**? Too much Java. 😄
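As a concrete starting point for "docs as part of the build", even a few lines in CI can catch the `frontend-cascade/` class of drift. A sketch; the backtick-quoted-path convention is an assumption about how directories appear in files like INDEX.md:

```python
import re
from pathlib import Path

def stale_doc_paths(doc: str, root: Path):
    """Return backtick-quoted directory paths mentioned in a doc
    that no longer exist on disk."""
    mentioned = re.findall(r"`([\w./-]+/)`", doc)  # e.g. `frontend/`
    return [p for p in mentioned if not (root / p).exists()]

doc = "See `frontend/` and the removed `frontend-cascade/` directory."
print(stale_doc_paths(doc, Path(".")))
```

Wire it into the same pipeline that runs the linter and a renamed directory fails the build the day it drifts, not months later.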

Feb 25, 2026
New Feature · trend-analisis

Government Moves to Open Source: A Strategic Shift in Digital Infrastructure

When a state decides to migrate its entire software infrastructure to open source, you're not just talking about swapping proprietary licenses for free alternatives. You're orchestrating a fundamental shift in how public institutions think about technology ownership, vendor lock-in, and long-term sustainability. The project we've been tracking—code-named Trend Analysis—represents exactly this kind of transformation. A government digital program is planning a complete migration from closed-source systems to open-source alternatives, and the implications run deep.

**Why Now? Why This Matters**

The decision doesn't come from ideological fervor alone. Open source offers governments three critical advantages: **transparency** (critical for public trust), **independence** (no vendor dictates your roadmap), and **cost predictability** (no surprise licensing fees). When you're managing infrastructure for millions of citizens, these aren't nice-to-haves—they're requirements. The Trend Analysis project is mapping this migration at scale. We're talking about replacing proprietary tools across entire systems: from core APIs to data pipelines, from frontend interfaces to backend databases. The team is using Claude AI to analyze requirements, identify compatibility gaps, and plan the transition phases.

**The Technical Reality**

Migrating government infrastructure isn't like switching your personal laptop from Windows to Linux. You're managing:

- **Legacy system integration**: Old systems need to talk to new ones during transition
- **Data consistency**: Decades of data stored in proprietary formats must be preserved
- **Security auditing**: Every line of open-source code replacing a closed system gets scrutiny
- **Team training**: Your workforce suddenly needs new skills

The Trend Analysis approach? Break it into features. Implement in phases. Test aggressively. Use AI-driven analysis to identify which systems should migrate first, which dependencies exist, and where bottlenecks will emerge.

**The Real Innovation**

What's fascinating isn't the choice itself—many governments are making it. It's the systematic approach. By treating this as a "feature implementation" project with AI analysis, the team transforms what could be a chaotic, years-long nightmare into a structured, milestone-driven program. They're using modern development practices (branching, documentation, categorization) to solve an inherently bureaucratic problem. That's where Claude and AI analysis shine: they compress decision-making from months into weeks by analyzing trend data, identifying patterns, and recommending optimal migration sequences.

**The Takeaway**

Government digital transformation is accelerating. Open source isn't a fringe choice anymore—it's becoming the baseline for public institutions that can't afford vendor lock-in. And projects like Trend Analysis prove that with the right tooling and methodology, even massive infrastructure migrations become manageable.

---

*Why do Python programmers wear glasses? Because they can't C.* 😄

Feb 25, 2026
New Feature · C--projects-ai-agents-voice-agent

When Your GPU Runs Out of Memory: Lessons from Voice Agent Model Loading

I was debugging why our **Voice Agent** project kept failing to load the UI-TARS model, and the logs were telling a frustratingly incomplete story. The vLLM container would start, respond to health checks, but then mysteriously stop mid-initialization. Classic infrastructure debugging scenario. The culprit? **A 16GB VRAM RTX 4090 Laptop GPU with only 5.4GB actually free.** UI-TARS 7B in float16 precision needs roughly 14GB to load, and even with aggressive `gpu_memory_utilization=0.9` tuning, the math didn't work. The container logs would cut off right at "Starting to load model..." — the killer detail that revealed the truth. The inference server never actually became ready; it was stuck in a memory allocation loop. What made this tricky was that the health check endpoint `/health` returns a 200 response *before* the model finishes loading. So the orchestration layer thought everything was fine while the actual inference path was completely broken. I had to dig into the full vLLM startup sequence to realize the distinction: endpoint availability ≠ model readiness. The fix involved three decisions: **First**, switch to a smaller model. Instead of UI-TARS 7B-SFT, we'd use the 2B-SFT variant — still capable enough for our use case but fitting comfortably in available VRAM. Sometimes the heroic solution is just choosing a different tool. **Second**, be explicit about what "ready" means. Updated the health check to `/health` with proper timeout windows, ensuring the orchestrator waits for genuine model loading completion, not just socket availability. **Third**, make memory constraints visible. I added `gpu_memory_utilization` configuration as a first-class parameter in our docker-compose setup, with clear comments explaining the tradeoff: higher utilization = better throughput but increased OOM risk on resource-constrained hardware. The broader lesson here is that **GPU memory is a hard constraint**, not a soft one. 
You can't incrementally load a model; either it fits or it doesn't. Unlike CPU memory with paging, exceeding VRAM capacity doesn't degrade gracefully — it just stops. This is why many production systems now include memory profiling in their CI/CD pipelines, catching model-to-hardware mismatches before they hit real infrastructure. --- *There are only 10 kinds of people in this world: those who know binary and those who don't.* 😄
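The back-of-envelope math that settled the model choice: weights-only VRAM is parameter count times bytes per parameter. A sketch (weights only; vLLM reserves KV-cache and activation memory on top of this):

```python
def weight_vram_gb(n_params: float, bytes_per_param: int = 2) -> float:
    # fp16/bf16 stores 2 bytes per parameter; this is the floor, not the
    # total -- KV-cache and activations come on top.
    return n_params * bytes_per_param / 1e9

print(weight_vram_gb(7e9))  # 14.0 -> hopeless against 5.4 GB free
print(weight_vram_gb(2e9))  # 4.0  -> why we dropped to the 2B-SFT variant
```

Running this arithmetic before `docker compose up` is the cheap version of the memory-profiling CI step mentioned above.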

Feb 25, 2026