Choosing Software Is About to Become a Whole Lot Harder
Field guide · March 14, 2026
For a long time, people relied on a set of informal signals when evaluating software. If an application looked polished, had a coherent interface, came from a recognizable vendor or open-source project, and appeared to contain a meaningful amount of code, it was generally safe to assume that real engineering effort had gone into it. That did not mean the software was perfect, but it usually meant it had matured through the friction of time. Systems were written, tested, broken, rewritten, and gradually improved through use.
Those signals are about to become far less reliable.
Large language models are dramatically lowering the barrier to producing what appears to be mature software. A single developer can scaffold a web application, integrate frameworks, generate APIs, assemble configuration files, and produce documentation in hours. The interface looks reasonable. The repository contains thousands of lines of code. The project may even include tests and deployment scripts.
From the outside, it can look indistinguishable from software that evolved over months or years of engineering work.
The difficulty is that appearance no longer tells us very much about the underlying quality of the system.
Historically, the amount of code in a project acted as a rough proxy for effort. Writing tens of thousands of lines of functioning code took time, and along the way someone inevitably discovered bugs, architectural weaknesses, operational constraints, and security concerns. Those discoveries forced the system to mature.
When code can be generated quickly, that natural maturation process changes. It becomes possible to produce a large and seemingly sophisticated codebase without ever confronting the deeper questions that determine whether the system will remain stable under real-world conditions. A project can look complete even when the architectural foundations are fragile, the operational model is unclear, and no one has spent much time thinking about how the system behaves once real users, real data, and real failures enter the picture.
This is why the common debate about AI-generated software often misses the point. Much of the conversation focuses on authorship. Was this written by a human engineer or generated by a model?
In practice, that question matters far less than people think.
The questions that actually matter are different. Can the system be understood by someone who did not originally build it? Can it be tested in meaningful ways? Can it be secured and monitored? Can it be operated reliably? Can it be maintained once the original author moves on?
Those qualities have always defined good software. What has changed is that the superficial signals we once relied on to infer those qualities are becoming unreliable.
The real risk is not that AI will produce code that does not run. In many cases the code runs perfectly well. The risk is that the systems built this way may lack structural integrity. Individual pieces of the application may look well designed, but the architecture connecting those pieces may never have been fully considered. Logic becomes duplicated across modules. Security controls appear in some places but not others. Dependency chains grow longer than anyone realizes. Features accumulate faster than the underlying design evolves to support them.
Everything works until the moment it does not. And when something breaks, understanding why can be far more difficult than it should be.
Another emerging dynamic deserves attention. If creating software becomes dramatically easier, the number of software projects will increase just as dramatically. A single motivated developer can now produce something that looks like a fully realized platform. Sometimes those projects mature and attract communities. Just as often they do not. The developer moves on, interests shift, and the repository gradually goes quiet.
The code remains available, but the stewardship disappears.
In a world where software creation accelerates, abandonment risk becomes a practical concern. Organizations may increasingly find themselves evaluating tools that appear sophisticated but are effectively maintained by one person in their spare time. If that person loses interest or changes direction, the project can stall overnight. The code may still exist, but the support, maintenance, and security updates that keep software healthy over time vanish.
All of this means that the criteria we use to evaluate software need to evolve.
Transparency becomes far more valuable
When appearance becomes easy to manufacture, visibility becomes more important. One of the reasons open-source software becomes especially valuable in this environment is that it allows people to examine the underlying reality of a project. The architecture, testing practices, dependency choices, and maintenance activity are all visible.
That transparency does not guarantee quality. Plenty of open-source projects are fragile or poorly maintained. But it does allow interested parties to verify whether real engineering discipline exists beneath the surface.
You can see whether defects are tracked and resolved. You can see whether tests exist and whether they are meaningful. You can see how frequently the software evolves and who is responsible for maintaining it. In other words, transparency restores the ability to evaluate signals that would otherwise be hidden.
In an environment where convincing software can be generated quickly, that kind of visibility becomes exceptionally valuable. It is one of the reasons our own open-source product, Hekate, is AGPL-licensed with a public codebase: the evaluation standards we apply to other people's software should apply equally to ours.
A more useful way to evaluate software
If the old signals are weakening, organizations need to start evaluating software along deeper dimensions. One useful way to think about this is to examine several core areas.
Functional reliability
Does the system actually behave correctly under normal and abnormal conditions? Are the core use cases documented? Is there evidence that the system has been tested against edge cases and failure scenarios?
Signals of strength:
- Automated tests with meaningful coverage
- Clear bug tracking and changelogs
- Reproducible builds and stable releases
Warning signs:
- Demo-quality polish with little evidence of testing
- Large numbers of unresolved defects
- Frequent regressions between releases
Code quality and maintainability
Generated code can be syntactically correct and still structurally weak. The real question is whether someone new to the project could understand and safely modify the system.
Signals of strength:
- Clear module boundaries
- Consistent naming and style
- Readable error handling and sensible abstractions
Warning signs:
- Massive files with mixed responsibilities
- Duplicated logic across the codebase
- Unnecessary architectural complexity
Security posture
AI-generated code often reproduces insecure patterns found in training data. Security controls should be visible and deliberate. These are exactly the kinds of issues a vendor AI-claim review is designed to surface.
Signals of strength:
- Proper handling of secrets
- Dependency scanning and patching processes
- Clear authentication and authorization models
Warning signs:
- Credentials embedded in code or examples
- Overly permissive access controls
- Weak or nonexistent audit logging
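The first warning sign, credentials embedded in code, is also the easiest to check for mechanically. The sketch below is a minimal illustration of that idea; the patterns and the function name are assumptions for this example, and a real audit would rely on a dedicated secret scanner rather than a handful of regexes.

```python
import re

# Illustrative patterns only; a real audit would use a purpose-built
# secret scanner with a much larger, maintained rule set.
SECRET_PATTERNS = [
    # keyword = "long-ish quoted value", e.g. password = "hunter2hunter2"
    re.compile(r"(?i)(api[_-]?key|secret|password|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
    # AWS access key ID format
    re.compile(r"AKIA[0-9A-Z]{16}"),
]

def find_embedded_secrets(source: str) -> list[str]:
    """Return lines of `source` that look like hard-coded credentials."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append(f"line {lineno}: {line.strip()}")
    return hits

sample = 'db_password = "hunter2hunter2"\nhost = "db.internal"\n'
print(find_embedded_secrets(sample))
```

A scan like this proves nothing on its own, but running it against a candidate codebase takes minutes and turns "trust us, it's secure" into something checkable.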
Operational maturity
Software that runs in a demo is not the same as software that can run reliably in production. The discipline of engineering for operations is what separates the two.
Signals of strength:
- Documented deployment processes
- Monitoring, logging, and health checks
- Backup and recovery procedures
Warning signs:
- Setup that depends on tribal knowledge
- Little visibility into failures
- No clear operational runbook
Sustainability and stewardship
In an era where a single developer can produce a sophisticated application quickly, the long-term health of a project depends on who maintains it.
Signals of strength:
- Multiple active contributors
- Regular releases and roadmap visibility
- Documentation intended for future maintainers
Warning signs:
- Long periods of inactivity
- One-person ownership with no backup
- Unclear governance or support model
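Several of these stewardship signals can be read directly from a project's commit history. As one hedged sketch: feed the output of `git log --format=%an` (one author name per commit) into a function like the one below. The thresholds and the "bus factor" heuristic are assumptions chosen for illustration, not an established standard.

```python
from collections import Counter

def stewardship_signals(author_lines: str) -> dict:
    """Summarize maintainer concentration from `git log --format=%an` output.

    `author_lines` is one commit author per line, e.g. the stdout of:
        git log --since="90 days ago" --format=%an
    The 90% concentration threshold below is an illustrative assumption.
    """
    authors = Counter(a.strip() for a in author_lines.splitlines() if a.strip())
    total = sum(authors.values())
    top_share = max(authors.values()) / total if total else 0.0
    return {
        "commits": total,
        "active_contributors": len(authors),
        # Flag projects that are effectively one person's spare-time effort.
        "bus_factor_warning": len(authors) < 2 or top_share > 0.9,
    }

print(stewardship_signals("alice\nalice\nbob\ncarol\n"))
```

A project where one name dominates recent history is not necessarily unhealthy, but it is exactly the one-person-ownership pattern worth asking about before building a critical dependency on it.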
Strategic fit
Even well-written software can be the wrong choice if it does not align with the environment where it will operate. This is the kind of evaluation that fits naturally into managed-services operating-model design: deciding what should run in-house, be brokered, or move to SaaS for your specific institution.
Signals of strength:
- Clear integration points with existing systems
- Skills that exist within the organization
- Reasonable path for migration or replacement
Warning signs:
- Heavy lock-in to proprietary components
- Operational complexity beyond the team's capabilities
- No exit path if the software becomes unsustainable
A simple evaluation scorecard
One practical approach is to score each category from 1 to 5:
- Functional reliability
- Code maintainability
- Security posture
- Operational readiness
- Documentation quality
- Maintainer sustainability
- Strategic fit
- Exit or replacement feasibility
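The scorecard above can be sketched as a small function that averages the category scores and flags any category below a floor. The equal weighting, the floor of 3, and the gating rule are all assumptions for illustration; an organization would tune these to its own risk tolerance.

```python
# Illustrative sketch of the 1-to-5 scorecard described above.
# Equal weights and a floor of 3 are assumptions, not a standard.

MIN_SCORE, MAX_SCORE = 1, 5

def evaluate(scores: dict[str, int], floor: int = 3) -> dict:
    """Summarize category scores and flag weak areas."""
    for category, score in scores.items():
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"{category}: score must be 1-5, got {score}")
    weak = sorted(c for c, s in scores.items() if s < floor)
    return {
        "average": sum(scores.values()) / len(scores),
        "weak_areas": weak,
        "production_ready": not weak,
    }

candidate = {
    "functional_reliability": 4,
    "code_maintainability": 3,
    "security_posture": 2,
    "operational_readiness": 3,
    "documentation_quality": 4,
    "maintainer_sustainability": 2,
    "strategic_fit": 4,
    "exit_feasibility": 3,
}
print(evaluate(candidate))
```

For the sample scores above, security posture and maintainer sustainability fall below the floor, which matches the guidance that follows: a low security score argues against handling sensitive data, and a low sustainability score argues against critical dependence.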
This does not eliminate risk, but it forces the conversation into concrete terms. If a system scores poorly on security, it probably should not handle sensitive data. If operational readiness is weak, it may be suitable for experimentation but not production use. If sustainability is questionable, it may not be wise to build critical dependencies around it.
The aim is to make those risks visible and intentional rather than accidental.
Stewardship now matters as much as authorship
Perhaps the most important shift is that software quality will increasingly depend on stewardship rather than authorship. The question is no longer simply who wrote the code. It is who understands it, who maintains it, and who is accountable for its future.
Well-governed projects with clear ownership, transparent development practices, and real operational discipline will stand out more and more. Projects built through enthusiasm alone may still produce interesting ideas, but they will struggle to sustain themselves over time.
AI is going to dramatically accelerate software creation. That is, in many ways, a remarkable development. It lowers barriers to experimentation and allows more ideas to become real systems.
But it also means the world is about to be flooded with software that looks finished.
For technology leaders, developers, and organizations responsible for choosing and operating software, the challenge will be learning to look beyond that surface appearance. The real evaluation will not be whether the system works during a demonstration, but whether the system can withstand the pressures that inevitably arise once it becomes part of a real environment.
In other words, the question will no longer be whether a piece of software works today. The question will be whether it will still work when it matters.
Tags
- AI
- Software evaluation
- Open source
- Supply chain
- Stewardship