Choosing Software Is About to Become a Whole Lot Harder
Field guide · March 14, 2026
For a long time, people relied on a set of informal signals when evaluating software. If an application looked polished, had a coherent interface, came from a recognizable vendor or open-source project, and appeared to contain a meaningful amount of code, it was generally safe to assume that real engineering effort had gone into it. That did not mean the software was perfect, but it usually meant it had matured through the friction of time. Systems were written, tested, broken, rewritten, and gradually improved through use.
Those signals are about to become far less reliable.
Large language models are dramatically lowering the barrier to producing what appears to be mature software. A single developer can scaffold a web application, integrate frameworks, generate APIs, assemble configuration files, and produce documentation in hours. The interface looks reasonable. The repository contains thousands of lines of code. The project may even include tests and deployment scripts.
From the outside, it can look indistinguishable from software that evolved over months or years of engineering work.
The difficulty is that appearance no longer tells us very much about the underlying quality of the system.
Historically, the amount of code in a project acted as a rough proxy for effort. Writing tens of thousands of lines of functioning code took time, and along the way someone inevitably discovered bugs, architectural weaknesses, operational constraints, and security concerns. Those discoveries forced the system to mature.
When code can be generated quickly, that natural maturation process changes. It becomes possible to produce a large and seemingly sophisticated codebase without ever confronting the deeper questions that determine whether the system will remain stable under real-world conditions. A project can look complete even when the architectural foundations are fragile, the operational model is unclear, and no one has spent much time thinking about how the system behaves once real users, real data, and real failures enter the picture.
This is why the common debate about AI-generated software often misses the point. Much of the conversation focuses on authorship. Was this written by a human engineer or generated by a model?
In practice, that question matters far less than people think.
The questions that actually matter are different. Can the system be understood by someone who did not originally build it? Can it be tested in meaningful ways? Can it be secured and monitored? Can it be operated reliably? Can it be maintained once the original author moves on?
Those qualities have always defined good software. What has changed is that the superficial signals we once relied on to infer those qualities are becoming unreliable.
The real risk is not that AI will produce code that does not run. In many cases the code runs perfectly well. The risk is that the systems built this way may lack structural integrity. Individual pieces of the application may look well designed, but the architecture connecting those pieces may never have been fully considered. Logic becomes duplicated across modules. Security controls appear in some places but not others. Dependency chains grow longer than anyone realizes. Features accumulate faster than the underlying design evolves to support them.
Everything works until the moment it does not. And when something breaks, understanding why can be far more difficult than it should be.
Another emerging dynamic deserves attention. If creating software becomes dramatically easier, the number of software projects will increase just as dramatically. A single motivated developer can now produce something that looks like a fully realized platform. Sometimes those projects mature and attract communities. Just as often they do not. The developer moves on, interests shift, and the repository gradually goes quiet.
The code remains available, but the stewardship disappears.
In a world where software creation accelerates, abandonment risk becomes a practical concern. Organizations may increasingly find themselves evaluating tools that appear sophisticated but are effectively maintained by one person in their spare time. If that person loses interest or changes direction, the project can stall overnight. The code may still exist, but the support, maintenance, and security updates that keep software healthy over time vanish.
All of this means that the criteria we use to evaluate software need to evolve.
Transparency becomes far more valuable
When appearance becomes easy to manufacture, visibility becomes more important. One of the reasons open-source software becomes especially valuable in this environment is that it allows people to examine the underlying reality of a project. The architecture, testing practices, dependency choices, and maintenance activity are all visible.
That transparency does not guarantee quality. Plenty of open-source projects are fragile or poorly maintained. But it does allow interested parties to verify whether real engineering discipline exists beneath the surface.
You can see whether defects are tracked and resolved. You can see whether tests exist and whether they are meaningful. You can see how frequently the software evolves and who is responsible for maintaining it. In other words, transparency restores the ability to evaluate signals that would otherwise be hidden.
In an environment where convincing software can be generated quickly, that kind of visibility becomes exceptionally valuable. It is one of the reasons our own open-source product, Hekate, is AGPL-licensed with a public codebase: the evaluation standards we apply to other people's software should apply equally to ours.
A more useful way to evaluate software
If the old signals are weakening, organizations need to start evaluating software along deeper dimensions. One useful way to think about this is to examine several core areas.
Functional reliability
Does the system actually behave correctly under normal and abnormal conditions? Are the core use cases documented? Is there evidence that the system has been tested against edge cases and failure scenarios?
Signals of strength:
- Automated tests with meaningful coverage
- Clear bug tracking and changelogs
- Reproducible builds and stable releases
Warning signs:
- Demo-quality polish with little evidence of testing
- Large numbers of unresolved defects
- Frequent regressions between releases
Code quality and maintainability
Generated code can be syntactically correct and still structurally weak. The real question is whether someone new to the project could understand and safely modify the system.
Signals of strength:
- Clear module boundaries
- Consistent naming and style
- Readable error handling and sensible abstractions
Warning signs:
- Massive files with mixed responsibilities
- Duplicated logic across the codebase
- Unnecessary architectural complexity
Security posture
AI-generated code often reproduces insecure patterns found in training data. Security controls should be visible and deliberate. These are exactly the kinds of issues a vendor AI-claim review is designed to surface.
Signals of strength:
- Proper handling of secrets
- Dependency scanning and patching processes
- Clear authentication and authorization models
Warning signs:
- Credentials embedded in code or examples
- Overly permissive access controls
- Weak or nonexistent audit logging
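The first warning sign, credentials embedded in code, is also the easiest to check for mechanically. The sketch below is a minimal illustration of that idea; the patterns and the function name are assumptions for this example, and a real audit would rely on a dedicated secret scanner rather than a handful of regexes.

```python
import re

# Illustrative patterns only; a real audit would use a purpose-built
# secret scanner with a much larger, maintained rule set.
SECRET_PATTERNS = [
    # keyword = "long-ish quoted value", e.g. password = "hunter2hunter2"
    re.compile(r"(?i)(api[_-]?key|secret|password|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
    # AWS access key ID format
    re.compile(r"AKIA[0-9A-Z]{16}"),
]

def find_embedded_secrets(source: str) -> list[str]:
    """Return lines of `source` that look like hard-coded credentials."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append(f"line {lineno}: {line.strip()}")
    return hits

sample = 'db_password = "hunter2hunter2"\nhost = "db.internal"\n'
print(find_embedded_secrets(sample))
```

A scan like this proves nothing on its own, but running it against a candidate codebase takes minutes and turns "trust us, it's secure" into something checkable.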
Operational maturity
Software that runs in a demo is not the same as software that can run reliably in production. The discipline of engineering for operations is what separates the two.
Signals of strength:
- Documented deployment processes
- Monitoring, logging, and health checks
- Backup and recovery procedures
Warning signs:
- Setup that depends on tribal knowledge
- Little visibility into failures
- No clear operational runbook
Sustainability and stewardship
In an era where a single developer can produce a sophisticated application quickly, the long-term health of a project depends on who maintains it.
Signals of strength:
- Multiple active contributors
- Regular releases and roadmap visibility
- Documentation intended for future maintainers
Warning signs:
- Long periods of inactivity
- One-person ownership with no backup
- Unclear governance or support model
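Several of these stewardship signals can be read directly from a project's commit history. As one hedged sketch: feed the output of `git log --format=%an` (one author name per commit) into a function like the one below. The thresholds and the "bus factor" heuristic are assumptions chosen for illustration, not an established standard.

```python
from collections import Counter

def stewardship_signals(author_lines: str) -> dict:
    """Summarize maintainer concentration from `git log --format=%an` output.

    `author_lines` is one commit author per line, e.g. the stdout of:
        git log --since="90 days ago" --format=%an
    The 90% concentration threshold below is an illustrative assumption.
    """
    authors = Counter(a.strip() for a in author_lines.splitlines() if a.strip())
    total = sum(authors.values())
    top_share = max(authors.values()) / total if total else 0.0
    return {
        "commits": total,
        "active_contributors": len(authors),
        # Flag projects that are effectively one person's spare-time effort.
        "bus_factor_warning": len(authors) < 2 or top_share > 0.9,
    }

print(stewardship_signals("alice\nalice\nbob\ncarol\n"))
```

A project where one name dominates recent history is not necessarily unhealthy, but it is exactly the one-person-ownership pattern worth asking about before building a critical dependency on it.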
Strategic fit
Even well-written software can be the wrong choice if it does not align with the environment where it will operate. This is the kind of evaluation that fits naturally into managed-services operating-model design: deciding what should run in-house, be brokered, or move to SaaS for your specific institution.
Signals of strength:
- Clear integration points with existing systems
- Skills that exist within the organization
- Reasonable path for migration or replacement
Warning signs:
- Heavy lock-in to proprietary components
- Operational complexity beyond the team's capabilities
- No exit path if the software becomes unsustainable
A simple evaluation scorecard
One practical approach is to score each category from 1 to 5:
- Functional reliability
- Code maintainability
- Security posture
- Operational readiness
- Documentation quality
- Maintainer sustainability
- Strategic fit
- Exit or replacement feasibility
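The scorecard above can be sketched as a small function that averages the category scores and flags any category below a floor. The equal weighting, the floor of 3, and the gating rule are all assumptions for illustration; an organization would tune these to its own risk tolerance.

```python
# Illustrative sketch of the 1-to-5 scorecard described above.
# Equal weights and a floor of 3 are assumptions, not a standard.

MIN_SCORE, MAX_SCORE = 1, 5

def evaluate(scores: dict[str, int], floor: int = 3) -> dict:
    """Summarize category scores and flag weak areas."""
    for category, score in scores.items():
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"{category}: score must be 1-5, got {score}")
    weak = sorted(c for c, s in scores.items() if s < floor)
    return {
        "average": sum(scores.values()) / len(scores),
        "weak_areas": weak,
        "production_ready": not weak,
    }

candidate = {
    "functional_reliability": 4,
    "code_maintainability": 3,
    "security_posture": 2,
    "operational_readiness": 3,
    "documentation_quality": 4,
    "maintainer_sustainability": 2,
    "strategic_fit": 4,
    "exit_feasibility": 3,
}
print(evaluate(candidate))
```

For the sample scores above, security posture and maintainer sustainability fall below the floor, which matches the guidance that follows: a low security score argues against handling sensitive data, and a low sustainability score argues against critical dependence.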
This does not eliminate risk, but it forces the conversation into concrete terms. If a system scores poorly on security, it probably should not handle sensitive data. If operational readiness is weak, it may be suitable for experimentation but not production use. If sustainability is questionable, it may not be wise to build critical dependencies around it.
The aim is to make those risks visible and intentional rather than accidental.
Stewardship now matters as much as authorship
Perhaps the most important shift is that software quality will increasingly depend on stewardship rather than authorship. The question is no longer simply who wrote the code. It is who understands it, who maintains it, and who is accountable for its future.
Well-governed projects with clear ownership, transparent development practices, and real operational discipline will stand out more and more. Projects built through enthusiasm alone may still produce interesting ideas, but they will struggle to sustain themselves over time.
AI is going to dramatically accelerate software creation. That is, in many ways, a remarkable development. It lowers barriers to experimentation and allows more ideas to become real systems.
But it also means the world is about to be flooded with software that looks finished.
For technology leaders, developers, and organizations responsible for choosing and operating software, the challenge will be learning to look beyond that surface appearance. The real evaluation will not be whether the system works during a demonstration, but whether the system can withstand the pressures that inevitably arise once it becomes part of a real environment.
In other words, the question will no longer be whether a piece of software works today. The question will be whether it will still work when it matters.
Tags
- AI
- Software evaluation
- Open source
- Supply chain
- Stewardship