LLMs, Language, and the Need for a Semantic Substrate
Large language models have changed how we write software. You can describe a feature in English and receive working code. You can translate between programming languages in seconds. You can scaffold entire systems conversationally.
But this raises a structural question:
If English can generate code, what is the real source of truth?
- LLMs, Language, and the Need for a Semantic Substrate
- Three Layers of Specification
  - 1. Natural Language
  - 2. Programming Languages
  - 3. The Missing Layer
- Why LLMs Increase the Need for Structure
- A Better Architecture
- What a Semantic Substrate Must Provide
- Natural Language vs World Models
- Cohesive Systems: Semantics First
- The Strategic Shift
- The Future of Programming
- References
Three Layers of Specification
There are now three competing layers:
1. Natural Language
English is:
- High bandwidth
- Flexible
- Expressive
- Ideal for exploration
But it is also:
- Ambiguous
- Context-sensitive
- Non-deterministic
- Not mechanically verifiable
Natural language captures intent. It does not define semantics.
When used as the source of truth, systems drift. Regeneration changes behavior. Intent cannot be diffed. Invariants cannot be checked.
English is an excellent design surface. It is not a stable semantic foundation.
2. Programming Languages
C#, Rust, TypeScript, SQL - these remain the operational source of truth.
They provide:
- Deterministic execution
- Type systems
- Tooling and static analysis
- Mechanical verification
But they mix:
- Meaning
- Infrastructure
- Optimization
- Incidental complexity
Programming languages are execution-oriented. They encode how more than what. LLMs can generate code in any of them, but they are still generating implementations, not semantics.
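The how/what split can be made concrete. Below is an illustrative sketch (the names and the shape of the declarative spec are invented for this example, not any real API): the same requirement written as an imperative implementation, which fixes iteration order and mechanics, and as declarative data, which states only the relation between input and output.

```typescript
// "How": an imperative implementation entangles meaning with mechanics
// (iteration order, an accumulator, mutation).
function activeEmails(users: { email: string; active: boolean }[]): string[] {
  const out: string[] = [];
  for (const u of users) {
    if (u.active) out.push(u.email);
  }
  return out;
}

// "What": a declarative form (hypothetical shape) states only the semantics.
// A compiler could project it onto a loop, a SQL query, or an index scan.
const activeEmailsSpec = {
  source: "users",
  filter: { field: "active", equals: true },
  project: ["email"],
} as const;
```

The first form is an implementation; the second is closer to a semantic description that could be checked, diffed, and retargeted.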
3. The Missing Layer
What is missing is a stable, high-level semantic substrate.
Something that is:
- Easy to read and write
- Declarative and composable
- Deterministic and canonical
- Verifiable
- Compilable to multiple runtimes
- Friendly to AI systems
Not English. Not raw code. A semantics-first intermediate representation.
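One way to picture such an intermediate representation is as plain typed data rather than executable code. The following sketch is purely illustrative (all type and field names are assumptions, not a defined format): because the model is data, it can be normalized, diffed, verified, and compiled to multiple runtimes.

```typescript
// Hypothetical shape of a semantics-first IR, expressed as typed data.
type Field = { name: string; type: "string" | "int" | "bool" };
type Entity = { name: string; fields: Field[] };
// The rule is written in some checkable expression language, not host code.
type Invariant = { name: string; rule: string };
type Model = { entities: Entity[]; invariants: Invariant[] };

const orderModel: Model = {
  entities: [
    {
      name: "Order",
      fields: [
        { name: "id", type: "string" },
        { name: "total", type: "int" },
        { name: "paid", type: "bool" },
      ],
    },
  ],
  invariants: [{ name: "NonNegativeTotal", rule: "Order.total >= 0" }],
};
```

Nothing here executes; that is the point. Meaning lives in the structure, and execution is a later projection.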
Why LLMs Increase the Need for Structure
It might appear that AI eliminates the need for formal specification languages.
The opposite is true.
The more automation you introduce, the more you need:
- Explicit invariants
- Canonical forms
- Deterministic normalization
- Separation of semantics from execution
Without this, regeneration becomes chaotic. Systems become unstable under iteration. LLMs reduce the cost of authoring high-level abstractions. They do not remove the need for semantic anchors.
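Deterministic normalization is the concrete mechanism behind this. A minimal sketch: serialize a model into a canonical form with sorted keys, so that two regenerations that are structurally equal (but emitted in different order, as LLM runs often are) compare and diff identically.

```typescript
// Canonical serialization: recursively sort object keys so structurally
// equal values always produce the same string.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)) // canonical key order
      .map(([k, v]) => JSON.stringify(k) + ":" + canonicalize(v));
    return "{" + entries.join(",") + "}";
  }
  return JSON.stringify(value);
}

// The same model emitted with different key order, e.g. by two LLM runs:
const runA = { name: "Order", fields: ["id", "total"] };
const runB = { fields: ["id", "total"], name: "Order" };
// canonicalize(runA) and canonicalize(runB) are identical strings,
// so a diff over canonical forms reports no drift.
```

With a canonical form in place, "did regeneration change the semantics?" becomes a string comparison rather than a judgment call.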
A Better Architecture
A stable future stack looks like this:
Layer 1 — Natural Language
Intent capture. Exploration. Collaboration. Conversation with AI.
Layer 2 — Semantic Substrate
Explicit domain models. Relations. Transitions. Processes. Invariants.
Layer 3 — Generated Code & Runtime
APIs. Storage. Orchestration. Infrastructure.
LLMs operate between these layers. They do not replace them.
What a Semantic Substrate Must Provide
An effective substrate must:
- Represent entities as typed algebraic structures
- Represent relations as composable relational algebra
- Represent transitions as deterministic state transformations
- Represent processes as coordination semantics
- Make invariants explicit
- Separate meaning from execution strategy
- Compile to multiple targets (C#, SQL, Elastic, EDI, etc.)
- Support canonical diffing and analysis
It must be expressive enough for real systems. And precise enough for mechanical reasoning.
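A few of these ingredients can be sketched in miniature. The names below are illustrative, not a defined interface: a typed entity, a transition modeled as a pure state-to-state function, and an invariant made explicit as a checkable predicate.

```typescript
// A typed entity: states are plain algebraic data.
type Order = { id: string; total: number; status: "draft" | "submitted" | "paid" };

// A transition: a pure, deterministic function from state to state.
// No I/O, no hidden clock; the old state is left untouched.
function submit(order: Order): Order {
  if (order.status !== "draft") throw new Error("submit: only draft orders");
  return { ...order, status: "submitted" };
}

// An explicit invariant: a predicate a checker can run over any state.
function nonNegativeTotal(order: Order): boolean {
  return order.total >= 0;
}
```

Because transitions are pure and invariants are explicit, both are amenable to mechanical reasoning: a checker can enumerate transitions and verify that every one preserves every invariant.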
Natural Language vs World Models
Some AI research (e.g., predictive latent world models) suggests that intelligence arises from learned internal representations rather than symbolic structure. That may be true for general cognition.
But software engineering requires:
- Auditability
- Stability
- Determinism
- Long-term maintainability
Learned latent embeddings are not suitable as specification languages. Symbolic semantics remain necessary.
Cohesive Systems: Semantics First
Cohesive Systems treats semantics as the foundation.
Everything else is projection.
- Entities define stable data algebra.
- Relations define compositional queries and mappings.
- Transitions define deterministic state evolution.
- Processes define coordination semantics.
Execution is layered on top. Infrastructure is generated from meaning. AI assists at every level, but does not replace the substrate.
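"Infrastructure is generated from meaning" can be illustrated with one projection. The sketch below is an assumption-laden toy, not a real Cohesive Systems API: an entity definition (data) is projected onto a single concrete target, SQL DDL. The same definition could equally be projected onto C# classes or an Elastic mapping.

```typescript
type Field = { name: string; type: "string" | "int" | "bool" };
type Entity = { name: string; fields: Field[] };

// One projection target among several: a SQL type mapping.
const sqlType: Record<Field["type"], string> = {
  string: "TEXT",
  int: "INTEGER",
  bool: "BOOLEAN",
};

// Project an entity definition onto SQL DDL.
function toSqlDdl(entity: Entity): string {
  const cols = entity.fields.map((f) => `${f.name} ${sqlType[f.type]}`);
  return `CREATE TABLE ${entity.name} (${cols.join(", ")});`;
}

const order: Entity = {
  name: "orders",
  fields: [
    { name: "id", type: "string" },
    { name: "total", type: "int" },
  ],
};
// toSqlDdl(order) → "CREATE TABLE orders (id TEXT, total INTEGER);"
```

The entity stays stable; the projections multiply. Changing a target means changing a generator, not the meaning.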
The Strategic Shift
Before LLMs, building high-level semantic DSLs was expensive.
Now:
- AI can generate boilerplate.
- AI can translate between representations.
- AI can help author semantic models directly.
This changes the economics.
We can afford to build systems where:
- Meaning is explicit.
- Code is derived.
- Infrastructure is generated.
- Semantics are stable.
The Future of Programming
English will increasingly become the interactive interface.
Programming languages will remain execution targets.
But the durable systems — the ones that scale in complexity without collapsing — will rely on a stable semantic core.
That is the direction of Semantic Systems Engineering.
Not replacing code with English.
Not replacing structure with probability.
But introducing a deterministic substrate between intent and execution.
Cohesive Systems is building that substrate.