LLMs, Language, and the Need for a Semantic Substrate
Large language models have changed how we write software. You can describe a feature in English and receive working code. You can translate between programming languages in seconds. You can scaffold entire systems conversationally.
But this raises a structural question:
If English can generate code, what is the real source of truth?
- LLMs, Language, and the Need for a Semantic Substrate
- Three Layers of Specification
  - 1. Natural Language
  - 2. Programming Languages
  - 3. The Missing Layer
- Why LLMs Increase the Need for Structure
- A Better Architecture
- What a Semantic Substrate Must Provide
- Natural Language vs World Models
- Cohesive Systems: Semantics First
- The Strategic Shift
- The Future of Programming
- References
Three Layers of Specification
There are now three competing layers:
1. Natural Language
English is:
- High bandwidth
- Flexible
- Expressive
- Ideal for exploration
But it is also:
- Ambiguous
- Context-sensitive
- Non-deterministic
- Not mechanically verifiable
Natural language captures intent. It does not define semantics.
When used as the source of truth, systems drift. Regeneration changes behavior. Intent cannot be diffed. Invariants cannot be checked.
English is an excellent design surface. It is not a stable semantic foundation.
2. Programming Languages
C#, Rust, TypeScript, SQL - these remain the operational source of truth.
They provide:
- Deterministic execution
- Type systems
- Tooling and static analysis
- Mechanical verification
But they mix:
- Meaning
- Infrastructure
- Optimization
- Incidental complexity
Programming languages are execution-oriented. They encode how more than what. LLMs can generate code in any of them, but they are still generating implementations, not semantics.
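The how/what split can be made concrete. Below is an illustrative sketch (the names and the shape of the declarative spec are invented for this example, not any real API): the same requirement written as an imperative implementation, which fixes iteration order and mechanics, and as declarative data, which states only the relation between input and output.

```typescript
// "How": an imperative implementation entangles meaning with mechanics
// (iteration order, an accumulator, mutation).
function activeEmails(users: { email: string; active: boolean }[]): string[] {
  const out: string[] = [];
  for (const u of users) {
    if (u.active) out.push(u.email);
  }
  return out;
}

// "What": a declarative form (hypothetical shape) states only the semantics.
// A compiler could project it onto a loop, a SQL query, or an index scan.
const activeEmailsSpec = {
  source: "users",
  filter: { field: "active", equals: true },
  project: ["email"],
} as const;
```

The first form is an implementation; the second is closer to a semantic description that could be checked, diffed, and retargeted.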
3. The Missing Layer
What is missing is a stable, high-level semantic substrate.
Something that is:
- Easy to read and write
- Declarative and composable
- Deterministic and canonical
- Verifiable
- Compilable to multiple runtimes
- Friendly to AI systems
Not English. Not raw code. A semantics-first intermediate representation.
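One way to picture such an intermediate representation is as plain typed data rather than executable code. The following sketch is purely illustrative (all type and field names are assumptions, not a defined format): because the model is data, it can be normalized, diffed, verified, and compiled to multiple runtimes.

```typescript
// Hypothetical shape of a semantics-first IR, expressed as typed data.
type Field = { name: string; type: "string" | "int" | "bool" };
type Entity = { name: string; fields: Field[] };
// The rule is written in some checkable expression language, not host code.
type Invariant = { name: string; rule: string };
type Model = { entities: Entity[]; invariants: Invariant[] };

const orderModel: Model = {
  entities: [
    {
      name: "Order",
      fields: [
        { name: "id", type: "string" },
        { name: "total", type: "int" },
        { name: "paid", type: "bool" },
      ],
    },
  ],
  invariants: [{ name: "NonNegativeTotal", rule: "Order.total >= 0" }],
};
```

Nothing here executes; that is the point. Meaning lives in the structure, and execution is a later projection.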
Why LLMs Increase the Need for Structure
It might appear that AI eliminates the need for formal specification languages.
The opposite is true.
The more automation you introduce, the more you need:
- Explicit invariants
- Canonical forms
- Deterministic normalization
- Separation of semantics from execution
Without this, regeneration becomes chaotic. Systems become unstable under iteration. LLMs reduce the cost of authoring high-level abstractions. They do not remove the need for semantic anchors.
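Deterministic normalization is the concrete mechanism behind this. A minimal sketch: serialize a model into a canonical form with sorted keys, so that two regenerations that are structurally equal (but emitted in different order, as LLM runs often are) compare and diff identically.

```typescript
// Canonical serialization: recursively sort object keys so structurally
// equal values always produce the same string.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)) // canonical key order
      .map(([k, v]) => JSON.stringify(k) + ":" + canonicalize(v));
    return "{" + entries.join(",") + "}";
  }
  return JSON.stringify(value);
}

// The same model emitted with different key order, e.g. by two LLM runs:
const runA = { name: "Order", fields: ["id", "total"] };
const runB = { fields: ["id", "total"], name: "Order" };
// canonicalize(runA) and canonicalize(runB) are identical strings,
// so a diff over canonical forms reports no drift.
```

With a canonical form in place, "did regeneration change the semantics?" becomes a string comparison rather than a judgment call.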
A Better Architecture
A stable future stack looks like this:
Layer 1 — Natural Language
Intent capture. Exploration. Collaboration. Conversation with AI.
Layer 2 — Semantic Substrate
Explicit domain models. Relations. Transitions. Processes. Invariants.
Layer 3 — Generated Code & Runtime
APIs. Storage. Orchestration. Infrastructure.
LLMs operate between these layers. They do not replace them.
What a Semantic Substrate Must Provide
An effective substrate must:
- Represent entities as typed algebraic structures
- Represent relations as composable relational algebra
- Represent transitions as deterministic state transformations
- Represent processes as coordination semantics
- Make invariants explicit
- Separate meaning from execution strategy
- Compile to multiple targets (C#, SQL, Elastic, EDI, etc.)
- Support canonical diffing and analysis
It must be expressive enough for real systems. And precise enough for mechanical reasoning.
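A few of these ingredients can be sketched in miniature. The names below are illustrative, not a defined interface: a typed entity, a transition modeled as a pure state-to-state function, and an invariant made explicit as a checkable predicate.

```typescript
// A typed entity: states are plain algebraic data.
type Order = { id: string; total: number; status: "draft" | "submitted" | "paid" };

// A transition: a pure, deterministic function from state to state.
// No I/O, no hidden clock; the old state is left untouched.
function submit(order: Order): Order {
  if (order.status !== "draft") throw new Error("submit: only draft orders");
  return { ...order, status: "submitted" };
}

// An explicit invariant: a predicate a checker can run over any state.
function nonNegativeTotal(order: Order): boolean {
  return order.total >= 0;
}
```

Because transitions are pure and invariants are explicit, both are amenable to mechanical reasoning: a checker can enumerate transitions and verify that every one preserves every invariant.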
Natural Language vs World Models
Some AI research (e.g., predictive latent world models) suggests that intelligence arises from learned internal representations rather than symbolic structure. That may be true for general cognition.
But software engineering requires:
- Auditability
- Stability
- Determinism
- Long-term maintainability
Learned latent embeddings are not suitable as specification languages. Symbolic semantics remain necessary.
Cohesive Systems: Semantics First
Cohesive Systems treats semantics as the foundation.
Everything else is projection.
- Entities define stable data algebra.
- Relations define compositional queries and mappings.
- Transitions define deterministic state evolution.
- Processes define coordination semantics.
Execution is layered on top. Infrastructure is generated from meaning. AI assists at every level, but does not replace the substrate.
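"Infrastructure is generated from meaning" can be illustrated with one projection. The sketch below is an assumption-laden toy, not a real Cohesive Systems API: an entity definition (data) is projected onto a single concrete target, SQL DDL. The same definition could equally be projected onto C# classes or an Elastic mapping.

```typescript
type Field = { name: string; type: "string" | "int" | "bool" };
type Entity = { name: string; fields: Field[] };

// One projection target among several: a SQL type mapping.
const sqlType: Record<Field["type"], string> = {
  string: "TEXT",
  int: "INTEGER",
  bool: "BOOLEAN",
};

// Project an entity definition onto SQL DDL.
function toSqlDdl(entity: Entity): string {
  const cols = entity.fields.map((f) => `${f.name} ${sqlType[f.type]}`);
  return `CREATE TABLE ${entity.name} (${cols.join(", ")});`;
}

const order: Entity = {
  name: "orders",
  fields: [
    { name: "id", type: "string" },
    { name: "total", type: "int" },
  ],
};
// toSqlDdl(order) → "CREATE TABLE orders (id TEXT, total INTEGER);"
```

The entity stays stable; the projections multiply. Changing a target means changing a generator, not the meaning.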
The Strategic Shift
Before LLMs, building high-level semantic DSLs was expensive.
Now:
- AI can generate boilerplate.
- AI can translate between representations.
- AI can help author semantic models directly.
This changes the economics.
We can afford to build systems where:
- Meaning is explicit.
- Code is derived.
- Infrastructure is generated.
- Semantics are stable.
The Future of Programming
English will increasingly become the interactive interface.
Programming languages will remain execution targets.
But the durable systems — the ones that scale in complexity without collapsing — will rely on a stable semantic core.
That is the direction of Semantic Systems Engineering.
Not replacing code with English.
Not replacing structure with probability.
But introducing a deterministic substrate between intent and execution.
Cohesive Systems is building that substrate.