Cohesive ARI Architecture

  • Architecture
  • Formalization
  • Formulation
  • Candidate Generation
  • Feature Extraction
  • Scoring
  • Reranking
  • Constraint Solving

Architecture

The ARI relation inference architecture is a cascade of deterministic and probabilistic stages: candidate generation, feature extraction, scoring, reranking, and constraint solving.

(Figure: the ARI inference cascade)

Formalization

ARI solves a structured prediction problem over a bipartite graph; equivalently, it performs MAP inference in a constrained graphical model.

Formulation

Let the sets of source ($S$) and target ($T$) field paths be denoted as:

$$S = \{\, s_i \,\} \qquad T = \{\, t_i \,\}$$

A mapping (relation) $R$ is a subset of their Cartesian product:

$$R \subseteq S \times T$$

ARI seeks an optimal relation $R^*$:

$$R^* = \arg\max_{R \subseteq S \times T} \mathcal{L}(R)$$
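On a toy instance the argmax can be computed by brute force. The field names and the score function below are hypothetical placeholders for the learned objective $\mathcal{L}$ developed in the following sections:

```python
from itertools import chain, combinations

# Toy source and target field paths (hypothetical names).
S = ["user.name", "user.mail"]
T = ["full_name", "email"]

pairs = [(s, t) for s in S for t in T]

def score(R):
    # Made-up objective: reward plausible pairs, penalize everything else.
    good = {("user.name", "full_name"), ("user.mail", "email")}
    return sum(2 if p in good else -1 for p in R)

# Enumerate every relation R ⊆ S × T (all 2^(|S||T|) subsets) and take the argmax.
subsets = chain.from_iterable(combinations(pairs, r) for r in range(len(pairs) + 1))
R_star = max(subsets, key=score)
```

Exhaustive search is exponential in $|S|\,|T|$, which is why the cascade below prunes the space before any global optimization.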

Candidate Generation

For each source field $s \in S$, generate a candidate set:

$$C(s) \subseteq T$$

thus forming the candidate relation:

$$\mathcal{C} = \{\, (s,t) \in S \times T \mid t \in C(s) \,\}$$
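A minimal sketch of a candidate generator, assuming a simple token-overlap blocking rule (the field paths are hypothetical, not a prescribed schema):

```python
def tokens(path):
    # Split a dotted/underscored field path into lowercase tokens.
    return set(path.lower().replace(".", "_").split("_"))

def candidates(s, T, min_overlap=1):
    # Keep targets sharing at least `min_overlap` tokens with the source:
    # a cheap deterministic blocking step that bounds the work of later stages.
    return [t for t in T if len(tokens(s) & tokens(t)) >= min_overlap]

S = ["user.name", "user.email"]
T = ["full_name", "email_address", "created_at"]

# The candidate relation C = {(s, t) | t ∈ C(s)}.
C = {(s, t) for s in S for t in candidates(s, T)}
```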

Feature Extraction

Each candidate pair is mapped to a feature vector:

$$\phi\colon S \times T \to \mathbb{R}^d$$

and denote the feature vector associated with the candidate pair $(s,t)$ by:

$$\mathbf{x}_{s,t} = \phi(s,t)$$

Features may include:

  • Lexical similarity
  • Structural relations
  • Embedding similarity (bi-encoder): $\phi_\text{emb}(s,t) = \langle f(s),\, g(t) \rangle$
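A sketch of a two-dimensional feature map combining a lexical feature and the bi-encoder feature; the trigram-Jaccard choice and the embedding dictionaries `f` and `g` are illustrative assumptions, not the production feature set:

```python
def lexical_sim(s, t):
    # Jaccard similarity over character trigrams: one simple lexical feature.
    grams = lambda x: {x[i:i + 3] for i in range(len(x) - 2)}
    a, b = grams(s), grams(t)
    return len(a & b) / len(a | b) if a | b else 0.0

def emb_sim(f_s, g_t):
    # Bi-encoder feature: inner product of precomputed embeddings <f(s), g(t)>.
    return sum(a * b for a, b in zip(f_s, g_t))

def phi(s, t, f, g):
    # Map a candidate pair to a feature vector x_{s,t} in R^d (d = 2 here).
    return [lexical_sim(s, t), emb_sim(f[s], g[t])]

f = {"email": [1.0, 0.0]}          # hypothetical source embeddings
g = {"email_address": [0.8, 0.1]}  # hypothetical target embeddings
x = phi("email", "email_address", f, g)
```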

Scoring

Assign a unary score to each candidate:

$$u(s,t) = f_\theta(\mathbf{x}_{s,t})$$

Common forms:

  • Linear: $w^\top \mathbf{x}_{s,t}$
  • GBDT

We retain the top-$k$ candidates for each $s \in S$:

$$C_k(s) = \operatorname*{Top\text{-}k}_{t \in C(s)}\ u(s,t)$$
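The linear form and top-$k$ retention can be sketched as follows; the weights `w` and feature values are made up for illustration:

```python
def u(x, w):
    # Linear unary score u(s,t) = w^T x_{s,t}.
    return sum(wi * xi for wi, xi in zip(w, x))

def top_k(s, C_s, features, w, k=2):
    # Retain the k best-scoring targets for source field s.
    return sorted(C_s, key=lambda t: u(features[(s, t)], w), reverse=True)[:k]

w = [1.0, 0.5]  # hypothetical learned weights
features = {("a", "t1"): [0.9, 0.2],
            ("a", "t2"): [0.1, 0.8],
            ("a", "t3"): [0.4, 0.4]}
C_k = top_k("a", ["t1", "t2", "t3"], features, w, k=2)
```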

Reranking

Define pairwise scoring over candidate assignments:

$$p\big((s,t),\,(s',t')\big)$$

This captures structural consistency:

  • Schema constraints
  • Co-occurrence
  • Graph compatibility

Examples:

  • Cross-encoder: $p_\text{ce} = h_\theta(s, t, s', t')$
  • CRF/GNN: $p = \psi_\theta(\mathcal{G}, (s,t), (s',t'))$
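A hand-written pairwise score illustrating the idea; the `parent` map and the specific rewards and penalties are assumptions for the sketch, not the learned $\psi_\theta$:

```python
def p(pair_a, pair_b, parent):
    # Hypothetical structural-consistency score: reward pairs whose sources
    # and targets share parent structure, penalize target collisions.
    (s, t), (s2, t2) = pair_a, pair_b
    if t == t2:
        return -1.0  # two sources mapped onto one target
    same_src = parent.get(s) == parent.get(s2)
    same_tgt = parent.get(t) == parent.get(t2)
    return 1.0 if same_src and same_tgt else 0.0

parent = {"user.name": "user", "user.mail": "user",
          "full_name": "person", "email": "person"}
```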

Constraint Solving

Define the global objective in terms of the unary scores $u$ and pairwise scores $p$:

$$\max_{x}\ \sum_{(s,t)} u(s,t)\, x_{s,t} \;+\; \sum_{(s,t),(s',t')} p\big((s,t),(s',t')\big)\, x_{s,t}\, x_{s',t'}$$

where $x_{s,t} \in \{0,1\}$ for each retained pair $(s,t) \in C_k = \{\, (s,t) \mid t \in C_k(s) \,\}$.

Define the following linear constraints:

  • One-to-one
    • $\sum_{t \in T} x_{s,t} \le 1 \quad \forall s \in S$
    • $\sum_{s \in S} x_{s,t} \le 1 \quad \forall t \in T$
  • Type/ontology
    • $x_{s,t} = 0 \quad \text{if } \lnot\,\text{compatible}(s,t)$
  • Structural constraints
    • $x_{s,t} + x_{s',t'} \le 1$ (mutual exclusion)
    • $x_{s,t} \le x_{\text{parent}(s),\,\text{parent}(t)}$ (hierarchical consistency)

The final mapping is:

$$R^* = \{\, (s,t) \in C_k \mid x^*_{s,t} = 1 \,\}$$
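For small candidate sets, the constrained objective can be solved exhaustively. The sketch below enumerates subsets of the retained pairs, filters by the one-to-one constraints, and maximizes the unary-plus-pairwise objective; a production system would hand the same objective and constraints to an ILP solver instead:

```python
from itertools import chain, combinations

def solve(C_k, u, p):
    # Exhaustive solver for tiny instances: enumerate subsets of retained
    # candidate pairs, keep those satisfying the one-to-one constraints,
    # and maximize the unary-plus-pairwise objective.
    def feasible(R):
        srcs = [s for s, _ in R]
        tgts = [t for _, t in R]
        return len(set(srcs)) == len(srcs) and len(set(tgts)) == len(tgts)

    def objective(R):
        total = sum(u[pair] for pair in R)
        total += sum(p.get((a, b), 0.0) for a in R for b in R if a < b)
        return total

    subsets = chain.from_iterable(combinations(C_k, r) for r in range(len(C_k) + 1))
    return max((R for R in subsets if feasible(R)), key=objective)

# Hypothetical retained pairs and unary scores; no pairwise terms here.
C_k = [("s1", "t1"), ("s1", "t2"), ("s2", "t2")]
u_scores = {("s1", "t1"): 2.0, ("s1", "t2"): 1.5, ("s2", "t2"): 1.0}
R_star = solve(C_k, u_scores, p={})
```

With these scores, the pairs `("s1", "t2")` and `("s2", "t2")` collide on the target `t2`, so the solver selects the feasible assignment with the highest total score.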
© Cohesive Systems 2026