- Architecture
- Formalization
- Formulation
- Candidate Generation
- Feature Extraction
- Scoring
- Reranking
- Constraint Solving
Architecture
The ARI relation inference architecture is a cascade of both deterministic and probabilistic models.

Formalization
ARI solves a structured prediction problem over a bipartite graph. This is MAP inference in a constrained graphical model.
Formulation
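Let the sets of source and target field paths be denoted as:
$$S = \{s_1, \dots, s_m\}, \qquad T = \{t_1, \dots, t_n\}$$
A mapping (relation) is a subset of their Cartesian product:
$$R \subseteq S \times T$$
ARI seeks an optimal relation $R^*$:
$$R^* = \arg\max_{R \subseteq S \times T} F(R)$$
where $F$ is the global objective defined under Constraint Solving, and $R$ ranges over the relations that satisfy the constraints listed there.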
Candidate Generation
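For each source field $s \in S$, generate a candidate set of target fields:
$$C(s) \subseteq T$$
thus forming the candidate relation:
$$\mathcal{C} = \{(s, t) : s \in S,\ t \in C(s)\} \subseteq S \times T$$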
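The candidate step can be sketched as follows. This is a minimal illustration, assuming a simple token-overlap blocking rule; the text does not say how ARI actually generates candidates, and the names `tokens`, `generate_candidates`, and `candidate_relation` are hypothetical.

```python
import re


def tokens(path: str) -> set[str]:
    """Split a field path such as 'customer.firstName' into lowercase tokens."""
    out: set[str] = set()
    for part in re.split(r"[^A-Za-z0-9]+", path):
        # Break camelCase and acronyms: 'firstName' -> 'first', 'Name'.
        words = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part)
        out.update(w.lower() for w in words)
    return out


def generate_candidates(sources: list[str], targets: list[str]) -> dict[str, list[str]]:
    """Assumed blocking rule: keep targets that share at least one token with the source."""
    candidates = {}
    for s in sources:
        s_tok = tokens(s)
        candidates[s] = [t for t in targets if s_tok & tokens(t)]
    return candidates


def candidate_relation(candidates: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Flatten the per-source candidate sets into one set of (source, target) pairs."""
    return {(s, t) for s, ts in candidates.items() for t in ts}
```

A cheap blocking rule like this keeps the candidate relation much smaller than the full Cartesian product, which is what makes the later scoring stages affordable.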
Feature Extraction
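Each candidate pair $(s, t)$ in the candidate relation $\mathcal{C}$ is mapped to a feature vector:
$$\phi : \mathcal{C} \to \mathbb{R}^d$$
and we denote the feature vector associated with candidate pair $(s, t)$ by:
$$\phi_{st} = \phi(s, t)$$
Features may include:
- Lexical similarity
- Structural relations
- Embedding similarity (bi-encoder): $\cos(e(s), e(t))$, where $e$ is the encoder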
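The three feature families above could be combined into one vector as in the sketch below. It is only an illustration: the character-bigram `embed` function is a toy stand-in for a real bi-encoder, the depth comparison is one assumed structural feature among many, and all function names are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def embed(path: str) -> Counter:
    """Toy stand-in for a bi-encoder: character-bigram counts of the leaf name."""
    leaf = path.split(".")[-1].lower()
    return Counter(leaf[i:i + 2] for i in range(len(leaf) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def feature_vector(s: str, t: str) -> list[float]:
    """Return [lexical, structural, embedding] features for the pair (s, t)."""
    lexical = SequenceMatcher(None, s.lower(), t.lower()).ratio()
    # Assumed structural feature: do both paths sit at the same nesting depth?
    structural = 1.0 if s.count(".") == t.count(".") else 0.0
    embedding = cosine(embed(s), embed(t))
    return [lexical, structural, embedding]
```

In a real bi-encoder setup the source and target embeddings are computed independently, so they can be cached and compared with a fast nearest-neighbour lookup.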
Scoring
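Assign a unary score to each candidate pair $(s, t)$, computed from its feature vector $\phi_{st}$:
$$u_{st} = f(\phi_{st})$$
Common forms:
- Linear: $u_{st} = w^\top \phi_{st}$
- GBDT
We retain the top-$k$ candidates for each source field $s$:
$$C_k(s) = \operatorname{top-}k_{\,t \in C(s)}\ u_{st}$$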
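A linear scorer with top-k pruning might look like the sketch below. The weights, the dictionary layout, and the names `linear_score` and `top_k` are assumptions made for illustration; a GBDT scorer would replace `linear_score` with a trained model's prediction.

```python
import heapq


def linear_score(weights: list[float], phi: list[float]) -> float:
    """Unary score as a dot product of the weight vector and the feature vector."""
    return sum(w * x for w, x in zip(weights, phi))


def top_k(
    candidates: dict[str, list[str]],
    features: dict[tuple[str, str], list[float]],
    weights: list[float],
    k: int,
) -> dict[str, list[str]]:
    """Keep the k highest-scoring targets for each source, best first."""
    pruned = {}
    for s, ts in candidates.items():
        scored = [(linear_score(weights, features[(s, t)]), t) for t in ts]
        pruned[s] = [t for _, t in heapq.nlargest(k, scored)]
    return pruned
```

Pruning to a small k bounds the number of pairwise terms the later stages must evaluate.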
Reranking
Define a pairwise score over pairs of candidate assignments $(s, t)$ and $(s', t')$:
$$p_{st,s't'} = g\big((s, t), (s', t')\big)$$
This captures structural consistency:
- Schema constraints
- Co-occurrence
- Graph compatibility
Examples:
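- Cross-encoder: both candidate pairs are encoded jointly and the model outputs a compatibility score
- CRF/GNN: pairwise potentials are learned over the edges of the candidate graph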
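To make the pairwise idea concrete, the sketch below scores two assignments as compatible when sibling source fields map to sibling target fields. This hand-written rule is an assumption standing in for a learned cross-encoder or CRF/GNN potential; `parent`, `pairwise`, `assignment_score`, and the weight `lam` are hypothetical.

```python
from itertools import combinations

Pair = tuple[str, str]


def parent(path: str) -> str:
    """Return the parent path, or '' for a top-level field."""
    return path.rsplit(".", 1)[0] if "." in path else ""


def pairwise(a: Pair, b: Pair) -> float:
    """Reward assignments that preserve sibling structure, penalise those that break it."""
    (s1, t1), (s2, t2) = a, b
    same_source_parent = parent(s1) == parent(s2)
    same_target_parent = parent(t1) == parent(t2)
    return 1.0 if same_source_parent == same_target_parent else -1.0


def assignment_score(assignment: list[Pair], unary: dict[Pair, float], lam: float = 0.5) -> float:
    """Unary scores plus lam times the pairwise score of every unordered pair."""
    total = sum(unary[p] for p in assignment)
    total += lam * sum(pairwise(a, b) for a, b in combinations(assignment, 2))
    return total
```

Because the pairwise term couples the assignments, candidates can no longer be ranked independently per source field, which motivates the global solve in the next step.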
Constraint Solving
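Define the global objective in terms of unary and pairwise scores:
$$F(x) = \sum_{(s,t) \in \mathcal{C}} u_{st}\, x_{st} + \sum_{(s,t),\,(s',t') \in \mathcal{C}} p_{st,s't'}\, x_{st}\, x_{s't'}$$
where $x_{st} \in \{0, 1\}$ for $(s, t) \in \mathcal{C}$.
Define the following linear constraints:
- One-to-one: $\sum_{t} x_{st} \le 1$ for every $s$, and $\sum_{s} x_{st} \le 1$ for every $t$
- Type/ontology: $x_{st} = 0$ whenever the types of $s$ and $t$ are incompatible
- Structural constraints:
  - $x_{st} + x_{s't'} \le 1$ (mutual exclusion)
  - $x_{st} \le x_{\mathrm{pa}(s)\,\mathrm{pa}(t)}$, where $\mathrm{pa}(\cdot)$ denotes the parent path (hierarchical consistency)
The final mapping is:
$$R^* = \{(s, t) \in \mathcal{C} : x^*_{st} = 1\}, \qquad x^* = \arg\max_{x} F(x)$$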
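For very small instances the constrained objective can be solved by exhaustive search, as sketched below. This brute-force enumeration is only an illustration of the optimisation problem, not the solver ARI uses; a practical system would hand the same objective and constraints to an ILP or MAP solver. The `solve` function, its `forbidden` set (standing in for type/ontology constraints), and the callable `pairwise` are hypothetical.

```python
from itertools import combinations, product
from typing import Callable

Pair = tuple[str, str]


def solve(
    candidates: dict[str, list[str]],
    unary: dict[Pair, float],
    pairwise: Callable[[Pair, Pair], float],
    forbidden: frozenset = frozenset(),
) -> tuple[set[Pair], float]:
    """Enumerate all assignments and return the best feasible one with its score."""
    sources = list(candidates)
    best_pairs: set[Pair] = set()
    best_score = float("-inf")
    # Each source picks one of its candidates, or None to stay unmapped.
    options = [[None] + list(candidates[s]) for s in sources]
    for choice in product(*options):
        pairs = [(s, t) for s, t in zip(sources, choice) if t is not None]
        used_targets = [t for _, t in pairs]
        if len(set(used_targets)) != len(used_targets):
            continue  # violates one-to-one: a target is used twice
        if any(p in forbidden for p in pairs):
            continue  # violates a type/ontology constraint
        score = sum(unary[p] for p in pairs)
        # Each unordered pair of selected assignments contributes once.
        score += sum(pairwise(a, b) for a, b in combinations(pairs, 2))
        if score > best_score:
            best_pairs, best_score = set(pairs), score
    return best_pairs, best_score
```

The search is exponential in the number of source fields, so it is useful mainly as a reference implementation for checking a real solver on toy inputs.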