Bridge

Vec2Vec translation for cross-embedding space queries.

Bridge is the cross-embedding translation layer. Query across data indexed by different embedding models without re-indexing, using cycle-consistency trained translation matrices.

The Problem

Enterprise RAG systems accumulate data indexed by different embedding models:

  • OpenAI ada-002: Older Slack messages
  • text-embedding-3-small: Recent documents
  • Cohere embed-v3: Partner data
  • Custom models: Proprietary embeddings

Querying across these requires translating between embedding spaces.

The Solution

Vec2Vec learns bidirectional translation matrices with cycle-consistency training:

graph LR
    subgraph "Source Space"
        X[Embedding X]
    end
    
    subgraph "Translation"
        F[Forward F]
        G[Inverse G]
    end
    
    subgraph "Target Space"
        Y[Embedding Y]
    end
    
    X --> F --> Y
    Y --> G --> X
    
    X -.->|"Should match"| X

\mathcal{L}_{cycle} = \|x - G(F(x))\|^2 + \|y - F(G(y))\|^2
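The cycle loss can be made concrete for the linear case. Below is a minimal numpy sketch (not the service's internal implementation): with a square forward matrix F and G chosen as its exact inverse, both round trips return the input and the loss is essentially zero.

```python
import numpy as np

def cycle_loss(X, Y, F, G):
    """Mean squared cycle-consistency loss for linear maps F (src->tgt), G (tgt->src)."""
    # x -> F(x) -> G(F(x)) should land back on x; same for y via F(G(y)).
    src_term = np.mean(np.sum((X - (X @ F) @ G) ** 2, axis=1))
    tgt_term = np.mean(np.sum((Y - (Y @ G) @ F) ** 2, axis=1))
    return src_term + tgt_term

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))   # toy source-space embeddings (one per row)
F = rng.normal(size=(4, 4))   # forward map, source -> target
G = np.linalg.inv(F)          # exact inverse, so the cycle closes perfectly
Y = X @ F                     # corresponding target-space embeddings

print(f"cycle loss with exact inverse: {cycle_loss(X, Y, F, G):.2e}")
```

In practice the two spaces have different dimensions and no exact inverse exists, so training drives this loss toward a small positive value rather than zero.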

API Endpoints

Translate Embedding

POST /v1/vec2vec/translate
Content-Type: application/json

{
  "vector": [0.1, 0.2, 0.3, ...],
  "source_model": "text-embedding-ada-002",
  "target_model": "text-embedding-3-small"
}

Response:

{
  "vector": [0.15, 0.18, 0.28, ...],
  "source_model": "text-embedding-ada-002",
  "target_model": "text-embedding-3-small",
  "confidence": 0.94,
  "translation_applied": true,
  "translation_id": "xlate-abc123"
}
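The `confidence` and `translation_applied` fields can drive a client-side guard: reject weak translations and fall back to re-embedding with the target model. A hedged sketch; `post_translate` is a hypothetical stand-in for the HTTP call to `POST /v1/vec2vec/translate`, stubbed here with a canned response shaped like the one above.

```python
def translate_with_guard(vector, source_model, target_model, post_translate,
                         min_confidence=0.85):
    """Call the translate endpoint and reject low-confidence translations."""
    resp = post_translate({
        "vector": vector,
        "source_model": source_model,
        "target_model": target_model,
    })
    if not resp.get("translation_applied") or resp["confidence"] < min_confidence:
        return None  # caller should re-embed with the target model instead
    return resp["vector"]

# Stub standing in for e.g. requests.post(url, json=body).json().
def fake_post(body):
    return {
        "vector": [0.15, 0.18, 0.28],
        "source_model": body["source_model"],
        "target_model": body["target_model"],
        "confidence": 0.94,
        "translation_applied": True,
    }

out = translate_with_guard([0.1, 0.2, 0.3], "text-embedding-ada-002",
                           "text-embedding-3-small", fake_post)
print(out)  # [0.15, 0.18, 0.28]
```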

Train Translation

POST /v1/vec2vec/train
Content-Type: application/json

{
  "source_model": "text-embedding-ada-002",
  "target_model": "text-embedding-3-small",
  "parallel_texts": [
    "The authentication service uses singleton pattern...",
    "We decided to migrate to microservices...",
    "Quarterly report shows 15% growth..."
  ],
  "config": {
    "epochs": 100,
    "learning_rate": 0.001,
    "use_neural": false
  }
}

Response:

{
  "job_id": "train-xyz789",
  "status": "training",
  "estimated_time_seconds": 120
}

Get Training Status

GET /v1/vec2vec/training/{job_id}

Response:

{
  "job_id": "train-xyz789",
  "status": "completed",
  "progress": 1.0,
  "accuracy": 0.94,
  "cycle_loss": 0.032,
  "completed_at": "2026-01-20T02:00:00Z"
}
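A submitted job can be polled until it leaves the `training` state. Minimal sketch: `fetch_status` stands in for a GET on `/v1/vec2vec/training/{job_id}` (simulated here with a canned sequence), and the poll interval and timeout are assumptions, not documented defaults.

```python
import time

def wait_for_training(job_id, fetch_status, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll the training-status endpoint until the job completes or fails."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = fetch_status(job_id)
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"training job {job_id} did not finish in {timeout_seconds}s")

# Canned response sequence standing in for real HTTP calls.
responses = iter([
    {"status": "training", "progress": 0.4},
    {"status": "training", "progress": 0.9},
    {"status": "completed", "progress": 1.0, "accuracy": 0.94, "cycle_loss": 0.032},
])
final = wait_for_training("train-xyz789", lambda job_id: next(responses),
                          poll_seconds=0.0)
print(final["status"], final["accuracy"])
```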

Get Registry Status

GET /v1/vec2vec/registry

Lists all available translation pairs:

{
  "translations": [
    {
      "source_model": "text-embedding-ada-002",
      "target_model": "text-embedding-3-small",
      "confidence": 0.94,
      "trained_at": "2026-01-15T10:00:00Z",
      "sample_count": 10000
    },
    {
      "source_model": "text-embedding-3-small",
      "target_model": "text-embedding-ada-002",
      "confidence": 0.93,
      "trained_at": "2026-01-15T10:00:00Z",
      "sample_count": 10000
    }
  ]
}

Get Translation Confidence

GET /v1/vec2vec/confidence?source=ada-002&target=3-small

Remove Translation

DELETE /v1/vec2vec/{translation_id}

SDK Usage

TypeScript

import { MetalogueClient } from '@metalogue/sdk';

const client = new MetalogueClient({ apiKey: API_KEY });

// Translation is automatic during queries
// But you can also translate manually:

const translated = await client.translateEmbedding({
  vector: sourceEmbedding,
  sourceModel: 'text-embedding-ada-002',
  targetModel: 'text-embedding-3-small',
});

console.log(`Confidence: ${translated.confidence}`);

Python

from metalogue import MetalogueClient

client = MetalogueClient(api_key=API_KEY)

# Manual translation
translated = await client.translate_embedding(
    vector=source_embedding,
    source_model="text-embedding-ada-002",
    target_model="text-embedding-3-small",
)

print(f"Confidence: {translated.confidence}")

Training Pipeline

Cycle-Consistency Training

Vec2Vec uses cycle-consistency to learn high-quality translations:

  1. Sample parallel texts - Documents that exist in both embedding spaces
  2. Embed with both models - Generate source and target embeddings
  3. Train forward matrix F - Maps source → target
  4. Train inverse matrix G - Maps target → source
  5. Minimize cycle loss - Ensure F(G(y)) ≈ y and G(F(x)) ≈ x
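The five steps above can be sketched end to end for the linear mode. This is a toy numpy illustration, not the production trainer: the two "embedding models" are simulated by a hidden linear relation plus noise, F and G are fit by ordinary least squares, and the cycle loss is checked at the end.

```python
import numpy as np

rng = np.random.default_rng(42)

# Steps 1-2: parallel texts embedded in both spaces (simulated: a hidden
# linear relation plus noise stands in for two real embedding models).
true_map = rng.normal(size=(6, 6))
X = rng.normal(size=(500, 6))                         # source embeddings
Y = X @ true_map + 0.01 * rng.normal(size=(500, 6))   # target embeddings

# Steps 3-4: fit forward F (source -> target) and inverse G (target -> source)
# by least squares on the parallel pairs.
F, *_ = np.linalg.lstsq(X, Y, rcond=None)
G, *_ = np.linalg.lstsq(Y, X, rcond=None)

# Step 5: cycle loss -- both round trips should approximately return the input.
cycle = (np.mean(np.sum((X - (X @ F) @ G) ** 2, axis=1))
         + np.mean(np.sum((Y - (Y @ G) @ F) ** 2, axis=1)))
print(f"cycle loss: {cycle:.4f}")
```

Here least squares stands in for the gradient-based training the config's `epochs` and `learning_rate` imply; for linear maps the closed-form fit is a reasonable proxy.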

Linear vs Neural

| Mode   | Pros                                  | Cons                   |
|--------|---------------------------------------|------------------------|
| Linear | Fast, interpretable, small model      | Limited expressiveness |
| Neural | Higher accuracy for complex mappings  | Slower, larger model   |

For most model pairs, linear translation achieves 90%+ accuracy.

Training Configuration

{
  "config": {
    "epochs": 100,
    "learning_rate": 0.001,
    "use_neural": false,
    "min_samples": 1000,
    "validation_split": 0.1
  }
}

Confidence Scores

Translation confidence indicates quality:

| Confidence | Interpretation                       |
|------------|--------------------------------------|
| 0.95+      | Excellent - near-lossless translation |
| 0.85-0.95  | Good - minor semantic drift          |
| 0.70-0.85  | Fair - use with caution              |
| <0.70      | Poor - consider retraining           |

Confidence is computed as:

\text{confidence} = 1 - \frac{\mathcal{L}_{cycle}}{\mathcal{L}_{baseline}}
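A small sketch of this formula, clamped to [0, 1]. Note the baseline loss is not defined in this document; the value used below is purely illustrative, chosen so that a cycle loss of 0.032 (as in the training-status response earlier) yields a confidence near 0.94.

```python
def confidence(cycle_loss, baseline_loss):
    """confidence = 1 - L_cycle / L_baseline, clamped to [0, 1]."""
    if baseline_loss <= 0:
        raise ValueError("baseline loss must be positive")
    return max(0.0, min(1.0, 1.0 - cycle_loss / baseline_loss))

# 0.533 is an illustrative baseline, not a documented value.
print(round(confidence(0.032, 0.533), 2))  # 0.94
```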

Automatic Translation

During federated queries, Vec2Vec automatically:

  1. Detects the source model for each document
  2. Translates to the query embedding space
  3. Computes similarity in unified space
  4. Reports federation_confidence per result
{
  "items": [
    {
      "content": "...",
      "federation_confidence": 0.94
    }
  ]
}
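The per-result `federation_confidence` can be used to drop weakly translated matches before ranking. A minimal sketch over the response shape shown above; the 0.85 cutoff mirrors the "Good" band in the confidence table, and treating results without the field (no translation applied) as fully trusted is an assumption.

```python
def filter_federated(items, min_confidence=0.85):
    """Keep results whose cross-space translation is trustworthy enough to rank."""
    return [item for item in items
            if item.get("federation_confidence", 1.0) >= min_confidence]

results = [
    {"content": "auth service notes", "federation_confidence": 0.94},
    {"content": "old partner memo", "federation_confidence": 0.71},
    {"content": "native-space hit"},  # no translation applied -> keep
]
kept = filter_federated(results)
print([r["content"] for r in kept])  # ['auth service notes', 'native-space hit']
```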

Pre-Trained Pairs

Metalogue includes pre-trained translations for common model pairs:

| Source  | Target    | Confidence |
|---------|-----------|------------|
| ada-002 | 3-small   | 0.94       |
| 3-small | ada-002   | 0.93       |
| ada-002 | 3-large   | 0.91       |
| 3-small | cohere-v3 | 0.88       |

Best Practices

  1. Use high-quality parallel data - More diverse samples = better translation
  2. Monitor confidence - Retrain if confidence drops below 0.85
  3. Train bidirectionally - Always train both F and G
  4. Start with linear - Only use neural for complex mappings
  5. Validate with held-out data - Use validation split

Patent Reference

Vec2Vec is covered by Patent #2, Claim 8:

"A method for translating embedding vectors between heterogeneous embedding spaces using cycle-consistency trained bidirectional translation matrices."

Next Steps