Bridge

Vec2Vec translation for cross-embedding space queries.

Bridge is the cross-embedding translation layer. Query across data indexed by different embedding models without re-indexing, using cycle-consistency trained translation matrices.

The Problem

Enterprise RAG systems accumulate data indexed by different embedding models:

  • OpenAI ada-002: Older Slack messages
  • text-embedding-3-small: Recent documents
  • Cohere embed-v3: Partner data
  • Custom models: Proprietary embeddings

Querying across these requires translating between embedding spaces.

The Solution

Vec2Vec learns bidirectional translation matrices with cycle-consistency training:

graph LR
    subgraph "Source Space"
        X[Embedding X]
    end
    
    subgraph "Translation"
        F[Forward F]
        G[Inverse G]
    end
    
    subgraph "Target Space"
        Y[Embedding Y]
    end
    
    X --> F --> Y
    Y --> G --> X
    
    X -.->|"Should match"| X

\mathcal{L}_{cycle} = \|x - G(F(x))\|^2 + \|y - F(G(y))\|^2
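The cycle loss can be made concrete for the linear case. Below is a minimal numpy sketch (not the service's internal implementation): with a square forward matrix F and G chosen as its exact inverse, both round trips return the input and the loss is essentially zero.

```python
import numpy as np

def cycle_loss(X, Y, F, G):
    """Mean squared cycle-consistency loss for linear maps F (src->tgt), G (tgt->src)."""
    # x -> F(x) -> G(F(x)) should land back on x; same for y via F(G(y)).
    src_term = np.mean(np.sum((X - (X @ F) @ G) ** 2, axis=1))
    tgt_term = np.mean(np.sum((Y - (Y @ G) @ F) ** 2, axis=1))
    return src_term + tgt_term

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))   # toy source-space embeddings (one per row)
F = rng.normal(size=(4, 4))   # forward map, source -> target
G = np.linalg.inv(F)          # exact inverse, so the cycle closes perfectly
Y = X @ F                     # corresponding target-space embeddings

print(f"cycle loss with exact inverse: {cycle_loss(X, Y, F, G):.2e}")
```

In practice the two spaces have different dimensions and no exact inverse exists, so training drives this loss toward a small positive value rather than zero.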

API Endpoints

Translate Embedding

POST /v1/vec2vec/translate
Content-Type: application/json

{
  "vector": [0.1, 0.2, 0.3, ...],
  "source_model": "text-embedding-ada-002",
  "target_model": "text-embedding-3-small"
}

Response:

{
  "vector": [0.15, 0.18, 0.28, ...],
  "source_model": "text-embedding-ada-002",
  "target_model": "text-embedding-3-small",
  "confidence": 0.94,
  "translation_applied": true,
  "translation_id": "xlate-abc123"
}
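The `confidence` and `translation_applied` fields can drive a client-side guard: reject weak translations and fall back to re-embedding with the target model. A hedged sketch; `post_translate` is a hypothetical stand-in for the HTTP call to `POST /v1/vec2vec/translate`, stubbed here with a canned response shaped like the one above.

```python
def translate_with_guard(vector, source_model, target_model, post_translate,
                         min_confidence=0.85):
    """Call the translate endpoint and reject low-confidence translations."""
    resp = post_translate({
        "vector": vector,
        "source_model": source_model,
        "target_model": target_model,
    })
    if not resp.get("translation_applied") or resp["confidence"] < min_confidence:
        return None  # caller should re-embed with the target model instead
    return resp["vector"]

# Stub standing in for e.g. requests.post(url, json=body).json().
def fake_post(body):
    return {
        "vector": [0.15, 0.18, 0.28],
        "source_model": body["source_model"],
        "target_model": body["target_model"],
        "confidence": 0.94,
        "translation_applied": True,
    }

out = translate_with_guard([0.1, 0.2, 0.3], "text-embedding-ada-002",
                           "text-embedding-3-small", fake_post)
print(out)  # [0.15, 0.18, 0.28]
```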

Train Translation

POST /v1/vec2vec/train
Content-Type: application/json

{
  "source_model": "text-embedding-ada-002",
  "target_model": "text-embedding-3-small",
  "parallel_texts": [
    "The authentication service uses singleton pattern...",
    "We decided to migrate to microservices...",
    "Quarterly report shows 15% growth..."
  ],
  "config": {
    "epochs": 100,
    "learning_rate": 0.001,
    "use_neural": false
  }
}

Response:

{
  "job_id": "train-xyz789",
  "status": "training",
  "estimated_time_seconds": 120
}

Get Training Status

GET /v1/vec2vec/training/{job_id}

Response:

{
  "job_id": "train-xyz789",
  "status": "completed",
  "progress": 1.0,
  "accuracy": 0.94,
  "cycle_loss": 0.032,
  "completed_at": "2026-01-20T02:00:00Z"
}
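A submitted job can be polled until it leaves the `training` state. Minimal sketch: `fetch_status` stands in for a GET on `/v1/vec2vec/training/{job_id}` (simulated here with a canned sequence), and the poll interval and timeout are assumptions, not documented defaults.

```python
import time

def wait_for_training(job_id, fetch_status, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll the training-status endpoint until the job completes or fails."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = fetch_status(job_id)
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"training job {job_id} did not finish in {timeout_seconds}s")

# Canned response sequence standing in for real HTTP calls.
responses = iter([
    {"status": "training", "progress": 0.4},
    {"status": "training", "progress": 0.9},
    {"status": "completed", "progress": 1.0, "accuracy": 0.94, "cycle_loss": 0.032},
])
final = wait_for_training("train-xyz789", lambda job_id: next(responses),
                          poll_seconds=0.0)
print(final["status"], final["accuracy"])
```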

Get Registry Status

GET /v1/vec2vec/registry

Lists all available translation pairs:

{
  "translations": [
    {
      "source_model": "text-embedding-ada-002",
      "target_model": "text-embedding-3-small",
      "confidence": 0.94,
      "trained_at": "2026-01-15T10:00:00Z",
      "sample_count": 10000
    },
    {
      "source_model": "text-embedding-3-small",
      "target_model": "text-embedding-ada-002",
      "confidence": 0.93,
      "trained_at": "2026-01-15T10:00:00Z",
      "sample_count": 10000
    }
  ]
}

Get Translation Confidence

GET /v1/vec2vec/confidence?source=ada-002&target=3-small

Remove Translation

DELETE /v1/vec2vec/{translation_id}

SDK Usage

TypeScript

import { MetalogueClient } from '@metalogue/sdk';

const client = new MetalogueClient({ apiKey: API_KEY });

// Translation is automatic during queries
// But you can also translate manually:

const translated = await client.translateEmbedding({
  vector: sourceEmbedding,
  sourceModel: 'text-embedding-ada-002',
  targetModel: 'text-embedding-3-small',
});

console.log(`Confidence: ${translated.confidence}`);

Python

from metalogue import MetalogueClient

client = MetalogueClient(api_key=API_KEY)

# Manual translation
translated = await client.translate_embedding(
    vector=source_embedding,
    source_model="text-embedding-ada-002",
    target_model="text-embedding-3-small",
)

print(f"Confidence: {translated.confidence}")

Training Pipeline

Cycle-Consistency Training

Vec2Vec uses cycle-consistency to learn high-quality translations:

  1. Sample parallel texts - Documents that exist in both embedding spaces
  2. Embed with both models - Generate source and target embeddings
  3. Train forward matrix F - Maps source → target
  4. Train inverse matrix G - Maps target → source
  5. Minimize cycle loss - Ensure F(G(y)) ≈ y and G(F(x)) ≈ x
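The five steps above can be sketched end to end for the linear mode. This is a toy numpy illustration, not the production trainer: the two "embedding models" are simulated by a hidden linear relation plus noise, F and G are fit by ordinary least squares, and the cycle loss is checked at the end.

```python
import numpy as np

rng = np.random.default_rng(42)

# Steps 1-2: parallel texts embedded in both spaces (simulated: a hidden
# linear relation plus noise stands in for two real embedding models).
true_map = rng.normal(size=(6, 6))
X = rng.normal(size=(500, 6))                         # source embeddings
Y = X @ true_map + 0.01 * rng.normal(size=(500, 6))   # target embeddings

# Steps 3-4: fit forward F (source -> target) and inverse G (target -> source)
# by least squares on the parallel pairs.
F, *_ = np.linalg.lstsq(X, Y, rcond=None)
G, *_ = np.linalg.lstsq(Y, X, rcond=None)

# Step 5: cycle loss -- both round trips should approximately return the input.
cycle = (np.mean(np.sum((X - (X @ F) @ G) ** 2, axis=1))
         + np.mean(np.sum((Y - (Y @ G) @ F) ** 2, axis=1)))
print(f"cycle loss: {cycle:.4f}")
```

Here least squares stands in for the gradient-based training the config's `epochs` and `learning_rate` imply; for linear maps the closed-form fit is a reasonable proxy.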

Linear vs Neural

| Mode   | Pros                                  | Cons                   |
|--------|---------------------------------------|------------------------|
| Linear | Fast, interpretable, small model      | Limited expressiveness |
| Neural | Higher accuracy for complex mappings  | Slower, larger model   |

For most model pairs, linear translation achieves 90%+ accuracy.

Training Configuration

{
  "config": {
    "epochs": 100,
    "learning_rate": 0.001,
    "use_neural": false,
    "min_samples": 1000,
    "validation_split": 0.1
  }
}

Confidence Scores

Translation confidence indicates quality:

| Confidence | Interpretation                       |
|------------|--------------------------------------|
| 0.95+      | Excellent - near-lossless translation |
| 0.85-0.95  | Good - minor semantic drift          |
| 0.70-0.85  | Fair - use with caution              |
| <0.70      | Poor - consider retraining           |

Confidence is computed as:

\text{confidence} = 1 - \frac{\mathcal{L}_{cycle}}{\mathcal{L}_{baseline}}
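A small sketch of this formula, clamped to [0, 1]. Note the baseline loss is not defined in this document; the value used below is purely illustrative, chosen so that a cycle loss of 0.032 (as in the training-status response earlier) yields a confidence near 0.94.

```python
def confidence(cycle_loss, baseline_loss):
    """confidence = 1 - L_cycle / L_baseline, clamped to [0, 1]."""
    if baseline_loss <= 0:
        raise ValueError("baseline loss must be positive")
    return max(0.0, min(1.0, 1.0 - cycle_loss / baseline_loss))

# 0.533 is an illustrative baseline, not a documented value.
print(round(confidence(0.032, 0.533), 2))  # 0.94
```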

Automatic Translation

During federated queries, Vec2Vec automatically:

  1. Detects the source model for each document
  2. Translates to the query embedding space
  3. Computes similarity in unified space
  4. Reports federation_confidence per result
{
  "items": [
    {
      "content": "...",
      "federation_confidence": 0.94
    }
  ]
}
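The per-result `federation_confidence` can be used to drop weakly translated matches before ranking. A minimal sketch over the response shape shown above; the 0.85 cutoff mirrors the "Good" band in the confidence table, and treating results without the field (no translation applied) as fully trusted is an assumption.

```python
def filter_federated(items, min_confidence=0.85):
    """Keep results whose cross-space translation is trustworthy enough to rank."""
    return [item for item in items
            if item.get("federation_confidence", 1.0) >= min_confidence]

results = [
    {"content": "auth service notes", "federation_confidence": 0.94},
    {"content": "old partner memo", "federation_confidence": 0.71},
    {"content": "native-space hit"},  # no translation applied -> keep
]
kept = filter_federated(results)
print([r["content"] for r in kept])  # ['auth service notes', 'native-space hit']
```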

Pre-Trained Pairs

Metalogue includes pre-trained translations for common model pairs:

| Source  | Target    | Confidence |
|---------|-----------|------------|
| ada-002 | 3-small   | 0.94       |
| 3-small | ada-002   | 0.93       |
| ada-002 | 3-large   | 0.91       |
| 3-small | cohere-v3 | 0.88       |

Best Practices

  1. Use high-quality parallel data - More diverse samples = better translation
  2. Monitor confidence - Retrain if confidence drops below 0.85
  3. Train bidirectionally - Always train both F and G
  4. Start with linear - Only use neural for complex mappings
  5. Validate with held-out data - Use validation split

Patent Reference

Vec2Vec is covered by Patent #2, Claim 8:

"A method for translating embedding vectors between heterogeneous embedding spaces using cycle-consistency trained bidirectional translation matrices."

Next Steps