# Bridge

Vec2Vec translation for cross-embedding-space queries.

Bridge is the cross-embedding translation layer. It lets you query across data indexed by different embedding models without re-indexing, using translation matrices trained with cycle consistency.
## The Problem

Enterprise RAG systems accumulate data indexed by different embedding models:

- **OpenAI ada-002**: older Slack messages
- **text-embedding-3-small**: recent documents
- **Cohere embed-v3**: partner data
- **Custom models**: proprietary embeddings

Querying across these requires translating between embedding spaces.
## The Solution

Vec2Vec learns bidirectional translation matrices with cycle-consistency training:

```mermaid
graph LR
    subgraph "Source Space"
        X[Embedding X]
    end
    subgraph "Translation"
        F[Forward F]
        G[Inverse G]
    end
    subgraph "Target Space"
        Y[Embedding Y]
    end
    X --> F --> Y
    Y --> G --> X
    X -.->|"Should match"| X
```
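The round-trip property in the diagram can be sketched numerically. This is a toy illustration in 4 dimensions (real embeddings are e.g. 1536-d), and `F` and `G` here are stand-ins for learned translation matrices, not Metalogue's actual weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in translation matrices: a well-conditioned forward map and its inverse.
F = np.eye(4) + 0.1 * rng.normal(size=(4, 4))  # forward map, source -> target
G = np.linalg.inv(F)                           # inverse map, target -> source

x = rng.normal(size=4)   # a source-space embedding
y = F @ x                # translated into the target space
x_back = G @ y           # translated back

# Cycle consistency: G(F(x)) should reproduce x (up to numerical error).
cycle_error = float(np.linalg.norm(x - x_back))
```

In training, `G` is fit rather than computed as an exact inverse, so the cycle error is minimized rather than identically zero.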
## API Endpoints

### Translate Embedding

```http
POST /v1/vec2vec/translate
Content-Type: application/json

{
  "vector": [0.1, 0.2, 0.3, ...],
  "source_model": "text-embedding-ada-002",
  "target_model": "text-embedding-3-small"
}
```

Response:

```json
{
  "vector": [0.15, 0.18, 0.28, ...],
  "source_model": "text-embedding-ada-002",
  "target_model": "text-embedding-3-small",
  "confidence": 0.94,
  "translation_applied": true,
  "translation_id": "xlate-abc123"
}
```
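Outside the SDK, the endpoint can be called with any HTTP client. A sketch with the `requests` library, where the base URL and API key are placeholders for your deployment (the request is prepared but not sent, so the payload can be inspected):

```python
import json
import requests

# Placeholders -- substitute your deployment's base URL and API key.
BASE_URL = "https://metalogue.example.com"
API_KEY = "sk-..."

payload = {
    "vector": [0.1, 0.2, 0.3],  # truncated for brevity
    "source_model": "text-embedding-ada-002",
    "target_model": "text-embedding-3-small",
}

# Build the POST request; send it with requests.Session().send(request).
request = requests.Request(
    "POST",
    f"{BASE_URL}/v1/vec2vec/translate",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
).prepare()
```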
### Train Translation

```http
POST /v1/vec2vec/train
Content-Type: application/json

{
  "source_model": "text-embedding-ada-002",
  "target_model": "text-embedding-3-small",
  "parallel_texts": [
    "The authentication service uses singleton pattern...",
    "We decided to migrate to microservices...",
    "Quarterly report shows 15% growth..."
  ],
  "config": {
    "epochs": 100,
    "learning_rate": 0.001,
    "use_neural": false
  }
}
```

Response:

```json
{
  "job_id": "train-xyz789",
  "status": "training",
  "estimated_time_seconds": 120
}
```
### Get Training Status

```http
GET /v1/vec2vec/training/{job_id}
```

Response:

```json
{
  "job_id": "train-xyz789",
  "status": "completed",
  "progress": 1.0,
  "accuracy": 0.94,
  "cycle_loss": 0.032,
  "completed_at": "2026-01-20T02:00:00Z"
}
```
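Since training is asynchronous, a client typically polls this endpoint until the job reaches a terminal state. A generic polling helper, with the HTTP call injected as a callable so the loop itself stays transport-agnostic (the terminal statuses `"completed"` and `"failed"` are an assumption beyond the example above):

```python
import time

def wait_for_training(get_status, job_id, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll a training job until it reaches a terminal status.

    `get_status(job_id)` is any callable returning the status JSON as a dict,
    e.g. a thin wrapper around GET /v1/vec2vec/training/{job_id}.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"training job {job_id} did not finish in {timeout_seconds}s")

# Demo with a canned sequence of responses instead of live HTTP calls:
responses = iter([
    {"job_id": "train-xyz789", "status": "training", "progress": 0.4},
    {"job_id": "train-xyz789", "status": "completed", "progress": 1.0, "accuracy": 0.94},
])
result = wait_for_training(lambda job_id: next(responses), "train-xyz789", poll_seconds=0)
```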
### Get Registry Status

```http
GET /v1/vec2vec/registry
```

Lists all available translation pairs:

```json
{
  "translations": [
    {
      "source_model": "text-embedding-ada-002",
      "target_model": "text-embedding-3-small",
      "confidence": 0.94,
      "trained_at": "2026-01-15T10:00:00Z",
      "sample_count": 10000
    },
    {
      "source_model": "text-embedding-3-small",
      "target_model": "text-embedding-ada-002",
      "confidence": 0.93,
      "trained_at": "2026-01-15T10:00:00Z",
      "sample_count": 10000
    }
  ]
}
```
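A client can use the registry response to check whether a pair is available before issuing translate calls. A small lookup helper over the JSON shape shown above (the helper name is ours, not part of the SDK):

```python
def find_translation(registry, source_model, target_model):
    """Return the registry entry for a (source, target) pair, or None.

    `registry` is the JSON body returned by GET /v1/vec2vec/registry.
    """
    for entry in registry["translations"]:
        if (entry["source_model"] == source_model
                and entry["target_model"] == target_model):
            return entry
    return None

registry = {
    "translations": [
        {"source_model": "text-embedding-ada-002",
         "target_model": "text-embedding-3-small", "confidence": 0.94},
        {"source_model": "text-embedding-3-small",
         "target_model": "text-embedding-ada-002", "confidence": 0.93},
    ]
}
entry = find_translation(registry, "text-embedding-ada-002", "text-embedding-3-small")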
### Get Translation Confidence

```http
GET /v1/vec2vec/confidence?source=ada-002&target=3-small
```

### Remove Translation

```http
DELETE /v1/vec2vec/{translation_id}
```
## SDK Usage

### TypeScript

```typescript
import { MetalogueClient } from '@metalogue/sdk';

const client = new MetalogueClient({ apiKey: API_KEY });

// Translation is automatic during queries,
// but you can also translate manually:
const translated = await client.translateEmbedding({
  vector: sourceEmbedding,
  sourceModel: 'text-embedding-ada-002',
  targetModel: 'text-embedding-3-small',
});

console.log(`Confidence: ${translated.confidence}`);
```

### Python

```python
from metalogue import MetalogueClient

client = MetalogueClient(api_key=API_KEY)

# Manual translation
translated = await client.translate_embedding(
    vector=source_embedding,
    source_model="text-embedding-ada-002",
    target_model="text-embedding-3-small",
)

print(f"Confidence: {translated.confidence}")
```
## Training Pipeline

### Cycle-Consistency Training

Vec2Vec uses cycle consistency to learn high-quality translations:

1. **Sample parallel texts** - documents that exist in both embedding spaces
2. **Embed with both models** - generate source and target embeddings
3. **Train forward matrix F** - maps source → target
4. **Train inverse matrix G** - maps target → source
5. **Minimize cycle loss** - ensure F(G(y)) ≈ y and G(F(x)) ≈ x
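For the linear mode, the steps above can be sketched in a few lines of NumPy. This is a toy 4-d demo with simulated parallel embeddings (a hidden linear relation plus noise), fit by least squares rather than Metalogue's actual training loop:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 256, 4  # toy sizes; real embeddings are e.g. 1536-d

# Steps 1-2: parallel texts embedded in both spaces (simulated here).
A = np.eye(d) + 0.1 * rng.normal(size=(d, d))   # hidden "true" relation
X = rng.normal(size=(n, d))                     # source embeddings
Y = X @ A + 0.01 * rng.normal(size=(n, d))      # target embeddings

# Steps 3-4: fit forward F (source -> target) and inverse G by least squares.
F, *_ = np.linalg.lstsq(X, Y, rcond=None)
G, *_ = np.linalg.lstsq(Y, X, rcond=None)

# Step 5: cycle loss -- G(F(x)) should reproduce x.
cycle_loss = float(np.mean(np.linalg.norm(X @ F @ G - X, axis=1)))
```

With clean, well-spread parallel data the cycle loss stays near zero, which is what the `cycle_loss` field in the training-status response reports.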
### Linear vs. Neural
| Mode | Pros | Cons |
|---|---|---|
| Linear | Fast, interpretable, small model | Limited expressiveness |
| Neural | Higher accuracy for complex mappings | Slower, larger model |
For most model pairs, linear translation achieves 90%+ accuracy.
### Training Configuration

```json
{
  "config": {
    "epochs": 100,
    "learning_rate": 0.001,
    "use_neural": false,
    "min_samples": 1000,
    "validation_split": 0.1
  }
}
```
## Confidence Scores

Translation confidence indicates quality:
| Confidence | Interpretation |
|---|---|
| 0.95+ | Excellent - near-lossless translation |
| 0.85-0.95 | Good - minor semantic drift |
| 0.70-0.85 | Fair - use with caution |
| <0.70 | Poor - consider retraining |
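When acting on confidence scores programmatically, a small helper can map a score to the table's buckets. The boundary handling (`>=` at each threshold) is our assumption, since the table leaves the edges ambiguous:

```python
def interpret_confidence(score: float) -> str:
    """Bucket a translation confidence score per the table above."""
    if score >= 0.95:
        return "excellent"
    if score >= 0.85:
        return "good"
    if score >= 0.70:
        return "fair"
    return "poor"  # consider retraining
```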
## Automatic Translation

During federated queries, Vec2Vec automatically:

- Detects the source model for each document
- Translates to the query embedding space
- Computes similarity in the unified space
- Reports `federation_confidence` per result

```json
{
  "items": [
    {
      "content": "...",
      "federation_confidence": 0.94
    }
  ]
}
```
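On the client side, that per-result field makes it easy to discard weakly translated hits. A sketch over the `items` array above, assuming (our assumption) that results answered in the query's native space simply omit the field:

```python
def filter_by_federation_confidence(items, min_confidence=0.85):
    """Keep only results whose cross-space translation met a confidence floor.

    Results without a federation_confidence field are treated as native-space
    hits and kept.
    """
    return [
        item for item in items
        if item.get("federation_confidence", 1.0) >= min_confidence
    ]

items = [
    {"content": "translated hit", "federation_confidence": 0.94},
    {"content": "weakly translated hit", "federation_confidence": 0.62},
    {"content": "native-space hit"},
]
kept = filter_by_federation_confidence(items)
```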
## Pre-Trained Pairs
Metalogue includes pre-trained translations for common model pairs:
| Source | Target | Confidence |
|---|---|---|
| ada-002 | 3-small | 0.94 |
| 3-small | ada-002 | 0.93 |
| ada-002 | 3-large | 0.91 |
| 3-small | cohere-v3 | 0.88 |
## Best Practices

- **Use high-quality parallel data** - more diverse samples yield better translations
- **Monitor confidence** - retrain if confidence drops below 0.85
- **Train bidirectionally** - always train both F and G
- **Start with linear** - use neural only for complex mappings
- **Validate with held-out data** - use the validation split
## Patent Reference

Vec2Vec is covered by Patent #2, Claim 8:

> "A method for translating embedding vectors between heterogeneous embedding spaces using cycle-consistency trained bidirectional translation matrices."