Base Embedding Model (768 Dimensions)

Our base embedding model (BGE Base) offers an excellent balance of semantic richness and efficiency with 768-dimensional vectors. This model is recommended as the default choice for most production applications, providing strong performance across a wide range of use cases.

Model Overview

BGE Base v1.5

Configuration & Setup

OpenAI SDK Setup (Compatible with ada-002)

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your-neuredge-key',
  baseURL: 'https://api.neuredge.dev/v1/'
});

Native SDK Setup

import { Neuredge } from '@neuredge/sdk';

const neuredge = new Neuredge({
  apiKey: 'your-api-key'
});

Real-World Applications & Examples

1. Advanced Search & Retrieval

Semantic Search with Hybrid Ranking

// Generate embeddings for documents
async function generateEmbeddings(documents) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-base-en-v1.5',
    input: documents.map(doc => doc.content)
  });
  return response.data.map(d => d.embedding);
}

// Hybrid search implementation (db is a pg Pool with the pgvector extension enabled)
async function hybridSearch(query) {
  // Get query embedding
  const queryEmbedding = await openai.embeddings.create({
    model: '@cf/baai/bge-base-en-v1.5',
    input: query
  });

  // Combine semantic and keyword search
  const results = await db.query(`
    WITH semantic_results AS (
      SELECT id, content,
             1 - (embedding <=> $1) AS semantic_score,
             ts_rank(to_tsvector('english', content),
                     plainto_tsquery('english', $2)) AS keyword_score
      FROM documents
    )
    SELECT id, content,
           (semantic_score * 0.7 + keyword_score * 0.3) AS final_score
    FROM semantic_results
    ORDER BY final_score DESC
    LIMIT 10
  `, [
    // Serialize the embedding to pgvector's '[...]' text format
    JSON.stringify(queryEmbedding.data[0].embedding),
    query
  ]);

  return results.rows;
}
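One caveat with the blend above: `ts_rank` scores and cosine similarities are not on a common scale, so weighting raw values can let one signal dominate. A sketch of normalizing both score lists before blending (the helper names are illustrative, not from any SDK):

```javascript
// Min-max normalize an array of scores to [0, 1]; a constant array maps to all zeros.
function minMaxNormalize(scores) {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 0);
  return scores.map(s => (s - min) / (max - min));
}

// Blend semantic and keyword scores using the 0.7 / 0.3 weights from the query above
function hybridScore(semanticScores, keywordScores, alpha = 0.7) {
  const sem = minMaxNormalize(semanticScores);
  const kw = minMaxNormalize(keywordScores);
  return sem.map((s, i) => alpha * s + (1 - alpha) * kw[i]);
}
```

With both score lists mapped to [0, 1], the 0.7/0.3 weights behave as intended regardless of each signal's native range.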

Use Cases:

  • Enterprise search systems
  • Legal document retrieval
  • Research paper search
  • Knowledge base systems
  • Technical documentation search

2. Recommendation Engine

Advanced Content Recommendation

// Generate embeddings for user interactions
async function getUserProfile(userInteractions) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-base-en-v1.5',
    input: userInteractions.map(interaction =>
      `${interaction.content} ${interaction.category} ${interaction.tags.join(' ')}`
    )
  });

  // Create a user profile by averaging the interaction embeddings
  return response.data.reduce((acc, curr, idx, arr) => {
    return acc.map((val, i) => val + curr.embedding[i] / arr.length);
  }, new Array(768).fill(0));
}

// Find personalized recommendations
async function getRecommendations(userProfile) {
const results = await db.query(`
SELECT content_id, title, description,
1 - (content_embedding <=> $1) as relevance_score,
popularity_score,
recency_score
FROM content
WHERE category = ANY($2)
ORDER BY (
relevance_score * 0.6 +
popularity_score * 0.2 +
recency_score * 0.2
) DESC
LIMIT 10
`, [userProfile, userPreferredCategories]);

return results.rows;
}
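Note that averaging embeddings generally produces a profile vector that is no longer unit-length. pgvector's `<=>` cosine-distance operator is scale-invariant, so the profile works as-is, but if you switch to inner-product search or compare vector magnitudes, renormalize first. A minimal sketch (the helper name `l2Normalize` is illustrative, not part of any SDK):

```javascript
// L2-normalize a vector so dot products behave like cosine similarity.
function l2Normalize(vector) {
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  // Guard against the zero vector to avoid dividing by zero
  if (norm === 0) return vector.slice();
  return vector.map(v => v / norm);
}
```

Usage: `const profile = l2Normalize(await getUserProfile(interactions));`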

Use Cases:

  • Content recommendations
  • Personalized feeds
  • Related articles
  • Course suggestions
  • Media recommendations

3. Semantic Analysis

Advanced Text Analysis System

// Cosine similarity between two equal-length vectors
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Multi-aspect semantic analysis
async function analyzeText(text, aspects) {
  // Embed the text and every aspect in a single request
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-base-en-v1.5',
    input: [text, ...aspects]
  });

  const [textEmbedding, ...aspectEmbeddings] = response.data.map(d => d.embedding);

  // Score how strongly the text relates to each aspect
  return aspects.map((aspect, idx) => ({
    aspect,
    relevance: cosineSimilarity(textEmbedding, aspectEmbeddings[idx])
  }));
}

// Example usage for content moderation
const analysis = await analyzeText(userContent, [
  'professional tone',
  'technical accuracy',
  'emotional sentiment',
  'controversial content'
]);

Use Cases:

  • Content moderation
  • Sentiment analysis
  • Topic classification
  • Brand alignment
  • Quality assessment

4. Knowledge Graph Enhancement

Semantic Knowledge Graph

// Generate embeddings for entities and their relationships
async function enrichKnowledgeGraph(entities) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-base-en-v1.5',
    input: entities.map(e =>
      `${e.name} ${e.description} ${e.relationships.join(' ')}`
    )
  });

  // Store enriched entities in bulk
  await db.query(`
    INSERT INTO knowledge_graph (
      entity_id,
      name,
      embedding,
      metadata
    )
    SELECT
      unnest($1::uuid[]),
      unnest($2::text[]),
      unnest($3::vector[]),
      unnest($4::jsonb[])
  `, [
    entities.map(e => e.id),
    entities.map(e => e.name),
    // Serialize each embedding to pgvector's '[...]' text format
    response.data.map(d => JSON.stringify(d.embedding)),
    entities.map(e => JSON.stringify(e.metadata))
  ]);
}

Use Cases:

  • Knowledge graphs
  • Entity resolution
  • Relationship mapping
  • Data enrichment
  • Semantic networks

Integration Examples

import express from 'express';
import OpenAI from 'openai';
import { Pool } from 'pg';
import pgvector from 'pgvector/pg';

const app = express();
app.use(express.json());

const pool = new Pool();
await pool.query('CREATE EXTENSION IF NOT EXISTS vector');
// Parse vector columns on every pooled connection
pool.on('connect', async (client) => {
  await pgvector.registerTypes(client);
});

const openai = new OpenAI({
  apiKey: process.env.NEUREDGE_API_KEY,
  baseURL: 'https://api.neuredge.dev/v1/'
});

app.post('/semantic-search', async (req, res) => {
  try {
    const { query, filters, page = 1, limit = 10 } = req.body;

    const embedding = await openai.embeddings.create({
      model: '@cf/baai/bge-base-en-v1.5',
      input: query
    });

    const offset = (page - 1) * limit;
    // Score in a subquery so the aliases can be reused in WHERE and ORDER BY
    const results = await pool.query(`
      SELECT content, metadata, similarity, text_rank
      FROM (
        SELECT
          content,
          metadata,
          1 - (embedding <=> $1) AS similarity,
          ts_rank(to_tsvector('english', content),
                  plainto_tsquery('english', $2)) AS text_rank
        FROM documents
        WHERE category = ANY($3)
      ) ranked
      WHERE similarity > 0.7
      ORDER BY (similarity * 0.7 + text_rank * 0.3) DESC
      LIMIT $4 OFFSET $5
    `, [
      JSON.stringify(embedding.data[0].embedding),
      query,
      filters.categories,
      limit,
      offset
    ]);

    res.json({
      results: results.rows,
      page,
      // rowCount is the number of rows on this page, not the total match count
      count: results.rowCount
    });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

Best Practices

  1. Data Quality

    • Preprocess text thoroughly
    • Handle edge cases
    • Normalize inputs
    • Maintain context
  2. Performance Optimization

    • Use appropriate indexing
    • Implement caching
    • Batch operations
    • Monitor resources
  3. Vector Operations

    • Choose right similarity metric
    • Normalize vectors when needed
    • Use efficient indexes
    • Consider approximation methods
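The indexing and approximation points above map directly to pgvector index types. A sketch, assuming the `documents` table from the earlier examples:

```sql
-- Approximate nearest-neighbor index for cosine distance (pgvector).
-- HNSW gives better recall/latency at the cost of slower builds;
-- IVFFlat builds faster but needs list-count tuning.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Companion full-text index for the keyword half of hybrid search
CREATE INDEX ON documents USING gin (to_tsvector('english', content));
```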

Token Management

Monthly token quota by plan:

  • Free Tier: 300K tokens
  • $29 Plan: 3M tokens
  • $49 Plan: 4.5M tokens
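To stay within these quotas, it helps to estimate usage before sending batches. A rough sketch using the common ~4 characters per token heuristic for English text (the model's real tokenizer may count differently, and the 8,000-token budget below is an illustrative default, not an API limit):

```javascript
// Rough token estimate: ~4 characters per token for typical English text.
// This is a planning heuristic, not the model's actual tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Split texts into batches that stay under a per-request token budget
function batchByTokenBudget(texts, budget = 8000) {
  const batches = [];
  let current = [];
  let used = 0;
  for (const text of texts) {
    const cost = estimateTokens(text);
    // Start a new batch when adding this text would exceed the budget
    if (current.length > 0 && used + cost > budget) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(text);
    used += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Tracking estimates this way also makes it easy to log cumulative usage against the monthly quota.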

When to Use

Ideal For:

  • Production applications
  • Enterprise search
  • Recommendation systems
  • Content analysis
  • Knowledge management

Consider Alternatives When:

  • Resource constraints exist
  • Maximum speed needed
  • Basic similarity sufficient
  • Limited storage available

Getting Started

To begin using BGE Base embeddings: