Base Embedding Model (768 Dimensions)
Our base embedding model (BGE Base) balances semantic richness and efficiency with 768-dimensional vectors. It is the recommended default for most production applications, performing strongly across a wide range of use cases.
Model Overview
BGE Base v1.5
- Dimensions: 768
- Provider: BAAI
- License: Apache 2.0
- Max Tokens: 512
- Model Card: BAAI/bge-base-en-v1.5
Configuration & Setup
OpenAI SDK Setup (OpenAI-compatible endpoint)
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your-neuredge-key',
  baseURL: 'https://api.neuredge.dev/v1/'
});
Native SDK Setup
import { Neuredge } from '@neuredge/sdk';

const neuredge = new Neuredge({
  apiKey: 'your-api-key'
});
Real-World Applications & Examples
1. Advanced Search & Retrieval
Semantic Search with Hybrid Ranking
// Generate embeddings for documents
async function generateEmbeddings(documents) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-base-en-v1.5',
    input: documents.map(doc => doc.content)
  });
  return response.data.map(d => d.embedding);
}
// Hybrid search implementation
async function hybridSearch(query) {
  // Get query embedding
  const queryEmbedding = await openai.embeddings.create({
    model: '@cf/baai/bge-base-en-v1.5',
    input: query
  });

  // Combine semantic and keyword scores in one query.
  // The embedding is serialized to pgvector's '[...]' text format.
  const results = await db.query(`
    WITH semantic_results AS (
      SELECT id, content,
             1 - (embedding <=> $1) AS semantic_score,
             ts_rank(to_tsvector('english', content),
                     plainto_tsquery('english', $2)) AS keyword_score
      FROM documents
    )
    SELECT id, content,
           (semantic_score * 0.7 + keyword_score * 0.3) AS final_score
    FROM semantic_results
    ORDER BY final_score DESC
    LIMIT 10
  `, [JSON.stringify(queryEmbedding.data[0].embedding), query]);

  return results.rows;
}
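At scale, the sequential scan implied by the query above becomes the bottleneck. A minimal indexing sketch for both halves of the hybrid query, assuming the `documents` table from the example and pgvector 0.5 or later:

```sql
-- Approximate nearest-neighbor index for cosine distance
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Full-text index for the keyword half of the hybrid query
CREATE INDEX documents_content_fts ON documents
  USING gin (to_tsvector('english', content));
```

With these in place, both the `<=>` ordering and the `ts_rank` filter can use index scans instead of reading every row.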
Use Cases:
- Enterprise search systems
- Legal document retrieval
- Research paper search
- Knowledge base systems
- Technical documentation search
2. Recommendation Engine
Advanced Content Recommendation
// Generate embeddings for user interactions
async function getUserProfile(userInteractions) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-base-en-v1.5',
    input: userInteractions.map(interaction =>
      `${interaction.content} ${interaction.category} ${interaction.tags.join(' ')}`
    )
  });

  // Build the user profile as the mean of the interaction embeddings
  return response.data.reduce((acc, curr, idx, arr) =>
    acc.map((val, i) => val + curr.embedding[i] / arr.length),
    new Array(768).fill(0));
}
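The reduce-based mean above is compact but easy to misread. An equivalent standalone helper (plain arrays, no API calls) makes the centroid computation explicit:

```javascript
// Average a list of equal-length embedding vectors into one profile vector.
function averageEmbeddings(embeddings) {
  const dims = embeddings[0].length;
  const profile = new Array(dims).fill(0);
  for (const vec of embeddings) {
    for (let i = 0; i < dims; i++) {
      profile[i] += vec[i] / embeddings.length;
    }
  }
  return profile;
}

// e.g. averageEmbeddings([[1, 0], [0, 1]]) → [0.5, 0.5]
```

Averaging works well as a first-pass profile; weighting recent interactions more heavily is a common refinement.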
// Find personalized recommendations
async function getRecommendations(userProfile, preferredCategories) {
  // Postgres does not allow SELECT aliases inside ORDER BY expressions,
  // so the scores are computed in a subquery first.
  const results = await db.query(`
    SELECT * FROM (
      SELECT content_id, title, description,
             1 - (content_embedding <=> $1) AS relevance_score,
             popularity_score,
             recency_score
      FROM content
      WHERE category = ANY($2)
    ) scored
    ORDER BY (
      relevance_score * 0.6 +
      popularity_score * 0.2 +
      recency_score * 0.2
    ) DESC
    LIMIT 10
  `, [JSON.stringify(userProfile), preferredCategories]);

  return results.rows;
}
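When the three signals come from separate stores, the same blend can be applied client-side. A minimal sketch using the weights from the query above (field names are illustrative):

```javascript
// Blend the three signals with the same 0.6 / 0.2 / 0.2 weights as the SQL.
function blendedScore({ relevance, popularity, recency }) {
  return relevance * 0.6 + popularity * 0.2 + recency * 0.2;
}

// Sort candidates by blended score, highest first.
function rankCandidates(candidates) {
  return [...candidates].sort((a, b) => blendedScore(b) - blendedScore(a));
}
```

Keeping the weights in one function makes them easy to tune or A/B test without touching SQL.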
Use Cases:
- Content recommendations
- Personalized feeds
- Related articles
- Course suggestions
- Media recommendations
3. Semantic Analysis
Advanced Text Analysis System
// Multi-aspect semantic analysis
async function analyzeText(text, aspects) {
  // Generate embeddings for the text and every aspect in one call
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-base-en-v1.5',
    input: [text, ...aspects]
  });
  const [textEmbedding, ...aspectEmbeddings] = response.data.map(d => d.embedding);

  // Score each aspect by its cosine similarity to the text
  return aspects.map((aspect, idx) => ({
    aspect,
    relevance: cosineSimilarity(textEmbedding, aspectEmbeddings[idx])
  }));
}

// Example usage for content moderation
const analysis = await analyzeText(userContent, [
  'professional tone',
  'technical accuracy',
  'emotional sentiment',
  'controversial content'
]);
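`analyzeText` assumes a `cosineSimilarity` helper. A self-contained version (dividing by the norms keeps it correct even if the returned embeddings are not unit-length):

```javascript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Identical vectors score 1, orthogonal vectors score 0:
// cosineSimilarity([1, 0], [1, 0]) → 1
// cosineSimilarity([1, 0], [0, 1]) → 0
```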
Use Cases:
- Content moderation
- Sentiment analysis
- Topic classification
- Brand alignment
- Quality assessment
4. Knowledge Graph Enhancement
Semantic Knowledge Graph
// Generate embeddings for entities and relationships
async function enrichKnowledgeGraph(entities) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-base-en-v1.5',
    input: entities.map(e =>
      `${e.name} ${e.description} ${e.relationships.join(' ')}`
    )
  });

  // Bulk-insert enriched entities; embeddings and metadata are serialized
  // to text so the casts in the query can reconstruct them.
  await db.query(`
    INSERT INTO knowledge_graph (entity_id, name, embedding, metadata)
    SELECT
      unnest($1::uuid[]),
      unnest($2::text[]),
      unnest($3::vector[]),
      unnest($4::jsonb[])
  `, [
    entities.map(e => e.id),
    entities.map(e => e.name),
    response.data.map(d => JSON.stringify(d.embedding)),
    entities.map(e => JSON.stringify(e.metadata))
  ]);
}
Use Cases:
- Knowledge graphs
- Entity resolution
- Relationship mapping
- Data enrichment
- Semantic networks
Integration Examples
Express.js with Advanced Vector Search
import express from 'express';
import OpenAI from 'openai';
import { Pool } from 'pg';
import pgvector from 'pgvector/pg';

const app = express();
app.use(express.json());

const pool = new Pool();
await pool.query('CREATE EXTENSION IF NOT EXISTS vector');

const openai = new OpenAI({
  apiKey: process.env.NEUREDGE_API_KEY,
  baseURL: 'https://api.neuredge.dev/v1/'
});

app.post('/semantic-search', async (req, res) => {
  try {
    const { query, filters, page = 1, limit = 10 } = req.body;

    const embedding = await openai.embeddings.create({
      model: '@cf/baai/bge-base-en-v1.5',
      input: query
    });

    const offset = (page - 1) * limit;
    // Postgres does not allow SELECT aliases inside ORDER BY expressions,
    // so the blend repeats the full expressions. pgvector.toSql serializes
    // the embedding parameter for the vector column.
    const results = await pool.query(`
      SELECT
        content,
        metadata,
        1 - (embedding <=> $1) AS similarity,
        ts_rank(to_tsvector('english', content),
                plainto_tsquery('english', $2)) AS text_rank
      FROM documents
      WHERE category = ANY($3)
        AND 1 - (embedding <=> $1) > 0.7
      ORDER BY (
        (1 - (embedding <=> $1)) * 0.7 +
        ts_rank(to_tsvector('english', content),
                plainto_tsquery('english', $2)) * 0.3
      ) DESC
      LIMIT $4 OFFSET $5
    `, [
      pgvector.toSql(embedding.data[0].embedding),
      query,
      filters.categories,
      limit,
      offset
    ]);

    res.json({
      results: results.rows,
      page,
      // rowCount is the size of this page, not the total number of matches
      count: results.rowCount
    });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});
Best Practices
1. Data Quality
   - Preprocess text thoroughly
   - Handle edge cases
   - Normalize inputs
   - Maintain context
2. Performance Optimization
   - Use appropriate indexing
   - Implement caching
   - Batch operations
   - Monitor resources
3. Vector Operations
   - Choose the right similarity metric
   - Normalize vectors when needed
   - Use efficient indexes
   - Consider approximation methods
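On "normalize vectors when needed": with unit-length vectors, cosine similarity reduces to a plain dot product, which some indexes and math libraries compute faster. A minimal sketch:

```javascript
// Scale a vector to unit length (L2 norm = 1); zero vectors pass through.
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vec : vec.map(v => v / norm);
}

// e.g. l2Normalize([3, 4]) → [0.6, 0.8]
```

Normalize once at write time, so every subsequent similarity query can use the cheaper inner product.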
Token Management
| Plan | Monthly Token Quota |
|---|---|
| Free Tier | 300K tokens |
| $29 Plan | 3M tokens |
| $49 Plan | 4.5M tokens |
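To translate a quota into document capacity, divide it by your average chunk size. For instance, 300K tokens at roughly 250 tokens per chunk covers about 1,200 chunks per month. A quick estimator (the 250-token default is an assumption, not a measured value):

```javascript
// Rough monthly capacity: quota tokens / average tokens per chunk.
function chunksPerMonth(quotaTokens, avgTokensPerChunk = 250) {
  return Math.floor(quotaTokens / avgTokensPerChunk);
}

// e.g. chunksPerMonth(300_000) → 1200 chunks on the Free Tier
```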
When to Use
✅ Ideal For:
- Production applications
- Enterprise search
- Recommendation systems
- Content analysis
- Knowledge management
❌ Consider Alternatives When:
- Resource constraints exist
- Maximum speed needed
- Basic similarity sufficient
- Limited storage available
Getting Started
To begin using BGE Base embeddings: