
Large Embedding Model (1024 Dimensions)

Our large embedding model, BGE Large, produces 1024-dimensional vectors, the most detailed semantic representation we offer. It suits applications that need maximum semantic accuracy and fine-grained understanding, and that can accept more latency and cost than a smaller model would incur.

Model Overview

BGE Large v1.5

Configuration & Setup

OpenAI SDK Setup (Compatible with ada-002)

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your-neuredge-key',
  baseURL: 'https://api.neuredge.dev/v1/'
});

Native SDK Setup

import { Neuredge } from '@neuredge/sdk';

const neuredge = new Neuredge({
  apiKey: 'your-api-key'
});

Real-World Applications & Examples

1. Research Paper Analysis

Academic Literature Matching

// Generate embeddings for research papers
async function generatePaperEmbeddings(papers) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-large-en-v1.5',
    // One structured text block per paper; continuation lines stay flush left
    // so no indentation leaks into the embedded text
    input: papers.map(paper =>
      `Title: ${paper.title}
Abstract: ${paper.abstract}
Keywords: ${paper.keywords.join(', ')}
Methods: ${paper.methodology}`
    )
  });
  // One 1024-dimensional vector per paper
  return response.data.map(d => d.embedding);
}
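
Sending a very large `papers` array in a single request may run into request-size limits. The following is a minimal sketch of splitting inputs into batches first. The helper names `chunk` and `embedInBatches`, the `embedFn` callback, and the default batch size of 50 are assumptions for illustration, not documented limits or API features.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// embedFn is a hypothetical async function: (texts[]) => embeddings[].
// Batches are sent sequentially so results stay in input order.
async function embedInBatches(texts, embedFn, batchSize = 50) {
  const embeddings = [];
  for (const batch of chunk(texts, batchSize)) {
    embeddings.push(...(await embedFn(batch)));
  }
  return embeddings;
}
```

With the function above, `embedFn` could be a thin wrapper that calls `openai.embeddings.create` for one batch and returns `response.data.map(d => d.embedding)`.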

// Find related research
// `db` is assumed to be a connected node-postgres client or pool
async function findRelatedResearch(paperEmbedding) {
  const results = await db.query(`
    WITH paper_matches AS (
      SELECT
        paper_id,
        title,
        authors,
        publication_date,
        citations_count,
        -- <=> is pgvector's cosine distance; 1 - distance = cosine similarity
        1 - (embedding <=> $1::vector) AS semantic_similarity
      FROM research_papers
      WHERE 1 - (embedding <=> $1::vector) > 0.8
    )
    SELECT *,
      (semantic_similarity * 0.6 +
       ln(citations_count + 1) * 0.4) AS relevance_score
    FROM paper_matches
    ORDER BY relevance_score DESC
    LIMIT 20
  `, [JSON.stringify(paperEmbedding)]); // pgvector reads the '[x1,x2,...]' text form

  return results.rows;
}
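
The relevance score in the query mixes a bounded term (cosine similarity, above 0.8 after the filter) with an unbounded one (the natural log of the citation count), so heavily cited papers can outrank closer semantic matches. The sketch below mirrors the SQL expression in JavaScript to make that visible. `relevanceScore` and the two sample candidates are hypothetical.

```javascript
// Mirrors the SQL: semantic_similarity * 0.6 + ln(citations_count + 1) * 0.4
function relevanceScore(semanticSimilarity, citationsCount) {
  return semanticSimilarity * 0.6 + Math.log(citationsCount + 1) * 0.4;
}

// Hypothetical candidates that both pass the 0.8 similarity filter
const candidates = [
  { title: 'Close match, few citations', similarity: 0.95, citations: 3 },
  { title: 'Looser match, many citations', similarity: 0.81, citations: 1000 },
];

const ranked = candidates
  .map(c => ({ ...c, score: relevanceScore(c.similarity, c.citations) }))
  .sort((a, b) => b.score - a.score);

for (const r of ranked) {
  console.log(`${r.title}: ${r.score.toFixed(2)}`);
}
```

Here the looser match ranks first because its citation term alone exceeds 2.7. If semantic closeness should dominate, one option is to normalize the citation term to the 0 to 1 range before weighting it.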

Use Cases:

  • Literature reviews
  • Citation analysis
  • Research mapping
  • Expert finding
  • Methodology comparison

2. Legal Document Analysis

Contract Analysis System

// Generate embeddings for legal documents
// `db` is assumed to be a connected node-postgres client or pool
async function analyzeLegalDocuments(documents) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-large-en-v1.5',
    input: documents.map(doc =>
      `${doc.title}
${doc.content}
Type: ${doc.documentType}
Jurisdiction: ${doc.jurisdiction}`
    )
  });

  // Store with legal metadata: one row per document, built by unnesting
  // the parallel arrays in lockstep
  await db.query(`
    INSERT INTO legal_documents (
      doc_id,
      embedding,
      metadata,
      jurisdiction,
      document_type,
      effective_date
    )
    SELECT
      unnest($1::uuid[]),
      unnest($2::vector[]),
      unnest($3::jsonb[]),
      unnest($4::text[]),
      unnest($5::text[]),
      unnest($6::date[])
  `, [
    documents.map(d => d.id),
    // Each embedding is sent as pgvector's '[x1,x2,...]' text form
    response.data.map(d => JSON.stringify(d.embedding)),
    documents.map(d => d.metadata),
    documents.map(d => d.jurisdiction),
    documents.map(d => d.documentType),
    documents.map(d => d.effectiveDate)
  ]);
}
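
node-postgres serializes a JavaScript array as a Postgres array literal (`{...}`), while pgvector reads vectors in the text form `[x1,x2,...]`, so embeddings need converting before they are bound as query parameters. A minimal sketch of such a conversion with basic validation follows. The name `toVectorLiteral` is hypothetical, and the 1024 default comes from the model's dimensionality stated on this page.

```javascript
const EXPECTED_DIMENSIONS = 1024; // BGE Large output size, per this page

// Convert a numeric array to pgvector's text input form, e.g. '[0.1,0.2]'.
function toVectorLiteral(embedding, dimensions = EXPECTED_DIMENSIONS) {
  if (!Array.isArray(embedding) || embedding.length !== dimensions) {
    throw new RangeError(`expected an array of ${dimensions} numbers`);
  }
  // Rejects NaN, Infinity, and non-number entries
  if (!embedding.every(Number.isFinite)) {
    throw new TypeError('embedding contains a non-finite value');
  }
  return `[${embedding.join(',')}]`;
}

// Toy 3-dimensional example
console.log(toVectorLiteral([1, 2.5, -3], 3)); // prints "[1,2.5,-3]"
```

Checking the length up front surfaces a model mismatch (for example, vectors from a smaller model) in application code instead of as a database error.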

Use Cases:

  • Contract analysis
  • Compliance checking
  • Case law research
  • Legal precedent search
  • Regulatory mapping

3. Clinical Research

Medical Literature Analysis

// Clinical study matching
// `db` is assumed to be a connected node-postgres client or pool
async function findRelevantStudies(patientProfile) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-large-en-v1.5',
    input: `Condition: ${patientProfile.condition}
Symptoms: ${patientProfile.symptoms.join(', ')}
Demographics: ${patientProfile.demographics}
Medical History: ${patientProfile.history}`
  });

  const results = await db.query(`
    SELECT
      study_id,
      title,
      methodology,
      outcomes,
      publication_date,
      1 - (embedding <=> $1::vector) AS relevance,
      statistical_significance,
      sample_size
    FROM clinical_studies
    WHERE status = 'published'
      AND 1 - (embedding <=> $1::vector) > 0.85
    ORDER BY
      relevance * 0.5 +
      statistical_significance * 0.3 +
      ln(sample_size) * 0.2 DESC
    LIMIT 15
  `, [JSON.stringify(response.data[0].embedding)]); // pgvector reads the '[x1,x2,...]' text form

  return results.rows;
}

Use Cases:

  • Clinical trial matching
  • Treatment research
  • Patient cohort analysis
  • Medical literature review
  • Evidence-based medicine

4. Financial Analysis

Advanced Market Analysis

// Generate embeddings for financial reports
async function analyzeFinancialDocuments(reports) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-large-en-v1.5',
    input: reports.map(report =>
      `Company: ${report.companyName}
Report Type: ${report.type}
Period: ${report.period}
Key Metrics: ${report.keyMetrics.join(', ')}
Risk Factors: ${report.riskFactors}
Market Outlook: ${report.outlook}`
    )
  });

  // Pair each embedding with the report it was generated from
  return response.data.map((d, i) => ({
    reportId: reports[i].id,
    embedding: d.embedding,
    metadata: {
      company: reports[i].companyName,
      period: reports[i].period,
      metrics: reports[i].keyMetrics
    }
  }));
}
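
`analyzeFinancialDocuments` returns embeddings without storing them, so for a small set of reports they can be compared directly in memory. The sketch below computes cosine similarity and lists the closest reports to a target. `cosineSimilarity` and `mostSimilarReports` are hypothetical helpers that assume objects shaped like the function's return value (`reportId`, `embedding`).

```javascript
// Cosine similarity of two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new RangeError('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // zero vectors compare as 0
}

// Return the k reports most similar to `target`, excluding the target itself.
function mostSimilarReports(target, others, k = 5) {
  return others
    .filter(r => r.reportId !== target.reportId)
    .map(r => ({
      reportId: r.reportId,
      similarity: cosineSimilarity(target.embedding, r.embedding)
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}
```

This linear scan is reasonable for hundreds or a few thousand reports. Beyond that, a vector index in the database is likely the better fit.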

Use Cases:

  • Market research
  • Investment analysis
  • Risk assessment
  • Trend identification
  • Competitive analysis

Integration Examples

Advanced Research Platform

import express from 'express';
import OpenAI from 'openai';
import { Pool } from 'pg';
import pgvector from 'pgvector/pg';

const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated

const pool = new Pool(); // connection settings come from the PG* environment variables
await pool.query('CREATE EXTENSION IF NOT EXISTS vector');

const openai = new OpenAI({
  apiKey: process.env.NEUREDGE_API_KEY,
  baseURL: 'https://api.neuredge.dev/v1/'
});

app.post('/analyze-research', async (req, res) => {
  try {
    const {
      query,
      filters = {},
      page = 1,
      limit = 20,
      yearRange = [1900, 2025],
      minCitations = 0
    } = req.body;

    const embedding = await openai.embeddings.create({
      model: '@cf/baai/bge-large-en-v1.5',
      input: query
    });

    const offset = (page - 1) * limit;
    const results = await pool.query(`
      WITH semantic_matches AS (
        SELECT
          paper_id,
          title,
          authors,
          publication_year,
          citations_count,
          journal_impact_factor,
          1 - (embedding <=> $1::vector) AS semantic_score
        FROM research_papers
        WHERE publication_year BETWEEN $2 AND $3
          AND citations_count >= $4
          -- a NULL field list means "no field filter"
          AND ($5::text[] IS NULL OR field = ANY($5::text[]))
      )
      SELECT *,
        (semantic_score * 0.4 +
         -- NULLIF avoids division by zero when every match has 0 citations or impact
         COALESCE(citations_count::float / NULLIF(max_citations, 0), 0) * 0.3 +
         COALESCE(journal_impact_factor / NULLIF(max_impact, 0), 0) * 0.3) AS final_score
      FROM semantic_matches
      CROSS JOIN (
        SELECT
          max(citations_count) AS max_citations,
          max(journal_impact_factor) AS max_impact
        FROM semantic_matches
      ) AS metrics
      ORDER BY final_score DESC
      LIMIT $6 OFFSET $7
    `, [
      pgvector.toSql(embedding.data[0].embedding), // serialize to pgvector's text form
      yearRange[0],
      yearRange[1],
      minCitations,
      filters.fields ?? null,
      limit,
      offset
    ]);

    res.json({
      results: results.rows,
      page,
      // rowCount is the number of rows on this page, not the overall match count
      total: results.rowCount
    });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.listen(process.env.PORT || 3000);
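
The route above takes `page` and `limit` straight from the request body, so a zero or negative page produces a negative offset and an oversized limit can request an arbitrarily large result set. A minimal validation sketch follows. `parsePagination` is a hypothetical helper, and the cap of 100 rows per page is an assumption, not a documented limit.

```javascript
// Validate pagination input and derive the SQL offset.
function parsePagination(body, { maxLimit = 100 } = {}) {
  const page = Number.parseInt(body.page ?? 1, 10);
  const limit = Number.parseInt(body.limit ?? 20, 10);

  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be a positive integer');
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }

  const safeLimit = Math.min(limit, maxLimit); // cap the page size
  return { page, limit: safeLimit, offset: (page - 1) * safeLimit };
}

console.log(parsePagination({ page: 3, limit: 500 }));
// prints { page: 3, limit: 100, offset: 200 }
```

In a handler, a `RangeError` from this helper would more naturally map to a 400 response than to the generic 500 branch.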

Best Practices

  1. Quality Focus

    • Use high-quality input text
    • Maintain context completeness
    • Consider domain specificity
    • Validate embeddings quality
  2. System Architecture

    • Implement proper indexing
    • Consider caching strategies
    • Plan for scaling
    • Monitor performance
  3. Search Implementation

    • Use approximate search
    • Implement filters
    • Consider hybrid approaches
    • Optimize thresholds
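
As one illustration of the caching point above, the following sketch keeps recently used embeddings in a small least-recently-used cache so repeated queries do not consume tokens twice. `EmbeddingCache` is a hypothetical class, `embedFn` stands in for whatever function calls the embeddings endpoint, and the default capacity of 1000 entries is arbitrary.

```javascript
// A small LRU cache keyed by input text. It stores promises, so concurrent
// requests for the same text share one in-flight API call.
class EmbeddingCache {
  constructor(embedFn, maxEntries = 1000) {
    this.embedFn = embedFn;       // hypothetical: (text) => Promise<number[]>
    this.maxEntries = maxEntries;
    this.entries = new Map();     // Map iteration order doubles as recency order
  }

  get(text) {
    const key = text.trim();
    let pending = this.entries.get(key);
    if (pending !== undefined) {
      this.entries.delete(key);   // re-inserted below to mark as most recent
    } else {
      pending = Promise.resolve(this.embedFn(key));
      // Do not keep failed requests in the cache
      pending.catch(() => {
        if (this.entries.get(key) === pending) this.entries.delete(key);
      });
    }
    this.entries.set(key, pending);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first key in the Map)
      this.entries.delete(this.entries.keys().next().value);
    }
    return pending;
  }
}
```

Usage would look like `const vector = await cache.get(userQuery);`. A process-local cache like this is lost on restart, so a shared store may fit better for multi-instance deployments.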

Token Management

Plan         Monthly Token Quota
Free Tier    300K tokens
$29 Plan     3M tokens
$49 Plan     4.5M tokens
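
To get a rough sense of whether a batch of documents fits a plan's monthly quota, the sketch below estimates token counts before any request is sent. The four-characters-per-token ratio is a coarse rule of thumb for English text and an assumption here, not the model's actual tokenizer. The plan keys and function names are hypothetical, while the quota figures come from the table above.

```javascript
// Monthly quotas from the plan table
const MONTHLY_QUOTAS = {
  free: 300_000,
  plan29: 3_000_000,
  plan49: 4_500_000,
};

// Rough estimate only: assumes about 4 characters per token
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Report whether embedding `texts` fits within what is left of the quota.
function fitsQuota(texts, plan, usedSoFar = 0) {
  const quota = MONTHLY_QUOTAS[plan];
  if (quota === undefined) {
    throw new Error(`unknown plan: ${plan}`);
  }
  const needed = texts.reduce((sum, t) => sum + estimateTokens(t), 0);
  return {
    needed,
    remaining: quota - usedSoFar - needed,
    fits: usedSoFar + needed <= quota,
  };
}
```

Because the estimate is approximate, it is safer to leave headroom and not plan right up to the limit.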

When to Use

Ideal For:

  • Research applications
  • Legal analysis
  • Medical research
  • Financial analysis
  • Complex semantic tasks

Consider Alternatives When:

  • Speed is critical
  • Resources are limited
  • Basic similarity suffices
  • Cost is a major factor

Getting Started

To begin using BGE Large embeddings: