Large Embedding Model (1024 Dimensions)
BGE Large is our largest embedding model. It produces 1024-dimensional vectors and is suited to applications that need fine-grained semantic distinctions, such as research, legal, medical, and financial search.
Model Overview
BGE Large v1.5
- Dimensions: 1024
- Provider: BAAI
- License: MIT
- Max Tokens: 512
- Model Card: BAAI/bge-large-en-v1.5
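As a rough sizing aid, the 1024-dimension figure above translates directly into storage per vector. The sketch below assumes embeddings are stored as 32-bit floats (4 bytes per dimension), which is an assumption about your storage layer rather than something this page specifies; `estimateVectorStorageBytes` is a hypothetical helper name.

```javascript
// Assumption: each dimension is stored as a 32-bit float (4 bytes).
const BYTES_PER_FLOAT32 = 4;
const BGE_LARGE_DIMENSIONS = 1024; // from the model overview

// Returns the raw vector payload in bytes; index and row overhead are not included.
function estimateVectorStorageBytes(vectorCount, dimensions = BGE_LARGE_DIMENSIONS) {
  if (!Number.isInteger(vectorCount) || vectorCount < 0) {
    throw new RangeError('vectorCount must be a non-negative integer');
  }
  return vectorCount * dimensions * BYTES_PER_FLOAT32;
}

const oneMillionVectors = estimateVectorStorageBytes(1_000_000);
console.log(`${(oneMillionVectors / 1024 ** 3).toFixed(2)} GiB for 1M vectors`);
```

Under that assumption a single vector takes 4096 bytes, so one million vectors need roughly 3.81 GiB before any index overhead.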
Configuration & Setup
OpenAI SDK Setup (Compatible with ada-002)
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: 'your-neuredge-key',
baseURL: 'https://api.neuredge.dev/v1/'
});
Native SDK Setup
import { Neuredge } from '@neuredge/sdk';
const neuredge = new Neuredge({
apiKey: 'your-api-key'
});
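Once a client is configured, a small wrapper can keep embedding calls in one place. This is a minimal sketch, assuming the endpoint returns the OpenAI embeddings response shape (`data[i].embedding` with an `index` field). `embedTexts` is a hypothetical helper, and the client is passed in as an argument so the function can be exercised with a stub.

```javascript
// Hypothetical helper: embeds an array of strings with an OpenAI-compatible client.
// Assumes the response follows the OpenAI embeddings shape: { data: [{ index, embedding }] }.
async function embedTexts(client, texts, model = '@cf/baai/bge-large-en-v1.5') {
  if (!Array.isArray(texts) || texts.length === 0) {
    throw new TypeError('texts must be a non-empty array of strings');
  }
  const response = await client.embeddings.create({ model, input: texts });
  // Sort a copy by index so the vectors line up with the input order.
  return [...response.data]
    .sort((a, b) => a.index - b.index)
    .map(item => item.embedding);
}
```

With the OpenAI SDK client from the setup above, a call would look like `const vectors = await embedTexts(openai, ['first text', 'second text']);`.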
Real-World Applications & Examples
1. Research Paper Analysis
Academic Literature Matching
// Generate embeddings for research papers
async function generatePaperEmbeddings(papers) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-large-en-v1.5',
    // One input string per paper, one field per line
    input: papers.map(paper => [
      `Title: ${paper.title}`,
      `Abstract: ${paper.abstract}`,
      `Keywords: ${paper.keywords.join(', ')}`,
      `Methods: ${paper.methodology}`
    ].join('\n'))
  });
  return response.data.map(d => d.embedding);
}
// Find related research (assumes `db` is a connected pg Pool or Client)
async function findRelatedResearch(paperEmbedding) {
  const results = await db.query(`
    WITH paper_matches AS (
      SELECT
        paper_id,
        title,
        authors,
        publication_date,
        citations_count,
        1 - (embedding <=> $1::vector) AS semantic_similarity
      FROM research_papers
      WHERE 1 - (embedding <=> $1::vector) > 0.8
    )
    SELECT *,
      (semantic_similarity * 0.6 +
       ln(citations_count + 1) * 0.4) AS relevance_score
    FROM paper_matches
    ORDER BY relevance_score DESC
    LIMIT 20
  `, [JSON.stringify(paperEmbedding)]); // sent in pgvector's '[x,y,...]' text form
  return results.rows;
}
Use Cases:
- Literature reviews
- Citation analysis
- Research mapping
- Expert finding
- Methodology comparison
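The `<=>` operator in the query above is pgvector's cosine distance, and subtracting it from 1 gives cosine similarity. A minimal JavaScript sketch of the same quantity, which can help when spot-checking a few results outside the database; `cosineSimilarity` is a hypothetical helper name.

```javascript
// Cosine similarity of two equal-length numeric vectors, in the range [-1, 1].
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new RangeError('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new RangeError('cosine similarity is undefined for a zero vector');
  }
  return dot / Math.sqrt(normA * normB);
}
```

Vectors that point in the same direction score 1 regardless of magnitude, and orthogonal vectors score 0.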
2. Legal Document Analysis
Contract Analysis System
// Generate embeddings for legal documents and store them
// (assumes `db` is a connected pg Pool or Client)
async function analyzeLegalDocuments(documents) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-large-en-v1.5',
    // One input string per document, one field per line
    input: documents.map(doc => [
      doc.title,
      doc.content,
      `Type: ${doc.documentType}`,
      `Jurisdiction: ${doc.jurisdiction}`
    ].join('\n'))
  });
  // Store with legal metadata
  await db.query(`
    INSERT INTO legal_documents (
      doc_id,
      embedding,
      metadata,
      jurisdiction,
      document_type,
      effective_date
    )
    SELECT
      unnest($1::uuid[]),
      unnest($2::text[])::vector,
      unnest($3::jsonb[]),
      unnest($4::text[]),
      unnest($5::text[]),
      unnest($6::date[])
  `, [
    documents.map(d => d.id),
    // Each embedding is sent in pgvector's '[x,y,...]' text form and cast in SQL
    response.data.map(d => JSON.stringify(d.embedding)),
    documents.map(d => d.metadata),
    documents.map(d => d.jurisdiction),
    documents.map(d => d.documentType),
    documents.map(d => d.effectiveDate)
  ]);
}
Use Cases:
- Contract analysis
- Compliance checking
- Case law research
- Legal precedent search
- Regulatory mapping
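Contracts and filings are often longer than an embedding model's input window, in which case the text has to be split before it is embedded. The sketch below splits on whitespace into overlapping word windows. Word counts only approximate token counts, and `chunkByWords`, `maxWords`, and `overlap` are hypothetical names and defaults chosen for illustration.

```javascript
// Splits text into windows of at most maxWords words.
// Consecutive windows share `overlap` words so clauses are not cut without context.
function chunkByWords(text, maxWords = 300, overlap = 50) {
  if (overlap < 0 || overlap >= maxWords) {
    throw new RangeError('overlap must be >= 0 and smaller than maxWords');
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = maxWords - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(' '));
    if (start + maxWords >= words.length) break; // this window reached the end
  }
  return chunks;
}
```

Each chunk can then be embedded and stored as its own row, with the parent document ID kept alongside so matches can be traced back to the source document.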
3. Clinical Research
Medical Literature Analysis
// Clinical study matching (assumes `db` is a connected pg Pool or Client)
async function findRelevantStudies(patientProfile) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-large-en-v1.5',
    input: [
      `Condition: ${patientProfile.condition}`,
      `Symptoms: ${patientProfile.symptoms.join(', ')}`,
      `Demographics: ${patientProfile.demographics}`,
      `Medical History: ${patientProfile.history}`
    ].join('\n')
  });
  const results = await db.query(`
    SELECT
      study_id,
      title,
      methodology,
      outcomes,
      publication_date,
      1 - (embedding <=> $1::vector) AS relevance,
      statistical_significance,
      sample_size
    FROM clinical_studies
    WHERE status = 'published'
      AND 1 - (embedding <=> $1::vector) > 0.85
    ORDER BY
      -- Output aliases cannot be used inside ORDER BY expressions,
      -- so the similarity is spelled out again here
      (1 - (embedding <=> $1::vector)) * 0.5 +
      statistical_significance * 0.3 +
      ln(sample_size) * 0.2 DESC
    LIMIT 15
  `, [JSON.stringify(response.data[0].embedding)]); // pgvector's '[x,y,...]' text form
  return results.rows;
}
Use Cases:
- Clinical trial matching
- Treatment research
- Patient cohort analysis
- Medical literature review
- Evidence-based medicine
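The ORDER BY expression in the query above blends three signals with weights 0.5, 0.3, and 0.2. The sketch below mirrors that arithmetic in JavaScript so the weighting can be tried on sample rows before it goes into SQL. `studyRankScore` and `rankStudies` are hypothetical helpers. Note that the logarithm of the sample size is not bounded to the 0 to 1 range, so very large studies can outweigh the semantic term.

```javascript
// Weights taken from the ORDER BY clause of the clinical study query.
const STUDY_WEIGHTS = { relevance: 0.5, significance: 0.3, sampleSize: 0.2 };

// Blended score for one study row; sampleSize must be >= 1 because ln(0) is undefined.
function studyRankScore({ relevance, statisticalSignificance, sampleSize }) {
  if (sampleSize < 1) {
    throw new RangeError('sampleSize must be at least 1');
  }
  return relevance * STUDY_WEIGHTS.relevance +
    statisticalSignificance * STUDY_WEIGHTS.significance +
    Math.log(sampleSize) * STUDY_WEIGHTS.sampleSize;
}

// Returns a new array sorted from highest to lowest blended score.
function rankStudies(studies) {
  return [...studies].sort((a, b) => studyRankScore(b) - studyRankScore(a));
}
```

If the sample-size term dominates more than intended, one option is to normalize it against the largest study in the result set before weighting.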
4. Financial Analysis
Advanced Market Analysis
// Generate embeddings for financial reports
async function analyzeFinancialDocuments(reports) {
  const response = await openai.embeddings.create({
    model: '@cf/baai/bge-large-en-v1.5',
    // One input string per report, one field per line
    input: reports.map(report => [
      `Company: ${report.companyName}`,
      `Report Type: ${report.type}`,
      `Period: ${report.period}`,
      `Key Metrics: ${report.keyMetrics.join(', ')}`,
      `Risk Factors: ${report.riskFactors}`,
      `Market Outlook: ${report.outlook}`
    ].join('\n'))
  });
  // Pair each embedding with the report it came from
  return response.data.map((d, i) => ({
    reportId: reports[i].id,
    embedding: d.embedding,
    metadata: {
      company: reports[i].companyName,
      period: reports[i].period,
      metrics: reports[i].keyMetrics
    }
  }));
}
Use Cases:
- Market research
- Investment analysis
- Risk assessment
- Trend identification
- Competitive analysis
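When there are many reports, sending them all in a single request may run into request-size limits; this page does not state what those limits are. A minimal sketch of splitting the inputs into fixed-size batches first. `toBatches` and the default batch size of 32 are illustrative assumptions.

```javascript
// Splits an array into consecutive batches of at most batchSize items.
function toBatches(items, batchSize = 32) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Each batch could then be passed to `analyzeFinancialDocuments` in turn, collecting the returned objects into one array.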
Integration Examples
Advanced Research Platform
import express from 'express';
import OpenAI from 'openai';
import { Pool } from 'pg';
import pgvector from 'pgvector/pg';
const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated
const pool = new Pool();
await pool.query('CREATE EXTENSION IF NOT EXISTS vector');
const openai = new OpenAI({
  apiKey: process.env.NEUREDGE_API_KEY,
  baseURL: 'https://api.neuredge.dev/v1/'
});
app.post('/analyze-research', async (req, res) => {
  try {
    const {
      query,
      filters = {},
      page = 1,
      limit = 20,
      yearRange = [1900, 2025],
      minCitations = 0
    } = req.body;
    const embedding = await openai.embeddings.create({
      model: '@cf/baai/bge-large-en-v1.5',
      input: query
    });
    const offset = (page - 1) * limit;
    const results = await pool.query(`
      WITH semantic_matches AS (
        SELECT
          paper_id,
          title,
          authors,
          publication_year,
          citations_count,
          journal_impact_factor,
          1 - (embedding <=> $1::vector) AS semantic_score
        FROM research_papers
        WHERE publication_year BETWEEN $2 AND $3
          AND citations_count >= $4
          -- A NULL field list means "no field filter"
          AND ($5::text[] IS NULL OR field = ANY($5::text[]))
      )
      SELECT *,
        (semantic_score * 0.4 +
         COALESCE(citations_count::float / NULLIF(max_citations, 0), 0) * 0.3 +
         COALESCE(journal_impact_factor / NULLIF(max_impact, 0), 0) * 0.3) AS final_score
      FROM semantic_matches
      CROSS JOIN (
        SELECT
          max(citations_count) AS max_citations,
          max(journal_impact_factor) AS max_impact
        FROM semantic_matches
      ) AS metrics
      ORDER BY final_score DESC
      LIMIT $6 OFFSET $7
    `, [
      pgvector.toSql(embedding.data[0].embedding),
      yearRange[0],
      yearRange[1],
      minCitations,
      filters.fields ?? null,
      limit,
      offset
    ]);
    res.json({
      results: results.rows,
      page,
      // rowCount is the number of rows on this page, not the total number of matches
      count: results.rowCount
    });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});
// The port is an assumption; set PORT to override it
app.listen(process.env.PORT || 3000);
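The route above reads `page` and `limit` straight from the request body. A minimal sketch of normalizing them before they reach the query; `parsePagination` and the cap of 100 rows per page are assumptions for illustration, not limits defined by the API.

```javascript
// Normalizes client-supplied pagination values and derives the SQL offset.
// Invalid or missing values fall back to page 1 and 20 rows per page.
function parsePagination(body = {}, maxLimit = 100) {
  const page = Number.parseInt(body.page ?? 1, 10);
  const limit = Number.parseInt(body.limit ?? 20, 10);
  const safePage = Number.isInteger(page) && page >= 1 ? page : 1;
  const safeLimit = Number.isInteger(limit) && limit >= 1
    ? Math.min(limit, maxLimit)
    : 20;
  return { page: safePage, limit: safeLimit, offset: (safePage - 1) * safeLimit };
}
```

Inside the handler, `const { page, limit, offset } = parsePagination(req.body);` would replace the raw destructured values and the manual offset calculation.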
Best Practices
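Quality Focus
- Use clean, high-quality input text
- Keep each input self-contained so its context is complete
- Account for domain-specific vocabulary
- Validate embedding quality on a sample of known-similar pairs
System Architecture
- Index the embedding column for nearest-neighbor search
- Cache embeddings for text that repeats
- Plan for scaling
- Monitor latency and token usage
Search Implementation
- Use approximate nearest-neighbor search for large collections
- Combine vector search with metadata filters
- Consider hybrid keyword and semantic approaches
- Tune similarity thresholds against labeled examples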
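One way to act on the threshold advice is to score a small set of labeled pairs and pick the cutoff that maximizes F1. The sketch assumes you already have similarity scores paired with human relevance labels; `f1AtThreshold`, `bestThreshold`, and the candidate grid are hypothetical.

```javascript
// F1 score when every sample with score > threshold is predicted relevant.
function f1AtThreshold(samples, threshold) {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (const { score, relevant } of samples) {
    const predicted = score > threshold;
    if (predicted && relevant) tp++;
    else if (predicted && !relevant) fp++;
    else if (!predicted && relevant) fn++;
  }
  const denominator = 2 * tp + fp + fn;
  return denominator === 0 ? 0 : (2 * tp) / denominator;
}

// Returns the candidate threshold with the highest F1 (first one wins on ties).
function bestThreshold(samples, candidates = [0.7, 0.75, 0.8, 0.85, 0.9]) {
  let best = { threshold: candidates[0], f1: -1 };
  for (const threshold of candidates) {
    const f1 = f1AtThreshold(samples, threshold);
    if (f1 > best.f1) best = { threshold, f1 };
  }
  return best;
}
```

F1 weighs precision and recall equally; if missing a relevant document is costlier than showing an irrelevant one, a recall-weighted metric may be a better fit.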
Token Management
| Plan | Monthly Token Quota |
|---|---|
| Free Tier | 300K tokens |
| $29 Plan | 3M tokens |
| $49 Plan | 4.5M tokens |
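To see how far a quota goes, divide it by the average number of tokens per document. The sketch below uses the quotas from the table; the plan keys and the helper name `documentsPerMonth` are hypothetical, and the average token count is something you would measure on your own data.

```javascript
// Monthly quotas from the table; the key names are illustrative.
const MONTHLY_TOKEN_QUOTA = {
  free: 300_000,
  plan29: 3_000_000,
  plan49: 4_500_000
};

// Whole documents that fit in one month's quota at a given average size.
function documentsPerMonth(plan, avgTokensPerDocument) {
  if (!Object.prototype.hasOwnProperty.call(MONTHLY_TOKEN_QUOTA, plan)) {
    throw new RangeError(`unknown plan: ${plan}`);
  }
  if (!(avgTokensPerDocument > 0)) {
    throw new RangeError('avgTokensPerDocument must be a positive number');
  }
  return Math.floor(MONTHLY_TOKEN_QUOTA[plan] / avgTokensPerDocument);
}
```

For example, at an average of 400 tokens per document, `documentsPerMonth('free', 400)` returns 750.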
When to Use
✅ Ideal For:
- Research applications
- Legal analysis
- Medical research
- Financial analysis
- Complex semantic tasks
❌ Consider Alternatives When:
- Speed is critical
- Resources are limited
- Basic similarity suffices
- Cost is a major factor
Getting Started
To begin using BGE Large embeddings: