Overview
Our text generation API provides state-of-the-art language models for a wide range of applications. With OpenAI-compatible endpoints and multiple model sizes, you can choose the right model for your specific needs.
OpenAI Compatibility
Our text generation models are fully compatible with OpenAI's Chat API, allowing for seamless migration from OpenAI's GPT models. Simply update the base URL and model name:
```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your-neuredge-key',
  baseURL: 'https://api.neuredge.dev/v1/'
});

const completion = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-70b-instruct', // Use any Neuredge model
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is quantum computing?' }
  ]
});
```
Available Models
| Category | Parameters | Models | Use Case | Performance |
|---|---|---|---|---|
| Base Models | ≤3B | TinyLlama, Phi-2, Gemma-2B | Rapid prototyping, development | Good |
| Small Models | 3.1B-8B | Llama-3.1-8B, Mistral-7B, DeepSeek Coder | Production applications | Better |
| Medium Models | 8.1B-20B | Qwen-14B, DeepSeek Math | Specialized tasks | Great |
| XLarge Models | 40B+ | Llama-3.1-70B | Maximum performance | Best |
Model Selection Guide
- Base Models (≤3B)
  - Ideal for: Development, testing, prototyping
  - Best for: Quick iterations, simple tasks
  - Trade-offs: Lower accuracy for faster speed
- Small Models (3.1B-8B)
  - Ideal for: Production applications
  - Best for: Most business use cases
  - Trade-offs: Balanced performance and speed
- Medium Models (8.1B-20B)
  - Ideal for: Specialized tasks
  - Best for: Domain-specific applications
  - Trade-offs: Better quality with moderate speed
- XLarge Models (40B+)
  - Ideal for: Maximum performance needs
  - Best for: Complex reasoning, research
  - Trade-offs: Highest quality but slower
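As a rough starting point, the guide above can be encoded as a small helper. This is a minimal sketch using only the two model IDs shown in this document; the decision criterion is an assumption, not an official recommendation.

```javascript
// Minimal sketch: choose between the two model IDs shown in this guide.
// The single complexReasoning flag is an illustrative simplification of
// the selection guide above, not an official Neuredge policy.
function pickModel({ complexReasoning = false } = {}) {
  return complexReasoning
    ? '@cf/meta/llama-3.1-70b-instruct' // XLarge: highest quality, slower
    : '@cf/meta/llama-3.1-8b-instruct'; // Small: balanced production default
}
```

In practice you would extend the mapping with latency and cost requirements, then benchmark a few candidate sizes on your own prompts before committing.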
Core Capabilities
1. Chat Completions
```javascript
const response = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-8b-instruct',
  messages: [
    {
      role: 'system',
      content: 'You are a knowledgeable assistant specializing in technology.'
    },
    {
      role: 'user',
      content: 'Explain how blockchain works in simple terms.'
    }
  ],
  temperature: 0.7,
  max_tokens: 500
});
```
2. Streaming Responses
```javascript
const stream = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-8b-instruct',
  messages: [{
    role: 'user',
    content: 'Write a story about space exploration'
  }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
3. Function Calling (Available on select models)
```javascript
const response = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-70b-instruct',
  messages: [{
    role: 'user',
    content: "What's the weather like in London?"
  }],
  functions: [{
    name: 'get_weather',
    description: 'Get current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: {
          type: 'string',
          description: 'City name'
        }
      },
      required: ['location']
    }
  }]
});
```
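When the model decides to call a function, the reply carries the call in `message.function_call` with JSON-encoded `arguments` (the legacy OpenAI `functions` response shape). A minimal dispatch sketch, assuming that shape; the `get_weather` handler here is a stand-in, not a real weather client:

```javascript
// Dispatch a function_call from a chat completion message
// (legacy OpenAI `functions` response shape).
function dispatchFunctionCall(message, handlers) {
  const call = message.function_call;
  if (!call) return null; // plain text reply, nothing to dispatch
  const args = JSON.parse(call.arguments); // arguments arrive as a JSON string
  const handler = handlers[call.name];
  if (!handler) throw new Error(`Unknown function: ${call.name}`);
  return handler(args);
}

// Stand-in implementation for illustration only.
const handlers = {
  get_weather: ({ location }) => ({ location, tempC: 17, summary: 'Cloudy' })
};

// Shape of response.choices[0].message when the model requests a call:
const message = {
  role: 'assistant',
  function_call: { name: 'get_weather', arguments: '{"location":"London"}' }
};

const result = dispatchFunctionCall(message, handlers);
```

The handler's return value would then be sent back to the model as a `function` role message so it can compose the final answer.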
Integration Examples
Express.js Chat API
```javascript
import express from 'express';
import OpenAI from 'openai';

const app = express();
app.use(express.json());

const openai = new OpenAI({
  apiKey: process.env.NEUREDGE_API_KEY,
  baseURL: 'https://api.neuredge.dev/v1/'
});

app.post('/chat', async (req, res) => {
  try {
    const { messages, stream = false } = req.body;

    if (stream) {
      const completionStream = await openai.chat.completions.create({
        model: '@cf/meta/llama-3.1-8b-instruct',
        messages,
        stream: true
      });

      res.setHeader('Content-Type', 'text/event-stream');
      for await (const chunk of completionStream) {
        res.write(`data: ${JSON.stringify(chunk)}\n\n`);
      }
      res.end();
    } else {
      const completion = await openai.chat.completions.create({
        model: '@cf/meta/llama-3.1-8b-instruct',
        messages
      });
      res.json(completion);
    }
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.listen(3000);
```
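On the client side, the streamed `/chat` response arrives as Server-Sent Events, one `data: {json}` record per chunk. A minimal parser for that framing, assuming exactly the `data: ...\n\n` format written by the route above:

```javascript
// Parse a buffer of SSE text into chat-completion chunk objects.
// Assumes the `data: {json}\n\n` framing used by the Express route above.
function parseSSE(buffer) {
  return buffer
    .split('\n\n')
    .map((event) => event.trim())
    .filter((event) => event.startsWith('data: '))
    .map((event) => JSON.parse(event.slice('data: '.length)));
}

// Two chunks as the route would emit them:
const raw =
  'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n' +
  'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n';

// Reassemble the text by concatenating each chunk's delta content.
const text = parseSSE(raw)
  .map((chunk) => chunk.choices[0]?.delta?.content || '')
  .join('');
// text === 'Hello'
```

A production client would feed a `fetch` response body through this parser incrementally, buffering partial events between reads.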
Next.js API Route with Streaming
```javascript
// pages/api/chat.js
import { OpenAIStream, StreamingTextResponse } from 'ai';
import OpenAI from 'openai';

// The Vercel AI SDK streaming helpers return a web Response,
// so this pages-router route must run on the Edge runtime,
// where the handler receives a Request with a json() method.
export const config = { runtime: 'edge' };

const openai = new OpenAI({
  apiKey: process.env.NEUREDGE_API_KEY,
  baseURL: 'https://api.neuredge.dev/v1/'
});

export default async function handler(req) {
  const { messages } = await req.json();

  const response = await openai.chat.completions.create({
    model: '@cf/meta/llama-3.1-8b-instruct',
    messages,
    stream: true
  });

  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}
```
Best Practices
- Model Selection
  - Choose based on task complexity
  - Consider latency requirements
  - Balance cost and performance
  - Test different sizes
- Prompt Engineering
  - Be specific and clear
  - Use system messages effectively
  - Include examples when needed
  - Consider temperature setting
- Performance Optimization
  - Implement streaming for long responses
  - Cache common responses
  - Use appropriate max_tokens
  - Monitor token usage
- Error Handling
  - Implement retry logic
  - Handle rate limits
  - Provide fallbacks
  - Monitor responses
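The retry advice above can be sketched as a small wrapper with exponential backoff. The retry count and delays below are illustrative defaults, not documented Neuredge limits:

```javascript
// Retry an async call with exponential backoff. Retries on any thrown
// error up to `retries` times; the delay doubles on each attempt.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of attempts, surface the error
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage sketch: wrap a chat completion call.
// const completion = await withRetry(() =>
//   openai.chat.completions.create({ model: '...', messages })
// );
```

A fuller version would retry only on transient failures (e.g. rate limits and 5xx responses) and honor any `Retry-After` header the API returns.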
Token Management
| Plan | Monthly Token Quota (Base Models) |
|---|---|
| Free Tier | 300K tokens |
| $29 Plan | 3M tokens |
| $49 Plan | 4.5M tokens |
Note: Token quotas vary by model size. See individual model pages for details.
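To track usage against these quotas, prefer the exact `usage` object returned with each completion; when you need a pre-flight estimate, a common rough heuristic is about 4 characters per token for English text. Both the heuristic and the helper below are assumptions for illustration:

```javascript
// Rough pre-flight token estimate for a chat request, using the common
// ~4 characters/token heuristic for English text. This is an approximation
// only; bill against the exact usage.total_tokens returned by the API.
function estimateTokens(messages) {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

const messages = [
  { role: 'system', content: 'You are a helpful assistant.' }, // 28 chars
  { role: 'user', content: 'What is quantum computing?' }      // 26 chars
];
// estimateTokens(messages) → 14  (54 chars / 4, rounded up)
```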
When to Use Each Model Size
Base Models (≤3B)
✅ Ideal For:
- Development and testing
- Quick prototypes
- Simple interactions
- Learning and exploration
Small Models (3.1B-8B)
✅ Ideal For:
- Production applications
- Customer support
- Content generation
- General business use
Medium Models (8.1B-20B)
✅ Ideal For:
- Specialized tasks
- Technical content
- Mathematical computations
- Domain expertise
XLarge Models (40B+)
✅ Ideal For:
- Complex reasoning
- Research applications
- Professional content
- Maximum accuracy needs
Getting Started
- Install the SDK:

```bash
npm install @neuredge/sdk
# or
pip install neuredge-sdk
```

- Initialize the client:

```javascript
// Using OpenAI SDK
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your-neuredge-key',
  baseURL: 'https://api.neuredge.dev/v1/'
});

// Or using native SDK
import { Neuredge } from '@neuredge/sdk';

const neuredge = new Neuredge({
  apiKey: 'your-api-key'
});
```