Small Models (6.7B-8B Parameters)

Our small models provide an excellent balance of performance and efficiency. With 6.7B to 8B parameters, these models offer strong capabilities while maintaining fast inference speeds and reasonable resource requirements.

Available Models

Llama-3.1-8B

  • Parameters: 8B
  • Context Window: 8192 tokens
  • Provider: Meta
  • License: Llama 3.1 Community License
  • Key Features:
    • Multilingual support
    • Efficient architecture
    • Balanced performance
    • JSON mode available
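JSON mode is typically requested through the OpenAI-compatible `response_format` parameter. A minimal sketch of building such a request (whether this deployment honors `response_format` exactly as OpenAI does is an assumption; the model id follows the examples further down):

```javascript
// Build a JSON-mode chat completion request for the
// OpenAI-compatible endpoint. Assumes the deployment accepts
// OpenAI's `response_format` parameter.
function buildJsonModeRequest(userPrompt) {
  return {
    model: '@cf/meta/llama-3.1-8b-instruct',
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content: 'Reply with a single JSON object and nothing else.'
      },
      { role: 'user', content: userPrompt }
    ],
    temperature: 0.2
  };
}

// The request would then be sent with:
//   const response = await openai.chat.completions.create(buildJsonModeRequest(prompt));
//   const data = JSON.parse(response.choices[0].message.content);
```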

Mistral-7B

  • Parameters: 7B
  • Context Window: 8192 tokens
  • Provider: Mistral AI
  • License: Apache 2.0
  • Key Features:
    • Strong few-shot learning
    • Efficient inference
    • General purpose capabilities

DeepSeek Coder 6.7B

  • Parameters: 6.7B
  • Training: 87% code, 13% natural language
  • Provider: DeepSeek AI
  • License: Apache 2.0
  • Key Features:
    • Code specialization
    • Multiple programming languages
    • Technical documentation generation

Configuration & Setup

OpenAI SDK Setup

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your-neuredge-key',
  baseURL: 'https://api.neuredge.dev/v1/'
});

Native SDK Setup

import { Neuredge } from '@neuredge/sdk';

const neuredge = new Neuredge({
  apiKey: 'your-api-key'
});

Real-World Applications & Examples

1. Customer Support Automation

Support Bot Implementation

const response = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-8b-instruct',
  messages: [
    {
      role: 'system',
      content: `You are a customer support agent for an e-commerce platform.
Be concise, helpful, and professional.
Current return policy: 30-day returns for unused items.`
    },
    {
      role: 'user',
      content: "I received my order yesterday but it's the wrong size. What should I do?"
    }
  ],
  temperature: 0.7,
  max_tokens: 200
});

Use Cases:

  • 24/7 customer support
  • FAQ automation
  • Order status inquiries
  • Return processing assistance
  • Product recommendations

2. Code Generation & Documentation

Code Assistant

const response = await openai.chat.completions.create({
  model: '@cf/deepseek-coder-6.7b-base',
  messages: [
    {
      role: 'system',
      content: 'You are a coding assistant. Provide clean, well-documented code with explanations.'
    },
    {
      role: 'user',
      content: `Create a React component for a responsive navigation bar with:
- Mobile hamburger menu
- Dark/light theme toggle
- User profile dropdown`
    }
  ],
  temperature: 0.3,
  max_tokens: 500
});

Use Cases:

  • Code completion
  • Documentation generation
  • Code review assistance
  • Bug fixing suggestions
  • Architecture patterns

3. Content Optimization

SEO Content Enhancement

// originalDescription holds the existing product copy to optimize
const response = await openai.chat.completions.create({
  model: '@cf/mistral/mistral-7b-instruct-v0.2',
  messages: [
    {
      role: 'system',
      content: 'You are an SEO expert. Optimize content while maintaining readability and value.'
    },
    {
      role: 'user',
      content: `Optimize this product description for SEO:
${originalDescription}
Target keywords: ergonomic office chair, lumbar support, adjustable height`
    }
  ],
  temperature: 0.6,
  max_tokens: 300
});

Use Cases:

  • Content optimization
  • Meta description generation
  • Keyword integration
  • Title tag creation
  • Product descriptions
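For the meta-description use case, model output usually needs a length guard before publishing. A hypothetical post-processing helper (the 155-character budget is a common SEO convention, not a requirement of any platform here):

```javascript
// Trim a model-generated meta description to a character budget,
// collapsing whitespace and cutting at the last word boundary.
function clampMetaDescription(text, maxChars = 155) {
  const clean = text.trim().replace(/\s+/g, ' ');
  if (clean.length <= maxChars) return clean;
  const cut = clean.slice(0, maxChars - 1);
  const lastSpace = cut.lastIndexOf(' ');
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + '…';
}
```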

4. Data Analysis & Reporting

Report Generation

// salesData holds the monthly sales figures to analyze
const response = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-8b-instruct',
  messages: [
    {
      role: 'system',
      content: 'You are a data analyst generating insights from sales data.'
    },
    {
      role: 'user',
      content: `Analyze this monthly sales data and create a summary report:
${salesData}
Focus on key trends, anomalies, and recommendations.`
    }
  ],
  temperature: 0.4,
  max_tokens: 600
});

Use Cases:

  • Sales analysis
  • Performance reporting
  • Trend identification
  • Data summarization
  • KPI tracking
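Pre-aggregating the raw numbers before prompting keeps token usage down and gives the model cleaner input. A sketch assuming a hypothetical `monthlySales` shape of `{ month, revenue }` records:

```javascript
// Compute month-over-month revenue change so the prompt can carry
// a compact summary instead of raw transaction rows.
function summarizeSales(monthlySales) {
  return monthlySales.map((row, i) => {
    const prev = i > 0 ? monthlySales[i - 1].revenue : null;
    const change = prev ? ((row.revenue - prev) / prev) * 100 : null;
    return {
      month: row.month,
      revenue: row.revenue,
      changePct: change === null ? null : Number(change.toFixed(1))
    };
  });
}
```

The JSON output of `summarizeSales` can then be interpolated into the prompt in place of `salesData`.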

Integration Examples

Express.js API with Rate Limiting

import express from 'express';
import rateLimit from 'express-rate-limit';
import OpenAI from 'openai';

const app = express();
app.use(express.json());

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100 // limit each IP to 100 requests per windowMs
});

const openai = new OpenAI({
  apiKey: process.env.NEUREDGE_API_KEY,
  baseURL: 'https://api.neuredge.dev/v1/'
});

app.post('/optimize-content', limiter, async (req, res) => {
  try {
    const { content, keywords } = req.body;

    const response = await openai.chat.completions.create({
      model: '@cf/mistral/mistral-7b-instruct-v0.2',
      messages: [
        {
          role: 'system',
          content: 'Optimize content for SEO while maintaining natural flow.'
        },
        {
          role: 'user',
          content: `Content: ${content}\nTarget keywords: ${keywords.join(', ')}`
        }
      ]
    });

    res.json({
      optimizedContent: response.choices[0].message.content,
      usage: response.usage
    });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

Next.js with Streaming

// pages/api/stream-completion.js
import { OpenAIStream, StreamingTextResponse } from 'ai';
import OpenAI from 'openai';

// The Edge runtime is required here: `req` is then a Fetch API Request,
// so `req.json()` and returning a Response both work.
export const config = { runtime: 'edge' };

const openai = new OpenAI({
  apiKey: process.env.NEUREDGE_API_KEY,
  baseURL: 'https://api.neuredge.dev/v1/'
});

export default async function handler(req) {
  const { prompt } = await req.json();

  const response = await openai.chat.completions.create({
    model: '@cf/meta/llama-3.1-8b-instruct',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });

  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}

Best Practices

  1. Model Selection

    • Use Llama-3.1-8B for general tasks
    • Choose DeepSeek Coder for programming
    • Select Mistral-7B for balanced performance
  2. Performance Optimization

    • Implement response streaming
    • Use appropriate temperature settings
    • Cache common responses
    • Monitor token usage
  3. Error Handling

    • Implement retry logic
    • Handle rate limits
    • Provide fallback responses
    • Log errors for monitoring
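The retry and rate-limit points above can be sketched as a small wrapper around any SDK call. The backoff schedule and the set of retryable status codes are illustrative choices, not platform guarantees:

```javascript
// Retry an async call on 429/5xx-style failures with exponential backoff.
// `fn` is any function returning a promise, e.g. an OpenAI SDK call.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const status = error.status ?? error.response?.status;
      const retryable = status === 429 || (status >= 500 && status < 600);
      if (!retryable || attempt >= retries) throw error;
      // Exponential backoff: 500ms, 1s, 2s, ...
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```

Usage would look like `const response = await withRetry(() => openai.chat.completions.create(params));`, with non-retryable errors (e.g. a 400) propagating immediately to the caller.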

Token Management

Plan        Monthly Token Quota
Free Tier   200K tokens
$29 Plan    2M tokens
$49 Plan    3M tokens
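A client-side sketch for staying inside a monthly quota, fed by the `usage.total_tokens` field returned with each completion. The quota figures match the table above; the enforcement itself is purely local and hypothetical:

```javascript
// Track cumulative token usage against a monthly plan quota.
const PLAN_QUOTAS = {
  free: 200_000,
  plan29: 2_000_000,
  plan49: 3_000_000
};

class QuotaTracker {
  constructor(plan) {
    this.quota = PLAN_QUOTAS[plan];
    this.used = 0;
  }
  // Call after each completion with response.usage.total_tokens.
  record(totalTokens) {
    this.used += totalTokens;
  }
  remaining() {
    return Math.max(0, this.quota - this.used);
  }
  wouldExceed(estimatedTokens) {
    return this.used + estimatedTokens > this.quota;
  }
}
```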

When to Use

Ideal For:

  • Production applications
  • Customer support systems
  • Content generation
  • Code assistance
  • Data analysis
  • Most business applications

Consider Alternatives When:

  • Extremely complex reasoning needed
  • Very long context required
  • Maximum accuracy critical
  • Resource constraints exist

Getting Started

To start using our small models: