Small Models (6.7B-8B Parameters)

Our small models provide an excellent balance of performance and efficiency. With 6.7B to 8B parameters, these models offer strong capabilities while maintaining fast inference speeds and reasonable resource requirements.

Available Models

Llama-3.1-8B

  • Parameters: 8B
  • Context Window: 8192 tokens
  • Provider: Meta
  • License: Llama 3.1 Community License
  • Key Features:
    • Multilingual support
    • Efficient architecture
    • Balanced performance
    • JSON mode available
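JSON mode is typically requested through the OpenAI-compatible `response_format` parameter. A minimal sketch of building such a request (whether this deployment honors `response_format` exactly as OpenAI does is an assumption; the model id follows the examples further down):

```javascript
// Build a JSON-mode chat completion request for the
// OpenAI-compatible endpoint. Assumes the deployment accepts
// OpenAI's `response_format` parameter.
function buildJsonModeRequest(userPrompt) {
  return {
    model: '@cf/meta/llama-3.1-8b-instruct',
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content: 'Reply with a single JSON object and nothing else.'
      },
      { role: 'user', content: userPrompt }
    ],
    temperature: 0.2
  };
}

// The request would then be sent with:
//   const response = await openai.chat.completions.create(buildJsonModeRequest(prompt));
//   const data = JSON.parse(response.choices[0].message.content);
```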

Mistral-7B

  • Parameters: 7B
  • Context Window: 8192 tokens
  • Provider: Mistral AI
  • License: Apache 2.0
  • Key Features:
    • Strong few-shot learning
    • Efficient inference
    • General purpose capabilities

DeepSeek Coder 6.7B

  • Parameters: 6.7B
  • Training: 87% code, 13% natural language
  • Provider: DeepSeek AI
  • License: Apache 2.0
  • Key Features:
    • Code specialization
    • Multiple programming languages
    • Technical documentation generation

Configuration & Setup

OpenAI SDK Setup

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your-neuredge-key',
  baseURL: 'https://api.neuredge.dev/v1/'
});

Native SDK Setup

import { Neuredge } from '@neuredge/sdk';

const neuredge = new Neuredge({
  apiKey: 'your-api-key'
});

Real-World Applications & Examples

1. Customer Support Automation

Support Bot Implementation

const response = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-8b-instruct',
  messages: [
    {
      role: 'system',
      content: `You are a customer support agent for an e-commerce platform.
Be concise, helpful, and professional.
Current return policy: 30-day returns for unused items.`
    },
    {
      role: 'user',
      content: "I received my order yesterday but it's the wrong size. What should I do?"
    }
  ],
  temperature: 0.7,
  max_tokens: 200
});

Use Cases:

  • 24/7 customer support
  • FAQ automation
  • Order status inquiries
  • Return processing assistance
  • Product recommendations

2. Code Generation & Documentation

Code Assistant

const response = await openai.chat.completions.create({
  model: '@cf/deepseek-coder-6.7b-base',
  messages: [
    {
      role: 'system',
      content: 'You are a coding assistant. Provide clean, well-documented code with explanations.'
    },
    {
      role: 'user',
      content: `Create a React component for a responsive navigation bar with:
- Mobile hamburger menu
- Dark/light theme toggle
- User profile dropdown`
    }
  ],
  temperature: 0.3,
  max_tokens: 500
});

Use Cases:

  • Code completion
  • Documentation generation
  • Code review assistance
  • Bug fixing suggestions
  • Architecture patterns

3. Content Optimization

SEO Content Enhancement

// originalDescription holds the existing product copy to optimize
const response = await openai.chat.completions.create({
  model: '@cf/mistral/mistral-7b-instruct-v0.2',
  messages: [
    {
      role: 'system',
      content: 'You are an SEO expert. Optimize content while maintaining readability and value.'
    },
    {
      role: 'user',
      content: `Optimize this product description for SEO:
${originalDescription}
Target keywords: ergonomic office chair, lumbar support, adjustable height`
    }
  ],
  temperature: 0.6,
  max_tokens: 300
});

Use Cases:

  • Content optimization
  • Meta description generation
  • Keyword integration
  • Title tag creation
  • Product descriptions
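For the meta-description use case, model output usually needs a length guard before publishing. A hypothetical post-processing helper (the 155-character budget is a common SEO convention, not a requirement of any platform here):

```javascript
// Trim a model-generated meta description to a character budget,
// collapsing whitespace and cutting at the last word boundary.
function clampMetaDescription(text, maxChars = 155) {
  const clean = text.trim().replace(/\s+/g, ' ');
  if (clean.length <= maxChars) return clean;
  const cut = clean.slice(0, maxChars - 1);
  const lastSpace = cut.lastIndexOf(' ');
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + '…';
}
```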

4. Data Analysis & Reporting

Report Generation

// salesData holds the monthly sales figures to analyze
const response = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-8b-instruct',
  messages: [
    {
      role: 'system',
      content: 'You are a data analyst generating insights from sales data.'
    },
    {
      role: 'user',
      content: `Analyze this monthly sales data and create a summary report:
${salesData}
Focus on key trends, anomalies, and recommendations.`
    }
  ],
  temperature: 0.4,
  max_tokens: 600
});

Use Cases:

  • Sales analysis
  • Performance reporting
  • Trend identification
  • Data summarization
  • KPI tracking
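Pre-aggregating the raw numbers before prompting keeps token usage down and gives the model cleaner input. A sketch assuming a hypothetical `monthlySales` shape of `{ month, revenue }` records:

```javascript
// Compute month-over-month revenue change so the prompt can carry
// a compact summary instead of raw transaction rows.
function summarizeSales(monthlySales) {
  return monthlySales.map((row, i) => {
    const prev = i > 0 ? monthlySales[i - 1].revenue : null;
    const change = prev ? ((row.revenue - prev) / prev) * 100 : null;
    return {
      month: row.month,
      revenue: row.revenue,
      changePct: change === null ? null : Number(change.toFixed(1))
    };
  });
}
```

The JSON output of `summarizeSales` can then be interpolated into the prompt in place of `salesData`.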

Integration Examples

Express.js API with Rate Limiting

import express from 'express';
import rateLimit from 'express-rate-limit';
import OpenAI from 'openai';

const app = express();
app.use(express.json());

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100 // limit each IP to 100 requests per windowMs
});

const openai = new OpenAI({
  apiKey: process.env.NEUREDGE_API_KEY,
  baseURL: 'https://api.neuredge.dev/v1/'
});

app.post('/optimize-content', limiter, async (req, res) => {
  try {
    const { content, keywords } = req.body;

    const response = await openai.chat.completions.create({
      model: '@cf/mistral/mistral-7b-instruct-v0.2',
      messages: [
        {
          role: 'system',
          content: 'Optimize content for SEO while maintaining natural flow.'
        },
        {
          role: 'user',
          content: `Content: ${content}\nTarget keywords: ${keywords.join(', ')}`
        }
      ]
    });

    res.json({
      optimizedContent: response.choices[0].message.content,
      usage: response.usage
    });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

Next.js with Streaming

// pages/api/stream-completion.js
import { OpenAIStream, StreamingTextResponse } from 'ai';
import OpenAI from 'openai';

// The Edge runtime is required here: `req` is then a Fetch API Request,
// so `req.json()` and returning a Response both work.
export const config = { runtime: 'edge' };

const openai = new OpenAI({
  apiKey: process.env.NEUREDGE_API_KEY,
  baseURL: 'https://api.neuredge.dev/v1/'
});

export default async function handler(req) {
  const { prompt } = await req.json();

  const response = await openai.chat.completions.create({
    model: '@cf/meta/llama-3.1-8b-instruct',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });

  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}

Best Practices

  1. Model Selection

    • Use Llama-3.1-8B for general tasks
    • Choose DeepSeek Coder for programming
    • Select Mistral-7B for balanced performance
  2. Performance Optimization

    • Implement response streaming
    • Use appropriate temperature settings
    • Cache common responses
    • Monitor token usage
  3. Error Handling

    • Implement retry logic
    • Handle rate limits
    • Provide fallback responses
    • Log errors for monitoring
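The retry and rate-limit points above can be sketched as a small wrapper around any SDK call. The backoff schedule and the set of retryable status codes are illustrative choices, not platform guarantees:

```javascript
// Retry an async call on 429/5xx-style failures with exponential backoff.
// `fn` is any function returning a promise, e.g. an OpenAI SDK call.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const status = error.status ?? error.response?.status;
      const retryable = status === 429 || (status >= 500 && status < 600);
      if (!retryable || attempt >= retries) throw error;
      // Exponential backoff: 500ms, 1s, 2s, ...
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```

Usage would look like `const response = await withRetry(() => openai.chat.completions.create(params));`, with non-retryable errors (e.g. a 400) propagating immediately to the caller.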

Token Management

Plan        Monthly Token Quota
Free Tier   200K tokens
$29 Plan    2M tokens
$49 Plan    3M tokens
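A client-side sketch for staying inside a monthly quota, fed by the `usage.total_tokens` field returned with each completion. The quota figures match the table above; the enforcement itself is purely local and hypothetical:

```javascript
// Track cumulative token usage against a monthly plan quota.
const PLAN_QUOTAS = {
  free: 200_000,
  plan29: 2_000_000,
  plan49: 3_000_000
};

class QuotaTracker {
  constructor(plan) {
    this.quota = PLAN_QUOTAS[plan];
    this.used = 0;
  }
  // Call after each completion with response.usage.total_tokens.
  record(totalTokens) {
    this.used += totalTokens;
  }
  remaining() {
    return Math.max(0, this.quota - this.used);
  }
  wouldExceed(estimatedTokens) {
    return this.used + estimatedTokens > this.quota;
  }
}
```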

When to Use

Ideal For:

  • Production applications
  • Customer support systems
  • Content generation
  • Code assistance
  • Data analysis
  • Most business applications

Consider Alternatives When:

  • Extremely complex reasoning needed
  • Very long context required
  • Maximum accuracy critical
  • Resource constraints exist

Getting Started

To start using our small models: