Overview
Our text generation API provides state-of-the-art language models for a wide range of applications. With OpenAI-compatible endpoints and multiple model sizes, you can choose the right model for your specific needs.
OpenAI Compatibility
Our text generation models are fully compatible with OpenAI's Chat API, allowing for seamless migration from OpenAI's GPT models. Simply update the base URL and model name:
```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your-neuredge-key',
  baseURL: 'https://api.neuredge.dev/v1/'
});

const completion = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-70b-instruct', // Use any Neuredge model
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is quantum computing?' }
  ]
});
```
Available Models
| Category | Parameters | Models | Use Case | Performance |
|---|---|---|---|---|
| Base Models | ≤3B | TinyLlama, Phi-2, Gemma-2B | Rapid prototyping, development | Good |
| Small Models | 3.1B-8B | Llama-3.1-8B, Mistral-7B, DeepSeek Coder | Production applications | Better |
| Medium Models | 8.1B-20B | Qwen-14B, DeepSeek Math | Specialized tasks | Great |
| XLarge Models | 40B+ | Llama-3.1-70B | Maximum performance | Best |
Model Selection Guide
- Base Models (≤3B)
  - Ideal for: Development, testing, prototyping
  - Best for: Quick iterations, simple tasks
  - Trade-offs: Lower accuracy for faster speed
- Small Models (3.1B-8B)
  - Ideal for: Production applications
  - Best for: Most business use cases
  - Trade-offs: Balanced performance and speed
- Medium Models (8.1B-20B)
  - Ideal for: Specialized tasks
  - Best for: Domain-specific applications
  - Trade-offs: Better quality with moderate speed
- XLarge Models (40B+)
  - Ideal for: Maximum performance needs
  - Best for: Complex reasoning, research
  - Trade-offs: Highest quality but slower
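As a rough starting point, the guide above can be encoded as a small helper. This is a minimal sketch using only the two model IDs shown in this document; the decision criterion is an assumption, not an official recommendation.

```javascript
// Minimal sketch: choose between the two model IDs shown in this guide.
// The single complexReasoning flag is an illustrative simplification of
// the selection guide above, not an official Neuredge policy.
function pickModel({ complexReasoning = false } = {}) {
  return complexReasoning
    ? '@cf/meta/llama-3.1-70b-instruct' // XLarge: highest quality, slower
    : '@cf/meta/llama-3.1-8b-instruct'; // Small: balanced production default
}
```

In practice you would extend the mapping with latency and cost requirements, then benchmark a few candidate sizes on your own prompts before committing.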
Core Capabilities
1. Chat Completions
```javascript
const response = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-8b-instruct',
  messages: [
    {
      role: 'system',
      content: 'You are a knowledgeable assistant specializing in technology.'
    },
    {
      role: 'user',
      content: 'Explain how blockchain works in simple terms.'
    }
  ],
  temperature: 0.7,
  max_tokens: 500
});
```
2. Streaming Responses
```javascript
const stream = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-8b-instruct',
  messages: [{
    role: 'user',
    content: 'Write a story about space exploration'
  }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
3. Function Calling (Available on select models)
```javascript
const response = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-70b-instruct',
  messages: [{
    role: 'user',
    content: "What's the weather like in London?"
  }],
  functions: [{
    name: 'get_weather',
    description: 'Get current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: {
          type: 'string',
          description: 'City name'
        }
      },
      required: ['location']
    }
  }]
});
```
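When the model decides to call a function, the reply carries the call in `message.function_call` with JSON-encoded `arguments` (the legacy OpenAI `functions` response shape). A minimal dispatch sketch, assuming that shape; the `get_weather` handler here is a stand-in, not a real weather client:

```javascript
// Dispatch a function_call from a chat completion message
// (legacy OpenAI `functions` response shape).
function dispatchFunctionCall(message, handlers) {
  const call = message.function_call;
  if (!call) return null; // plain text reply, nothing to dispatch
  const args = JSON.parse(call.arguments); // arguments arrive as a JSON string
  const handler = handlers[call.name];
  if (!handler) throw new Error(`Unknown function: ${call.name}`);
  return handler(args);
}

// Stand-in implementation for illustration only.
const handlers = {
  get_weather: ({ location }) => ({ location, tempC: 17, summary: 'Cloudy' })
};

// Shape of response.choices[0].message when the model requests a call:
const message = {
  role: 'assistant',
  function_call: { name: 'get_weather', arguments: '{"location":"London"}' }
};

const result = dispatchFunctionCall(message, handlers);
```

The handler's return value would then be sent back to the model as a `function` role message so it can compose the final answer.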
Integration Examples
Express.js Chat API
```javascript
import express from 'express';
import OpenAI from 'openai';

const app = express();
app.use(express.json());

const openai = new OpenAI({
  apiKey: process.env.NEUREDGE_API_KEY,
  baseURL: 'https://api.neuredge.dev/v1/'
});

app.post('/chat', async (req, res) => {
  try {
    const { messages, stream = false } = req.body;

    if (stream) {
      const completionStream = await openai.chat.completions.create({
        model: '@cf/meta/llama-3.1-8b-instruct',
        messages,
        stream: true
      });

      res.setHeader('Content-Type', 'text/event-stream');
      for await (const chunk of completionStream) {
        res.write(`data: ${JSON.stringify(chunk)}\n\n`);
      }
      res.end();
    } else {
      const completion = await openai.chat.completions.create({
        model: '@cf/meta/llama-3.1-8b-instruct',
        messages
      });
      res.json(completion);
    }
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.listen(3000);
```
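On the client side, the streamed `/chat` response arrives as Server-Sent Events, one `data: {json}` record per chunk. A minimal parser for that framing, assuming exactly the `data: ...\n\n` format written by the route above:

```javascript
// Parse a buffer of SSE text into chat-completion chunk objects.
// Assumes the `data: {json}\n\n` framing used by the Express route above.
function parseSSE(buffer) {
  return buffer
    .split('\n\n')
    .map((event) => event.trim())
    .filter((event) => event.startsWith('data: '))
    .map((event) => JSON.parse(event.slice('data: '.length)));
}

// Two chunks as the route would emit them:
const raw =
  'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n' +
  'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n';

// Reassemble the text by concatenating each chunk's delta content.
const text = parseSSE(raw)
  .map((chunk) => chunk.choices[0]?.delta?.content || '')
  .join('');
// text === 'Hello'
```

A production client would feed a `fetch` response body through this parser incrementally, buffering partial events between reads.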
Next.js API Route with Streaming
```javascript
// pages/api/chat.js
import { OpenAIStream, StreamingTextResponse } from 'ai';
import OpenAI from 'openai';

// The Vercel AI SDK streaming helpers return a web Response,
// so this pages-router route must run on the Edge runtime,
// where the handler receives a Request with a json() method.
export const config = { runtime: 'edge' };

const openai = new OpenAI({
  apiKey: process.env.NEUREDGE_API_KEY,
  baseURL: 'https://api.neuredge.dev/v1/'
});

export default async function handler(req) {
  const { messages } = await req.json();

  const response = await openai.chat.completions.create({
    model: '@cf/meta/llama-3.1-8b-instruct',
    messages,
    stream: true
  });

  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}
```
Best Practices
- Model Selection
  - Choose based on task complexity
  - Consider latency requirements
  - Balance cost and performance
  - Test different sizes
- Prompt Engineering
  - Be specific and clear
  - Use system messages effectively
  - Include examples when needed
  - Consider temperature setting
- Performance Optimization
  - Implement streaming for long responses
  - Cache common responses
  - Use appropriate max_tokens
  - Monitor token usage
- Error Handling
  - Implement retry logic
  - Handle rate limits
  - Provide fallbacks
  - Monitor responses
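The retry advice above can be sketched as a small wrapper with exponential backoff. The retry count and delays below are illustrative defaults, not documented Neuredge limits:

```javascript
// Retry an async call with exponential backoff. Retries on any thrown
// error up to `retries` times; the delay doubles on each attempt.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of attempts, surface the error
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage sketch: wrap a chat completion call.
// const completion = await withRetry(() =>
//   openai.chat.completions.create({ model: '...', messages })
// );
```

A fuller version would retry only on transient failures (e.g. rate limits and 5xx responses) and honor any `Retry-After` header the API returns.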
Token Management
| Plan | Monthly Token Quota (Base Models) |
|---|---|
| Free Tier | 300K tokens |
| $29 Plan | 3M tokens |
| $49 Plan | 4.5M tokens |
Note: Token quotas vary by model size. See individual model pages for details.
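To track usage against these quotas, prefer the exact `usage` object returned with each completion; when you need a pre-flight estimate, a common rough heuristic is about 4 characters per token for English text. Both the heuristic and the helper below are assumptions for illustration:

```javascript
// Rough pre-flight token estimate for a chat request, using the common
// ~4 characters/token heuristic for English text. This is an approximation
// only; bill against the exact usage.total_tokens returned by the API.
function estimateTokens(messages) {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

const messages = [
  { role: 'system', content: 'You are a helpful assistant.' }, // 28 chars
  { role: 'user', content: 'What is quantum computing?' }      // 26 chars
];
// estimateTokens(messages) → 14  (54 chars / 4, rounded up)
```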
When to Use Each Model Size
Base Models (≤3B)
✅ Ideal For:
- Development and testing
- Quick prototypes
- Simple interactions
- Learning and exploration
Small Models (3.1B-8B)
✅ Ideal For:
- Production applications
- Customer support
- Content generation
- General business use
Medium Models (8.1B-20B)
✅ Ideal For:
- Specialized tasks
- Technical content
- Mathematical computations
- Domain expertise
XLarge Models (40B+)
✅ Ideal For:
- Complex reasoning
- Research applications
- Professional content
- Maximum accuracy needs
Getting Started
- Install the SDK:

```bash
npm install @neuredge/sdk
# or
pip install neuredge-sdk
```

- Initialize the client:

```javascript
// Using OpenAI SDK
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your-neuredge-key',
  baseURL: 'https://api.neuredge.dev/v1/'
});

// Or using native SDK
import { Neuredge } from '@neuredge/sdk';

const neuredge = new Neuredge({
  apiKey: 'your-api-key'
});
```