Which models are supported?

Llama 3, Mistral and other vLLM-compatible open-source models, served on dedicated LPUs.

How does flat-rate work?

You rent a dedicated LPU slice for a fixed monthly price: no per-token billing, no token shock.

Neural: Dedicated LPUs for flat-rate inference

Overview

Neural provides dedicated LPU (Language Processing Unit) accelerators optimised for low-latency inference and high token throughput on open-source models (Llama, Mistral and more). They use up to 70% less energy than traditional GPUs, and you get a predictable flat-rate bill instead of per-token bill shock.

Dedicated LPUs, open-source models

Neural serves open-source models such as Llama and Mistral on dedicated, non-shared LPU accelerators, with real-time token streaming and edge latency under 20 ms. LPU inference draws up to 70% less energy than comparable GPU inference.

Flat rate beats token shock

Instead of per-token pricing that punishes high-utilisation agent workloads, Neural sells a predictable monthly slice. Above roughly 30% sustained utilisation, flat-rate inference works out cheaper than per-token billing, and the bill stays forecastable at any utilisation level.
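To see where a break-even point like this comes from, here is a minimal sketch of the arithmetic. Every number in it is a hypothetical placeholder, not Neural's pricing or throughput: the flat monthly price, the slice's peak tokens per second, the per-token rate and the 30-day month are all assumptions chosen for illustration. The break-even utilisation is the flat price divided by what the same slice would cost at 100% utilisation under per-token billing.

```python
# Illustrative break-even arithmetic for flat-rate vs per-token billing.
# All figures are hypothetical assumptions, not published prices.

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def per_token_cost(utilisation, peak_tokens_per_s, usd_per_million_tokens):
    """Monthly per-token bill for a workload running at a given
    fraction (0.0 to 1.0) of the slice's peak throughput."""
    tokens = utilisation * peak_tokens_per_s * SECONDS_PER_MONTH
    return tokens / 1_000_000 * usd_per_million_tokens


def break_even_utilisation(flat_usd_per_month, peak_tokens_per_s,
                           usd_per_million_tokens):
    """Sustained utilisation above which the flat rate is cheaper."""
    cost_at_full_load = per_token_cost(
        1.0, peak_tokens_per_s, usd_per_million_tokens
    )
    return flat_usd_per_month / cost_at_full_load


if __name__ == "__main__":
    # Hypothetical inputs: $1,500/month slice, 1,000 tokens/s peak,
    # $2.00 per million tokens on a per-token plan.
    u = break_even_utilisation(1500.0, 1000.0, 2.0)
    print(f"Break-even utilisation: {u:.1%}")
    for load in (0.1, 0.3, 0.6):
        bill = per_token_cost(load, 1000.0, 2.0)
        print(f"{load:.0%} load: per-token ${bill:,.2f} vs flat $1,500.00")
```

With these placeholder inputs the break-even lands a little under 30%. Your own figure depends on the per-token rate you are comparing against and the throughput your slice sustains, so substitute real numbers before drawing conclusions.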

Compared to Together AI, Fireworks, Baseten and AWS Bedrock.