Back to products
Beta

Citadea Neural

Dedicated LPUs for flat-rate inference

Sovereign inference on dedicated LPUs, billed at a flat rate — so an agent reasoning loop never produces a frightening invoice.

Overview

Dedicated LPU (Language Processing Unit) accelerators optimised for low-latency inference and high token throughput on open-source models (Llama, Mistral and more). Up to 70% less energy than traditional GPUs — and a predictable flat-rate bill instead of per-token shock.

Dedicated LPUs, open-source models

Neural serves open-source models such as Llama and Mistral on dedicated, non-shared LPU accelerators, with real-time token streaming and edge latency under 20 ms. LPUs draw up to 70% less energy than comparable GPU inference.

Flat rate beats token shock

Instead of per-token pricing that punishes high-utilisation agent workloads, Neural sells a predictable monthly slice. Above roughly 30% sustained utilisation, flat-rate inference is simply cheaper — and always forecastable.

Compared to Together AI, Fireworks, Baseten and AWS Bedrock.

What's included

  • Dedicated LPUs (non-shared)
  • Open-source models (Llama, Mistral)
  • Real-time token streaming
  • Predictable flat-rate pricing
  • EU AI Act-aligned
  • Edge latency < 20 ms

Quickstart

$ citadea neural infer --model <id>

Benchmarks

-70%
Energy vs GPU
< 20 ms
P99 latency

Use cases

Llama / Mistral inferenceProduction RAGEnterprise chatbotsReal-time streaming

Integrations

OpenAI-compatible APIvLLMTGILangChain

Frequently asked