Question 1

What is private AI?

Accepted Answer

Private AI is artificial intelligence deployed on infrastructure you own and control, so that data processed by the model — prompts, documents, embeddings, and responses — never leaves your network. It is the opposite of hosted AI APIs, which send your data to a third party's servers for processing.

Question 2

What does 'zero data egress' mean for AI?

Accepted Answer

Zero data egress means no part of an AI interaction — the input, the model's intermediate computations, or the response — is transmitted outside your network boundary. There are no API calls to external services, no logging by a vendor, and no data retained by a third party.

Question 3

How do you deploy a private AI model on-premises?

Accepted Answer

Private AI deployment on-premises requires GPU-capable servers (NVIDIA A100, H100, or equivalent), a model-serving runtime such as vLLM or Ollama, and a platform to manage inference, RAG pipelines, and access control. Cube AI packages these components into a single deployable platform with an OpenAI-compatible API.

Question 4

What is the difference between private AI and confidential AI?

Accepted Answer

Private AI means the model runs on your own hardware, so the data never leaves your network. Confidential AI goes further: the model runs inside a Trusted Execution Environment (TEE) where even the infrastructure operator — including your own privileged admins — cannot read the model weights or inference data. For most organizations, private AI satisfies the requirement; confidential AI is used for the highest-assurance workloads.

Question 5

Which open-weight models can I use for private AI deployment?

Accepted Answer

Any open-weight model that runs on the vLLM or Ollama runtimes is supported: Llama 3, Mistral, Mixtral, Phi-3, Qwen, DeepSeek, and others. Cube AI is model-agnostic — you choose the model based on your capability and hardware budget requirements.

Question 6

What GPU hardware is needed for self-hosted LLM inference?

Accepted Answer

Hardware requirements depend on model size. A 7B parameter model runs well on a single NVIDIA A10 or RTX 4090. A 70B model typically needs multiple A100 80GB or H100 GPUs. Cube AI supports both consumer-grade and data-centre GPUs and can run quantized models to reduce hardware requirements.

Question 7

Does private AI deployment comply with GDPR?

Accepted Answer

Private AI on your own infrastructure eliminates the cross-border data transfer risk that makes hosted AI APIs problematic under GDPR Article 44. Because no data leaves your perimeter, there are no international transfers to third-country cloud providers and no data processing agreements required with an AI vendor. Data protection by design (Article 25) is satisfied at the infrastructure level.

Frontier models,
zero data leaving your perimeter.

Private AI runs frontier models inside your infrastructure — no API, no egress.

Hosted AI means
data leaving your control.

Leading with Cube AI.

Cube AI

Prism AI

Common questions,
answered precisely.

Private inference,
on your terms.

Frontier models,zero data leaving your perimeter.