Discover how to build smarter, more efficient AI inference systems. Learn about quantization, sparsity, and high-performance runtimes like vLLM with Red Hat AI.
This e-book introduces the fundamentals of inference performance engineering and model optimization. It focuses on quantization, sparsity, and other techniques that reduce compute and memory requirements, as well as runtime systems like virtual large language model (vLLM) that make inference more efficient.
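As a taste of what the e-book covers, here is a minimal sketch of serving a quantized model with vLLM's offline Python API. The model name is an example of a pre-quantized AWQ checkpoint, not a recommendation, and the sampling settings are illustrative assumptions:

```python
# Minimal sketch: offline inference with vLLM using a quantized model.
# Assumes vLLM is installed (pip install vllm) and a GPU is available.
from vllm import LLM, SamplingParams

# Load an AWQ-quantized checkpoint; quantization reduces memory and
# compute requirements compared with full-precision weights.
# The model ID below is a hypothetical example of an AWQ checkpoint.
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")

# Illustrative sampling parameters.
params = SamplingParams(temperature=0.7, max_tokens=64)

# Generate completions; vLLM batches and schedules requests efficiently.
outputs = llm.generate(["Explain quantization in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```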