Apple embraces Nvidia GPUs to accelerate LLM inference via its open source ReDrafter tech




  • ReDrafter delivers 2.7x more tokens per second than traditional auto-regressive decoding
  • ReDrafter could reduce latency for users while using fewer GPUs
  • Apple hasn’t said when ReDrafter will be deployed on rival AI GPUs from AMD and Intel

Apple has announced a collaboration with Nvidia to accelerate large language model inference using its open source technology, Recurrent Drafter (or ReDrafter for short).

The partnership aims to address the computational bottleneck of auto-regressive token generation, which produces only one token per forward pass of the model; easing that bottleneck is key to improving efficiency and reducing latency in real-time LLM applications.
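ReDrafter belongs to the speculative decoding family: a small recurrent draft model proposes several candidate tokens per step, and the large model verifies them in parallel instead of generating one token per forward pass. The sketch below illustrates only that general draft-and-verify loop with toy stand-in models; every class and function name is hypothetical and does not reflect Apple's ReDrafter code or Nvidia's production integration.

```python
from typing import List


class ToyDraftModel:
    """Stand-in for a small, fast draft model (purely hypothetical)."""

    def sample_next(self, context: List[int]) -> int:
        # Toy deterministic rule so the example stays dependency-free.
        return (context[-1] + 1) % 50


class ToyTargetModel:
    """Stand-in for the large target LLM (purely hypothetical)."""

    def sample_next(self, context: List[int]) -> int:
        return (context[-1] + 1) % 50

    def accept_prefix(self, context: List[int], draft: List[int]) -> List[int]:
        # In a real system one forward pass scores every drafted position;
        # here we simply keep the longest prefix the target model agrees with.
        accepted: List[int] = []
        ctx = list(context)
        for tok in draft:
            if tok != self.sample_next(ctx):
                break
            accepted.append(tok)
            ctx.append(tok)
        return accepted


def speculative_generate(target, draft, prompt, max_new=16, draft_len=4):
    """Draft-and-verify loop: propose draft_len tokens cheaply, verify them
    with the large model, keep the accepted prefix, repeat."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. The cheap draft model proposes a short candidate continuation.
        candidate = []
        for _ in range(draft_len):
            candidate.append(draft.sample_next(tokens + candidate))
        # 2. The large model verifies all drafted tokens at once.
        accepted = target.accept_prefix(tokens, candidate)
        # 3. Always emit at least one token so the loop makes progress even
        #    when the entire draft is rejected.
        tokens.extend(accepted if accepted else [target.sample_next(tokens)])
    return tokens


print(speculative_generate(ToyTargetModel(), ToyDraftModel(), [0]))
```

The speedup comes from the large model validating several drafted tokens per forward pass; how close a real system gets to the quoted 2.7x figure depends on how many of those drafted tokens are accepted on average.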


