Large language models on a mobile device?

We are starting to see papers coming from Apple that highlight its efforts in the machine learning/artificial intelligence area.

Recently we have seen MLX, a machine learning framework for Apple Silicon, and Generating Molecular Conformer Fields.

Now two papers on arXiv describe work on reducing the memory required to run LLMs on devices with limited resources.

This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM…

Models. We use OPT 6.7B and a sparsified Falcon 7B model for our evaluations…

We have demonstrated the ability to run LLMs up to twice the size of available DRAM, achieving an acceleration in inference speed by 4-5x compared to traditional loading methods in CPU, and 20-25x in GPU. This breakthrough is particularly crucial for deploying advanced LLMs in resource-limited environments, thereby expanding their applicability and accessibility.

LLM in a flash: Efficient Large Language Model Inference with Limited Memory
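The core idea of keeping weights on flash and paging them into DRAM only when needed can be illustrated with a minimal sketch. This is not the paper's implementation (which adds sparsity-aware windowing and row-column bundling); it simply uses a memory-mapped file as a stand-in for flash storage, and all names and sizes here are invented:

```python
import numpy as np
import tempfile, os

def save_layer_weights(path, n_layers=4, dim=8):
    """Write per-layer weight matrices to a single file standing in for flash."""
    weights = np.arange(n_layers * dim * dim, dtype=np.float32)
    weights = weights.reshape(n_layers, dim, dim)
    weights.tofile(path)
    return weights.shape

def load_layer_on_demand(path, shape, layer):
    """Memory-map the whole file, but materialise only one layer in DRAM."""
    mm = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
    return np.array(mm[layer])  # copies just this layer's block into DRAM

path = os.path.join(tempfile.mkdtemp(), "weights.bin")
shape = save_layer_weights(path)
w2 = load_layer_on_demand(path, shape, layer=2)
print(w2.shape)  # (8, 8)
```

The memmap gives the process a view over the full parameter file without paying the DRAM cost up front; only the slice that is actually read gets copied in, which is the on-demand loading pattern the paper builds on.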

Also, whilst most image models have been trained on static images, they don’t always generalise to moving figures.

… a neural rendering framework that trains on 50-100 frames of a monocular video containing a human in a scene. HUGS enables novel view rendering with novel human poses at 60 FPS by learning a disentangled representation that can also render the human in other scenes

HUGS: Human Gaussian Splats

Code will apparently be made available on the Apple GitHub repository.
