Guide: Self-hosting open source GPT chat with no GPU using GPT4All

GPT4All is easy for anyone to install and use. It allows you to download from a selection of ggml GPT models curated by GPT4All and provides a native GUI chat interface.


CPU: Any CPU will work, but the more cores and the higher the clock speed per core, the better.
RAM: Varies by model; some require up to 16GB.
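As a quick sanity check before downloading a large model, you can read total RAM from /proc/meminfo; the 16GB threshold below is just the upper bound mentioned above, not an official requirement:

```shell
# Read total RAM and compare against the ~16 GB needed by the largest models
total_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
total_gb=$((total_kb / 1024 / 1024))
echo "Total RAM: ${total_gb} GiB"
if [ "$total_gb" -ge 16 ]; then
  echo "Enough headroom for the largest models"
else
  echo "Stick to smaller models"
fi
```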


The GUI installer works out of the box on Ubuntu; once it finishes, you just double-click the icon on your desktop.

wget ''
chmod +x './'

# Complete the GUI installer

# A gpt4all app icon should appear on your Desktop and in your app menu
# Run gpt4all from terminal the first time so you can catch errors
# It's also just nice to run it this way for more feedback
# This assumes you used the default install location
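A sketch of that first terminal launch, with output captured to a log so errors are easy to read afterwards. The path below is an assumption (the GUI installer's default location under your home directory); adjust it if you installed elsewhere:

```shell
# Path is an assumption -- the GUI installer's default location
APP="$HOME/gpt4all/bin/chat"
if [ -x "$APP" ]; then
  # Capture stdout and stderr to a log so you can read errors afterwards
  "$APP" 2>&1 | tee "$HOME/gpt4all-launch.log"
else
  echo "Not found: $APP (did you install to a different location?)"
fi
```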

If it didn’t launch

On Debian and special builds of Ubuntu there may be missing dependencies. In my case I needed the packages below on a custom build of Ubuntu Server with XFCE, but you may need something else. Just run the app in a terminal as shown above and read the errors (or post the errors here).

sudo apt install \
	libxcb-icccm4 \
	libxcb-image0 \
	libxcb-keysyms1 \
	libxcb-render-util0 \
	libxcb-xinerama0 \
	libxcb-xkb1

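If you want to see which of these libraries are already on the system before installing anything, the dynamic loader's cache can be queried with ldconfig (the names below mirror the apt packages above):

```shell
# List which libxcb pieces the dynamic loader already knows about
for lib in libxcb-icccm libxcb-image libxcb-keysyms \
           libxcb-render-util libxcb-xinerama libxcb-xkb; do
  if ldconfig -p 2>/dev/null | grep -q "$lib"; then
    echo "present: $lib"
  else
    echo "missing: $lib"
  fi
done
```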

You’ll be presented with a list of available models; just download the ones you’d like to try.

In my experience the first model (“Hermes”) is the best, but it can leave you hanging for 30~60 seconds before it starts responding.

After downloading is complete you’ll have access to the chat interface.


Will it run on the GPU or will it run only using the CPU?

This is a new GUI version I have not tried yet, but it looks interesting, and the Python surface seems nice for building personalized tools on.

With some digging I found a GPT-J-based project which is very similar but geared toward running as a command: GitHub - kuvaus/LlamaGPTJ-chat: Simple chat program for LLaMa, GPT-J, and MPT models.
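If you try it, a basic invocation looks like the sketch below, after building the `chat` binary per the repo's instructions. The model path and filename are placeholders, not something the repo ships, and the flags (`-m` for the model file, `-t` for CPU threads) follow its README:

```shell
# Flags per the LlamaGPTJ-chat README: -m model file, -t CPU threads
# The model path is a placeholder -- point it at any ggml model you downloaded
MODEL="$HOME/models/ggml-gpt4all-j-v1.3-groovy.bin"
if [ -x ./chat ] && [ -f "$MODEL" ]; then
  ./chat -m "$MODEL" -t "$(nproc)"
else
  echo "Build the chat binary and download a ggml model first"
fi
```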

Installed here on an older laptop: i7-7700 with 16 GB of RAM, Ubuntu Studio 22.04 with KDE. It warned me about Hermes being at the upper limit for this machine, but it installed and works, though it generates responses at about 300 baud. Kind of reminds me of my CompuServe days circa 1983, where you could see the individual letters filling in the screen.

For a test prompt I asked it to cross-reference and match commercial 400 amp electrical panels by manufacturer part number. It took about 60 seconds like you said, but it generated a very good response and was quicker than scrolling through Google’s din of results.

It’s CPU-only. It uses ggml quantized models, which can run on both CPU and GPU, but the GPT4All software is designed to use only the CPU.

That’s new to me, thanks for sharing!

Ikr! It’s the worst but if it’s all you got then it’s good :stuck_out_tongue:

I have a GPU guide I’m almost ready to post that runs Hermes, Vicuna, and WizardLM at ChatGPT speeds (faster than you can read) with a web GUI and backend API. Will post soon.