Launching ChatGPT Analogues on a Personal Computer
Find out the most effective and modern ways to launch ChatGPT analogues on your PC in a couple of clicks! Ask DataOx experts.
Introduction
People who follow text neural networks have been buzzing lately. Once the Llama model, aka "DIY ChatGPT", was leaked, people realised that neural networks can be run locally, and it doesn't take much: at least 16GB of regular RAM and a decent processor.
While tech blogs are only just catching up with what's going on and posting outdated guides from a month ago, I'll show you a couple of newer things.
How do you launch a ChatGPT analogue?
1. Via the CPU
I could just link to the llama.cpp repository, which can be run on pretty much anything short of a coffee maker, and tell you to use it. That method, however, is for geeks. We, on the other hand, are going to do it in a couple of clicks and without the command line. And it works just fine.
So here is how to run a good model (13B parameters) on 16GB of regular RAM without any extra hassle: koboldcpp.
koboldcpp is a fork of the llama.cpp repository with several additions, in particular the integrated Kobold AI Lite interface, which allows you to "communicate" with the neural network in several modes, create characters and scenarios, save chats, and much more.
Download any stable release of the compiled exe, launch it, select a model (see below for how to get one), open your browser, and use it. That’s all!
If you have 32GB of RAM, then you can run a 30B model. The quality will be much better, but the speed will be lower.
This method accepts models in ggml format and does not require a video card.
P.S. If you have any doubts about running a random exe, you can always check the source code and build everything yourself – it is an open-source project.
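By the way, the launch can also be scripted if you prefer that to clicking through the file dialog. Here is a minimal sketch; the model filename is a placeholder, and the flags are assumptions based on typical koboldcpp releases – check the exe's --help for your version:

```python
# Minimal sketch: launching koboldcpp from a script instead of the file dialog.
# The model filename is a placeholder; the flags are assumptions -- confirm
# them with the exe's --help output for your release.
import subprocess

subprocess.run([
    "koboldcpp.exe",
    "--model", "llama-13b-q4_0.ggml.bin",  # any ggml model you downloaded
    "--port", "5001",                      # Kobold AI Lite UI at http://localhost:5001
    "--threads", "8",                      # roughly your number of physical cores
])
```

Once it is up, the browser part is exactly the same as with a plain double-click.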
2. Via the video card
This method requires a lot of VRAM, but the generation speed is higher. Although the launch is a little more difficult, it doesn’t need much brainpower either.
Download this repository and read the attached instructions – you will need to run several batch files.
Another repository will be downloaded to the same folder, and it will pull all the necessary dependencies with it. The installation is clean, as it uses a virtual environment.
Next, you will have to set the launch parameters to run llama models on a home video card.
Specify the model's bit width with --wbits 4 (all the models listed here run in 4-bit). Also add --groupsize 128 if a group size was specified during the model conversion. As a rule, the group size is mentioned on the model's download page.
Unfortunately, only a 7B model in 4-bit mode fits into the ubiquitous 8GB of VRAM, and it will actually be worse than the 13B model from the first method. A 13B model only fits into a graphics card with 16GB of VRAM. And if you have 24GB of VRAM (RTX 4090, that’s the one), even a 30B model will fit! But that applies to a minority of users.
You can also split the loaded model between VRAM and RAM – CPU offloading. To do this, pass --pre_layer "number of offloaded layers", e.g. 20. But this way it will probably perform even worse than running fully from RAM with the first method.
This launch method accepts models in gptq format.
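To put the parameters together, here is a minimal launch sketch. It assumes the repository in question is oobabooga's text-generation-webui with its server.py entry point, and the model folder name is a placeholder – adjust both to what you actually downloaded:

```python
# Minimal sketch: launching the web UI with the flags described above.
# server.py and the model folder name are assumptions/placeholders.
import subprocess

subprocess.run([
    "python", "server.py",
    "--model", "llama-13b-4bit-128g",  # placeholder: folder with the gptq model
    "--wbits", "4",                    # 4-bit quantized weights
    "--groupsize", "128",              # only if it was used during conversion
    # "--pre_layer", "20",             # uncomment to split the model between VRAM and RAM
])
```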
The interface is slightly less convenient and less functional than the one in the first method, and it is a bit sluggish. The upside is that it supports extensions, such as built-in Google Translate.
Out of these two options, I personally advise you to choose the first one, as it is much more stable and less difficult, and most users will be able to launch it. However, if you’ve got a nice video card with at least 16GB VRAM – try the second option.
How to find the right model
Nowadays, there are three quality models that are really worth checking out – Llama, Alpaca, and Vicuna.
- Llama is the original model that leaked in the first place. According to Facebook, the 13B version is on par with GPT-3 (175B) in benchmarks. To my mind, maybe 80% of that is true, but not with our 4-bit model.
- Alpaca is Llama fine-tuned on instruction data (do this for me, tell me that, etc.). In chat mode, this model is better than plain Llama.
- Vicuna is Llama fine-tuned directly on ChatGPT dialogues. It is the most similar to ChatGPT (which also means it inherits the censorship). For now, there is only a 13B version.
You can download any of them here.
Before downloading, mind the format – ggml or gptq.
At the moment, I recommend Alpaca: it has no censorship, it is available in a 30B version, and it chats perfectly well.
Use cases
Both interfaces let you create a character that defines the role in which the AI will act.
So there are quite a few use cases.
Tell the character that it is an AI assistant programmer, and it will help you with code.
Tell it that it is a cook, and it will help you with recipes.
Tell it that it is a nice girl, and the rest you can figure out yourself...
Moreover, in the first interface, there are Adventure and Story modes, which allow you to play with the neural network or write stories.
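Under the hood, a "character" is simply text that gets prepended to the conversation. Here is a rough sketch of what such a prompt looks like; the exact template is my assumption, since both UIs fill in similar fields for you:

```python
# Rough sketch: a "character" is a role description prepended to the chat.
# The template below is an assumption; the UIs build a similar prompt for you.
character = (
    "You are an AI assistant programmer. "
    "You answer with short, working code examples."
)
history = [("User", "Write a function that reverses a string.")]

prompt = character + "\n\n" + "\n".join(f"{who}: {text}" for who, text in history) + "\nAssistant:"
print(prompt)
```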
Conclusion
In general, everything is the same as in ChatGPT; chat interaction is not much different.
Advanced users can connect to the APIs of running models and use them in their projects. Both interfaces allow it.
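For example, koboldcpp exposes a KoboldAI-compatible HTTP API. A minimal sketch of calling it is below; the port, endpoint and parameter names reflect the defaults I have seen, so check the docs for your version (the second interface has its own, different API):

```python
# Minimal sketch: generating text through a locally running koboldcpp instance.
# Endpoint, port and parameter names are assumptions based on the KoboldAI API.
import requests

resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={
        "prompt": "Explain in one sentence what a ggml model file is.",
        "max_length": 120,    # tokens to generate
        "temperature": 0.7,
    },
    timeout=300,
)
print(resp.json()["results"][0]["text"])
```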
If you’ve got any questions left, you can always schedule a free consultation with our expert!
Publishing date: Fri Apr 28 2023
Last update date: Thu May 04 2023