Launching ChatGPT Analogues on Personal Computer

Find out the most effective and modern ways to launch ChatGPT analogues on your PC in a couple of clicks! Ask DataOx experts.
I want to order scraping

Introduction

People who are familiar with text neural networks have been on edge lately. Once the Llama model, aka "DIY ChatGPT", was leaked, people realised that neural networks can be run locally, and it doesn't require much: at least 16GB of regular RAM and a good processor.
While the technical blogs are just starting to get on with what's going on and post some outdated guides from a month ago, I'll show you a couple of new things.

How do you launch ChatGPT?

1. Via the processor

I could insert a link to the llama.cpp repository, which could be launched with anything but the coffee maker, and tell you to use it. That being said, this method is for geeks. We, on the other side, are going to make it in a couple of clicks and without the command line. And it should work fine.
So, here is the way to run a good model (13B parameters) on 16GB of regular RAM without unnecessary brainstorming: koboldcpp. koboldcpp is a fork of the llama.cpp repository, with several additions, and in particular the integrated Kobold AI Lite interface, which allows you to "communicate" with the neural network in several modes, create characters and scenarios, save chats, and much more.
Download any stable version of the compiled exe, launch it, select the model (find below the info on how to get them), go to the browser, and use it. That’s all! If you have 32GB of RAM, then you can run a 30B model. The quality will be much better, but the speed will be lower. This method accepts models in ggml format and does not require a video card.
P.S. If you have any doubts regarding running the exe, you can always check the sources and build everything yourself – this is an open programme.
Kobold AI Lite, Alpaca 13B. No dogs were hurt

2. Via video card

This method requires a lot of VRAM, but the generation speed is higher. Although the launch is a little more difficult, it doesn’t need much brainpower either.
Download this repository and read the attached instructions – you will need to run several batch files. Another repository will be downloaded to the same folder, and it will pull all the necessary dependencies with it. The installation is clean, as it uses a virtual environment.
Next, you will have to set the launch parameters to run llama models on a home video card. Specify the bitness of the model --wbits 4 (all models listed here work in 4bit). Add also --groupsize 128 if it was specified during the model conversion. As a rule, groupsize can be found while downloading the model.
How can we help you?
Get tailored web data scraping solutions to meet your business goals with DataOx.
One-time data delivery
One-time data delivery
Need data from any web source? Describe what you need, and we will provide you with structured and clean data in no time.
One-time data delivery
Custom delivery
One-time data delivery
Data quality guarantee
One-time data delivery
Custom data format
One-time data delivery
Expert consultation
starting at
$300
per delivery
Regular data delivery
Regular data delivery
For customers who require data on a regular basis, we offer scraped and cleansed data as frequently as you need – every month, week or even day.
Regular data delivery
Regular custom data delivery
Regular data delivery
Hourly, daily, monthly or weekly period
Regular data delivery
Data quality guarantee
Regular data delivery
Incremental data delivery
Regular data delivery
Expert consultation
starting at
$250
per month
Custom data solutions
Custom data solutions
We provide solutions for data-driven products and startups. This service is adapted to your business based on complex web data scraping solutions.
Custom data solutions
Custom requirements
Custom data solutions
Source code ownership
Custom data solutions
Maintenance
Custom data solutions
Training for your team
Custom data solutions
Regular expert consultation
Custom data solutions
Software integration
starting at
$1,500
per project
Schedule a call with our expert
Go
Get online estimation
Go
Unfortunately, only a 7B model in 4bit mode can fit in the ubiquitous 8GB VRAM, which in fact will be worse than the 13B model from the first method. 13B can only fit into a 16GB VRAM graphics card. And if you have 24GB VRAM (RTX 4090, that’s the one), even a 30B model will fit! But this applies to a minority of users.
Also, you can split the loaded model into VRAM and RAM - CPU Offloading. To do this, we prescribe --pre_layer "number of separated layers", e.g., 20. But this way, it will probably work even worse than at full load on RAM in the first method. This launch method accepts models in gptq format. The interface is slightly less convenient and functional than the one in the first method. It is also a little dilatory. The only upside is that there are extensions, such as built-in Google Translate.
oobabooga. screenshot from the project page on github
Out of these two options, I personally advise you to choose the first one, as it is much more stable and less difficult, and most users will be able to launch it. However, if you’ve got a nice video card with at least 16GB VRAM – try the second option.

How to find the right model

Nowadays, there are three quality models that are really worth checking out – Llama, Alpaca, and Vicuna.
  1. Llama is the origin of the model leaked in the early days. According to Facebook, the 13B version is equivalent to ChatGPT (135B) in tests. To my mind, 80% of this may be true, but not with our 4bit model.
  2. Alpaca is Llama with development on data with instructions (do me this, tell me that, etc.). In chat mode, this model is better than Llama.
  3. Vicuna is Llama with development directly on ChetGPT dialogues. It is the most similar to ChatGPT (which means it includes censorship). For now, there is only the 13b version.
You can download any of them here. Before downloading, mind the format – ggml or gptq.
Presently, I recommend using Alpaca. It doesn’t have censorship, it is available in 30B and is able to chat perfectly well.

Use cases

Both interfaces allow you to create a character in the role in which the AI will work. Therefore, there can be quite a few use cases.
Tell the character that it is an AI assistant programmer, and he will help with the code. Tell it that it is a cook, and it'll help you with the recipes. Tell it that it is a nice girl, and the rest you can figure out yourself... Moreover, in the first interface, there are Adventure and Story modes, which allow you to play with the neural network or write stories.

Conclusion

In general, everything is the same as in ChatGPT; chat interaction is not much different. Advanced users can connect to the APIs of running models and use them in their projects. Both interfaces allow it. If you’ve got any questions left, you can always schedule a free consultation with our expert!
Publishing date: Fri Apr 28 2023
Last update date: Thu May 04 2023