GPT-SoVITS: AI Cloning with 1-Minute Voice Samples

GPT-SoVITS is a voice conversion and text-to-speech (TTS) model that clones a speaker's timbre from very little audio. It supports inference in Mandarin Chinese, English, and Japanese. According to the developers' tests, a voice sample as short as five seconds is enough to produce a clone with 80% to 95% similarity to the original voice. Providing a one-minute sample improves quality considerably: the result closely resembles a real human voice and yields a better TTS model.


Preliminary Steps

This guide covers installing GPT-SoVITS and using it to synthesize realistic AI voices from a one-minute voice clip on a graphics card with 8 GB of VRAM. It uses the following open-source repository:

GPT-SoVITS Voice Synthesis Tool

GPT-SoVITS GitHub repository: https://github.com/RVC-Boss/GPT-SoVITS

Features:

Zero-shot TTS: Provide a five-second voice sample and convert text to speech immediately, with no training.
Few-shot TTS: Fine-tune the model on as little as one minute of training data to improve voice similarity and realism.
Cross-lingual support: Runs inference in languages that differ from the training data. English, Japanese, and Chinese are currently supported.
WebUI tools: Includes vocal/accompaniment separation, automatic training-set segmentation, Chinese automatic speech recognition (ASR), and text labeling, which help beginners prepare datasets and build GPT/SoVITS models.

Installation of GPT-SoVITS

Configuring a GPU Instance

For detailed interactive guidance, visit the LooPIN Liquidity Pool.

Steps:

Acquire GPU resources via the LooPIN Liquidity Pool: Visit the LooPIN Network Pool and use $LOOPIN tokens to purchase GPU time. Choose a GPU model that fits your needs and budget, such as the RTX 3080; GPU UserBenchmark can help you compare models.
Exchange tokens for GPU resources: Select the desired $LOOPIN token amount, adjust the GPU quantity with the slider, and finalize the transaction.
Access Jupyter Notebook: After the transaction completes, go to the Server section under Rented Servers and open Jupyter Notebook on your remote server. Instance initialization typically takes 2-4 minutes.
Verify GPU activation: In Jupyter Notebook, open a new terminal window and run the nvidia-smi command to confirm that the GPU is active.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   39C    P8             21W /  350W |      12MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
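The table above can also be checked from a script. The following is a minimal sketch, assuming the 8 GB VRAM figure mentioned earlier in this guide is the threshold you care about; the helper name has_enough_vram and the REQUIRED_MIB variable are illustrative and not part of GPT-SoVITS or nvidia-smi.

```shell
#!/bin/sh
# Sketch: confirm the first GPU has at least the VRAM this guide assumes.
# has_enough_vram and REQUIRED_MIB are hypothetical names used for illustration.

# has_enough_vram TOTAL_MIB REQUIRED_MIB -> succeeds if TOTAL_MIB >= REQUIRED_MIB
has_enough_vram() {
    [ "$1" -ge "$2" ]
}

REQUIRED_MIB=8192   # 8 GB, the card size assumed in this guide

if command -v nvidia-smi >/dev/null 2>&1; then
    # Query total memory of the first GPU as a bare number in MiB.
    total_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
    if has_enough_vram "$total_mib" "$REQUIRED_MIB" 2>/dev/null; then
        echo "OK: ${total_mib} MiB of VRAM available"
    else
        echo "Warning: reported VRAM '${total_mib}' is below ${REQUIRED_MIB} MiB or could not be read"
    fi
else
    echo "nvidia-smi not found; the NVIDIA driver may not be installed"
fi
```

For the RTX 3080 shown above, nvidia-smi reports 12288 MiB, which passes this check.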

GPT-SoVITS Framework Installation:

Dependencies:

Python 3.10.6 (the conda environment created below uses Python 3.9)
FFMPEG 16.0
CUDA >11.8
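Before continuing, it may help to see which of these tools are already present. The sketch below only reports what is on PATH and prints version banners; the tool list is an assumption drawn from the commands used in this guide, report_tool is a hypothetical helper, and nvcc is only available when the CUDA toolkit (not just the driver) is installed.

```shell
#!/bin/sh
# Sketch: report which prerequisite tools are on PATH and print their versions.
# report_tool is a hypothetical helper; the tool list is an assumption.

# report_tool NAME -> prints "found: NAME" or "missing: NAME"; returns 1 if missing
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

for tool in python3 ffmpeg nvcc git curl; do
    report_tool "$tool" || true
done

# Print version banners only for the tools that exist.
if command -v python3 >/dev/null 2>&1; then python3 --version; fi
if command -v ffmpeg >/dev/null 2>&1; then ffmpeg -version | head -n 1; fi
if command -v nvcc >/dev/null 2>&1; then nvcc --version | tail -n 2; fi
```

Compare the printed versions against the list above by eye; the script does not enforce any version.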

Installation Steps:

Once the dependencies are installed, follow these steps to install GPT-SoVITS manually:

Open a Linux terminal and change to the directory where you want to install GPT-SoVITS. The model download commands later in this guide assume /workspace.
Install Miniconda:

curl -LO https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Clone the project repository:

git clone https://github.com/RVC-Boss/GPT-SoVITS

Change into the project directory:

cd GPT-SoVITS

Create and activate a conda environment, then run the Linux installation script:

conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh

During installation, conda repeatedly prompts Proceed ([y]/n)?. Enter y each time to continue.
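If you would rather not answer each prompt by hand, one option is to enable conda's always_yes setting for the duration of a command through the CONDA_ALWAYS_YES environment variable. The sketch below assumes the prompts come from conda commands run by install.sh; run_unattended is a hypothetical wrapper, not something provided by GPT-SoVITS.

```shell
#!/bin/sh
# Sketch: run a command with conda's always_yes setting enabled.
# run_unattended is a hypothetical helper name used for illustration.

# run_unattended CMD [ARGS...] -> runs CMD with CONDA_ALWAYS_YES=true in its environment
run_unattended() {
    env CONDA_ALWAYS_YES=true "$@"
}

# Example, from the GPT-SoVITS directory with the GPTSoVits environment active:
#   run_unattended bash install.sh
# The environment itself can be created without a prompt using conda's -y flag:
#   conda create -y -n GPTSoVits python=3.9
```

The variable is set only for the wrapped command, so later interactive conda use keeps its normal confirmation prompts.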

GPT-SoVITS is now installed. Next, manually download the required model files:

# These commands assume the repository was cloned to /workspace/GPT-SoVITS.
# Pretrained GPT-SoVITS models
cd /workspace/GPT-SoVITS/GPT_SoVITS/pretrained_models
git clone https://huggingface.co/lj1995/GPT-SoVITS
# If you are in mainland China, use this mirror instead:
# git clone https://hf-mirror.com/lj1995/GPT-SoVITS
# Chinese ASR, VAD, and punctuation models (create the target directory before entering it)
mkdir -p /workspace/GPT-SoVITS/tools/damo_asr/models && cd /workspace/GPT-SoVITS/tools/damo_asr/models
git clone https://www.modelscope.cn/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch.git
git clone https://www.modelscope.cn/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch.git
git clone https://www.modelscope.cn/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch.git
# UVR5 weights for vocal/accompaniment separation
cd /workspace/GPT-SoVITS/tools/uvr5 && rm -rf uvr5_weights
git clone https://huggingface.co/Delik/uvr5_weights
# If you are in mainland China, use this mirror instead:
# git clone https://hf-mirror.com/Delik/uvr5_weights
git config core.sparseCheckout true
# Move the downloaded pretrained models up one level into pretrained_models/
mv /workspace/GPT-SoVITS/GPT_SoVITS/pretrained_models/GPT-SoVITS/* /workspace/GPT-SoVITS/GPT_SoVITS/pretrained_models/
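After the downloads finish, a quick check that each clone landed where the commands above put it can save a failed WebUI start. This is a minimal sketch: the directory names are derived from the clone commands above, while list_missing and the GPT_SOVITS_ROOT override are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: list expected model directories that are missing after the downloads.
# list_missing and GPT_SOVITS_ROOT are hypothetical names, not part of GPT-SoVITS.

# list_missing ROOT PATH... -> prints each PATH (relative to ROOT) that is not a directory
list_missing() {
    lm_root=$1
    shift
    for lm_rel in "$@"; do
        [ -d "$lm_root/$lm_rel" ] || echo "$lm_rel"
    done
}

# Default to the install location assumed by the commands above.
ROOT=${GPT_SOVITS_ROOT:-/workspace/GPT-SoVITS}

missing=$(list_missing "$ROOT" \
    GPT_SoVITS/pretrained_models \
    tools/damo_asr/models/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch \
    tools/damo_asr/models/speech_fsmn_vad_zh-cn-16k-common-pytorch \
    tools/damo_asr/models/punc_ct-transformer_zh-cn-common-vocab272727-pytorch \
    tools/uvr5/uvr5_weights)

if [ -z "$missing" ]; then
    echo "All expected model directories are present under $ROOT"
else
    echo "Missing under $ROOT:"
    echo "$missing"
fi
```

The check only confirms that the directories exist; it does not validate the model files inside them.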

Finally, start the WebUI:

cd /workspace/GPT-SoVITS/ && python webui.py

We will continue to introduce how to use GPT-SoVITS in subsequent tutorials.

Conclusion

GPT-SoVITS is a robust, user-friendly AI voice cloning tool that uses the GPT framework and reference audio prompts to address the challenges of traditional voice cloning. It supports multiple languages, is open source, and offers both an integrated one-click installation package and a manual setup option, making it accessible to beginners. Its ability to generate highly realistic voice clones from minimal training data positions it as a leader among free, open-source voice cloning tools.

Updated at May 7, 2024
