Step 1: Subscribe to the Creator Plan or Higher
To access the professional voice cloning feature, you'll need to subscribe to the Creator plan or a higher tier. This unlocks the functionality required to create a high-quality digital replica of your voice.
Without this subscription, you’ll be limited to basic features that don't include professional-grade voice cloning.
Step 2: Prepare Your Audio Data
Gather at least one hour of high-quality audio data featuring your voice. This data should be free from background noise, music, and other distractions. Clean audio samples are crucial for achieving a realistic voice clone. Ensure your audio is clear, consistent, and accurately represents your vocal characteristics.
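Before uploading, it can help to screen each recording for obvious problems. The sketch below assumes your files are 16-bit PCM WAV and uses only the Python standard library. It reports the peak level, the average (RMS) level in dBFS, and the fraction of samples at or near full scale, which usually indicates clipping. The function name `audio_quality_report` and the default `clip_threshold` are illustrative choices, not ElevenLabs requirements. The script does not measure background noise directly, so you still need to listen to each file.

```python
import array
import math
import sys
import wave

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def _to_dbfs(level: float) -> float:
    """Convert a linear level (0..1 of full scale) to decibels relative to full scale."""
    return 20 * math.log10(level) if level > 0 else float("-inf")


def audio_quality_report(path: str, clip_threshold: float = 0.999) -> dict:
    """Summarize peak level, RMS level and clipping for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        raw = wf.readframes(wf.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if len(samples) == 0:
        raise ValueError("file contains no audio frames")

    peak = max(abs(s) for s in samples) / FULL_SCALE
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / FULL_SCALE
    # Count samples whose magnitude is at or above the clipping threshold.
    clipped = sum(1 for s in samples if abs(s) / FULL_SCALE >= clip_threshold)

    return {
        "peak_dbfs": _to_dbfs(peak),
        "rms_dbfs": _to_dbfs(rms),
        "clipped_ratio": clipped / len(samples),
    }
```

For example, `audio_quality_report("session1.wav")` (a hypothetical file name) returns a dictionary. A nonzero `clipped_ratio` suggests re-recording that take at a lower input gain.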
Step 3: Navigate to the Voice Cloning Section
After logging into ElevenLabs, navigate to the voice cloning section within the platform. Look for options like 'Create a new voice' or 'Add a new voice.' This will lead you to the interface where you can begin the cloning process.
Make sure you select professional voice cloning rather than the instant cloning option, since the professional option is the one that trains a higher-quality model on your samples.
Step 4: Upload and Label Your Audio Samples
Upload your prepared audio files to the platform. ElevenLabs recommends providing at least 30 minutes of speaker training data, with around three hours being optimal. You’ll also need to label your audio samples with appropriate descriptors, such as accent, style, and other relevant characteristics. This helps the AI understand and replicate your voice more accurately.
Ensure that the language used in the samples is correctly identified for best results.
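To check your material against the 30-minute minimum and the roughly three-hour target mentioned above, you can add up the length of your files before uploading. This sketch assumes WAV input. Other formats such as MP3 would need a third-party decoder. The helper names are hypothetical.

```python
import wave
from pathlib import Path
from typing import Iterable, Union

# Figures taken from this guide: 30 minutes minimum, about 3 hours optimal.
MIN_SECONDS = 30 * 60
TARGET_SECONDS = 3 * 60 * 60


def wav_duration_seconds(path: Union[str, Path]) -> float:
    """Return the playing time of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def training_data_summary(paths: Iterable[Union[str, Path]]) -> dict:
    """Total the duration of all files and compare it with the guide's figures."""
    total = sum(wav_duration_seconds(p) for p in paths)
    return {
        "total_minutes": total / 60,
        "meets_minimum": total >= MIN_SECONDS,
        "percent_of_target": 100 * total / TARGET_SECONDS,
    }
```

For example, `training_data_summary(Path("recordings").glob("*.wav"))`, where `recordings` is a hypothetical folder, shows whether you have cleared the minimum and how close you are to the optimal amount.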
Step 5: Verify Your Voice
Once your audio is uploaded and processed, ElevenLabs will ask you to verify your voice. You do this by reading a provided prompt aloud and recording it. This step confirms your identity and ensures that you are the legitimate owner of the voice being cloned. To verify smoothly, use the same or similar equipment you used to record the samples, and match the tone and delivery of those samples.

Step 6: Fine-Tuning and Voice Generation
After verification, the AI model needs time to train. How long this takes depends on how many training jobs are queued ahead of yours, among other factors. ElevenLabs recommends allowing roughly 2 to 6 hours for the voice to become ready. Once training is complete, you’ll have access to your cloned voice and can use it to generate speech from any text input. When fine-tuning the voice style, match it to your specific delivery needs, such as audiobook narration.
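Once the voice is ready, you can generate speech in the web interface or programmatically. The sketch below builds a request to what we understand to be ElevenLabs' text-to-speech REST endpoint, `POST /v1/text-to-speech/{voice_id}`, authenticated with an `xi-api-key` header. Building the request does not send it. The base URL, the default `model_id`, and the `voice_settings` field names (`stability`, `similarity_boost`) reflect our reading of the public API, so check them against the current API reference before relying on them. The values in `NARRATION_SETTINGS` are hypothetical and are not recommended presets for audiobooks or any other use.

```python
import json
import urllib.request
from typing import Optional

# Assumed values; verify against the current ElevenLabs API reference.
API_BASE = "https://api.elevenlabs.io/v1"
DEFAULT_MODEL_ID = "eleven_multilingual_v2"


def build_tts_request(
    api_key: str,
    voice_id: str,
    text: str,
    model_id: str = DEFAULT_MODEL_ID,
    voice_settings: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for a cloned voice."""
    body = {"text": text, "model_id": model_id}
    if voice_settings is not None:
        # Optional per-request style controls; field names are assumptions.
        body["voice_settings"] = voice_settings
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)


# Hypothetical settings for long-form narration; tune by ear.
NARRATION_SETTINGS = {"stability": 0.6, "similarity_boost": 0.8}
```

For example, `synthesize_to_file(build_tts_request(key, voice_id, "Chapter one.", voice_settings=NARRATION_SETTINGS), "chapter1.mp3")`, where `voice_id` is the ID of your trained voice. Keeping request construction separate from sending makes it easy to inspect the payload before spending credits.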