Please consider these guidelines when recording and preparing your audio so that the automated speech-to-text engine can do its best.
Maximize signal-to-noise ratio
- Minimize background noise
- Place the microphone close to the voices you are recording
- Speak clearly and loudly
- Record at the highest volume levels you can without clipping the audio
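One way to check a recording for clipping before you upload it is to look at its peak level. The sketch below is only an illustration and is not part of Pop Up Archive's tooling. It assumes a 16-bit PCM WAV file and uses Python's standard `wave` module; the function name `analyze_levels` is made up for this example.

```python
import array
import math
import sys
import wave


def analyze_levels(path):
    """Return (peak level in dBFS, number of full-scale samples) for a 16-bit PCM WAV.

    Assumption: the file is 16-bit PCM. Other bit depths are rejected.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    if not samples:
        return float("-inf"), 0

    peak = max(abs(s) for s in samples)
    # Samples sitting at the extreme values suggest the signal was clipped.
    clipped = sum(1 for s in samples if s >= 32767 or s <= -32768)
    peak_dbfs = 20 * math.log10(peak / 32768) if peak else float("-inf")
    return peak_dbfs, clipped
```

A peak close to 0 dBFS with no full-scale samples suggests the recording is loud without clipping. Many full-scale samples suggest the input level was too high.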
Optimize audio settings
When you prepare your audio (saving and encoding), certain settings help produce a more accurate transcript:
- Bit Depth (a.k.a. Sample Size or Bits Per Sample) measures how many bits of information are recorded for each audio sample. A higher bit depth helps reduce noise in your recording.
- Optimal setting: 16 bits
- 8 bits has traditionally been a standard for voice recording, but this setting will result in suboptimal transcriptions.
- 24 bits (used in professional audio) is acceptable, but will result in a much larger file size, with limited marginal benefit for a voice recording.
- Sample Rate measures how many audio samples are recorded per second.
- Optimal setting: 16 kHz
- 8 kHz (typically the standard for phone transmissions) will result in suboptimal transcriptions, and is not recommended.
- 44.1 kHz (CD quality) and 48 kHz (professional audio) are acceptable rates. We will store and play back your audio at its original sample rate, but the speech-to-text engine that generates the transcript only accepts sample rates up to 16 kHz, so we will create a 16 kHz copy of your file to feed into the engine.
- Encoding Format is the file format in which the audio is saved.
- Optimal setting: WAV or decent-quality MP3 (128 kbps or higher)
- If you have recorded at a good bit depth and sample rate, the encoding format is of lesser importance. However, you should avoid uploading MP3s with a bit rate lower than 128 kbps. In general, the lower the MP3 bit rate, the lower the quality of the audio, and thus of the transcript.
- FLAC files are great for archiving, and can be uploaded for storage and audio transcription, but right now Pop Up Archive doesn't support FLAC playback in our audio player.
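To see how a file measures up against the settings above, you can read the bit depth and sample rate from a WAV file's header. The following is a minimal sketch rather than an official tool. It assumes an uncompressed PCM WAV file and uses Python's standard `wave` module; the names `check_wav_settings` and `uncompressed_size_bytes` are invented for this example.

```python
import wave

# Recommended values taken from the guidelines above.
RECOMMENDED_BIT_DEPTH = 16
RECOMMENDED_SAMPLE_RATE_HZ = 16_000


def check_wav_settings(path):
    """Return a list of notes about a PCM WAV file's bit depth and sample rate."""
    with wave.open(path, "rb") as wav:
        bit_depth = wav.getsampwidth() * 8  # bytes per sample -> bits
        sample_rate = wav.getframerate()

    notes = []
    if bit_depth < RECOMMENDED_BIT_DEPTH:
        notes.append(f"bit depth is {bit_depth} bits, below the recommended 16")
    elif bit_depth > RECOMMENDED_BIT_DEPTH:
        notes.append(f"bit depth is {bit_depth} bits: acceptable, but a larger file")
    if sample_rate < RECOMMENDED_SAMPLE_RATE_HZ:
        notes.append(f"sample rate is {sample_rate} Hz, below the recommended 16 kHz")
    return notes


def uncompressed_size_bytes(seconds, sample_rate_hz, bit_depth, channels=1):
    """Approximate size of uncompressed PCM audio data (file header not included)."""
    return int(seconds * sample_rate_hz * (bit_depth // 8) * channels)


if __name__ == "__main__":
    # One hour of mono audio at the recommended settings vs. 24-bit / 48 kHz.
    print(uncompressed_size_bytes(3600, 16_000, 16))  # 115200000
    print(uncompressed_size_bytes(3600, 48_000, 24))  # 518400000
```

The size estimate shows why 24-bit, 48 kHz audio is described as much larger: one hour of mono audio takes about 518 MB uncompressed, compared with about 115 MB at 16 bits and 16 kHz.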