Number of raw audio samples that were clipped (i.e., that reached the minimum or maximum representable sample value). Clipping may indicate that the user is too close to the microphone.
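As an illustration of what the clipping count measures, the sketch below counts samples sitting at either end of a signed integer PCM range. It assumes 16-bit signed PCM by default; the function name `count_clipped_samples` and its `bits` parameter are hypothetical and not part of this API.

```python
def count_clipped_samples(samples: list[int], bits: int = 16) -> int:
    """Count samples at the min or max value of signed `bits`-bit PCM.

    Assumption: samples are signed integers, e.g. -32768..32767 for 16-bit.
    """
    lo = -(1 << (bits - 1))       # most negative representable value
    hi = (1 << (bits - 1)) - 1    # most positive representable value
    return sum(1 for s in samples if s <= lo or s >= hi)
```

For example, `count_clipped_samples([0, 32767, -32768, 100])` returns `2`.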
Estimated signal-to-noise ratio in dB, computed as meanSpeechRmsValue - meanNonSpeechRmsValue. Low values can degrade recognition accuracy. Zero if there is insufficient data.
Mean frame RMS level for speech segments in dB. Zero if no speech detected.
Mean frame RMS level for noise (non-speech) segments in dB.
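The relationship between the two mean levels and the signal-to-noise ratio can be sketched as follows. This assumes per-frame dB values plus a per-frame speech/non-speech flag from some voice activity detector; the function name `speech_noise_stats`, the `is_speech` input, and the choice to return 0.0 for an empty noise set are hypothetical, and "insufficient data" is interpreted here as having no speech frames or no noise frames.

```python
from statistics import fmean


def speech_noise_stats(frame_db: list[float], is_speech: list[bool]) -> tuple[float, float, float]:
    """Return (mean speech dB, mean noise dB, SNR dB) from labeled frames.

    Assumption: `is_speech[i]` marks frame i as speech (True) or noise (False).
    """
    speech = [v for v, s in zip(frame_db, is_speech) if s]
    noise = [v for v, s in zip(frame_db, is_speech) if not s]
    mean_speech = fmean(speech) if speech else 0.0  # zero if no speech detected
    mean_noise = fmean(noise) if noise else 0.0     # assumed default when no noise frames
    # SNR is the difference of the two mean levels; zero if either side is missing.
    snr = mean_speech - mean_noise if speech and noise else 0.0
    return mean_speech, mean_noise, snr
```

For example, `speech_noise_stats([-20, -60, -22, -58], [True, False, True, False])` returns `(-21.0, -59.0, 38.0)`.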
Estimated peak RMS level of speech in dB, taken as the 98th percentile to filter out outliers. Low values indicate faint speech or that the user is too far from the microphone. Zero if no speech was detected.
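Using a high percentile instead of the maximum means a single loud frame (a cough or a bump) does not set the peak. The sketch below assumes the input is the list of speech-frame dB values and uses linear interpolation between ranks; the actual percentile method is not specified here, and the name `peak_speech_rms_db` is hypothetical.

```python
import math


def peak_speech_rms_db(speech_frame_db: list[float], pct: float = 98.0) -> float:
    """Return the `pct`-th percentile of speech frame levels (0.0 if none).

    Assumption: linear interpolation between the two nearest ranks.
    """
    if not speech_frame_db:
        return 0.0                      # zero if no speech detected
    xs = sorted(speech_frame_db)
    pos = (len(xs) - 1) * pct / 100.0   # fractional rank of the percentile
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac
```

With 99 frames at -30 dB and one outlier at 0 dB, the estimated peak stays at -30.0 dB rather than jumping to 0 dB.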
True if a high RMS level was detected during the initial audio segment, which may indicate device audio playback, a late recognizer start, or high ambient noise.
Per-frame RMS values in dB (25 ms frames with a 10 ms shift), computed as 20*log10(sqrt(sum of squared samples)). Silent frames yield approximately -100 dB.
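The framing and dB conversion can be sketched as below. This follows the stated formula literally (square root of the sum of squares, with no division by the frame length, which a textbook RMS would include). It assumes float samples normalized to [-1, 1], a 16 kHz sample rate, and a small additive floor of 1e-10 on the energy; the floor is a hypothetical choice that makes a silent frame come out at -100 dB. The function name and parameters are illustrative only.

```python
import math


def frame_rms_db(samples: list[float], sample_rate: int = 16000,
                 frame_ms: int = 25, shift_ms: int = 10,
                 floor: float = 1e-10) -> list[float]:
    """Per-frame level in dB using 20*log10(sqrt(sum of squared samples)).

    Assumptions: normalized float samples; `floor` avoids log10(0) and
    maps silence to 20*log10(sqrt(1e-10)) = -100 dB.
    """
    frame_len = sample_rate * frame_ms // 1000   # 400 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000       # 160 samples at 16 kHz
    values = []
    # Only full frames are emitted; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, shift):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        values.append(20 * math.log10(math.sqrt(energy + floor)))
    return values
```

One second of silence at 16 kHz yields 98 frames, each at about -100 dB.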
Audio quality metrics computed during recognition. Provided as part of the response to help assess recording conditions (noise, clipping, signal level).