java.lang.Object

com.keenresearch.keenasr.KASRAudioQualityResult

public class KASRAudioQualityResult extends Object

The KASRAudioQualityResult class contains metrics for audio quality estimation, returned as part of the KASRResponse. These include the signal-to-noise ratio (SNR) and several signal level metrics.

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| long | getClippedSampleCount() | Returns the number of raw samples in processed audio that were clipped (values outside of a clipping window defining a maximum and minimum threshold). |
| double[] | getFrameRMSValues() | Returns root mean square (RMS) values for each frame (25ms long with 10ms shift) in decibels in the processed audio. |
| boolean | getInitialSegmentRMSWarning() | Returns a flag indicating that a high root mean square (RMS) value was detected during the initial part of the processed audio. |
| double | getMeanNonSpeechRmsValue() | Returns the mean frame root mean square (RMS) decibel level for the non-speech segments (noise) in the processed audio. |
| double | getMeanSpeechRmsValue() | Returns the mean frame root mean square (RMS) decibel level for the speech segments in the processed audio. |
| double | getPeakSpeechRmsValue() | Returns the estimated peak root mean square (RMS) decibel level of speech in the processed audio. |
| double | getSnrValue() | Returns the estimated signal to noise ratio (SNR) in decibels for the processed audio. |
| String | toJSON() | Returns JSON representation of the KASRAudioQualityResult. |
| String | toString() | A descriptive representation of the KASRAudioQualityResult (useful for debugging purposes and logging). |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Method Details
- getFrameRMSValues
  
  public double[] getFrameRMSValues()
  
  Returns root mean square (RMS) values, in decibels, for each frame of the processed audio (frames are 25 ms long with a 10 ms shift). Values are computed as 20·log₁₀( √( Σ(sample value)² ) ). Where a frame has a zero signal level, the RMS value is reported as approximately -100 dB.
  
  Returns:
  
  an array of RMS values.
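  
  The framing and decibel conversion described above can be sketched as follows. This is a minimal illustration, not the SDK's implementation. The class and method names are hypothetical, the samples are assumed to be normalized to the range [-1, 1], the sample rate is passed in by the caller, and the sketch uses the conventional RMS definition (sum of squares divided by the frame length), which may differ from the SDK's normalization. The -100 dB floor mirrors the behavior documented for silent frames.
  
  ```java
  // Hypothetical helper, not part of the KeenASR SDK.
  final class FrameRmsSketch {
      // Assumed floor for frames with zero signal level.
      static final double FLOOR_DB = -100.0;
  
      // RMS level in dB of samples[start .. start + length - 1].
      static double rmsDb(double[] samples, int start, int length) {
          double sumSquares = 0.0;
          for (int i = start; i < start + length; i++) {
              sumSquares += samples[i] * samples[i];
          }
          double rms = Math.sqrt(sumSquares / length);
          if (rms <= 0.0) {
              return FLOOR_DB; // avoid log10(0) = -Infinity
          }
          return Math.max(FLOOR_DB, 20.0 * Math.log10(rms));
      }
  
      // Splits the signal into 25 ms frames with a 10 ms shift and
      // returns one dB value per complete frame.
      static double[] frameRmsDb(double[] samples, int sampleRate) {
          int frameLength = sampleRate * 25 / 1000;
          int frameShift = sampleRate * 10 / 1000;
          if (frameLength <= 0 || frameShift <= 0 || samples.length < frameLength) {
              return new double[0];
          }
          int frameCount = 1 + (samples.length - frameLength) / frameShift;
          double[] result = new double[frameCount];
          for (int f = 0; f < frameCount; f++) {
              result[f] = rmsDb(samples, f * frameShift, frameLength);
          }
          return result;
      }
  }
  ```
  
  With a 16 kHz sample rate, each frame covers 400 samples and consecutive frames start 160 samples apart, so one second of audio yields 98 complete frames.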
- getClippedSampleCount
  
  public long getClippedSampleCount()
  
  Returns the number of raw samples in processed audio that were clipped (values outside of a clipping window defining a maximum and minimum threshold). Clipping indicates that the user might have been too close to the microphone during audio capture, or fidgeting with the microphone. Clipped samples can have a negative impact on speech recognition performance.
  
  Returns:
  
  number of clipped samples.
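  
  To make the notion of a clipping window concrete, here is a small sketch that counts samples at or beyond a minimum and maximum threshold in 16-bit PCM audio. The class, method names, and thresholds are hypothetical; the SDK does not document the thresholds it uses, and treating samples equal to a threshold as clipped is an assumption of this sketch.
  
  ```java
  // Hypothetical helper, not part of the KeenASR SDK.
  final class ClippingSketch {
      // Counts samples that sit at or beyond the assumed clipping thresholds.
      static long countClipped(short[] pcm, short minThreshold, short maxThreshold) {
          long count = 0;
          for (short sample : pcm) {
              if (sample <= minThreshold || sample >= maxThreshold) {
                  count++;
              }
          }
          return count;
      }
  
      // Fraction of clipped samples; 0.0 for empty audio.
      static double clippedFraction(long clipped, long total) {
          return total == 0 ? 0.0 : (double) clipped / total;
      }
  }
  ```
  
  Because the raw count grows with the length of the recording, an application may prefer to look at the clipped fraction when comparing responses of different durations.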
- getSnrValue
  
  public double getSnrValue()
  
  Returns the estimated signal to noise ratio (SNR) in decibels for the processed audio. SNR is computed as the difference between the mean speech RMS value and the mean noise RMS value. SNR accounts for some transient noise, but it primarily models stationary background noise. Low SNR values have a negative impact on speech recognition performance.
  
  NOTE: SNR value will be NaN if there isn't sufficient data to compute it.
  
  Returns:
  
  estimated SNR in dB.
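  
  The relationship between the three level metrics, and the need to guard against NaN, can be sketched as below. The class name, method names, and threshold parameter are hypothetical and chosen for illustration; only the subtraction of mean levels comes from the description above.
  
  ```java
  // Hypothetical helper, not part of the KeenASR SDK.
  final class SnrSketch {
      // SNR in dB as the difference of two mean levels that are already in dB.
      // If either input is NaN, the result is NaN.
      static double snrDb(double meanSpeechDb, double meanNonSpeechDb) {
          return meanSpeechDb - meanNonSpeechDb;
      }
  
      // Returns true only when a usable SNR is below the caller's threshold.
      // Any comparison with NaN is false in Java, so the explicit isNaN
      // check documents the intent rather than relying on that rule.
      static boolean isLowSnr(double snrDb, double thresholdDb) {
          return !Double.isNaN(snrDb) && snrDb < thresholdDb;
      }
  }
  ```
  
  A caller would typically check `Double.isNaN` on the value returned by `getSnrValue()` before logging or acting on it, since a NaN value means the estimate could not be computed rather than that the audio was noisy.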
- getMeanSpeechRmsValue
  
  public double getMeanSpeechRmsValue()
  
  Returns the mean frame root mean square (RMS) decibel level for the speech segments in the processed audio.
  
  NOTE: This value will be NaN if no speech segments were available in the response.
  
  Returns:
  
  mean RMS level in dB of speech segments.
- getMeanNonSpeechRmsValue
  
  public double getMeanNonSpeechRmsValue()
  
  Returns the mean frame root mean square (RMS) decibel level for the non-speech segments (noise) in the processed audio.
  
  Returns:
  
  mean RMS level in dB of non-speech segments.
- getPeakSpeechRmsValue
  
  public double getPeakSpeechRmsValue()
  
  Returns the estimated peak root mean square (RMS) decibel level of speech in the processed audio. This value is computed as the 98th percentile of all the RMS speech levels, to filter outliers. Low values would indicate faint speech or the user being too far from the microphone. This metric may not work well for responses with very short speech segments.
  
  NOTE: This value will be NaN if no speech segments were available in the response.
  
  Returns:
  
  peak speech RMS value in dB.
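  
  As an illustration of why a high percentile is used instead of the maximum, the sketch below computes a percentile over per-frame speech levels with the nearest-rank method. The class and method names are hypothetical, and the nearest-rank rule is an assumption; the SDK may interpolate or select speech frames differently.
  
  ```java
  import java.util.Arrays;
  
  // Hypothetical helper, not part of the KeenASR SDK.
  final class PeakSpeechSketch {
      // Nearest-rank percentile of the given values; NaN when there are none.
      static double percentile(double[] values, int percent) {
          if (values.length == 0) {
              return Double.NaN;
          }
          double[] sorted = values.clone(); // leave the caller's array untouched
          Arrays.sort(sorted);
          int rank = (int) Math.ceil(percent * sorted.length / 100.0);
          return sorted[Math.max(rank, 1) - 1];
      }
  }
  ```
  
  With only a handful of speech frames, the 98th percentile collapses to the largest value, which is consistent with the caveat above about very short speech segments.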
- getInitialSegmentRMSWarning
  
  public boolean getInitialSegmentRMSWarning()
  
  Returns a flag indicating that a high root mean square (RMS) value was detected during the initial part of the processed audio. This could indicate that:
  
  - The device is playing audio (which is captured by the microphone).
  - The user began speaking earlier than expected (and the recognizer started to listen too late).
  - There are high levels of noise in general.
  
  Returns:
  
  true if high initial segment RMS value was detected; false otherwise.
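  
  The documentation does not state how the flag is derived. Purely as an illustration of the idea, the sketch below averages the first few per-frame RMS levels and compares the result with a threshold. The class name, the number of initial frames, and the threshold are all hypothetical values supplied by the caller and are not taken from the SDK.
  
  ```java
  // Hypothetical helper, not part of the KeenASR SDK.
  final class InitialSegmentSketch {
      // True when the mean dB level of the first initialFrames frames
      // exceeds thresholdDb; false when there are no frames to inspect.
      static boolean highInitialRms(double[] frameRmsDb, int initialFrames, double thresholdDb) {
          int n = Math.min(initialFrames, frameRmsDb.length);
          if (n <= 0) {
              return false;
          }
          double sum = 0.0;
          for (int i = 0; i < n; i++) {
              sum += frameRmsDb[i];
          }
          return sum / n > thresholdDb;
      }
  }
  ```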
- toJSON
  
  public String toJSON()
  
  Returns JSON representation of the KASRAudioQualityResult.
  
  Returns:
  
  string containing a JSON representation of KASRAudioQualityResult.
- toString
  
  public String toString()
  
  A descriptive representation of the KASRAudioQualityResult (useful for debugging purposes and logging).
  
  Overrides:
  
  toString in class Object
  
  Returns:
  
  string that contains information about this audio quality result.
