Class KASRAudioQualityResult

java.lang.Object
com.keenresearch.keenasr.KASRAudioQualityResult

public class KASRAudioQualityResult extends Object
The KASRAudioQualityResult class holds audio quality metrics for the processed audio, including the signal to noise ratio (SNR) and several signal level metrics. It is returned as part of the KASRResponse.
  • Method Summary

    Modifier and Type
    Method
    Description
    long
    getClippedSampleCount()
    Returns the number of raw samples in processed audio that were clipping, i.e. reaching either maximum or minimum value.
    double[]
    getFrameRMSValues()
    Returns root mean square (RMS) values for each frame (25ms long with 10ms shift) in decibels in the processed audio.
    boolean
    getInitialSegmentRMSWarning()
    Returns a flag indicating that a high root mean square (RMS) value was detected during the initial part of the processed audio.
    double
    getMeanNonSpeechRmsValue()
    Returns the mean frame root mean square (RMS) level for the non-speech segments (noise) for the processed audio in decibels.
    double
    getMeanSpeechRmsValue()
    Returns the mean frame root mean square (RMS) level for the speech segments in the processed audio in decibels.
    double
    getPeakSpeechRmsValue()
    Returns the estimated peak root mean square (RMS) level of speech in the processed audio, in decibels.
    double
    getSnrValue()
    Returns the estimated signal to noise ratio (SNR) in decibels for processed audio.
    String
    toJSON()
    Returns JSON representation of the KASRAudioQualityResult.
    String
    toString()
    Descriptive representation of the KASRAudioQualityResult (can be useful for debugging purposes and logging).

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Method Details

    • getFrameRMSValues

      public double[] getFrameRMSValues()
      Returns root mean square (RMS) values for each frame (25ms long with 10ms shift) in decibels in the processed audio. Values are computed as 20*log10(√(sum of squared sample values)). For a frame with a zero signal level, the RMS value is reported as approximately -100 dB.
      Returns:
      an array of RMS values
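      The framing and formula described above can be sketched as follows. This is an illustration only, not the SDK's implementation: the class name, the sample rate parameter, the use of normalized double samples, the choice to drop a trailing partial frame, and the small EPSILON floor (picked so that an all-zero frame yields -100 dB) are all assumptions. The formula is taken literally from the description, using the sum of squared samples.

```java
/** Illustrative sketch only; not part of the KeenASR SDK. */
final class FrameRmsSketch {
    // Assumed floor term: 20*log10(sqrt(1e-10)) = -100 dB for an all-zero frame.
    private static final double EPSILON = 1e-10;

    /**
     * Computes a level in dB for each 25 ms frame, advancing 10 ms per frame,
     * using the formula as written in the description above.
     * A trailing partial frame is dropped (an assumption of this sketch).
     */
    static double[] frameRmsDb(double[] samples, int sampleRateHz) {
        int frameLen = (int) Math.round(0.025 * sampleRateHz);
        int shift = (int) Math.round(0.010 * sampleRateHz);
        if (samples.length < frameLen) {
            return new double[0];
        }
        int frameCount = (samples.length - frameLen) / shift + 1;
        double[] levels = new double[frameCount];
        for (int f = 0; f < frameCount; f++) {
            int start = f * shift;
            double sumSquares = 0.0;
            for (int i = start; i < start + frameLen; i++) {
                sumSquares += samples[i] * samples[i];
            }
            levels[f] = 20.0 * Math.log10(Math.sqrt(sumSquares + EPSILON));
        }
        return levels;
    }
}
```

      With these assumptions, one second of audio at 16 kHz produces 98 frames, since each frame spans 400 samples and starts 160 samples after the previous one.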
    • getClippedSampleCount

      public long getClippedSampleCount()
      Returns the number of raw samples in processed audio that were clipping, i.e. reaching either maximum or minimum value. Clipping suggests that the user may have been too close to the microphone during audio capture, or was fidgeting with it. Clipping can degrade speech recognition performance.
      Returns:
      number of clipped samples
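      As a rough illustration of what "reaching either maximum or minimum value" means, the sketch below counts clipped samples in a buffer. It assumes 16-bit signed PCM, which this page does not state, and the class and method names are hypothetical. The SDK may count clipping differently.

```java
/** Illustrative sketch only; not part of the KeenASR SDK. */
final class ClipCountSketch {
    /** Counts samples that sit at either extreme of the 16-bit range (assumed sample format). */
    static long countClippedSamples(short[] pcm) {
        long clipped = 0;
        for (short sample : pcm) {
            if (sample == Short.MAX_VALUE || sample == Short.MIN_VALUE) {
                clipped++;
            }
        }
        return clipped;
    }
}
```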
    • getSnrValue

      public double getSnrValue()
      Returns the estimated signal to noise ratio (SNR) in decibels for processed audio. SNR is computed as the difference between the mean speech RMS value and the mean noise RMS value. As currently computed, SNR may account for transient noise less effectively than for stationary background noise. Low values will affect speech recognition performance.
      Returns:
      estimated SNR in dB
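      Because both mean levels are already expressed in decibels, the difference described above is a plain subtraction. The helper below is a hypothetical sketch of that arithmetic, not the SDK's code; its name and parameter names are assumptions.

```java
/** Illustrative sketch only; not part of the KeenASR SDK. */
final class SnrSketch {
    /** SNR in dB as the difference between mean speech and mean non-speech levels, both in dB. */
    static double snrDb(double meanSpeechRmsDb, double meanNonSpeechRmsDb) {
        return meanSpeechRmsDb - meanNonSpeechRmsDb;
    }
}
```

      For example, a mean speech level of -20 dB over a mean noise level of -45 dB gives an SNR of 25 dB.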
    • getMeanSpeechRmsValue

      public double getMeanSpeechRmsValue()
      Returns the mean frame root mean square (RMS) level for the speech segments in the processed audio in decibels.
      Returns:
      mean RMS level in dB of speech segments
    • getMeanNonSpeechRmsValue

      public double getMeanNonSpeechRmsValue()
      Returns the mean frame root mean square (RMS) level for the non-speech segments (noise) for the processed audio in decibels.
      Returns:
      mean RMS level in dB of non-speech segments
    • getPeakSpeechRmsValue

      public double getPeakSpeechRmsValue()
      Returns the estimated peak root mean square (RMS) level of speech in the processed audio, in decibels. This value is computed as the 98th percentile of all the speech RMS levels, which filters out outliers. Low values indicate faint speech or a user who is too far from the microphone. This metric may not work well for responses with very short speech segments.
      Returns:
      peak speech RMS value in dB
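      The sketch below shows one way to take a 98th percentile over a set of speech frame levels. It uses the nearest-rank method, which is an assumption: this page does not say which percentile definition the SDK uses. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch only; not part of the KeenASR SDK. */
final class PeakSpeechSketch {
    /**
     * Nearest-rank percentile: the smallest value such that at least
     * pct percent of the values are less than or equal to it.
     */
    static double percentile(double[] values, double pct) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

      Taking a high percentile rather than the maximum means that a few unusually loud frames do not determine the result. It also suggests why the metric is less reliable for very short speech segments: with only a handful of frames, the 98th percentile is simply the loudest frame.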
    • getInitialSegmentRMSWarning

      public boolean getInitialSegmentRMSWarning()
      Returns a flag indicating that a high root mean square (RMS) value was detected during the initial part of the processed audio. This could indicate that: a) the device is playing audio (which is captured by the microphone), b) the user had already started speaking (and the recognizer started listening too late), or c) noise levels are high in general.
      Returns:
      high initial segment RMS value flag
    • toJSON

      public String toJSON()
      Returns JSON representation of the KASRAudioQualityResult.
      Returns:
      string containing JSON representation of KASRAudioQualityResult
    • toString

      public String toString()
      Descriptive representation of the KASRAudioQualityResult (can be useful for debugging purposes and logging).
      Overrides:
      toString in class Object
      Returns:
      string that contains information about this audio quality result
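      An application might combine several of these metrics to decide whether to ask the user to repeat a response. The sketch below is one hypothetical way to do that. AudioQualityMetrics is a stand-in interface that mirrors four of the getters documented above, so that the example compiles without the SDK. The threshold values are placeholders, not recommendations from this documentation, and would need tuning for a given device and user population.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in mirroring some getters of KASRAudioQualityResult. */
interface AudioQualityMetrics {
    long getClippedSampleCount();
    double getSnrValue();
    double getPeakSpeechRmsValue();
    boolean getInitialSegmentRMSWarning();
}

/** Illustrative sketch only; not part of the KeenASR SDK. */
final class AudioQualityTriage {
    // Placeholder thresholds (assumptions); tune them for your own setup.
    static final double MIN_SNR_DB = 10.0;
    static final double MIN_PEAK_SPEECH_DB = -40.0;

    /** Returns a short label for each potential audio quality problem. */
    static List<String> describeIssues(AudioQualityMetrics metrics) {
        List<String> issues = new ArrayList<>();
        if (metrics.getClippedSampleCount() > 0) {
            issues.add("clipping");
        }
        if (metrics.getSnrValue() < MIN_SNR_DB) {
            issues.add("low SNR");
        }
        if (metrics.getPeakSpeechRmsValue() < MIN_PEAK_SPEECH_DB) {
            issues.add("faint speech");
        }
        if (metrics.getInitialSegmentRMSWarning()) {
            issues.add("high initial level");
        }
        return issues;
    }
}
```

      In an application using the SDK, the same checks would be made directly against the KASRAudioQualityResult obtained from the response.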