Class KASRDecodingGraph

java.lang.Object
com.keenresearch.keenasr.KASRDecodingGraph

public class KASRDecodingGraph extends Object
The KASRDecodingGraph class manages decoding graphs in the filesystem. For more details on the concept of decoding graphs in automatic speech recognition, see this link.

For ASR tasks in which the domain and vocabulary are defined ahead of time and do not depend on information available only at runtime, we recommend creating the decoding graph offline and packaging it in the ASR bundle directory.

If user-specific information is necessary to create decoding graphs, you can use various KASRDecodingGraph class methods to create them dynamically; the resulting graphs are saved in the filesystem on the device. Typically, you will provide a list of sentences/phrases to the createDecodingGraphFromPhrases(String[], KASRRecognizer, String) method, which will then create a custom decoding graph. Later on, you can refer to the custom decoding graph by its name. Alternatively, instead of a list of sentences/phrases, you can provide an ARPA language model (bundled with your app), which will be used to build a custom decoding graph.

If your app needs to support continuous listening with a trigger phrase, you will need to build the decoding graph using the createDecodingGraphFromPhrasesWithTriggerPhrase(String[], String, KASRRecognizer, String) method.

Decoding graphs can only be built dynamically if the lang/ subdirectory in the ASR bundle exists.

Note: When dynamically creating decoding graphs, any words that do not have a phonetic representation in the lexicon (ASRBUNDLE/lang/lexicon.txt) will be assigned one algorithmically. For English, the algorithmic representation is imperfect, so you should aim to manually augment the lexicon text file with pronunciations for as many additional words as are likely to be encountered in your app. For example, if your app deals with ASR of names, you would augment the lexicon with additional names and their proper pronunciations before releasing your app.
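To see which words would fall back to an algorithmic pronunciation, you can compare your phrases against the lexicon before building the graph. The sketch below is not part of the SDK: the LexiconCoverage class and its methods are hypothetical, and it assumes each lexicon line starts with the word followed by whitespace and its phones, and that matching can be done case-insensitively. Check both assumptions against the lexicon.txt in your ASR bundle.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class LexiconCoverage {
    // Collects the first whitespace-delimited token of each non-empty line.
    // Assumption: lines look like "word phone1 phone2 ...".
    static Set<String> lexiconWords(List<String> lexiconLines) {
        Set<String> words = new HashSet<>();
        for (String line : lexiconLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            words.add(trimmed.split("\\s+")[0].toLowerCase(Locale.ROOT));
        }
        return words;
    }

    // Returns the lowercased words from the phrases that have no lexicon
    // entry, in first-seen order.
    static Set<String> missingWords(String[] phrases, Set<String> lexicon) {
        Set<String> missing = new LinkedHashSet<>();
        for (String phrase : phrases) {
            for (String word : phrase.trim().split("\\s+")) {
                if (word.isEmpty()) continue;
                String key = word.toLowerCase(Locale.ROOT);
                if (!lexicon.contains(key)) missing.add(key);
            }
        }
        return missing;
    }
}
```

Words reported by missingWords are candidates for manual lexicon entries.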

In the current version of the framework, creating a decoding graph can take on the order of 10-30 seconds for a medium-size vocabulary task (more than a thousand words). For larger language models, we recommend that you create the decoding graph ahead of time and bundle it with your app.

  • Constructor Details

    • KASRDecodingGraph

      public KASRDecodingGraph()
  • Method Details

    • createDecodingGraphFromPhrases

      public static boolean createDecodingGraphFromPhrases(String[] phrases, KASRRecognizer recognizer, String dgName)
      Create a custom decoding graph from an array of sentences/phrases and save it in the filesystem for later use. Custom decoding graphs can be referenced by name by various methods in the framework.
      Parameters:
      phrases - an array of String objects that specify the phrases the recognizer should listen for. These phrases are used to create an n-gram language model, from which the decoding graph is created. Text in the phrases should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars', not '$200').
      recognizer - the KASRRecognizer object that will be used to perform recognition with this decoding graph. Note that the decoding graph is persisted in the filesystem and can be reused at a later time with a different KASRRecognizer object, as long as that recognizer uses the same ASR bundle as the KASRRecognizer object used to create the decoding graph.
      dgName - the name of the custom decoding graph. All graph resources will be stored in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME in context.getApplicationInfo().dataDir + dgName + "-" + asrBundleName
      Returns:
      True on success, false otherwise
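Because the phrases must be normalized, a quick pre-check can flag entries that still contain digits or symbols before you build the graph. The sketch below is not part of the SDK. PhraseCheck is a hypothetical helper, and its character list is only an assumption about what counts as unnormalized text, so adjust it for your language and domain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class PhraseCheck {
    // Matches any ASCII digit or one of a few symbols that should be spelled out.
    private static final Pattern UNNORMALIZED = Pattern.compile("[0-9$%&@#]");

    // Returns the phrases that contain at least one flagged character,
    // in their original order.
    static List<String> findUnnormalized(String[] phrases) {
        List<String> flagged = new ArrayList<>();
        for (String phrase : phrases) {
            if (UNNORMALIZED.matcher(phrase).find()) flagged.add(phrase);
        }
        return flagged;
    }
}
```

An empty result does not guarantee the text is fully normalized; it only rules out the listed characters.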
    • createDecodingGraphFromPhrases

      public static boolean createDecodingGraphFromPhrases(String[] phrases, KASRRecognizer recognizer, ArrayList<KASRAlternativePronunciation> alternativePronunciations, KASRDecodingGraph.KASRSpeakingTask task, float spokenNoiseProbability, String dgName)
      Create a custom decoding graph from an array of sentences/phrases, for a specific task, using the provided list of word mispronunciations, and save it in the filesystem for later use. Custom decoding graphs can be referenced by name by various methods in the framework.
      Parameters:
      phrases - an array of String objects that specify the phrases the recognizer should listen for. These phrases are used to create an n-gram language model, from which the decoding graph is created. Text in the phrases should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars', not '$200').
      recognizer - the KASRRecognizer object that will be used to perform recognition with this decoding graph. Note that the decoding graph is persisted in the filesystem and can be reused at a later time with a different KASRRecognizer object, as long as that recognizer uses the same ASR bundle as the KASRRecognizer object used to create the decoding graph.
      alternativePronunciations - an ArrayList of KASRAlternativePronunciation objects specifying alternative pronunciations for the words, and (optional) tags that can be used to identify those pronunciations. If recognized, these words will be reported in the partial/final result with the tag appended to the word in the form #tag.
      task - one of the KASRDecodingGraph.KASRSpeakingTask values, specifying the type of interaction.
      spokenNoiseProbability - a value between 0 and 1 that determines how likely the <SPOKEN_NOISE> background word will be. The default value is 0.5. Setting this value to 0 means that <SPOKEN_NOISE> will practically not appear in the result regardless of what was said. Setting it to 1.0 means that even slightly mispronounced words might be mapped to <SPOKEN_NOISE>.
      dgName - the name of the custom decoding graph. All graph resources will be stored in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME. NOTE: the decoding graph name (dgName input parameter) cannot contain '-' characters. We recommend that it contain only alphanumeric characters.
      Returns:
      True on success, false otherwise.
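The constraints on dgName and spokenNoiseProbability can be checked before the (potentially slow) graph creation starts. The sketch below is not part of the SDK, and GraphArgs is a hypothetical helper. It follows the recommendation of alphanumeric-only names, which is stricter than the documented rule that only forbids '-'.

```java
final class GraphArgs {
    // True if the name is non-empty and contains only ASCII letters and
    // digits, which also excludes the forbidden '-' character.
    static boolean isSafeGraphName(String dgName) {
        return dgName != null && dgName.matches("[A-Za-z0-9]+");
    }

    // True if the value lies in the documented range [0, 1].
    // NaN fails both comparisons and is therefore rejected.
    static boolean isValidSpokenNoiseProbability(float p) {
        return p >= 0.0f && p <= 1.0f;
    }
}
```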
    • createContextualDecodingGraphFromPhrases

      public static boolean createContextualDecodingGraphFromPhrases(ArrayList<ArrayList<String>> contextualPhrases, KASRRecognizer recognizer, ArrayList<KASRAlternativePronunciation> alternativePronunciations, KASRDecodingGraph.KASRSpeakingTask task, float spokenNoiseProbability, String dgName)
      Create a custom decoding graph from per-context lists of sentences/phrases, for a specific task, using the provided list of word mispronunciations, and save it in the filesystem for later use. Custom decoding graphs can be referenced by name by various methods in the framework.
      Parameters:
      contextualPhrases - an ArrayList of ArrayList of String objects that specify the per-context phrases the recognizer should listen for. These phrases are used to create an n-gram language model, from which the decoding graph is created. Text in the phrases should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars', not '$200').
      recognizer - the KASRRecognizer object that will be used to perform recognition with this decoding graph. Note that the decoding graph is persisted in the filesystem and can be reused at a later time with a different KASRRecognizer object, as long as that recognizer uses the same ASR bundle as the KASRRecognizer object used to create the decoding graph.
      alternativePronunciations - an ArrayList of KASRAlternativePronunciation objects specifying alternative pronunciations for the words, and (optional) tags that can be used to identify those pronunciations. If recognized, these words will be reported in the partial/final result with the tag appended to the word in the form #tag.
      task - one of the KASRDecodingGraph.KASRSpeakingTask values, specifying the type of interaction.
      spokenNoiseProbability - a value between 0 and 1 that determines how likely the <SPOKEN_NOISE> background word will be. The default value is 0.5. Setting this value to 0 means that <SPOKEN_NOISE> will practically not appear in the result regardless of what was said. Setting it to 1.0 means that even slightly mispronounced words might be mapped to <SPOKEN_NOISE>.
      dgName - the name of the custom decoding graph. All graph resources will be stored in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME in context.getApplicationInfo().dataDir + dgName + "-" + asrBundleName
      Returns:
      True on success, false otherwise.
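The contextualPhrases argument is a nested ArrayList, which is verbose to assemble by hand. The sketch below only shows one way to build that type from plain arrays. ContextualPhrases is a hypothetical helper and not part of the SDK, and how the SDK interprets each context is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;

final class ContextualPhrases {
    // Wraps each per-context array in its own ArrayList, preserving the
    // order of contexts and of phrases within each context.
    static ArrayList<ArrayList<String>> fromArrays(String[]... perContext) {
        ArrayList<ArrayList<String>> contexts = new ArrayList<>();
        for (String[] phrases : perContext) {
            contexts.add(new ArrayList<>(Arrays.asList(phrases)));
        }
        return contexts;
    }
}
```

The result can then be passed as the first argument of createContextualDecodingGraphFromPhrases.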
    • createDecodingGraphFromPhrasesWithTriggerPhrase

      public static boolean createDecodingGraphFromPhrasesWithTriggerPhrase(String[] phrases, String triggerPhrase, KASRRecognizer recognizer, String dgName)
      Create a custom decoding graph from an array of sentences/phrases, using the specified triggerPhrase, and save it in the filesystem for later use. Custom decoding graphs can be referenced by name by various methods in the framework. When using decoding graphs created with trigger phrase support, upon calling the StartListening method the SDK will listen continuously until it hears the trigger phrase; only then will partial results start to be reported.
      Parameters:
      phrases - an array of String objects that specify the sentences/phrases the recognizer should listen for. These phrases are used to create an n-gram language model, from which the decoding graph is created. Text in the phrases should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars', not '$200').
      triggerPhrase - a String representing the trigger phrase used to initiate recognition when using this decoding graph, for example "Hey computer". When using a decoding graph with a trigger phrase, the recognizer will listen continuously until it hears the trigger phrase. No partial results will be delivered via callbacks until the trigger phrase is recognized.
      recognizer - the KASRRecognizer object that will be used to perform recognition with this decoding graph. Note that the decoding graph is persisted in the filesystem and can be reused at a later time with a different KASRRecognizer object, as long as that recognizer uses the same ASR bundle as the KASRRecognizer object used to create the decoding graph.
      dgName - the name of the custom decoding graph. All graph resources will be stored in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME in context.getApplicationInfo().dataDir + dgName + "-" + asrBundleName
      Returns:
      True on success, false otherwise.
    • decodingGraphWithNameExists

      public static boolean decodingGraphWithNameExists(String dgName, KASRRecognizer recognizer)
      Returns true if a valid custom decoding graph with the given name exists in the filesystem.
      Parameters:
      dgName - name of the custom decoding graph
      recognizer - KASRRecognizer object equivalent to the KASRRecognizer object that was used to create the decoding graph.
      Returns:
      True if decoding graph with such name exists, false otherwise. This method will also check for existence of all the necessary files in the decoding graph directory.
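Since graph creation can be slow and graphs persist in the filesystem, a common pattern is to build a graph only when one with that name does not exist yet. The sketch below shows that control flow with the two SDK calls passed in as functional-interface stand-ins, so it compiles without the SDK. GraphCache and ensureGraph are hypothetical names.

```java
import java.util.function.BiPredicate;
import java.util.function.Predicate;

final class GraphCache {
    // exists: stand-in for a call such as
    //   name -> KASRDecodingGraph.decodingGraphWithNameExists(name, recognizer)
    // create: stand-in for a call such as
    //   (p, name) -> KASRDecodingGraph.createDecodingGraphFromPhrases(p, recognizer, name)
    // Returns true if the graph already existed or was created successfully.
    static boolean ensureGraph(String dgName, String[] phrases,
                               Predicate<String> exists,
                               BiPredicate<String[], String> create) {
        if (exists.test(dgName)) {
            return true; // reuse the persisted graph and skip the slow build
        }
        return create.test(phrases, dgName);
    }
}
```

Note that this reuses an existing graph even if the phrases have changed since it was built; in that case, compare against getDecodingGraphCreationDate or use a new name.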
    • decodingGraphExistsAtPath

      public static boolean decodingGraphExistsAtPath(String dgPath)
      Returns true if a valid decoding graph exists at the given absolute filepath.
      Parameters:
      dgPath - absolute path to the decoding graph directory.
      Returns:
      True if a decoding graph exists at the given path, false otherwise. This method will also check for existence of all the necessary files in the decoding graph directory.
    • getDecodingGraphCreationDate

      public static Date getDecodingGraphCreationDate(String dgName, KASRRecognizer recognizer)
      Returns the date when the custom decoding graph was created.
      Parameters:
      dgName - name of the decoding graph
      recognizer - KASRRecognizer object equivalent to the KASRRecognizer object that was used to create the decoding graph (initialized with the same ASR bundle).
      Returns:
      date when decoding graph was created and saved in the filesystem. null if not available.
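The creation date is useful for deciding whether a persisted graph is stale relative to the data it was built from. The sketch below is not part of the SDK, and GraphFreshness is a hypothetical helper. It assumes you track when your phrase list last changed, and it treats a null creation date as "no usable graph".

```java
import java.util.Date;

final class GraphFreshness {
    // A rebuild is needed when no creation date is available (null) or
    // when the phrases were modified strictly after the graph was created.
    static boolean needsRebuild(Date graphCreated, Date phrasesModified) {
        if (graphCreated == null) return true;
        return phrasesModified.after(graphCreated);
    }
}
```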
    • isValidPronunciation

      public static boolean isValidPronunciation(String pronunciation, KASRRecognizer recognizer)
      Verify whether the pronunciation specified in the input string is composed of valid phones that are supported by the given recognizer. Returns true if the pronunciation is valid, false otherwise.
      Parameters:
      pronunciation - string that represents the pronunciation of a word, for example "k ae t"
      recognizer - KASRRecognizer object equivalent to the KASRRecognizer object that was used to create the decoding graph.
      Returns:
      True if pronunciation is valid, false otherwise.
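To illustrate the idea of a pronunciation being "composed of valid phones", the sketch below checks a space-separated phone string against a known phone set. It is only a conceptual illustration and not the SDK's implementation. PronunciationCheck and the phone set are hypothetical, and in an app you would call isValidPronunciation, which uses the phones of the actual recognizer.

```java
import java.util.Set;

final class PronunciationCheck {
    // True if the string is non-blank and every whitespace-separated token
    // is a member of the given phone set.
    static boolean usesOnlyKnownPhones(String pronunciation, Set<String> phones) {
        String trimmed = pronunciation.trim();
        if (trimmed.isEmpty()) return false;
        for (String phone : trimmed.split("\\s+")) {
            if (!phones.contains(phone)) return false;
        }
        return true;
    }
}
```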
    • createDecodingGraphFromSentences

      @Deprecated public static boolean createDecodingGraphFromSentences(String[] sentences, KASRRecognizer recognizer, String dgName)
      Create a custom decoding graph from an array of sentences/phrases and save it in the filesystem for later use. Custom decoding graphs can be referenced by name by various methods in the framework.
      Parameters:
      sentences - an array of String objects that specify the sentences/phrases the recognizer should listen for. These sentences are used to create an n-gram language model, from which the decoding graph is created. Text in the sentences should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars', not '$200').
      recognizer - the KASRRecognizer object that will be used to perform recognition with this decoding graph. Note that the decoding graph is persisted in the filesystem and can be reused at a later time with a different KASRRecognizer object, as long as that recognizer uses the same ASR bundle as the KASRRecognizer object used to create the decoding graph.
      dgName - the name of the custom decoding graph. All graph resources will be stored in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME in context.getApplicationInfo().dataDir + dgName + "-" + asrBundleName
      Returns:
      True on success, false otherwise
    • createDecodingGraphFromSentencesWithTriggerPhrase

      @Deprecated public static boolean createDecodingGraphFromSentencesWithTriggerPhrase(String[] sentences, String triggerPhrase, KASRRecognizer recognizer, String dgName)
      Create a custom decoding graph from an array of sentences/phrases, using the specified triggerPhrase, and save it in the filesystem for later use. Custom decoding graphs can be referenced by name by various methods in the framework. When using decoding graphs created with trigger phrase support, upon calling the StartListening method the SDK will listen continuously until it hears the trigger phrase; only then will partial results start to be reported.
      Parameters:
      sentences - an array of String objects that specify the sentences/phrases the recognizer should listen for. These sentences are used to create an n-gram language model, from which the decoding graph is created. Text in the sentences should be normalized (e.g. numbers and dates should be represented by words, so 'two hundred dollars', not '$200').
      triggerPhrase - a String representing the trigger phrase used to initiate recognition when using this decoding graph, for example "Hey computer". When using a decoding graph with a trigger phrase, the recognizer will listen continuously until it hears the trigger phrase. No partial results will be delivered via callbacks until the trigger phrase is recognized.
      recognizer - the KASRRecognizer object that will be used to perform recognition with this decoding graph. Note that the decoding graph is persisted in the filesystem and can be reused at a later time with a different KASRRecognizer object, as long as that recognizer uses the same ASR bundle as the KASRRecognizer object used to create the decoding graph.
      dgName - the name of the custom decoding graph. All graph resources will be stored in a directory named DECODING_GRAPH_NAME-ASR_BUNDLE_NAME in context.getApplicationInfo().dataDir + dgName + "-" + asrBundleName
      Returns:
      True on success, false otherwise.