An AWS Streaming Transcribe implementation for Unity

TL;DR: Find it on GitHub.

I’ve been working on a speech-to-text package for Unity projects. It has modules for different platforms (Android, Windows, etc.) and speech-to-text services (Google, AWS). The voice recognition manager chooses the appropriate module based on the environment. For example, it will first try the native voice recognition on Windows, then automatically switch over to a cloud solution if the native solution isn’t available. The manager provides standardized events for receiving transcription messages so users don’t have to deal with the differences in speech recognition implementations. 
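To make the fallback idea concrete, here is a minimal sketch of how a manager might pick the first available module and forward its results through one shared event. It is written in Python for brevity rather than the C# the package uses, and every name in it (`SpeechModule`, `RecognitionManager`, `is_available`, `on_transcript`) is hypothetical and not taken from the package itself.

```python
from typing import Callable, List


class SpeechModule:
    """Hypothetical base class for one platform or cloud backend."""

    def __init__(self, name: str, available: bool) -> None:
        self.name = name
        self._available = available
        # The manager assigns this so results flow through one place.
        self.on_result: Callable[[str], None] = lambda text: None

    def is_available(self) -> bool:
        return self._available

    def emit(self, text: str) -> None:
        # A real module would call this when the service returns text.
        self.on_result(text)


class RecognitionManager:
    """Tries modules in priority order and exposes one standard event."""

    def __init__(self, modules: List[SpeechModule]) -> None:
        self.modules = modules
        self.listeners: List[Callable[[str], None]] = []

    def on_transcript(self, callback: Callable[[str], None]) -> None:
        self.listeners.append(callback)

    def _dispatch(self, text: str) -> None:
        for listener in self.listeners:
            listener(text)

    def select(self) -> SpeechModule:
        # The first available module wins; later ones act as fallbacks.
        for module in self.modules:
            if module.is_available():
                module.on_result = self._dispatch
                return module
        raise RuntimeError("no speech recognition module available")


# Example: native recognition is missing, so the cloud module is chosen.
received: List[str] = []
manager = RecognitionManager([
    SpeechModule("windows-native", available=False),
    SpeechModule("aws-transcribe", available=True),
])
manager.on_transcript(received.append)
active = manager.select()
active.emit("hello world")
```

The caller only ever subscribes to `on_transcript`, so swapping the underlying module does not change the calling code.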

While implementing Amazon Transcribe, I found that the AWS .NET SDK does not support streaming (real-time) transcription. The available options were HTTP/2 and WebSocket connections. Using NativeWebSocket, I was able to get a streaming solution working. You can find my code here.

Creating this module meant learning more about encoding network messages: computing CRCs, handling big-endian vs. little-endian byte order, and encoding individual bits as headers. If you end up using it, please let me know if you see improvements that can be made!
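For a feel of what that encoding involves, here is a rough sketch of wrapping one chunk of audio in an AWS event stream message, based on my reading of the event stream encoding that Transcribe streaming uses: a big-endian prelude with the total and header lengths, a CRC32 of the prelude, the headers, the payload, and a CRC32 of everything before it. It is in Python for brevity and is not the code from the package; the function names `encode_header` and `encode_audio_event` are made up for illustration.

```python
import struct
import zlib

STRING_TYPE = 7  # event stream header value type for UTF-8 strings


def encode_header(name: str, value: str) -> bytes:
    """Encode one string header: name length, name, type, value length, value."""
    name_bytes = name.encode("utf-8")
    value_bytes = value.encode("utf-8")
    return (
        struct.pack(">B", len(name_bytes))          # 1-byte name length
        + name_bytes
        + struct.pack(">BH", STRING_TYPE, len(value_bytes))  # type + 2-byte length
        + value_bytes
    )


def encode_audio_event(audio_chunk: bytes) -> bytes:
    """Wrap raw audio bytes in a single event stream message."""
    headers = b"".join([
        encode_header(":content-type", "application/octet-stream"),
        encode_header(":event-type", "AudioEvent"),
        encode_header(":message-type", "event"),
    ])
    # 16 fixed bytes: 8-byte prelude, 4-byte prelude CRC, 4-byte message CRC.
    total_length = 16 + len(headers) + len(audio_chunk)
    # ">" forces big-endian (network order) regardless of the host CPU.
    prelude = struct.pack(">II", total_length, len(headers))
    prelude_crc = struct.pack(">I", zlib.crc32(prelude))
    body = prelude + prelude_crc + headers + audio_chunk
    message_crc = struct.pack(">I", zlib.crc32(body))
    return body + message_crc


# Example: frame eight bytes of silent PCM audio.
message = encode_audio_event(b"\x00" * 8)
```

The receiver can check both CRCs before trusting the lengths, which is why the prelude gets its own checksum in addition to the one covering the whole message.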
