Speech to speech: Mute functionality #5688
pranavjoshi001 wants to merge 9 commits into microsoft:main from feature/s2s-mute

Conversation
```typescript
 * The session remains active and can be unmuted to resume listening.
 */
export default function useMuteVoice(): () => void {
  return useWebChatAPIContext().muteVoice;
```
Why do we need to extend the main context instead of relying on a separate provider?
Following the same pattern we used for `useStartVoice` and `useStopVoice`.
I think EugeneO's concern is valid. We are slowly dissecting the APIComposer and CoreComposer into smaller XXXComposer pieces to improve performance and enable plug-and-play.

Why performance? If one thing in the composer context changes, the change will propagate to every component that calls `useContext(context)` to subscribe.

Say `styleOptions` changed: the APIComposer context would change, and every component that relies on `useMuteVoice()` would be re-rendered even if it never called `useStyleOptions`.

Can you move it to `<SpeechToSpeechComposer>`?
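The re-render concern can be modeled outside React as a rough sketch (hypothetical code, not WebChat's): a context behaves like a store that notifies every subscriber whenever any of its values changes, so splitting the voice APIs into their own context spares voice consumers from unrelated updates.

```typescript
// Hypothetical model of context propagation: all subscribers of a store are
// notified when any value in that store changes, mirroring how every
// useContext(context) caller re-renders when the context value changes.
type Listener = () => void;

class Store<T extends object> {
  #listeners = new Set<Listener>();
  constructor(public value: T) {}

  subscribe(listener: Listener): void {
    this.#listeners.add(listener);
  }

  set<K extends keyof T>(key: K, next: T[K]): void {
    this.value[key] = next;
    // Like a context value change: every subscriber is notified, even those
    // that only care about other keys.
    this.#listeners.forEach(listener => listener());
  }
}

function demoContextSplit(): [number, number] {
  // One combined store: a styleOptions change "re-renders" the voice consumer.
  let combinedNotifications = 0;
  const combined = new Store({ muteVoice: () => {}, styleOptions: 'blue' });
  combined.subscribe(() => combinedNotifications++); // voice-only consumer
  combined.set('styleOptions', 'red'); // unrelated change still notifies

  // Two split stores: the voice consumer is untouched by style changes.
  let splitNotifications = 0;
  const styleStore = new Store({ styleOptions: 'blue' });
  const voiceStore = new Store({ muteVoice: () => {} });
  voiceStore.subscribe(() => splitNotifications++);
  styleStore.set('styleOptions', 'red'); // voice store untouched

  return [combinedNotifications, splitNotifications];
}
```

This is the rationale for a dedicated `<SpeechToSpeechComposer>`: voice hooks would subscribe only to the voice store.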
```typescript
 * Hook to unmute voice mode (resumes microphone input after muting).
 * This reactivates speech-to-speech listening.
 */
export default function useUnmuteVoice(): () => void {
```
Have you considered a single hook? `useRecordVoice`/`useControlVoiceRecording` which passes the current state of the recording voice.
Initially I created a single hook to return voice with state, but then William suggested keeping an individual hook for each export, and the same was followed for start voice, stop voice, and voice state.
Deeper reason, in one line: `useStartVoice` will do some expensive operation that takes time to complete. Say, `new AudioContext()` is a UI blocker and takes time to complete.

Let's say, in a parallel universe, we have `useVoice(): [boolean, (enabled: boolean) => void]`. What would happen:

- Call `useVoice()[1](true)` to start the microphone
- Assume it takes 1 second to turn on the microphone
- Within that 1 second, calling `useVoice()[0]` will return `false`
- 1 second later, `useVoice()[0]` will return `true`

The behavior becomes quite nondeterministic. Say we have a microphone button that is essentially a push button, like `<input type="checkbox">`, backed by this `useVoice()` boolean. When the user clicks the push button, it will not appear pushed/checked because the getter is still returning `false`. 1 second later, it becomes pushed/checked. The user will be confused, because it breaks WYSIWYG.

This is the main reason why `useStartVoice`/`useStopVoice` is preferred over `useVoice`, not to mention voice has more state than true/false. Primarily: because the setter takes time to complete the operation, the getter and setter will be momentarily out of sync. This is why we prefer callback functions over a getter/setter pair.

In this case, if the "mute" is instant/synchronous, which I believe it is, we should do it the normal way, i.e. `useMicrophoneMuted(): readonly [boolean, Dispatch<SetStateAction<boolean>>]`.
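The out-of-sync window described above can be sketched with a plain TypeScript model (hypothetical names; 100 ms stands in for the 1-second microphone startup):

```typescript
// Hypothetical model of the timing problem: an async setter paired with a
// synchronous getter. The getter lags while the operation is in flight.
class VoiceControl {
  #enabled = false;

  get enabled(): boolean {
    return this.#enabled;
  }

  // Turning on the microphone takes time, e.g. new AudioContext() / getUserMedia().
  async setEnabled(value: boolean): Promise<void> {
    await new Promise(resolve => setTimeout(resolve, 100)); // simulated startup cost
    this.#enabled = value;
  }
}

async function demoVoiceDesync(): Promise<[boolean, boolean]> {
  const voice = new VoiceControl();
  const pending = voice.setEnabled(true); // "push the microphone button"
  const duringStartup = voice.enabled; // still false: getter lags the setter
  await pending;
  const afterStartup = voice.enabled; // true only after the operation completes
  return [duringStartup, afterStartup];
}
```

A checkbox-style button bound to `voice.enabled` would stay unchecked during the startup window, which is exactly the WYSIWYG break described above; callbacks like `useStartVoice()` avoid exposing the stale getter.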
```diff
  );

- const { record } = useRecorder(handleAudioChunk);
+ const { record, mute } = useRecorder(handleAudioChunk);
```
Sort.

```diff
- const { record, mute } = useRecorder(handleAudioChunk);
+ const { mute, record } = useRecorder(handleAudioChunk);
```
```typescript
const { record, mute } = useRecorder(handleAudioChunk);

useEffect(() => {
  if (muted) {
```
Curious, how about `unmute()`? Can we put it here so things are symmetric and easier to debug?
```typescript
const muteRecording = useCallback(() => {
  // Stop MediaStream (mic indicator OFF) and disconnect source
  stopMediaStream();
```
Disconnecting the source is a good idea.

Did you test it on iPhone, Android, and Firefox? A few aspects:

- How long does it take to acquire/connect again?
- If it's > 500 ms, that means we will lose what the user says, and the user will never know about it. It may be a better idea to keep the microphone connected.
- Will the browser prompt the privacy dialog again? That would be undesirable.

After being muted for 1 minute, should we stop recording?
```typescript
Date,
acquireAndConnectMediaStream,
audioCtxRef,
chunkIntervalMs,
initAudio,
onAudioChunk,
sampleRate,
workletRef
```
```typescript
// Should stop media stream tracks (mic indicator OFF)
expect(mockTrack.stop).toHaveBeenCalledTimes(1);
// Should disconnect source node
expect(mockSourceNode.disconnect).toHaveBeenCalledTimes(1);
```
It should also test that the stream is zeroed out.
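One way to phrase that assertion is with a small predicate (hypothetical name, not in the PR):

```typescript
// Returns true when every sample in the chunk is zero, i.e. the stream is
// fully silenced while muted.
function isSilent(chunk: Float32Array): boolean {
  return chunk.every(sample => sample === 0);
}
```

The spec could then assert something like `expect(isSilent(lastSentChunk)).toBe(true)` while muted, where `lastSentChunk` is the most recent chunk captured by the mock.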
```typescript
await waitFor(() => {
  expect(mockMediaDevices.getUserMedia).toHaveBeenCalledTimes(1);
});
});
```
It should test that it no longer sends out zeroes.
```typescript
this.bufferSize = options.processorOptions.bufferSize;
this.muted = false;
this.recording = false;
this.silentFrame = new Float32Array(RENDER_QUANTUM); // Pre-allocated zeros
```
Consider moving `this.silentFrame` to a module-level constant: `const SILENT_FRAME = new Float32Array(RENDER_QUANTUM)`.
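The suggested refactor would look like this (the `RENDER_QUANTUM` value of 128 is assumed from the Web Audio render quantum): the zero-filled buffer is allocated once and shared by every processor instance, instead of once per instance in the constructor.

```typescript
// Module-level constant: one shared, immutable-by-convention silent frame.
const RENDER_QUANTUM = 128; // Web Audio renders audio in 128-sample quanta
const SILENT_FRAME = new Float32Array(RENDER_QUANTUM); // Float32Array is zero-filled by default
```

This is safe as long as nothing writes into the shared frame, since every consumer only reads it.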
```typescript
this.buffer.push(...inputs[0][0]);
if (this.recording) {
  // Use real audio when not muted, otherwise silenced chunk to keep connection alive (all zeros).
  const audioData = !this.muted && inputs[0] && inputs[0].length ? inputs[0][0] : this.silentFrame;
```
Could simplify this condition.
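One possible simplification uses optional chaining (hypothetical helper; names mirror the processor's fields; note it also guards against an empty first channel, a slight tightening of the original check on `inputs[0].length`):

```typescript
// Collapses the three-part condition: real audio only when unmuted and a
// non-empty first channel is present; otherwise the pre-allocated silence.
function selectAudioData(
  inputs: Float32Array[][],
  muted: boolean,
  silentFrame: Float32Array
): Float32Array {
  const input = inputs[0]?.[0];

  return !muted && input && input.length > 0 ? input : silentFrame;
}
```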
Changelog Entry
Description
This PR adds mute/unmute functionality for the Speech-to-Speech (S2S) feature as core API only, without UI changes. When muted, the microphone is turned off (browser indicator disappears) but silent audio chunks continue to be sent to keep the server connection alive. This prevents connection timeouts while allowing consumers to implement their own mute UI.
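The silent-chunk approach can be modeled as a rough, simplified sketch (hypothetical names, not the PR's actual worklet code): commands flip a flag, and while muted the processor emits a pre-allocated zero-filled frame so the server keeps receiving chunks and does not time out.

```typescript
// Simplified model of the worklet's mute logic (assumed 128-sample quantum).
const RENDER_QUANTUM = 128;
const SILENT_FRAME = new Float32Array(RENDER_QUANTUM); // all zeros

class MuteableChunker {
  muted = false;

  // MUTE/UNMUTE arrive as commands posted to the worklet.
  handleCommand(command: 'MUTE' | 'UNMUTE'): void {
    this.muted = command === 'MUTE';
  }

  // Returns the chunk to send: real input when unmuted, silence otherwise,
  // so the connection stays alive while the microphone is off.
  nextChunk(input: Float32Array | undefined): Float32Array {
    return !this.muted && input && input.length > 0 ? input : SILENT_FRAME;
  }
}
```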
Design
The mute functionality works at multiple levels:
AudioWorklet Level:
- Handles `MUTE` and `UNMUTE` commands

useRecorder Hook (`useRecorder.ts`):
- `mute()` function that:
  - Sends the `MUTE` command to the worklet
  - Stops the MediaStream and disconnects the source
- `unmute()` function that:
  - Sends the `UNMUTE` command to the worklet
  - Re-acquires the stream via `getUserMedia`

VoiceRecorderBridge (`VoiceRecorderBridge.tsx`):
- Passes the `mute` function to the voice state machine
- On `muted`, calls `mute()` and stores the `unmute` function
- On `listening`, calls the stored `unmute` function

Redux Actions & Hooks:
- `muteVoiceRecording` and `unmuteVoiceRecording` Redux actions
- `useMuteVoice` and `useUnmuteVoice` hooks for consumers

Specific Changes
- `MUTE` and `UNMUTE` command handling in the AudioWorklet processor to generate silent chunks when muted
- `mute` function in the `useRecorder.ts` hook that disconnects audio and stops the MediaStream while continuing to send silent chunks
- `VoiceRecorderBridge.tsx` updated to handle mute/unmute based on voice state changes
- `muteVoiceRecording.ts` and `unmuteVoiceRecording.ts` Redux actions
- `voiceActivity.ts` reducer updated to handle `VOICE_MUTE_RECORDING` and `VOICE_UNMUTE_RECORDING` actions
- `useMuteVoice.ts` and `useUnmuteVoice.ts` hooks added and exported from the API, component, and bundle packages
- Tests in `useRecorder.spec.tsx`
- `CHANGELOG.md` updated

Review Checklist
- … (`z-index`)
- `package.json` and `package-lock.json` reviewed