
Speech to speech: Mute functionality#5688

Open
pranavjoshi001 wants to merge 9 commits into microsoft:main from pranavjoshi001:feature/s2s-mute

Conversation

pranavjoshi001 (Contributor) commented Feb 5, 2026

Changelog Entry

  • Added mute/unmute functionality for speech-to-speech with silent chunks to keep server connection alive, in PR #5688, by @pranavjoshi

Description

This PR adds mute/unmute functionality for the Speech-to-Speech (S2S) feature as core API only, without UI changes. When muted, the microphone is turned off (browser indicator disappears) but silent audio chunks continue to be sent to keep the server connection alive. This prevents connection timeouts while allowing consumers to implement their own mute UI.

Design

The mute functionality works at multiple levels:

  1. AudioWorklet Level:

    • Added MUTE and UNMUTE commands
    • When muted, the processor generates silent (all zeros) Int16 audio chunks instead of real audio data
    • This keeps chunks flowing at the same interval to maintain the server connection
  2. useRecorder Hook (useRecorder.ts):

    • Added mute() function that:
      • Sends MUTE command to the worklet
      • Disconnects the source node from the audio graph
      • Stops all MediaStream tracks (turns off browser mic indicator)
    • mute() returns an unmute() function that:
      • Sends UNMUTE command to the worklet
      • Re-acquires the microphone via getUserMedia
      • Reconnects the source node to the audio graph
  3. VoiceRecorderBridge (VoiceRecorderBridge.tsx):

    • Wires the mute function to the voice state machine
    • When voice state transitions to muted, calls mute() and stores the unmute function
    • When voice state transitions back to listening, calls the stored unmute function
  4. Redux Actions & Hooks:

    • Added muteVoiceRecording and unmuteVoiceRecording Redux actions
    • Exposed useMuteVoice and useUnmuteVoice hooks for consumers
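The worklet-level behavior in step 1 can be sketched as a pure function. This is a minimal illustration, not the PR's actual code; the name toInt16Chunk and its shape are hypothetical:

```typescript
// Illustrative sketch of step 1: convert one Float32 render quantum to an
// Int16 chunk, substituting zeros when muted so chunks keep flowing at the
// same cadence and the server connection stays alive.
function toInt16Chunk(frame: Float32Array, muted: boolean): Int16Array {
  const out = new Int16Array(frame.length); // typed arrays start zero-filled
  if (muted) {
    return out; // silent chunk: same size, same interval, all zeros
  }
  for (let i = 0; i < frame.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range
    const s = Math.max(-1, Math.min(1, frame[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

Because the muted branch skips the conversion loop entirely, muting is effectively free per render quantum while preserving the chunk cadence the server expects.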

Specific Changes

  • Added MUTE and UNMUTE command handling in AudioWorklet processor to generate silent chunks when muted
  • Added mute function to useRecorder.ts hook that disconnects audio and stops MediaStream while continuing to send silent chunks
  • Updated VoiceRecorderBridge.tsx to handle mute/unmute based on voice state changes
  • Added muteVoiceRecording.ts and unmuteVoiceRecording.ts Redux actions
  • Updated voiceActivity.ts reducer to handle VOICE_MUTE_RECORDING and VOICE_UNMUTE_RECORDING actions
  • Added useMuteVoice.ts and useUnmuteVoice.ts hooks and exported them from API, component, and bundle packages
  • Added unit tests for mute functionality in useRecorder.spec.tsx
  • I have added tests and executed them locally
  • I have updated CHANGELOG.md
  • I have updated documentation

Review Checklist

This section is for contributors to review your work.

  • Accessibility reviewed (tab order, content readability, alt text, color contrast)
  • Browser and platform compatibilities reviewed
  • CSS styles reviewed (minimal rules, no z-index)
  • Documents reviewed (docs, samples, live demo)
  • Internationalization reviewed (strings, unit formatting)
  • package.json and package-lock.json reviewed
  • Security reviewed (no data URIs, check for nonce leak)
  • Tests reviewed (coverage, legitimacy)

 * The session remains active and can be unmuted to resume listening.
 */
export default function useMuteVoice(): () => void {
  return useWebChatAPIContext().muteVoice;
}
Collaborator:
Why we need to extend the main context instead of relying on a separate provider?

Contributor Author:
Following the same pattern we used for "useStartVoice" and "useStopVoice".

Contributor:
I think EugeneO's concern is valid. We are slowly dissecting the APIComposer and CoreComposer into smaller XXXComposer pieces to improve performance and enable plug-and-play.

Why performance? If one thing in the composer context changes, it will propagate to every component that calls useContext(context) to subscribe.

Say, if styleOptions changes, the APIComposer context would change, and every component that relies on useMuteVoice() will be re-rendered even if it did not call useStyleOptions.

Can you move it to <SpeechToSpeechComposer>?

 * Hook to unmute voice mode (resumes microphone input after muting).
 * This reactivates speech-to-speech listening.
 */
export default function useUnmuteVoice(): () => void {
Collaborator:

Have you considered a single hook? useRecordVoice/useControlVoiceRecording that passes the current state of the voice recording.

Contributor Author:

Initially I created a single hook to return voice with state, but then William suggested keeping an individual hook for each export; the same was followed for start voice, stop voice, and voice state.

Contributor:

Deeper reason... one-liner: useStartVoice does some expensive operation that takes time to complete. Say, new AudioContext() is a UI blocker and takes time to complete.

Let's say, in parallel universe, we have useVoice(): [boolean, (enabled: boolean) => void], what would happen:

  • Call useVoice()[1](true) to start the microphone
  • Assume it takes 1 second to turn on the microphone
  • Within 1 second, we call useVoice()[0], it will return false
  • 1 second later, useVoice()[0] will return true

The behavior will become quite nondeterministic... say, we have a microphone button that is essentially a push button like <input type="checkbox"> and is backed by this useVoice() boolean.

When the user clicks the push button, it will not be pushed/checked because the getter is still returning false. 1 second later, it is pushed/checked. The user will be confused because it breaks WYSIWYG.

This is the main reason why useStartVoice/useStopVoice is preferred over useVoice... not to mention voice has more states than true/false. Primarily, because the setter takes time to complete the operation, the getter and setter will be momentarily out-of-sync; this is the main reason we prefer callback functions over getter/setter.


In this case, if the "mute" is instant/synchronous, which I believe it is, we should do it the normal way, i.e. useMicrophoneMuted(): readonly [boolean, Dispatch<SetStateAction<boolean>>].
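A framework-free sketch of the getter/setter shape proposed above, using a closure-based store to stand in for React state (createMutedStore and its members are illustrative names, not from this PR):

```typescript
// Non-React sketch of the proposed useMicrophoneMuted() shape: a boolean
// getter plus a setter accepting either a value or an updater function,
// mirroring React's Dispatch<SetStateAction<boolean>>.
type SetStateAction<S> = S | ((prev: S) => S);

function createMutedStore(initial: boolean = false) {
  let muted = initial;
  const setMuted = (action: SetStateAction<boolean>): void => {
    // Support both setMuted(true) and setMuted(prev => !prev)
    muted = typeof action === 'function' ? action(muted) : action;
  };
  return { isMuted: () => muted, setMuted } as const;
}
```

Because muting here is synchronous, the getter and setter can never be observed out of sync, which is exactly the property the reviewer argues the slow start/stop operations lack.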

);

- const { record } = useRecorder(handleAudioChunk);
+ const { record, mute } = useRecorder(handleAudioChunk);
Contributor:

Sort.

Suggested change
const { record, mute } = useRecorder(handleAudioChunk);
const { mute, record } = useRecorder(handleAudioChunk);

const { record, mute } = useRecorder(handleAudioChunk);

useEffect(() => {
  if (muted) {
@compulim (Contributor) commented Feb 26, 2026:

Curious, how about unmute()?

Can we put it here so things are symmetric and easier to debug?


const muteRecording = useCallback(() => {
  // Stop MediaStream (mic indicator OFF) and disconnect source
  stopMediaStream();
Contributor:

Disconnect source is a good idea.

Did you test it on iPhone, Android, and Firefox? A few aspects:

  • How long does it take to acquire/connect again?
    • If it's > 500ms, we will lose what the user says, but the user will never know about it. It may be a better idea to keep the microphone connected.
  • Will the browser prompt the privacy dialog again? That would be undesirable.

After being muted for 1 minute, should we stop recording?

Comment on lines +174 to +181
Date,
acquireAndConnectMediaStream,
audioCtxRef,
chunkIntervalMs,
initAudio,
onAudioChunk,
sampleRate,
workletRef
Contributor:

Sort.

// Should stop media stream tracks (mic indicator OFF)
expect(mockTrack.stop).toHaveBeenCalledTimes(1);
// Should disconnect source node
expect(mockSourceNode.disconnect).toHaveBeenCalledTimes(1);
Contributor:

It should also test if the stream is zeroed out.

await waitFor(() => {
  expect(mockMediaDevices.getUserMedia).toHaveBeenCalledTimes(1);
});
});
Contributor:

It should also test that it no longer sends out zeroes.

this.bufferSize = options.processorOptions.bufferSize;
this.muted = false;
this.recording = false;
this.silentFrame = new Float32Array(RENDER_QUANTUM); // Pre-allocated zeros
Contributor:

Consider move this.silentFrame to module-level constant const SILENT_FRAME = new Float32Array(RENDER_QUANTUM).
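The suggested hoist might look like the following sketch. It assumes RENDER_QUANTUM is the Web Audio API's fixed render quantum of 128 frames, as the field name implies:

```typescript
// Sketch of the reviewer's suggestion: the silent frame is immutable, so a
// single module-level constant can be shared by all processor instances
// instead of allocating one Float32Array per instance.
const RENDER_QUANTUM = 128; // Web Audio API render quantum size (frames)
const SILENT_FRAME = new Float32Array(RENDER_QUANTUM); // zero-filled
```

The shared constant is safe as long as nothing writes into it; downstream code that only reads from the frame (e.g. copying it into an outgoing chunk) keeps working unchanged.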

this.buffer.push(...inputs[0][0]);
if (this.recording) {
  // Use real audio when not muted, otherwise silenced chunk to keep connection alive (all zeros).
  const audioData = !this.muted && inputs[0] && inputs[0].length ? inputs[0][0] : this.silentFrame;
Contributor:

Could simplify this condition.
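One possible simplification, extracted as a standalone function so the behavior is easy to check (the name selectAudioData is illustrative; optional chaining collapses the two input checks into one):

```typescript
// Sketch: pick real audio when unmuted and input is present, otherwise the
// pre-allocated silent frame. `inputs[0]?.length` covers both the missing
// channel-list case and the empty one, matching the original two checks.
function selectAudioData(
  muted: boolean,
  inputs: Float32Array[][],
  silentFrame: Float32Array
): Float32Array {
  return !muted && inputs[0]?.length ? inputs[0][0] : silentFrame;
}
```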

@compulim (Contributor) left a comment:

Add end-to-end tests.
