UniSpeech - Large Scale Self-Supervised Learning for Speech
Updated Apr 5, 2024 - Python
A dataset for speech recognition
This is a repository of neural full-rank spatial covariance analysis with speaker activity (neural FCASA).
A voice conversion tool project for Vietnamese speakers
Repository for "LLM-based speaker diarization correction: A generalizable approach" paper
Template Project For iOS Apps using .onnx Speech Models for Speech Diarization
A demo showing speech diarization (separating the audio of different speakers) and converting each speaker's audio to text using the Google Cloud Speech API.
Vid2Manga is an innovative application designed to bridge the gap between video content and manga-style storytelling. By leveraging advanced video processing and speech-to-text technologies, Vid2Manga extracts audio and visual components from video files to create a foundation for manga generation.
Speech transcription and speech diarization
A powerful, local speech-to-text transcription system that combines OpenAI's Whisper for accurate transcription with pyannote.audio for speaker diarization (identifying who spoke when). Perfect for meetings, interviews, podcasts, and any audio/video content that needs accurate transcription with speaker identification.