
Commit 75cd71d

asumagic and TParcollet authored
Streaming ASR guide/tutorial (#2700)
* Add full streaming conformer tutorial/guide
* Add notice on image assets
* Add RNN-T transcription network to conformer model overview
* Add summary of tutorial
* Reword loss function paragraph
* Rewrite positional encoding part after reading more on the subject

---------

Co-authored-by: Parcollet Titouan <parcollet.titouan@gmail.com>
1 parent 05786f0 commit 75cd71d

11 files changed

Lines changed: 1651 additions & 0 deletions

docs/README.md

Lines changed: 2 additions & 0 deletions
@@ -36,6 +36,8 @@ The `docs/tutorials` directory exclusively contains tutorials in Jupyter Noteboo
 - It's OK if the user has to run the notebook to get some of the heavier outputs.
 - Preferably use Jupyter Notebook for final editing of your notebook.
 - Jupyter Notebook tends to have somewhat sane `.ipynb` output. This avoids Git diffs from being excessively large.
+- **Images can be put in the `docs/tutorials/assets` directory,** rather than embedded as base64. You can then refer to them in Markdown like `![alt text](../assets/myimage.png)`. These will work correctly when imported on Colab.
+- Pick descriptive names.

 #### Integration in documentation

149 KB
128 KB
54 KB
189 KB
109 KB
15.9 KB

docs/tutorials/assets/dcc-dcc.png

18.9 KB
28.1 KB

docs/tutorials/nn.rst

Lines changed: 21 additions & 0 deletions
@@ -11,6 +11,7 @@ Neural Architectures
    nn/using-wav2vec-2.0-hubert-wavlm-and-whisper-from-huggingface-with-speechbrain.ipynb
    nn/complex-and-quaternion-neural-networks.ipynb
    nn/recurrent-neural-networks-and-speechbrain.ipynb
+   nn/conformer-streaming-asr.ipynb


 .. rubric:: `🔗 Fine-tuning or using Whisper, wav2vec2, HuBERT and others with SpeechBrain and HuggingFace <nn/using-wav2vec-2.0-hubert-wavlm-and-whisper-from-huggingface-with-speechbrain.html>`_
@@ -67,3 +68,23 @@ Linear, Convolution, Recurrent and Normalisation.
 Recurrent Neural Networks (RNNs) offer a natural way to process sequences.
 This tutorial demonstrates how to use the SpeechBrain implementations of RNNs including LSTMs, GRU, RNN and LiGRU a specific recurrent cell designed
 for speech-related tasks. RNNs are at the core of many sequence to sequence models.
+
+
+.. rubric:: `🔗 Streaming Speech Recognition with Conformers <nn/conformer-streaming-asr.html>`_
+   :heading-level: 2
+
+.. list-table::
+   :widths: 20 20 20 20 20
+   :header-rows: 0
+
+   * - de Langen S.
+     - Sep. 2024
+     - Difficulty: medium
+     - Time: 60min+
+     - `🔗 Google Colab <https://colab.research.google.com/github/speechbrain/speechbrain/blob/develop/docs/tutorials/nn/conformer-streaming-asr.ipynb>`__
+
+
+Automatic Speech Recognition (ASR) models are often only designed to transcribe an entire large chunk of audio and are unsuitable for usecases like live stream transcription, which requires low-latency, long-form transcription.
+
+This tutorial introduces the Dynamic Chunk Training approach and architectural changes you can apply to make the Conformer model streamable. It introduces the tooling for training and inference that SpeechBrain can provide for you.
+This might be a good starting point if you're interested in training and understanding your own streaming models, or even if you want to explore improved streaming architectures.
