
Commit 8ed1d36

committed
Manually replace colab cross-references to new doc URL
1 parent 9f5e1d2 commit 8ed1d36

16 files changed

Lines changed: 80 additions & 78 deletions
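Every replacement in this commit maps a Colab notebook ID to a page under `https://speechbrain.readthedocs.io/en/latest/`. The commit message says the edits were made by hand. The following minimal Python sketch shows how the same mapping could be applied mechanically. The table is transcribed from the hunks in this commit. The name `rewrite_colab_links` and the regular expression are illustrative assumptions, not part of SpeechBrain.

```python
import re

# Colab notebook ID -> new docs page, transcribed from the hunks of this commit.
BASE = "https://speechbrain.readthedocs.io/en/latest/"
COLAB_TO_DOCS = {
    "1AiVJZhZKwEI4nFGANKXEe-ffZFfvXKwH": "tutorials/basics/data-loading-pipeline.html",
    "1s171JSA53_ktvc1zQp6uMcM0TChtCcZ9": "tutorials/advanced/data-loading-for-big-datasets-and-shared-filesystems.html",
    "1fdqTk4CTXNcrcSVFvaOKzRfLmj4fJfwa": "tutorials/basics/brain-class.html",
    "13pBUacPiotw1IvyffvGZ-HrtBr9T6l15": "multigpu.html",
    "12bg3aUdr9mTfOGqcB5pSMABoIKPgiwcM": "tutorials/basics/introduction-to-speechbrain.html",
    "1Pg9by4b6-8QD2iC0U7Ic3Vxq4GEwEdDz": "tutorials/basics/hyperpyyaml.html",
    "1LN7R3U3xneDgDRK2gC5MzGkLysCWxuC3": "tutorials/advanced/pre-trained-models-and-fine-tuning-with-huggingface.html",
    "1VH7U0oP3CZsUNtChJT2ewbV_q1QX8xre": "tutorials/basics/checkpointing.html",
    "1CI72Xyay80mmmagfLaIIeRoDgswWHT_g": "tutorials/preprocessing/speech-features.html",
}

# A Colab drive link, with or without the "?usp=sharing" suffix; group 1 is the ID.
COLAB_RE = re.compile(r"https://colab\.research\.google\.com/drive/([\w-]+)(?:\?usp=sharing)?")


def rewrite_colab_links(text: str) -> str:
    """Replace known Colab links with their docs URL; leave unknown IDs untouched."""

    def _sub(match: re.Match) -> str:
        page = COLAB_TO_DOCS.get(match.group(1))
        return BASE + page if page else match.group(0)

    return COLAB_RE.sub(_sub, text)


if __name__ == "__main__":
    line = "see [this tutorial](https://colab.research.google.com/drive/1AiVJZhZKwEI4nFGANKXEe-ffZFfvXKwH?usp=sharing)."
    print(rewrite_colab_links(line))
    # see [this tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/data-loading-pipeline.html).
```

The function returns unknown IDs unchanged instead of guessing a target. This makes it safe to run the pass more than once over the same files.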

docs/tutorials/advanced/data-loading-for-big-datasets-and-shared-filesystems.ipynb

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@
 "In many compute clusters, the main data storage is a network filesystem (NFS), for example [Lustre](https://en.wikipedia.org/wiki/Lustre_(file_system)). The NFS can serve many users concurrently and provide high data throughput from a single file. However, opening or listing many different files is slow - and doing so may slow the whole system down for everyone, not just the offending user. Speech datasets usually consist of very many small recordings. Reading every file again and again is exactly the kind of data IO that can slow down an NFS.\n",
 "\n",
 "One solution is to copy the dataset into the **local SSD** of the computing node. This can be done relatively efficiently by compressing the dataset into a single file (e.g. `dataset.tar.gz`), copying it into the local node, and finally, uncompressing (untarring) the file. Reading files from the local SSD is very efficient and does not harm the performance of the shared filesystem.\n",
-"The standard SpeechBrain data IO works well in this case, see [this tutorial](https://colab.research.google.com/drive/1AiVJZhZKwEI4nFGANKXEe-ffZFfvXKwH?usp=sharing).\n",
+"The standard SpeechBrain data IO works well in this case, see [this tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/data-loading-pipeline.html).\n",
 "However, there might be huge datasets that exceed the size of your local SSD. \n",
 "\n",
 "A possible workaround is to keep the data in the shared filesystem and bundle the small recordings into larger archives, which are usually called **shards**. Loading data off shards avoids opening too many files, so it is fast.\n",

docs/tutorials/advanced/dynamic-batching.ipynb

Lines changed: 5 additions & 5 deletions
@@ -41,7 +41,7 @@
 "id": "tILFmgtVDaJK"
 },
 "source": [
-"To illustrate this point, let's look, for example, at **MiniLibriSpeech** which is a subset of LibriSpeech. Let's download this dataset and other tools from the [data-io tutorial](https://colab.research.google.com/drive/1AiVJZhZKwEI4nFGANKXEe-ffZFfvXKwH?usp=sharing) which uses this same data."
+"To illustrate this point, let's look, for example, at **MiniLibriSpeech** which is a subset of LibriSpeech. Let's download this dataset and other tools from the [data-io tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/data-loading-pipeline.html) which uses this same data."
 ]
 },
 {
@@ -233,7 +233,7 @@
 "outputs": [],
 "source": [
 "# prepare LibriSpeech dataset using pre-made, downloaded parse_data.py script from\n",
-"# the data-io tutorial available here: https://colab.research.google.com/drive/1AiVJZhZKwEI4nFGANKXEe-ffZFfvXKwH?usp=sharing\n",
+"# the data-io tutorial available here: https://speechbrain.readthedocs.io/en/latest/tutorials/basics/data-loading-pipeline.html\n",
 "from parse_data import parse_to_json\n",
 "parse_to_json(\"/content/LibriSpeech/train-clean-5\")\n",
 "# this produced a manifest data.json file:"
@@ -308,7 +308,7 @@
 "source": [
 "We can use this `.json` manifest file to instantiate a SpeechBrain `DynamicItemDataset` object.\n",
 "\n",
-"If this is not clear refer to the [data-io tutorial](https://colab.research.google.com/drive/1AiVJZhZKwEI4nFGANKXEe-ffZFfvXKwH?usp=sharing).\n",
+"If this is not clear refer to the [data-io tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/data-loading-pipeline.html).\n",
 "\n",
 "We also define a `data-io pipeline` to read the audio file."
 ]
@@ -695,7 +695,7 @@
 "id": "_tM5iEaYolWS"
 },
 "source": [
-"**NOTE:** you should be highly familiar with SpeechBrain [data-io](https://colab.research.google.com/drive/1AiVJZhZKwEI4nFGANKXEe-ffZFfvXKwH?usp=sharing) to follow this tutorial."
+"**NOTE:** you should be highly familiar with SpeechBrain [data-io](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/data-loading-pipeline.html) to follow this tutorial."
 ]
 },
 {
@@ -2042,7 +2042,7 @@
 "When working on an HPC cluster it is crucial to copy the dataset to the SSD of the local computing node. This step significantly improves the data-io performance and avoids slowing down a shared filesystem. In some cases, the dataset could be too big that might not fit into the SSD. This scenario is getting more common these days with the adoption of larger and larger datasets.\n",
 "\n",
 "SpeechBrain supports [Webdataset](https://github.com/webdataset/webdataset), which allows users to efficiently read datasets from the shared file system.\n",
-"The proposed Webdataset-based solution also supports dynamic batching. For more information, please take a look at [this tutorial](https://colab.research.google.com/drive/1s171JSA53_ktvc1zQp6uMcM0TChtCcZ9?usp=sharing)."
+"The proposed Webdataset-based solution also supports dynamic batching. For more information, please take a look at [this tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/advanced/data-loading-for-big-datasets-and-shared-filesystems.html)."
 ]
 },
 {

docs/tutorials/advanced/federated-speech-model-training-via-speechbrain-and-flower.ipynb

Lines changed: 2 additions & 2 deletions
@@ -402,8 +402,8 @@
 "## Integration details — coupling SpeechBrain to Flower\n",
 "Let's first see some details of the integration process to better understand the code. There are only four main steps required:\n",
 "\n",
-"1. Define a Brain class ([SpeechBrain Brain Class tutorial](https://colab.research.google.com/drive/1fdqTk4CTXNcrcSVFvaOKzRfLmj4fJfwa?usp=sharing)).\n",
-"2. Initialise the Brain class and dataset ([SpeechBrain dataio tutorial](https://colab.research.google.com/drive/1AiVJZhZKwEI4nFGANKXEe-ffZFfvXKwH?usp=sharing)).\n",
+"1. Define a Brain class ([SpeechBrain Brain Class tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/brain-class.html)).\n",
+"2. Initialise the Brain class and dataset ([SpeechBrain dataio tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/data-loading-pipeline.html)).\n",
 "3. Define a SpeechBrain Client ([Flower client documentation](https://flower.dev/docs/quickstart_pytorch.html#flower-client)).\n",
 "4. Define a Flower Strategy on the server side ([Flower strategies](https://flower.dev/docs/strategies.html#strategies))."
 ]

docs/tutorials/advanced/hyperparameter-optimization.ipynb

Lines changed: 1 addition & 1 deletion
@@ -494,7 +494,7 @@
 "### Multiple GPUs\n",
 "Since Orion simply wraps the execution of the training script and launches it for each set of hyperparameters using the OS shell, training scripts that support Data-Parallel (DP) or Distributed Data Parallel (DDP) execution can be used with hyperparameter fitting without modification.\n",
 "\n",
-"For information on how to set up DP/DDP experiments, refer to the [SpeechBrain documentation](https://speechbrain.readthedocs.io/en/latest/multigpu.html#) and the [Multi-GPU Considerations](https://colab.research.google.com/drive/13pBUacPiotw1IvyffvGZ-HrtBr9T6l15?usp=sharing) tutorial.\n",
+"For information on how to set up DP/DDP experiments, refer to the [SpeechBrain documentation](https://speechbrain.readthedocs.io/en/latest/multigpu.html#) and the [Multi-GPU Considerations](https://speechbrain.readthedocs.io/en/latest/multigpu.html) tutorial.\n",
 "\n",
 "### Parallel or Distributed Oríon\n",
 "\n",

docs/tutorials/advanced/inferring-on-your-own-speechbrain-models.ipynb

Lines changed: 9 additions & 6 deletions
@@ -28,20 +28,23 @@
 "In this tutorial, we will learn the different ways of inferring on a trained model. Please understand that this is not related to loading pretrained models for further training or transfer learning. If interested in these topics, refer to the corresponding [tutorial](https://colab.research.google.com/drive/1LN7R3U3xneDgDRK2gC5MzGkLysCWxuC3?usp=sharing).\n",
 "\n",
 "## Prerequisites\n",
-"- [SpeechBrain Introduction](https://colab.research.google.com/drive/12bg3aUdr9mTfOGqcB5pSMABoIKPgiwcM?usp=sharing)\n",
-"- [YAML tutorial](https://colab.research.google.com/drive/1Pg9by4b6-8QD2iC0U7Ic3Vxq4GEwEdDz?usp=sharing)\n",
-"- [Brain Class tutorial](https://colab.research.google.com/drive/1fdqTk4CTXNcrcSVFvaOKzRfLmj4fJfwa?usp=sharing)\n",
-"- [Pretraining tutorial](https://colab.research.google.com/drive/1LN7R3U3xneDgDRK2gC5MzGkLysCWxuC3?usp=sharing)\n",
+"- [SpeechBrain Introduction](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/introduction-to-speechbrain.html)\n",
+"- [YAML tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/hyperpyyaml.html)\n",
+"- [Brain Class tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/brain-class.html)\n",
+"- [Pretraining tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/advanced/pre-trained-models-and-fine-tuning-with-huggingface.html\n",
+")\n",
 "\n",
 "## Context\n",
 "\n",
-"In this example, we will consider a user that would like to use a custom pretrained speech recognizer **that has been trained by him** to transcribe some audio files. If you are interested in using online-available pretrained models, please refer to the [Pretraining tutorial](https://colab.research.google.com/drive/1LN7R3U3xneDgDRK2gC5MzGkLysCWxuC3?usp=sharing). The following can be extended to any SpeechBrain supported task as we provide an homogeneous way of dealing with all of them.\n",
+"In this example, we will consider a user that would like to use a custom pretrained speech recognizer **that has been trained by him** to transcribe some audio files. If you are interested in using online-available pretrained models, please refer to the [Pretraining tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/advanced/pre-trained-models-and-fine-tuning-with-huggingface.html\n",
+"). The following can be extended to any SpeechBrain supported task as we provide an homogeneous way of dealing with all of them.\n",
 "\n",
 "## Different options available\n",
 "\n",
 "At this point, three options are available to you:\n",
 "1. Define a custom python function in your ASR class (extended from Brain). This introduces strong coupling between the training recipe and your transcripts. It is pretty convenient for prototyping and obtaining simple transcripts on your datasets. However, it is not recommended for deployment.\n",
-"2. Use already available Interfaces (such as `EncoderDecoderASR`, introduction in the [pretraining tutorial](https://colab.research.google.com/drive/1LN7R3U3xneDgDRK2gC5MzGkLysCWxuC3?usp=sharing)). This is probably the most elegant and convenient way. However, your model should be compliant with some constraints to fit the proposed interface.\n",
+"2. Use already available Interfaces (such as `EncoderDecoderASR`, introduction in the [pretraining tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/advanced/pre-trained-models-and-fine-tuning-with-huggingface.html\n",
+")). This is probably the most elegant and convenient way. However, your model should be compliant with some constraints to fit the proposed interface.\n",
 "3. Build your own Interface perfectly fitting to your custom ASR model.\n",
 "\n",
 "**Important: All these solutions also apply to other tasks (speaker recognition, source separation ...)**\n",

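In the inferring-on-your-own-models hunk, the unchanged first line still links the pretraining tutorial through Colab (`1LN7R3U3xneDgDRK2gC5MzGkLysCWxuC3`). A hand edit can therefore miss links on lines it does not touch. The minimal Python sketch below audits a notebook after a change like this one. It assumes the standard nbformat 4 JSON layout: a top-level `cells` list whose `source` is either a string or a list of strings. The helper name `find_colab_links` is hypothetical. The sketch scans every cell type, not only markdown, because the dynamic-batching notebook also carries a Colab URL inside a code-cell comment.

```python
import json
import re
from pathlib import Path

# Matches the Colab drive URL up to the end of the notebook ID (query string excluded).
COLAB_RE = re.compile(r"https://colab\.research\.google\.com/drive/[\w-]+")


def find_colab_links(notebook_path):
    """Return (cell_index, url) pairs for every Colab link left in a notebook."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    hits = []
    for idx, cell in enumerate(nb.get("cells", [])):
        source = cell.get("source", "")
        # nbformat stores "source" either as a list of lines or as a single string.
        text = "".join(source) if isinstance(source, list) else source
        hits.extend((idx, url) for url in COLAB_RE.findall(text))
    return hits
```

For example, you could call the helper on every file yielded by `Path("docs/tutorials").rglob("*.ipynb")` to list the links that still need replacing.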
docs/tutorials/advanced/pre-trained-models-and-fine-tuning-with-huggingface.ipynb

Lines changed: 5 additions & 5 deletions
@@ -42,10 +42,10 @@
 "2. Use pretrained models as a component of a new pipeline (e.g language models, finetuning, speaker embeddings extraction ...).\n",
 "\n",
 "## Prerequisites\n",
-"- [SpeechBrain Introduction](https://colab.research.google.com/drive/12bg3aUdr9mTfOGqcB5pSMABoIKPgiwcM?usp=sharing)\n",
-"- [YAML tutorial](https://colab.research.google.com/drive/1Pg9by4b6-8QD2iC0U7Ic3Vxq4GEwEdDz?usp=sharing)\n",
-"- [Brain Class tutorial](https://colab.research.google.com/drive/1fdqTk4CTXNcrcSVFvaOKzRfLmj4fJfwa?usp=sharing)\n",
-"- [DataIOBasics](https://colab.research.google.com/drive/1AiVJZhZKwEI4nFGANKXEe-ffZFfvXKwH)\n"
+"- [SpeechBrain Introduction](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/introduction-to-speechbrain.html)\n",
+"- [YAML tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/hyperpyyaml.html)\n",
+"- [Brain Class tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/brain-class.html)\n",
+"- [DataIOBasics](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/data-loading-pipeline.html)\n"
 ]
 },
 {
@@ -1735,7 +1735,7 @@
 "source": [
 "First we must set up the data pipeline for downloaded MiniLibriSpeech data.\n",
 "\n",
-"If you are not familiar with **SpeechBrain dataIO** you may want to take a look at the [tutorial](https://colab.research.google.com/drive/1AiVJZhZKwEI4nFGANKXEe-ffZFfvXKwH)."
+"If you are not familiar with **SpeechBrain dataIO** you may want to take a look at the [tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/data-loading-pipeline.html)."
 ]
 },
 {

docs/tutorials/advanced/text-tokenizer.ipynb

Lines changed: 2 additions & 2 deletions
@@ -298,7 +298,7 @@
 },
 "source": [
 "## Use SpeechBrain SentencePiece with Pytorch\n",
-"We designed our SentencePiece wrapper to be used jointly to our data transform pipeline [(see the tutorial)](https://colab.research.google.com/drive/1AiVJZhZKwEI4nFGANKXEe-ffZFfvXKwH?usp=sharing) and therefore deal with tensors.\n",
+"We designed our SentencePiece wrapper to be used jointly to our data transform pipeline [(see the tutorial)](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/data-loading-pipeline.html) and therefore deal with tensors.\n",
 "For that purpose, two options are available:\n",
 "1. Option 1: Generating token tensors directly from a word tensors + an external dictionary named `int2lab` (which maps your tensors to words).\n",
 "1. Option 2: If you use our DynamicDataset, the DynamicItem will automatically generate the token tensors.\n"
@@ -396,7 +396,7 @@
 "source": [
 "### Example for option 2\n",
 "\n",
-"**Note:** please first read our dataio [tutorial](https://colab.research.google.com/drive/1AiVJZhZKwEI4nFGANKXEe-ffZFfvXKwH?usp=sharing) to perfectly grasp the next lines.\n",
+"**Note:** please first read our dataio [tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/data-loading-pipeline.html) to perfectly grasp the next lines.\n",
 "\n",
 "Here, we use a tokenizer to tokenize on-the-fly the text obtained from a .csv file. In the following example, we combined it with the data_io pipeline of SpeechBrain.\n",
 "\n",

docs/tutorials/basics/data-loading-pipeline.ipynb

Lines changed: 1 addition & 1 deletion
@@ -1093,7 +1093,7 @@
 "As the default DataLoader the Brain class instantiates, a SpeechBrain custom DataLoader: `speechbrain.dataio.dataloader.SaveableDataLoader`.\n",
 "\n",
 "\n",
-"This DataLoader is identical to the plain one except that it allows for intra-epoch saving. So if for some reason training stops in the middle of an epoch it is possible to resume from exactly that step. See the [Checkpointing Tutorial](https://colab.research.google.com/drive/1VH7U0oP3CZsUNtChJT2ewbV_q1QX8xre?usp=sharing).\n",
+"This DataLoader is identical to the plain one except that it allows for intra-epoch saving. So if for some reason training stops in the middle of an epoch it is possible to resume from exactly that step. See the [Checkpointing Tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/checkpointing.html).\n",
 "The default `collate_fn` for this DataLoader is `PaddedBatch`.\n"
 ]
 },

docs/tutorials/basics/introduction-to-speechbrain.ipynb

Lines changed: 5 additions & 6 deletions
@@ -94,9 +94,7 @@
 "There are essentially two ways to install SpeechBrain:\n",
 "* **Local installation**: it is suggested if you want to modify the toolkit or train a full speech processing system from scratch.\n",
 "\n",
-"* **Install via PyPI**: it is suggested when you wanna just use some core functionality of SpeechBrain in your project.\n",
-"\n",
-"**Note:** SpeechBrain expects a python version >=3.7. However, it also works with the python 3.6 available in Colab.\n"
+"* **Install via PyPI**: it is suggested when you wanna just use some core functionality of SpeechBrain in your project.\n"
 ]
 },
 {
@@ -239,7 +237,7 @@
 "\n",
 "The YAML file contains all the information to initialize the classes when loading them. In SpeechBrain we load it with a special function called `load_hyperpyyaml`, which initializes for us all the declared classes. This makes the code extremely **readable** and **compact**.\n",
 "\n",
-"Our hyperpyyaml is an extension of the standard YAML. For an overview of all the supported functionalities, please take a look at the [YAML tutorial](https://colab.research.google.com/drive/1Pg9by4b6-8QD2iC0U7Ic3Vxq4GEwEdDz?usp=sharing).\n",
+"Our hyperpyyaml is an extension of the standard YAML. For an overview of all the supported functionalities, please take a look at the [YAML tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/hyperpyyaml.html).\n",
 "\n",
 "Note that all the hyperparameters can be overridden from the command line. For instance, to change the dropout factor:\n",
 "\n",
@@ -327,7 +325,7 @@
 " yield phn_encoded\n",
 "\n",
 "```\n",
-"Here, we read the phoneme list, separate each entry by space, and convert the list of phonemes to their corresponding indexes (using the label_encoder described [in this tutorial](https://colab.research.google.com/drive/1AiVJZhZKwEI4nFGANKXEe-ffZFfvXKwH?usp=sharing)).\n",
+"Here, we read the phoneme list, separate each entry by space, and convert the list of phonemes to their corresponding indexes (using the label_encoder described [in this tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/data-loading-pipeline.html)).\n",
 "\n",
 "As you can see, we directly expose in the main script the data reading pipeline because this adds a lot of transparency and flexibility.\n",
 "\n",
@@ -386,7 +384,8 @@
 " # Evaluation is run separately (now just evaluating on valid data)\n",
 " ctc_brain.evaluate(valid_data)\n",
 "```\n",
-"For a more detailed description, take a look at the [Brain class tutorial here](https://colab.research.google.com/drive/1fdqTk4CTXNcrcSVFvaOKzRfLmj4fJfwa?usp=sharing).\n"
+"For a more detailed description, take a look at the [Brain class tutorial here](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/brain-class.html\n",
+").\n"
 ]
 },
 {

docs/tutorials/nn/complex-and-quaternion-neural-networks.ipynb

Lines changed: 4 additions & 4 deletions
@@ -28,10 +28,10 @@
 "This tutorial demonstrates how to use the SpeechBrain implementation of complex-valued and quaternion-valued neural networks for speech technologies. It covers the basics of highdimensional representations and the associated neural layers : Linear, Convolution, Recurrent and Normalisation.\n",
 "\n",
 "## Prerequisites\n",
-"- [SpeechBrain Introduction](https://colab.research.google.com/drive/12bg3aUdr9mTfOGqcB5pSMABoIKPgiwcM?usp=sharing)\n",
-"- [YAML tutorial](https://colab.research.google.com/drive/1Pg9by4b6-8QD2iC0U7Ic3Vxq4GEwEdDz?usp=sharing)\n",
-"- [Brain Class tutorial](https://colab.research.google.com/drive/1fdqTk4CTXNcrcSVFvaOKzRfLmj4fJfwa?usp=sharing)\n",
-"- [Speech Features tutorial](https://colab.research.google.com/drive/1CI72Xyay80mmmagfLaIIeRoDgswWHT_g?usp=sharing)\n",
+"- [SpeechBrain Introduction](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/introduction-to-speechbrain.html)\n",
+"- [YAML tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/hyperpyyaml.html)\n",
+"- [Brain Class tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/basics/brain-class.html)\n",
+"- [Speech Features tutorial](https://speechbrain.readthedocs.io/en/latest/tutorials/preprocessing/speech-features.html)\n",
 "\n",
 "## Introduction and Background\n",
 "\n",
