Hello,
What I want to do:
The user uploads an audio recording (from the Windows 10/11 voice recorder, a Teams meeting, a smartphone, or any other source) => the script converts it to mp3 if needed (the audio quality only has to be good enough to recognize words) => it then transcribes the recording to text => finally it displays the transcribed text and lets the user download it as a .txt file.
What I currently have: a working script that transcribes wav or mp3 files (not m4a) and then offers a download button with Streamlit.
My working script:
import streamlit as st
import torch
from tempfile import NamedTemporaryFile
from datasets import load_dataset
from transformers import pipeline
## Initialize environment ##
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = pipeline("automatic-speech-recognition", model="bofenghuang/whisper-small-cv11-french", device=device)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="fr", task="transcribe")
## Display ##
st.title("Télécharger un enregistrement de réunion pour obtenir sa transcription en texte")
col1, col2 = st.columns(2)
audio_source = st.sidebar.file_uploader(label="Choisir votre fichier", type=["wav", "mp3"])
## Variables ##
suffix = ""
predicted_sentence = ""
## Processing ##
if audio_source is not None:
    waveform = audio_source.getvalue()
    predicted_sentence = pipe(waveform, max_new_tokens=225)
    col1.write("Transcription :point_right:")
    col2.write(predicted_sentence)
    result = str(predicted_sentence)
    col1.download_button(label="Télécharger la transcription", data=result, file_name="transcript.txt", mime="text/plain")

My non-working code:
## Imports ##
from pathlib import Path
import streamlit as st
import torch
from tempfile import NamedTemporaryFile
from datasets import load_dataset
from transformers import pipeline
from pydub import AudioSegment
## Initialize environment ##
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = pipeline("automatic-speech-recognition", model="bofenghuang/whisper-small-cv11-french", device=device)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="fr", task="transcribe")
## Functions ##
def convtomp3(file):
    """Convert an audio file from m3a, wav or wma to mp3"""
    match Path(file.name).suffix:
        case "m4a":
            wav_audio = AudioSegment.from_file(file, format="m4a")
            result = wav_audio.export("audio1.mp3", format="mp3")
            return result
        case "wav":
            wav_audio = AudioSegment.from_file(file, format="wav")
            result = wav_audio.export("audio1.mp3", format="mp3")
            return result
        case "wma":
            wav_audio = AudioSegment.from_file(file, format="wma")
            result = wav_audio.export("audio1.mp3", format="mp3")
            return result
def transcriptor(file):
    """Transcript from an audio file to a string"""
    if(Path(file.name).suffix != "mp3"):
        file=convtomp3(file)
    waveform = file.getvalue()
    predicted_sentence = pipe(waveform, max_new_tokens=225)
    return str(predicted_sentence)
## Display ##
st.title("Convertisseur / Transcripteur")
audio_source = st.sidebar.file_uploader(label = "Fichiers audio uniquement", type=["mp3","m4a","wav","wma"])
if audio_source is not None:
    st.toast("Lancement de la transcription")
    with NamedTemporaryFile(suffix=Path(audio_source.name).suffix) as temp_file:
        temp_file.write(audio_source.getvalue())
        temp_file.seek(0)
        cpte_rendu = transcriptor(temp_file)
    st.write("Transcription : ")
    st.write(cpte_rendu)
    st.sidebar.download_button(label = "Télécharger le compte-rendu", data = cpte_rendu, file_name = "cr.txt", mime = "text/plain")

My error message:
Error: AttributeError: 'NoneType' object has no attribute 'getvalue'
Traceback:
File "/home/ild/.local/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 542, in _run_script
    exec(code, module.__dict__)
File "/home/ild/conv-cripteur.py", line 57, in <module>
    cpte_rendu = transcriptor(temp_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ild/conv-cripteur.py", line 44, in transcriptor
    waveform = file.getvalue()
               ^^^^^^^^^^^^^

I'm getting the same error with an m4a and an mp3 file.
My understanding is that I am not making the right call at this line:
    cpte_rendu = transcriptor(temp_file)
But I don't know enough Python to work out what I missed.
Do I need to import something else?
Or do I "just" need to handle one more step before calling my transcriptor function?
Thank you for your help :)
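
A possible lead, offered as a sketch rather than a confirmed diagnosis: Path.suffix returns the extension with its leading dot (".m4a", ".mp3"), so comparisons against "m4a" or "mp3" never match. In the non-working code this would mean every file is sent to convtomp3, no case of the match statement applies, and the function falls through and returns None, which is consistent with the AttributeError for both m4a and mp3 files. The helper names normalized_extension and needs_conversion below are hypothetical and only illustrate the comparison; they do not touch pydub or the Whisper pipeline.

```python
from pathlib import Path


def normalized_extension(filename: str) -> str:
    """Return the lowercase extension without its leading dot (hypothetical helper)."""
    # Path.suffix keeps the dot, e.g. ".m4a", so strip it before comparing.
    return Path(filename).suffix.lower().lstrip(".")


def needs_conversion(filename: str) -> bool:
    """Return True when the file is not already an mp3 (hypothetical helper)."""
    return normalized_extension(filename) != "mp3"


if __name__ == "__main__":
    # The raw suffix includes the dot, so it is never equal to "m4a".
    print(repr(Path("meeting.m4a").suffix))
    print(Path("meeting.m4a").suffix == "m4a")
    # After normalization the comparison behaves as intended.
    print(normalized_extension("meeting.M4A"))
    print(needs_conversion("meeting.mp3"))
```

Separately, the objects returned by NamedTemporaryFile expose read() rather than getvalue(), so the line that reads the bytes may also need adjusting once the conversion step returns a real file object.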

