Mar-13-2021, 01:34 PM
Hello, Python community!
I would like to record audio from my microphone and transcribe it in (almost) real time via a speech-to-text (STT) API. The STT API available to me is VoxSigma by Vocapia. It lets me send a .wav file containing the speech recording and receive an XML file with the transcript within seconds:
import os
import queue
import sys
import threading
import time
from xml.dom import minidom
from xml.parsers.expat import ExpatError

import sounddevice as sd
import soundfile as sf

def vocapia(wavfile):
    # upload the recording with curl and save the XML response to disk
    cmd = "curl -ksS -u password REST-URL -F method=vrbs_trans -F " \
          "model=eng -F audiofile=@" + wavfile + " > ../resources/static/XMLs/dynamic_recording.xml"
    os.system(cmd)
    try:
        # parse xml document to retrieve transcript
        mydoc = minidom.parse("../resources/static/XMLs/dynamic_recording.xml")
        words = mydoc.getElementsByTagName('Word')
        sentence = ""
        for elem in words:
            sentence = sentence + elem.firstChild.data[1:]
        return sentence
    # catch "xml.parsers.expat.ExpatError: no element found: line 1, column 0"
    # -> empty xml means nothing was transcribed
    except ExpatError:
        print("Nothing was transcribed yet. Resuming...")
        return ""

The problem is that my approach with Python's sounddevice constantly writes to a .wav file, but the file only seems to be saved once the recording has finished. Because of this, I apparently cannot access the recording in real time. If I send the .wav file to Vocapia while recording, nothing is transcribed (the .wav file is empty). Here is the code that handles the recording:

# Thread function for parallel STT
def thread_func(text):
    while True:
        time.sleep(5)  # give time for some dialogue to happen
        new_text = text + vocapia("recording.wav")
        print(new_text)

# sound-device file and queue
wav_file = "recording.wav"
q = queue.Queue()

def callback(indata, frames, time, status):
    # called by sounddevice for every audio block; hand a copy to the main loop
    if status:
        print(status, file=sys.stderr)
    q.put(indata.copy())

try:
    # delete any prior recordings
    if os.path.exists(wav_file):
        os.remove(wav_file)
    device_info = sd.query_devices(0, 'input')
    # soundfile expects an int, sounddevice provides a float:
    samplerate = int(device_info['default_samplerate'])
    # start thread that handles the stt by sending wav_file to vocapia
    th = threading.Thread(target=thread_func, args=(session_text,))
    th.start()
    with sf.SoundFile(wav_file, mode='x', samplerate=samplerate, channels=1) as file:
        with sd.InputStream(samplerate=samplerate, callback=callback, channels=1):
            print('Started recording. Press Ctrl+C to stop the recording.')
            while True:
                file.write(q.get())
except KeyboardInterrupt:
    print('\nRecording finished.')

Running this code, I simply receive and catch the "xml.parsers.expat.ExpatError: no element found: line 1, column 0" error every five seconds. As soon as I stop the recording loop, though, I can run vocapia(wav_file) and get the full transcript.

Any ideas on what I could do to make it work properly?
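One possible direction, as a minimal sketch rather than a tested solution: a WAV header stores the length of the audio data, and that field is typically only finalized when the writer is closed, which may be why the file looks empty mid-recording. Instead of reading the file that is still being written, the STT thread could take whatever raw audio has accumulated in the queue and pack it into a small, complete WAV of its own for each upload. The sketch uses only the standard library. `drain` and `pcm_to_wav_bytes` are hypothetical helper names, and it assumes the queued chunks are raw 16-bit PCM bytes (for example from `sd.RawInputStream(dtype='int16')` with `q.put(bytes(indata))` in the callback) rather than the float arrays that `sd.InputStream` delivers by default.

```python
import io
import queue
import wave


def drain(q):
    """Collect every chunk currently waiting in the queue without blocking."""
    chunks = []
    while True:
        try:
            chunks.append(q.get_nowait())
        except queue.Empty:
            return chunks


def pcm_to_wav_bytes(pcm_chunks, samplerate, channels=1, sampwidth=2):
    """Pack raw PCM byte chunks into one complete in-memory WAV file.

    Assumption: the chunks are interleaved integer PCM with `sampwidth`
    bytes per sample (2 bytes = 16-bit).
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sampwidth)
        wav.setframerate(samplerate)
        wav.writeframes(b"".join(pcm_chunks))
    # Leaving the with-block closes the writer, which finalizes the header,
    # so the returned bytes are a self-contained WAV file.
    return buf.getvalue()
```

Every few seconds the thread could then call `pcm_to_wav_bytes(drain(q), samplerate)`, write the result to a temporary file, and pass that file to `vocapia()`. How well the API copes with short, independent chunks (for example words cut off at a chunk boundary) is not something this sketch addresses.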
Thanks in advance for any suggestions!
