Add Phoneme labels and timestamps - take two#1377

Open
madhephaestus wants to merge 13 commits into alphacep:master from CommonWealthRobotics:master

Conversation

@madhephaestus

madhephaestus commented Jun 1, 2023

The first PR seems to have died. rutujaubale made the original effort to add the feature, and Nathravorn fixed the build in their branch. I am now opening a new PR to get this feature merged.

This PR replaces #528 and closes #687 with a solution.

@Shallowmallow

Shallowmallow commented Jul 20, 2023

Really nice. But it doesn't seem to work when you use alternatives? It would be really cool if it did :)

@tobiasalanboyd

Hello! I am trying to make a version of test_microphone.py that recognizes phonemes rather than words or sentences. However, I am struggling to figure out the Python equivalent of this line from test_phone_results.c:
vosk_recognizer_set_result_options(recognizer, "phones");
I thought it might be
SetResultOptions(rec, "phones")
but when I add this line I get a message that SetResultOptions is not defined.
Apologies if the answer to this is obvious; I am new to working with this type of code. Thank you in advance!

@madhephaestus

Author

It looks like the method would be SetResultOptions(self, options), so there is no need to pass in an instance of the recognizer, since that seems to be a private instance variable rather than a parameter in the Python API.
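
To illustrate the point about self, here is a minimal sketch using a stand-in class. StubRecognizer is hypothetical and is not the vosk KaldiRecognizer; it only mirrors the SetResultOptions(self, options) signature mentioned above. It shows that Python supplies self when the method is called on an instance, and that a bare SetResultOptions(...) call fails because no module-level function with that name exists.

```python
class StubRecognizer:
    """Hypothetical stand-in for a recognizer; NOT the real vosk class."""

    def __init__(self):
        self._handle = object()      # private handle, kept inside the instance
        self.result_options = None

    def SetResultOptions(self, options):
        # 'self' is supplied by Python when called as rec.SetResultOptions(...)
        self.result_options = options


rec = StubRecognizer()
rec.SetResultOptions("phones")       # method call on the instance


def call_bare_function():
    # A bare call looks up a module-level name, which does not exist here.
    try:
        SetResultOptions(rec, "phones")
    except NameError as err:
        return str(err)


bare_call_error = call_bare_function()
```

With a real recognizer the same call shape would apply only if the installed package actually defines that method, which this sketch does not verify.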

@tobiasalanboyd

tobiasalanboyd commented Oct 9, 2024

Thanks for getting back to me! I have tried pretty much every variation on the above that I can think of, and am not sure if the issue is due to me being new to Python or if there's something else happening here.
All of the below examples were inserted below this line in my copy of test_microphone.py:
rec = KaldiRecognizer(model, args.samplerate)
Examples of what I have tried adding so far with no success:
SetResultOptions("phones")
SetResultOptions(rec, "phones")
SetResultOptions(rec._handle, "phones")
rec.SetResultOptions("phones")
rec._handle.SetResultOptions("phones")
rec.SetResultOptions(rec, "phones")
rec.SetResultOptions(rec._handle, "phones")
rec.SetResultOptions()
rec._handle.SetResultOptions()

If this is helpful to know, I am running the program in CMD with the following command:
C:\Users\myusername\vosk-api\python\example>py .\test_microphone_phon.py

EDIT: Before realizing that regular Vosk would not provide individual phonemes, I installed it via PyPI. Is it possible this is contributing to the difficulties?
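
One way to narrow this down is to check which copy of a package Python would import and whether an object exposes a given method. The sketch below uses only the standard library. The helper names find_package_location and has_method are made up for illustration, and nothing here assumes that any particular vosk release does or does not include SetResultOptions.

```python
import importlib.util


def find_package_location(package_name):
    """Return the file a package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return None
    return spec.origin


def has_method(obj, method_name):
    """Return True if obj exposes a callable attribute with this name."""
    return callable(getattr(obj, method_name, None))
```

For example, find_package_location("vosk") shows whether the import resolves to a site-packages install or to a local build, and has_method(rec, "SetResultOptions") reports whether the recognizer object in use has the method at all.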

@321Proteus

321Proteus commented Dec 3, 2024

Hello, I'm currently trying to build your version of Vosk, but I keep getting the same error as in #1082:

recognizer.cc: In member function ‘const char* Recognizer::PartialResult()’:
recognizer.cc:855:13: error: ‘WordAlignLatticePartial’ was not declared in this scope
  855 |             WordAlignLatticePartial(clat, *model_->trans_model_, *model_->winfo_, 0, &aligned_lat);
      |             ^~~~~~~~~~~~~~~~~~~~~~~

I'm using the AlphaCephei branch of Kaldi (with OpenFST 1.7.2, tried also 1.8.3 from Kaldi-ASR with the same result). Any idea what's going on?

@nshmyrev

nshmyrev commented Dec 4, 2024

Collaborator

> I'm using the AlphaCephei branch of Kaldi

WordAlignLatticePartial is there. You are probably using an old version. Please recheck.
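
A quick way to recheck a local Kaldi checkout is to search its sources for the symbol named in the compiler error. This is a minimal sketch; the function name find_symbol and the default file extensions are assumptions for illustration, and the path in the usage note is a placeholder.

```python
import os


def find_symbol(root_dir, symbol, extensions=(".h", ".cc")):
    """Return sorted paths under root_dir whose text mentions symbol."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if not filename.endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            # Ignore undecodable bytes so one odd file does not stop the scan.
            with open(path, encoding="utf-8", errors="ignore") as handle:
                if symbol in handle.read():
                    matches.append(path)
    return sorted(matches)
```

Calling find_symbol("/path/to/kaldi/src", "WordAlignLatticePartial") and getting an empty list would suggest the checkout predates the function.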

@321Proteus

OK, I did it. I redownloaded the Dockerfile and ran it on my machine instead of building everything locally (normally I'd just build Kaldi and OpenFST using Docker, then copy them to local and build Vosk from there). Now everything compiles just fine!

@slambang

@madhephaestus is there any update on this? I am hoping to have this feature compiled and running on Android.

Guy-Dvir pushed a commit to Guy-Dvir/vosk-api that referenced this pull request Apr 9, 2026

Development

Successfully merging this pull request may close these issues.

Is it possible to get the timing of phonemes, instead of full words?

8 participants