Comments
I found a bug (I think) in the new v2.4.1 version.

When selecting the most accurate (biggest) model, it outputs random garbage text for quite a while. Then, a few minutes into the video, it starts picking up the correct language and translates it correctly.
On the default middle setting it translates the video quickly and correctly!
PS: On the most accurate setting it also often crashes the app at the very end of loading the video, before it begins transcribing (.mp4 720p videos).
The video file itself is not the issue, since the middle setting processes the whole video correctly without crashing.
The program writes no error log when it crashes, so I can't provide that info. Maybe an idea to implement such a log in the future? (see image below for results)
Thanks for the detailed message.
This is something that comes from the model itself, so I will have a proper look at it. Maybe I will need to expose some extra settings that will be available as "Advanced settings" to make it work properly.
Hey! Great work!

The new GPU version (v2.4.1) is waaay faster than the older one (v2.1.4).
(PS: important question below the picture, so view the rest of this post)
Is it possible to have it translate to languages other than English?
I assume the AI first transcribes the audio to text in the original language, and then translates each sentence to English, right?
If that is the case, then a dropdown list of the languages the AI supports would make it possible to translate a German video into Norwegian subtitles, or, as now, a German video into English subtitles.
Can you possibly add this? Or would it be more complicated than I think?
Hi AlexData-Hawkhill,
Yeah, the GPU enabled version is much faster. Glad you got this update.
In terms of the translation, it actually doesn't work exactly like that. The model itself basically only transcribes audio into the same language it is spoken in. It also has a "bonus" feature that translates directly from audio to English, with no intermediate text to translate.
Full translation from any language to any other language is outside the scope of this app. If there's interest, I could publish an independent app that does full translation of subtitles and text from any language to any other one, which would complement this app.
An independent app that does full translation of subtitles and text from any language to any other one is indeed very interesting!
It should of course have GPU support as well as CPU support.
It would be truly awesome if it could also support converting .SRT into .VTT (WebVTT). That format has a header (3 lines in the example below) and uses only dots in its timestamps, unlike .SRT, which has a comma before the milliseconds. It can also place text at different positions on the screen, but the normal/default caption position we're all used to is: align:middle size:95% line:95
The incrementing block number that .SRT puts before each block of "timestamp + position + text" is optional in .VTT.
Example WebVTT (.VTT) subtitle/captions file:
WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:08.380 align:middle size:95% line:95
So how do you make a web page that looks like this where you have a video and it shows a

00:00:08.380 --> 00:00:14.680 align:middle size:95% line:95
nice picture where you can put the title of the video or information about what the video

00:00:14.680 --> 00:00:23.560 align:middle size:95% line:95
is about and it has obviously this video player which has this hamburger menu over here
So as you see, it is very similar to the .SRT format, but the benefit is that it can be used with HTML5 players. You can upload it directly to YouTube (or other services supporting subtitles/captions) and use it on your own homepage if you need video with subtitles.
PS: I've made a Python script that does this .SRT to .VTT conversion, which I can share with you if need be. (Not C++ code, but just as easy to read, he he.)
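For readers who want to try the conversion described above, here is a minimal sketch. It is not the script mentioned in the comment. The function name `srt_to_vtt` is made up for illustration, and it assumes a well-formed SRT file in which every block starts with a numeric counter line followed by a timing line. It writes only the required `WEBVTT` header line and adds no positioning settings.

```python
import re

# SRT writes "00:00:08,380"; WebVTT writes "00:00:08.380".
_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert the text of an SRT file into WebVTT text (hypothetical helper)."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    cues = []
    # Subtitle blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        if not block.strip():
            continue
        lines = block.split("\n")
        # Drop the numeric block counter, which is optional in WebVTT.
        if lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        # Only the timing line is rewritten; the caption text is left alone.
        lines[0] = _TIMESTAMP.sub(r"\1.\2", lines[0])
        cues.append("\n".join(lines))
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Reading a `.srt` file, passing its contents through `srt_to_vtt`, and writing the result to a `.vtt` file should be enough for simple subtitle files. Styling tags and cue positioning would need extra handling.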
Could you please add the following features: translation of SRT subtitles into other languages, an option to adjust the number of words displayed per line, and the ability to keep complete sentences on a single line if desired? Additionally, it would be great to include an AI-powered summarization function. Thank you.
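As a rough illustration of the line-length and sentence requests above, the sketch below shows one way such options could work. Both helpers (`wrap_words` and `split_sentences`) are hypothetical and not part of the app. The sentence splitter is deliberately naive: it breaks after `.`, `!` or `?` and will mis-split abbreviations such as "approx. ETA".

```python
import re


def wrap_words(text: str, max_words: int) -> list[str]:
    """Break caption text into lines of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def split_sentences(text: str) -> list[str]:
    """Naively put each sentence on its own line."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]
```

A real implementation would also have to split the cue timings when one cue becomes several lines or sentences, which this sketch does not attempt.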
Thanks firefox66 for those great suggestions!
Very nice app! Love the idea of a fast offline transcriber/translator!
I tested the demo, and after doing so I bought this software!
I have two questions I'd like to ask: Do you have an approximate ETA (month/year) for when GPU/CPU switching will arrive in an updated version?
I also wonder whether the app temporarily writes the transcribed text to a .txt file, say every minute or so, in case it crashes for unforeseen reasons. That way you could still read part of the transcript if you have to restart after a random crash.
Hi AlexData-Hawkhill,
Thanks for the kind words.
I will integrate GPU support in the next update, so it shouldn't be too long.
The app keeps the transcription in memory. In theory I could add an "autosave" that writes the transcription to disk before exiting if the app is interrupted for any reason. I could add this as well for the next update.
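To make the autosave idea concrete, here is a minimal sketch of a periodic autosave, written in Python purely for illustration. It does not reflect how the app is implemented. The names `autosave` and `collect_with_autosave`, the 60-second default, and the plain-text output format are all assumptions. The file is written to a temporary name and then swapped into place, so a crash in the middle of a save should not leave a half-written transcript.

```python
import os
import tempfile
import time


def autosave(segments, path):
    """Write all segments collected so far to path, replacing it atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write("\n".join(segments) + "\n")
        # os.replace swaps the finished file in, overwriting any older autosave.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def collect_with_autosave(segment_source, path, interval_seconds=60.0):
    """Collect segments in memory and save them at most once per interval."""
    segments = []
    last_save = time.monotonic()
    for segment in segment_source:
        segments.append(segment)
        if time.monotonic() - last_save >= interval_seconds:
            autosave(segments, path)
            last_save = time.monotonic()
    autosave(segments, path)  # final save once the source is exhausted
    return segments
```

Here `segment_source` stands in for whatever produces transcribed text, one segment at a time. If the process dies partway through, the file at `path` holds everything up to the last completed save.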
Nice! Looking forward to your next update!
v2.4.1 just released, which includes GPU acceleration.
Hello, I downloaded the demo, and when I import the audio file, the software shuts down unexpectedly.
Hi tok-ai, can you describe the environment, such as your OS (Windows, macOS, Linux), and the type of audio file you are using? If you can share an audio file that causes this issue, I could try to replicate the problem. You can message me directly on Reddit (/u/samontab).
Hi tok-ai,
This issue should now be resolved in v2.1.4, which I just released.
Hi, thanks for the fix! Now I can import my WAV files. I also bought the transcriber and am ready to begin working.
Glad it's working for you now, all the best!
Hey Samontab! PrivateTranscriber looks quite awesome. However, I find it very hard to evaluate with the demo version whether the app provides an accurate transcription of the audio I've submitted. Would it be possible either A/ to change the demo version to transcribe one minute out of every two, rather than segments of 10 seconds, or B/ to send you ONE audio file to get the full transcription? I have the feeling that would be more helpful for evaluating it. Many thanks! All best.
Hi LotekDotItch,
Sure! Just comment here with a link to download the audio file, or if you prefer you can send me a PM on Reddit (/u/samontab).
Great, thanks !
It works! Oh wow!!!! Used it on a video from YouTube and it transcribed it correctly (saved to an SRT file). THANK YOU!
I'm glad it worked for you, wtinjalanugraha :)
Hi there, I just purchased and have a question: do I load the v1.4.2 (583 MB) file in addition to the v1.5.2 file (4.6 GB)?
Thanks!
Hi pityadd,
They are both completely independent versions, so you only need one. v1.5.2 has more models, which is why it's larger. It includes everything you need to run it.
Enjoy your transcriptions!
thank you, it is working quite well!
Why is it translating English to Welsh, and copying the same thing over and over after hours of waiting for it? "Mae'n gwaith unrhyw." and other Welsh phrases/sentences, over and over. How do I fix this? Some of the other transcripts have been fine.
Hi 19sofia99,
Since it is an AI transcription, it might not get it right all the time, and sometimes you might see what you describe.
You can try fixing it by using a different model, either a more accurate one or a faster one. You can select this in Edit -> Settings under Transcription Model.
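Before re-running with another model, it can help to check how much of a finished transcript is affected. The sketch below is not a feature of the app. It is a hypothetical helper, `find_repeated_runs`, that scans a list of transcript lines and reports any line repeated several times in a row, as in the looping Welsh phrase described above. The threshold of three repeats is an arbitrary assumption.

```python
from itertools import groupby


def find_repeated_runs(lines, min_repeats=3):
    """Return (text, count, start_index) for each run of identical lines."""
    runs = []
    index = 0
    # groupby clusters consecutive equal items, which is exactly a "run".
    for text, group in groupby(line.strip() for line in lines):
        count = sum(1 for _ in group)
        if text and count >= min_repeats:
            runs.append((text, count, index))
        index += count
    return runs
```

Each reported run gives the repeated text, how many times it occurred, and the zero-based index of its first line, so the affected stretch of the recording can be located and re-transcribed.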
Love the app, works fast and remarkably well for the processing speed! I normally use whisperx with pyannote for segmentation and diarization, but that process is not well suited to rapidly transcribing video in low latency distribution! My primary use case is to scan video for obscene language prior to general distribution. The drag and drop pipeline with the post-processing cleanup grid has cut my work time in half! Quick question: is there any way to inject into the translation pipeline so I could add speaker diarization?
Hi tbruinsma, thanks for the nice comments!
Speaker Diarization is a frequently requested feature, and I will add it in the next update of the application. Hope that helps!
Hello, I wanted to ask if this software is able to transcribe live audio, or does it only transcribe from a video file? Thank you for your time!
Edit: my bad, I saw that you answered this question below!
Hi,
This is fantastic, thank you for providing this. I'm deaf, trying to study from videos, and it's driving me cray-cray!
Is there a way to hook this into system sound so that I can get a rolling transcription of videos that I can't download or get the URL of? As a student I use Learning Management Systems that embed videos but lock out pretty much everything except PLAY!
Google Translate (English-to-English) is what I'm using now, but I have to pause, screen capture, use Google Lens or Snagit to get the text, dump it in Notepad to clean out non-text, copy it to, say, Word or OneNote, and aaarrrrgh! There has to be a simpler way!
So, can it? Or is anything in the pipeline?
Hi Wad Mabbit Society, happy to hear you liked this software.
At the moment, it can only transcribe media files, not directly in real time from the microphone.
One option is to record the sound coming from your system and then feed that recording into the software. This will also get you the best quality, since real-time transcription requires a simpler model. Real-time transcription is not currently planned, at least in the short term.
Hello, is there a free trial? Thank you.
Hi rebork5555, you can download a demo of the program to see how it works. Just click on the Download button next to PrivateTranscriberPro DEMO v1.5.2.
Sorry, I didn't see the demo. I've used the demo with a Spanish audio file, but it auto-translates to English. Thank you.
Hi Rebork5555, you can keep the transcription in Spanish by selecting "Keep original language" in the Settings menu. You can find that option by going to the Edit menu, then clicking on Settings. You'll see something like this:

Sorry, I didn't see the language options. Thank you very much. I'm able to see the original language now. Very nice software.
Am I correct in understanding that the application only uses the CPU and not the GPU? Is there a way to switch between them?
You are absolutely correct Kijkeenolifant, the current version only uses CPU. In a future version there will be an option to accelerate it with your GPU.
The point was to first make it available to everyone, and then make it better over time with new releases. Anyone who buys it will have access to future versions anyway, forever!
Hey Kijkeenolifant, version 2.4.1 was just released, which includes GPU acceleration for much faster transcriptions. Check it out!
Nice work on the application. Great all-in-one lightweight package. Would it be possible to include the larger models in the application as well? That's the only downside for me at the moment compared to installing the normal way.
Thanks Kijkeenolifant, great to hear you liked it.
That's a valid point. I originally planned to include all the models in the application, but ended up with a file far larger than itch.io's maximum allowed download size (1GB), so I included only a subset of them to stay within this limit.
Having said that, it looks like I can manually ask itch.io for a larger maximum file size, so I will update this tool if they increase the limit.
I ended up discovering a different way of uploading files (butler) which is much nicer and doesn't have the 1GB restriction. So, as promised, I just updated the app to include five different model sizes, which are now available in v1.5.2.
I didn't want to include all three versions of the large model, as the download is already about 5GB. I only added large-v1, which seems to be the one with the fewest issues in general. If you want to use either of the other two large models (v2 or v3), you can simply copy the model you want into the models folder and rename it to large-v1; the app will then use that model when you select the most accurate setting.
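The copy-and-rename step above can be scripted. The sketch below rests on several assumptions that are not confirmed anywhere in this thread: that each model is a single file, that the replacement should keep its own file extension, and that the original large-v1 file is worth keeping as a `.bak` backup. The function name `install_as_large_v1` is hypothetical, and the real file names in the models folder may differ.

```python
import shutil
from pathlib import Path


def install_as_large_v1(source_model, models_dir):
    """Copy source_model into models_dir under the name large-v1.

    Assumes single-file models; the extension handling is a guess.
    """
    source = Path(source_model)
    target = Path(models_dir) / ("large-v1" + source.suffix)
    if target.exists():
        # Keep the bundled model so the swap can be undone later.
        backup = target.with_name(target.name + ".bak")
        target.replace(backup)
    shutil.copy2(source, target)
    return target
```

To undo the swap, delete the copied file and rename the `.bak` file back to its original name.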
That is amazing! Thanks for the quick turnaround. I will definitely buy a copy :).