tentacule / pgstosrt
PGS to Srt converter
I wanted to use PgsToSrt under Ubuntu 23.04, but it ships with libtesseract5 and I could not find a way to install libtesseract4. Is it possible to make PgsToSrt use libtesseract5 instead of libtesseract4?
Do you plan to add a "Fix OCR errors" option, like the one in Subtitle Edit, to clean up badly OCR'd text?
Hi,
I want to convert a .sup (PGS) file to .srt with your tool using Docker:
docker run -it -v /share/CACHEDEV1_DATA/Multimedia/Movies/Test:/data -e INPUT=/data/Mission.Impossible.Fallout.2018.MULTi.TRUEFRENCH.2160p.UHD.BluRay.REMUX.DV.HEVC-BEO.6.en.sup -e LANGUAGE=eng tentacule/pgstosrt
But I get this error:
2020/12/11 11:36:28.418|ERROR|Error: Exception has been thrown by the target of an invocation. at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture)
at System.Activator.CreateInstance(Type type, BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)
at System.Activator.CreateInstance(Type type, Object[] args)
at Tncl.NativeLoader.NativeInstance.CreateInstance(NativeLoader loader, Type interfaceType)
at PgsToSrt.TesseractApi.Initialize()
at PgsOcr.DoOcr() Exception has been thrown by the target of an invocation.
I have a QNAP NAS.
Thanks in advance for your help.
Erwan
$ dotnet PgsToSrt.dll --input test.sup --output test.srt --tesseractlanguage eng
PgsToSrt 1.3.0.0
2021/01/31 13:38:00.145|INFO|Detected tesseract language data for language 'eng'.
2021/01/31 13:38:00.180|INFO|Starting OCR for 285 items...
2021/01/31 13:38:00.228|ERROR|Error: Exception has been thrown by the target of an invocation. at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture)
at System.Activator.CreateInstance(Type type, BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)
at System.Activator.CreateInstance(Type type, Object[] args)
at Tncl.NativeLoader.NativeInstance.CreateInstance(NativeLoader loader, Type interfaceType)
at PgsToSrt.TesseractApi.Initialize()
at PgsOcr.DoOcr() Exception has been thrown by the target of an invocation.
I am using:
$ ldconfig -p -v | grep libdl
libdl.so.2 (libc6,x86-64, OS ABI: Linux 3.2.0) => /lib/x86_64-linux-gnu/libdl.so.2
libdl.so.2 (libc6, OS ABI: Linux 3.2.0) => /lib/i386-linux-gnu/libdl.so.2
libdl.so (libc6,x86-64, OS ABI: Linux 3.2.0) => /lib/x86_64-linux-gnu/libdl.so
Similar to issue #6, caused by the missing libtesseract3 package.
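One way to check whether the native libraries are visible to the dynamic loader, in the spirit of the ldconfig output above, is sketched below. check_libs is a hypothetical helper, not part of PgsToSrt, and the library name prefixes passed to it are assumptions to adapt to whatever versions your build expects.

```shell
# Hypothetical helper: report which library name prefixes appear in
# `ldconfig -p`-style output read from stdin.
check_libs() {
    cache=$(cat)
    for name in "$@"; do
        # ldconfig -p lines start with whitespace followed by the file name.
        if printf '%s\n' "$cache" | grep -q "^[[:space:]]*$name\.so"; then
            echo "found: $name"
        else
            echo "missing: $name"
        fi
    done
}
```

Usage, assuming these are the names you want to look for: ldconfig -p | check_libs libtesseract liblept libdl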
The build fails on Linux unless the framework is specified; please consider updating README.md:
dotnet publish -c Release -o out --framework net6.0
On ArchLinux the required library liblept.so is actually called libleptonica.so. I had to create the following symlink so that PgsToSrt could load it: ln -s /usr/lib/libleptonica.so.6 /usr/lib/liblept.so.5. It would be good if the program could find the name libleptonica.so by itself.
It would be easier to install with something like dotnet tool install -g PgsToSrt.
Guide is here: https://learn.microsoft.com/en-us/dotnet/core/tools/global-tools-how-to-create
Thanks a lot!
Hello,
I have used your creation, and it works very well.
I admit I used it for my software "TAO-MKV", but I then switched to another process based on Tesseract 5.0.X LSTM.
However, even though your program is not fast, it remains very effective AND above all lightweight (I was able to shrink it to 19 MB versus 250 MB for the official one, "without the tessdata of course").
I have no idea how to extract the subtitle BMPs and the timestamps (as text), but if you would like to bring your know-how, speed and efficiency to TAO-MKV, https://github.com/serpafi/TAO-MKV
we would be delighted by your work, which would not be taken without credit but showcased (text, links or other credits will be published directly in the software).
Regards
fixed by #12
I see the examples for MKV and sup files, but I used MKVToolNix GUI to extract my subtitles, which gave me MKS files. Also, how can I have PgsToSrt convert all tracks in the MKS file?
Is it possible to tell PgsToSrt to convert all files in a directory? Does it depend on the container?
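If the tool only accepts one input per invocation (an assumption here, not something confirmed in this thread), a shell loop is one possible workaround for a directory of .sup files. The sketch below uses only the --input, --output and --tesseractlanguage options that appear in other reports on this page; convert_dir and PGSTOSRT_CMD are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: convert every .sup file in a directory, one at a time.
# PGSTOSRT_CMD defaults to the invocation shown in other reports; setting it
# to "echo" gives a dry run that only prints the arguments.
convert_dir() {
    dir=$1
    lang=${2:-eng}
    for sup in "$dir"/*.sup; do
        # When nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$sup" ] || continue
        ${PGSTOSRT_CMD:-dotnet PgsToSrt.dll} --input "$sup" \
            --output "${sup%.sup}.srt" --tesseractlanguage "$lang"
    done
}
```

For example, PGSTOSRT_CMD=echo convert_dir /data eng prints the arguments that would be passed for each file without running the converter. This does not address MKS containers or multiple tracks in one file.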
You changed the code-style of the original code which makes merging upstream changes a lot harder on yourself.
In one case, IMO get => TimeSpan.Milliseconds; is more readable than what you changed it to:
get
{
    return TimeSpan.Milliseconds;
}
You could refactor your modifications so that they override functionality; you could then use the upstream NuGet package, which makes updating those dependencies a lot easier. Git submodules can be used if you must have a clone of the code in your project.
The net5 directory is there, lots of files are there, but PgsToSrt.exe and .dll are missing.
That's a really good project; I like to use it whenever I have to work with PGS. Could you also add VobSub support? There is no good alternative that does what PgsToSrt does.
It doesn't seem that the Tesseract trained data set can be chosen (i.e. 'fast' vs 'best'), and as far as I can tell, you are using 'fast'. Is that the case?
There may also be corruption somewhere in the trained data you have (at least, for eng) as I just noticed totally nonsensical series of characters in the conversion of a single basic word when it is multi-line. Something like...
The brown fox jumps over the lazy
qj2]a%sLo1
Hi - Trying to use your script and got the following error:
PgsToSrt 1.0.0.0
2019/11/26 19:57:26.699|INFO|Detected tesseract language data for language 'spa'.
2019/11/26 19:57:26.783|INFO|Detected tesseract language data for language 'eng'.
2019/11/26 19:57:27.011|INFO|Starting OCR for 606 items...
2019/11/26 19:57:27.114|ERROR|Error: Exception has been thrown by the target of an invocation. Exception has been thrown by the target of an invocation.
Not sure how to generate more debug info. Any assistance is appreciated .. Thanks.
The Dockerfile in the current master branch uses a .NET SDK 5.0 base image, which can't build projects that target .NET 6.
Relevant output for docker build -t pgstosrt . :
Step 4/9 : RUN cd /src && dotnet restore && dotnet publish -c Release -o /src/PgsToSrt/out && mv /src/entrypoint.sh /entrypoint.sh && chmod +x /entrypoint.sh && mv /src/PgsToSrt/out /app
---> Running in 94e0c48afbe5
Determining projects to restore...
/usr/share/dotnet/sdk/5.0.101/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.TargetFrameworkInference.targets(141,5): error NETSDK1045: The current .NET SDK does not support targeting .NET Core 6.0. Either target .NET Core 5.0 or lower, or use a version of the .NET SDK that supports .NET Core 6.0. [/src/PgsToSrt/PgsToSrt.csproj]
The command '/bin/sh -c cd /src && dotnet restore && dotnet publish -c Release -o /src/PgsToSrt/out && mv /src/entrypoint.sh /entrypoint.sh && chmod +x /entrypoint.sh && mv /src/PgsToSrt/out /app' returned a non-zero code: 1
Steps to reproduce:
git clone https://github.com/Tentacule/PgsToSrt.git
cd PgsToSrt
# checkout the latest release v1.4.2 or master at 38fd03e57f
git checkout v1.4.2
docker build -t pgstosrt .
Quick fix:
Change the .NET SDK base image to 6.0 and specify net6.0 as the target framework (the project lists both 5.0 and 6.0 as potential targets, so one must be chosen explicitly).
Alternatively, leave the .NET SDK base image at 5.0.101 and specify net5.0 as the target framework.
Hi!
I am seeking your help today as I see the following error when executing:
dotnet PgsToSrt.dll --input /video/input/Sieben.mkv --track 3 --output /video/input/test.srt
PgsToSrt 1.1.0.0
2020/06/15 10:09:20.485|INFO|Detected tesseract language data for language 'deu'.
2020/06/15 10:09:21.191|INFO|Starting OCR for 1729 items...
2020/06/15 10:09:21.244|ERROR|Error: Exception has been thrown by the target of an invocation. at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)
at Tncl.NativeLoader.NativeInstance.CreateInstance[T](NativeLoader loader)
at Tesseract.Interop.TessApi.Initialize(NativeLoader loader) in /root/PgsToSrt/Tesseract/Interop/BaseApi.cs:line 355
at Tesseract.TesseractEngine..ctor(String datapath, String language, EngineMode engineMode, IEnumerable`1 configFiles, IDictionary`2 initialOptions, Boolean setOnlyNonDebugVariables) in /root/PgsToSrt/Tesseract/TesseractEngine.cs:line 66
at PgsOcr.DoOcr() Exception has been thrown by the target of an invocation.
My system details are:
[root@nvidia out]# tesseract --version
tesseract 3.04.00
leptonica-1.72
libgif 4.1.6(?) : libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7 : libwebp 0.3.0
[root@nvidia out]# uname -a
Linux nvidia.home 3.10.0-1127.10.1.el7.x86_64 #1 SMP Wed Jun 3 14:28:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@nvidia out]# dotnet --version
2.1.807
Do you have any idea about the root cause?
Am I missing something?
BR
When executing
dotnet D:\pgstosrt\PgsToSrt.dll --input "D:\23.sup" --output "D:\23.srt" --tesseractlanguage tha
on a specific sup file, I get:
2019/12/07 17:49:26.754|INFO|Starting OCR for 8 items...
read_params_file: parameter not found:
Yes, I have the tha Tesseract language data installed.
You can find the sup file here:
23.zip