DISCLAIMER: I am not versed in Tesseract or OCR beyond understanding what they are in general. I simply pulled together the necessary files and added two supporting methods to the SPTesseract class to make this work. I have only tested this in the most basic way, by selecting an area with a light background and dark, legible text.

Plug-In and Required Files: https://www.strokesplus.net/files/plugins/SPTesseract_1.0.0.0.zip (177 MB)

IMPORTANT:
- You must have StrokesPlus.net 0.4.1.1 or later
- Extract all files/folders directly into the StrokesPlus.net\Plug-Ins folder using 7-Zip or some other method which doesn't block the DLLs
- If you change the current working directory used by S+, the trained data files might fail to load
- The folder should look like the image below

[Image: Plug-Ins folder]

Put this function in Global Actions > Load/Unload > Load (script). It is a wrapper for extracting text based on a language and rectangle.

Code:
function GetTesseractText(lang, rect) {
    //Capture an image from the screen using the Rectangle passed in
    var memoryImage = new drawing.System.Drawing.Bitmap(rect.Width, rect.Height);
    var memoryGraphics = drawing.System.Drawing.Graphics.FromImage(memoryImage);
    memoryGraphics.CopyFromScreen(rect.X, rect.Y, 0, 0, new Size(rect.Width, rect.Height));
    memoryGraphics.Dispose();
    /*
        Get Tesseract Page object
            First param is Bitmap image
                - From code above
            Second param is language
                - See Plug-Ins\TesseractTrainedData folder for other trained data files
                - Appears to be just the first part of the file name; "eng" for English
            Third param is Tesseract.EngineMode enum value:
                - TesseractOnly = "0"
                - LstmOnly = "1"
                - TesseractAndLstm = "2"
                - Default = "3"
            Last param is Tesseract.PageSegMode
                - OsdOnly = "0"
                - AutoOsd = "1"
                - AutoOnly = "2"
                - Auto = "3"
                - SingleColumn = "4"
                - SingleBlockVertText = "5"
                - SingleBlock = "6"
                - SingleLine = "7"
                - SingleWord = "8"
                - CircleWord = "9"
                - SingleChar = "10"
                - SparseText = "11"
                - SparseTextOsd = "12"
                - RawLine = "13"
                - Count = "14"

        SPTesseract class also has the function below defined:
            List<Rectangle> GetPageSegmentedRegions(Page page, string iteratorLevel)
        Pass Page object from the SPTesseract.GetPage method with one of the PageIteratorLevel values:
            Tesseract.PageIteratorLevel
                - Block = "0"
                - Para = "1"
                - TextLine = "2"
                - Word = "3"
                - Symbol = "4"

        All other methods/properties are standard Tesseract 4.1.1 from:
            - https://www.nuget.org/packages/Tesseract/
            - https://github.com/charlesw/tesseract
    */
    //Engine mode "3" (Default) and page segmentation mode "3" (Auto)
    var tpage = SPTesseract.GetPage(memoryImage, lang, "3", "3");
    return tpage.GetText();
}
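The numeric strings listed in the comment above are easy to mistype, so here is a rough sketch of named lookup tables built from those same values. The names `EngineMode`, `PageSegMode`, `PageIteratorLevel` and `GetTesseractTextWithModes` are hypothetical and not part of the plug-in; only `SPTesseract.GetPage` and `GetText` come from the function above, and the sketch assumes `GetPage` accepts exactly the string values listed there.

```javascript
// Hypothetical lookup tables; the string values are copied from the comment
// inside GetTesseractText and are not defined by the plug-in under these names.
var EngineMode = Object.freeze({
    TesseractOnly: "0",
    LstmOnly: "1",
    TesseractAndLstm: "2",
    Default: "3"
});

var PageSegMode = Object.freeze({
    OsdOnly: "0",
    AutoOsd: "1",
    AutoOnly: "2",
    Auto: "3",
    SingleColumn: "4",
    SingleBlockVertText: "5",
    SingleBlock: "6",
    SingleLine: "7",
    SingleWord: "8",
    CircleWord: "9",
    SingleChar: "10",
    SparseText: "11",
    SparseTextOsd: "12",
    RawLine: "13"
});

var PageIteratorLevel = Object.freeze({
    Block: "0",
    Para: "1",
    TextLine: "2",
    Word: "3",
    Symbol: "4"
});

// Hypothetical helper: the same GetPage/GetText calls as GetTesseractText,
// but the caller chooses the engine and page segmentation modes by name.
// Assumes SPTesseract is available to the script, as in the function above.
function GetTesseractTextWithModes(image, lang, engineMode, pageSegMode) {
    var page = SPTesseract.GetPage(image, lang, engineMode, pageSegMode);
    return page.GetText();
}

// Possible usage for a single line of text (untested against the plug-in):
// GetTesseractTextWithModes(memoryImage, "eng", EngineMode.Default, PageSegMode.SingleLine);
```

If single-line or single-word captures give poor results with the default "3" (Auto), trying `PageSegMode.SingleLine` or `PageSegMode.SingleWord` may be worth an experiment.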
Example action script to test recognition and show source rectangle:
Code:
//Use the square gesture to draw a box around the desired area
//Shows the text recognized by Tesseract (English data file) and the dimensions of the rectangle
sp.MessageBox(`Text: ${GetTesseractText("eng", new Rectangle(action.Bounds.X, action.Bounds.Y, action.Bounds.Width, action.Bounds.Height))}
Rect.X: ${action.Bounds.X}
Rect.Y: ${action.Bounds.Y}
Rect.Width: ${action.Bounds.Width}
Rect.Height: ${action.Bounds.Height}`,
"Text, Rect");
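Two small guards may make the test action above more forgiving. This is only a sketch: `IsUsableRect` and `TidyOcrText` are hypothetical helper names, not part of StrokesPlus.net or the plug-in. The first skips empty selections, since a zero-width or zero-height rectangle would make the Bitmap constructor in GetTesseractText fail; the second tidies the returned text, on the assumption that the recognized text can carry trailing spaces and blank lines.

```javascript
// Hypothetical helper: true only when the rectangle has a positive area,
// so GetTesseractText is never asked to create a 0 x 0 bitmap.
function IsUsableRect(rect) {
    return rect != null && rect.Width > 0 && rect.Height > 0;
}

// Hypothetical helper: normalize line endings to "\n", strip trailing
// spaces/tabs from each line, and trim leading/trailing blank space.
function TidyOcrText(text) {
    return String(text == null ? "" : text)
        .replace(/\r\n?/g, "\n")
        .replace(/[ \t]+$/gm, "")
        .trim();
}

// Possible usage inside the action script (untested against the plug-in):
// if (IsUsableRect(action.Bounds)) {
//     var rect = new Rectangle(action.Bounds.X, action.Bounds.Y, action.Bounds.Width, action.Bounds.Height);
//     sp.MessageBox(TidyOcrText(GetTesseractText("eng", rect)), "Text");
// }
```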
Edited by user Wednesday, January 20, 2021 7:33:29 AM(UTC)