Rank: Administration
Groups: Translators, Members, Administrators Joined: 1/11/2018(UTC) Posts: 1,308  Location: Tampa, FL Thanks: 28 times Was thanked: 410 time(s) in 351 post(s)
|
DISCLAIMER: I am not versed in Tesseract or OCR beyond understanding what they are in general. I simply pulled together the necessary files and two supporting methods in the SPTesseract class to make this work. I have only tested this in the most basic way: selecting an area with a light background and dark, legible text.

Plug-In and Required Files: https://www.strokesplus.net/files/plugins/SPTesseract_1.1.0.0.zip (177 MB)

IMPORTANT:
- You must have a minimum version of StrokesPlus.net 0.4.1.1
- Extract all files/folders directly into the StrokesPlus.net\Plug-Ins folder using 7-Zip or some other method which doesn't block the DLLs
- If you update the current working directory used by S+, this might fail to load the trained data files
- Should look like the image below

IMPORTANT: You will also need to ensure you have the Microsoft VC Runtime installed: MSVC Runtime

Put this function in Global Actions > Load/Unload > Load (script). This is a wrapper for extracting text based on a language and rectangle.
Code:
function GetTesseractText(lang, rect) {
    //Capture an image from the screen using the Rectangle passed in
    var memoryImage = new drawing.System.Drawing.Bitmap(rect.Width, rect.Height);
    var memoryGraphics = drawing.System.Drawing.Graphics.FromImage(memoryImage);
    memoryGraphics.CopyFromScreen(rect.X, rect.Y, 0, 0, new Size(rect.Width, rect.Height));
    memoryGraphics.Dispose();
    /*
        Get Tesseract Page object
            First param is Bitmap image
                - From code above
            Second param is language
                - See Plug-Ins\TesseractTrainedData folder for other trained data files
                - Appears to be just the first part of the file name; "eng" for English
            Third param is Tesseract.EngineMode enum value:
                - TesseractOnly = "0"
                - LstmOnly = "1"
                - TesseractAndLstm = "2"
                - Default = "3"
            Last param is Tesseract.PageSegMode
                - OsdOnly = "0"
                - AutoOsd = "1"
                - AutoOnly = "2"
                - Auto = "3"
                - SingleColumn = "4"
                - SingleBlockVertText = "5"
                - SingleBlock = "6"
                - SingleLine = "7"
                - SingleWord = "8"
                - CircleWord = "9"
                - SingleChar = "10"
                - SparseText = "11"
                - SparseTextOsd = "12"
                - RawLine = "13"
                - Count = "14"
        SPTesseract class also has the function below defined:
            List<Rectangle> GetPageSegmentedRegions(Page page, string iteratorLevel)
            Pass Page object from the SPTesseract.GetPage method with one of the PageIteratorLevel values:
                Tesseract.PageIteratorLevel
                    - Block = "0"
                    - Para = "1"
                    - TextLine = "2"
                    - Word = "3"
                    - Symbol = "4"
        All other methods/properties are standard Tesseract 4.1.1 from:
            - https://www.nuget.org/packages/Tesseract/
            - https://github.com/charlesw/tesseract
    */
    var tpage = SPTesseract.GetPage(memoryImage, lang, "3", "3");
    var ocrText = tpage.GetText();
    memoryImage.Dispose();
    return ocrText;
}
Example action script to test recognition and show source rectangle:
Code:
//Use the square gesture to draw a box around the desired area
//Shows the text recognized by Tesseract (English data file) and the dimensions of the rectangle
sp.MessageBox(`Text: ${GetTesseractText("eng", new Rectangle(action.Bounds.X, action.Bounds.Y, action.Bounds.Width, action.Bounds.Height))}
Rect.X: ${action.Bounds.X}
Rect.Y: ${action.Bounds.Y}
Rect.Width: ${action.Bounds.Width}
Rect.Height: ${action.Bounds.Height}`,
"Text, Rect");
Edited by user Tuesday, August 3, 2021 2:53:42 PM(UTC)
| Reason: Updated download to version 1.1 - fixed memory leak
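The numeric strings passed to GetPage ("3", "3") are easy to mistype. A minimal sketch, assuming only the values listed in the comment block of GetTesseractText, gives them names. The objects EngineMode, PageSegMode and PageIteratorLevel and the helper modeValue are illustrative; they are defined by this script, not by the plug-in.

```javascript
// Named constants mirroring the string values listed in the
// GetTesseractText comment block. These objects are illustrative
// and are NOT provided by the SPTesseract plug-in.
var EngineMode = Object.freeze({
    TesseractOnly: "0",
    LstmOnly: "1",
    TesseractAndLstm: "2",
    Default: "3"
});

var PageSegMode = Object.freeze({
    OsdOnly: "0",
    AutoOsd: "1",
    AutoOnly: "2",
    Auto: "3",
    SingleColumn: "4",
    SingleBlockVertText: "5",
    SingleBlock: "6",
    SingleLine: "7",
    SingleWord: "8",
    CircleWord: "9",
    SingleChar: "10",
    SparseText: "11",
    SparseTextOsd: "12",
    RawLine: "13"
});

var PageIteratorLevel = Object.freeze({
    Block: "0",
    Para: "1",
    TextLine: "2",
    Word: "3",
    Symbol: "4"
});

// Hypothetical helper: look up a value by name and fail loudly on a typo
// instead of silently passing undefined to the plug-in.
function modeValue(table, name) {
    if (!Object.prototype.hasOwnProperty.call(table, name)) {
        throw new Error("Unknown mode name: " + name);
    }
    return table[name];
}
```

With these in the Load script, the call inside GetTesseractText could be written as SPTesseract.GetPage(memoryImage, lang, EngineMode.Default, PageSegMode.Auto), which passes the same "3", "3" values as before.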
|
 3 users thanked Rob for this useful post.
|
IMPORTANT: Replace the 1.0 download with the new download link (1.1) in the original post. There was a huge memory leak: each OCR request created a new OCR engine and page and never disposed of them.

Existing scripts should still be compatible; behind the scenes, new calls to GetPage will:
- conditionally reuse the existing engine (if the language and engine mode haven't changed) or release it and create a new one
- release the previous Page and create a new one
Also, there are additional calls available to the plugin:
- void SetEngine(string lang, string engineMode)
- void ReleaseEngine()
- void ReleasePage()
None of these calls are required, but I would recommend calling ReleasePage after you're done extracting text. If you're not using OCR calls frequently, you can also then call ReleaseEngine - but note that initializing the OCR engine takes some time, so if you release the engine after each call, scripts that use OCR will have a delay as a new engine is instantiated.
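The release pattern described above can be sketched as a small wrapper that always releases the page, even when the code using it throws. This is only a sketch: withOcrPage and its parameters are hypothetical names, and the ocr argument is assumed to be whatever object exposes GetPage, ReleasePage and ReleaseEngine (the plug-in class in S+, or a stub when testing outside S+).

```javascript
// Hypothetical wrapper: get a page, hand it to a callback, then release it.
// `ocr` is assumed to expose GetPage, ReleasePage and ReleaseEngine as
// described in the post; nothing here is part of the plug-in itself.
function withOcrPage(ocr, image, lang, usePage, releaseEngine) {
    // Same engine mode and page segmentation mode ("3", "3") as GetTesseractText
    var page = ocr.GetPage(image, lang, "3", "3");
    try {
        return usePage(page);
    } finally {
        // Recommended: release the page once the text has been extracted
        ocr.ReleasePage();
        // Optional: only worthwhile when OCR is used infrequently, because
        // the next call then has to initialize a new engine
        if (releaseEngine) {
            ocr.ReleaseEngine();
        }
    }
}
```

Inside S+, a call might look like withOcrPage(SPTesseract, memoryImage, "eng", function (p) { return p.GetText(); }, false), keeping the engine alive for the next request.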
|
Updated function, if you want to pass in a scaling multiplier, e.g. 2 to scale the captured image to 200% of its original size.
Code:
function GetTesseractText(lang, rect, scale) {
    //Capture an image from the screen using the Rectangle passed in
    var memoryImage = new System.Drawing.Bitmap(rect.Width, rect.Height);
    var memoryGraphics = System.Drawing.Graphics.FromImage(memoryImage);
    memoryGraphics.CopyFromScreen(rect.X, rect.Y, 0, 0, new System.Drawing.Size(rect.Width, rect.Height));
    memoryGraphics.Dispose();
    //Only scale when a positive number was passed in (skips undefined, null and 0)
    if(!isNaN(scale) && scale > 0) {
        //Original Image attributes
        var originalWidth = memoryImage.Width;
        var originalHeight = memoryImage.Height;
        // now we can get the new height and width
        var newHeight = parseInt(originalHeight * scale);
        var newWidth = parseInt(originalWidth * scale);
        var scaledImage = new System.Drawing.Bitmap(newWidth, newHeight);
        var scaledGraphics = System.Drawing.Graphics.FromImage(scaledImage);
        scaledGraphics.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
        scaledGraphics.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighQuality;
        scaledGraphics.PixelOffsetMode = System.Drawing.Drawing2D.PixelOffsetMode.HighQuality;
        scaledGraphics.CompositingQuality = System.Drawing.Drawing2D.CompositingQuality.HighQuality;
        scaledGraphics.Clear(Color.Transparent);
        scaledGraphics.DrawImage(memoryImage, 0, 0, newWidth, newHeight);
        scaledGraphics.Dispose();
        //Swap the scaled image in for the original capture
        memoryImage.Dispose();
        memoryImage = scaledImage;
    }
    /*
        Get Tesseract Page object
            First param is Bitmap image
                - From code above
            Second param is language
                - See Plug-Ins\TesseractTrainedData folder for other trained data files
                - Appears to be just the first part of the file name; "eng" for English
            Third param is Tesseract.EngineMode enum value:
                - TesseractOnly = "0"
                - LstmOnly = "1"
                - TesseractAndLstm = "2"
                - Default = "3"
            Last param is Tesseract.PageSegMode
                - OsdOnly = "0"
                - AutoOsd = "1"
                - AutoOnly = "2"
                - Auto = "3"
                - SingleColumn = "4"
                - SingleBlockVertText = "5"
                - SingleBlock = "6"
                - SingleLine = "7"
                - SingleWord = "8"
                - CircleWord = "9"
                - SingleChar = "10"
                - SparseText = "11"
                - SparseTextOsd = "12"
                - RawLine = "13"
                - Count = "14"
        SPTesseract class also has the function below defined:
            List<Rectangle> GetPageSegmentedRegions(Page page, string iteratorLevel)
            Pass Page object from the SPTesseract.GetPage method with one of the PageIteratorLevel values:
                Tesseract.PageIteratorLevel
                    - Block = "0"
                    - Para = "1"
                    - TextLine = "2"
                    - Word = "3"
                    - Symbol = "4"
        All other methods/properties are standard Tesseract 4.1.1 from:
            - https://www.nuget.org/packages/Tesseract/
            - https://github.com/charlesw/tesseract
    */
    var tpage = SPTesseract.SPTesseract.GetPage(memoryImage, lang, "3", "3");
    var ocrText = tpage.GetText();
    memoryImage.Dispose();
    SPTesseract.SPTesseract.ReleasePage();
    return ocrText;
}
EDIT: Added release page call.
NOTE: The above script is intended for use in version 0.5.0.0 (beta as of this writing) or higher. For versions below 0.5.0.0, just replace SPTesseract.SPTesseract with SPTesseract.
Edited by user Tuesday, August 3, 2021 11:07:11 PM(UTC)
| Reason: Not specified
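The size calculation in the scaling step can also be pulled out into a standalone helper, which makes it easy to check what dimensions a given multiplier produces before allocating a bitmap. This is a sketch under assumptions: scaledSize is a hypothetical name, and it rounds rather than truncating as parseInt does in the function above, so results can differ by one pixel.

```javascript
// Hypothetical helper: compute the scaled bitmap dimensions.
// Returns the original size when the scale is missing, non-numeric,
// non-finite, or not positive.
function scaledSize(width, height, scale) {
    var factor = Number(scale);
    if (scale === null || scale === undefined || !isFinite(factor) || factor <= 0) {
        return { width: width, height: height };
    }
    return {
        // Round to the nearest pixel and never go below 1, since a
        // zero-sized bitmap cannot be created
        width: Math.max(1, Math.round(width * factor)),
        height: Math.max(1, Math.round(height * factor))
    };
}
```

For example, scaledSize(rect.Width, rect.Height, 2) doubles both dimensions, matching the 200% case mentioned in the post.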