#1 Posted : Wednesday, January 20, 2021 7:10:10 AM(UTC)

Rank: Administration


Groups: Translators, Members, Administrators
Joined: 1/11/2018(UTC)
Posts: 865
United States
Location: Tampa, FL


I am not versed in Tesseract or OCR beyond understanding what they are in general. I simply pulled together the necessary files and added two supporting methods in the SPTesseract class to make this work.
I have only tested this in the most basic way: selecting an area with a light background and dark, legible text.

Plug-In and Required Files (177 MB):

- You must have a minimum version of
- Extract all files and folders directly into the \Plug-Ins folder using 7-Zip or another method that doesn't block the DLLs
- If you change the current working directory used by S+, the plug-in may fail to load the trained data files
- The folder should look like the image below
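The language argument used by the script further down appears to be just the first part of a trained data file name, so "eng.traineddata" gives "eng". As a sketch that runs outside S+, assuming the usual <code>.traineddata naming, the hypothetical helpers languageCodesFromFiles and combineLanguages (not part of the plug-in) list the codes for a set of file names and join several with "+", which is how Tesseract itself requests multiple languages. Whether SPTesseract forwards a "+" string unchanged is untested.

```javascript
// Hypothetical helpers (not part of SPTesseract): turn trained data file names
// into Tesseract language codes, assuming the usual "<code>.traineddata" naming.
const TRAINED_DATA_SUFFIX = ".traineddata";

function languageCodesFromFiles(fileNames) {
  return fileNames
    .filter((name) => name.toLowerCase().endsWith(TRAINED_DATA_SUFFIX))
    .map((name) => name.slice(0, -TRAINED_DATA_SUFFIX.length))
    // osd.traineddata, if present, holds orientation/script detection data,
    // not a recognition language, so leave it out of the list.
    .filter((code) => code.toLowerCase() !== "osd")
    .sort();
}

// Tesseract accepts several languages joined with "+" (e.g. "eng+deu");
// whether SPTesseract.GetPage forwards such a string unchanged is an assumption.
function combineLanguages(codes) {
  if (codes.length === 0) {
    throw new Error("At least one language code is required");
  }
  return codes.join("+");
}

const codes = languageCodesFromFiles(["eng.traineddata", "deu.traineddata", "readme.txt"]);
console.log(codes.join(", "));        // deu, eng
console.log(combineLanguages(codes)); // deu+eng
```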

Put this function in Global Actions > Load/Unload > Load (script):
// Wrapper for extracting text based on a language and a rectangle
function GetTesseractText(lang, rect) {

    // Capture an image of the screen area described by the rectangle passed in
    var memoryImage = new drawing.System.Drawing.Bitmap(rect.Width, rect.Height);
    var memoryGraphics = drawing.System.Drawing.Graphics.FromImage(memoryImage);
    memoryGraphics.CopyFromScreen(rect.X, rect.Y, 0, 0, new Size(rect.Width, rect.Height));
    // The pixels are now in memoryImage, so the Graphics object can be released
    memoryGraphics.Dispose();

    // Get the Tesseract Page object
    // First param is the Bitmap image
    //   - From the code above

    // Second param is the language
    //   - See the Plug-Ins\TesseractTrainedData folder for other trained data files
    //   - Appears to be just the first part of the file name; "eng" for English

    // Third param is a Tesseract.EngineMode enum value, passed as a string:
    //   - TesseractOnly = "0"
    //   - LstmOnly = "1"
    //   - TesseractAndLstm = "2"
    //   - Default = "3"

    // Last param is a Tesseract.PageSegMode enum value, passed as a string:
    //   - OsdOnly = "0"
    //   - AutoOsd = "1"
    //   - AutoOnly = "2"
    //   - Auto = "3"
    //   - SingleColumn = "4"
    //   - SingleBlockVertText = "5"
    //   - SingleBlock = "6"
    //   - SingleLine = "7"
    //   - SingleWord = "8"
    //   - CircleWord = "9"
    //   - SingleChar = "10"
    //   - SparseText = "11"
    //   - SparseTextOsd = "12"
    //   - RawLine = "13"
    //   - Count = "14" (number of modes, not a usable mode)

    // The SPTesseract class also defines the function below:
    //
    //   List<Rectangle> GetPageSegmentedRegions(Page page, string iteratorLevel)
    //
    // Pass it the Page object from the SPTesseract.GetPage method and one of the PageIteratorLevel values:
    //   - Block = "0"
    //   - Para = "1"
    //   - TextLine = "2"
    //   - Word = "3"
    //   - Symbol = "4"

    // All other methods/properties are standard Tesseract 4.1.1 from:

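    // EngineMode "3" = Default, PageSegMode "3" = Auto
    var tpage = SPTesseract.GetPage(memoryImage, lang, "3", "3");
    var text = tpage.GetText();

    // Release the native page and the bitmap once the text has been read
    tpage.Dispose();
    memoryImage.Dispose();
    return text;
}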

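GetPage takes the engine and segmentation modes as numeric strings, so "3", "3" in the call above means EngineMode Default and PageSegMode Auto. A minimal sketch of name-to-value tables built from the lists in the comments; EngineMode, PageSegMode, PageIteratorLevel and enumValue are hypothetical script-side names, not something SPTesseract exports, and Count is left out because it only counts the modes. It was checked outside S+ only.

```javascript
// Hypothetical script-side tables mirroring the enum values listed above.
// SPTesseract takes these values as strings, so the tables store strings.
const EngineMode = Object.freeze({
  TesseractOnly: "0",
  LstmOnly: "1",
  TesseractAndLstm: "2",
  Default: "3",
});

// "Count" (14) is omitted: it is the number of modes, not a mode.
const PageSegMode = Object.freeze({
  OsdOnly: "0", AutoOsd: "1", AutoOnly: "2", Auto: "3", SingleColumn: "4",
  SingleBlockVertText: "5", SingleBlock: "6", SingleLine: "7", SingleWord: "8",
  CircleWord: "9", SingleChar: "10", SparseText: "11", SparseTextOsd: "12",
  RawLine: "13",
});

const PageIteratorLevel = Object.freeze({
  Block: "0", Para: "1", TextLine: "2", Word: "3", Symbol: "4",
});

// Look up a value by name and fail loudly on typos instead of passing
// an unexpected string through to the plug-in.
function enumValue(table, name) {
  if (!Object.prototype.hasOwnProperty.call(table, name)) {
    throw new Error(`Unknown name "${name}"; expected one of: ${Object.keys(table).join(", ")}`);
  }
  return table[name];
}

console.log(enumValue(PageSegMode, "SingleLine")); // 7

// Possible use inside GetTesseractText (S+ only, untested):
// var tpage = SPTesseract.GetPage(memoryImage, lang,
//     enumValue(EngineMode, "Default"), enumValue(PageSegMode, "SingleLine"));
```

Spelling the modes by name makes a call like the SingleLine variant self-describing, and a misspelled name throws in the script instead of reaching the plug-in as an unexpected value.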
Example action script to test recognition and show the source rectangle:
// Use the square gesture to draw a box around the desired area
//Shows the text recognized by Tesseract (English data file) and the dimensions of the rectangle

sp.MessageBox(`Text: ${GetTesseractText("eng", new Rectangle(action.Bounds.X, action.Bounds.Y, action.Bounds.Width, action.Bounds.Height))}
Rect.X: ${action.Bounds.X}
Rect.Y: ${action.Bounds.Y}
Rect.Width: ${action.Bounds.Width}
Rect.Height: ${action.Bounds.Height}`, 
"Text, Rect");
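A gesture box with zero width or height makes the new Bitmap(...) call in GetTesseractText throw before Tesseract ever runs. A small sketch with two hypothetical helpers, isCapturableRect and describeResult, that rejects such boxes and builds the same message text as the action script above. It runs outside S+ with a plain object standing in for action.Bounds; the S+ usage in the trailing comment is untested.

```javascript
// Hypothetical helpers for the test action.
// System.Drawing.Bitmap throws when width or height is not positive, so a
// degenerate gesture box should be rejected before GetTesseractText runs.
function isCapturableRect(rect) {
  return Number.isInteger(rect.Width) && Number.isInteger(rect.Height) &&
    rect.Width > 0 && rect.Height > 0;
}

// Build the same message text as the action script. Tesseract output usually
// ends with a line break, so trim surrounding whitespace before display.
function describeResult(text, rect) {
  return [
    `Text: ${text.trim()}`,
    `Rect.X: ${rect.X}`,
    `Rect.Y: ${rect.Y}`,
    `Rect.Width: ${rect.Width}`,
    `Rect.Height: ${rect.Height}`,
  ].join("\n");
}

// Plain-object stand-in for action.Bounds so the helpers run outside S+.
const bounds = { X: 120, Y: 80, Width: 300, Height: 40 };
if (isCapturableRect(bounds)) {
  console.log(describeResult("Hello world\n", bounds));
}

// Inside an S+ action this could become (untested):
// var r = new Rectangle(action.Bounds.X, action.Bounds.Y, action.Bounds.Width, action.Bounds.Height);
// if (isCapturableRect(r)) {
//     sp.MessageBox(describeResult(GetTesseractText("eng", r), r), "Text, Rect");
// }
```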

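GetPageSegmentedRegions presumably reports rectangles in the captured bitmap's pixel coordinates, as Tesseract does for the image it processes, while the bitmap itself was copied from the screen starting at (rect.X, rect.Y). To locate a word or line on screen you would add that origin back. A sketch, assuming you kept the Page from SPTesseract.GetPage (GetTesseractText only returns the text) and copied the regions into a JavaScript array of {X, Y, Width, Height} objects; toScreenRects is a hypothetical helper.

```javascript
// Hypothetical helper: map regions from GetPageSegmentedRegions (bitmap pixel
// coordinates) back onto the screen. GetTesseractText copies the screen point
// (capture.X, capture.Y) to bitmap point (0, 0), so adding the capture origin
// to each region's X/Y gives its screen position; sizes are unchanged.
// Assumes the regions were already copied into a JS array of {X, Y, Width, Height}.
function toScreenRects(regions, capture) {
  return regions.map((r) => ({
    X: r.X + capture.X,
    Y: r.Y + capture.Y,
    Width: r.Width,
    Height: r.Height,
  }));
}

// Example values standing in for a capture rectangle and two word regions.
const capture = { X: 400, Y: 250, Width: 600, Height: 120 };
const wordRegions = [
  { X: 4, Y: 10, Width: 52, Height: 18 },
  { X: 64, Y: 10, Width: 71, Height: 18 },
];
for (const r of toScreenRects(wordRegions, capture)) {
  console.log(`${r.X},${r.Y} ${r.Width}x${r.Height}`);
}
// 404,260 52x18
// 464,260 71x18
```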
Edited by user Wednesday, January 20, 2021 7:33:29 AM(UTC)  | Reason: Not specified
