Welcome to gbCapture, which can extract text from applications that display
multiple pages of text but which do not provide built-in methods of copying
and saving the displayed text of the entire document.

gbCapture Introduction
Here's an image of the gbCapture main window.
With gbCapture, the user displays the target application (the one that displays
the desired text) in the center of the desktop. gbCapture will capture an image
of the content of that application and use the free tesseract library to extract
the text from the image.
Tesseract can be downloaded from
UB Mannheim and must be installed in its default location.
To capture multiple pages, gbCapture starts by capturing an image of the
current page, extracting the text from the image, and then sending a keyboard "next page" command to the target application.
That sequence is repeated for as many pages as the user
specifies.
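The capture sequence described above can be sketched roughly as follows. This is only an illustration, not gbCapture's actual code: the function name `capture_pages` and its three callback parameters are hypothetical stand-ins for the screen capture, the tesseract call, and the "next page" keystroke.

```python
def capture_pages(page_count, capture_image, extract_text, send_next_page):
    """Repeat capture -> extract -> next page for page_count pages.

    All three callables are hypothetical placeholders:
      capture_image()    returns an image of the target window
      extract_text(img)  returns the text found in that image
      send_next_page()   sends the "next page" key to the target
    """
    page_texts = []
    for _ in range(page_count):
        image = capture_image()                 # 1. capture the current page
        page_texts.append(extract_text(image))  # 2. extract its text
        send_next_page()                        # 3. advance the target
    return page_texts
```

Passing the three steps in as callables keeps the loop independent of any particular capture or OCR method, so it can be tried out with simple stand-in functions.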
gbCapture toolbar functions:
- Exit - close gbCapture
- Prev - send a "previous page" command to the target application
- Next - send a "next page" command to the target application
- Pages - display popup dialog for entering the number of pages to be captured
- Capture - capture pages from the target document
- Stop - stop capturing pages
- Target - display an image of the desktop, with an outline around the target window
- Files - display images and text files from the most recent capture
- Document - display the captured document (composite of all extracted text files)
- SaveAs - save the document text to a new location
- Settings - display current setting values (set using keyboard shortcuts)
- Help - display this local Help file
See the Keyboard Shortcuts section below for additional commands and settings supported by gbCapture.

Target Application
gbCapture detects the application covering the center of the desktop and uses
that application as the source for text. To ensure that the user is capturing the desired text, gbCapture can display an image of the PC desktop with the target application highlighted.
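One way to find the window covering the center of the desktop is sketched below. It is an assumption about how such detection could be done on Windows, not a description of gbCapture's own method. It uses the Win32 calls GetSystemMetrics, WindowFromPoint and GetAncestor through ctypes and considers only the primary monitor. The helper names `center_of` and `window_at_desktop_center` are hypothetical.

```python
import sys

SM_CXSCREEN = 0   # GetSystemMetrics index: primary screen width
SM_CYSCREEN = 1   # GetSystemMetrics index: primary screen height
GA_ROOT = 2       # GetAncestor flag: top-level window

def center_of(width, height):
    """Return the (x, y) pixel at the center of a width x height area."""
    return width // 2, height // 2

def window_at_desktop_center():
    """Return the handle of the top-level window at the desktop center.

    Windows only. Assumes the primary monitor is the one of interest.
    """
    if sys.platform != "win32":
        raise OSError("this sketch relies on the Win32 API")
    import ctypes
    from ctypes import wintypes

    user32 = ctypes.windll.user32
    user32.WindowFromPoint.argtypes = [wintypes.POINT]
    user32.WindowFromPoint.restype = wintypes.HWND
    user32.GetAncestor.argtypes = [wintypes.HWND, wintypes.UINT]
    user32.GetAncestor.restype = wintypes.HWND

    x, y = center_of(user32.GetSystemMetrics(SM_CXSCREEN),
                     user32.GetSystemMetrics(SM_CYSCREEN))
    child = user32.WindowFromPoint(wintypes.POINT(x, y))
    # WindowFromPoint may return a child control; walk up to its
    # top-level application window.
    return user32.GetAncestor(child, GA_ROOT)
```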

Captured Bitmaps and Extracted Bitmap Text
gbCapture captures an image of a target application window where text is displayed
and saves that image to a file. Then, tesseract is used to extract the text from
each saved image. That text is also saved to a file and later combined to create
the entire document. Bitmaps and text files are simply named "0001" to "000X",
according to how many pages the user decides to capture. The number in a file name has no particular correlation to the page number of the page captured from the target window.
gbCapture can display the most recent set of saved images and their corresponding text content. A list of bitmaps is shown on the left, with the selected
bitmap and its extracted text shown on the right.
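As a rough sketch of the file naming and text extraction described in this section, the following builds the zero-padded file name for a page and the tesseract command line that would turn that image into a text file. Several details are assumptions, not facts about gbCapture: the `.bmp` extension, the helper names `page_stem` and `tesseract_command`, and the install path, which is believed to be the UB Mannheim installer's default. The command form `tesseract imagename outputbase` is tesseract's standard usage; tesseract appends `.txt` to the output base itself.

```python
from pathlib import PureWindowsPath

# Assumed default install location of the UB Mannheim build.
TESSERACT_EXE = PureWindowsPath(
    r"C:\Program Files\Tesseract-OCR\tesseract.exe")

def page_stem(index):
    """Return the four-digit name used for a captured page, e.g. '0001'."""
    return f"{index:04d}"

def tesseract_command(folder, index, image_ext=".bmp"):
    """Build the command that extracts text from one saved image.

    folder and image_ext are hypothetical; the source does not say
    where the files are stored or which extension they use.
    """
    folder = PureWindowsPath(folder)
    stem = page_stem(index)
    image = folder / (stem + image_ext)
    output_base = folder / stem      # tesseract writes <output_base>.txt
    return [str(TESSERACT_EXE), str(image), str(output_base)]
```

On a Windows PC with tesseract installed, the returned list could be passed to `subprocess.run(cmd, check=True)` to produce the matching text file next to the image.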

Document Text
The text extracted from each bitmap is appended into a single document file, which gbCapture can then display as shown in this next image. gbCapture can copy the document content to the clipboard or save the document file to a new location.
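Building the single document from the per-page text can be sketched as below. The function name `build_document`, the blank line between pages, and the `[Page N]` label format are all hypothetical; the source only says that the page texts are appended and that page numbers can be toggled on or off (CS-P). The numbers here are capture sequence numbers, not page numbers read from the target window.

```python
def build_document(page_texts, page_numbers=False):
    """Join the extracted page texts into one document string.

    page_numbers mimics the CS-P toggle; the label format is made up.
    """
    parts = []
    for number, text in enumerate(page_texts, start=1):
        if page_numbers:
            parts.append(f"[Page {number}]")
        parts.append(text.strip())
    return "\n\n".join(parts) + "\n"
```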

Keyboard Shortcuts
gbCapture supports several keyboard shortcuts, which perform less
frequently used actions. Using shortcuts helps minimize the footprint
and complexity of the gbCapture main screen.
- CS-B - toggle use of the "Beep" sound
- CS-C - copy all text
- CS-D - set the capture method
- CS-E - clear all generated image and text files
- CS-I - set time allowed for capturing an image
- CS-K - change colors
- CS-L - toggle which navigation icons are used (up/down vs left/right)
- CS-P - toggle use of page numbers in the composite text file
- CS-S - save document as
- CS-U - online update of gbCapture
- CS-Esc - exit from gbCapture
- F1 - display this Help file
"CS-B" means to press and hold the Control and Shift keys while
pressing the "B" key.

Operating Notes
Tesseract
For tesseract to work most accurately, the text must be fully visible, meaning that there must be empty margins surrounding the text. Partially visible lines of text will be misread by tesseract.
Some document viewers, such as Word, WordPad and Kindle for the PC, provide that margin.
Other document viewers, such as Notepad, browsers, and RichEdit controls, allow display of partial lines of text and are not suitable for use with gbCapture.
End of Document
gbCapture does not limit the number of pages a user can request to be captured, but it will stop automatically when it reaches the end of the document.
The end of the document is assumed when two consecutive pages produce exactly the same extracted text.
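The end-of-document rule above can be illustrated with a short sketch. The function name `trim_at_end_of_document` is hypothetical, and it works on an already collected list of page texts for simplicity; gbCapture itself stops during capture.

```python
def trim_at_end_of_document(page_texts):
    """Keep pages up to the first one that repeats its predecessor.

    Two consecutive pages with exactly the same extracted text are
    taken to mean the target could not advance any further.
    """
    kept = []
    for text in page_texts:
        if kept and text == kept[-1]:
            break                    # same text twice in a row: stop
        kept.append(text)
    return kept
```

Because the comparison is exact, any small difference in tesseract's output for the same page would prevent the match, so this sketch relies on the extracted text being identical for identical images.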
Partial Last Page
Some document viewers, such as Word, present the last page of content as a partial page, with blank lines used to fill the page below the content. This allows gbCapture to correctly capture the final page of a document.
However, some document viewers fill the display of the last page with content from the previous page in order to avoid blank lines on the last page. This will cause gbCapture to incorrectly report the content of the last captured page.
ToDo List
Here are some of the items I want to address:
- Resizable windows
- Large menu fonts (context and toolbar button dropdown menus)
- Better/faster tesseract performance
- Better cleanup of the extracted text

Comments and suggestions are welcome. Send to Gary Beene at gbeene@airmail.net.