In this application “potluck” episode, we talked about some applications that we enjoy using. The best part of these episodes is that we all get to try something new, or at least have a new tool at the ready when we need it.
Thanks so much for your continued support in watching, sharing and subscribing to Linux Saloon.
Tesseract is a command line tool that does optical character recognition (or OCR) for extracting text from digital images.
I use it to read text out of screen-grabs, which is especially useful when ‘YouTubers’ write code or visit links on video. I then pipe the result into a clipboard manager, so a single keyboard shortcut turns a screen-grab into text I can paste. Examples below:
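The shortest version of that idea is a three-stage pipe: grab a region, OCR it, copy the text. A minimal sketch, assuming grim, slurp and wl-copy on Wayland or maim and xclip on X11 (the function names are made up for illustration):

```bash
#!/usr/bin/env bash
# Hypothetical helper names; bind whichever one fits your session to a shortcut.

# Wayland (wlroots): slurp picks the region, grim writes the PNG to stdout.
ocr_region_wayland() {
    grim -g "$(slurp)" - | tesseract stdin stdout 2>/dev/null | wl-copy
}

# X11: maim writes the selected region to stdout, xclip fills the clipboard.
ocr_region_x11() {
    maim --select /dev/fd/1 | tesseract stdin stdout 2>/dev/null | xclip -in -selection clipboard
}
```

Nothing runs until you call one of the functions, e.g. `ocr_region_wayland`.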
This was a bit long for the show, but if you want the best accuracy from screengrab | tesseract, you want this script.
tesseract was designed for scanned documents, which tend to be very large at 100% scale, so it sometimes struggles with screen text. Upscaling screen-grabs to 400% (4x) fixes that.
tesseract also sometimes adds whitespace/newlines to the beginning or end of its output, so it’s nice to remove those.
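The trimming can be done in pure bash with two parameter expansions, no sed or awk needed. A small sketch of the trick on its own (the `trim` function name is just for this example):

```bash
#!/usr/bin/env bash
# Strip leading and trailing whitespace (spaces, tabs, newlines) from $1.
trim() {
    local str=$1
    # ${str%%[![:space:]]*} is the leading whitespace run; remove it as a prefix.
    str=${str#"${str%%[![:space:]]*}"}
    # ${str##*[![:space:]]} is the trailing whitespace run; remove it as a suffix.
    str=${str%"${str##*[![:space:]]}"}
    printf '%s' "$str"
}

printf '[%s]\n' "$(trim $'\n\n  Hello, OCR world  \n')"   # prints [Hello, OCR world]
```

Whitespace inside the text is left alone, so multi-line OCR results keep their line breaks.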
The following script upscales with ffmpeg or mogrify (ImageMagick) if either is installed; otherwise it OCRs without upscaling. It strips leading and trailing whitespace and newlines, and it supports both Wayland (wlroots only?) and X11-based distros.
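The "use it if it's installed" logic boils down to probing the PATH in order of preference. A rough sketch of that pattern, where `pick_tool` is a hypothetical helper and not something the script itself defines:

```bash
#!/usr/bin/env bash
# Print the first command from the argument list that exists, or 'none'.
pick_tool() {
    local tool
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            printf '%s' "$tool"
            return 0
        fi
    done
    printf 'none'
}

# Same preference order as described above: ffmpeg first, then mogrify.
upscaler=$(pick_tool ffmpeg mogrify)
printf 'Upscaler: %s\n' "$upscaler"
```

A result of `none` corresponds to the plain, no-upscale path.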
It also doesn’t use temp files: all the magic happens over pipes and process substitution.
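If process substitution is new to you: `<( cmd )` hands a program a path like /dev/fd/63 that reads from `cmd`, so tools that insist on a file name can still be fed from a pipe. A toy sketch with stand-in functions (`produce` and `consume` are placeholders for the screenshot tool and tesseract):

```bash
#!/usr/bin/env bash
# Stand-in for the screenshot tool: writes data to stdout.
produce() { printf 'hello\nworld\n'; }
# Stand-in for the OCR step: transforms whatever arrives on stdin.
consume() { tr '[:lower:]' '[:upper:]'; }

# cat is given a "file" argument, but that file is really produce's stdout.
result=$(cat <( produce ) | consume)
printf '%s\n' "$result"
```

No file is ever written to disk; the data only exists in the pipes between the processes.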
/usr/local/bin/ocr2clip
#!/usr/bin/env bash
# License: BSD-0 Clause, Ulfnic
:<<-'Comment'
Dependencies:
- tesseract
Wayland dependencies:
- grim
- slurp
- wl-copy
X11 dependencies:
- maim
- xsel or xclip
Optional dependencies:
- ffmpeg or ImageMagick (for vastly better accuracy)
Comment
set -o errexit
# Print an optional printf-style message to stderr, then exit with status $1 unless it is 0
print_stderr() {
    [[ $2 ]] && printf "$2" "${@:3}" 1>&2
    [[ $1 == '0' ]] || exit "$1"
}
# Define display server
if [[ $DISPLAY ]]; then
    [[ $WAYLAND_DISPLAY ]] && display_server='xwayland' || display_server='x11'
else
    display_server='wayland'
fi
# Check dependencies
type tesseract &> /dev/null || print_stderr 1 '%s\n' 'Missing dependency: tesseract'
if [[ $display_server == 'wayland' ]] || [[ $display_server == 'xwayland' ]]; then
    type wl-copy &> /dev/null || print_stderr 1 '%s\n' 'Missing dependency: wl-copy'
    type slurp &> /dev/null || print_stderr 1 '%s\n' 'Missing dependency: slurp'
    type grim &> /dev/null || print_stderr 1 '%s\n' 'Missing dependency: grim'
    clipboard_cmd='wl-copy'
elif [[ $display_server == 'x11' ]]; then
    type maim &> /dev/null || print_stderr 1 '%s\n' 'Missing dependency: maim'
    if type xsel &> /dev/null; then
        clipboard_cmd='xsel --input --clipboard'
    elif type xclip &> /dev/null; then
        clipboard_cmd='xclip -in -selection clipboard'
    else
        print_stderr 1 '%s\n' 'Missing dependency: xsel or xclip'
    fi
fi
# Stdout user's screen selection
function screen_select(){
    if [[ $display_server == 'wayland' ]] || [[ $display_server == 'xwayland' ]]; then
        # Get selection and honor escape key
        grim -t png -l 9 -g "$(slurp)" -
    elif [[ $display_server == 'x11' ]]; then
        maim --select --hidecursor --format=png --quality=10 /dev/fd/1
    fi
}
# OCR screen selection and deliver to clipboard
function ocr_selection(){
    str=$( tesseract stdin stdout 2>/dev/null )
    # Remove leading and trailing whitespace
    str=${str#"${str%%[![:space:]]*}"}
    str=${str%"${str##*[![:space:]]}"}
    # Place in clipboard (clipboard_cmd is left unquoted on purpose so it splits into command and arguments)
    printf '%s' "$str" | $clipboard_cmd
}
# Empty clipboard to avoid false positives
printf '' | $clipboard_cmd
# If a suitable program is available, upscale the image by 4x using either ffmpeg or ImageMagick to improve accuracy
if type ffmpeg &> /dev/null; then
    # -y is given before the output, and the last argument carries no trailing backslash,
    # so the command ends cleanly before the elif
    ffmpeg \
        -hide_banner \
        -loglevel error \
        -y \
        -i <( screen_select ) \
        -vf 'scale=iw*4:ih*4' \
        -f image2 \
        >( ocr_selection )
elif type mogrify &> /dev/null; then
    screen_select \
        | mogrify \
            png:- \
            -modulate 100,0 \
            -resize 400% \
        | ocr_selection
else
    screen_select | ocr_selection
fi