Automating OCR

Originally Posted On: https://medium.com/@chathumalsangeeth/automating-ocr-testing-for-web-applications-with-tesseract-and-selenium-in-java-1f6954dc23da

Have you ever wondered how web applications are tested for their accuracy in reading text from images? One way to accomplish this is through Optical Character Recognition (OCR) testing. OCR testing involves extracting text from images and verifying its accuracy against the original text. With the rise of web applications, OCR testing has become a crucial part of web testing. In this article, we’ll explore how Tesseract OCR and Selenium can be used together in Java to automate OCR testing for web applications.

Step 1:

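Download the Tesseract installer for Windows from GitHub and run it. Note the installation folder, because its tessdata directory is needed in Step 3.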

Step 2:

Create a Maven Java project and add tess4j to pom.xml. The example in Step 3 also needs the selenium-java dependency.

<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>5.6.0</version>
</dependency>

https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j

Additional Info:

For teams working with Selenium in C# instead of Java, IronOCR provides a native .NET solution that integrates smoothly with Selenium WebDriver. Unlike Tess4J, which requires external Tesseract installation and tessdata configuration, IronOCR bundles everything you need.

using IronOcr;
using OpenQA.Selenium;

// After capturing screenshot with Selenium
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("screenshot.png");
var result = ocr.Read(input);
Assert.IsTrue(result.Text.Contains("Expected Text"));

IronOCR includes built-in image correction for screenshots with low contrast or noise, which is common when capturing web elements. This reduces the preprocessing work typically needed before running OCR on dynamic web content.

Learn more: https://ironsoftware.com/csharp/ocr/

Step 3:

  1. Use Selenium to navigate to the web page containing the image with text that needs to be OCR tested.
  2. Use Selenium to locate the image element, capture a screenshot of it, and save the image to disk.
  3. Use Tess4J to perform OCR on the saved image and get the recognized text.
  4. Compare the recognized text with the expected text using an assertion or comparison method.

Here’s an example that demonstrates OCR testing with Tesseract OCR and Selenium in Java. It uses Selenium to open the https://www.wiley.com/en-us webpage, takes a screenshot of the site logo, and verifies whether the text in the image matches an expected value:

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.io.FileHandler;

import java.io.File;
import java.io.IOException;
import java.time.Duration;

public class OCRTest {

    public static void main(String[] args) throws TesseractException, IOException {

        // Create a new instance of the Chrome driver
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new");
        WebDriver driver = new ChromeDriver(options);
        driver.manage().window().maximize();
        driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10));

        // Navigate to the web page containing the image with text
        driver.get("https://www.wiley.com/en-us");

        // Locate the image element and save a screenshot of it
        WebElement imageElement = driver.findElement(By.xpath("//img[@alt='Wiley Consumer Logo']"));
        File src = imageElement.getScreenshotAs(OutputType.FILE);
        // File.separator avoids an invalid "\i" escape in the string literal
        String filePath = System.getProperty("user.dir") + File.separator + "image.png";
        FileHandler.copy(src, new File(filePath));

        // Perform OCR on the downloaded image
        ITesseract tesseract = new Tesseract();
        
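        // Set the tessdata path of the Tesseract-OCR installation from Step 1
        // (backslashes must be doubled inside a Java string literal)
        tesseract.setDatapath("C:\\Program Files\\Tesseract-OCR\\tessdata");
        String recognizedText = tesseract.doOCR(new File(filePath));
        // Remove the line breaks contained in the OCR output
        recognizedText = recognizedText.replaceAll("\\n", "");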

        // Compare the recognized text with the expected text
        String expectedText = "WILEY";
        if (recognizedText.equals(expectedText)) {
            System.out.println("OCR test successful.");
        } else {
            System.out.println("OCR test failed.\n Expected text: " + expectedText + "\n Recognized text: " + recognizedText);
        }

        // Quit the driver
        driver.quit();
    }
}

Result: the OCR test failed because the text recognized from the logo did not match the expected text.
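
Small element screenshots such as logos are one common reason for a mismatch like this, and enlarging the image before OCR sometimes improves recognition. The following is only a sketch of that idea using the standard java.awt classes; it is not part of the original test. The class name ScreenshotUpscaler, the method upscale, and the scale factor of 3 are hypothetical choices made for illustration.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

class ScreenshotUpscaler {

    // Returns a copy of the source image enlarged by an integer factor.
    // Hypothetical helper: the name and the bicubic hint are illustrative choices.
    static BufferedImage upscale(BufferedImage source, int factor) {
        if (factor < 1) {
            throw new IllegalArgumentException("factor must be >= 1");
        }
        int width = source.getWidth() * factor;
        int height = source.getHeight() * factor;
        BufferedImage scaled = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        try {
            // Bicubic interpolation keeps character edges smoother than the default
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return scaled;
    }

    public static void main(String[] args) {
        // A blank 40x12 image stands in for a small logo screenshot
        BufferedImage tiny = new BufferedImage(40, 12, BufferedImage.TYPE_INT_RGB);
        BufferedImage big = upscale(tiny, 3);
        System.out.println(big.getWidth() + "x" + big.getHeight()); // prints 120x36
    }
}
```

In the test above, the saved screenshot could be loaded with ImageIO.read(new File(filePath)), enlarged, and passed to the Tess4J doOCR overload that accepts a BufferedImage. Whether this helps for a particular logo has to be checked empirically.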

Why did I use the statement below?

recognizedText = recognizedText.replaceAll("\\n", "");

The replaceAll() method is called on the recognizedText string. The first argument is the regular expression to match, which is "\\n" in this case: the backslash is doubled because it has to be escaped inside a Java string literal, so the regex engine receives \n and matches newline characters. The second argument is the replacement string, which is an empty string ("") here.

Tesseract's output usually contains line breaks, so without this step the equals() comparison with "WILEY" could fail even when the letters were recognized correctly. After executing this code, the recognizedText string should no longer contain any newline characters.
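
Removing only "\n" leaves other whitespace, such as carriage returns or trailing spaces, in the string. As a hedged alternative, the sketch below collapses every run of whitespace before comparing. The class OcrTextNormalizer and its methods normalize and matches are hypothetical names introduced for illustration, not part of Tess4J or the original test.

```java
class OcrTextNormalizer {

    // Collapses each run of whitespace (spaces, tabs, \r, \n) into a single
    // space and trims the ends, so layout differences do not affect comparison.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    // Case-sensitive comparison of the normalized strings, as in the original test.
    static boolean matches(String recognized, String expected) {
        return normalize(recognized).equals(normalize(expected));
    }

    public static void main(String[] args) {
        System.out.println(matches("WILEY\r\n", "WILEY")); // prints true
    }
}
```

In the test above, the if condition could then become OcrTextNormalizer.matches(recognizedText, expectedText). The comparison is still exact, so a misread character still fails the test.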