Transform Handwritten Text into Digital Text with Hugging Face's Microsoft/Trocr-Large-Handwritten Model
Learn how to convert handwritten text into editable digital text using the microsoft/trocr-large-handwritten model from Hugging Face. With Gradio, a Python library for building simple web interfaces, you can wrap the model in a small app that extracts text from handwritten notes.
The commands below create a Python environment, activate it, install the dependencies, and run a Python script. Let me break down each step for you:
conda create --name trocr-large python=3.11
This command creates a new Conda virtual environment named "trocr-large" with Python version 3.11. Virtual environments are isolated environments where you can install specific packages and dependencies without affecting your system-wide Python installation.
conda activate trocr-large
After creating the environment, you need to activate it. This ensures that any subsequent package installations or code executions occur within the isolated environment you just created.
conda install -c huggingface transformers
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install -c conda-forge gradio
These commands install the necessary dependencies into the "trocr-large" environment:
- The first command installs the "transformers" package from the huggingface Conda channel.
- The second command installs PyTorch, torchvision, and torchaudio, specifying CUDA version 11.7 for GPU support.
- The third command installs Gradio, a Python library for creating user interfaces for machine learning models.
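Before running the script, it can help to confirm that the packages are importable from the activated environment. The following is a minimal sketch, not part of the original script: the helper name check_environment and the REQUIRED tuple are hypothetical, and it only checks that each module can be found, plus whether PyTorch sees a CUDA device.

```python
from importlib.util import find_spec

# Top-level module names the walkthrough relies on (PIL is provided by the Pillow package).
REQUIRED = ("transformers", "torch", "torchvision", "gradio", "PIL", "requests")


def check_environment(packages=REQUIRED):
    """Return {module_name: True/False}, plus a "cuda" entry when torch is present."""
    report = {name: find_spec(name) is not None for name in packages}
    if report.get("torch"):
        # Only import torch when it is installed; this import can take a few seconds.
        import torch
        report["cuda"] = torch.cuda.is_available()
    return report
```

Calling print(check_environment()) inside the "trocr-large" environment should show True for every package; a False entry points at an install step that did not succeed.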
python handwritten.py
Finally, this command runs the Python script "handwritten.py" inside the "trocr-large" environment. The script, walked through in the rest of this post, uses the installed packages to recognize handwritten text with the microsoft/trocr-large-handwritten model.
This code sets up a handwritten text recognition application using the Hugging Face Transformers library, Gradio for the user interface, and Microsoft's TrOCR model for handwritten text recognition. Let's break down the code step by step:
- Importing Libraries: The code starts by importing the necessary libraries: transformers for working with pre-trained models and tokenizers, PIL (Python Imaging Library) for image processing, requests for fetching images from URLs, and gradio for creating a user interface for the application.
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests
import gradio as gr
- Defining the Title: The title variable contains the title for the Gradio application, which is displayed as "Welcome on your first handwritten recognition app!".
title = "Welcome on your first handwritten recognition app!"
- Loading the Model: The code loads the TrOCR processor and model from Hugging Face's model hub. TrOCRProcessor preprocesses images for the model and decodes the generated token IDs back into text. VisionEncoderDecoderModel is the model class that holds the TrOCR weights for handwritten text recognition.
processor = TrOCRProcessor.from_pretrained('microsoft/trocr-large-handwritten')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-large-handwritten')
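The environment above installs a CUDA build of PyTorch, but the loading code as shown leaves the model on the CPU. A minimal sketch of one way to use the GPU when it is available follows; the helper name load_trocr and the MODEL_ID constant are hypothetical and not part of the original script.

```python
MODEL_ID = "microsoft/trocr-large-handwritten"


def load_trocr(model_id=MODEL_ID):
    """Load the processor and model, moving the model to the GPU when one is available."""
    # Imported lazily so that defining this helper does not itself require the heavy packages.
    import torch
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = TrOCRProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id).to(device)
    return processor, model, device
```

With this approach, the input tensor has to be moved to the same device before generation, for example pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device).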
- Prediction Function: The predict function is defined to perform handwritten text recognition. It takes three inputs: ImageUrl (URL of an image), imgDraw (handwritten image), and imgUplod (uploaded image). Depending on which input is provided, it fetches the image in RGB format. It preprocesses the image using the processor and generates text predictions using the model. The predicted text is returned.
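def predict(ImageUrl, imgDraw, imgUplod):
    # Load whichever input was provided as RGB (reconstructed from the description; the order of precedence is an assumption)
    if ImageUrl:
        image = Image.open(requests.get(ImageUrl, stream=True).raw).convert("RGB")
    else:
        image = (imgDraw if imgDraw is not None else imgUplod).convert("RGB")
    # Predict the text using the microsoft/trocr-large-handwritten model loaded earlier
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]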
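The description of predict does not say what happens when none of the three inputs is filled in. One way to make that case explicit is to choose the input in a small helper and fail with a clear message. This is a sketch under assumptions: the helper name pick_input, the precedence order (URL, then drawing, then upload), and the error text are all hypothetical.

```python
def pick_input(image_url, drawn, uploaded):
    """Return (kind, value) for the first input that was actually provided."""
    if image_url and image_url.strip():
        return "url", image_url.strip()
    if drawn is not None:
        return "drawing", drawn
    if uploaded is not None:
        return "upload", uploaded
    # Nothing was supplied: raise instead of passing None on to the model.
    raise ValueError("Provide an image URL, a drawing, or an uploaded image.")
```

Inside predict, the returned kind would decide whether the image has to be downloaded first or can be converted to RGB directly.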
- Gradio Interface:
The code sets up the Gradio interface:
gr.Interface is used to create a user-friendly interface for the prediction function. The fn parameter specifies the prediction function (predict). inputs defines the input components of the interface: "text" is a text input (used here for the image URL), gr.Sketchpad allows users to draw handwritten text, and gr.Image lets users upload an image. outputs specifies that the output should be in text format. The title parameter displays the title defined earlier.
interface = gr.Interface(fn=predict, inputs=["text", gr.Sketchpad(type="pil", shape=(500, 500)), gr.Image(type="pil")], outputs="text", title=title)
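The install command does not pin a Gradio version. As far as I can tell, the shape argument and this gr.Sketchpad signature belong to the Gradio 3.x API, and later major releases reworked the image components, so the call may need adjusting there. A small sketch for checking which major version is installed follows; the helper name gradio_major_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def gradio_major_version():
    """Return the installed Gradio major version as an int, or None if Gradio is not installed."""
    try:
        return int(version("gradio").split(".")[0])
    except PackageNotFoundError:
        return None


major = gradio_major_version()
if major is None:
    print("gradio is not installed in this environment")
elif major >= 4:
    # Assumption: the Sketchpad/Image arguments used in this post target the 3.x API.
    print(f"gradio {major}.x detected: the Sketchpad/Image arguments may need adjusting")
else:
    print(f"gradio {major}.x detected")
```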
- Launching the Interface:
Finally, the interface is launched:
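interface.launch starts the Gradio server on port 8080. The server_name "0.0.0.0" tells it to listen on all network interfaces, so the app is reachable from other machines on the network and not only from localhost.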
interface.launch(server_name="0.0.0.0", server_port=8080)
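Because "0.0.0.0" exposes the app to the whole network, it can be convenient to keep the host and port configurable and default to local-only access. A minimal sketch, assuming two hypothetical environment variables, APP_HOST and APP_PORT, and a hypothetical helper launch_settings:

```python
import os


def launch_settings(env=os.environ):
    """Build keyword arguments for interface.launch from environment variables."""
    return {
        # Default to localhost; set APP_HOST=0.0.0.0 to accept connections from other machines.
        "server_name": env.get("APP_HOST", "127.0.0.1"),
        "server_port": int(env.get("APP_PORT", "8080")),
    }
```

The script would then call interface.launch(**launch_settings()) in place of the hard-coded values.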
In summary, this code creates a web-based handwritten text recognition application where users can paste an image URL, draw handwritten text, or upload an image containing handwritten text. The application then uses the Microsoft TrOCR model to recognize the text in that image and displays the result in the interface.