Programmatically Search and Highlight Text in PDFs using C# in .NET
Quick Start Guide | |
---|---|
What You Will Need | Visual Studio Code, .NET 8, NuGet package: DS.Documents.Pdf 7.0.3 |
Controls Referenced | |
Tutorial Concept | This tutorial discusses programmatically conducting text searches and highlighting found text in PDFs using a C#/.NET PDF API. |
This tutorial covers different ways to programmatically search, find, and highlight text in PDF documents using a .NET/C# API: loading a PDF, running text searches, and creating highlight markup in custom colors and shapes. The examples use Document Solutions for PDF (DsPdf, formerly GcPdf), a .NET PDF API that lets C# developers generate, load, and modify PDF documents. The resulting PDFs are shown in the Document Solutions PDF Viewer, the JavaScript PDF viewer included with the product.
Learn More About Document Solutions for PDF by Downloading a Trial Today!
This blog will cover how to conduct the following PDF text searches programmatically using a C# .NET PDF API:
- Find and Highlight Text in a PDF Document
- Search for Text on a Specific PDF Page
- Find and Highlight Text From a Specific Range of PDF Pages
- Search for Text in a PDF Based on Structure Tags
- Find and Mark Up Transformed Text in PDFs
To follow along, download the sample app for this tutorial.
Find and Highlight Text in a PDF Document Using C#
DsPdf's FindText method locates every instance of a given text in a PDF document. Each result reports the page it was found on and the bounds of the found text, so every hit can be highlighted by drawing over those bounds on the page's graphics using a System.Drawing color. The search is configured through the FindTextParams constructor, whose wholeWord and matchCase arguments control whether only whole words match and whether the search is case-sensitive; the two options can be combined.
Note: To follow along with this section, you must include the GrapeCity.Documents.Common namespace.
The following code will search for the whole word "wetlands" in a PDF and then highlight the found text:
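A minimal sketch of this search, assuming a .NET 8 console project with top-level statements that references DS.Documents.Pdf 7.x. The file names Wetlands.pdf and Wetlands-highlighted.pdf are hypothetical placeholders, and the FillPolygon overload that takes a found item's quadrilateral bounds is the one used in the DsPdf demos; verify it against your package version.

```csharp
using System.Drawing;              // Color
using System.IO;
using GrapeCity.Documents.Common;  // FindTextParams, OutputRange
using GrapeCity.Documents.Pdf;     // GcPdfDocument

// Hypothetical input/output file names; substitute your own.
const string inputPath = "Wetlands.pdf";
const string outputPath = "Wetlands-highlighted.pdf";

// DsPdf reads page content lazily, so keep the source stream open
// until the document has been saved.
using var fs = File.OpenRead(inputPath);
var doc = new GcPdfDocument();
doc.Load(fs);

// wholeWord: true, matchCase: false -> "wetlands" or "Wetlands", but not "wetlandscape".
var finds = doc.FindText(new FindTextParams("wetlands", true, false), OutputRange.All);

// Each found item carries its zero-based page index and one or more
// quadrilaterals (one per text fragment). Fill them with semi-transparent yellow.
foreach (var find in finds)
{
    var g = doc.Pages[find.PageIndex].Graphics;
    foreach (var bounds in find.Bounds)
        g.FillPolygon(bounds, Color.FromArgb(100, Color.Yellow));
}

doc.Save(outputPath);
```

The alpha value of 100 keeps the underlying text readable because the highlight is drawn on top of the existing page content.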
Developers can run many other kinds of searches and apply different types of markup. See our online documentation and demo explorer to learn more.
Search for Text on a Specific PDF Page Using C#
In some scenarios, you may want to limit a text search to a single page rather than scan the entire PDF document. To do this, get the page by its index, retrieve that page's text map with its GetTextMap method, and run the search only within that text map: create a new FindTextParams instance and pass it to the text map's FindText method.
The following code demonstrates this by searching for and highlighting the word “the” on the second page of the PDF document.
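A sketch of the page-scoped search, assuming a .NET 8 console project that references DS.Documents.Pdf; Wetlands.pdf and the output name are hypothetical placeholders. It assumes the text map's FindText callback receives one found position per match and returns nothing, and the whole-word, case-insensitive settings are an illustrative choice.

```csharp
using System.Drawing;              // Color
using System.IO;
using GrapeCity.Documents.Common;  // FindTextParams
using GrapeCity.Documents.Pdf;     // GcPdfDocument

// Hypothetical input file; keep the stream open until the document is saved.
using var fs = File.OpenRead("Wetlands.pdf");
var doc = new GcPdfDocument();
doc.Load(fs);

// Pages are zero-based, so index 1 is the second page.
var page = doc.Pages[1];
var textMap = page.GetTextMap();

// wholeWord: true, matchCase: false -> matches "the" and "The", not "there".
var ftp = new FindTextParams("the", true, false);

// Only this page's text map is searched; each hit is reported to the callback.
textMap.FindText(ftp, found =>
{
    foreach (var bounds in found.Bounds)
        page.Graphics.FillPolygon(bounds, Color.FromArgb(100, Color.Orange));
});

doc.Save("Wetlands-page2.pdf");
```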
Find and Highlight Text From a Specific Range of PDF Pages Using C#
Limiting a search to a specific page range keeps it focused on the relevant content and avoids scanning pages that don't matter. To do this, pass an OutputRange instance as the searchRange argument of the document's FindText method; only the pages in that range are searched.
Note: To follow along with this section, you must include the GrapeCity.Documents.Common namespace.
The code below will search and highlight text only on pages 2 and 3 of the provided PDF document.
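A minimal sketch, assuming a .NET 8 console project that references DS.Documents.Pdf; Wetlands.pdf, the output name, and the search word are placeholders. It also assumes, as in the DsPdf demos, that OutputRange takes 1-based page numbers, while each result's PageIndex is 0-based.

```csharp
using System.Drawing;              // Color
using System.IO;
using GrapeCity.Documents.Common;  // FindTextParams, OutputRange
using GrapeCity.Documents.Pdf;     // GcPdfDocument

// Hypothetical input file; keep the stream open until the document is saved.
using var fs = File.OpenRead("Wetlands.pdf");
var doc = new GcPdfDocument();
doc.Load(fs);

// Search only pages 2 through 3 (1-based, inclusive).
var searchRange = new OutputRange(2, 3);
var finds = doc.FindText(new FindTextParams("wetlands", true, false), searchRange);

foreach (var find in finds)
{
    // PageIndex is zero-based, unlike the page numbers passed to OutputRange.
    var g = doc.Pages[find.PageIndex].Graphics;
    foreach (var bounds in find.Bounds)
        g.DrawPolygon(bounds, Color.Red);   // outline each hit in red
}

doc.Save("Wetlands-pages2-3.pdf");
```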
Search for Text in a PDF Based on Structure Tags
Searching by structure tags is another way to scope a text search. For example, to locate headings such as H1, H2, or H3, use the GetLogicalStructure method to retrieve the document's logical structure (available only in tagged PDFs). Starting from the root of that structure, walk through its elements, select those with the desired tag (such as "H1"), and highlight the ones whose text contains the search term.
Note: To follow along with this section, you must include the GrapeCity.Documents.Pdf.Recognition.Structure namespace.
The following code will get the PDF’s H1 tags and search through them for the text “C1Olap”.
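A sketch of this approach, assuming a .NET 8 console project that references DS.Documents.Pdf and a tagged input file with the hypothetical name TaggedDocument.pdf. It walks the structure tree recursively (headings are often nested below the root rather than being its direct children), reads each H1 element's text with GetText, and then reuses the document-level FindText to get bounds for the matching heading text. This two-step design is an illustrative choice, not necessarily how the sample app does it; headings whose text wraps across lines may need extra handling.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;                                 // Color
using System.IO;
using GrapeCity.Documents.Common;                     // FindTextParams, OutputRange
using GrapeCity.Documents.Pdf;                        // GcPdfDocument
using GrapeCity.Documents.Pdf.Recognition.Structure;  // LogicalStructure, Element

const string tag = "H1";
const string searchText = "C1Olap";

// Hypothetical tagged input file; keep the stream open until the document is saved.
using var fs = File.OpenRead("TaggedDocument.pdf");
var doc = new GcPdfDocument();
doc.Load(fs);

// Untagged PDFs have no structure tree to search.
var ls = doc.GetLogicalStructure();
if (ls == null)
{
    Console.WriteLine("No logical structure found; is the PDF tagged?");
    return;
}

var matches = new List<string>();
foreach (var root in ls.Elements)
    CollectMatches(root);

// Locate each matching heading's text on the pages and highlight it.
foreach (var headingText in matches)
{
    var finds = doc.FindText(new FindTextParams(headingText, false, true), OutputRange.All);
    foreach (var find in finds)
        foreach (var bounds in find.Bounds)
            doc.Pages[find.PageIndex].Graphics.FillPolygon(bounds, Color.FromArgb(100, Color.LightGreen));
}

doc.Save("TaggedDocument-H1.pdf");

// Depth-first walk: record the trimmed text of every element with the
// requested tag whose text contains searchText.
void CollectMatches(Element e)
{
    if (e.StructElement?.Type == tag)
    {
        var text = e.GetText()?.Trim();
        if (!string.IsNullOrEmpty(text) && text.Contains(searchText))
            matches.Add(text);
    }
    if (e.Children == null)
        return;
    foreach (var child in e.Children)
        CollectMatches(child);
}
```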
To learn more about reading PDF structure tags using C#, check out the online Read Structure Tags Demo.
Find and Mark Up Graphically Transformed Text in PDFs
PDFs often contain graphically transformed text, that is, text drawn with a transformation (such as rotation or scaling) applied to the page graphics, typically on top of an existing PDF. This is common when adding a logo or watermark to a PDF. DsPdf can search for text inside such graphically transformed text and highlight the found items.
To accomplish this, first use DsPdf's FindText method to search for the target text.
Then, for each page that contains a match, add a new content stream through the page's ContentStreams property, call that stream's GetGraphics method to obtain graphics for the page, and draw the highlighting over the bounds of each found item.
The following code searches a PDF document for specified text that is drawn as a graphically transformed logo watermark, then highlights the found instances with blue rectangles.
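A sketch following the steps described above, assuming a .NET 8 console project that references DS.Documents.Pdf; Watermarked.pdf, the output name, and the search text "CONFIDENTIAL" are hypothetical placeholders. Inserting the new content stream at index 0 draws the highlights before the page's original content, so the found text stays visible on top of the fill.

```csharp
using System.Collections.Generic;  // SortedSet
using System.Drawing;              // Color
using System.IO;
using System.Linq;
using GrapeCity.Documents.Common;  // FindTextParams, OutputRange
using GrapeCity.Documents.Pdf;     // GcPdfDocument

// Hypothetical input file; keep the stream open until the document is saved.
using var fs = File.OpenRead("Watermarked.pdf");
var doc = new GcPdfDocument();
doc.Load(fs);

// Hypothetical watermark text; FindText also reports hits in transformed text.
var finds = doc.FindText(new FindTextParams("CONFIDENTIAL", true, false), OutputRange.All);

// Visit each page that has at least one hit exactly once.
foreach (var pageIndex in new SortedSet<int>(finds.Select(f => f.PageIndex)))
{
    var page = doc.Pages[pageIndex];

    // New content stream at index 0: its drawing sits beneath the existing content.
    var pcs = page.ContentStreams.Insert(0);
    var g = pcs.GetGraphics(page);

    foreach (var find in finds.Where(f => f.PageIndex == pageIndex))
        foreach (var bounds in find.Bounds)
        {
            // Bounds are quadrilaterals, so rotated text gets a rotated outline.
            g.FillPolygon(bounds, Color.LightBlue);
            g.DrawPolygon(bounds, Color.Blue);
        }
}

doc.Save("Watermarked-highlighted.pdf");
```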
Try our online demo for Finding Transformed Text using a .NET PDF API to see another example.
Learn More About this .NET C# PDF API
This article only scratches the surface of what Document Solutions for PDF can do. Learn how to create, extract, modify, and redact PDFs, apply signatures, and more with this .NET C# PDF API. Document Solutions offers a full-fledged PDF solution, including a client-side JavaScript PDF viewer control, which is showcased throughout this piece. To learn more about the .NET C# API and its JavaScript PDF viewer, check out our demos and documentation:
Document Solutions for PDF, .NET C# PDF API
Online Demo Explorer | Documentation
Document Solutions PDF Viewer, JavaScript PDF viewer control
Online Demo Explorer | Documentation