How to Programmatically Delete and Replace Text in PDF Files Using C#
The need to edit, delete, or replace text in a PDF is not new. However, PDF was designed as a fixed format for final document versions, which is what originally made it the go-to format for legal documents, so editing text within a PDF is not as cut and dried as one might think. To begin with, text in a PDF is positioned on the page and is not necessarily stored in linear reading order.
In other words, the way the finished page looks is not necessarily how the text is stored in the ‘code’ of the PDF document. Combine this with fonts, font sizes, and the question of what to do with the text that is not deleted, and managing text in a PDF can be a daunting task.
So the first question might be: why not just change the original document (Word, text, or other word-processing file)? The short answer is that the original may not be available. Once the text is in PDF format, many people consider it the ‘final’ version and sometimes delete the original documents to save space.
This article walks through the basics of the new text-editing features of Document Solutions for PDF (DsPdf, previously GcPdf) with examples of deleting and replacing text programmatically using C#. In this blog, we will cover:
- What's Happening Behind the Scenes
- The API
- Sample C# Code
- Sample of Results From Demonstration Code
- C# Demonstration of Removing and Replacing Text within a PDF Document
Ready to test Document Solutions for PDF? Download a FREE trial today!
What’s Happening Behind the Scenes?
As mentioned in the opening paragraphs, managing text within a PDF programmatically or otherwise is not as easy as it seems. Before we get into examples, we need to understand what is happening behind the scenes and the limitations and benefits of the new features in this API.
First, when manipulating text, you need to know what surrounds it. When a fragment is deleted, the text after it may have to move. To work out how, it is necessary to check how the deleted fragment and the fragments after it relate: whether they belong to the same PDF operator or different ones, and whether there are text positioning operators between them. When calculating the new text position, the current transformation matrices of both the deleted and the "shifted" text must be taken into account.
Given all this, it is not always possible to recalculate the text position exactly, for example when the fragment being deleted is contained in a Form XObject and the text after it is not.
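To see why the transformation matrix matters, here is a small sketch that uses only System.Numerics. It does not use DsPdf, and the class and method names are made up for illustration. It shows how a gap measured along the text's own x axis becomes a different shift on the page once the matrix is applied.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration; not part of DsPdf.
public static class CtmShiftSketch
{
    // A PDF matrix [a b c d e f] maps (x, y) to (a*x + c*y + e, b*x + d*y + f).
    // Matrix3x2(a, b, c, d, e, f) with Vector2.Transform uses the same layout.
    public static Vector2 ToPageSpace(Vector2 point, Matrix3x2 ctm) =>
        Vector2.Transform(point, ctm);

    // How far the following text moves on the page when a fragment that is
    // deletedWidth units wide (measured before the matrix is applied) is removed.
    public static Vector2 PageShift(float deletedWidth, Matrix3x2 ctm) =>
        // TransformNormal ignores the translation part, which is what a
        // displacement (as opposed to a position) needs.
        Vector2.TransformNormal(new Vector2(-deletedWidth, 0f), ctm);
}
```

With a matrix that scales by 2 and translates by (100, 50), written as new Matrix3x2(2, 0, 0, 2, 100, 50), removing a 10-unit fragment moves the following text by (-20, 0) on the page. With a 90-degree rotation, new Matrix3x2(0, 1, -1, 0, 0, 0), the same deletion moves it by (0, -10). When the deleted text and the text after it are drawn under different matrices, there is no single shift that is right for both, which is part of why an exact recalculation is not always possible.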
Because exact repositioning is not always possible, DsPdf implements two text deletion modes:
- Standard - The text following the deleted fragment shifts to fill the space the deleted text occupied. This also works with vertical and RTL text. However, in complex text layouts, the shifted text may be positioned or displayed incorrectly.
- PreserveSpace - An empty space is left where the deleted text was, and the text after it does NOT move. This mode should always work as designed and not cause any errors.
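To make the difference concrete, here is a toy model of a single left-to-right line of text. It is not DsPdf code: Run, ToyDeleteMode, and DeleteModeSketch are names invented for this sketch, and it ignores the operators, fonts, and matrices that a real PDF involves.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for this sketch only; these are not DsPdf types.
public enum ToyDeleteMode { Standard, PreserveSpace }

// A piece of text with its left edge and width on the line.
public sealed record Run(string Text, float X, float Width);

public static class DeleteModeSketch
{
    // Removes the run at `index` from one line of left-to-right runs.
    public static List<Run> Delete(IReadOnlyList<Run> line, int index, ToyDeleteMode mode)
    {
        float gap = line[index].Width;
        var result = new List<Run>();
        for (int i = 0; i < line.Count; i++)
        {
            if (i == index) continue; // drop the deleted run

            var run = line[i];
            // Standard: runs after the deleted one slide left to close the gap.
            // PreserveSpace: every remaining run keeps its original X.
            if (mode == ToyDeleteMode.Standard && i > index)
                run = run with { X = run.X - gap };
            result.Add(run);
        }
        return result;
    }
}
```

For a line made of "The " at X = 0, "wetlands " at X = 20 (width 45), and "are wet" at X = 65, deleting the middle run moves "are wet" to X = 20 in the toy Standard mode and leaves it at X = 65 in the toy PreserveSpace mode.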
The API
The new features add the following methods and properties to the ITextMap interface:
- DeleteText(TextMapFragment range, DeleteTextMode mode); - Deletes the text within a specified range.
- ReplaceText(TextMapFragment range, string text); - Replaces the specified range of text with new text.
- bool Invalid { get; } - Once DeleteText or ReplaceText has been called, the ITextMap becomes invalid and can no longer be used. This property indicates whether the ITextMap is in that state.
It is important to note that the Page and GcPdfDocument classes have DeleteText() and ReplaceText() methods, but they all work via ITextMap, e.g., Page.DeleteText() creates a text map for the page and calls its DeleteText() method.
Here is a link to the complete API Reference.
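One way to read the Invalid property is as a snapshot contract: a text map records positions at the moment it is built, so any edit makes those positions stale. The toy classes below model that idea with plain strings. They are not DsPdf code, every name in them is hypothetical, and throwing on reuse is this sketch's own choice (the description above only says an invalid map cannot be used).

```csharp
using System;

// Toy stand-ins; hypothetical names, not the DsPdf Page or ITextMap.
public sealed class ToyPage
{
    public string Text { get; set; }
    public ToyPage(string text) => Text = text;

    // Each call builds a fresh map from the page's current text.
    public ToyTextMap GetTextMap() => new ToyTextMap(this);
}

public sealed class ToyTextMap
{
    private readonly ToyPage _page;
    private readonly string _snapshot; // offsets returned by Find refer to this copy

    public bool Invalid { get; private set; }

    public ToyTextMap(ToyPage page)
    {
        _page = page;
        _snapshot = page.Text;
    }

    // Returns the offset of the first occurrence of `word`, or -1 if absent.
    public int Find(string word)
    {
        EnsureValid();
        return _snapshot.IndexOf(word, StringComparison.Ordinal);
    }

    // Deletes `length` characters at `start`, then marks this map as stale.
    public void DeleteText(int start, int length)
    {
        EnsureValid();
        _page.Text = _page.Text.Remove(start, length);
        Invalid = true; // snapshot offsets no longer match the page
    }

    private void EnsureValid()
    {
        if (Invalid)
            throw new InvalidOperationException("Map is stale; build a new one.");
    }
}
```

In this model, the pattern after an edit is to check Invalid and ask the page for a new map before doing anything else. The Page and GcPdfDocument convenience methods described above spare you that bookkeeping because they build the map for you on each call.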
Sample C# Code
The following snippets show the basic calls on a sample document, Wetlands.pdf: deleting the word "wetlands" from the first page in each deletion mode, deleting it from the whole document, and replacing it with "WETLANDS" on the first page and across the document. A fuller use case, in which a rental lease written for “Jane Donahue” is rewritten for “John Doe”, follows in the next section.
// Requires the DsPdf NuGet package; the namespaces below are the ones assumed for the types used here.
using System.IO;
using GrapeCity.Documents.Pdf;

// FindTextParams(text, wholeWord, matchCase): each search below is whole-word and case-insensitive.

// Delete the word "wetlands" from the first page using DeleteTextMode.Standard
using (FileStream fs = new FileStream("Wetlands.pdf", FileMode.Open,
    FileAccess.Read, FileShare.Read))
{
    GcPdfDocument doc = new GcPdfDocument();
    doc.Load(fs);
    FindTextParams ftp = new FindTextParams("wetlands", true, false);
    doc.Pages[0].DeleteText(ftp, DeleteTextMode.Standard);
    doc.Save("wetlands_deleted.pdf");
}

// Delete the word "wetlands" from the first page using DeleteTextMode.PreserveSpace
using (FileStream fs = new FileStream("Wetlands.pdf", FileMode.Open,
    FileAccess.Read, FileShare.Read))
{
    GcPdfDocument doc = new GcPdfDocument();
    doc.Load(fs);
    FindTextParams ftp = new FindTextParams("wetlands", true, false);
    doc.Pages[0].DeleteText(ftp, DeleteTextMode.PreserveSpace);
    doc.Save("wetlands_deleted_PreserveSpace.pdf");
}

// Delete the word "wetlands" from the whole document
using (FileStream fs = new FileStream("Wetlands.pdf", FileMode.Open,
    FileAccess.Read, FileShare.Read))
{
    GcPdfDocument doc = new GcPdfDocument();
    doc.Load(fs);
    FindTextParams ftp = new FindTextParams("wetlands", true, false);
    doc.DeleteText(ftp, DeleteTextMode.Standard);
    doc.Save("wetlands_deleted_doc.pdf");
}

// Replace the word "wetlands" with "WETLANDS" on the first page
using (FileStream fs = new FileStream("Wetlands.pdf", FileMode.Open,
    FileAccess.Read, FileShare.Read))
{
    GcPdfDocument doc = new GcPdfDocument();
    doc.Load(fs);
    FindTextParams ftp = new FindTextParams("wetlands", true, false);
    doc.Pages[0].ReplaceText(ftp, "WETLANDS", null, null);
    doc.Save("wetlands_FirstPage.pdf");
}

// Replace the word "wetlands" with "WETLANDS" in the whole document
using (FileStream fs = new FileStream("Wetlands.pdf", FileMode.Open,
    FileAccess.Read, FileShare.Read))
{
    GcPdfDocument doc = new GcPdfDocument();
    doc.Load(fs);
    FindTextParams ftp = new FindTextParams("wetlands", true, false);
    doc.ReplaceText(ftp, "WETLANDS", null, null, null);
    doc.Save("wetlands_Document.pdf");
}
Sample of Results From Demonstration Code
This demonstration replaces a name and several other details within a lease agreement, so that a lease written for “Jane Donahue” becomes one for “John Doe”. Below is a small piece of code showing the replacements, followed by the original and updated PDF (screenshots only).
// Replace:
// "Jane Donahue" -> "John Doe"
// "(123)098-7654" -> "(007)123-4567"
// "janed@example.com" -> "johnd@example.com"
doc.ReplaceText(new FindTextParams("Jane Donahue", false, true), "John Doe");
doc.ReplaceText(new FindTextParams("(123)098-7654", false, true), "(007)123-4567");
doc.ReplaceText(new FindTextParams("janed@example.com", false, true), "johnd@example.com");
// "13-Dec-20 22:16:00" -> date now
// "13-Dec-22 22:16:00" -> date now + 2 years
var termStart = DateTime.Now;
// AddYears keeps the calendar date; a fixed 730 days can drift by a day across a leap year.
var termEnd = termStart.AddYears(2);
doc.ReplaceText(new FindTextParams("13-Dec-20 22:16:00", false, true), termStart.ToShortDateString() + " " + termStart.ToShortTimeString());
doc.ReplaceText(new FindTextParams("13-Dec-22 22:16:00", false, true), termEnd.ToShortDateString() + " " + termEnd.ToShortTimeString());
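ToShortDateString and ToShortTimeString follow the current culture, so the replacement dates may not look like the "13-Dec-20 22:16:00" text they replace. If the new dates should keep the original look, one option is a custom format string with the invariant culture. The helper below is a sketch of that idea using only the .NET base library; LeaseDates and its members are hypothetical names, not part of the demo or of DsPdf. It also computes the term end with AddYears, because a term counted as a fixed 730 days can land a day early when it spans a leap day.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of the demo package.
public static class LeaseDates
{
    // Pattern matching the original lease text, e.g. "13-Dec-20 22:16:00".
    private const string LeasePattern = "dd-MMM-yy HH:mm:ss";

    // Formats with the invariant culture so month names and separators
    // do not vary from machine to machine.
    public static string Format(DateTime value) =>
        value.ToString(LeasePattern, CultureInfo.InvariantCulture);

    // Calendar-aware two-year term; AddYears accounts for leap years.
    public static DateTime TermEnd(DateTime termStart) => termStart.AddYears(2);
}
```

The result of LeaseDates.Format(termStart) could then be passed as the replacement string in place of the concatenated short date and time strings.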
PDF Before changes (Original PDF):
PDF After Changes:
To get the complete code for these demonstrations, be sure to download it here. Once downloaded, follow the instructions below to run the demonstration.
C# Demonstration of Removing and Replacing Text within a PDF Document
Step 1: Unzip the demo package
Step 2: Open the solution (DeleteText.sln or ReplaceText.sln)
Step 5: Run the application and check out the results!
Feel free to contact us with questions or comments, and happy coding!