How to Redact Content from PDF Documents in C#
GrapeCity Documents for PDF (GcPdf) is a cross-platform library used to create, analyze, and modify PDF documents. We are pleased to announce the release of GcPdf v3.1, which allows users to redact and remove content from PDFs.
GcPdf v3.1 introduces the Redact method on GcPdfDocument for removing content. You mark specific numbers, phrases, or areas for redaction, then call Redact to remove that information safely and permanently.
Removing sensitive information from a PDF is a two-step process. Both steps rely on a redact annotation (RedactAnnotation).
Step 1: Mark the content in need of redaction
For example, you may want to remove confidential dates, social security numbers, or other customer information. You can also redact sections of the PDF containing charts, graphs, and images.
Step 2: Redact the sensitive content with a secure overlay
Once redacted, the content appears blacked out and is unreadable. The redacted information cannot be extracted or copied and pasted into another document.
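Step 1 usually starts with deciding which text counts as sensitive. The sketch below is one way to flag lines of text that contain an SSN-like value before you mark them. It uses only the .NET base class library, not the GcPdf API. The class name, method name, and the pattern itself (U.S. numbers written as 123-45-6789) are illustrative assumptions rather than part of GcPdf.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SensitiveTextFinder
{
    // Assumed format: three digits, two digits, four digits, separated by hyphens.
    private static readonly Regex SsnPattern =
        new Regex(@"\b\d{3}-\d{2}-\d{4}\b", RegexOptions.Compiled);

    // Returns the zero-based indexes of the lines that contain an SSN-like value.
    public static List<int> FindLinesToRedact(IReadOnlyList<string> lines)
    {
        var hits = new List<int>();
        for (int i = 0; i < lines.Count; i++)
        {
            if (SsnPattern.IsMatch(lines[i]))
            {
                hits.Add(i);
            }
        }
        return hits;
    }
}
```

You could feed this helper the text of each line extracted from a page and then create a redact annotation over the bounds of every line it reports. How the bounds are obtained depends on the text extraction API you use.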
Use Case: GcPdf v3.1 Redact Method with Redact Annotation
Your company generates customer invoices before recording quarterly sales. Before circulating the invoices to the staff, you must remove customer information from the PDFs.
Using GcPdf v3.1, you can apply complete redaction and remove the customer details from the invoices. In this example, you will redact an area of the PDF.
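1. Load the PDF into a GcPdfDocument instance.
// Requires: using System.IO; using GrapeCity.Documents.Pdf;
var doc = new GcPdfDocument();
// Keep the stream open until the final Save: GcPdf reads from the
// source stream on demand. Dispose fs once the last Save has completed.
var fs = new FileStream(Path.Combine("Resources", "PDFs", "SalesInvoice.pdf"),
    FileMode.Open, FileAccess.Read);
doc.Load(fs);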
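2. Mark the area containing the customer's details. Set the mark colors, the overlay color, and the overlay text.
// Requires: using System.Drawing; using GrapeCity.Documents.Pdf.Annotations;
var page = doc.Pages[0];
// Area holding the customer's details: x, y, width, height.
var rc = new RectangleF(90, 230, 122, 70);
var redact = new RedactAnnotation()
{
    Rect = rc,
    MarkBorderColor = Color.Red,            // outline of the mark before redaction
    MarkFillColor = Color.Yellow,           // fill of the mark before redaction
    OverlayText = "Redacted.",              // text shown over the area after redaction
    OverlayFillColor = Color.LightSkyBlue,  // fill shown over the area after redaction
    Page = page                             // the page this annotation belongs to
};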
3. Save the marked PDF so you can compare it with the final redacted PDF.
Here, stream is a FileStream instance opened for writing the marked file.
doc.Save(stream);
4. Call the Redact method on GcPdfDocument to remove the sensitive content.
Called with no arguments, Redact applies every redact annotation in the document.
doc.Redact();
5. Save the redacted PDF.
doc.Save("redactedFile.pdf");
You have successfully redacted customer information from the PDF. You cannot select or copy the redacted content or use any PDF tool to read it.
You can download the sample here.
Redact Annotation Options in GcPdf v3.1
If no argument is passed to the Redact method, all content marked with redact annotations is removed from the document.
To redact only specific areas, pass the corresponding redact annotation, or a list of them, to Redact. Only the content under those annotations is removed.
GcPdfDocument.Redact(GrapeCity.Documents.Pdf.RedactOptions):
Applies all redact annotations in the document.
GcPdfDocument.Redact(GrapeCity.Documents.Pdf.Annotations.RedactAnnotation, GrapeCity.Documents.Pdf.RedactOptions):
Applies only the specified redact annotation.
GcPdfDocument.Redact(System.Collections.Generic.IList<GrapeCity.Documents.Pdf.Annotations.RedactAnnotation>, GrapeCity.Documents.Pdf.RedactOptions):
Applies only the redact annotations in the list, each of which defines an area to redact.
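The three overloads listed above might be called as in the following sketch. It assumes the RedactOptions parameter is optional on every overload, as the parameterless doc.Redact() call in the walkthrough suggests. If that is not the case in your version, pass a RedactOptions instance explicitly. The wrapper class and method names are hypothetical and exist only for illustration.

```csharp
using System.Collections.Generic;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Pdf.Annotations;

public static class RedactOverloadsSketch
{
    // Applies a single redact annotation; others in the document are left untouched.
    public static void RedactOne(GcPdfDocument doc, RedactAnnotation target)
    {
        doc.Redact(target);
    }

    // Applies only the redact annotations contained in the list.
    public static void RedactSome(GcPdfDocument doc, IList<RedactAnnotation> targets)
    {
        doc.Redact(targets);
    }

    // Applies every redact annotation found in the document.
    public static void RedactAll(GcPdfDocument doc)
    {
        doc.Redact();
    }
}
```

Applying annotations selectively is useful when marks come from a review step: you can keep unapproved marks visible in the document and remove only the content under the approved ones.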
Visit Help | Demo for more details.
Let us know how redacting sensitive content helps you in your application.
Happy Coding!