How to Manage Text Options in PDF Documents Using .NET C#
As we are all aware, the days of typewriters, printing presses, and other mechanical devices for creating documents are largely behind us. Though presses are still needed to mass-produce certain documents such as newspapers, magazines, and books, most text processing is now done on a computer.
Whether you use a word processing program such as MS Word or an online editor such as Google Docs, you are working with text, and you need powerful tools to produce quality documents for both online and offline use, including (potentially) a trip to the old printing press.
Although GrapeCity has many options for working with documents and ultimately bringing the text into PDF files, an essential capability of the GcPdf API Library is manipulating text directly within a new or existing PDF file.
As always, we are unlikely to win over many friends or family members with lengthy conversations about how to manipulate text in a PDF, but as a member of the professional developer community, it is absolutely a requirement to have these conversations and understand ways to make this happen. This blog explores several powerful features of the GcPdf Library to provide these manipulations.
Ready to Get Started? Download GrapeCity Documents Today!
Why manipulate text in a PDF?
This is a common question, given that the PDF format was originally designed with security in mind and was meant to keep documents from being changed. However, times change, and although the security of PDFs remains strong, there is an ever-increasing need to manipulate text directly within PDF files, both to meet legal and regulatory requirements and to make life easier for users. With this in mind, let's look at a few of the powerful methods, properties, and features of this API.
PDF Text Rendering using .NET C#
Creating or rendering new text in a PDF is a basic requirement, but it can be tricky. The GrapeCity GcPdf tools make this far less onerous and certainly more streamlined. Let's take a quick look at how to get started. By the way, the demonstrations for all of the use cases in this blog can be found here!
See the comments in the example below for explanations. To summarize, the steps are:
- Instantiate the document using the GcPdfDocument() constructor
- Add a page to the document
- Use one of the two approaches below for rendering text:
  - The MeasureString/DrawString pair of methods
  - The TextLayout class with the DrawTextLayout method
- Save the document
//
// This code is part of GrapeCity Documents for PDF samples.
// Copyright (c) GrapeCity, Inc. All rights reserved.
//
using System;
using System.IO;
using System.Drawing;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;
using GrapeCity.Documents.Drawing;
namespace GcPdfWeb.Samples.Basics
{
// This sample demonstrates the basics of rendering text in GcPdf.
// The two main approaches are:
// - using the MeasureString/DrawString pair, or
// - using the TextLayout directly.
// While the first approach may be easier in simple cases,
// the second approach (using TextLayout) is much more powerful
// and generally speaking yields better performance.
// Please read the comments in code below for more details.
// See also CharacterFormatting, PaginatedText, ParagraphAlign,
// ParagraphFormatting, TextAlign.
public class TextRendering
{
public void CreatePDF(Stream stream)
{
var doc = new GcPdfDocument();
var page = doc.NewPage();
var g = page.Graphics;
// By default, GcPdf uses 72dpi:
const float In = 72;
// TextFormat class is used throughout all GcPdf text rendering to specify
// font and other character formatting:
var tf = new TextFormat() { Font = StandardFonts.Times, FontSize = 12 };
// 1.
// The easiest way to render a short string on a page at an arbitrary location,
// when you are 100% sure that the string will fit in the available space,
// is to use the GcGraphics.DrawString() overload accepting just the point
// at which to draw the string:
g.DrawString("1. Test string. Please read the extensive comments in this sample's code.\r\n" +
"(Note that line breaks are allowed even in the simplest DrawString overload.)",
tf, new PointF(In, In));
// 2.
// Another overload taking a rectangle instead, plus alignment and wrapping
// options, is also available and provides a bit more flexibility.
// The parameters are:
// - the text string;
// - the text format;
// - the layout rectangle;
// - (optional) text alignment (the default is leading, left for LTR languages);
// - (optional) paragraph alignment (the default is near, top for top-to-bottom flow);
// - (optional) word wrapping (the default is true):
g.DrawString("2. A longer test string which will probably need more than the allocated " +
"4 inches so quite possibly will wrap to show that DrawString can do that.",
tf,
new RectangleF(In, In * 2, In * 4, In),
TextAlignment.Leading,
ParagraphAlignment.Near,
true);
// 3.
// Complementary to DrawString, a MeasureString() method is available
// (with several different overloads), and can be used in pair with
// DrawString when more control over text layout is needed:
string tstr3 = "3. Test string to demo MeasureString() used with DrawString().";
// Available layout size:
SizeF layoutSize = new SizeF(In * 3, In * 0.8f);
SizeF s = g.MeasureString(tstr3, tf, layoutSize, out int fitCharCount);
// Show the passed in size in red, the measured size in blue,
// and draw the string within the returned size as bounds:
PointF pt = new PointF(In, In * 3);
g.DrawRectangle(new RectangleF(pt, layoutSize), Color.Red);
g.DrawRectangle(new RectangleF(pt, s), Color.Blue);
g.DrawString(tstr3, tf, new RectangleF(pt, s));
// 4.
// A much more powerful and better-performing way to render text
// is to use TextLayout. (TextLayout is used anyway by DrawString/MeasureString,
// so when you use TextLayout directly, you basically cut the work in half.)
// A TextLayout instance represents one or more paragraphs of text, with
// the same paragraph formatting (character formats may be different,
// see MultiFormattedText).
var tl = g.CreateTextLayout();
// To add text, use Append() or AppendLine() methods:
tl.Append("4. First test string added to TextLayout. ", tf);
tl.Append("Second test string added to TextLayout, continuing the same paragraph. ", tf);
// Add a line break, effectively starting a new paragraph:
tl.AppendLine();
tl.Append("Third test string added to TextLayout, a new paragraph. ", tf);
tl.Append("Fourth test string, with a different char formatting. ",
new TextFormat(tf) { Font = StandardFonts.TimesBoldItalic, ForeColor = Color.DarkSeaGreen, });
// Text can be added to TextLayout without explicit TextFormat:
tl.Append("Fifth test string, using the TextLayout's default format.");
// ...but in that case at least the Font must be specified on the
// TextLayout's DefaultFormat, otherwise PerformLayout (below) will fail:
tl.DefaultFormat.Font = StandardFonts.TimesItalic;
// Specify the layout, such as max available size etc.
// Here we only provide the max width, but many more parameters can be set:
tl.MaxWidth = page.Size.Width - In * 2;
// Paragraph formatting can also be set, here we set first line offset,
// spacing between paragraphs and line spacing:
tl.FirstLineIndent = In * 0.5f;
tl.ParagraphSpacing = In * 0.05f;
tl.LineSpacingScaleFactor = 0.8f;
// When all text has been added, and layout options specified,
// the TextLayout needs to calculate the glyphs needed to render
// the text, and perform the layout. This can be done with a
// single call:
tl.PerformLayout(true);
// Now we can draw it on the page:
pt = new PointF(In, In * 4);
g.DrawTextLayout(tl, pt);
// TextLayout provides info about the text including the measured bounds
// and much more. Here we draw the bounding box in orange red:
g.DrawRectangle(new RectangleF(pt, tl.ContentRectangle.Size), Color.OrangeRed);
// 5.
// TextLayout can be re-used to draw different paragraph(s), this can be useful
// when you need to render a different text with the same paragraph formatting.
// The Clear() call removes the text but preserves paragraph formatting:
tl.Clear();
tl.Append("5. This is text rendered re-using the same TextLayout. ");
tl.Append("More text added to TextLayout being re-used, continuing the same paragraph. ", tf);
tl.Append("And finally, some more text added.", tf);
// The necessary call to calculate the glyphs and perform layout:
tl.PerformLayout(true);
// Render the text:
g.DrawTextLayout(tl, new PointF(In, In * 5));
// Done:
doc.Save(stream);
}
}
}
A full demonstration of this C# .NET application can be downloaded here.
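Section 3 of the sample passes `out int fitCharCount` to MeasureString() but never uses the value. Here is a minimal sketch of how it might be used to detect text that does not fit in the available space. The `FitCheck` class and `Fits` method are hypothetical names, not part of GcPdf. The sketch also assumes that `fitCharCount` reports the number of characters that fit in the layout size, which is what its name suggests; please verify this against the GcPdf documentation before relying on it.

```csharp
using System.Drawing;
using GrapeCity.Documents.Drawing;
using GrapeCity.Documents.Text;

// Hypothetical helper, not part of GcPdf.
public static class FitCheck
{
    // Returns true when the whole string fits inside layoutSize.
    // Assumption: fitCharCount is the number of characters that fit.
    public static bool Fits(GcGraphics g, string text, TextFormat tf, SizeF layoutSize)
    {
        // Same MeasureString overload that the sample above calls.
        g.MeasureString(text, tf, layoutSize, out int fitCharCount);
        return fitCharCount >= text.Length;
    }
}
```

A caller could check `FitCheck.Fits(g, tstr3, tf, layoutSize)` before calling DrawString(), and switch to a TextLayout with pagination when it returns false.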
PDF Paragraph Formatting in .NET C#
GcPdf offers powerful features for formatting paragraphs. This example shows developers how to indent the first line of a paragraph and set its line spacing.
Although this can be a complicated process with some other APIs, it's easy with the GcPdf API:
- Instantiate the GcPdfDocument
- Add an empty page
- Create a TextLayout instance
- Set the TextLayout Properties
- Add text (Paragraphs)
- Save the document
//
// This code is part of GrapeCity Documents for PDF samples.
// Copyright (c) GrapeCity, Inc. All rights reserved.
//
using System;
using System.IO;
using System.Drawing;
using GrapeCity.Documents.Pdf;
namespace GcPdfWeb.Samples.Basics
{
// This sample demonstrates the most basic paragraph formatting options:
// - first line indent;
// - line spacing.
public class ParagraphFormatting
{
public void CreatePDF(Stream stream)
{
Func<string> makePara = () => Common.Util.LoremIpsum(1, 5, 10, 15, 30);
var doc = new GcPdfDocument();
var g = doc.NewPage().Graphics;
// Using Graphics.CreateTextLayout() ensures that TextLayout's resolution
// is set to the same value as that of the graphics (which is 72 dpi by default):
var tl = g.CreateTextLayout();
// Default font:
tl.DefaultFormat.Font = StandardFonts.Times;
tl.DefaultFormat.FontSize = 12;
// Set TextLayout to the whole page:
tl.MaxWidth = doc.PageSize.Width;
tl.MaxHeight = doc.PageSize.Height;
// ...and have it manage the page margins (1" all around):
tl.MarginAll = tl.Resolution;
// First line offset 1/2":
tl.FirstLineIndent = 72 / 2;
// 1.5 line spacing:
tl.LineSpacingScaleFactor = 1.5f;
//
tl.Append(makePara());
tl.PerformLayout(true);
// Render text at (0,0) (margins are added by TextLayout):
g.DrawTextLayout(tl, PointF.Empty);
// Done:
doc.Save(stream);
}
}
}
A full demonstration of this C# .NET application can be downloaded here.
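Both samples express distances in points at the default 72 dpi, which is why a half-inch indent appears as `72 / 2` and a one-inch margin as `tl.Resolution`. A small sketch of a unit helper can make such values easier to read. `PdfUnits` and its members are hypothetical names introduced here for illustration; they are not part of the GcPdf API, and the sketch assumes the 72 dpi default mentioned in the samples.

```csharp
using System;

// Hypothetical helper, not part of GcPdf: converts common units to points,
// assuming the default resolution of 72 dpi used by the samples.
public static class PdfUnits
{
    public const float PointsPerInch = 72f;

    // Inches to points, e.g. a 0.5" first-line indent.
    public static float InchesToPoints(float inches) => inches * PointsPerInch;

    // Millimeters to points (25.4 mm per inch).
    public static float MillimetersToPoints(float mm) => mm / 25.4f * PointsPerInch;

    // Width left for text after subtracting equal left and right margins.
    public static float ContentWidth(float pageWidth, float margin)
    {
        if (margin < 0 || margin * 2 > pageWidth)
            throw new ArgumentOutOfRangeException(nameof(margin));
        return pageWidth - margin * 2;
    }
}
```

With this helper, the indent in the sample could be written as `tl.FirstLineIndent = PdfUnits.InchesToPoints(0.5f);`. For a page 612 points wide (8.5 inches) with one-inch margins, `PdfUnits.ContentWidth(612f, 72f)` leaves 468 points for text.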
Extract/Parse Text from PDF using C# .NET
Although we could go on for days discussing different ways to add and manipulate text in a PDF using the GcPdf API, it's best to limit the discussion! With that said, the last item to discuss is how to get text out of a PDF file. Why would one want to do this? There are several excellent reasons, including:
- The ability to extract meaningful data
- Wanting to copy/paste various text from one PDF to another
- Combining documents (Legal, real estate, Personal, etc.)
We’re sure there are many other reasons, but we'll leave the list at three items for now so as not to bore everyone to tears. Clearly there are reasons to get text out of a PDF, so how do we do that? With the GcPdf API Library, of course!
The following example shows how to get text from a document with mixed images and text. This example demonstrates how to extract just the text from one document, create another document and essentially "paste" the text ONLY into that new document. Here are the basic steps involved in this process:
- Create a new PDF for accepting the text
- Set up the TextLayout with all appropriate properties (Margins, font size, etc.)
- Load an existing PDF (where the text will be extracted from)
- Extract the text & add it to the new document (Add all text to TextLayout and loop to render (for pagination purposes))
- Save the document
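using System;
using System.IO;
using System.Drawing;
using System.Collections.Generic;
using System.Linq;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;
using GCTEXT = GrapeCity.Documents.Text;
namespace GcPdfWeb.Samples
{
// This sample extracts the text from each page of an existing PDF
// and renders it, with per-page captions, into a new document.
public class ExtractText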
{
public int CreatePDF(Stream stream)
{
GcPdfDocument doc = new GcPdfDocument();
var page = doc.NewPage();
var rc = Common.Util.AddNote(
"This sample loads an arbitrary PDF into a temporary GcPdfDocument, " +
"then retrieves text from each page of the loaded document using the Page.GetText() method, " +
"adds all those texts to a TextLayout and renders it into the current document. " +
"An alternative to Page.GetText() is the method GcPdfDocument.GetText() " +
"which retrieves the text from the whole document at once.",
page);
// Text format for captions:
var tf = new TextFormat()
{
Font = GCTEXT.Font.FromFile(Path.Combine("Resources", "Fonts", "yumin.ttf")),
FontSize = 14,
ForeColor = Color.Blue
};
// Text layout to render the text:
var tl = new TextLayout(72);
tl.DefaultFormat.Font = StandardFonts.Times;
tl.DefaultFormat.FontSize = 12;
tl.MaxWidth = doc.PageSize.Width;
tl.MaxHeight = doc.PageSize.Height;
tl.MarginAll = rc.Left;
tl.MarginTop = rc.Bottom + 36;
// Text split options for widow/orphan control:
TextSplitOptions to = new TextSplitOptions(tl)
{
MinLinesInFirstParagraph = 2,
MinLinesInLastParagraph = 2,
RestMarginTop = rc.Left,
};
// Open an arbitrary PDF, load it into a temp document and get all page texts:
using (var fs = new FileStream(Path.Combine("Resources", "PDFs", "Wetlands.pdf"), FileMode.Open, FileAccess.Read))
{
var doc1 = new GcPdfDocument();
doc1.Load(fs);
// Get the texts of the loaded document's pages:
var texts = new List<string>();
doc1.Pages.ToList().ForEach(p_ => texts.Add(p_.GetText()));
// Add texts and captions to the text layout:
for (int i = 0; i < texts.Count; ++i)
{
tl.AppendLine(string.Format("Text from page {0} of the loaded document:", i + 1), tf);
tl.AppendLine(texts[i]);
}
tl.PerformLayout(true);
while (true)
{
// 'rest' will accept the text that did not fit:
var splitResult = tl.Split(to, out TextLayout rest);
doc.Pages.Last.Graphics.DrawTextLayout(tl, PointF.Empty);
if (splitResult != SplitResult.Split)
break;
tl = rest;
doc.NewPage();
}
}
// Done:
doc.Save(stream);
return doc.Pages.Count;
}
}
}
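The note inside the sample mentions that GcPdfDocument.GetText() retrieves the text of the whole document at once, as an alternative to calling Page.GetText() on each page. A minimal sketch of that alternative might look like the following. `PdfTextDump` and `ReadAllText` are hypothetical names chosen for illustration; only the Load() and GetText() calls come from the sample above.

```csharp
using System.IO;
using GrapeCity.Documents.Pdf;

// Hypothetical helper, not part of GcPdf.
public static class PdfTextDump
{
    // Returns the text of all pages of the PDF read from 'pdf'.
    // The stream must be readable and must stay open during the call.
    public static string ReadAllText(Stream pdf)
    {
        var doc = new GcPdfDocument();
        doc.Load(pdf);
        // Whole-document alternative to looping over Page.GetText().
        return doc.GetText();
    }
}
```

For a file on disk, the helper could be called as `using (var fs = File.OpenRead(path)) Console.WriteLine(PdfTextDump.ReadAllText(fs));`. The per-page approach in the sample remains the better fit when each page's text needs its own caption.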
Summary of using C# and .NET to Manage Text in a PDF
Although the idea of PDF files is to create secure and immutable documents, changes may inevitably be required, and/or requirements may dictate that the documents we need are explicitly created in PDF format without an interim document like Word or Excel. Utilizing the GcPdf API Library, C#, and .NET makes it much easier to work with text in a PDF and can easily make complicated tasks much simpler and even automated, depending on the requirements.
Lastly, the procedures shown in this blog are merely a subset of options available for rendering and handling text within PDF files. Please be sure to check out some of the following topics to help with your text manipulation needs:
Remember to check out all the demonstrations for manipulating text in GcPdf and the other awesome GrapeCity tools to help you and your team become as efficient as possible when managing and creating documents!
As always, don't hesitate to contact us with any questions, and keep on coding!