8 ways to generate optimized PDFs on .NET Core
Documents for PDF (GcPdf) has a wide variety of features and a small footprint, but you can optimize your PDFs even further by following these few quick tips.
1. Create large documents, fast
GcPdf provides two approaches to creating a PDF file:
Traditional: Build the document completely first
In this case, you build the document completely first, adding text, graphics and other elements. Then you call Save() on the document, passing the name of the file or the stream to save to. This approach allows you to modify already-created content: for example, you can insert pages anywhere in the document or modify pages that were already added.
Optimized PDF: The StartDoc/EndDoc method
With this approach, you provide the stream to save to at the very beginning, before adding any content to the document, by calling the StartDoc() method on the document. All content is then written directly to that stream, and you cannot go back and update the already-created pages.
To complete the document, call the EndDoc() method. If you try to perform an action that is not allowed in this mode, an exception is thrown. While this approach is somewhat limiting (e.g., the document cannot be linearized in this mode), it uses less memory and may be preferable, especially when creating very large documents.
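A minimal sketch of the StartDoc/EndDoc approach follows. It assumes the GrapeCity.Documents.Pdf and GrapeCity.Documents.Text namespaces, adds pages with NewPage() and uses the built-in StandardFonts.Times font; StartEndDocSketch and CreateLargePdf are hypothetical names chosen for illustration.

```csharp
using System.IO;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;
using PointF = System.Drawing.PointF;

public static class StartEndDocSketch
{
    // Streams pageCount pages straight into 'output'.
    // 'output' must stay open and writable until EndDoc() returns.
    public static void CreateLargePdf(Stream output, int pageCount)
    {
        var doc = new GcPdfDocument();

        // Bind the output stream BEFORE any content is added.
        doc.StartDoc(output);

        // One TextFormat shared by all pages.
        var tf = new TextFormat { Font = StandardFonts.Times, FontSize = 12 };

        for (int i = 0; i < pageCount; i++)
        {
            // Content goes to the stream as we go; finished pages cannot be revisited.
            var page = doc.NewPage();
            page.Graphics.DrawString($"Page {i + 1} of {pageCount}", tf, new PointF(72, 72));
        }

        // Write the remaining document structures. Save() is not used in this mode.
        doc.EndDoc();
    }
}
```

For example, passing a FileStream from File.Create("large.pdf") and a page count of 10,000 writes each page out as it goes instead of holding the whole document in memory until a final Save() call.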
To view sample code showing how to create large documents, visit the large documents demo.
2. Optimize PDFs by adjusting font embedding
By default, GcPdf embeds font subsets containing just the glyphs used in the document. If needed, you can change this to embed whole fonts (which may result in huge files, so be careful) or to not embed fonts at all (which is usually not recommended either, because PDF viewers must then substitute their own fonts and the output may look different). Embedding can also be changed for individual fonts using the GcPdfDocument.Fonts collection.
var doc = new GcPdfDocument();
// Override the default (subset embedding): write no font data into the PDF at all
doc.FontEmbedMode = FontEmbedMode.NotEmbed;
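To see the effect of FontEmbedMode on file size, you could render the same text twice and compare the saved results. This is a sketch: EmbedModeSketch and PdfSize are hypothetical names, and the TrueType font you pass in (for example, one loaded with Font.FromFile) is up to you.

```csharp
using System.IO;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;
using PointF = System.Drawing.PointF;

public static class EmbedModeSketch
{
    // Renders one line of text with 'font' and returns the saved PDF size in bytes.
    // Passing null for 'mode' keeps the document's default (subset) embedding.
    public static long PdfSize(Font font, FontEmbedMode? mode)
    {
        var doc = new GcPdfDocument();
        if (mode.HasValue)
            doc.FontEmbedMode = mode.Value;

        var tf = new TextFormat { Font = font, FontSize = 14 };
        doc.NewPage().Graphics.DrawString("Hello, GcPdf!", tf, new PointF(72, 72));

        using var ms = new MemoryStream();
        doc.Save(ms);
        return ms.Length;
    }
}
```

Comparing PdfSize(font, null) with PdfSize(font, FontEmbedMode.NotEmbed) should typically show the non-embedded file to be smaller, at the cost of depending on fonts available to the viewer.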
3. Use FontCollection to optimize PDF documents
When using FontCollection with GcPdf, here are some key points and recommended steps:
Step 1: Create an instance of the FontCollection class.
FontCollection is not a static class, so you need an instance to use it. In addition, it's a regular .NET collection of Font objects, so all the usual collection manipulation methods (Add, Insert, Remove, etc.) can be used on it.
Step 2: Populate the font collection with fonts using any of the following methods:
- RegisterDirectory(): Registers all fonts found in a specified directory;
- RegisterFont(): Registers font(s) found in a specified file;
- Add(Font): Adds a font instance that you created.
Note that registering directories or fonts with a font collection is a fast and lightweight operation. The font collection does not actually load all the font data when directories or individual fonts are registered with it. Instead, it loads only the minimal info so that it can find and provide fonts quickly and efficiently when needed.
Step 3: Assign your instance of the font collection
Assign your font collection instance to TextLayout.FontCollection (and to GcGraphics.FontCollection if using GcGraphics.MeasureString/DrawString) so that the correct fonts can be found.
Step 4: Select fonts by specifying font names
In your text rendering code, select fonts by name (TextFormat.FontName; names must match exactly, although case is ignored) and by the bold and italic flags (TextFormat.FontBold/FontItalic). If a suitable bold/italic version of the requested font is found in the collection, it is used; otherwise, font emulation is applied.
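The four steps above could fit together roughly as follows. This is a sketch under stated assumptions: the Fonts directory and the "Segoe UI" family name are placeholders, and FontCollectionSketch and DrawTitle are hypothetical names.

```csharp
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;
using PointF = System.Drawing.PointF;

public static class FontCollectionSketch
{
    // Step 1: create one FontCollection instance and share it.
    // "Fonts" is a placeholder directory containing .ttf/.otf files.
    public static readonly FontCollection Fonts = CreateFonts("Fonts");

    private static FontCollection CreateFonts(string fontDir)
    {
        var fc = new FontCollection();
        // Step 2: registration is lightweight; font data is loaded only when needed.
        fc.RegisterDirectory(fontDir);
        return fc;
    }

    public static void DrawTitle(Page page, string text)
    {
        // Step 3: point the text layout at the collection...
        var tl = new TextLayout(72) { FontCollection = Fonts };
        // ...and the graphics too, which matters if this page also uses DrawString/MeasureString.
        page.Graphics.FontCollection = Fonts;

        // Step 4: select the font by name (case is ignored) plus bold/italic flags.
        tl.DefaultFormat.FontName = "Segoe UI"; // placeholder family name
        tl.DefaultFormat.FontBold = true;
        tl.DefaultFormat.FontSize = 20;

        tl.Append(text);
        tl.PerformLayout(true);
        page.Graphics.DrawTextLayout(tl, new PointF(72, 72));
    }
}
```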
FontCollection methods and properties are thread-safe, so once your font collection has been populated, you can cache it and share it between sessions and/or modules of your application. You do need to exercise caution when modifying and accessing the font collection simultaneously from different threads, though: another thread may change the collection between the moment you check some condition on it and the moment you act on that check. For such cases, the FontCollection.SyncRoot property is provided and should be used to lock the collection for the duration of both the check and the action.
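The check-then-act pattern described above can be guarded with a lock on SyncRoot. This sketch assumes SyncRoot returns a lock object and that Font exposes a FontFamilyName property; SharedFontsSketch and EnsureRegistered are hypothetical names. The lock only helps if every thread that modifies the shared collection also locks on SyncRoot.

```csharp
using System;
using System.Linq;
using GrapeCity.Documents.Text;

public static class SharedFontsSketch
{
    // Registers 'fontFile' in a shared collection unless a font of the given family
    // is already present. Check and registration run under one lock, so no other
    // thread that also locks on SyncRoot can change the collection in between.
    public static void EnsureRegistered(FontCollection shared, string fontFile, string familyName)
    {
        lock (shared.SyncRoot)
        {
            // Assumed property: Font.FontFamilyName.
            bool present = shared.Any(f =>
                string.Equals(f.FontFamilyName, familyName, StringComparison.OrdinalIgnoreCase));

            if (!present)
                shared.RegisterFont(fontFile);
        }
    }
}
```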
To view sample code showing how to use FontCollection, visit the FontCollection demo.
4. Optimize PDFs by loading fonts from a file
When Font.FromFile() is used, the actual font data is loaded on demand, so a Font instance usually does not take up much memory. The situation is different for fonts created using the Font.FromArray() and Font.FromStream() methods: in those cases, the whole font is loaded into memory immediately, so using those methods is generally not recommended.
When different Font instances (created using any of the static methods mentioned above) are used to render text in a PDF, each instance results in a separate embedded subset of glyphs, even if the glyphs are the same, because GcPdf has no way of knowing that two different Font instances represent the same physical font. So either make sure that only one Font instance is created for each physical font or, better yet, use the FontCollection class to add the fonts you need and specify them via TextFormat.FontName.
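One way to guarantee a single Font instance per physical font is to create it once and hand out the same object everywhere. In this sketch, CachedFonts, Body and DrawPages are hypothetical names, and Fonts/MyFont.ttf is a placeholder path.

```csharp
using System;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;
using PointF = System.Drawing.PointF;

public static class CachedFonts
{
    // One Font instance per physical font file, created on first use.
    // Lazy<T> is thread-safe by default, so concurrent callers get the same object.
    private static readonly Lazy<Font> s_body =
        new Lazy<Font>(() => Font.FromFile("Fonts/MyFont.ttf")); // placeholder path

    public static Font Body => s_body.Value;

    public static void DrawPages(GcPdfDocument doc, int pageCount)
    {
        for (int i = 0; i < pageCount; i++)
        {
            // Every page reuses the same Font object, so a single glyph subset is embedded.
            // Calling Font.FromFile inside this loop instead would create a new instance
            // per page, and each instance would get its own embedded subset.
            var tf = new TextFormat { Font = Body, FontSize = 12 };
            doc.NewPage().Graphics.DrawString($"Page {i + 1}", tf, new PointF(72, 72));
        }
    }
}
```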
To view sample code showing how to load fonts from a file, visit the FontFromFile demo.
5. Use fallback fonts to optimize PDFs
Fallback fonts are used to draw glyphs that are not present in the font specified by the application. GcPdf provides a default list of fallback font families that is initialized automatically and includes large fonts that are usually suitable as fallbacks for the many languages whose glyphs common fonts lack.
These automatically added fallback font families are available via methods on the static FontCollection.SystemFonts collection. You can customize this default (and system-dependent) behavior by providing your own fallback fonts and adding them to the fallbacks managed by the global FontCollection.SystemFonts, to your own instance of FontCollection, or to specific fonts that you are using. In this way, fallback font behavior can be finely tuned and made completely system-independent.
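A system-independent fallback setup might look like the following sketch. It assumes FontCollection provides an AppendFallbackFonts method for adding fallbacks to an instance (check the API reference for your GcPdf version); the directory, font file, font name and the FallbackSketch class are placeholders.

```csharp
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;
using PointF = System.Drawing.PointF;

public static class FallbackSketch
{
    // Builds a private collection with its own fallback list, independent of the host OS.
    public static FontCollection CreateFonts(string fontDir, string fallbackFontFile)
    {
        var fc = new FontCollection();
        fc.RegisterDirectory(fontDir);

        // Assumed API: AppendFallbackFonts adds fonts to this collection's fallback list.
        fc.AppendFallbackFonts(Font.FromFile(fallbackFontFile));
        return fc;
    }

    public static void DrawMixedText(Page page, FontCollection fc, string fontName)
    {
        var tl = new TextLayout(72) { FontCollection = fc };
        tl.DefaultFormat.FontName = fontName;
        tl.DefaultFormat.FontSize = 14;

        // Glyphs missing from 'fontName' (here, the Japanese characters) should be
        // taken from the fallback font rather than rendered as missing glyphs.
        tl.Append("Latin text and 日本語テキスト");
        tl.PerformLayout(true);
        page.Graphics.DrawTextLayout(tl, new PointF(72, 72));
    }
}
```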
To view sample code showing how to use fallback fonts, visit the Fallback Fonts demo.
6. Improve text rendering
Text rendering using GcPdf can be done using two main approaches:
- Using the MeasureString/DrawString pair, or,
- Using the TextLayout directly
While the first approach may be easier in simple cases, the second approach (using TextLayout) is much more powerful and yields better performance.
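The two approaches side by side, as a sketch: DrawString for a one-off label and a TextLayout for wrapped, multi-format text. TextRenderingSketch is a hypothetical name; the 72 passed to TextLayout is the layout resolution, which makes layout units PDF points.

```csharp
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;
using Color = System.Drawing.Color;
using PointF = System.Drawing.PointF;

public static class TextRenderingSketch
{
    public static void Draw(GcPdfDocument doc, string longText)
    {
        var page = doc.NewPage();
        var g = page.Graphics;
        var body = new TextFormat { Font = StandardFonts.Times, FontSize = 12 };

        // Approach 1: DrawString, convenient for a short single-format string.
        g.DrawString("Quarterly report", body, new PointF(72, 72));

        // Approach 2: TextLayout. Several formatted runs, wrapped to a width,
        // laid out once and then drawn; the same object can be reused later.
        var tl = new TextLayout(72) { MaxWidth = page.Size.Width - 144 }; // 1-inch side margins
        tl.Append(longText, body);
        tl.Append(" (end of summary)",
            new TextFormat { Font = StandardFonts.Times, FontSize = 12, ForeColor = Color.Gray });
        tl.PerformLayout(true);
        g.DrawTextLayout(tl, new PointF(72, 100));
    }
}
```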
To view sample code showing how to render text using GcPdf, visit the Text Rendering demo.
7. Load images once and cache in an image object
When you render the same image multiple times in GcPdf (e.g., as part of a page header on every page), GcPdf automatically adds it to a dictionary and reuses it throughout the document, provided you use the same image object on all pages. So rather than loading the same image from a file (or stream) each time it is needed, load it once and cache it in an image object. This applies to all image types available in GcPdf (Image, RawImage, ImageWrapper).
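A sketch of the load-once pattern for a page header logo: the caller creates the image object and passes it in, and the same instance is drawn on every page. HeaderImageSketch and AddPagesWithLogo are hypothetical names, and the header rectangle is arbitrary.

```csharp
using GrapeCity.Documents.Drawing;
using GrapeCity.Documents.Pdf;
using RectangleF = System.Drawing.RectangleF;

public static class HeaderImageSketch
{
    // Draws the same image object in the header of every new page.
    // Because 'logo' is a single instance, GcPdf can store its data in the PDF once.
    public static void AddPagesWithLogo(GcPdfDocument doc, IImage logo, int pageCount)
    {
        var dest = new RectangleF(36, 18, 120, 40); // x, y, width, height in points

        for (int i = 0; i < pageCount; i++)
        {
            var g = doc.NewPage().Graphics;
            g.DrawImage(logo, dest, null, ImageAlign.Default);
        }
    }
}
```

A caller would load the file once, e.g. with GrapeCity.Documents.Drawing.Image.FromFile("logo.png") (a placeholder path), call AddPagesWithLogo, save the document, and release the image only after Save() has completed, since the image data may still be read while saving.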
To view sample code showing how to render images, visit the Images demo.
8. Loading PDFs
When working with an existing PDF file using the GcPdfDocument.Load() method, the stream passed to that method must remain open while you work with the document. This is because Load() does not read the entire PDF into memory right away; instead, it loads the various parts of the PDF as needed. The stream is only used for reading, and the original file itself is not modified. To save your changes, call one of the GcPdfDocument.Save() overloads as usual.
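The following sketch shows the stream lifetime in practice: the input stream stays open from Load() until after Save(), and the result goes to a separate file. EditPdfSketch and AddBlankPage are hypothetical names.

```csharp
using System.IO;
using GrapeCity.Documents.Pdf;

public static class EditPdfSketch
{
    // Appends a blank page to an existing PDF and writes the result to a new file.
    public static void AddBlankPage(string inputPath, string outputPath)
    {
        // Load() reads parts of the PDF lazily, so 'input' must stay open
        // for as long as 'doc' is being used, including during Save().
        using var input = File.OpenRead(inputPath);
        var doc = new GcPdfDocument();
        doc.Load(input);

        doc.NewPage();

        // The source file is only read; changes are written to a separate file.
        doc.Save(outputPath);
    } // 'input' is disposed here, after Save() has completed.
}
```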
To view sample code showing how to load PDFs (and modify them), visit the Editing demo.