Document Solutions for PDF
    Document Optimization

    Optimizing a document can significantly reduce its size, making it faster to load, read, and share. DsPdf provides several options for optimizing PDF documents without compromising their quality or integrity. The following sections describe these options:

    Remove Duplicate Images

    DsPdf allows you to reduce the size of a document using the RemoveDuplicateImages method of the GcPdfDocument class. This method finds identical images that are stored more than once in the document and keeps a single instance that every location references, which reduces the size of the document.

    Refer to the following example code demonstrating how to optimize the file size of a document using RemoveDuplicateImages method:

    C#
    // Initialize GcPdfDocument.
    GcPdfDocument doc = new GcPdfDocument();
    
    // Open the PDF document as a file stream; the using declaration disposes it at the end of the scope.
    using FileStream fs = File.OpenRead("Invoice.pdf");
    
    // Load the PDF document.
    doc.Load(fs);
    
    // Remove duplicate images.
    doc.RemoveDuplicateImages();
    
    // Save PDF document.
    doc.Save("RemovedDuplicateImages.pdf");
    
    Note: You can also optimize the file size of a merged PDF document. See Remove Duplicate Images from Merged Document.
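
    The idea behind duplicate-image removal can be illustrated with a minimal sketch that keys each image on a hash of its bytes and keeps one shared instance per key. This is not DsPdf's implementation: the byte arrays are hypothetical stand-ins for image streams, and the sketch uses only .NET base class library calls, written as a top-level program (.NET 6 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical stand-ins for image streams referenced from several pages.
byte[] logo = { 1, 2, 3, 4 };
byte[] logoCopy = { 1, 2, 3, 4 };   // same bytes, stored as a separate object
byte[] photo = { 9, 8, 7 };
var pageImages = new List<byte[]> { logo, logoCopy, photo, logoCopy };

// Map a content hash to the single instance that is kept.
var kept = new Dictionary<string, byte[]>();
var remapped = new List<byte[]>();
foreach (var img in pageImages)
{
    string key = Convert.ToHexString(SHA256.HashData(img));
    if (!kept.TryGetValue(key, out var shared))
    {
        shared = img;
        kept[key] = shared;
    }
    // Every reference now points at the shared instance.
    remapped.Add(shared);
}

Console.WriteLine($"References: {remapped.Count}, stored images: {kept.Count}");
// Prints: References: 4, stored images: 2
```

    The four references survive, but only two distinct images remain stored. A production implementation would typically also compare the bytes themselves rather than rely on the hash alone.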

    Optimize Document Fonts and Font Formats

    Optimize Fonts

    DsPdf enables you to optimize font usage by merging subsets of the same font and removing duplicate and unused fonts using the OptimizeFonts method of the GcPdfDocument class. DsPdf also provides the OptimizeFontsOptions class, whose properties let you control the behavior of the OptimizeFonts method.

    Refer to the following example code to optimize font usage:

    C#
    static void Main(string[] args)
    {
        // Create a 5 page non-optimal PDF.
        var tmpInput = MakeInputFile("CompleteJavaScriptBook.pdf");
        var fiInput = new FileInfo(tmpInput);
    
        // Create a new PDF, load the source PDF into it, and optimize it.
        var tmpOutput = Path.GetTempFileName();
        var tmpDoc = new GcPdfDocument();
        using (var fs = File.OpenRead(tmpInput))
        {
            tmpDoc.Load(fs);
    
            // By default GcPdfDocument uses CompressionLevel.Fastest when saving a PDF.
            // Set CompressionLevel to Optimal to reduce the size of the PDF.
            tmpDoc.CompressionLevel = CompressionLevel.Optimal;
    
            // Optimize the font usage.
            tmpDoc.OptimizeFonts();
            tmpDoc.Save(tmpOutput);
        }
        var fiOutput = new FileInfo(tmpOutput);
    
        // Record the input and output file sizes in the resultant PDF.
        var doc = new GcPdfDocument();
        Common.Util.AddNote(String.Format(
            "Using the GcPdfDocument.OptimizeFonts() method reduced the size of a 5-page PDF from {0:N0} to {1:N0} bytes " +
            "by merging duplicate and removing unused font data.\n" +
            "To reproduce these results locally, download and run this sample. You may also modify the sample code to keep the temporary " +
            "input and output files and compare their sizes using a file manager.", fiInput.Length, fiOutput.Length),
            doc.NewPage());
        doc.Save("OptimizeFonts.pdf");
    
        // Delete the temp files.
        File.Delete(tmpInput);
        File.Delete(tmpOutput);
    }
    
    // Create a method to make input file.
    static string MakeInputFile(string inFn)
    {
        // Initialize GcPdfDocument.
        var indoc = new GcPdfDocument();
    
        // Load the PDF document.
        using var fs = File.OpenRead(inFn);
        indoc.Load(fs);
    
        // Create 5 PDFs from the first 5 pages of the source document.
        var pageCount = 5;
        var docs = new List<GcPdfDocument>(pageCount);
        for (int i = 0; i < pageCount; ++i)
        {
            var outdoc = new GcPdfDocument();
            outdoc.MergeWithDocument(indoc, new MergeDocumentOptions() { PagesRange = new OutputRange(i + 1, i + 1) });
            docs.Add(outdoc);
        }
    
        // Merge the PDFs into a single document.
        var doc = new GcPdfDocument();
        foreach (var d in docs)
            doc.MergeWithDocument(d);
    
        // Save the resultant PDF in a temp file.
        var outFn = Path.GetTempFileName();
        doc.Save(outFn);
        return outFn;
    }
    
    Note: DsPdf only supports TrueType fonts.
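
    As a rough illustration of what merging font subsets and removing unused fonts means, the following sketch models each embedded subset as a set of glyphs. The tuple layout, the font names, and the Used flag are hypothetical and are not part of the DsPdf API; the code only shows the set logic, written as a top-level program (.NET 6 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical embedded subsets: font name, glyphs in the subset, and whether any text uses it.
var subsets = new List<(string Font, string Glyphs, bool Used)>
{
    ("Gabriola", "Helo", true),
    ("Gabriola", "World", true),
    ("Times", "abc", false),      // embedded but never referenced
};

// Drop unused fonts, then merge the remaining subsets of each font into one glyph set.
var merged = subsets
    .Where(s => s.Used)
    .GroupBy(s => s.Font)
    .ToDictionary(g => g.Key, g => new SortedSet<char>(g.SelectMany(s => s.Glyphs)));

foreach (var (font, glyphs) in merged)
    Console.WriteLine($"{font}: {glyphs.Count} glyphs in one merged subset");
// Prints: Gabriola: 7 glyphs in one merged subset
```

    Two Gabriola subsets that shared the glyphs "l" and "o" collapse into one subset of seven glyphs, and the unreferenced font disappears. Documents assembled from several merged PDFs, as in the sample above, tend to contain this kind of overlap.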

    Optimize Font Format

    By default, DsPdf uses the one-byte encoding format, Type0AutoOneByteEncoding. This format usually produces smaller PDF content than Type0IdentityEncoding. However, if only a small amount of text (fewer than ~1000 symbols) is rendered with a font, the resulting PDF content may be larger than with Type0IdentityEncoding, because Type0AutoOneByteEncoding must store additional encoding information. The extra size is ~1 KB, depending on the number of unique characters used in the text.
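
    The trade-off can be checked with simple arithmetic. The sketch below assumes a flat 1,024-byte encoding overhead for one-byte encoding and ignores stream compression and the size of the font data itself; both are simplifications for illustration, not measured DsPdf behavior.

```csharp
using System;

// Assumption: one-byte encoding adds a flat 1,024 bytes of encoding information.
const int OneByteOverhead = 1024;

// Text bytes under each format: 1 byte per character plus overhead, or 2 bytes per character.
int OneByteSize(int chars) => chars * 1 + OneByteOverhead;
int IdentitySize(int chars) => chars * 2;

foreach (int chars in new[] { 200, 1024, 5000 })
{
    Console.WriteLine($"{chars,5} chars: one-byte = {OneByteSize(chars),5} bytes, identity = {IdentitySize(chars),5} bytes");
}
```

    Under these assumptions the two formats break even at 1,024 characters: below that, two-byte identity encoding is smaller, and above it, one-byte encoding wins by a growing margin.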

    DsPdf allows you to set the encoding type used to represent fonts in a PDF document through the PdfFontFormat property of the GcPdfDocument and FontHandler classes. This property takes a value of the PdfFontFormat enumeration.

    PdfFontFormat enumeration provides the following options that define the encoding type:

    Option | Description
    Type0AutoOneByteEncoding | Saves the font as one or more Type0 PDF fonts, where each character is encoded by one byte.
    Type0IdentityEncoding | Saves the font as a single Type0 font with Identity encoding, where each character is encoded with two bytes.

    Refer to the following example code to define the encoding type:

    C#
    // Load the font from file.
    var gabriola = GCTEXT.Font.FromFile(Path.Combine("Resources", "Fonts", "Gabriola.ttf"));
    if (gabriola == null)
        throw new Exception("Could not load font Gabriola");
    
    // Render the text using the font.
    var tf = new TextFormat() { Font = gabriola, FontSize = 16 };
    
    // Initialize GcPdfDocument.
    var doc = new GcPdfDocument();
    var g = doc.NewPage().Graphics;
    
    // Set PdfFontFormat to Type0IdentityEncoding.
    doc.PdfFontFormat = PdfFontFormat.Type0IdentityEncoding;
    
    // Draw the string.
    g.DrawString($"Sample text drawn with font {gabriola.FontFamilyName}.", tf, new PointF(72, 72));
                
    // Change the font size.
    tf.FontSize += 4;
                
    // Draw the string.
    g.DrawString("The quick brown fox jumps over the lazy dog.", tf, new PointF(72, 72 * 2));
    
    // Emulate bold or italic style with a non-bold (non-italic) font.
    tf.FontStyle = GCTEXT.FontStyle.Bold;
    
    // Draw the string.
    g.DrawString("This line prints with the same font, using emulated bold style.", tf, new PointF(72, 72 * 3));
    
    // Set bold italic font and print a line with it.
    var timesbi = GCTEXT.Font.FromFile(Path.Combine("Resources", "Fonts", "timesbi.ttf"));
    tf.Font = timesbi ?? throw new Exception("Could not load font timesbi");
    tf.FontStyle = GCTEXT.FontStyle.Regular;
    
    // Draw the string.
    g.DrawString($"This line prints with {timesbi.FullFontName}.", tf, new PointF(72, 72 * 4));
                
    // Save the PDF document.
    doc.Save("OptimizeFontFormat.pdf");
    
    Note: Set the PdfFontFormat property before adding any text to the document; otherwise, an exception is thrown. The exception occurs when the desired font embedding mode for any font in the document is other than embed.
    Note: The PdfFontFormat property does not affect the standard PDF fonts, because the PDF specification requires them to be saved as PDF Type 1 fonts.

    Limitations

    If a font is not embedded, DsPdf uses Type0IdentityEncoding regardless of the user’s selection, because Acrobat Reader otherwise renders such PDFs with heavy distortion.

    Optimize Document Size with Object Streams

    DsPdf enables the use of object streams when saving a PDF document through the UseObjectStreams property in the SavePdfOptions class. This property utilizes the UseObjectStreams enumeration to specify whether to use object streams and, if so, determine the type of object streams to apply.

    An object stream is a stream object that stores a sequence of indirect objects. Because the stream content is compressed (DsPdf applies the CompressionLevel property), the objects take less space than when they are stored individually at the file's outermost level. Object streams can therefore significantly reduce the size of PDF documents.
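
    To see why packing objects into a compressed stream saves space, the following sketch compares the raw size of 200 small dictionary-like objects with the size of the same bytes after Deflate compression. The object content is made up, and the sketch simplifies the real object stream layout (it keeps the obj/endobj wrappers and omits the offset table), so treat the numbers as indicative only.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Build 200 small objects, similar in shape to PDF indirect objects (invented content).
var sb = new StringBuilder();
for (int i = 1; i <= 200; i++)
    sb.Append($"{i} 0 obj\n<< /Type /Annot /Subtype /Link /Rect [72 {i} 144 {i + 12}] >>\nendobj\n");
byte[] raw = Encoding.ASCII.GetBytes(sb.ToString());

// Pack all objects into one Deflate-compressed stream.
using var packed = new MemoryStream();
using (var deflate = new DeflateStream(packed, CompressionLevel.Optimal, leaveOpen: true))
    deflate.Write(raw, 0, raw.Length);

Console.WriteLine($"Stored individually, uncompressed: {raw.Length} bytes");
Console.WriteLine($"Packed into one compressed stream: {packed.Length} bytes");
```

    Small objects share most of their keys and structure, so compressing them together removes a large share of the redundancy that remains when each one is written uncompressed at the top level of the file.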

    The SavePdfOptions class gives you precise control over how a PDF is saved. You can pass an instance of this class to the Save, Sign, and TimeStamp methods of the GcPdfDocument class. The SavePdfOptions class provides the following properties:

    Property | Description
    PdfStreamHandling | Sets a value controlling how existing PDF streams are handled when the document is saved, using the PdfStreamHandling enumeration. The enumeration provides the following options:
    • Copy: Copy the content of the original stream to the output as is, without any changes.
    • UseCompressionLevel: Decompress existing PDF streams and recompress them using the CompressionLevel property.
    • MinimizeSize: Copy the existing streams as is or recompress them, whichever achieves the smallest possible size.
    Mode | Sets a value specifying the PDF save mode, using the SaveMode enumeration. The enumeration provides the following save modes:
    • Default: The PDF is not linearized, and incremental update is not used.
    • Linearized: The document is saved as a linearized ("fast web view") PDF.
    • IncrementalUpdate: The document is saved using incremental update.
    UseObjectStreams | Sets a value indicating whether to use object streams when saving the PDF, using the UseObjectStreams enumeration. The enumeration provides the following options:
    • None: Do not use object streams.
    • Single: Use a single object stream for the entire document. This option can significantly reduce the PDF file size, though it may introduce a delay when opening the PDF in a viewer.
    • Multiple: Use multiple object streams. This may slightly increase the file size compared to Single, but the PDF will open without delay in a viewer.

    Refer to the following example code to use multiple object streams to reduce the PDF document size:

    C#
    static void Main(string[] args)
    {
        // Create a large non-optimal PDF (saved without object streams, see MakeInputFile below).
        var tmpInput = MakeInputFile();
        var fiInput = new FileInfo(tmpInput);
    
        // Create a new PDF, load the source PDF into it, and optimize it.
        var tmpOutput = Path.GetTempFileName();
        var tmpDoc = new GcPdfDocument();
        using (var fs = File.OpenRead(tmpInput))
        {
            tmpDoc.Load(fs);
    
            // By default GcPdfDocument uses CompressionLevel.Fastest when saving a PDF.
            // Set CompressionLevel to Optimal.
            tmpDoc.CompressionLevel = CompressionLevel.Optimal;
    
            // Minimize stream sizes using object streams.
            tmpDoc.Save(tmpOutput, new SavePdfOptions(SaveMode.Default, PdfStreamHandling.MinimizeSize, UseObjectStreams.Multiple));
        }
        var fiOutput = new FileInfo(tmpOutput);
    
        // Record the input and output file sizes in the resultant PDF.
        var doc = new GcPdfDocument();
        Common.Util.AddNote(String.Format(
            "Using the UseObjectStreams.Multiple option when saving a PDF will in most cases reduce the resulting file size, " +
            "sometimes significantly. In this case the size of the PDF generated by the 'Large Document' sample decreased " +
            "from {0:N0} to {1:N0} bytes, without any loss in fidelity or PDF opening speed.\n" +
            "Using the UseObjectStreams.Single option yields an even slightly smaller PDF size at the cost of slower opening in PDF viewers.\n" +
            "To reproduce these results locally, download and run this sample, specifying a valid license key " +
            "(otherwise loading is limited to 5 pages, and the size reduction may be too small). " +
            "You may also modify the sample code to keep the temporary " +
            "input and output files, and compare their sizes using a file manager.", fiInput.Length, fiOutput.Length),
            doc.NewPage());
    
        // Save the resultant PDF document.
        doc.Save("ObjectStreams.pdf");
    
        // Delete the temp files.
        File.Delete(tmpInput);
        File.Delete(tmpOutput);
    }
    
    // Create method to make input file.
    static string MakeInputFile()
    {
        // Set number of pages to generate.
        const int N = Common.Util.LargeDocumentIterations;
        var start = Common.Util.TimeNow();
        var doc = new GcPdfDocument();
    
        // Create a TextLayout to hold/format the text.
        var tl = new TextLayout(72)
        {
            MaxWidth = doc.PageSize.Width,
            MaxHeight = doc.PageSize.Height,
            MarginAll = 72,
            FirstLineIndent = 36,
        };
        tl.DefaultFormat.Font = StandardFonts.Times;
        tl.DefaultFormat.FontSize = 12;
    
        // Generate the PDF document.
        for (int pageIdx = 0; pageIdx < N; ++pageIdx)
        {
            tl.Append(Common.Util.LoremIpsum(1));
            tl.PerformLayout(true);
            doc.NewPage().Graphics.DrawTextLayout(tl, PointF.Empty);
            tl.Clear();
        }
    
        // Insert a title page (cannot be done if using StartDoc/EndDoc).
        tl.FirstLineIndent = 0;
        var fnt = GCTEXT.Font.FromFile(Path.Combine("Resources", "Fonts", "yumin.ttf"));
        var tf0 = new TextFormat() { Font = fnt, FontSize = 24, FontBold = true };
        tl.Append(string.Format("Large Document\n{0} Pages of Lorem Ipsum\n\n", N), tf0);
        var tf1 = new TextFormat(tf0) { FontSize = 14, FontItalic = true };
        tl.Append(string.Format("Generated on {0} in {1:m\\m\\ s\\s\\ fff\\m\\s}.", Common.Util.TimeNow().ToString("R"), Common.Util.TimeNow() - start), tf1);
        tl.TextAlignment = TextAlignment.Center;
        tl.PerformLayout(true);
        doc.Pages.Insert(0).Graphics.DrawTextLayout(tl, PointF.Empty);
    
        // Save the resultant PDF in a temp file with UseObjectStreams.None (it is the default).
        var outFn = Path.GetTempFileName();
        doc.Save(outFn, new SavePdfOptions(SaveMode.Default, PdfStreamHandling.Copy, UseObjectStreams.None));
        return outFn;
    }
    
    Note: By default, the PdfStreamHandling property is set to Copy, which copies the existing PDF streams to the destination document as is. The UseCompressionLevel option does not guarantee a smaller PDF, because the original streams may have been compressed more effectively than DsPdf compresses them. The MinimizeSize option, in contrast, guarantees that the size of the resulting PDF does not increase, and in most cases it decreases. In this mode, DsPdf decompresses each existing stream (if it was compressed), recompresses it using the CompressionLevel property of the GcPdfDocument class, and compares the size of the recompressed data with the original size. The stream is replaced only if the new size is smaller. This option yields the minimum size but may increase the document's saving time.
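
    The keep-whichever-is-smaller rule described for MinimizeSize can be sketched with the .NET DeflateStream class. The helper names Deflate, Inflate, and MinimizeSize are hypothetical and exist only in this sketch; it illustrates the comparison logic under the assumption of raw Deflate data and does not reproduce DsPdf's internal code.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Compress a buffer with Deflate at the given level.
byte[] Deflate(byte[] data, CompressionLevel level)
{
    using var ms = new MemoryStream();
    using (var ds = new DeflateStream(ms, level, leaveOpen: true))
        ds.Write(data, 0, data.Length);
    return ms.ToArray();
}

// Decompress a Deflate buffer.
byte[] Inflate(byte[] data)
{
    using var input = new DeflateStream(new MemoryStream(data), CompressionMode.Decompress);
    using var output = new MemoryStream();
    input.CopyTo(output);
    return output.ToArray();
}

// Recompress, but keep the existing stream unless the new one is strictly smaller.
byte[] MinimizeSize(byte[] existing, CompressionLevel level)
{
    byte[] candidate = Deflate(Inflate(existing), level);
    return candidate.Length < existing.Length ? candidate : existing;
}

// Sample content with a repeating pattern, so it compresses well.
byte[] content = new byte[20000];
for (int i = 0; i < content.Length; i++) content[i] = (byte)(i % 64);

byte[] loose = Deflate(content, CompressionLevel.NoCompression);
byte[] tight = Deflate(content, CompressionLevel.Optimal);

Console.WriteLine($"Loosely compressed stream: {loose.Length} -> {MinimizeSize(loose, CompressionLevel.Optimal).Length} bytes");
Console.WriteLine($"Already compact stream:    {tight.Length} -> {MinimizeSize(tight, CompressionLevel.Fastest).Length} bytes");
```

    Because the result is replaced only when the recompressed data is strictly smaller, the output can never grow, which is the guarantee the note attributes to MinimizeSize. The cost is the extra decompress and recompress pass on every stream.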

    Limitations

    DsPdf does not save the document using object streams:

    Note: The code examples in this topic use a helper file named Util.cs to add content to the PDF document. To run the examples directly, download the file from here.