DsPdf is a collection of cross-platform .NET class libraries written in C#. It provides an API for creating PDF files from scratch as well as for loading, analyzing, and modifying existing documents.
DsPdf is compatible with .NET Core 2.x/3.x, .NET Standard 2.x, .NET Framework 4.6.2 or higher, and .NET 6 or higher.
DsPdf and supporting packages are available on nuget.org:
To use DsPdf in an application, simply reference the DS.Documents.Pdf package. All other required packages that DsPdf utilizes will be installed automatically.
To render barcodes, install the DS.Documents.Barcode package (DsBarcode for short). It provides extension methods that let you draw barcodes when using DsPdf.
DS.Documents.DX.Windows gives DsPdf access to the native imaging APIs when it runs on a Windows system.
Classes and other types in the DsPdf and related libraries expose a PDF object model that closely follows the Adobe PDF specification version 2.0. Whenever feasible, DsPdf is designed to provide direct access to all features of the PDF format, including the low-level ones. In addition, DsPdf provides a powerful, platform-independent text layout engine and other high-level features that make document creation easy and convenient.
Namespaces | Description |
---|---|
GrapeCity.Documents.Drawing | Framework for drawing on the abstract GcGraphics surface. |
GrapeCity.Documents.Pdf | Types used to create, process, and modify PDF documents, including GcPdfGraphics. Nested namespaces contain types supporting specific PDF spec areas: |
GrapeCity.Documents.Text | Text processing sub-system. |
A PDF document in DsPdf is represented by an instance of the GrapeCity.Documents.Pdf.GcPdfDocument class. To create a new PDF, create an instance of GcPdfDocument, add content to it, and then call one of the GcPdfDocument.Save() overloads to write the document to a file. The Save() method can be called multiple times on the same GcPdfDocument instance, so many (possibly different) PDF documents can be created from it.
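The workflow described above can be sketched as follows. This is a minimal example, assuming the DS.Documents.Pdf package is referenced and a C# version that allows top-level statements; the text, font choice, and output file name are illustrative only.

```csharp
using System.Drawing;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;

// Create an empty document and add one page to it.
var doc = new GcPdfDocument();
var page = doc.NewPage();

// Draw a string one inch from the top left corner (72 points = 1 inch).
var format = new TextFormat { Font = StandardFonts.Times, FontSize = 14 };
page.Graphics.DrawString("Hello, DsPdf!", format, new PointF(72, 72));

// Write the document to a file (the file name is an assumption for this sketch).
doc.Save("hello.pdf");
```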
GcPdfDocument also provides a Load() method, which allows an existing PDF to be analyzed and/or modified. When Load() is called on an instance of GcPdfDocument, the instance is cleared first. Note that Load() accepts a Stream opened by the caller on the PDF to load; the stream must be readable and must be kept open for as long as the loaded document is in use. This is because Load() does not read the whole document into memory. Instead, it loads the required parts on demand, which keeps the memory footprint to a minimum and improves performance. Load() is also a "read-only" method: GcPdfDocument never tries to write back to the loaded stream. To save any changes made to the document, call Save(), specifying the output file or stream; the result is written as a newly created document.
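A minimal sketch of the load, modify, and save cycle, assuming a file named input.pdf exists in the working directory (both file names are placeholders):

```csharp
using System.IO;
using GrapeCity.Documents.Pdf;

// The input stream must stay open while the loaded document is in use,
// because content is read from it on demand.
using (var input = File.OpenRead("input.pdf"))
{
    var doc = new GcPdfDocument();
    doc.Load(input);

    // Inspect or modify the document here; as an example, append a blank page.
    doc.NewPage();

    // Changes are never written back to the input stream,
    // so the result is saved to a separate file.
    doc.Save("output.pdf");
}
```

Keeping the Save() call inside the using block ensures the source stream is still open when the remaining content is fetched from it.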
A number of properties and collections on the GcPdfDocument provide access to the content and properties of the document. The most important collection is Pages (see The Pages Collection), others include Outlines, AcroForm, Security and so on.
The Pages collection represents the collection of a document's pages. When a new GcPdfDocument is created, this collection is initially empty. The usual collection modifying methods are available and can be used to fetch, add, insert, remove or move pages around. When an existing PDF is loaded into a GcPdfDocument, the Pages collection is filled with the pages loaded from that document. It can then be modified in the same way as in a document created from scratch.
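The following sketch illustrates a few of these collection operations. It assumes the parameterless Add(), Insert(int), and RemoveAt(int) members of the Pages collection; check the API reference for the full set of overloads before relying on it.

```csharp
using System;
using GrapeCity.Documents.Pdf;

var doc = new GcPdfDocument();

// A new document starts with an empty Pages collection.
Console.WriteLine(doc.Pages.Count);

// Append two pages, then insert a page in front of them.
doc.Pages.Add();
doc.Pages.Add();
doc.Pages.Insert(0);

// Remove the last page again and report how many pages remain.
doc.Pages.RemoveAt(doc.Pages.Count - 1);
Console.WriteLine(doc.Pages.Count);
```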
Using the GcPdfDocument.Load() method, existing documents can be inspected and modified. The possible modifications include:
No other modifications are supported at this time. For example, it is currently not possible to replace existing text or graphics, except by removing existing and adding new content streams.
It should be noted again that when an existing document is loaded into a GcPdfDocument instance, the connection with the original document is read-only, i.e. content is fetched as needed from the underlying stream, but no attempt is made to write back the changes. The GcPdfDocument.Save() method should be called if preserving the changes is required.
In addition to the Save() method mentioned above, GcPdfDocument provides a sequential mode for creating a PDF. To use this mode, start by calling the StartDoc() method on the document, specifying a writable Stream as the method's only parameter. After that, content can be added to the document as usual, subject to the limitations of this mode. When done, call the EndDoc() method, which completes writing the document.
The limitations of the sequential mode are as follows:
The advantage of the sequential mode is that the pages of the document are written to the underlying stream as soon as they are completed, so the memory footprint can be much smaller, especially when creating a very large PDF.
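A sketch of the sequential mode, assuming the DS.Documents.Pdf package is referenced; the page count, text, and output file name are arbitrary choices for illustration:

```csharp
using System.Drawing;
using System.IO;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;

using (var output = File.Create("large.pdf"))
{
    var doc = new GcPdfDocument();

    // Switch to sequential mode: completed pages go straight to the stream.
    doc.StartDoc(output);

    var format = new TextFormat { Font = StandardFonts.Helvetica, FontSize = 12 };
    for (int i = 1; i <= 1000; i++)
    {
        var page = doc.NewPage();
        page.Graphics.DrawString($"Page {i}", format, new PointF(72, 72));
    }

    // Finish writing the document to the stream.
    doc.EndDoc();
}
```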
Text measuring and layout are supported by a specialized set of classes in the GrapeCity.Documents.Text namespace. These classes provide a rich object model that gives access to text elements from the highest level (paragraphs) all the way down to the lowest levels, such as individual font and glyph features. Text processing is completely platform-independent and does not rely on any operating system-provided APIs.
The most important class in the GrapeCity.Documents.Text namespace is TextLayout. It represents one or more paragraphs of text and supports the following features:
All features are fully supported for vertical (Chinese or Japanese) and RTL/bidirectional text.
After a text has been added to, and processed by, an instance of the TextLayout class, a representation of the text is generated using the glyphs from the specified fonts, and coordinates of any fragment of the original text in the generated layout can be fetched, if necessary.
A TextLayout instance can also be directly rendered onto GcGraphics (see Graphics) using the DrawTextLayout method. Simple MeasureString/DrawString methods on GcGraphics are also provided for convenience.
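As a rough illustration of using TextLayout together with DrawTextLayout, the sketch below lays out two paragraphs in a 6-inch wide column. The font, sizes, text, and file name are assumptions made for the example.

```csharp
using System.Drawing;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Text;

var doc = new GcPdfDocument();
var page = doc.NewPage();
var g = page.Graphics;

// Create a layout tied to this graphics and set the default text format.
var tl = g.CreateTextLayout();
tl.DefaultFormat.Font = StandardFonts.Times;
tl.DefaultFormat.FontSize = 12;

// Wrap text at 6 inches (the default unit is the point, 1/72 of an inch).
tl.MaxWidth = 72f * 6;

tl.Append("First paragraph of the sample text.\n");
tl.Append("Second paragraph, long enough that it may wrap onto another line.");

// Calculate glyphs and positions, then render one inch from the top left corner.
tl.PerformLayout(true);
g.DrawTextLayout(tl, new PointF(72, 72));

doc.Save("layout.pdf");
```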
DsPdf provides a graphics surface to draw on, represented by the GcPdfGraphics class, which is an implementation of the abstract GcGraphics base class. GcPdfGraphics provides a flexible and rich object model for measuring, stroking, and filling the usual graphic primitives such as lines, rectangles, polygons, ellipses and so on. Shapes can be stroked with solid or dashed lines and filled with solid or gradient brushes. For examples of shape rendering methods, see GcPdfGraphics.DrawEllipse() and GcPdfGraphics.FillEllipse(). Complex shapes can be created and rendered using graphic paths; for example, see the GcPdfGraphics.DrawPath() method.
Graphics transformations using 3x2 matrices are fully supported (including for text). For more information, see the GcPdfGraphics.Transform property.
The default units of measurement used by GcPdfGraphics and TextLayout are printer points (1/72 of an inch). If desired, these can be changed to an arbitrary resolution using the Resolution property available on both GcPdfGraphics and TextLayout classes.
Coordinates of all graphic objects are measured from the top left corner of the graphics surface (which in GcPdfGraphics is usually a page). GcPdfGraphics.Transform can be used to change that.
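The drawing model described above can be sketched as follows. The example assumes the overloads of FillEllipse, DrawEllipse, and DrawRectangle that take a System.Drawing.Color (and an optional line width), and it sets the transformation through the Transform property; the shapes, colors, and file name are illustrative.

```csharp
using System;
using System.Drawing;
using System.Numerics;
using GrapeCity.Documents.Pdf;

var doc = new GcPdfDocument();
var g = doc.NewPage().Graphics;

// Coordinates are in points, measured from the top left corner of the page.
var rc = new RectangleF(72, 72, 144, 72);

// Fill the ellipse first, then stroke its outline with a 2-point line.
g.FillEllipse(rc, Color.LightYellow);
g.DrawEllipse(rc, Color.DarkBlue, 2);

// Rotate subsequent drawing by 30 degrees around the center of the rectangle.
var center = new Vector2(rc.X + rc.Width / 2, rc.Y + rc.Height / 2);
g.Transform = Matrix3x2.CreateRotation((float)(Math.PI / 6), center);
g.DrawRectangle(rc, Color.Red, 1);

// Restore the default (identity) transformation.
g.Transform = Matrix3x2.Identity;

doc.Save("shapes.pdf");
```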
To draw on a page in a PDF document, an instance of GcPdfGraphics must be used for each page. Each page in the GcPdfDocument.Pages collection has a Graphics property that fetches the graphics for that page. You can simply get that property and draw on the returned graphics instance. Initially each page has just one graphics associated with it. But if the page contains multiple content streams, each content stream has its own graphics, and the Page.Graphics property returns the graphics of the last (top-most) content stream. (All content streams of the page can be accessed via its ContentStreams collection.)
DsHtml is a utility library that renders HTML to a PDF file or to an image in PNG, JPEG, or WebP format. DsHtml uses a Chrome or Edge browser (already installed on the current system, or downloaded from a public web site) in headless mode. It does not matter whether your .NET application is built for the x64, x86 or AnyCPU platform target, because the browser runs continuously in a separate process.
The DS.Documents.Html library consists of a platform-independent main package that exposes the HTML rendering functionality. The main package contains the following namespaces:
Namespaces | Description |
---|---|
 | Provides the extension methods for rendering HTML to a PDF file and represents the formatting attributes for rendering HTML to a PDF file. The namespace comprises the following classes: |
GrapeCity.Documents.Html | Provides methods for converting HTML to PDF or images and defines parameters for the PDF or image. The namespace comprises the following classes: |
GrapeCity.Documents.Drawing | Provides the extension methods and formatting attributes for rendering HTML to an image. The namespace comprises the following classes: |
The BrowserFetcher class has two static methods: GetSystemChromePath() and GetSystemEdgePath(). They return the path to the executable file of the Chrome or Edge browser, respectively. Another option is to download and install Chromium into a local folder. To do so, create an instance of BrowserFetcher, passing information such as host, platform, revision, and the destination folder if needed. Then call the BrowserFetcher.GetDownloadedPath() method, which downloads Chromium if required and returns the path to its executable file.
The GcHtmlBrowser class provides methods for converting HTML to PDF and images. With the path to a Chromium or Edge executable obtained from BrowserFetcher, you can create an instance of the GcHtmlBrowser class, which effectively runs the browser process in the background. The GcHtmlBrowser constructor also accepts a parameter of the LaunchOptions type. The LaunchOptions class provides various settings specific to launching the browser.
GcHtmlBrowser has two important methods: NewPage(Uri uri) and NewPage(string html). Both return an instance of the HtmlPage class, which represents a browser tab after navigating to the specified web address or file, or after loading the arbitrary HTML content. The second parameter, of the PageOptions type, provides various properties to be applied to the new browser page, such as the username and password for HTTP authentication, disabling JavaScript, lazy loading etc.
Note:
The HtmlPage class represents a browser tab after navigating to the specified web address or file, or after loading the arbitrary HTML content. The class has methods such as SaveAsPdf, SaveAsPng, SaveAsJpeg, and SaveAsWebp that save the current page as a PDF or as a raster image in PNG, JPEG, or WebP format, respectively. The first parameter of these methods specifies the destination file or stream. The second parameter passes additional options, such as rendering the HTML page as a single PDF page, or setting the page size, margins, header and footer etc.
The HtmlPage class also contains methods that help to interact with the HTML page content. For example, you can obtain the full HTML content of the page using the GetContent method, and update the HTML markup with the SetContent method. You can reload the web page with the Reload method, or even execute a script in the browser context using the EvaluateExpression method. The WaitForNetworkIdle method helps with loading asynchronous web content.
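Putting BrowserFetcher, GcHtmlBrowser, and HtmlPage together, a conversion might look like the sketch below. It assumes the DS.Documents.Html package is referenced, that Chrome is installed on the system, and that the PageOptions parameter of NewPage can be omitted; the HTML string and output file name are placeholders.

```csharp
using System;
using GrapeCity.Documents.Html;

// Locate an installed Chrome; GetSystemEdgePath() could be used instead.
var browserPath = BrowserFetcher.GetSystemChromePath();
if (string.IsNullOrEmpty(browserPath))
{
    Console.WriteLine("Chrome was not found on this system.");
    return;
}

// The browser runs as a separate background process until it is disposed.
using (var browser = new GcHtmlBrowser(browserPath))
using (var htmlPage = browser.NewPage("<html><body><h1>Hello, DsHtml!</h1></body></html>"))
{
    // Save the rendered page as a PDF using default output settings.
    htmlPage.SaveAsPdf("page.pdf", new PdfOptions());
}
```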
The PdfOptions class represents output settings for rendering HTML to PDF and defines parameters for the Chromium PDF exporter. PDF output does not support transparency.
If the PageWidth and PageHeight properties are not set, the Letter paper size (8.5 by 11 inches) is used by default. The Landscape property indicates the paper orientation and is ignored if the FullPage property is set to true. The Margins property specifies the page margins in inches; its default value is 0. The Scale property scales the PDF content by a factor from 0.1 to 2.0. You might also need to provide scaled values for the PageWidth and PageHeight properties to keep the relative size of the resulting pages unchanged.
The PageRanges property allows you to limit the pages included in the output PDF file. Specify the desired page numbers as a string, such as the following: "1-5, 8, 11-13". Invalid page ranges (e.g., "9-5") are ignored.
Setting the FullPage property to true exports the whole HTML as a single PDF page. All other layout settings (except Scale) are ignored in that case.
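The sketch below shows how these PdfOptions properties might be combined. The URL, sizes, and file name are assumptions for illustration, and Chrome is assumed to be installed.

```csharp
using System;
using GrapeCity.Documents.Html;

var options = new PdfOptions
{
    // A4 paper in inches instead of the default Letter size.
    PageWidth = 8.27f,
    PageHeight = 11.69f,
    Landscape = false,
    // Shrink the content to 80% (allowed range is 0.1 to 2.0).
    Scale = 0.8f,
    // Keep only pages 1 through 5 and page 8 of the rendered output.
    PageRanges = "1-5, 8",
    // FullPage stays false so that the paging settings above are honored.
    FullPage = false
};

using (var browser = new GcHtmlBrowser(BrowserFetcher.GetSystemChromePath()))
using (var htmlPage = browser.NewPage(new Uri("https://example.com/")))
{
    htmlPage.SaveAsPdf("example.pdf", options);
}
```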
The HtmlToPdfFormat class contains the formatting attributes for rendering HTML on a GcPdfGraphics using the DrawHtml extension methods of the GcPdfGraphicsExt class. The HTML is first drawn to a temporary PDF, either as a single page (if FullPage is true) or with the specified page size (MaxPageWidth, MaxPageHeight), Scale and DefaultBackgroundColor. The temporary PDF is then loaded into a GcPdfDocument and trimmed to the actual size of the HTML content. The result is rendered on the GcPdfGraphics as a PDF Form XObject.
If the MaxPageWidth or MaxPageHeight properties are not set explicitly, they are assumed to be equal to 200 inches. DefaultBackgroundColor is Color.White by default.
Other properties of HtmlToPdfFormat are mapped to the corresponding properties of the PageOptions/PdfOptions class:
HtmlToPdfFormat Property | PageOptions/PdfOptions Property |
---|---|
WindowSize | PageOptions.WindowSize |
DefaultBackgroundColor | PageOptions.DefaultBackgroundColor |
FullPage | PdfOptions.FullPage |
DisplayBackgroundGraphics | PdfOptions.PrintBackground |
Scale | PdfOptions.Scale |
MaxPageWidth | PdfOptions.PageWidth |
MaxPageHeight | PdfOptions.PageHeight |
DsHtml provides four methods that extend GcPdfGraphics and let you render or measure an HTML text or page: