How to Programmatically Convert HTML to PDF in C#
Quick Start Guide | |
---|---|
Tutorial Concept | Learn how to programmatically convert HTML to PDF in .NET back-end applications using C#. |
What You Will Need | .NET 6+ |
Controls Referenced | |
HyperText Markup Language, commonly referred to as HTML, has been the foundation for creating and navigating web pages from the very beginning. Its significance keeps growing as the world moves toward digitization. Working with this format can no longer be confined to online use: users also expect to access the same information offline. The PDF format is a good fit for this goal, and this blog will showcase how the conversion is done.
Document Solutions offers the DsHtml library, which is dedicated solely to converting HTML content to PDF and images. With each release, the DsHtml package has evolved to end any dependencies on the browser version or GPL/LGPL licenses. DsHtml no longer depends on a custom Chromium build. It can now work with the Chrome or Edge browsers installed in the operating system. Also, it can download Chromium from a public website and install it in a local folder to be used in the application.
In this blog, we will look at the new DsHtml package, note what changes when migrating from the old DsHtml package, and finally explore how to use the new package for converting HTML to PDF.
Ready to Test it Out? Download Document Solutions Today!
Setup and Installation
To begin, you’ll want to download and install the DsHtml package to your application using the NuGet Package Manager. Search for the package “DS.Documents.Html” on NuGet and install it to the application as shown in the screenshot below:
DsHtml uses a Chrome or Edge browser (already installed in the current system or downloaded from a public website) in headless mode and interacts with it using the WebSocket protocol. Additionally, unlike in previous versions, platform-specific (OS-specific) NuGet packages are no longer required.
The list below describes the fundamental classes in the GrapeCity.Documents.Html namespace (shipped in the DS.Documents.Html package) that the conversion relies on:
- BrowserFetcher: This class helps discover the path to the installed browser or download the Chromium browser from a public server.
- GcHtmlBrowser: This class represents a browser process such as Chrome, Edge, or Chromium.
- HtmlPage: This class represents a browser tab with HTML content and provides various methods to save the HTML content to PDF or images.
For detailed information on these and other classes available in the package, refer to the documentation.
Note: The old GcHtmlRenderer class is now obsolete, but it is still available (for backward compatibility) and works internally through the GcHtmlBrowser class.
Rendering HTML to PDF
To begin, let’s first understand how the DsHtml library helps convert HTML to PDF. The DsHtml library provides two different ways to perform HTML to PDF conversion. The list below summarizes these methods:
1. Using the GcHtmlBrowser class: Consider this approach when you want to generate a PDF document from scratch, that is, a PDF that consists solely of the HTML content you want to render.
To implement this approach, invoke the NewPage method of the GcHtmlBrowser class to prepare a browser page with the HTML content. This method has two overloads: one accepts a Uri pointing to the source HTML page, and the other accepts the HTML as a plain string.
This method returns an instance of the HtmlPage class, and then the SaveAsPdf method of the HtmlPage class helps to convert the source HTML to PDF. It accepts the output file path as the first parameter. The second parameter (optional) is the PdfOptions instance that defines parameters for the output PDF file.
2. Using DrawHtml method: This approach can be considered when you would like to append the HTML information into an existing PDF file that already has some other content available. All the HTML content you want to render as PDF will be appended on a new page in the existing document.
This method extends the GcPdfGraphics class and allows it to render an HTML string or page in a PDF. It also allows inserting HTML fragments into a PDF file along with other (non-HTML) content.
The DrawHtml method has two overloads:
- Draws an HTML string on GcPdfGraphics at a specified position:
bool GcPdfGraphics.DrawHtml(GcHtmlBrowser browser, string html, float x, float y, HtmlToPdfFormat format, out SizeF size);
- Draws an HTML page specified by a URI on GcPdfGraphics at a specified position:
bool GcPdfGraphics.DrawHtml(GcHtmlBrowser browser, Uri htmlUri, float x, float y, HtmlToPdfFormat format, out SizeF size);
Here, the HtmlToPdfFormat class contains attributes for rendering HTML on a GcPdfGraphics instance using the DrawHtml extension methods.
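The first approach can also be sketched with an HTML string instead of a Uri. This is a minimal sketch, assuming the string overload of NewPage takes the markup as its only required argument and that BrowserFetcher supplies a browser path as in the examples later in this post. The markup and the hello.pdf file name are illustrative only.

```csharp
using GrapeCity.Documents.Html;

// Obtain a browser path (same call as in the later examples in this post).
var path = new BrowserFetcher().GetDownloadedPath();
using var browser = new GcHtmlBrowser(path);

// Assumption: the string overload of NewPage accepts the markup directly.
using var page = browser.NewPage("<html><body><h1>Hello from DsHtml</h1></body></html>");

// No PdfOptions are passed, so the library defaults apply.
page.SaveAsPdf("hello.pdf");
```

Because both the browser and the page are disposable, the `using` declarations release the browser process and its tab once the conversion is done.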
Further Examples and Best Practices
HTML Files to PDF
Consider a scenario where an e-commerce firm’s transactions are carried out online. The invoices for these transactions are generated over the same platform in HTML format. The style and layout of these invoices may not remain intact when viewed offline or on other devices.
Since these invoices need to be distributed to the customers and they may use different devices or browsers to view the invoices, converting HTML to PDF would be better to retain the content, layout, and formatting. Hence, to provide these invoices to the customers over email, the company converts the HTML files to PDF.
Here is a quick view of an Invoice in HTML file format:
To serve this purpose, the DsPdf and DsHtml packages can be used. Let us see how to go about it from scratch:
1. Open Visual Studio and create a .NET Core console application from the project templates.
2. In your application, right-click ‘Dependencies’ and select ‘Manage NuGet Packages’.
3. With the "Package source" set to the NuGet website, search for DS.Documents.Pdf under the ‘Browse’ tab and click "Install."
4. Similarly, install the "DS.Documents.Html" package.
Note: While installing, you’ll receive two confirmation dialogs: ‘Preview Changes’ (if the "Show preview window" option setting for the package is checked) and ‘License Acceptance.’ Click ‘OK’ and ‘I Agree,’ respectively, to continue.
5. Add references to the following namespaces in the Program.cs file (System.Drawing supplies the Size, SizeF, RectangleF, and Color types used in the snippets):
using System.Drawing;
using GrapeCity.Documents.Html;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Drawing;
6. Now, we can achieve the conversion using the approaches defined above, i.e., the GcHtmlBrowser class and the DrawHtml method. The code snippets below show both implementations:
Using GcHtmlBrowser class:
// Define the GcHtmlBrowser instance, as in the later examples
var path = new BrowserFetcher().GetDownloadedPath();
using var browser = new GcHtmlBrowser(path);
// Define the HTML file URI
var uri = new Uri("Invoice.html", UriKind.Relative);
// Invoke the NewPage method to generate a browser page with the HTML content
using var pg = browser.NewPage(uri, new PageOptions
{
    WindowSize = new Size(1024, 1024)
});
// Save the HTML as PDF using the SaveAsPdf method
pg.SaveAsPdf("Invoice_Save.pdf", new PdfOptions
{
    FullPage = false
});
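The snippet above passes a relative Uri. If you would rather hand the browser an absolute file URI, a small helper like the following can build one. This is a sketch that uses only System.IO and System.Uri; ToFileUri is a hypothetical helper name and is not part of DsHtml.

```csharp
using System;
using System.IO;

// Hypothetical helper: resolve a path against the current directory
// and wrap the resulting absolute path in a file URI.
Uri ToFileUri(string relativePath) => new Uri(Path.GetFullPath(relativePath));

var uri = ToFileUri("Invoice.html");
Console.WriteLine(uri.AbsoluteUri); // prints an absolute file:// URI
```

The helper does not check that the file exists; it only normalizes the path, so a missing file would surface later, when the page is loaded.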
Using DrawHtml method:
// Define the GcHtmlBrowser instance used for rendering
// (skip these two lines if the browser from the previous snippet is still in scope)
var path = new BrowserFetcher().GetDownloadedPath();
using var browser = new GcHtmlBrowser(path);
// Create a GcPdfDocument instance
var doc = new GcPdfDocument();
// Add a new page to the document
var page = doc.Pages.Add();
// Take the Graphics instance of the page
var g = page.Graphics;
// Draw the HTML read from the invoice file onto the page using the DrawHtml method
g.DrawHtml(browser, File.ReadAllText("Invoice.html"), 72, 72, new HtmlToPdfFormat(false) { MaxPageWidth = 6.5f, MaxPageHeight = 9f }, out SizeF size);
// Save the PDF document
doc.Save("Invoice_Draw.pdf");
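The numbers passed to DrawHtml above fit together if, as assumed here, the x and y arguments are in points (72 per inch), MaxPageWidth and MaxPageHeight are in inches, and the new page uses the US Letter size of 612 x 792 points. None of these units is stated in this post, so treat the following as a sketch of the arithmetic rather than a statement about the library. AvailableInches is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: space left on one page axis, in inches, when content
// starts at offsetPt and the same offset is kept free on the opposite edge.
float AvailableInches(float pageSizePt, float offsetPt)
{
    const float pointsPerInch = 72f; // assumption: drawing coordinates are in points
    return (pageSizePt - 2 * offsetPt) / pointsPerInch;
}

// Assumed page size: US Letter, 612 x 792 points (8.5 x 11 inches).
float maxWidthIn = AvailableInches(612f, 72f);
float maxHeightIn = AvailableInches(792f, 72f);

Console.WriteLine($"MaxPageWidth: {maxWidthIn} in, MaxPageHeight: {maxHeightIn} in");
```

Under these assumptions the helper yields 6.5 and 9 inches, the same values used for MaxPageWidth and MaxPageHeight in the DrawHtml call, which corresponds to a one-inch margin on every side.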
With these quick steps, you now have a PDF file generated from an HTML file, as depicted in the screenshot below:
HTML String to PDF
Simple HTML strings can be rendered directly to PDF using the DrawHtml method. This can be done without using HTML files, allowing you to specify the HTML content directly in code.
Follow steps (1) to (5) as mentioned above. After that, add the following code in the Program.cs file, which performs the HTML to PDF conversion using the DrawHtml method approach:
//Create a variable containing the HTML code as string
var html = "<!DOCTYPE html>" +
"<html>" +
"<head>" +
"<style>" +
"p.round {" +
"font: 36px verdana;" +
"color: Red;" +
"border: 4px solid SlateBlue;" +
"border-radius: 16px;" +
"padding: 3px 5px 3px 5px;" +
"}" +
"</style>" +
"</head>" +
"<body>" +
"<p class='round'>Thank You for shopping with us!</p>" +
"<p class='round'>Hope to see you again soon.</p>" +
"</body>" +
"</html>";
// Create a GcPdfDocument instance
var doc = new GcPdfDocument();
// Add a new page to the document
var page = doc.Pages.Add();
// Take the Graphics instance of the page
var g = page.Graphics;
//Define GcHtmlBrowser instance
var path = new BrowserFetcher().GetDownloadedPath();
using (var browser = new GcHtmlBrowser(path))
{
// Render the HTML string on the PDF, using the DrawHtml method
var ok = g.DrawHtml(browser, html, 72, 72, new HtmlToPdfFormat(false) { MaxPageWidth = 6.5f }, out SizeF size);
// Additionally, draw a rounded rectangle around this HTML string
if (ok)
{
var rc = new RectangleF(72 - 4, 72 - 4, size.Width + 8, size.Height + 8);
g.DrawRoundRect(rc, 8, Color.PaleVioletRed);
}
//Save the PDF Document
doc.Save("HTMLStringToPDF.pdf");
}
The screenshot below depicts the PDF file generated by executing the above code snippet:
Web Pages to PDF
The GcHtmlBrowser and HtmlPage classes can also render web pages to PDF. Suppose the firm discussed above wants to update its customers and stakeholders about new products every month by sending a PDF generated from the New Releases page on its website. Automating this process keeps the stakeholders informed of new launches regularly and makes it easy to create a consolidated report at the end of every year.
Here is a view of one such web page:
To perform the conversion, the Uri of the web page is passed to the NewPage method, and the required PDF settings are applied using the PdfOptions class.
Follow steps (1) to (5) as mentioned above. After that, add the following code in the Program.cs file, which performs the HTML to PDF conversion using the GcHtmlBrowser class approach:
//Specify a PDF file name
var fn = @"webpage.pdf";
//Specify the url to be used for PDF conversion
var uri = new Uri(@"https://www.amazon.com/gp/new-releases/electronics/ref=zg_bs_tab_t_bsnr");
//Define GcHtmlBrowser instance
var path = new BrowserFetcher().GetDownloadedPath();
using (var browser = new GcHtmlBrowser(path))
{
//The PdfOptions instance is created to specify the pdf related settings that will show up in the generated PDF.
var pdfOptions = new PdfOptions()
{
PageRanges = "1-100",
Margins = new PdfMargins(0.2f), // narrow margins all around
Landscape = false,
PreferCSSPageSize = true
};
// Create an HtmlPage instance rendering the source Uri:
using var htmlPage = browser.NewPage(uri);
// Render the source web page to the PDF file specified above:
htmlPage.SaveAsPdf(fn, pdfOptions);
}
Here is a quick view of the PDF file generated from the web page:
Customizing the PDF Output
As mentioned earlier in the blog, the PdfOptions class lets the developer control how the generated PDF will appear. Below are the properties available within the class:
- DisplayHeaderFooter: Gets or sets a value indicating whether the header and footer are rendered.
- FooterTemplate: Gets or sets the HTML template for the page footer.
- FullPage: Gets or sets a value indicating whether the whole HTML page should be rendered as a single PDF page.
- HeaderTemplate: Gets or sets the HTML template for the page header.
- Landscape: Gets or sets a value indicating whether the paper orientation is Landscape.
- Margins: Gets or sets page margins in inches.
- PageHeight: Gets or sets the page height in inches.
- PageRanges: Gets or sets the range of pages to render, e.g., '1-5, 8, 11-13'.
- PageWidth: Gets or sets the page width in inches.
- PreferCSSPageSize: Gets or sets a value indicating whether the CSS-defined page size should have priority over what is declared in PageWidth and PageHeight.
- PrintBackground: Gets or sets a value indicating whether to print background graphics.
- Scale: Gets or sets the scale factor between 0.1 and 2.0.
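Several of these properties can be combined in a single PdfOptions instance. The following is a minimal sketch under a few assumptions: the property types accept the literals shown, the string overload of NewPage is used for a self-contained page, and the pageNumber and totalPages class names in the footer follow the Chromium print-template convention, which this post does not confirm for DsHtml. The markup and the report.pdf file name are illustrative.

```csharp
using GrapeCity.Documents.Html;

var path = new BrowserFetcher().GetDownloadedPath();
using var browser = new GcHtmlBrowser(path);

var options = new PdfOptions
{
    PageWidth = 8.27f,               // A4 width in inches
    PageHeight = 11.69f,             // A4 height in inches
    Margins = new PdfMargins(0.75f), // leave room for the header and footer
    PrintBackground = true,
    Scale = 0.9f,                    // must stay within 0.1 to 2.0
    DisplayHeaderFooter = true,
    HeaderTemplate = "<div style='font-size:9px; width:100%; text-align:center;'>Monthly report</div>",
    // Assumption: Chromium-style placeholder classes are honored in templates.
    FooterTemplate = "<div style='font-size:9px; width:100%; text-align:center;'>" +
                     "Page <span class='pageNumber'></span> of <span class='totalPages'></span></div>"
};

using var page = browser.NewPage("<html><body><h1>Report</h1><p>Body text.</p></body></html>");
page.SaveAsPdf("report.pdf", options);
```

PreferCSSPageSize is left unset here so that PageWidth and PageHeight are not overridden by a page size declared in the document's CSS.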
Conclusion
In this blog, we explored the two main approaches to converting HTML content to PDF with the DsHtml library. If you are interested in more code examples, check out our in-depth HTML conversion demos, complete with code, comments, and rendered PDF output; they are a great resource for testing the different features of the library. The documentation linked throughout this blog is also valuable for learning more about the API. If you are more of a visual learner, you can check out our video below, where we discuss how to programmatically convert HTML to PDF in .NET Core C#:
How do you use DsHtml in your applications? Let us know in the comments.