This sample demonstrates how to extract table data from a tagged PDF. We load a PDF that contains structure tags (it was produced by saving an MS Word document as PDF) into a source GcPdfDocument, and use the
GetLogicalStructure() method to retrieve the document structure that those tags define.
We also create the resulting PDF (the document that you see), and use it to print out the table data fetched from the source PDF, along with the original source pages that contained the tables.
To find tables, we look for Table tags in the source PDF structure. Within those tag nodes, TR tags define the rows, and TD tags hold the actual data of the table cells.
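The Table/TR/TD traversal described above can be sketched independently of any PDF library. The following is a minimal illustration in Python, not the DsPdf API: `TagNode`, `iter_nodes`, `find_tables` and `table_rows` are hypothetical names standing in for whatever structure element type the library exposes, and the sketch assumes each node carries its tag name, its text and its child nodes.

```python
from dataclasses import dataclass, field


@dataclass
class TagNode:
    """Hypothetical stand-in for one structure element of a tagged PDF."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def iter_nodes(node):
    """Yield the node itself and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def find_tables(root):
    """Collect every node tagged 'Table' anywhere below the root."""
    return [n for n in iter_nodes(root) if n.tag == "Table"]


def table_rows(table):
    """Return the table as a list of rows, each row a list of TD texts.

    TR nodes are searched at any depth because rows may be wrapped in
    grouping elements. Nested tables are not treated specially here.
    """
    rows = []
    for node in iter_nodes(table):
        if node.tag == "TR":
            rows.append([c.text for c in node.children if c.tag == "TD"])
    return rows


# A tiny hand-built structure tree for illustration only.
doc = TagNode("Document", children=[
    TagNode("P", "Intro paragraph"),
    TagNode("Table", children=[
        TagNode("TR", children=[TagNode("TD", "Name"), TagNode("TD", "Qty")]),
        TagNode("TR", children=[TagNode("TD", "Apples"), TagNode("TD", "3")]),
    ]),
])

tables = [table_rows(t) for t in find_tables(doc)]
print(tables)
```

In a real document the tree would come from the logical structure returned by the library rather than being built by hand, and the cell text would be gathered from the content items under each TD node.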
We group the tables by source page, and in the resulting PDF print out the tables' data, followed by the page from the source PDF on which those tables were found.
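The grouping step can be sketched the same way. This is an illustrative Python outline under stated assumptions, not the sample's actual code: it assumes each extracted table is available as a pair of a zero-based source page index and its rows, and `group_by_page` and `render_report` are hypothetical helper names. The placeholder line marks where the real sample would draw the source page.

```python
from collections import defaultdict


def group_by_page(found):
    """Group (page_index, rows) pairs by page, ordered by page index."""
    pages = defaultdict(list)
    for page_index, rows in found:
        pages[page_index].append(rows)
    return dict(sorted(pages.items()))


def render_report(grouped):
    """List the tables of each page, then a placeholder for the page itself."""
    lines = []
    for page_index, page_tables in grouped.items():
        for n, rows in enumerate(page_tables, start=1):
            lines.append(f"Table {n} on page {page_index + 1}:")
            lines.extend(" | ".join(row) for row in rows)
        lines.append(f"[source page {page_index + 1} goes here]")
    return lines


# Made-up input: two tables on the third page, one on the first.
found = [
    (2, [["a", "b"]]),
    (0, [["x"]]),
    (2, [["c"]]),
]
grouped = group_by_page(found)
for line in render_report(grouped):
    print(line)
```

Sorting by page index keeps the report in source page order even when the tables are discovered out of order.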
DsPdf is compatible with .NET 6 or later, .NET Core 2.0 or later, .NET Standard 2.x, and .NET Framework 4.6.1 or later. All features are supported on Windows, macOS and Linux.