Extracting table data from PDF documents is essential for businesses and professionals who handle large amounts of data stored in PDFs. PDFs are a popular way to share documents, but their content is often stored as images or in a non-editable layout, which makes extracting information, particularly tables, difficult. Whether you work with invoices, reports, research papers, or financial statements, extracting structured data from PDF tables efficiently saves time, reduces errors, and streamlines workflows.
DsPdfViewer provides the Extract table data toolbar button, which lets you extract table data from PDF documents. This button opens the Table Extraction side panel, where you can select a region, configure table recognition settings, preview the result, and export the table data in multiple formats, such as TSV, CSV, JSON, XLSX, XML, and HTML. The Table Extraction side panel provides the following buttons:
Select Region: Enables you to define the region for extracting table data.
Cancel: Allows you to cancel the current selection.
Defaults: Allows you to reset all table extraction options to their default values.
File Format Dropdown Menu: Enables you to select the desired file format for exporting the extracted table data.
Download: Allows you to save the extracted table data in the selected file format.
Copy: Allows you to copy the extracted data to the clipboard for quick use and easy pasting into other applications.
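The text export formats differ mainly in how cell boundaries are encoded. As a rough illustration of the difference between CSV and TSV output, the following sketch serializes a two-dimensional array of cell strings. The `toCsv` and `toTsv` helpers and the sample `cells` data are hypothetical and are not part of the DsPdfViewer API; the CSV quoting follows the common RFC 4180 convention, which may differ in detail from the files the viewer produces.

```javascript
// Hypothetical helpers (not part of DsPdfViewer): serialize a 2D array of
// cell strings (an array of rows, each an array of column values).

// CSV (RFC 4180 style): quote a field if it contains a comma, a double quote,
// or a line break, and escape embedded double quotes by doubling them.
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(rows) {
  return rows.map((row) => row.map(csvField).join(",")).join("\r\n");
}

// TSV: fields are separated by tabs. Plain TSV has no quoting mechanism,
// so tabs and line breaks inside a cell are replaced with a single space.
function toTsv(rows) {
  return rows
    .map((row) => row.map((v) => String(v).replace(/[\t\r\n]+/g, " ")).join("\t"))
    .join("\n");
}

// Sample extracted table (invented data for illustration only).
const cells = [
  ["Item", "Qty", "Price"],
  ["Widget, large", "2", "$10.00"],
  ['Bolt 1/4"', "50", "$0.10"],
];

console.log(toCsv(cells));
console.log(toTsv(cells));
```

Note that the cell containing a comma and the cell containing a double quote must be quoted in CSV, while TSV needs no quoting as long as cells contain no tabs or line breaks. This is one reason TSV is convenient for pasting into spreadsheet applications.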
The following table lists the available extraction options in the Table Extraction side panel, which allow you to configure additional settings for optimizing table recognition:
Option | Description |
---|---|
Min row spacing | The factor used to determine the minimum spacing between table rows. The calculated spacing depends on this value and font size. |
Min col spacing | The factor used to determine the minimum spacing between table columns. The calculated spacing depends on this value and font size. |
Min row height (pts) | The minimum height of rows in the table, in points. Higher values will put more adjacent text lines within the same row. |
Min col width (pts) | The minimum width of columns in the table, in points. Higher values will put more adjacent text within the same column. |
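The Min row height option can be pictured as a threshold for grouping text lines into rows. The sketch below is a simplified, hypothetical model of that behavior and not DsPdfViewer's actual recognition algorithm. It assumes each text line is reduced to a `top` coordinate in points (with y increasing down the page) and starts a new row only when a line begins at least `minRowHeight` points below the top of the current row. The `groupLinesIntoRows` function and the sample coordinates are invented for illustration.

```javascript
// Hypothetical model (not DsPdfViewer's algorithm) of how a minimum row
// height can merge adjacent text lines into the same table row.
// Each line has a text value and a `top` coordinate in points; y grows downward.
function groupLinesIntoRows(lines, minRowHeight) {
  // Sort a copy by vertical position so the input array is not mutated.
  const sorted = [...lines].sort((a, b) => a.top - b.top);
  const rows = [];
  let rowTop = -Infinity; // guarantees the first line always opens a row
  for (const line of sorted) {
    if (line.top - rowTop >= minRowHeight) {
      // The line starts far enough below the current row's top: open a new row.
      rows.push([line.text]);
      rowTop = line.top;
    } else {
      // The line falls within the current row's minimum height: merge it.
      rows[rows.length - 1].push(line.text);
    }
  }
  return rows;
}

// Sample text lines (invented coordinates): a two-line cell starting at 100 pt
// with its second line at 112 pt, followed by another row at 130 pt.
const lines = [
  { text: "Widget,", top: 100 },
  { text: "large", top: 112 },
  { text: "Bolt", top: 130 },
];

console.log(groupLinesIntoRows(lines, 10)); // smaller threshold
console.log(groupLinesIntoRows(lines, 15)); // larger threshold
```

In this model, a threshold of 10 pt places each text line in its own row, while raising it to 15 pt merges the two-line cell ("Widget," and "large") into a single row, mirroring the documented effect that higher values put more adjacent text lines within the same row.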
The Extract table data feature also includes visual table editing, which provides an intuitive interface for refining table data extracted from PDF content. After you select an area in a PDF document, the tool displays the boundaries of the detected table so that you can adjust its structure for accurate data extraction. Visual table editing provides the following features:
Table Boundaries: Displays the detected table's boundaries as a visual reference for editing.
Row and Column Manipulation: Add or remove rows and columns to change the table structure.
Table Repositioning: Drag the table to move it within the document.
Resizing Capabilities: Adjust row heights and column widths to align cells precisely with the text on the PDF page.
Real-time Feedback: Instant rendering updates and visual feedback during editing ensure a smooth experience.
Note: To use the Extract table data feature, SupportApi must be configured. SupportApi is available only with the Professional License of DsPdfViewer. For details, refer to Configure Server-Based PDF Editor.
To extract table data from a PDF document, follow these steps:
1. Open the Table Extraction side panel by clicking the Extract table data toolbar button.
2. If selection mode is not already active, click Select Region to activate it. By default, the side panel opens in selection mode.
3. Define the region from which to extract the table data.
4. Choose the desired file format for exporting the extracted table data.
5. Click Download to save the file in the selected format.
DsPdfViewer also lets you display the Table Extraction side panel programmatically by using the addTableExtractionPanel method. The following example code displays the Table Extraction side panel:
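```javascript
// Add the Table Extraction side panel to an existing DsPdfViewer instance (viewer).
const handle = viewer.addTableExtractionPanel();
// Expand the newly added panel so that it is displayed.
viewer.expandPanel(handle);
```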
Limitations
XLSX output can only be downloaded as a file; it cannot be copied to the clipboard due to technical constraints.
Extraction accuracy may vary for highly complex or irregular table structures.