Posted 10 March 2026, 6:54 am EST
Hi Robin,
Thank you for your patience.
After further investigation with our development team, we identified the reason why the GetText() operation appears to hang or take a long time for your scenario.
By default, DsPdf uses the RecognitionAlgorithm.Advanced mode for text extraction. This algorithm attempts to reconstruct the logical structure of the document (for example, grouping text into paragraphs and reconstructing layout). While this approach works well for documents with clear paragraph-based structures, it can be inefficient for PDFs generated from Excel where the content is arranged in dense tabular layouts across many columns.
In such cases, the algorithm may attempt to combine column text into paragraphs, which significantly increases the processing time.
For Excel-like PDFs, we recommend switching the recognition algorithm to AcrobatLike, which performs a simpler extraction similar to Acrobat’s behavior and is much faster for this type of document.
You can apply this change before calling GetText():
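// GcPdfDocument.Load() reads from a stream; keep it open while the document is in use.
using var fs = File.OpenRead("sample.pdf");
var doc = new GcPdfDocument();
doc.Load(fs);
// Switch from the default Advanced algorithm before extracting text.
doc.RecognitionAlgorithm = GrapeCity.Documents.Pdf.Recognition.RecognitionAlgorithm.AcrobatLike;
string text = doc.Pages[0].GetText();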
In our internal tests with a similar Excel-generated PDF, this change reduced the extraction time from around 20–30 seconds to approximately 2–3 seconds.
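If you would like to check the difference on your own file, a rough timing sketch along the following lines may help. It is only an illustration: the Measure helper, the GetTextTiming class, and the "sample.pdf" file name are placeholders, it times only the first page, and it assumes the RecognitionAlgorithm property and enum values described in this reply. Each run loads a fresh document so that one measurement does not influence the other.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Pdf.Recognition;

static class GetTextTiming
{
    // Hypothetical helper: times GetText() on the first page with the given algorithm.
    static TimeSpan Measure(string path, RecognitionAlgorithm algorithm)
    {
        // Load a fresh document per run; the stream stays open while the document is used.
        using var stream = File.OpenRead(path);
        var doc = new GcPdfDocument();
        doc.Load(stream);
        doc.RecognitionAlgorithm = algorithm;

        var watch = Stopwatch.StartNew();
        string text = doc.Pages[0].GetText();
        watch.Stop();

        Console.WriteLine($"{algorithm}: {text.Length} characters in {watch.Elapsed.TotalSeconds:F1} s");
        return watch.Elapsed;
    }

    static void Main()
    {
        const string path = "sample.pdf"; // placeholder file name
        Measure(path, RecognitionAlgorithm.Advanced);
        Measure(path, RecognitionAlgorithm.AcrobatLike);
    }
}
```

The actual numbers will depend on the page layout and your machine, so please treat the output as a relative comparison only.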
The attached code sample uses the snippet above to extract the text.
Please let us know if you still encounter any issues with your PDF file after applying this change.
Best regards,
Chirag
Attachment: GetTextIssue.zip