Extract Text from PDF

Posted by: john.burke on 14 August 2024, 9:01 am EST

    Posted 14 August 2024, 9:01 am EST - Updated 14 August 2024, 9:06 am EST

    For several .PDF documents, I am able to use an export filter to export .PDFs to .html and then parse the .html for text.

    For one .PDF, however, the exported .html puts each individual character in its own HTML element, so I am unable to parse the .html efficiently.

    The .PDF file has text, but when I use the Xls and Rtf export filters, the content is exported as an image.

    Is there any sample that shows how to extract or iterate over the text fields in a .PDF?

    I have the WinForms ComponentOne v4.0.20173.282 and a later version on my development machines.

    Thanks for any advice or help in advance…

    John

    PDF file has a bunch of tables in it like below…

  • Posted 16 August 2024, 2:23 am EST

    Hello John,

    You can use the C1PdfDocumentSource.GetWholeDocumentRange().GetText() method to extract text from a PDF document as follows:

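    // _document is the C1PdfDocumentSource that holds the loaded PDF.
    var mc = new C1.Win.C1Document.Util.C1DXTextMeasurementContext();
    // Get a range that spans the whole document, then read its text.
    var dr = _document.GetWholeDocumentRange(mc);
    var text = dr.GetText();
    // Show the extracted text in a text box.
    textBox1.Text = text;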
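
    Since the PDF in question contains tables, the extracted string may still need to be split into rows and cells before it can be processed. The following is only a rough sketch of that step in plain C#. ExtractedTextParser and ToRows are hypothetical names, not part of the ComponentOne API, and the sketch assumes that GetText() returns one table row per line with cells separated by a tab or by two or more spaces. This thread does not confirm that layout, so the pattern may need adjusting for a real document.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the ComponentOne API.
public static class ExtractedTextParser
{
    // Splits extracted text into rows (one per non-blank line) and cells.
    // Assumption: cells are separated by a tab or by two or more whitespace
    // characters. Consecutive separators are treated as one, so empty cells
    // are not preserved.
    public static List<string[]> ToRows(string text)
    {
        var rows = new List<string[]>();
        if (string.IsNullOrEmpty(text))
            return rows;

        // Accept Windows, Unix and old Mac line endings.
        string[] lines = text.Split(
            new[] { "\r\n", "\n", "\r" },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            string trimmed = line.Trim();
            if (trimmed.Length == 0)
                continue; // skip lines that contain only whitespace

            rows.Add(Regex.Split(trimmed, @"\s*\t\s*|\s{2,}"));
        }
        return rows;
    }
}
```

    With the snippet above, this could be called as ExtractedTextParser.ToRows(dr.GetText()). Single spaces are deliberately kept inside a cell so that multi-word values stay together.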

    Please refer to the attached sample for implementation and let us know if you face any issues (see PDF_TextHandling.zip).

    Please also share a sample PDF that reproduces the issue, along with any C1-specific code you are using, so that we can investigate.

    FYI, the version of the controls you’re currently using is quite outdated and no longer supported. Therefore, we highly recommend updating your C1 version to the latest release to take advantage of the newest features and fixes.

    Regards,

    Uttkarsh.

  • Posted 20 September 2024, 10:57 am EST - Updated 20 September 2024, 11:14 am EST

    I forgot to come back and thank you for the solution.

    Too busy using it to process data…

    Thanks!

    John
