Key Features

The TextParser library supports the following text extraction techniques:

  • Text extraction based on regular expressions

    The TextParser library provides a quick and effective way to extract the text located between matches of two regular expressions in a plain text document. You set the starting and ending regular expression patterns to control which text is extracted.

  • Text extraction based on XML template

    The TextParser library allows you to parse a plain text document that follows a user-defined structure, called a template, which is specified declaratively in XML. All text that matches the template specification is extracted from the input text.

  • Text extraction based on HTML markup

    The TextParser library allows you to extract text from HTML documents such as emails. This technique is mainly intended to automate the extraction of relevant information from messages that arrive frequently in an email client, such as flight tickets, e-commerce receipts, and so on.
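
To make the first technique concrete, the following is a minimal sketch of start/end regular expression extraction written with Python's standard re module. It is not the TextParser API: the function name extract_between, its parameters, and the sample text are all hypothetical and only illustrate the general idea of collecting the text that lies between a starting and an ending pattern.

```python
import re


def extract_between(text, start_pattern, end_pattern):
    """Return every span of text found between a start match and the next end match.

    Hypothetical helper for illustration only; it does not reproduce TextParser.
    """
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    results = []
    pos = 0
    while pos <= len(text):
        start = start_re.search(text, pos)
        if start is None:
            break
        # Look for the ending pattern only after the starting match.
        end = end_re.search(text, start.end())
        if end is None:
            break
        results.append(text[start.end():end.start()].strip())
        # Always move forward, even if both patterns matched empty strings.
        pos = end.end() if end.end() > pos else pos + 1
    return results


# Assumed sample input resembling a plain text receipt.
sample = "Order ID: 12345\nTotal: $19.99\nOrder ID: 67890\nTotal: $5.00\n"
order_ids = extract_between(sample, r"Order ID:", r"\n")
print(order_ids)
```

Changing either pattern changes what is captured. For example, passing r"Total:" as the starting pattern would collect the amounts instead of the order numbers.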