The TextParser library supports the following text extraction techniques:
- Text extraction based on regular expressions
The TextParser library provides a quick and effective way to extract the relevant text that lies between occurrences of two regular expressions in a plain text document. You can set the starting and ending regular expression patterns to control exactly what is extracted.
- Text extraction based on an XML template
The TextParser library allows you to parse a plain text document according to any user-defined structure, called a template, which is specified declaratively in XML. All text in the input that matches the template's specification is extracted.
- Text extraction based on HTML markup
The TextParser library allows you to extract text from HTML documents such as emails. The main purpose of this technique is to automate the extraction of relevant information from emails we receive frequently, such as flight tickets, e-commerce receipts, and so on.
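The regular-expression technique in the list above pulls out whatever lies between a starting pattern and the next ending pattern. The following is a minimal Python sketch of that idea using only the standard `re` module. It is not the TextParser API, and the function name `extract_between` is hypothetical.

```python
import re
from typing import List


def extract_between(text: str, start_pattern: str, end_pattern: str) -> List[str]:
    """Return every span that follows a match of start_pattern and precedes
    the next match of end_pattern. Neither delimiter is included.
    (Hypothetical helper for illustration, not part of TextParser.)"""
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    results: List[str] = []
    pos = 0
    while True:
        start = start_re.search(text, pos)
        if start is None:
            break  # no further starting delimiter
        end = end_re.search(text, start.end())
        if end is None:
            break  # starting delimiter without a matching end
        results.append(text[start.end():end.start()].strip())
        # Always move forward, even if both patterns matched empty strings.
        pos = end.end() if end.end() > pos else pos + 1
    return results


if __name__ == "__main__":
    invoice = "Order: 1001\nTotal: $25.00\nOrder: 1002\nTotal: $40.50\n"
    print(extract_between(invoice, r"Total:\s*", r"\n"))  # ['$25.00', '$40.50']
```

Changing either pattern changes what is captured. For example, `r"Order:\s*"` as the start pattern would return the order numbers instead.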
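The XML-template technique in the list above describes the structure of the text declaratively and extracts every part of the input that matches it. The Python sketch below shows one way this can work. The `<template>`/`<field>` schema and its `name`, `label` and `pattern` attributes are invented for this example; they are not TextParser's actual template format. Each field becomes a named regex group, and the fields are joined in document order to describe one record.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical template schema, invented for this sketch only.
TEMPLATE_XML = r"""
<template>
  <field name="order" label="Order:" pattern="\d+"/>
  <field name="total" label="Total:" pattern="\$[\d.]+"/>
</template>
"""


def compile_template(template_xml: str) -> re.Pattern:
    """Turn the (hypothetical) XML template into one regex that describes a record."""
    root = ET.fromstring(template_xml.strip())
    parts = []
    for field in root.iter("field"):
        label = re.escape(field.get("label", ""))  # literal text before the value
        name = field.get("name")  # becomes a named capture group
        pattern = field.get("pattern")  # regex for the value itself
        parts.append(rf"{label}\s*(?P<{name}>{pattern})")
    # Fields must appear in template order, separated by whitespace.
    return re.compile(r"\s+".join(parts))


def extract_records(text: str, template_xml: str) -> List[Dict[str, str]]:
    """Return one dict per region of text that matches the whole template."""
    regex = compile_template(template_xml)
    return [m.groupdict() for m in regex.finditer(text)]


if __name__ == "__main__":
    text = "Order: 1001\nTotal: $25.00\n\nOrder: 1002\nTotal: $40.50\n"
    print(extract_records(text, TEMPLATE_XML))
    # [{'order': '1001', 'total': '$25.00'}, {'order': '1002', 'total': '$40.50'}]
```

Keeping the structure in XML rather than in code means the extraction can be changed by editing the template alone. This is the main benefit of the declarative approach.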
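The HTML technique in the list above targets markup such as booking confirmations and receipts, which often lay out their data as label/value table rows. The following is a minimal Python sketch of that idea using the standard `html.parser` module. It is not the TextParser API, and the class and function names are hypothetical. It assumes explicit `</td>` and `</tr>` closing tags and no nested tables.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TableRowExtractor(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped by <tr> row (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside table cells
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join text chunks (including inline tags like <b>) and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_label_values(html: str) -> Dict[str, str]:
    """Treat every two-cell row as 'label: value' and return a dict."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return {row[0].rstrip(":"): row[1] for row in parser.rows if len(row) == 2}


if __name__ == "__main__":
    email_html = """
    <p>Thanks for booking with us!</p>
    <table>
      <tr><td>Flight:</td><td>XY 123</td></tr>
      <tr><td>Departure:</td><td><b>10:45</b> from  LHR</td></tr>
      <tr><td colspan="2">Have a nice trip</td></tr>
    </table>
    """
    print(extract_label_values(email_html))
    # {'Flight': 'XY 123', 'Departure': '10:45 from LHR'}
```

A production extractor would also need to cope with missing closing tags, nested tables and emails whose layout changes between senders.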