Consider a scenario where we want to extract all the email addresses that appear in a text file. In this case, we can use a simple template consisting of a single template element whose properties define the text extraction criteria.
For this scenario, we can use the following template definition:
`<template extractFormat="email" />`
This template definition sets the extractFormat property on the “template” element. The extractFormat property specifies the format of the text that a template element should extract, so that text can be extracted based on its data format.
Similarly, we can use the following extractFormat property values to extract different types of textual data:
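To make the idea concrete, here is a minimal Python sketch: read the extractFormat attribute from the template element with the standard library's `xml.etree.ElementTree`, then look up a regular expression for that format. The `FORMAT_PATTERNS` table and the `extract` function are hypothetical, and the patterns are rough approximations, not the template-based extractor's actual matching rules.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical approximations of a few extractFormat values.
# The real template-based extractor may match differently.
FORMAT_PATTERNS = {
    "int": r"\d+",
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "quotedString": r'"([^"]*)"',  # the group returns the text without quotes
}


def extract(template_xml: str, text: str) -> list[str]:
    """Return every substring of `text` matching the template's extractFormat."""
    element = ET.fromstring(template_xml)   # parse <template ... />
    fmt = element.get("extractFormat")      # e.g. "email"
    if fmt not in FORMAT_PATTERNS:
        raise ValueError(f"unsupported extractFormat: {fmt!r}")
    return re.findall(FORMAT_PATTERNS[fmt], text)


print(extract('<template extractFormat="email" />',
              "Write to armor.cathe@gamil.com for details."))
# ['armor.cathe@gamil.com']
```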
int - This format can be used to extract integers from a given text input.
| Input | Template Definition | Output (JSON) |
|---|---|---|
| There are 8 items in order number 11002345. | | { "Extractor": "XMLTemplateBased", "Result": { "simpleTemplate": [ 8, 11002345 ] } } |
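A rough Python approximation of the int format, assuming an integer is simply a run of decimal digits. The `extract_ints` helper is hypothetical and does not try to skip digits that belong to decimal numbers.

```python
import re


def extract_ints(text: str) -> list[int]:
    # Approximation: every run of decimal digits is treated as one integer.
    # Unlike a full parser, this does not skip digits inside decimal numbers.
    return [int(digits) for digits in re.findall(r"\d+", text)]


print(extract_ints("There are 8 items in order number 11002345."))  # [8, 11002345]
```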
bool - This format can be used to extract the Boolean values "true" and "false" from a given text input.
| Input | Template Definition | Output (JSON) |
|---|---|---|
| There are 8 items in order number 11002345 and they will be true to their description. Please let us know in case you find any false information in the product description. | | { "Extractor": "XMLTemplateBased", "Result": { "simpleTemplate": [ true, false ]}} |
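A sketch of the bool format in Python, assuming it matches the whole, lowercase words "true" and "false" (case-sensitive) and converts each one to a Boolean. The `extract_bools` helper is hypothetical.

```python
import re


def extract_bools(text: str) -> list[bool]:
    # Approximation: match the whole words "true" and "false" (case-sensitive)
    # and convert each match to a Python bool.
    return [word == "true" for word in re.findall(r"\b(?:true|false)\b", text)]


print(extract_bools("they will be true to their description. "
                    "Please let us know in case you find any false information"))
# [True, False]
```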
float - This format can be used to extract floating point numbers from a given text input.
| Input | Template Definition | Output (JSON) |
|---|---|---|
| There are 8 items in order number 11002345. The price of each item in the order is greater than Rs. 100.00. | | { "Extractor": "XMLTemplateBased", "Result": { "simpleTemplate": [ 8.0, 11002345.0, 100.0 ] } } |
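A Python approximation of the float format, assuming a number is a run of digits with an optional fractional part. As in the documented output, integers such as 8 are returned as floats. The `extract_floats` helper is hypothetical.

```python
import re


def extract_floats(text: str) -> list[float]:
    # Digits with an optional ".digits" part; a sentence-ending period is
    # not consumed because it must be followed by at least one digit.
    return [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]


text = ("There are 8 items in order number 11002345. "
        "The price of each item in the order is greater than Rs. 100.00.")
print(extract_floats(text))  # [8.0, 11002345.0, 100.0]
```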
email - This format can be used to extract emails from a given text input.
| Input | Template Definition | Output (JSON) |
|---|---|---|
| The order placed by "Armor Cathe" is successful. There are 8 items in order number 11002345. The price of each item in the order is greater than Rs. 100.00. The order has to be delivered at pin code 0012345. For order details visit http://orderAtEase.com or refer to the registered email address armor.cathe@gamil.com. | | { "Extractor": "XMLTemplateBased", "Result": { "simpleEmailTemplate": [ "armor.cathe@gamil.com" ] } } |
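A simplified email pattern in Python, assuming an address is a local part, an "@", and a domain containing at least one dot. This is a common heuristic, not the extractor's own definition; note that the URL in the sample text is not matched and the trailing sentence period is left out.

```python
import re

# Approximate email pattern (an assumption, not the extractor's own rule):
# local part, "@", domain labels, and a final alphabetic top-level domain.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    return EMAIL_PATTERN.findall(text)


text = ("For order details visit http://orderAtEase.com or refer to the "
        "registered email address armor.cathe@gamil.com.")
print(extract_emails(text))  # ['armor.cathe@gamil.com']
```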
url - This format can be used to extract all URLs (addresses of World Wide Web pages) from a given text input.
| Input | Template Definition | Output (JSON) |
|---|---|---|
| There are 8 items in order number 11002345. The price of each item in the order is greater than Rs. 100.00. For order details visit http://orderAtEase.com. | | { "Extractor": "XMLTemplateBased", "Result": { "simpleTemplate": [ "100.00.", "http://orderAtEase.com." ] } } |
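A deliberately narrow Python sketch of URL extraction, assuming a URL starts with "http://" or "https://" and runs until the next whitespace. Like the documented output, it keeps the trailing sentence period attached. The documented output also lists "100.00.", which this scheme-based sketch does not reproduce, so the extractor's own URL rule is clearly looser than this hypothetical `extract_urls` helper.

```python
import re


def extract_urls(text: str) -> list[str]:
    # "http://" or "https://" followed by any non-whitespace characters;
    # punctuation at the end of a sentence stays attached to the match.
    return re.findall(r"https?://\S+", text)


print(extract_urls("For order details visit http://orderAtEase.com."))
# ['http://orderAtEase.com.']
```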
quotedString - This format can be used to extract a sequence of characters that appear between double quotes.
| Input | Template Definition | Output (JSON) |
|---|---|---|
| The order placed by "Armor Cathe" is successful. There are 8 items in order number 11002345. The price of each item in the order is greater than Rs. 100.00. For order details visit http://orderAtEase.com. | | { "Extractor": "XMLTemplateBased", "Result": { "simpleTemplate": [ "Armor Cathe" ] } } |
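In Python, the quotedString behavior can be approximated with a capture group between two double quotes, so that the quotes themselves are excluded from the result. The `extract_quoted` helper is hypothetical and ignores escaped quotes.

```python
import re


def extract_quoted(text: str) -> list[str]:
    # Capture everything between a pair of double quotes (quotes excluded).
    # Escaped quotes inside the string are not handled in this sketch.
    return re.findall(r'"([^"]*)"', text)


print(extract_quoted('The order placed by "Armor Cathe" is successful.'))
# ['Armor Cathe']
```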
word - This format is used to extract a single literal word (having only a-z and A-Z characters).
| Input | Template Definition | Output (JSON) |
|---|---|---|
| The 8 orders placed by "Armor Cathe" are successful. | | { "Extractor": "XMLTemplateBased", "Result": { "simpleTemplate": [ "The", "orders", "placed", "by", "Armor", "Cathe", "are", "successful" ] } } |
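A Python approximation of the word format, assuming a word is a maximal run of the letters a-z and A-Z. Digits, quotes, and punctuation then act as separators, which reproduces the word list in the example. The `extract_words` helper is hypothetical.

```python
import re


def extract_words(text: str) -> list[str]:
    # A word is a maximal run of ASCII letters; digits, quotes and
    # punctuation are treated as separators.
    return re.findall(r"[A-Za-z]+", text)


print(extract_words('The 8 orders placed by "Armor Cathe" are successful.'))
# ['The', 'orders', 'placed', 'by', 'Armor', 'Cathe', 'are', 'successful']
```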
whiteSpaces - This format is used to extract runs of one or more white space characters (tab, new line, or space).
| Template Definition |
|---|
| |
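A Python sketch of the whiteSpaces format, assuming each maximal run of spaces, tabs, and newlines counts as one match. The `extract_whitespace_runs` helper is hypothetical, and it deliberately leaves out other whitespace such as carriage returns.

```python
import re


def extract_whitespace_runs(text: str) -> list[str]:
    # One match per run of consecutive spaces, tabs or newlines
    # (the three characters named above; "\r" is not included).
    return re.findall(r"[ \t\n]+", text)


print(extract_whitespace_runs("Order\t8 \nshipped"))  # ['\t', ' \n']
```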
regex - This format is used when none of the formats above fits the data. To use it, specify a regular expression that matches the data to be extracted. For example, the following template definition can be used to extract a sequence of 7 digits from the input text.
| Input | Template Definition | Output (JSON) |
|---|---|---|
| The order placed by "Armor Cathe" is successful. There are 8 items in order number 11002345. The price of each item in the order is greater than Rs. 100.00. The order has to be delivered at pin code 0012345. For order details visit http://orderAtEase.com. | To extract a sequence of 7 digits | |
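The regular expression used in the example above is not shown, so the patterns below are illustrative only. They use Python's `re` syntax, which may differ in details from the regex dialect the extractor accepts. The sketch shows why anchoring matters for a "7 digits" rule: a bare `\d{7}` also matches the first seven digits of the 8-digit order number.

```python
import re

text = ("The order has to be delivered at pin code 0012345. "
        "There are 8 items in order number 11002345.")

# Without word boundaries, \d{7} also matches inside the 8-digit number.
print(re.findall(r"\d{7}", text))      # ['0012345', '1100234']

# With \b anchors, only standalone 7-digit sequences match.
print(re.findall(r"\b\d{7}\b", text))  # ['0012345']
```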
You can set the following properties on the template XML element to enhance text extraction:
Name - This mandatory property specifies a name for the template element. Naming each template element makes it easy to recognize the extracted values in the output. A name is also required when we want to inject one template element into another.
| Input | Template Definition | Output (JSON) |
|---|---|---|
| This is my working email: alexander.silva@grapecity.com and my private email is: alexsilva050@gmail.com. Please feel free to contact me also with silva050alexander@gmail.com. | | { "Extractor": "XMLTemplateBased", "Result": { "simpleEmailTemplate": [ "alexander.silva@grapecity.com", "alexsilva050@gmail.com", "silva050alexander@gmail.com" ] } } |
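The role of the name becomes clear from the shape of the output: each template element's matches are listed under its name inside "Result". The Python sketch below mimics that envelope. The `TEMPLATES` mapping, the `run_templates` function, and the email pattern are hypothetical stand-ins, not the extractor's API.

```python
import json
import re

# Hypothetical named templates: name -> approximate pattern for its format.
TEMPLATES = {
    "simpleEmailTemplate": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
}


def run_templates(text: str) -> dict:
    # Key each template's matches by its name, mirroring the documented
    # {"Extractor": ..., "Result": {name: [...]}} output shape.
    result = {name: re.findall(pattern, text) for name, pattern in TEMPLATES.items()}
    return {"Extractor": "XMLTemplateBased", "Result": result}


text = ("This is my working email: alexander.silva@grapecity.com and my private "
        "email is: alexsilva050@gmail.com. Please feel free to contact me also "
        "with silva050alexander@gmail.com.")
print(json.dumps(run_templates(text), indent=2))
```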
startingRegex - This property represents a regular expression that must match at the beginning of a possible instance of the template element.
| Input | Template Definition | Output (JSON) |
|---|---|---|
| This is my working email: alexander.silva@grapecity.com and my private email is: alexsilva050@gmail.com. Please feel free to contact me also with silva050alexander@gmail.com. | To extract working email address: | { "Extractor": "XMLTemplateBased", "Result": { "simpleEmailTemplate": [ "alexander.silva@grapecity.com" ] } } |
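The effect of startingRegex can be approximated in Python by requiring the starting pattern before the candidate and returning only the candidate as a capture group. The starting pattern `working email:` is an assumption (the example's actual regex is not shown), and so are the email pattern and the `extract_after` helper. Skipping whitespace with `\s*` mirrors the default ignoreWhitespaces="true".

```python
import re

# Approximate email pattern (an assumption, not the extractor's own rule).
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"


def extract_after(starting_regex: str, text: str) -> list[str]:
    # The starting regex must match first; \s* then skips any whitespace.
    # Only the email (the capture group) is returned, not the starting text.
    return re.findall(rf"(?:{starting_regex})\s*({EMAIL})", text)


text = ("This is my working email: alexander.silva@grapecity.com and my private "
        "email is: alexsilva050@gmail.com. Please feel free to contact me also "
        "with silva050alexander@gmail.com.")
print(extract_after(r"working email:", text))  # ['alexander.silva@grapecity.com']
```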
endingRegex - This property represents a regular expression that must match at the end of a possible instance of the template element.
| Input | Template Definition | Output (JSON) |
|---|---|---|
| This is my working email: alexander.silva@grapecity.com and my private email is: alexsilva050@gmail.com. Please feel free to contact me also with silva050alexander@gmail.com. | To extract the email addresses that appear at the end of the sentence. | { "Extractor": "XMLTemplateBased", "Result": { "emails": [ "alexsilva050@gmail.com", "silva050alexander@gmail.com" ] } } |
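Symmetrically, endingRegex can be approximated with a lookahead, which checks the ending without making it part of the match. Here the ending pattern is assumed to be a literal period (`\.`), since the example's actual regex is not shown. The email pattern and the `extract_before` helper are also assumptions. Only the two addresses followed by a sentence-ending period are returned, matching the documented output.

```python
import re

# Approximate email pattern (an assumption, not the extractor's own rule).
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"


def extract_before(ending_regex: str, text: str) -> list[str]:
    # The lookahead requires the ending (after optional whitespace) to follow
    # the candidate, but does not include it in the returned match.
    return re.findall(rf"{EMAIL}(?=\s*(?:{ending_regex}))", text)


text = ("This is my working email: alexander.silva@grapecity.com and my private "
        "email is: alexsilva050@gmail.com. Please feel free to contact me also "
        "with silva050alexander@gmail.com.")
print(extract_before(r"\.", text))
# ['alexsilva050@gmail.com', 'silva050alexander@gmail.com']
```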
ignoreWhitespaces - This property specifies whether the template-based extractor ignores white spaces when parsing an input source. The default value of this property is "true", so by default the extractor ignores all white spaces. To make the extractor take white spaces into account when parsing text, set this property to "false".
| Input | Template Definition | Output (JSON) |
|---|---|---|
| This is my working email: alexander.silva@grapecity.com and my private email is: alexsilva050@gmail.com. Please feel free to contact me also with silva050alexander@gmail.com. | To extract working email address: | { "Extractor": "XMLTemplateBased", "Result": {} } |
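One plausible reading of the empty result is that, with white spaces no longer ignored, the space between "working email:" and the address stops the template from matching. The Python sketch below models that reading by toggling the `\s*` gap. The flag semantics, the starting pattern, the email pattern, and the rule that a template with no matches is omitted from "Result" are all assumptions, not confirmed behavior of the extractor. The `extract_result` helper is hypothetical.

```python
import re

# Approximate email pattern (an assumption, not the extractor's own rule).
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"


def extract_result(starting_regex: str, text: str,
                   ignore_whitespaces: bool = True) -> dict:
    # When whitespace is ignored, any whitespace may sit between the starting
    # match and the email; otherwise the email must follow immediately.
    gap = r"\s*" if ignore_whitespaces else ""
    matches = re.findall(f"(?:{starting_regex}){gap}({EMAIL})", text)
    # Assumption: a template without matches is left out of the result.
    return {"simpleEmailTemplate": matches} if matches else {}


text = ("This is my working email: alexander.silva@grapecity.com and my private "
        "email is: alexsilva050@gmail.com.")
print(extract_result("working email:", text))
# {'simpleEmailTemplate': ['alexander.silva@grapecity.com']}
print(extract_result("working email:", text, ignore_whitespaces=False))  # {}
```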