Consider a scenario where we want to extract all the email addresses that appear in a text file. In this case, a simple template is enough: a single template element whose properties define the text extraction criteria.
For the above scenario, we can simply use the following template definition:
<template extractFormat="email" />
The above template definition sets the extractFormat property on the “template” element. The extractFormat property specifies the format of the text to extract for a given template element, which enables text extraction based on the data format.
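To make the idea of format-based extraction concrete, here is a rough Python analogue of what an email format does: scan the text and return every substring that looks like an email address. This is only a sketch. The pattern `EMAIL_PATTERN` and the function `extract_emails` are hypothetical names, and the simplified pattern is an assumption, not the matching rule the parser actually applies.

```python
import re

# Simplified email pattern, assumed for illustration only; the real rules
# behind extractFormat="email" may accept or reject different strings.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text: str) -> list:
    """Return every substring of text that looks like an email address."""
    return EMAIL_PATTERN.findall(text)


sample = "Contact alexander.silva@grapecity.com or alexsilva050@gmail.com."
print(extract_emails(sample))
# ['alexander.silva@grapecity.com', 'alexsilva050@gmail.com']
```

Note that the trailing period of the sentence is not part of the second match, because the pattern requires at least one character after each dot in the domain.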
Similarly, we can use the following extractFormat property values to extract different types of textual data:
Input | Template Definition | Output (JSON) |
---|---|---|
There are 8 items in order number 11002345. | <template name="simpleTemplate" extractFormat="int"/> | { "Extractor": "XMLTemplateBased", "Result": { "simpleTemplate": [ 8, 11002345 ] } } |

Input | Template Definition | Output (JSON) |
---|---|---|
There are 8 items in order number 11002345 and they will be true to their description. Please let us know in case you find any false information in the products description. | <template name="simpleTemplate" extractFormat="bool" /> | { "Extractor": "XMLTemplateBased", "Result": { "simpleTemplate": [ true, false ] } } |

Input | Template Definition | Output (JSON) |
---|---|---|
There are 8 items in order number 11002345. The price of each item in the order is greater than Rs. 100.00. | <template name="simpleTemplate" extractFormat="float" /> | { "Extractor": "XMLTemplateBased", "Result": { "simpleTemplate": [ 8.0, 11002345.0, 100.0 ] } } |

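The int and float results shown above can be approximated with two small regular expressions. The sketch below is a Python analogue, not the parser's implementation: the function names `extract_ints` and `extract_floats` are hypothetical, and the patterns assume unsigned numbers written with ASCII digits and a dot as the decimal separator.

```python
import re


def extract_ints(text):
    # Assumption: an "int" is any maximal run of ASCII digits.
    return [int(m) for m in re.findall(r"[0-9]+", text)]


def extract_floats(text):
    # Assumption: a "float" is a digit run with an optional fractional part.
    return [float(m) for m in re.findall(r"[0-9]+(?:\.[0-9]+)?", text)]


print(extract_ints("There are 8 items in order number 11002345."))
# [8, 11002345]

print(extract_floats(
    "There are 8 items in order number 11002345. "
    "The price of each item in the order is greater than Rs. 100.00."
))
# [8.0, 11002345.0, 100.0]
```

In the float case the period after `11002345` is not consumed, because the fractional part requires at least one digit after the dot.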
Input | Template Definition | Output (JSON) |
---|---|---|
The order placed by "Armor Cathe" is successful. There are 8 items in order number 11002345. The price of each item in the order is greater than Rs. 100.00. The order has to be delivered at pin code 0012345. For order details visit http://orderAtEase.com or refer to the registered email address armor.cathe@gamil.com. | <template name="simpleEmailTemplate" extractFormat="email" /> | { "Extractor": "XMLTemplateBased", "Result": { "simpleEmailTemplate": [ "armor.cathe@gamil.com" ] } } |

Input | Template Definition | Output (JSON) |
---|---|---|
There are 8 items in order number 11002345. The price of each item in the order is greater than Rs. 100.00. For order details visit http://orderAtEase.com. | <template name="simpleTemplate" extractFormat="url" /> | { "Extractor": "XMLTemplateBased", "Result": { "simpleTemplate": [ "100.00.", "http://orderAtEase.com." ] } } |

Input | Template Definition | Output (JSON) |
---|---|---|
The order placed by "Armor Cathe" is successful. There are 8 items in order number 11002345. The price of each item in the order is greater than Rs. 100.00. For order details visit http://orderAtEase.com. | <template name="simpleTemplate" extractFormat="quotedString" /> | { "Extractor": "XMLTemplateBased", "Result": { "simpleTemplate": [ "Armor Cathe" ] } } |

Input | Template Definition | Output (JSON) |
---|---|---|
The 8 orders placed by "Armor Cathe" are successful. | <template name="simpleTemplate" extractFormat="word" /> | { "Extractor": "XMLTemplateBased", "Result": { "simpleTemplate": [ "The", "orders", "placed", "by", "Armor", "Cathe", "are", "successful" ] } } |

Template Definition |
---|
<template extractFormat="whiteSpaces" /> |
Input | Template Definition | Output (JSON) |
---|---|---|
The order placed by "Armor Cathe" is successful. There are 8 items in order number 11002345. The price of each item in the order is greater than Rs. 100.00. The order has to be delivered at pin code 0012345. For order details visit http://orderAtEase.com. | To extract a sequence of 7 digits: <template name="simpleEmailTemplate" extractFormat="regex:[0-9]{7}|" /> | { "Extractor": "XMLTemplateBased", "Result": { "simpleEmailTemplate": [ "1100234", "0012345" ] } } |

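The regex result above may look surprising, since the 8-digit order number contributes a 7-digit match. A quick way to see why is to run the same pattern through Python's `re.findall`, which also returns non-overlapping matches from left to right. This is only an analogue: it assumes the text after the `regex:` prefix is treated as an ordinary regular expression, and it uses `[0-9]{7}` without the trailing `|` that appears in the template, because an empty alternative would also produce empty matches in Python.

```python
import re

text = (
    "There are 8 items in order number 11002345. "
    "The order has to be delivered at pin code 0012345."
)

# Non-overlapping, left-to-right matches of exactly seven digits.
matches = re.findall(r"[0-9]{7}", text)
print(matches)
# ['1100234', '0012345']
```

The first match is the leading seven digits of `11002345`; the remaining `5` is too short to match again. If only standalone 7-digit numbers are wanted, a pattern with word boundaries such as `\b[0-9]{7}\b` would skip the order number, assuming the parser's regex dialect supports `\b`.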
You can set the following properties on the template XML element to enhance text extraction:
Input | Template Definition | Output (JSON) |
---|---|---|
This is my working email: alexander.silva@grapecity.com and my private email is: alexsilva050@gmail.com. Please feel free to contact me also with silva050alexander@gmail.com. | <template name="simpleEmailTemplate" extractFormat="email" /> | { "Extractor": "XMLTemplateBased", "Result": { "simpleEmailTemplate": [ "alexander.silva@grapecity.com", "alexsilva050@gmail.com", "silva050alexander@gmail.com" ] } } |

Input | Template Definition | Output (JSON) |
---|---|---|
This is my working email: alexander.silva@grapecity.com and my private email is: alexsilva050@gmail.com. Please feel free to contact me also with silva050alexander@gmail.com. | To extract the working email address: <template name="simpleEmailTemplate" startingRegex="working email:" extractFormat="email" /> | { "Extractor": "XMLTemplateBased", "Result": { "simpleEmailTemplate": [ "alexander.silva@grapecity.com" ] } } |

Input | Template Definition | Output (JSON) |
---|---|---|
This is my working email: alexander.silva@grapecity.com and my private email is: alexsilva050@gmail.com. Please feel free to contact me also with silva050alexander@gmail.com. | To extract the email addresses that appear at the end of a sentence: <template name="emails" endingRegex="[.]" extractFormat="email" /> | { "Extractor": "XMLTemplateBased", "Result": { "emails": [ "alexsilva050@gmail.com", "silva050alexander@gmail.com" ] } } |

Input | Template Definition | Output (JSON) |
---|---|---|
This is my working email: alexander.silva@grapecity.com and my private email is: alexsilva050@gmail.com. Please feel free to contact me also with silva050alexander@gmail.com. | To extract the working email address with ignoreWhiteSpaces set to false: <template name="simpleEmailTemplate" startingRegex="working email:" extractFormat="email" ignoreWhiteSpaces="false" /> | { "Extractor": "XMLTemplateBased", "Result": {} } |
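The four email examples above are consistent with a simple model: startingRegex must match immediately before the extracted text, endingRegex must match immediately after it, and ignoreWhiteSpaces decides whether whitespace between those pieces is skipped. The Python sketch below implements that model. It is an interpretation inferred from the example outputs, not the parser's documented behavior; `extract`, its parameters, and the simplified `EMAIL` pattern are all hypothetical.

```python
import re

# Simplified email pattern, assumed for illustration only.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"


def extract(text, pattern, starting_regex=None, ending_regex=None,
            ignore_white_spaces=True):
    """Return matches of pattern that sit between the optional anchors."""
    # Assumption: with ignore_white_spaces, any whitespace between an anchor
    # and the extracted text is skipped; without it, they must be adjacent.
    gap = r"\s*" if ignore_white_spaces else ""
    prefix = f"(?:{starting_regex}){gap}" if starting_regex else ""
    suffix = f"{gap}(?:{ending_regex})" if ending_regex else ""
    combined = re.compile(f"{prefix}({pattern}){suffix}")
    return [m.group(1) for m in combined.finditer(text)]


text = (
    "This is my working email: alexander.silva@grapecity.com and my private "
    "email is: alexsilva050@gmail.com. Please feel free to contact me also "
    "with silva050alexander@gmail.com."
)

print(extract(text, EMAIL, starting_regex="working email:"))
# ['alexander.silva@grapecity.com']

print(extract(text, EMAIL, ending_regex="[.]"))
# ['alexsilva050@gmail.com', 'silva050alexander@gmail.com']

print(extract(text, EMAIL, starting_regex="working email:",
              ignore_white_spaces=False))
# []
```

Under this model the last call returns nothing because a space separates `working email:` from the address, which matches the empty result in the final example.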