Using TextParser in WinForms
A key advantage of the TextParser library is that it works on any platform that implements .NET Standard 2.0, so you can use it in almost any type of .NET application.
This walkthrough explains how to use the TextParser library in a WinForms application to extract text from HTML documents (for example, emails). It also explains how to visualize the extraction results using controls such as FlexGrid and FlexPie.
This walkthrough covers the following:
- Extracting text using HTML extractor
- Populating the WinForms controls with the extraction results
Let us take an example to understand these points. Consider a scenario where we want to extract information (for example, the customer name, total order amount, and ordered items) from order confirmation emails received from an e-commerce provider such as Amazon, as shown in the image below.
Emails sent by a specific provider generally follow the same structure, so one email can serve as a template for extracting data from all the others. The HTML extractor provided by the TextParser library suits this scenario because it can extract the desired text from HTML documents correctly even if slight differences exist between the template email and the source emails.
Step 1: Extracting text using HTML extractor
- Create a new Windows Forms application.
- Install the ‘C1.TextParser’ NuGet package in your application to add the appropriate references to the project.
- Copy the template email (‘amazonEmail1.html’) and the source email (‘amazonEmail2.html’) files from the ECommerceOrder product sample to your project folder.
- Load the template email by adding the following line of code to Form1.cs:
//Open the stream used as template for extracting data from similar HTML streams
Stream amazonTemplateStream = File.Open(@"..\..\amazonEmail1.html", FileMode.Open);
- Initialize the HtmlExtractor class with the loaded template email, using the code provided below:
//Initialize the HtmlExtractor class to extract data from HTML source based on template
HtmlExtractor amazonTemplate = new HtmlExtractor(amazonTemplateStream);
- Define fixed placeholders to extract the customer name, the expected delivery date, and the total order amount by using the AddPlaceHolder method of the HtmlExtractor class. Fixed placeholders are marked with blue boxes in the image above.
//Fixed placeHolder for the customer name
String customerNameXPath =
@"/html/body/div[2]/div/div/div/table/tbody/tr[2]/td/p[1]";
amazonTemplate.AddPlaceHolder("CustomerName", customerNameXPath, 6, 15);
//Fixed placeHolder for the expected delivery date
String deliveryDateXPath =
@"/html/body/div[2]/div/div/div/table/tbody/tr[3]/td/table/tbody/tr[1]/td[1]/p/strong";
amazonTemplate.AddPlaceHolder("DeliveryDate", deliveryDateXPath);
//Fixed placeHolder for the total amount of the order
String totalAmountXPath = @"//*[@id=""shipmentDetails""]/table/tbody/tr[8]/td[2]/strong";
amazonTemplate.AddPlaceHolder("TotalOrderAmount", totalAmountXPath);
- Define repeated placeholders to extract the price, name, and seller of each ordered item. Repeated placeholders are marked with red boxes in the image above.
//Repeated block for each article in the order
String articleNameXPath = @"//*[@id=""shipmentDetails""]/table/tbody/tr[1]/td[2]/p/a";
amazonTemplate.AddPlaceHolder("OrderedArticles", "ArticleName", articleNameXPath);
String articlePriceXPath = @"//*[@id=""shipmentDetails""]/table/tbody/tr[1]/td[3]/strong";
amazonTemplate.AddPlaceHolder("OrderedArticles", "ArticlePrice", articlePriceXPath);
String articleSellerXPath = @"//*[@id=""shipmentDetails""]/table/tbody/tr[1]/td[2]/p/span";
amazonTemplate.AddPlaceHolder("OrderedArticles", "ArticleSeller", articleSellerXPath, 8, 18);
- Load the source email and invoke the Extract method of the HtmlExtractor class to extract the desired text from it. The results are returned in a variable of type IExtractionResult.
//Open the stream from which you wish to extract the data
Stream source = File.Open(@"..\..\amazonEmail2.html", FileMode.Open);
//Extract the desired data from the source stream
IExtractionResult extractedResult = amazonTemplate.Extract(source);
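The XPath strings passed to AddPlaceHolder identify elements either by their position in the document tree or by an id attribute. As a rough illustration only, the sketch below evaluates similar expressions with the standard System.Xml classes against a well-formed fragment. The XPathDemo class and the SelectText helper are hypothetical names introduced here, and HtmlExtractor's own HTML handling may differ from what XmlDocument does.

```csharp
using System.Xml;

public static class XPathDemo
{
    // Hypothetical helper: loads a well-formed (X)HTML string and returns the
    // trimmed text of the first node matching the XPath, or null if none matches.
    public static string SelectText(string xhtml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);                         // throws if the markup is not well-formed XML
        XmlNode node = doc.SelectSingleNode(xpath); // first match in document order
        return node == null ? null : node.InnerText.Trim();
    }
}
```

With a fragment whose body contains two div elements, an expression such as /html/body/div[2]/p[1] addresses the first paragraph of the second div, while an expression starting with //*[@id="shipmentDetails"] starts from the element carrying that id wherever it appears in the tree.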
Step 2: Designing the dashboard
- Drag and drop the DashboardLayout control from the toolbox onto your Form and set its Dock property to Fill.
Observe: A layout of the type Split is attached to the DashboardLayout control and it contains two child containers (Splitter Panels) by default.
- Right-click inside the DashboardLayout control and, from the context menu that opens, click the ‘Select c1DashboardLayout1.SplitContentPanel’ option.
Observe: The SplitContentPanel (layout control attached to the DashboardLayout by default) is selected.
- Click the SplitContentPanel’s smart tag to open the DashboardSplitContainer Tasks menu. Select ‘Add Panel’ to add a third child container to the dashboard.
- Select ‘c1SplitterPanel1’ and set its Dock property to Left.
- Drag and drop the RichTextBox control from the Toolbox onto ‘c1SplitterPanel1’. Set the following properties: Font size to ‘10.2’ and BackColor to ‘226, 218, 241’.
- Drag and drop the C1FlexGrid control from the Toolbox onto ‘c1SplitterPanel2’.
- Drag and drop the FlexPie control from the Toolbox onto ‘c1SplitterPanel3’. Set its Dock property to Fill.
Step 3: Populating Dashboard controls with the extraction results
Display extraction results as a JSON string in the RichTextBox control:
- Convert the extracted result to JSON format and assign it to the Text property of the RichTextBox control. Add the following code to Form1.cs to implement the described approach:
richTextBox1.Text = extractedResult.ToJsonString();
Display extraction results in a FlexGrid control:
- Configure the properties of the FlexGrid control by adding and calling the following method in Form1.cs:
private void ConfigureFlexGrid()
{
    c1FlexGrid1.Rows.Count = 1;
    c1FlexGrid1.Cols.Count = 2;
    c1FlexGrid1.Cols.Fixed = 0;
    c1FlexGrid1[0, 0] = "Placeholder";
    c1FlexGrid1[0, 1] = "Value";
    c1FlexGrid1.Cols[0].StarWidth = "*";
    c1FlexGrid1.Cols[1].StarWidth = "2*";
    c1FlexGrid1.Font = new System.Drawing.Font("Segoe UI", 8.45f);
    c1FlexGrid1.Row = -1;
    // styles
    CellStyle cs = c1FlexGrid1.Styles.Normal;
    cs.Border.Direction = BorderDirEnum.Vertical;
    cs.TextAlign = TextAlignEnum.LeftCenter;
    cs = c1FlexGrid1.Styles.Add("Data");
    c1FlexGrid1.Styles.Alternate.BackColor = Color.FromArgb(232, 216, 232);
    // outline tree
    c1FlexGrid1.Tree.Column = 0;
    c1FlexGrid1.Tree.Style = TreeStyleFlags.Simple;
    c1FlexGrid1.Tree.LineStyle = System.Drawing.Drawing2D.DashStyle.Solid;
    c1FlexGrid1.AllowMerging = AllowMergingEnum.Nodes;
    // other
    c1FlexGrid1.AllowResizing = AllowResizingEnum.Columns;
    c1FlexGrid1.SelectionMode = SelectionModeEnum.Cell;
    c1FlexGrid1.HighLight = HighLightEnum.Always;
    c1FlexGrid1.FocusRect = FocusRectEnum.Solid;
    c1FlexGrid1.AllowSorting = AllowSortingEnum.None;
}
- Convert the extraction result to XML format so that it can be displayed as a hierarchy in the FlexGrid control, by adding the following line of code to Form1.cs. The JsonConvert class used here is part of the Newtonsoft.Json package.
//Convert JSON to XML
XmlDocument doc =
JsonConvert.DeserializeXmlNode(extractedResult.ToJsonString(), "ExtractedResult");
- Read the XML document and populate the FlexGrid with XML data by defining the following method in Form1.cs:
private void GetXMLData(XmlNode node, int level)
{
    // skip comment nodes
    if (node.NodeType == XmlNodeType.Comment)
        return;
    // add new row for this node
    int row = c1FlexGrid1.Rows.Count;
    c1FlexGrid1.Rows.Add();
    if (node.Name.Equals("Property"))
        c1FlexGrid1[row, 0] = node.Attributes["Name"].Value;
    else
        c1FlexGrid1[row, 0] = node.Name;
    if (node.ChildNodes.Count == 1)
    {
        c1FlexGrid1[row, 1] = node.InnerText;
        c1FlexGrid1.SetCellStyle(row, 1, c1FlexGrid1.Styles["Data"]);
    }
    // make new row a node
    c1FlexGrid1.Rows[row].IsNode = true;
    c1FlexGrid1.Rows[row].Node.Level = level;
    // if this node has children, get them as well
    if (node.ChildNodes.Count > 1)
    {
        // recurse to get children
        foreach (XmlNode child in node.ChildNodes)
            GetXMLData(child, level + 1);
    }
}
- Call the ‘GetXMLData’ method defined above in Form1.cs to display the extraction results as a hierarchy in the FlexGrid control.
//Populate the FlexGrid with XML data
GetXMLData(doc.ChildNodes[0].ChildNodes[1], 0);
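To see how the recursion in GetXMLData turns an XML hierarchy into outline rows, the following sketch applies the same rules without the grid: a node with exactly one child is treated as a leaf and shows its text, and a node with several children is expanded one level deeper. The XmlOutline class, the Flatten method, and the text-line output are hypothetical stand-ins for the FlexGrid rows.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XmlOutline
{
    // Hypothetical helper: returns one line per node, indented two spaces per level,
    // following the same leaf/parent rules as GetXMLData.
    public static List<string> Flatten(XmlNode node, int level, List<string> rows = null)
    {
        rows = rows ?? new List<string>();
        // skip comment nodes, as GetXMLData does
        if (node.NodeType == XmlNodeType.Comment)
            return rows;
        string line = new string(' ', level * 2) + node.Name;
        // a single child is assumed to be the node's text value
        if (node.ChildNodes.Count == 1)
            line += ": " + node.InnerText;
        rows.Add(line);
        // several children: recurse one level deeper
        if (node.ChildNodes.Count > 1)
        {
            foreach (XmlNode child in node.ChildNodes)
                Flatten(child, level + 1, rows);
        }
        return rows;
    }
}
```

One consequence of these rules is that an element with exactly one child element is shown as a single row containing that child's text rather than being expanded, which may matter if a repeated block ever holds only one field.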
Display extraction results in the FlexPie control:
To display the extracted results in a FlexPie control, create a data source from these results that can be bound to the chart. To do so, define classes whose members map to the extraction results, and bind the chart to the resulting objects.
- Create a class named ‘OrderedArticle’ that represents each item in the list of ordered items and therefore corresponds to the repeated placeholders. Note that each mapped property in the ‘OrderedArticle’ class has a DataMemberAttribute whose ‘Name’ property matches the name of a repeated placeholder.
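public class OrderedArticle
{
    [DataMember(Name = "ArticleName")]
    public String Article_Name { get; set; }
    [DataMember(Name = "ArticleSeller")]
    public String Article_Seller { get; set; }
    [DataMember(Name = "ArticlePrice")]
    public String Article_Price { get; set; }
    // Numeric price for chart binding. Read-only because it is derived from
    // Article_Price; a setter that assigns the property to itself would recurse endlessly.
    public Decimal ArticlePriceInDecimals
    {
        get
        {
            // Keep only digits and '.', then parse with '.' as the decimal separator
            return decimal.Parse(Regex.Replace(Article_Price, @"[^\d.]", ""),
                System.Globalization.CultureInfo.InvariantCulture);
        }
    }
}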
- Create a class named ‘AmazonTemplateRepeatedBlocks’ with a property that has a DataMemberAttribute whose ‘Name’ property matches the name of the repeated block (‘OrderedArticles’) to which the repeated placeholders belong.
public class AmazonTemplateRepeatedBlocks
{
    [DataMember(Name = "OrderedArticles")]
    public List<OrderedArticle> Ordered_Items { get; set; }
}
- Retrieve the information about the ordered articles into the custom collection of class objects using the Get method of the IExtractionResult interface as shown:
List<OrderedArticle> articles=
extractedResult.Get<AmazonTemplateRepeatedBlocks>().Ordered_Items;
- Finally, to populate the FlexPie with the names and prices of the ordered items, add the following code to Form1.cs:
//Populate the FlexPie with the extracted results
flexPie1.DataSource = articles;
flexPie1.Binding = "ArticlePriceInDecimals";
flexPie1.BindingName = "Article_Name";
//other settings
flexPie1.Legend.Position = C1.Chart.Position.Right;
flexPie1.Legend.ItemMaxWidth = 350;
flexPie1.Legend.TextWrapping = C1.Chart.TextWrapping.Wrap;
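The price conversion behind the ArticlePriceInDecimals property defined earlier in this step can be tried in isolation. The sketch below is a minimal version under stated assumptions: the PriceText class and ToDecimal method are hypothetical names, the price text is assumed to use '.' only as the decimal separator, and the invariant culture is used so that parsing does not depend on the machine's regional settings.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class PriceText
{
    // Hypothetical helper: strips everything except digits and '.',
    // then parses the remainder as a decimal using the invariant culture.
    public static decimal ToDecimal(string priceText)
    {
        string digits = Regex.Replace(priceText, @"[^\d.]", "");
        return decimal.Parse(digits, CultureInfo.InvariantCulture);
    }
}
```

Currency symbols and thousands separators such as ',' are removed before parsing. Text that contains an extra '.' (for example, an abbreviation such as "Rs." before the amount) would still fail to parse, so real input may need a stricter pattern.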
- Run the application. Observe that the controls are populated with the extracted results as shown in the image below: