Exporting extracted result to class/CSV
TextParser provides different techniques for extracting text from an input source and generating the output as a JSON string. However, consider a scenario where a user wants to extract text from plain text and HTML documents using several extractors, and also wants to store the extracted data in a structured form. This walkthrough explains how you can retrieve the extracted text into a custom user-defined class. It also demonstrates how you can export the extraction results to a CSV file.
After completing the implementation of this walkthrough, you will be able to:
- Extract text using the Template-Based extractor
- Retrieve extraction results in a custom class
- Export extraction results to CSV
For example, consider a scenario where the user wants to extract all the ‘ERROR’ logs from a server log file (‘input.txt’). The contents of the input file are shown below.
2012-11-11 00:51:25,676 INFO - Starting Backup Manager 5.0.0 build 18536
2012-11-11 00:51:25,789 WARN - Generating Self-Signed SSL Certificate (alias = cdp)
2012-11-11 00:51:26,566 WARN - Saved SSL Certificate (alias = cdp) to Key Store /usr/sbin/r1soft/conf/comkeystore
2012-11-11 00:51:26,789 INFO - Operating System: Linux
2012-11-11 00:51:27,234 INFO - Architecture: amd64
2012-11-11 00:51:27,986 INFO - OS Version: 2.6.32-279.11.1.e16.x86_64
2012-11-11 00:51:28,123 INFO - Processors Detected: 1
2012-11-11 00:51:28,954 INFO - Max Configured Heap Memory: 989.9 MB
2012-11-11 00:51:29,276 ERROR - Unsuccessful: create index stateIndex on RecoveryPoint (state)
2012-11-11 00:51:29,980 ERROR - Index 'STATEINDEX' already exists in Schema 'R1DERBYUSER'.
2012-11-11 00:51:30,213 WARN - Invalid feature (0xECEBE6F7).
2012-11-11 00:51:30,736 INFO - Tomcat Wrapper starting
2012-11-11 00:51:30,800 INFO - Tomcat Wrapper started
Extracting information from this input file can help you troubleshoot errors quickly. In the input above, each log entry follows a fixed structure made up of four elements: the date, the time (down to milliseconds), the log type, and the description. Because of this fixed structure, the Template-Based extractor is well suited to extracting the desired text from the input file.
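The fixed four-part structure described above can be illustrated without the TextParser package. The following is a minimal C# sketch that uses a plain regular expression (System.Text.RegularExpressions), not the Template-Based extractor, to split each line into date, time, log type and description, and to keep only the ERROR entries. The type and method names (ParsedLogLine, LogLineParser, Parse, ErrorsOnly) are hypothetical and exist only for this illustration.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One log entry split into the four elements: date, time, log type, description.
public sealed record ParsedLogLine(string Date, string Time, string LogType, string Description);

// Hypothetical helper: a regex stand-in for the template, not the TextParser API.
public static class LogLineParser
{
    // Matches "yyyy-MM-dd HH:mm:ss,fff TYPE - description", e.g.
    // "2012-11-11 00:51:29,276 ERROR - Unsuccessful: ...".
    private static readonly Regex LinePattern = new Regex(
        @"^(?<date>\d{4}-\d{2}-\d{2}) (?<time>\d{2}:\d{2}:\d{2},\d{3}) (?<type>[A-Z]+) - (?<desc>.*)$");

    // Returns null when a line does not follow the fixed structure.
    public static ParsedLogLine? Parse(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success)
        {
            return null;
        }
        return new ParsedLogLine(
            m.Groups["date"].Value,
            m.Groups["time"].Value,
            m.Groups["type"].Value,
            m.Groups["desc"].Value);
    }

    // Keeps only well-formed lines whose log type is ERROR.
    public static List<ParsedLogLine> ErrorsOnly(IEnumerable<string> lines)
    {
        var errors = new List<ParsedLogLine>();
        foreach (string line in lines)
        {
            ParsedLogLine? entry = Parse(line);
            if (entry != null && entry.LogType == "ERROR")
            {
                errors.Add(entry);
            }
        }
        return errors;
    }
}
```

Reading the sample file with File.ReadLines("input.txt") and passing the lines to ErrorsOnly should return the two ERROR entries, stamped 00:51:29,276 and 00:51:29,980. A template achieves the same result declaratively, which is why the walkthrough uses it instead of hand-written patterns.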
Step 1: Extract text using Template-Based extractor
- Create a new application (any target that supports .NET Standard 2.0).
- Create a sample input text file named “input.txt” by copying and pasting the contents shown above, and place the file in the project’s root directory.
- Install the ‘C1.TextParser’ NuGet package in your application. For more information, refer to Adding NuGet Packages to your app.
- To create a template that defines the structure of a log entry (the text to be extracted from the input file), add a new XML file to your project. Name it ‘template.xml’ and add the following code to it.
- To extract the desired text from the input stream based on the above template, add the following lines of code to Program.cs. The code below initializes and configures the TemplateBasedExtractor class to perform the text extraction, and displays the extracted result on the console in JSON format. The extraction results are returned in a variable of type IExtractionResult.
Step 2: Retrieve extraction results in a custom class
- Define the following classes to map the extraction results to a custom class. Note that each class property has a DataMember attribute whose ‘Name’ property corresponds to the “name” property of the template element to which it should be mapped.
- Retrieve the extraction result into the custom class using the Get method of the IExtractionResult interface as shown:
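The classes described in Step 2 might look like the minimal sketch below. The class names and every DataMember Name value (errorLog, date, time, logType, description) are hypothetical assumptions, since they must match whatever names your template.xml actually uses. Because IExtractionResult.Get cannot be exercised without the TextParser package, the sketch uses the standard DataContractJsonSerializer as a stand-in to show how DataMember(Name = ...) binds a named value to a property; this is an analogy, not a description of how Get is implemented.

```csharp
#nullable enable
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Root class: holds every extracted ERROR entry.
// The Name values are hypothetical; they must match the "name"
// properties of the elements actually defined in template.xml.
[DataContract]
public class ErrorLogs
{
    [DataMember(Name = "errorLog")]
    public List<ErrorLog> Entries { get; set; } = new List<ErrorLog>();
}

// One log entry; each property is bound to a template element by name.
[DataContract]
public class ErrorLog
{
    [DataMember(Name = "date")]
    public string Date { get; set; } = "";

    [DataMember(Name = "time")]
    public string Time { get; set; } = "";

    [DataMember(Name = "logType")]
    public string LogType { get; set; } = "";

    [DataMember(Name = "description")]
    public string Description { get; set; } = "";
}

// Stand-in for IExtractionResult.Get: DataContractJsonSerializer also
// binds JSON keys to properties through DataMember(Name = ...).
public static class MappingDemo
{
    public static ErrorLogs FromJson(string json)
    {
        var serializer = new DataContractJsonSerializer(typeof(ErrorLogs));
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(json));
        return (ErrorLogs)serializer.ReadObject(stream)!;
    }
}
```

The point of the design is that the property names (LogType, Description) stay idiomatic C#, while the Name values carry the link to the template; renaming an element in template.xml then only requires changing one attribute.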
Step 3: Export extraction results to CSV
The extracted text can also be written to a CSV file, as described in the following steps:
- Add a new class file to the project and name it ‘CsvExportHelper.cs’. This class converts the IEnumerable collection containing the extraction results into a CSV-formatted string. Add the following code to the ‘CsvExportHelper.cs’ file:
- Invoke the ExportList method of the CsvExportHelper class to convert the IEnumerable collection containing the extraction results into a CSV-formatted string.
- Finally, write the string content to a CSV file as shown:
- Run the application. Observe that the extraction results have been successfully exported to "ExtractErrorLogs.csv" as shown in the image below:
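The CsvExportHelper class and its ExportList method from Step 3 could be implemented along the lines of the following sketch. Only the class and method names come from this walkthrough; the reflection-based approach, the RFC 4180-style quoting and the ErrorLogRow sample type are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Text;

// Hypothetical flat row type used to illustrate the export; in the walkthrough
// you would pass the collection held by your own result class instead.
public class ErrorLogRow
{
    public string Date { get; set; } = "";
    public string Time { get; set; } = "";
    public string LogType { get; set; } = "";
    public string Description { get; set; } = "";
}

public static class CsvExportHelper
{
    // Builds a header row from T's public properties, then one row per item.
    public static string ExportList<T>(IEnumerable<T> items)
    {
        PropertyInfo[] props = typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .OrderBy(p => p.MetadataToken) // declaration order for a single type
            .ToArray();

        var sb = new StringBuilder();
        sb.Append(string.Join(",", props.Select(p => Escape(p.Name)))).Append("\r\n");
        foreach (T item in items)
        {
            IEnumerable<string> fields =
                props.Select(p => Escape(Convert.ToString(p.GetValue(item)) ?? ""));
            sb.Append(string.Join(",", fields)).Append("\r\n");
        }
        return sb.ToString();
    }

    // RFC 4180-style quoting: fields containing a comma, quote or line break are
    // wrapped in double quotes, and embedded quotes are doubled.
    private static string Escape(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }
}
```

With the CSV content in a string, writing the file is a single call such as File.WriteAllText("ExtractErrorLogs.csv", csv). Quoting matters for this particular log: the timestamps use a comma as the millisecond separator (00:51:29,276), so an unquoted time value would be split across two columns.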