Parsing URLs into Hyperlinks

It is useful when long URLs in a document are automatically formatted into readable links. Parsing URLs into meaningful text makes the document easier to read.

You can add this parsing capability to a standard RichTextBox control to give your users a seamless reading and writing experience. For the conversion, the C1RichTextBox control is used together with the C1TextParser library, a .NET text-parsing library that enables you to convert and format URLs to hyperlinks automatically.

This enhancement to the standard RichTextBox control helps you create a smarter, more capable text editor.

Hyperlink Parsing

To create an application for parsing URLs into hyperlinks, follow these steps.

Set up the Application UI

  1. Create a new WPF App (.NET Framework) project in Visual Studio.
  2. In the Solution Explorer, right-click Dependencies (shown as References in .NET Framework projects) and select Manage NuGet Packages.
  3. In NuGet Package Manager, select nuget.org as the Package source.
  4. Search for and install the following packages:
    • C1.XAML.WPF.RichTextbox
    • C1.TextParser
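  5. In XAML view, add a RichTextBox control, a Button control, and a CheckBox control to the grid by adding the following code inside the <Grid> tags. The c1 prefix used in this code must be mapped to the ComponentOne XAML namespace on the Window element.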
    XAML
    <Grid.RowDefinitions>
        <RowDefinition Height="1*"/>
        <RowDefinition Height="10*"/>
    </Grid.RowDefinitions>
    <Grid.ColumnDefinitions>
        <ColumnDefinition Width="5*"/>
        <ColumnDefinition Width="13*"/>
    </Grid.ColumnDefinitions>
    <Button x:Name="btnPasteDoc" Content="Paste Text with hyperlinks" Width="144" HorizontalAlignment="Left" Click="btnPasteDoc_Click" Grid.Row="0" Grid.Column="0" Margin="20,0,0,0"/>
    <CheckBox Name="chkAutoConversion" Grid.Row="0" Grid.Column="1" Margin="23,3,7,7" Content="Allow Auto Hyperlink Conversion" FontSize="13" IsChecked="False" Checked="chkAutoConversion_Checked"/>
    <c1:C1RichTextBox Name="smartRichTextBox" Grid.Row="1" Grid.Column="0" Grid.ColumnSpan="2" Margin="20,20,0,16" Height="200" Width="550" HorizontalAlignment="Left" VerticalAlignment="Top"/>
    

Create a Hyperlink Parser 

  1. Add a class file to your application and name it HyperlinkParser.cs.
  2. Declare the following IHyperlinkParser interface, which contains two methods: ExtractURLs and GetDisplayText. The HyperlinkParser class implements this interface.
    CS
    public interface IHyperlinkParser
    {
        IEnumerable<string> ExtractURLs(string text);
        string GetDisplayText(string uri);
    }
    
    VB
    Public Interface IHyperlinkParser
        Function ExtractURLs(text As String) As IEnumerable(Of String)
        Function GetDisplayText(uri As String) As String
    End Interface
    
  3. Create a custom data extraction method named ExtractData that identifies the relevant portions of a given text based on specific starting and ending criteria. In this method, the TextParser's Starts-After-Continues-Until extractor extracts all the text that lies between the StartsAfter and ContinuesUntil (ends before) phrases. The extraction result is serialized to JSON, and the entries under its "Result" key are converted into a list of ExtractedData objects.
    CS
    private List<ExtractedData> ExtractData(string text, string startsAfter, string continueUntil)
    {
        //Extract URL from complete text
        var parser = new StartsAfterContinuesUntil(startsAfter, continueUntil);
        var result = parser.Extract(new MemoryStream(Encoding.UTF8.GetBytes(text)));
    
        //Create a JObject by parsing the JSON string using Newtonsoft.Json.Linq.JObject.Parse
        var jObject = Newtonsoft.Json.Linq.JObject.Parse(result.ToJsonString());
    
        //Retrieves the value associated with the key "Result" from the JObject
        var jToken = jObject.GetValue("Result");
    
        //Convert jToken to a list of ExtractedData objects using jToken.ToObject<List<ExtractedData>>() 
        var extractedData = jToken.ToObject<List<ExtractedData>>();
        return extractedData;
    }
    
    VB
    Private Function ExtractData(ByVal text As String, ByVal startsAfter As String, ByVal continueUntil As String) As List(Of ExtractedData)
        'Extract URL from complete text
        Dim parser = New StartsAfterContinuesUntil(startsAfter, continueUntil)
        Dim result = parser.Extract(New MemoryStream(Encoding.UTF8.GetBytes(text)))
        'Create a JObject by parsing the JSON string using Newtonsoft.Json.Linq.JObject.Parse
        Dim jObject = Newtonsoft.Json.Linq.JObject.Parse(result.ToJsonString)
        'Retrieves the value associated with the key "Result" from the JObject
        Dim jToken = jObject.GetValue("Result")
        'Convert jToken to a list of ExtractedData objects using jToken.ToObject<List<ExtractedData>>() 
        Dim extractedData As List(Of ExtractedData) = jToken.ToObject(Of List(Of ExtractedData))()
        Return extractedData
    End Function
    
  4. Define the ExtractURLs method, which extracts the URLs that start with "http" or "https" from the given text. It uses the custom ExtractData method to identify potential URL fragments, validates each candidate with the Uri.IsWellFormedUriString method, and collects the valid URLs into a list without duplicates. The final list of valid URLs is then returned.
    CS
    public IEnumerable<string> ExtractURLs(string text)
    {
        //Extract URLs starting with these protocols
        var _protocols = new List<string> { "http", "https" };
        List<string> urls = new List<string>();
        text += " ";
        foreach (var protocol in _protocols)
        {
            //extract the URL from the whole text
            var links = ExtractData(text, protocol, @"\s+");
            foreach (var link in links.Select(x => x.ExtractedText))
            {
                //if hyperlink is correct, add to list of URLs
                string hyperlink = $"{protocol}{link}";
                if (!Uri.IsWellFormedUriString(hyperlink, UriKind.Absolute))
                    continue;
                if (!urls.Contains(hyperlink)) urls.Add(hyperlink);
            }
        }
        return urls;
    }
    
    VB
    Public Function IHyperlinkParser_ExtractURLs(ByVal text As String) As IEnumerable(Of String) Implements IHyperlinkParser.ExtractURLs
        'Extract URLs starting with these protocols
        Dim _protocols As New List(Of String) From {"http", "https"}
        Dim urls As New List(Of String)()
        text &= " "
    
        For Each protocol In _protocols
            ' Extract the URL from the whole text
            Dim links = ExtractData(text, protocol, "\s+")
    
            For Each link In links.Select(Function(x) x.ExtractedText)
                ' If hyperlink is correct, add to list of URLs
                Dim hyperlink As String = $"{protocol}{link}"
    
                If Not Uri.IsWellFormedUriString(hyperlink, UriKind.Absolute) Then
                    Continue For
                End If
    
                If Not urls.Contains(hyperlink) Then
                    urls.Add(hyperlink)
                End If
            Next
        Next
        Return urls
    End Function
    
  5. Create a method named ExtractDomainName(Uri uri) that takes a Uri object as input and returns the domain name as a string. This method extracts the domain name from a given URI by processing the host part of the URI. It then filters and selects the first non-empty extracted fragment as the domain name. If no valid domain name is found, it returns the original host.
    CS
    private string ExtractDomainName(Uri uri)
    {
        // Start with an empty sequence so that a host without a dot (for example, localhost) falls back to uri.Host
        IEnumerable<string> data = Enumerable.Empty<string>();
        int dotCount = uri.Host.Count(x => x == '.');
        // The host contains more than one dot ('.'), for example, www.google.com: take the text between the dots
        if (dotCount > 1)
            data = ExtractData(uri.Host, @"\.", @"\.").Select(x => x.ExtractedText);
    
        // The host contains exactly one dot ('.'), for example, youtube.com: take the text before the dot
        if (dotCount == 1)
            data = ExtractData($" {uri.Host}", @" ", @"\.").Select(x => x.ExtractedText);
        // FirstOrDefault returns null instead of throwing when no fragment qualifies
        var domainName = data.FirstOrDefault(x => !string.IsNullOrWhiteSpace(x));
        
        //return the domain name, or the original host if none was found
        return string.IsNullOrEmpty(domainName) ? uri.Host : domainName;
    }
    
    VB
    Private Function ExtractDomainName(uri As Uri) As String
        ' Start with an empty sequence so that a host without a dot (for example, localhost) falls back to uri.Host
        Dim data As IEnumerable(Of String) = Enumerable.Empty(Of String)()
        Dim dotCount As Integer = uri.Host.Count(Function(x) x = "."c)
    
        ' The host contains more than one dot ('.'), for example, www.google.com: take the text between the dots
        If dotCount > 1 Then
            data = ExtractData(uri.Host, "\.", "\.").Select(Function(x) x.ExtractedText)
        End If
    
        ' The host contains exactly one dot ('.'), for example, youtube.com: take the text before the dot
        If dotCount = 1 Then
            data = ExtractData($" {uri.Host}", " ", "\.").Select(Function(x) x.ExtractedText)
        End If
    
        ' FirstOrDefault returns Nothing instead of throwing when no fragment qualifies
        Dim domainName As String = data.FirstOrDefault(Function(x) Not String.IsNullOrWhiteSpace(x))
    
        'return the domain name, or the original host if none was found
        Return If(String.IsNullOrEmpty(domainName), uri.Host, domainName)
    End Function
    
  6. Create a method named ChooseDisplayText that filters out irrelevant segments, reverses the list to prioritize the most specific segments, and selects the first valid segment as the display text. It then converts the selected text to title case and returns it. If the list of segments is empty, it returns an empty string.
    CS
    protected virtual string ChooseDisplayText(List<string> words)
    {
        //get the correct words from the segments
        if (words.Count == 0) return string.Empty;
        string displayText = string.Empty;
        var correctWords = words.Where(x => !x.Equals("/")).ToList();
        if (correctWords.Count > 0)
        {
            //Choose the display text: after reversing, the first entry is the last (most specific) segment
            correctWords.Reverse();
            displayText = correctWords[0];
        }
        //return the chosen word in title case
        return CultureInfo.CurrentCulture.TextInfo.ToTitleCase(displayText);
    }
    
    VB
    Protected Overridable Function ChooseDisplayText(ByVal words As List(Of String)) As String
        'get the correct words from the segments
        If (words.Count = 0) Then
            Return String.Empty
        End If
    
        Dim displayText As String = String.Empty
        Dim correctWords = words.Where(Function(x) Not x.Equals("/")).ToList()
        If correctWords.Count > 0 Then
            ' Choose the display text: after reversing, the first entry is the last (most specific) segment
            correctWords.Reverse()
            displayText = correctWords(0)
        End If
    
        'return the chosen word in title case
        Return CultureInfo.CurrentCulture.TextInfo.ToTitleCase(displayText)
    End Function
    
  7. Define the GetDisplayText(string uri) method to generate user-friendly display text from a given URI. The method extracts the domain name and the URI segments, then uses these parts to form readable text. If an error occurs during the process, or if the generated display text is empty, the original URI is returned.
    CS
    public string GetDisplayText(string uri)
    {
        try
        {
            //break URL into segments
            var uriObject = new Uri(uri);
            var words = new List<string>() { ExtractDomainName(uriObject) };
            words.AddRange(uriObject.Segments);
    
            //send these segments to get the chosen word for display
            var displayText = ChooseDisplayText(words);
    
            //return display text
            return string.IsNullOrEmpty(displayText) ? uri : displayText;
        }
        catch
        {
            //In case of errors, return whole URL
            return uri;
        }
    }
    
    VB
    Public Function IHyperlinkParser_GetDisplayText(ByVal uri As String) As String Implements IHyperlinkParser.GetDisplayText
        Try
            'break URL into segments
            Dim uriObject = New Uri(uri)
            Dim words As New List(Of String) From {ExtractDomainName(uriObject)}
    
            words.AddRange(uriObject.Segments)
            'send these segments to get the chosen word for display
            Dim displayText = ChooseDisplayText(words)
            'return display text
            Return If(String.IsNullOrEmpty(displayText), uri, displayText)
        Catch
            'In case of errors, return whole URL
            Return uri
        End Try
    
    End Function
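
The display-text logic above relies on a few .NET base class library calls: Uri.IsWellFormedUriString for validation, and Uri.Segments with TextInfo.ToTitleCase for choosing and formatting the text. The following is a minimal sketch of those building blocks only, not the parser itself. The class and method names (UriBuildingBlocksDemo, LastSegmentTitleCased) and the example.com URLs are assumptions made for illustration, and uri.Host stands in for the result of ExtractDomainName so that the sketch runs without C1TextParser.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical demo class; not part of the HyperlinkParser sample.
public static class UriBuildingBlocksDemo
{
    // Mirrors the selection rule of ChooseDisplayText: drop bare "/" entries,
    // take the last remaining entry, and convert it to title case.
    // Assumption: uri.Host is used in place of ExtractDomainName(uri).
    public static string LastSegmentTitleCased(Uri uri)
    {
        var candidates = new[] { uri.Host }
            .Concat(uri.Segments)
            .Where(s => s != "/")
            .ToList();

        // The host is always present, so the list is never empty.
        string chosen = candidates.Last();
        return CultureInfo.InvariantCulture.TextInfo.ToTitleCase(chosen);
    }

    public static void Main()
    {
        // Validation of the kind ExtractURLs performs on each candidate
        Console.WriteLine(Uri.IsWellFormedUriString(
            "https://www.example.com/docs/overview", UriKind.Absolute)); // True
        Console.WriteLine(Uri.IsWellFormedUriString(
            "https://exa mple.com", UriKind.Absolute));                  // False

        var uri = new Uri("https://www.example.com/docs/overview");

        // Segments keep their trailing slashes
        Console.WriteLine(string.Join(" | ", uri.Segments)); // / | docs/ | overview

        Console.WriteLine(LastSegmentTitleCased(uri));       // Overview
    }
}
```

Note that Uri.Segments keeps the trailing slash on intermediate segments, so a URL that ends in a folder (for example, .../docs/) would yield "Docs/" under this rule.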
    

Set RichTextBox for Hyperlinks Conversion

  1. In the MainWindow.xaml.cs file, add the following code to the Click event handler of the Button control to paste some text from the Resources.resx file into the RichTextBox control:
    CS
    private async void btnPasteDoc_Click(object sender, RoutedEventArgs e)
    {
        //clear the rich text box
        smartRichTextBox.Text = string.Empty;
        //set document to clipboard
        var manager = new ResourceManager(@"SmartRichTextBox_NET48.Resources", Assembly.GetExecutingAssembly());
        Clipboard.SetText(manager.GetString("Document"), TextDataFormat.Text);
        await Task.Delay(50);
        smartRichTextBox.ClipboardPaste();
    }
    
    VB
    Private Async Sub btnPasteDoc_Click(sender As Object, e As RoutedEventArgs)
        ' Clear the rich text box
        smartRichTextBox.Text = String.Empty
        ' Set document to clipboard
        Dim manager As New ResourceManager("SmartRichTextBox_NETFW_VB.Resourcesvb", Assembly.GetExecutingAssembly())
        Clipboard.SetText(manager.GetString("Document"), TextDataFormat.Text)
        Await Task.Delay(50)
        smartRichTextBox.ClipboardPaste()
    End Sub
    
  2. Add the following code to the Checked event handler of the chkAutoConversion checkbox to display textual links in the RichTextBox control in place of long URLs. The conversion uses the methods defined in the HyperlinkParser class, which are accessed through an instance of that class.
    CS
    private void chkAutoConversion_Checked(object sender, RoutedEventArgs e)
    {
        try
        {
            //Create parser object
            var hyperlinkParser = new HyperlinkParser();
    
            //Get pasted data from clipboard
            string text = Clipboard.GetText();
    
            //Convert the URLs into meaningful text using the methods of the HyperlinkParser class
            if (!string.IsNullOrEmpty(text))
            {
                var links = hyperlinkParser.ExtractURLs(text).ToList();
                foreach (var link in links)
                {
                    var displayText = hyperlinkParser.GetDisplayText(link);
                    var anchor = $"<a href=\"{link}\">{displayText}</a>";
                    //Escape the URL so that characters such as '.' and '?' are matched literally
                    var pattern = $@"(^|\s){Regex.Escape(link)}(\s|$)";
                    Regex rgx = new Regex(pattern, RegexOptions.Compiled);
                    text = rgx.Replace(text, $" {anchor} ");
                }
                Clipboard.SetData(DataFormats.Html, text);
                smartRichTextBox.Text = "";
                smartRichTextBox.ClipboardPaste();
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show($"{ex.Message}{Environment.NewLine}{ex.StackTrace}");
        }
    }
    
    VB
    Private Sub chkAutoConversion_Checked(sender As Object, e As RoutedEventArgs)
        Try
            'Create parser object
            Dim hyperlinkParser As New HyperlinkParser()
    
            'Get pasted data from clipboard
            Dim text As String = Clipboard.GetText()
    
            'Convert the URLs into meaningful text using the methods of the HyperlinkParser class
            If Not String.IsNullOrEmpty(text) Then
                Dim links As List(Of String) = hyperlinkParser.IHyperlinkParser_ExtractURLs(text).ToList()
                For Each link As String In links
                    Dim displayText As String = hyperlinkParser.IHyperlinkParser_GetDisplayText(link)
                    Dim anchor As String = $"<a href=""{link}"">{displayText}</a>"
                    'Escape the URL so that characters such as '.' and '?' are matched literally
                    Dim pattern As String = $"(^|\s){Regex.Escape(link)}(\s|$)"
                    Dim rgx As New Regex(pattern, RegexOptions.Compiled)
                    text = rgx.Replace(text, $" {anchor} ")
                Next
                Clipboard.SetData(DataFormats.Html, text)
                smartRichTextBox.Text = ""
                smartRichTextBox.ClipboardPaste()
            End If
        Catch ex As Exception
            MessageBox.Show($"{ex.Message}{Environment.NewLine}{ex.StackTrace}")
        End Try
    
    
    End Sub
    
  3. Run the application and click the button to paste some text into the RichTextBox control. Then, select the checkbox to convert the URLs into meaningful text in the RichTextBox.
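
Because the Checked handler places each URL inside a regular expression pattern, any regex metacharacters in the URL (such as '.' or '?') affect how the pattern matches. The following minimal sketch shows the difference that Regex.Escape makes for a URL with a query string. The class name LinkPatternDemo, the helper BuildPattern, and the sample URL are assumptions made for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the sample application.
public static class LinkPatternDemo
{
    // Builds the same whitespace-delimited pattern as the Checked handler,
    // optionally escaping the URL first.
    public static string BuildPattern(string link, bool escape)
    {
        string body = escape ? Regex.Escape(link) : link;
        return $@"(^|\s){body}(\s|$)";
    }

    public static void Main()
    {
        string link = "https://example.com/search?q=c1";
        string text = "See https://example.com/search?q=c1 for details.";

        // Unescaped: "h?" is read as an optional 'h', so the literal '?' in the text never matches.
        Console.WriteLine(Regex.IsMatch(text, BuildPattern(link, escape: false))); // False

        // Escaped: every character of the URL is matched literally.
        Console.WriteLine(Regex.IsMatch(text, BuildPattern(link, escape: true)));  // True
    }
}
```

Without escaping, a URL like this one would be left unconverted in the pasted text, which is why escaping the link before building the pattern is the safer choice.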

Transforming a standard RichTextBox into a smart RichTextBox can greatly enhance user interaction and content creation efficiency.