Start and End Index of SelectedText

Posted by: gsnow on 18 June 2025, 8:18 pm EST

    Posted 18 June 2025, 8:18 pm EST

    How do you get the start and end character indices of the selected text in a PDF document?

  • Posted 20 June 2025, 1:19 am EST - Updated 20 June 2025, 1:24 am EST

    Hi,

    In DsPdfViewer, the concept of start and end character indices does not directly apply like it would in plain-text editors, because PDF documents are not stored as linear text. Instead, selected text is based on visual regions (bounding rectangles) rather than index positions.

    However, it is possible to extract the visual region of the selected text using the viewer.selectionCopier object.

    <button onclick="getPDFSelection()">Get Selection</button>
    <div id="host"></div>
    <script>
        var viewer = new DsPdfViewer("#host");
        viewer.addDefaultPanels();
        function getPDFSelection() {
            console.log(viewer.getSelectedText()); // Get the selected text
            console.log(viewer.selectionCopier);   // Get the selected text information
        }
    </script>

    So while you can’t get character-level start/end indices, you can identify exactly where the selection is visually on the PDF, which is how DsPdfViewer internally represents and processes selections.

    You can further refer to the attached sample that uses the above code snippet and gets the bounding rectangles containing the selected text (see below).

    Please let us know if you need any further guidance. If you could share more about your use case, we can provide further assistance tailored to your requirements.

    Regards,

    Prabhat Sharma.

    TextSelection.zip

  • Posted 23 June 2025, 5:12 pm EST

    Many thanks for your feedback. I was hoping to store the start and end index of the highlighted text so I could use the “highlightTextSegment” method to re-highlight text that had been previously stored. I want to store these highlighted text segments outside of the document. It is much more cumbersome to store all the bounding rectangles.

  • Posted 24 June 2025, 4:10 am EST - Updated 24 June 2025, 4:15 am EST

    Hi,

    Thanks for sharing your user story.

    While DsPdfViewer does not provide a direct method to retrieve the start and end indices of selected text, it is possible to extract them using the following approach:

    1. Listen to the textlayerready event and store the page index along with the full page text.
    2. When the user selects text, match the selection against the stored text.
    3. Determine and store the start and end indices of the selection within the page text.
    4. Use these indices with the highlightTextSegment() method to highlight the text accurately.

    Please refer to the attached sample, which demonstrates this approach and highlights the selected text as expected (see below).

    Please note that this is a proof-of-concept sample. If the same text appears multiple times on a page, additional logic may be required to compare the selection rectangles and accurately identify the intended text segment.
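As a rough illustration of steps 2 and 3, the sketch below finds every occurrence of the selected text inside a cached page text. It is not the code from the attached sample. The `pageTextCache` map and the `findSelectionRanges` helper are hypothetical names, and the end index is assumed to be exclusive, so check which convention `highlightTextSegment()` expects in your viewer version.

```javascript
// Find every occurrence of the selected text within a page's full text.
// Returns [{ startIndex, endIndex }], with endIndex exclusive (an assumption).
function findSelectionRanges(pageText, selectedText) {
    const ranges = [];
    if (!selectedText) return ranges;
    let from = 0;
    while (true) {
        const start = pageText.indexOf(selectedText, from);
        if (start === -1) break;
        ranges.push({ startIndex: start, endIndex: start + selectedText.length });
        from = start + 1; // step by one so overlapping matches are also found
    }
    return ranges;
}

// Hypothetical cache: pageIndex -> full page text, filled when each
// page's text layer becomes ready.
const pageTextCache = new Map();
pageTextCache.set(0, "The shorelines shift; the shorelines return.");

const matches = findSelectionRanges(pageTextCache.get(0), "shorelines");
console.log(matches); // two candidate ranges for the same selected text
```

When more than one range comes back, the selection rectangles are what tell the candidates apart, as noted above. The selected text may also differ from the page text in whitespace or line breaks, so some normalization could be needed before matching.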

    Let us know if any further assistance is needed.

    Kind Regards,

    Chirag Gupta

    Attachment: TextSelection.zip

  • Posted 10 July 2025, 7:07 pm EST

    Many thanks for your feedback! Easily the best I have received on any developer forum! I can’t thank you enough!!!

    I have a couple more questions:

    • Is there any way to highlight text across pages, or do you have to highlight sections separately?
    • Is there any way to navigate to a position on a page, such as the start of some text, independent of the zoom level?

  • Posted 14 July 2025, 1:53 am EST

    Hi,

    We are delighted that the previous solution worked for your use case.

    Regarding the follow-ups:

    1. Highlighting text across pages - Currently, DsPdfViewer works at the individual page level, so to highlight a selection that spans multiple pages, you’ll need to break the full selection into separate segments per page and then call highlightTextSegment() for each segment individually, passing the correct page index and character range for that page.
    2. Navigating to a specific position in the document (independent of zoom level) - Yes, you can scroll the viewer to a specific position using the loadAndScrollPageIntoView() method with the “XYZ” destination type. This allows you to scroll to a given coordinate on the page regardless of the current zoom. Below is a working sample that searches for a keyword in the PDF and scrolls to its location:

    window.onload = function () {
        var viewer = new DsPdfViewer('#viewer', { restoreViewStateOnLoad: false });
        viewer.addDefaultPanels();

        var pdf = "/path/to/your/file.pdf";
        loadPdf(viewer, pdf, "shorelines");
    };

    async function loadPdf(viewer, pdf, searchText) {
        await viewer.open(pdf);

        var findOptions = {
            Text: searchText,
            MatchCase: true,
            WholeWord: true
        };

        const searchIterator = await viewer.searcher.search(findOptions);
        const result = await searchIterator.next();
        const item = result.value;

        // Stop here if the keyword does not occur in the document
        if (!item) {
            console.warn('No match found for "' + searchText + '"');
            return;
        }

        // Use bounding box of the found text to calculate scroll position
        const x = item.ItemArea.left;
        const y = item.ItemArea.top + item.ItemArea.height;

        // Scroll to the text position with optional padding
        viewer.loadAndScrollPageIntoView(item.PageIndex, [
            null,
            { name: 'XYZ' },
            x - 10,
            y + 10,
            1.0 // Zoom factor (1.0 = 100%)
        ]);
    }
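
The destination array in the sample above could also be built from a rectangle that was saved earlier. The helper below is only a sketch: `buildXyzDestination` and the `stored` record are hypothetical, and the rectangle is assumed to have the same `left`, `top` and `height` fields as `item.ItemArea`.

```javascript
// Build an "XYZ" destination array from a stored rectangle, using the same
// arithmetic as the sample above (left edge minus padding, bottom edge plus padding).
function buildXyzDestination(rect, padding = 10, zoom = 1.0) {
    const x = rect.left - padding;
    const y = rect.top + rect.height + padding;
    return [null, { name: 'XYZ' }, x, y, zoom];
}

// Hypothetical record saved outside the document.
const stored = { pageIndex: 2, rect: { left: 72, top: 500, width: 120, height: 14 } };
const dest = buildXyzDestination(stored.rect);
console.log(dest);

// With a live viewer this would be passed on as:
// viewer.loadAndScrollPageIntoView(stored.pageIndex, dest);
```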

    You can adapt this to scroll to any previously stored location — for example, from saved highlight metadata — by storing the page index and bounding rectangle or offset coordinates.
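
For the cross-page case in point 1, splitting a selection into per-page segments might look like the sketch below. It assumes the selection is stored as a start page and index plus an end page and index (end exclusive), and that the text length of each page is already known. `splitSelectionByPage` and both data shapes are hypothetical, not part of the viewer API.

```javascript
// Split a selection that spans several pages into one segment per page.
// pageTextLengths[i] is the number of characters in page i's text.
function splitSelectionByPage(selection, pageTextLengths) {
    const { startPage, startIndex, endPage, endIndex } = selection;
    const segments = [];
    for (let page = startPage; page <= endPage; page++) {
        const from = page === startPage ? startIndex : 0;
        const to = page === endPage ? endIndex : pageTextLengths[page];
        if (to > from) {
            segments.push({ pageIndex: page, startIndex: from, endIndex: to });
        }
    }
    return segments;
}

const segments = splitSelectionByPage(
    { startPage: 1, startIndex: 90, endPage: 3, endIndex: 12 },
    [50, 100, 80, 40]
);
console.log(segments);

// Each segment would then go to highlightTextSegment() separately, for example:
// for (const s of segments) {
//     await viewer.highlightTextSegment(s.pageIndex, s.startIndex, s.endIndex, { color: 'rgba(255, 255, 0, 0.5)' });
// }
```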

    References:

    1. https://developer.mescius.com/document-solutions/javascript-pdf-viewer/demos/viewer-features/find-text/purejs
    2. https://developer.mescius.com/document-solutions/javascript-pdf-viewer/demos/viewer-features/custom-highlights/purejs
    3. https://developer.mescius.com/document-solutions/javascript-pdf-viewer/api/classes/DsPdfViewer#loadandscrollpageintoview

    Please let us know if you require any further assistance.

    Kind Regards,

    Chirag Gupta

  • Posted 11 November 2025, 6:30 pm EST

    Thanks for your previous assistance. How do you use the highlight manager? There is no documentation on it. I want to store all the highlighted text (i.e. pageIndex, startIndex, endIndex) that the user has selected. Do I use highlightManager.textItems?

  • Posted 12 November 2025, 2:56 am EST

    Hi Greg,

    The highlightManager is listed in the public API of DsPdfViewer (see Reference 1). However, the ITextHighlightManager interface is not publicly documented, although its prototype and available methods can be inspected through the developer console.

    If you prefer to use the highlightManager directly for adding highlights instead of the viewer’s highlightTextSegment method, you can do so as follows:

    await viewer.highlightManager.highlightTextSegment(
        highlight.pageIndex,
        highlight.startIndex,
        highlight.endIndex,
        { color: 'rgba(255, 255, 0, 0.5)' }
    );
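
Since the goal is to keep highlights outside the document, re-applying a saved list could be sketched as below. The only viewer call used is the `highlightManager.highlightTextSegment` call shown above. The `restoreHighlights` helper, the JSON record shape, and the `fakeViewer` stand-in (there so the sketch runs outside a browser) are all hypothetical.

```javascript
// Re-apply highlights that were saved as { pageIndex, startIndex, endIndex } records.
async function restoreHighlights(viewer, records, style = { color: 'rgba(255, 255, 0, 0.5)' }) {
    let applied = 0;
    for (const h of records) {
        await viewer.highlightManager.highlightTextSegment(
            h.pageIndex, h.startIndex, h.endIndex, style
        );
        applied++;
    }
    return applied;
}

// Stand-in for the real viewer: it only records the calls it receives.
const fakeViewer = {
    calls: [],
    highlightManager: {
        highlightTextSegment: async (...args) => { fakeViewer.calls.push(args); }
    }
};

// Records as they might be stored externally (for example in a database).
const savedJson = JSON.stringify([
    { pageIndex: 0, startIndex: 4, endIndex: 14 },
    { pageIndex: 2, startIndex: 0, endIndex: 25 }
]);

restoreHighlights(fakeViewer, JSON.parse(savedJson))
    .then((count) => console.log("highlights applied:", count));
```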

    Please let us know if you require any further assistance.

    Best Regards,

    Chirag

    References:

    1. highlightManager: https://developer.mescius.com/document-solutions/javascript-pdf-viewer/api/classes/GcPdfViewer#highlightmanager
  • Posted 12 November 2025, 4:25 pm EST

    Thanks for the quick reply. I’ve seen that documentation already and I know how to use the highlightManager to highlight text. What I want to know is how to use it to return what has been highlighted in the pdf. Can you provide an example of how to iterate through the highlighted text items in the pdf, returning the pageIndex, startIndex, endIndex? Thanks!

  • Posted 13 November 2025, 4:21 am EST - Updated 13 November 2025, 4:26 am EST

    Hi Greg,

    Thanks for your question about retrieving highlight information from DsPdfViewer.

    The highlightManager stores highlight data, but it doesn’t directly expose the start/end indices in a simple format. The highlights are stored with the text content and internal positioning data that needs to be converted to linear character indices.

    I’ve created a complete working solution that demonstrates how to iterate through the highlightManager.highlights and calculate the pageIndex, startIndex, and endIndex for each highlight. The implementation:

    1. Accesses viewer.highlightManager.highlights (organized by page)
    2. Iterates through all highlighted text items
    3. Maps the highlight text back to the page’s raw text to calculate precise start and end indices
    4. Returns all three values (pageIndex, startIndex, endIndex) for each highlight

    Please see the attached code sample and demonstration video showing the solution in action. The example includes a “Fetch Applied Highlights” button that retrieves and displays all highlight information with their indices.
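
The mapping in step 3 can be pictured with the sketch below, which is not the attached sample. The shape of `viewer.highlightManager.highlights` is undocumented, so this assumes an object keyed by page index whose entries carry a `text` field. Both that shape and the `collectHighlightRanges` helper are assumptions, so inspect the real object in the developer console before relying on them.

```javascript
// ASSUMPTION: highlightsByPage looks like { [pageIndex]: [{ text: "..." }, ...] }.
// pageTexts maps pageIndex -> full page text.
function collectHighlightRanges(highlightsByPage, pageTexts) {
    const results = [];
    for (const [key, items] of Object.entries(highlightsByPage)) {
        const pageIndex = Number(key);
        const pageText = pageTexts[pageIndex] ?? '';
        let cursor = 0; // search forward so repeated text maps to successive occurrences
        for (const item of items) {
            let start = pageText.indexOf(item.text, cursor);
            if (start === -1) start = pageText.indexOf(item.text); // fall back to a full-page search
            if (start === -1) continue; // text not found on this page; skip it
            const end = start + item.text.length; // exclusive end (an assumption)
            results.push({ pageIndex, startIndex: start, endIndex: end });
            cursor = end;
        }
    }
    return results;
}

const ranges = collectHighlightRanges(
    { 0: [{ text: 'ab' }, { text: 'ab' }], 2: [{ text: 'xyz' }] },
    { 0: 'ab cd ab', 2: '--xyz' }
);
console.log(ranges);
```

Matching by text alone assumes the highlights on a page are listed in reading order. If they are not, the positioning data mentioned above would be needed to pick the right occurrence.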

    Let me know if you have any questions about the implementation.

    Best regards,

    Chirag

    Attachments: TextSelection.zip


  • Posted 14 November 2025, 6:02 am EST

    Amazing! This is the best support I have ever received for a software product. I’ll let you know if I have any problems.

  • Posted 18 November 2025, 11:25 am EST

    Is it possible to get the polygon of the selected text rather than the startIndex and endIndex?

  • Posted 18 November 2025, 10:43 pm EST - Updated 19 November 2025, 1:53 am EST

    Hi Greg,

    As discussed previously, PDF text highlights are internally represented using bounding rectangles. This makes it possible to derive the full polygon shape of the selected text from the rectangle information.

    Please refer to the attached code sample demonstrating how to extract polygon data from the selected text in DsPdfViewer.
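
One way to picture the conversion from rectangles to polygons is the sketch below, which is not the attached sample. It assumes each rectangle has `left`, `top`, `width` and `height` fields with y increasing downward, and that the line rectangles of a selection touch vertically. If there are gaps between lines, the connecting edges of the outline will be slanted. `rectToPolygon` and `rectsToOutline` are hypothetical helper names.

```javascript
// Four vertices for a single rectangle: top-left, top-right, bottom-right, bottom-left.
function rectToPolygon(r) {
    return [
        { x: r.left, y: r.top },
        { x: r.left + r.width, y: r.top },
        { x: r.left + r.width, y: r.top + r.height },
        { x: r.left, y: r.top + r.height }
    ];
}

// One outline around stacked line rectangles: walk down the right-hand
// edges, then back up the left-hand edges.
function rectsToOutline(rects) {
    const sorted = [...rects].sort((a, b) => a.top - b.top);
    const right = [];
    const left = [];
    for (const r of sorted) {
        right.push({ x: r.left + r.width, y: r.top }, { x: r.left + r.width, y: r.top + r.height });
        left.push({ x: r.left, y: r.top }, { x: r.left, y: r.top + r.height });
    }
    return [...right, ...left.reverse()];
}

// A two-line selection: the first line starts mid-line, the second ends early.
const lineRects = [
    { left: 50, top: 0, width: 50, height: 10 },
    { left: 0, top: 10, width: 80, height: 10 }
];
console.log(lineRects.map(rectToPolygon));
console.log(rectsToOutline(lineRects));
```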

    Please let us know if you need any further assistance.

    Kind Regards,

    Chirag

    Attachment: TextSelection.zip


  • Posted 19 November 2025, 8:36 am EST

    Thanks. How do you highlight using a set of polygon coordinates (not a rect)?

  • Posted 21 November 2025, 12:30 am EST

    Hi Greg,

    Apologies for the delay.

    Currently, PDF text highlights are implemented as a collection of rectangular regions, commonly referred to as quads or quad points. Each rectangle corresponds to a contiguous block or line of text, and multiple rectangles together represent multi-line selections. This approach aligns with the official PDF specification and is supported by all major PDF viewers, including DsPdfViewer.

    These rectangles are essentially four-vertex polygons that map directly to the bounding areas of the selected text. This ensures reliable and consistent rendering across different PDF viewers and platforms.

    Highlighting text using arbitrary polygons with an unrestricted number of vertices is not supported by the PDF text highlight standard.

    For this reason, the solution we provided uses the standard rectangle (quadPoints) approach, as it preserves text semantics and ensures predictable behavior across viewers.
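
To make the quad representation concrete, the sketch below flattens rectangles into a quad-points style array of eight numbers per rectangle. It assumes PDF user space, with `x` and `y` as the lower-left corner and y increasing upward. The vertex order used here (top-left, top-right, bottom-left, bottom-right) is what many viewers appear to expect, but the specification text describes a counterclockwise order, so treat the ordering as an assumption to verify. `rectToQuadPoints` and `rectsToQuadPoints` are hypothetical helpers.

```javascript
// Eight numbers for one rectangle: four (x, y) vertex pairs.
// ASSUMPTION: order is top-left, top-right, bottom-left, bottom-right.
function rectToQuadPoints(rect) {
    const left = rect.x;
    const right = rect.x + rect.width;
    const bottom = rect.y;
    const top = rect.y + rect.height;
    return [left, top, right, top, left, bottom, right, bottom];
}

// A multi-line highlight is the concatenation of one quad per line.
function rectsToQuadPoints(rects) {
    return rects.flatMap(rectToQuadPoints);
}

const quadPoints = rectsToQuadPoints([
    { x: 72, y: 700, width: 200, height: 12 },
    { x: 72, y: 686, width: 150, height: 12 }
]);
console.log(quadPoints.length / 8, "quads");
```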

    Please let us know if you have any further questions.

    Kind regards,

    Chirag
