Start and End Index of SelectedText

Posted by: gsnow on 18 June 2025, 8:18 pm EST

gsnow
Posted 18 June 2025, 8:18 pm EST

How do you get the start and end index (char) of the selected text in the pdf document?
prabhat.sharma
Posted 20 June 2025, 1:19 am EST - Updated 20 June 2025, 1:24 am EST
Hi,

In DsPdfViewer, the concept of start and end character indices does not directly apply like it would in plain-text editors, because PDF documents are not stored as linear text. Instead, selected text is based on visual regions (bounding rectangles) rather than index positions.

However, it is possible to extract the visual region of the selected text using the viewer.selectionCopier object.

<button onclick="getPDFSelection()">Get Selection</button>
<div id="host"></div>
<script>
  var viewer = new DsPdfViewer("#host");
  viewer.addDefaultPanels();

  function getPDFSelection() {
    console.log(viewer.getSelectedText()); // Get the selected text
    console.log(viewer.selectionCopier);   // Get the selected text information
  }
</script>

So while you can’t get character-level start/end indices, you can identify exactly where the selection is visually on the PDF, which is how DsPdfViewer internally represents and processes selections.

You can further refer to the attached sample that uses the above code snippet and gets the bounding rectangles containing the selected text (see below).

Please let us know if you need any further guidance. If you could share more about your use case, we can provide further assistance tailored to your requirements.

Regards,

Prabhat Sharma.

TextSelection.zip
gsnow
Posted 23 June 2025, 5:12 pm EST

Many thanks for your feedback. I was hoping to store the start and end index of the highlighted text so I could use the “highlightTextSegment” method to re-highlight text that had been previously stored. I want to store these highlighted text segments outside of the document. It is much more cumbersome to store all the bounding rectangles.
chirag.gupta
Posted 24 June 2025, 4:10 am EST - Updated 24 June 2025, 4:15 am EST
Hi,

Thanks for sharing your user story.

While DsPdfViewer does not provide a direct method to retrieve the start and end indices of selected text, it is possible to extract them using the following approach:

1. Listen to the textlayerready event and store the page index along with the full page text.
2. When the user selects text, match the selection against the stored text.
3. Determine and store the start and end indices of the selection within the page text.
4. Use these indices with the highlightTextSegment() method to highlight the text accurately.

Please refer to the attached sample, which demonstrates this approach and highlights the selected text as expected (see below).

Please note that this is a proof-of-concept sample. If the same text appears multiple times on a page, additional logic may be required to compare the selection rectangles and accurately identify the intended text segment.
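For readers without the attachment, the matching step can be sketched roughly as below. This is not the code from TextSelection.zip; `findSelectionRanges` is a hypothetical helper, and it assumes the selected string appears character for character in the stored page text and that the end index is exclusive (adjust if highlightTextSegment() expects otherwise). It returns every occurrence, so the caller can see when the selection is ambiguous.

```javascript
// Hypothetical helper, not part of the DsPdfViewer API.
// Returns every range where selectedText occurs in pageText.
// Assumption: endIndex is exclusive.
function findSelectionRanges(pageText, selectedText) {
  const ranges = [];
  if (!selectedText) return ranges;
  let from = 0;
  while (true) {
    const startIndex = pageText.indexOf(selectedText, from);
    if (startIndex === -1) break;
    ranges.push({ startIndex, endIndex: startIndex + selectedText.length });
    from = startIndex + 1; // keep scanning for further occurrences
  }
  return ranges;
}

// Example with made-up page text; in the viewer, pageText would be the
// text stored for the page and selectedText the current selection.
const pageText = "Tides shape shorelines. Storms reshape shorelines.";
const matches = findSelectionRanges(pageText, "shorelines");
console.log(matches);
```

If more than one range comes back, the selection rectangles would be needed to pick the intended occurrence, as noted above.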

Let us know if any further assistance is needed.

Kind Regards,

Chirag Gupta

Attachment: TextSelection.zip
gsnow
Posted 10 July 2025, 7:07 pm EST
Many thanks for your feedback! Easily the best I have received on any developer forum! I can’t thank you enough!!!

I have a couple more questions:

Is there any way to highlight text across pages, or do you have to highlight sections separately?

Is there any way to navigate to a position on a page (independent of the zoom level - e.g. at the start of some text?)
chirag.gupta
Posted 14 July 2025, 1:53 am EST
Hi,

We are delighted that the previous solution worked for your use case.

Regarding the follow-ups:

Highlighting text across pages - Currently, DsPdfViewer works at the individual page level, so to highlight a selection that spans multiple pages, you’ll need to break the full selection into separate segments per page and then call highlightTextSegment() for each segment individually, passing the correct page index and character range for that page.
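As a rough illustration of the per-page split, the sketch below turns one selection that spans several pages into one segment per page. `splitSelectionByPage` and the `{ pageIndex, index }` shape are hypothetical names chosen for this example, and it assumes you already know each page's text length and that end indices are exclusive.

```javascript
// Hypothetical helper, not part of the DsPdfViewer API.
// pageLengths[i] is the character count of page i's text.
// start and end are { pageIndex, index }; end.index is exclusive.
function splitSelectionByPage(pageLengths, start, end) {
  const segments = [];
  for (let pageIndex = start.pageIndex; pageIndex <= end.pageIndex; pageIndex++) {
    // First page starts at the selection start; later pages start at 0.
    const startIndex = pageIndex === start.pageIndex ? start.index : 0;
    // Last page stops at the selection end; earlier pages run to their end.
    const endIndex = pageIndex === end.pageIndex ? end.index : pageLengths[pageIndex];
    if (endIndex > startIndex) {
      segments.push({ pageIndex, startIndex, endIndex });
    }
  }
  return segments;
}

const segments = splitSelectionByPage(
  [100, 80, 120],
  { pageIndex: 0, index: 90 },
  { pageIndex: 2, index: 15 }
);
console.log(segments);
```

Each resulting segment would then be passed to highlightTextSegment() with its own page index and character range.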

Navigating to a specific position in the document (independent of zoom level) - Yes, you can scroll the viewer to a specific position using the loadAndScrollPageIntoView() method with the “XYZ” destination type. This allows you to scroll to a given coordinate on the page regardless of the current zoom. Below is a working sample that searches for a keyword in the PDF and scrolls to its location:

window.onload = function () {
  var viewer = new DsPdfViewer('#viewer', { restoreViewStateOnLoad: false });
  viewer.addDefaultPanels();
  var pdf = "/path/to/your/file.pdf";
  loadPdf(viewer, pdf, "shorelines");
};

async function loadPdf(viewer, pdf, searchText) {
  await viewer.open(pdf);
  var findOptions = { Text: searchText, MatchCase: true, WholeWord: true };
  const searchIterator = await viewer.searcher.search(findOptions);
  const result = await searchIterator.next();
  const item = result.value;
  if (!item) {
    // No match was found, so there is nothing to scroll to
    return;
  }
  // Use bounding box of the found text to calculate scroll position
  const x = item.ItemArea.left;
  const y = item.ItemArea.top + item.ItemArea.height;
  // Scroll to the text position with optional padding
  viewer.loadAndScrollPageIntoView(item.PageIndex, [
    null,
    { name: 'XYZ' },
    x - 10,
    y + 10,
    1.0 // Zoom factor (1.0 = 100%)
  ]);
}

You can adapt this to scroll to any previously stored location — for example, from saved highlight metadata — by storing the page index and bounding rectangle or offset coordinates.
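A minimal sketch of that adaptation, assuming the saved record keeps the same left/top/height values that the search sample reads from ItemArea. The record shape and the `buildXyzDestination` helper are hypothetical; only the destination array layout is taken from the sample above.

```javascript
// Hypothetical saved record; field names are illustrative only.
const saved = { pageIndex: 3, left: 72, top: 540, height: 12 };

// Builds a destination array in the same layout as the search sample:
// [null, { name: 'XYZ' }, x, y, zoom].
function buildXyzDestination(area, padding = 10, zoom = 1.0) {
  const x = area.left;
  const y = area.top + area.height;
  return [null, { name: "XYZ" }, x - padding, y + padding, zoom];
}

const dest = buildXyzDestination(saved);
console.log(dest);

// With a live viewer, the call would look like:
// viewer.loadAndScrollPageIntoView(saved.pageIndex, dest);
```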

References:

https://developer.mescius.com/document-solutions/javascript-pdf-viewer/demos/viewer-features/find-text/purejs

https://developer.mescius.com/document-solutions/javascript-pdf-viewer/demos/viewer-features/custom-highlights/purejs

https://developer.mescius.com/document-solutions/javascript-pdf-viewer/api/classes/DsPdfViewer#loadandscrollpageintoview

Please let us know if you require any further assistance.

Kind Regards,

Chirag Gupta
gsnow
Posted 11 November 2025, 6:30 pm EST

Thanks for your previous assistance. How do you use the highlight manager? There is no documentation on it. I want to store all the highlighted text (i.e. pageIndex, startIndex, endIndex) that the user has selected. Do I use highlightManager.textItems?
chirag.gupta
Posted 12 November 2025, 2:56 am EST
Hi Greg,

The highlightManager is listed in the public API of DsPdfViewer (see Reference 1). However, the ITextHighlightManager interface is not publicly documented, although its prototype and available methods can be inspected through the developer console.

If you prefer to use the highlightManager directly for adding highlights instead of the viewer’s highlightTextSegment method, you can do so as follows:

await viewer.highlightManager.highlightTextSegment(
  highlight.pageIndex,
  highlight.startIndex,
  highlight.endIndex,
  { color: 'rgba(255, 255, 0, 0.5)' }
);
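If the goal is to keep highlights outside the document and re-apply them later, one possible shape is sketched below. The storage format and the `serializeHighlights` / `restoreHighlights` helpers are assumptions made for this example; the only viewer call used is the highlightTextSegment() call shown above, and a stand-in object replaces viewer.highlightManager so the sketch runs on its own.

```javascript
// Hypothetical storage format: an array of plain records kept outside the PDF.
function serializeHighlights(records) {
  // Keep only the three fields needed to re-create a highlight.
  return JSON.stringify(
    records.map(({ pageIndex, startIndex, endIndex }) => ({ pageIndex, startIndex, endIndex }))
  );
}

// Re-applies stored records one by one through highlightTextSegment().
async function restoreHighlights(highlightManager, json, color = "rgba(255, 255, 0, 0.5)") {
  const records = JSON.parse(json);
  for (const h of records) {
    await highlightManager.highlightTextSegment(h.pageIndex, h.startIndex, h.endIndex, { color });
  }
  return records.length;
}

// Stand-in for viewer.highlightManager so the sketch runs without a viewer.
const calls = [];
const fakeManager = {
  async highlightTextSegment(pageIndex, startIndex, endIndex, options) {
    calls.push({ pageIndex, startIndex, endIndex, options });
  },
};

const json = serializeHighlights([{ pageIndex: 0, startIndex: 12, endIndex: 22, note: "dropped" }]);
restoreHighlights(fakeManager, json).then((count) => console.log(count, calls));
```

In a real page, `fakeManager` would be replaced with `viewer.highlightManager`, and the JSON string could be saved wherever suits your application.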

Please let us know if you require any further assistance.

Best Regards,

Chirag

References:

highlightManager: https://developer.mescius.com/document-solutions/javascript-pdf-viewer/api/classes/GcPdfViewer#highlightmanager
gsnow
Posted 12 November 2025, 4:25 pm EST

Thanks for the quick reply. I’ve seen that documentation already and I know how to use the highlightManager to highlight text. What I want to know is how to use it to return what has been highlighted in the pdf. Can you provide an example of how to iterate through the highlighted text items in the pdf, returning the pageIndex, startIndex, endIndex? Thanks!
chirag.gupta
Posted 13 November 2025, 4:21 am EST - Updated 13 November 2025, 4:26 am EST
Hi Greg,

Thanks for your question about retrieving highlight information from DsPdfViewer.

The highlightManager stores highlight data, but it doesn’t directly expose the start/end indices in a simple format. The highlights are stored with the text content and internal positioning data that needs to be converted to linear character indices.

I’ve created a complete working solution that demonstrates how to iterate through the highlightManager.highlights and calculate the pageIndex, startIndex, and endIndex for each highlight. The implementation:

- Accesses viewer.highlightManager.highlights (organized by page)
- Iterates through all highlighted text items
- Maps the highlight text back to the page’s raw text to calculate precise start and end indices
- Returns all three values (pageIndex, startIndex, endIndex) for each highlight

Please see the attached code sample and demonstration video showing the solution in action. The example includes a “Fetch Applied Highlights” button that retrieves and displays all highlight information with their indices.
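For anyone reading without the attachment, the mapping step might look roughly like the sketch below. It is not the attached code. The real shape of viewer.highlightManager.highlights is not documented in this thread, so the input here is a plain object (page index mapped to highlighted strings in reading order) that you would build from it yourself. `locateHighlights` is a hypothetical helper, and it uses a moving cursor so that repeated text maps to successive occurrences.

```javascript
// Hypothetical helper, not part of the DsPdfViewer API.
// pageTexts[i] is the raw text of page i.
// highlightsByPage maps a page index to highlighted strings in reading order.
function locateHighlights(pageTexts, highlightsByPage) {
  const results = [];
  for (const [pageKey, texts] of Object.entries(highlightsByPage)) {
    const pageIndex = Number(pageKey);
    const pageText = pageTexts[pageIndex] ?? "";
    let cursor = 0; // search resumes after the previous match
    for (const text of texts) {
      const startIndex = pageText.indexOf(text, cursor);
      if (startIndex === -1) continue; // not found after the cursor; skipped
      const endIndex = startIndex + text.length; // exclusive end (assumption)
      results.push({ pageIndex, startIndex, endIndex });
      cursor = endIndex;
    }
  }
  return results;
}

const found = locateHighlights(
  ["alpha beta alpha", "gamma"],
  { 0: ["alpha", "alpha"], 1: ["gamma"] }
);
console.log(found);
```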

Let me know if you have any questions about the implementation.

Best regards,

Chirag

Attachments: TextSelection.zip

gsnow
Posted 14 November 2025, 6:02 am EST

Amazing! This is the best support I have ever received for a software product. I’ll let you know if I have any problems.
gsnow
Posted 18 November 2025, 11:25 am EST

Is it possible to get the polygon of the selected text rather than the startIndex and endIndex?
chirag.gupta
Posted 18 November 2025, 10:43 pm EST - Updated 19 November 2025, 1:53 am EST

Hi Greg,

As discussed previously, PDF text highlights are internally represented using bounding rectangles. This makes it possible to retrieve the full polygon shape of a selected text based on the rectangle information.

Please refer to the attached code sample demonstrating how to extract polygon data from the selected text in DsPdfViewer.
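As a rough idea of what the conversion involves (this is not the attached sample), each bounding rectangle can be expanded into a four-vertex polygon. The `{ x, y, w, h }` rect shape and the `rectToPolygon` helper are assumptions for illustration; the rectangles returned by the viewer may use different field names or a different origin.

```javascript
// Hypothetical rect shape { x, y, w, h }; real field names may differ.
// Returns the four corners in order around the rectangle.
function rectToPolygon({ x, y, w, h }) {
  return [
    { x, y },
    { x: x + w, y },
    { x: x + w, y: y + h },
    { x, y: y + h },
  ];
}

// A two-line selection is represented as two rectangles, one per line.
const lineRects = [
  { x: 72, y: 700, w: 200, h: 12 },
  { x: 72, y: 686, w: 150, h: 12 },
];
const polygons = lineRects.map(rectToPolygon);
console.log(polygons);
```

A multi-line selection then becomes a list of such polygons, one per line.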

Please let us know if you need any further assistance.

Kind Regards,

Chirag

Attachment: TextSelection.zip

gsnow
Posted 19 November 2025, 8:36 am EST

Thanks. How do you highlight using a set of polygon coordinates (not a rect)?
chirag.gupta
Posted 21 November 2025, 12:30 am EST

Hi Greg,

Apologies for the delay caused.

Currently, PDF text highlights are implemented as a collection of rectangular regions, commonly referred to as quads or quad points. Each rectangle corresponds to a contiguous block or line of text, and multiple rectangles together represent multi-line selections. This approach aligns with the official PDF specification and is supported by all major PDF viewers, including DsPdfViewer.

These rectangles are essentially four-vertex polygons that map directly to the bounding areas of the selected text. This ensures reliable and consistent rendering across different PDF viewers and platforms.

Highlighting text using arbitrary polygons with an unrestricted number of vertices is not supported by the PDF text highlight standard.

For this reason, the solution we provided uses the standard rectangle (quadPoints) approach, as it preserves text semantics and ensures predictable behavior across viewers.

Please let us know if you have any further questions.

Kind regards,

Chirag
