Posted 16 July 2024, 2:10 am EST
In version 657, text extraction from PDF documents behaves very differently from earlier versions.
I search standardized documents for strings that are expected to occur in a certain order.
Old (e.g. 631):
BeforeCompany: MyCompany
Header
Before9/999999: 9/999999 some other Text
Header2 ColHeader1 ColHeader2
Line1Col1 Line1Col2
Line2Col1 Line2Col2
New (657):
MyCompany Address 9999999 9/999999 Header1 Line1Col1 Line2Col1
Header BeforeCompany: Before9999999 Before9/999999: Header2
How can I restore the old behavior in the new version?
SearchString = "Before9/999999:"
Using mc As New C1.Win.C1Document.Util.C1DXTextMeasurementContext()
    Dim pdfLines As New List(Of String)
    ' Extract the text of the whole document and split it into lines.
    Dim dr As C1DocumentRange = C1PDF_DS.GetWholeDocumentRange(mc)
    ' Split on the full newline sequence; Split(String) may bind to the Char overload and split on CR only.
    pdfLines.AddRange(dr.GetText().Split({Environment.NewLine}, StringSplitOptions.None))
    ' Keep only the lines that contain the search string.
    filteredPDFLines = pdfLines.Where(Function(line) line.Contains(SearchString)).ToList()
End Using
In the next step, I search for 9/999999 in the lines found, but can no longer find it there:
For i = 0 To n - 1
    Dim fp As C1FoundPosition = _textSearchManager.FoundPositions(i)
    For Each m As Match In regex.Matches(filteredPDFLines(i))
        If ItemList.ContainsKey(m.Value) Then
            If Not ItemList.Item(m.Value).Contains(fp.GetPage().PageNo) Then
                ItemList.Item(m.Value).Add(fp.GetPage().PageNo)
            End If
        Else
            Dim PageList As New List(Of Integer)
            PageList.Add(fp.GetPage().PageNo)
            ItemList.Add(m.Value, PageList)
        End If
    Next
Next
This is because the line that is found now looks like this:
Header BeforeCompany: Before9999999 Before9/999999: Header2
instead of as before:
Before9/999999: 9/999999 some other Text
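To make the failure reproducible without the C1 API, here is a minimal sketch that runs the same two-step filter over the old and new text layouts quoted above, using plain strings only. The pattern in ItemRegex is an assumption for illustration (my real regex is not shown in this post), and LayoutDemo and FindItems are hypothetical names.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module LayoutDemo
    ' Assumed pattern: a value like 9/999999 that is not glued to a preceding word.
    Private ReadOnly ItemRegex As New Regex("(?<![\w/])\d/\d{6}\b")

    ' Keep only the lines containing searchString, then collect the regex matches in them.
    Function FindItems(lines As IEnumerable(Of String), searchString As String) As List(Of String)
        Dim found As New List(Of String)
        For Each line As String In lines.Where(Function(l) l.Contains(searchString))
            For Each m As Match In ItemRegex.Matches(line)
                found.Add(m.Value)
            Next
        Next
        Return found
    End Function

    Sub Main()
        ' Layout as extracted by the old version (e.g. 631).
        Dim oldLines As String() = {
            "BeforeCompany: MyCompany",
            "Header",
            "Before9/999999: 9/999999 some other Text"}
        ' Layout as extracted by the new version (657).
        Dim newLines As String() = {
            "MyCompany Address 9999999 9/999999 Header1 Line1Col1 Line2Col1",
            "Header BeforeCompany: Before9999999 Before9/999999: Header2"}

        Console.WriteLine(FindItems(oldLines, "Before9/999999:").Count) ' prints 1
        Console.WriteLine(FindItems(newLines, "Before9/999999:").Count) ' prints 0
    End Sub
End Module
```

With the old layout the label and its value share a line, so the second step finds the value. With the new layout the value ends up on a different line from its label, so filtering by label first discards it.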