When I paste text from an OCR'd PDF, I end up with a lot of "&shy;"s (soft hyphens in the original code).
E.g. "Supreme Court" becomes "Su&shy;preme Court".
I'm trying to find a way to remove these without having to re-type each one.
InfoQube Find and Replace doesn't recognize them.
The only way I've found so far is to paste the text into WordPad to remove the formatting, but this replaces each soft hyphen with a dash, which still has to be removed. [update: doesn't work - the symbols re-appear later]
I've looked at this before but have never been able to find a solution.
Wayne
Comments
Hi Wayne,
I wasn't aware of &shy;. I'm now removing it from copy-paste operations. It will also be removed from all fields (except the ItemHTML field, of course) when doing a Repair. This will be available in the next version.
HTH!
Pierre_Admin
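For anyone who needs to clean text before pasting while waiting for the update, here is a minimal Python sketch of the same idea. It is not InfoQube's code: the function name strip_soft_hyphens is hypothetical, and it assumes the soft hyphen arrives either as the raw character U+00AD or as one of the HTML spellings &shy;, &#173; or &#xAD;.

```python
import re

# The raw soft hyphen character (invisible in most editors).
SOFT_HYPHEN = "\u00ad"

# HTML spellings of the same character: &shy;, &#173;, &#xAD; (any case).
SHY_ENTITY = re.compile(r"&(?:shy|#0*173|#x0*ad);", re.IGNORECASE)


def strip_soft_hyphens(text: str) -> str:
    """Return text with soft hyphens removed, in entity or raw form."""
    text = SHY_ENTITY.sub("", text)
    return text.replace(SOFT_HYPHEN, "")


if __name__ == "__main__":
    sample = "The Su&shy;preme Court and the Su\u00adpreme Court"
    print(strip_soft_hyphens(sample))
```

Ordinary hyphens and dashes are left alone, so words like "copy-paste" come through unchanged.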
Thanks. I'll try it again as soon as the update is out.
Perfect. If you have other tags / entities to filter out, now is the time to report them!
I can't think of another tag that causes consistent problems like this one does.
I do have a general problem with getting correct OCR'd text into InfoQube but I don't know of any other specific instances. It's just general problems interpreting imperfect text from old books and magazines, which causes a lot of manual correcting in InfoQube.
Wayne
v1.125.4 is now online and should handle this better.
I just now got back to this kind of work after updating to 125.5. I tried it and it successfully deleted the unwanted code.
Thanks so much for doing that. It was a major annoyance in copying text into InfoQube.
Wayne