InfoQube IM - Community

Submitted by WayneK on 2024/05/07 09:52

When I paste text from an OCR'd PDF, I end up with a lot of "&shy;"s (soft hyphens in the original code).

e.g. "Supreme Court" becomes "Su&shy;preme Court".

I'm trying to find a way to remove these without having to re-type each one.

InfoQube Find and Replace doesn't recognize them.

~~The only way I've found so far is to paste the text into WordPad to remove the formatting, but this replaces each one with a dash, which still has to be removed.~~ [update: doesn't work - the symbols re-appear later]

I've looked at this before but have never been able to find a solution.

Wayne

Comments

Hi Wayne,

I wasn't aware of &shy;. I'm now removing this from copy/paste operations. Also, it will be removed for all fields (except the ItemHTML field of course when doing a Repair). This will be available in the next version.

HTH!

Pierre_Admin
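
For anyone who needs a workaround before the update, the text can be cleaned outside InfoQube and then pasted in. The following is only a minimal sketch in Python, not InfoQube's own implementation: the function name `strip_soft_hyphens` is hypothetical, and it assumes the soft hyphen arrives either as the Unicode character U+00AD or as one of its HTML entity forms.

```python
import re

# U+00AD is the Unicode soft hyphen. OCR'd PDFs and HTML exports may also
# carry it as an entity: &shy;, &#173; or &#xAD;.
_SOFT_HYPHEN_RE = re.compile(r"\u00ad|&shy;|&#0*173;|&#x0*ad;", re.IGNORECASE)


def strip_soft_hyphens(text: str) -> str:
    """Return text with soft hyphens and their HTML entity forms removed."""
    return _SOFT_HYPHEN_RE.sub("", text)


if __name__ == "__main__":
    sample = "Su\u00adpreme Court and Su&shy;preme Court"
    print(strip_soft_hyphens(sample))
```

Ordinary hyphens and dashes are left alone, so genuinely hyphenated words are not affected.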

Thanks. I'll try it again as soon as the update is out.

Perfect. If you have other tags / entities to filter out, now is the time to report them!

I can't think of another tag that causes consistent problems like this one does.

I do have a general problem with getting correct OCR'd text into InfoQube but I don't know of any other specific instances. It's just general problems interpreting imperfect text from old books and magazines, which causes a lot of manual correcting in InfoQube.

Wayne

v1.125.4 is now online and should handle this better.

I just now got back to this kind of work after updating to 125.5. I tried it and it successfully deleted the unwanted code.

Thanks so much for doing that. It was a major annoyance in copying text into InfoQube.

Wayne

How do I ?

How to get rid of "&shy;"
