Submitted by jimspoon on 2022/07/07 15:31

Pierre, I see you've been helping convox with getting Treepad data into IQ.  I have a colossal bookmark collection in Linkman and I've been trying to figure out how to get it into IQ.  Linkman is no longer developed.  Linkman has a number of export options but unfortunately no option to export to OPML or OML.  But the LMD data format seems quite simple.  The hierarchy of bookmarks is stored with tags as follows.  The lines are not indented with tabs and are terminated by CRLF.  Encoding is ANSI.

<node type=folder> (opens folder node)
<sta></sta> (a number, must mean folder status of some sort)
<node type=link> (opens link node)
<tit>Example Title</tit> (webpage title, plain text)
<url>http://www.example.com</url> (web address, plain text)
<des>This text is a description.</des> (description, plain text)
<key>These are the keywords.</key> (keywords separated by commas, plain text)
<add>310722141221</add> (date and time this node added to database, in ddmmyyhhmmss format)
<mod></mod> (date and time this node last modified)
<vis></vis> (date and time this node last visited)
<che></che> (date and time this URL last checked)
<ipc></ipc> (a number, link info of some sort)
</node> (closes the link node)
</node> (closes the folder node)

Of course folders can be nested so you can see multiple consecutive </node> tags.
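
For anyone who would rather script the conversion than rely on export templates, the tag layout above could be read with a small stack-based parser. This is only a sketch in Python: the function name parse_lmd and the dict layout are made up for illustration, and it assumes each field sits on a single line, exactly as in the listing above.

```python
import re

# Patterns for the two kinds of lines shown in the listing above.
NODE_OPEN = re.compile(r"<node type=(\w+)>")
FIELD = re.compile(r"<(\w{3})>(.*)</\1>")


def parse_lmd(text):
    """Parse LMD-style text into a nested tree of dicts (hypothetical layout)."""
    root = {"type": "root", "fields": {}, "children": []}
    stack = [root]  # the innermost open node is always stack[-1]
    for raw in text.splitlines():  # splitlines() also handles CRLF
        line = raw.strip()
        opened = NODE_OPEN.fullmatch(line)
        if opened:
            node = {"type": opened.group(1), "fields": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif line == "</node>":
            if len(stack) > 1:  # never pop the artificial root
                stack.pop()
        else:
            field = FIELD.fullmatch(line)
            if field:
                stack[-1]["fields"][field.group(1)] = field.group(2)
    return root


# A real file would be read with an ANSI code page; cp1252 is an assumption:
#   text = open("bookmarks.lmd", encoding="cp1252").read()
sample = "\r\n".join([
    "<node type=folder>",
    "<sta>1</sta>",
    "<node type=link>",
    "<tit>Example Title</tit>",
    "<url>http://www.example.com</url>",
    "</node>",
    "</node>",
])

tree = parse_lmd(sample)
link = tree["children"][0]["children"][0]
print(link["type"], link["fields"]["tit"], link["fields"]["url"])
```

The depth of the stack at any point is the outline level, so the same loop could emit nested output directly instead of building a tree.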

I want <tit> to go into Item, <url> to go into URL, <des> to go into a "Description" text field, <add> to go into a "Date Added" field.  <key> could go into a Keywords text field, or maybe I could make tags out of them - I haven't looked into that yet.
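
The <add> and <key> values would probably need a little conversion before import. A rough Python sketch follows: the helper names are hypothetical, and whether IQ prefers an ISO-style date is an assumption, not something confirmed here.

```python
from datetime import datetime


def parse_linkman_stamp(stamp):
    """Convert a ddmmyyhhmmss string to a datetime, or None if empty or invalid."""
    stamp = stamp.strip()
    if not stamp:
        return None  # <mod></mod> and similar tags may be empty
    try:
        return datetime.strptime(stamp, "%d%m%y%H%M%S")
    except ValueError:
        return None


def split_keywords(value):
    """Split a comma-separated <key> value into a clean list."""
    return [part.strip() for part in value.split(",") if part.strip()]


added = parse_linkman_stamp("310722141221")
print(added.isoformat(sep=" "))  # 2022-07-31 14:12:21
print(split_keywords("opml, bookmarks , linkman"))  # ['opml', 'bookmarks', 'linkman']
```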

To preserve the folder/item hierarchy when importing into IQ, I'm thinking the best alternative is to transform this LMD text into OPML and use the IQ Hierarchical Data import.  I can figure out the required transformations - since OPML seems to represent folder hierarchy (i.e. outline level) with tab indents, I need to figure out just how to insert the right number of indents before each item.  I guess I can do that by incrementing the number of tabs after every <node> tag and decrementing it after every </node> tag.

But before I get to work on this, I was wondering if an easier solution occurs to you?  Thanks for any advice.

 

Comments

And the doc on using export templates doesn't help you create a compatible format?

(seems their website is dead right now, but the Wayback Machine has versions of that page)

Well that's a good point.  I've worked with the export templates a lot, but only to generate TSV files which would not preserve the hierarchy.  Now looking at the template editor again, I see there is a <LEVEL> tag for folder nodes that I can try.  I suppose it will put a numeric <LEVEL> tag into the export file.  Then I can use another tool (if necessary) to replace the <LEVEL> tags with the appropriate number of tab indents required by the OPML spec.  There is no <LEVEL> tag for the link (bookmark) nodes so I'll have to figure out how to get the required number of tabs before each bookmark, but I should be able to figure out a way.  Thanks for looking at it.

Gosh ... I was looking at a sample OPML file that was full of tab indents ( http://hosting.opml.org/dave/spec/states.opml ) and imagined that was required to produce the hierarchy.  Now I see that the hierarchy results from the nesting of <outline> tags inside one another.  So maybe I can figure out how to do this in the export template.  Of course it makes sense that Linkman would make that possible.  Thanks again for looking at it - my dim light bulb is starting to brighten.  haha.  💡💡💡💡
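
To illustrate that nesting alone carries the hierarchy, here is a small Python sketch using the standard xml.etree.ElementTree module. The folder and bookmark names are placeholders, and the url attribute name simply mirrors the one used in the export template discussed in this thread.

```python
import xml.etree.ElementTree as ET

# Build a tiny OPML document: one folder containing one bookmark.
opml = ET.Element("opml", version="2.0")
head = ET.SubElement(opml, "head")
ET.SubElement(head, "title").text = "Linkman export"
body = ET.SubElement(opml, "body")

folder = ET.SubElement(body, "outline", text="Example Folder")
ET.SubElement(folder, "outline", text="Example Title",
              url="http://www.example.com")

# Serialize without any indentation at all.
flat = ET.tostring(opml, encoding="unicode")
print(flat)

# Reading it back still finds the bookmark one level below the folder.
reparsed = ET.fromstring(flat)
child = reparsed.find("./body/outline/outline")
print(child.get("text"))
```

The serialized text contains no tabs or line breaks, yet the bookmark is still a child of the folder, so any indentation in an OPML file is only there for human readers.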

I made a Linkman export template to export to OPML - or rather a file I hoped would be recognized as a valid OPML file.  I used the IQ Hierarchical Data import and I'm happy to say that I finally got it to import a rather large set of items perfectly.  Not surprisingly it didn't look right until I refreshed the grid I imported into.  But the hierarchy of folders/bookmarks is perfectly preserved as items/subitems.  It didn't work until I limited the imported fields to Title, URL, and Date Added.  When I tried to import these fields along with comments, description, and keywords fields, I was only able to import a few hundred items before the import "completed".  I suspect that the OPML file with those additional fields was somehow defective, and this caused IQ to "complete" the import prematurely.  I tried some web-based OPML validators without much luck.  I'll try some local XML validators to see if they can find what's causing the problem.

I think one problem might be that the fields exported by Linkman may have characters that have a special meaning in XML - like &, <, >, " - and this may prevent a successful import, so I'm working on that right now. For example, if the text inside an XML element attribute has a " in it, that might screw things up, e.g. <outline text="This text field has double quotes "" in it" url="http://www.example.com" />

Some URLs have query strings in them with ampersands, e.g. "http://www.google.com/search?&q=backlinks%20master&sourceid=mozilla-sea…", and this gets flagged by XML validators. So far I've tried replacing the & with &amp; and also %24 but when I do that with the link above, the link doesn't work anymore. Trying to figure this out.

While double-quotes inside an attribute would seem to be a problem - I'd think that ampersands, <, and > wouldn't cause a problem - since being inside the attribute "", I'd think that an importer would realize they are literals and not XML special characters.

The exported fields in some items may include characters that have a special meaning in XML - like ", <, >, & - so I'm trying to figure out if this is what's causing problems with the import.  If an attribute like text="" has a " inside the enclosing quotes, that might prematurely terminate that attribute.  I wouldn't think that & < or > inside text="" or comment="" would cause a problem - the quotes indicate that those characters inside the quotes are literals, and not XML special characters, but maybe it is a problem.
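
A quick way to check which of these characters really are a problem is to feed small test elements to a strict XML parser. The Python sketch below uses only the standard library; the helper name parses and the test strings are made up for illustration, and IQ's own importer may be more or less forgiving than this parser.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(xml_text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Inside a double-quoted attribute, a bare & or < is rejected; > is tolerated.
print(parses('<outline text="a & b" />'))  # False
print(parses('<outline text="a < b" />'))  # False
print(parses('<outline text="a > b" />'))  # True
print(parses('<outline text="a "b" c" />'))  # False

# Escaping the value first makes all of them safe.
raw = 'Q&A: "quoted" <tag>'
safe = escape(raw, {'"': "&quot;"})
print(safe)  # Q&amp;A: &quot;quoted&quot; &lt;tag&gt;

element = ET.fromstring(f'<outline text="{safe}" />')
print(element.get("text") == raw)  # True
```

So the quotes around an attribute value do not turn & and < into literals: in well-formed XML they have to be written as &amp; and &lt; even there, and the parser converts them back on reading.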

The ampersands inside a URL query string e.g. "http://www.google.com/search?&q=backlinks%20master&sourceid=mozilla-sea…" do get flagged by XML validators.  I tried replacing them with &amp; and %24 in the above link but when I did that the link no longer worked.  Trying to figure that out.
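
One possible explanation, sketched in Python below: &amp; is only meant to exist inside the XML file, and a parser turns it back into & when the attribute is read, so the escaped text is not something to paste into a browser. Percent-encoding is a different mechanism that changes what the URL means. The URL here is a hypothetical stand-in, since the one above is truncated.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, quote, unquote, urlsplit
from xml.sax.saxutils import escape

url = "http://www.example.com/search?q=one&lang=en"  # hypothetical example

# What goes into the file: & written as &amp;
in_file = escape(url)
print(in_file)  # http://www.example.com/search?q=one&amp;lang=en

# What a parser hands back: the original URL, ampersand restored.
element = ET.fromstring(f'<outline text="x" url="{in_file}" />')
print(element.get("url") == url)  # True

# Percent-encoding the separator merges the two parameters into one value.
encoded = url.replace("&", "%26")
print(parse_qs(urlsplit(url).query))      # {'q': ['one'], 'lang': ['en']}
print(parse_qs(urlsplit(encoded).query))  # {'q': ['one&lang=en']}

# For reference: & percent-encodes as %26, while %24 decodes to a dollar sign.
print(quote("&", safe=""), unquote("%24"))  # %26 $
```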

Just for one example, this was the title of one page that I bookmarked (including the quotes):

"holiday house" "ralph moreland" - Google Search

And that was saved in the Name field in Linkman.

In the OPML export file created using the Linkman Export Template that I made, this shows up as this (first part of the <outline> element):

<outline text=""holiday house" "ralph moreland" - Google Search" url="url goes here"

So the IQ importer sees an empty text attribute!  So from that empty text attribute, IQ imported nothing into the Item field. 

The empty text attribute is followed by: 

holiday house" "ralph moreland" - Google Search" url="url goes here"

The URL was properly imported, so it seems that IQ just ignored the extra text after the empty text attribute, and recognized url=" as the start of the url attribute.

Unfortunately the Linkman export templates don't seem to provide any mechanism to replace troublesome characters on the fly during the export to the OPML file.  So it seems I'll have to do a bulk search and replace operation either in Linkman itself, or using some tool on the exported OPML file like TextCrawler, and then perhaps reverse the bulk search and replace in InfoQube.
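
As an alternative to a manual search and replace, the exported file could be repaired by a short script. The Python sketch below is a heuristic, not a general XML fixer: it assumes every bookmark line has exactly the form <outline text="..." url="..." with the attributes in that order, that the values are still unescaped, and that no title contains the sequence " url=" itself. The names OUTLINE, escape_attr and repair_line are made up for illustration, and the URL is a placeholder.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Greedy groups: the text value runs up to the LAST '" url="' on the line.
OUTLINE = re.compile(r'^(\s*<outline text=")(.*)(" url=")(.*)("\s*/?>\s*)$')


def escape_attr(value):
    """Escape &, <, > and double quotes for use inside a quoted attribute."""
    return escape(value, {'"': "&quot;"})


def repair_line(line):
    """Escape the text and url values of one exported outline line."""
    match = OUTLINE.match(line)
    if not match:
        return line  # leave lines of any other shape untouched
    start, text, middle, url, end = match.groups()
    return f"{start}{escape_attr(text)}{middle}{escape_attr(url)}{end}"


broken = ('<outline text=""holiday house" "ralph moreland" - Google Search"'
          ' url="http://www.example.com/?a=1&b=2" />')
fixed = repair_line(broken)
print(fixed)

element = ET.fromstring(fixed)
print(element.get("text"))  # "holiday house" "ralph moreland" - Google Search
```

If the importer uses a standard XML parser, the escapes are undone on reading, so reversing the replacement afterwards should not be needed; whether IQ behaves that way would have to be tested.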
