The main purpose of an Ultralink Server is to perform analysis on fragments and figure out which Ultralinks in its databases match textual strings in the fragment (if any). It packages up that result set and returns it to the client. But how does that process work exactly?
Fragment Content Preparation
When submitting a fragment to an Ultralink Server for examination, the most important information you pass up is the content you want analyzed. The server looks at this raw textual content as is, whether it is HTML, some other markup or bare text. It treats the input as plain text and will analyze everything (including HTML tags, attributes and other things not normally considered content). The client is therefore responsible for preparing the content it sends up so that the analysis is as effective as possible.
For instance, ultralink.js works in web-based contexts and deals primarily with HTML. Once it has identified which parts of the web page it is going to break into fragments (see Content Targeting), it generates text from those DOM elements. Before sending that content up to the server, however, it does some preparation. Any number of strings inside the HTML elements themselves could potentially match against Ultralinks, which is generally undesirable in this use case. ultralink.js therefore strips out all the HTML from the string, leaving just the bare content that is normally visible to the user. This keeps the size of the content manageable, which makes analysis quicker, and makes sure that only Ultralinks that match the user-consumable content are included in the result set.
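The kind of preparation described above can be sketched as follows. This is a minimal Python illustration and not the actual ultralink.js code: the names `VisibleTextExtractor` and `prepare_fragment` are hypothetical, and skipping the contents of `script` and `style` elements is an assumption about what counts as user-visible content.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes while ignoring tags, attributes, scripts and styles."""

    SKIPPED = {"script", "style"}  # assumption: never visible to the user

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace so the fragment stays compact.
        return " ".join("".join(self._chunks).split())


def prepare_fragment(html):
    """Return the bare text of an HTML string (hypothetical helper)."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.text()
```

With this sketch, a word that appears only in an attribute value (for example a `class` name) never reaches the server, so it cannot produce a match.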
Hyperlink Special Handling
Other than the raw content to be analyzed, clients can also pass up an array of hyperlinks to the server. If any Ultralinks are associated with those submitted hyperlinks, or if any of the hyperlinks point directly to Ultralinks residing on the server, the server returns a separate structure in the result set containing those Ultralinks. ultralink.js can use the upgradeHyperlinks option to enable this feature and 'upgrade' specific hyperlinks into Ultralinks. This allows content authors to easily and unambiguously mark the presence and position of a specific Ultralink in their content.
Here is an example using a hyperlink to explicitly specify an Ultralink in content:
<p id="hyperlinkExample">I might want to refer to <b><a href="https://ultralink.me/link/3179413">my favorite movie</a></b> indirectly sometimes. Or perhaps I would want to make completely sure that I refer to the film <b><a href="http://www.otnemem.com">Memento</a></b> instead of a company named <b><a href="https://ultralink.me/link/1206737">Memento</a></b>. </p>
Notice in the HTML that, before Ultralink upgrading, the first and third hyperlinks refer to specific Ultralinks in the Mainline Database of the ultralink.me server. It doesn't matter what the underlying text in those hyperlinks is: if the upgradeHyperlinks option is set to true, then ultralink.js will convert them into the specified Ultralinks. One advantage of specifying Ultralinks in this way is that if there is a failure somewhere and the Ultralinks do not get added, the existing hyperlinks are still functional and relevant, providing a nice fallback.
The second hyperlink in the example isn't referring to a specific Ultralink, but it does give the Ultralink Server a strong hint as to which Ultralink should go there. The server performs a query to get a list of all the Ultralinks in the database that are associated with that hyperlink and then compares the hyperlink text (in this case "Memento") against every word associated with each of those Ultralinks. The Ultralink with the closest match wins and is returned in the result set.
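As an illustration of that closest-match step, here is a small Python sketch. The text does not say how closeness is measured, so the similarity ratio from the standard `difflib` module is used here purely as a stand-in. The function name, the candidate structure and the IDs in the usage note are hypothetical.

```python
from difflib import SequenceMatcher


def closest_ultralink(link_text, candidates):
    """Pick the candidate whose words best match the hyperlink text.

    candidates: dict mapping an Ultralink ID to the list of words
    associated with it (hypothetical structure).
    Returns the winning ID, or None if there are no candidates.
    """
    needle = link_text.casefold()
    best_id, best_score = None, 0.0
    for ultralink_id, words in candidates.items():
        for word in words:
            # Assumption: similarity ratio stands in for "closest match".
            score = SequenceMatcher(None, needle, word.casefold()).ratio()
            if score > best_score:
                best_id, best_score = ultralink_id, score
    return best_id
```

For example, with the made-up candidates `{101: ["Memento", "Memento (film)"], 202: ["Otnemem Productions"]}`, the link text "Memento" selects ID 101 because one of its words matches exactly.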
In many cases though, the Ultralink analysis does choose the correct Ultralinks for the content, even without the assistance of hyperlinks. This is especially true when the Ultralinks in the database have robust connection information associated with them to facilitate disambiguation.
Once a fragment's content has made it to an Ultralink Server, the analysis begins. The first thing that happens is a sanity check to make sure that the URL Hash matches the SHA1 of the provided URL argument as well as a check to ensure that the Content Hash matches the SHA1 of the provided fragment argument concatenated with the provided hyperlinks argument.
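The sanity check can be sketched in a few lines of Python. This assumes the hashes are lowercase hexadecimal SHA-1 digests of UTF-8 encoded text and that the hyperlinks argument arrives as a single already-serialized string; neither detail is specified above, and the function names are hypothetical.

```python
import hashlib


def sha1_hex(text):
    """Hex SHA-1 digest of a string (assumption: UTF-8 encoding)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def hashes_are_consistent(url, url_hash, fragment, hyperlinks, content_hash):
    """Check both hashes against the arguments they were derived from."""
    url_ok = sha1_hex(url) == url_hash
    # The Content Hash covers the fragment concatenated with the hyperlinks.
    content_ok = sha1_hex(fragment + hyperlinks) == content_hash
    return url_ok and content_ok
```

A request whose fragment was altered after hashing fails the second comparison and can be rejected before any analysis starts.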
If the server's cache already has a result for this fragment's URL Hash/Content Hash pair, then it simply returns that result. If it is not in the cache, the code examines which Ultralink Database the fragment is intended to be filtered through. If that database is set to be backed by another Ultralink Database, then the server filters the fragment first through the specified database and then through the backing one. The results of both filtering processes are merged together, with the backing database's results given a lower priority so they can be overridden.
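The cache lookup and the merge of the two result sets might look like the following Python sketch. Everything about the data shapes is an assumption made for illustration: a database is a dict with a hypothetical `backed_by` key, a result is a dict keyed by matched string, `filter_fn` stands in for the real filtering process, and storing the fresh result in the cache is implied rather than stated above.

```python
def analyze(cache, url_hash, content_hash, database, fragment, filter_fn):
    """Return the cached result if present, otherwise filter and merge."""
    key = (url_hash, content_hash)
    if key in cache:
        return cache[key]

    # Filter through the requested database first, then its backing one.
    primary = filter_fn(database, fragment)
    backing = database.get("backed_by")
    fallback = filter_fn(backing, fragment) if backing is not None else {}

    # Backing results go in first so the primary results override them.
    merged = {**fallback, **primary}
    cache[key] = merged  # assumption: the fresh result is cached for next time
    return merged
```

Building the merged dict from the backing results first is one simple way to express "lower priority": any key that both databases produce ends up with the primary database's value.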
The process of filtering a fragment through an Ultralink Database begins by checking whether a Page Ultralink is defined for the URL the fragment resides in. If there isn't one explicitly defined, the server looks to see if any Ultralinks in the database have URLs similar to the page URL. If so, the Ultralink with the most similar URL becomes the Page Ultralink. Similarity is determined by partial matching from the start of the URL (example.com/sub/dir/page.html would match example.com/sub).
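A minimal Python sketch of this selection follows. It interprets "most similar" as the longest stored URL that is a prefix of the page URL, which is an assumption; the function name and both lookup tables are hypothetical.

```python
def pick_page_ultralink(page_url, explicit, url_index):
    """Choose the Page Ultralink for a page URL.

    explicit:  dict mapping page URLs to explicitly defined Ultralink IDs.
    url_index: dict mapping URLs stored on Ultralinks to their IDs.
    Returns an Ultralink ID, or None if nothing matches.
    """
    if page_url in explicit:
        return explicit[page_url]

    best_url = None
    for url in url_index:
        # Partial match from the start of the URL; longest prefix wins.
        if page_url.startswith(url):
            if best_url is None or len(url) > len(best_url):
                best_url = url
    return url_index[best_url] if best_url is not None else None
```

With stored URLs `example.com` and `example.com/sub`, the page `example.com/sub/dir/page.html` resolves to the Ultralink holding `example.com/sub`, since that is the longer matching prefix.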
At this point, the server starts combing through the fragment text, character by character, and finds the longest character strings in the fragment that match against word strings in the Ultralink Database. It takes into account case sensitivity settings on individual Ultralinks and tries to be smart about word borders. Matching also depends on the database collation setting, which by default means that accented characters match their unaccented counterparts (é will match e, and so on).
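The scanning idea can be illustrated with a simplified Python sketch. It is not the server's implementation: it ignores per-Ultralink case sensitivity settings, approximates the default collation by stripping accents and lowercasing one character at a time, and treats any non-alphanumeric character as a word border. All function names are hypothetical.

```python
import unicodedata


def fold_char(char):
    """Approximate an accent- and case-insensitive collation for one character."""
    return unicodedata.normalize("NFD", char)[0].lower()[0]


def fold(text):
    # Folding character by character keeps indices aligned with the original.
    return "".join(fold_char(c) for c in text)


def is_border(text, index):
    """True if index is outside the text or not on a letter or digit."""
    return index < 0 or index >= len(text) or not text[index].isalnum()


def find_matches(fragment, words):
    """Return (start, matched_text) pairs, preferring the longest match."""
    folded = fold(fragment)
    folded_words = {fold(w) for w in words}
    max_len = max((len(w) for w in folded_words), default=0)
    matches = []
    i = 0
    while i < len(folded):
        found = 0
        if is_border(folded, i - 1):  # only start a match at a word border
            # Try the longest possible candidate first.
            for length in range(min(max_len, len(folded) - i), 0, -1):
                if folded[i:i + length] in folded_words and is_border(folded, i + length):
                    found = length
                    break
        if found:
            matches.append((i, fragment[i:i + found]))
            i += found
        else:
            i += 1
    return matches
```

Given the words "New York" and "New York City", the text "New York City." yields only the longer match, and "Amélie" in the text matches the stored word "amelie".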
Once it has found these matching strings, the server builds, for each matching string, a list of all the Ultralinks that match it and orders them according to their relevancy in the context. This list of Ultralinks is part of the result that is returned to the client and used to overlay Ultralinks in the page. An Ultralink is only initially displayed if its status entry in this list has a value of hit. Only the topmost Ultralink in the list is allowed to have a hit value, and it is possible for the result list to contain no hits at all. That scenario means that there are Ultralinks that could potentially match against the string, but it has been determined that in this context it would be inappropriate to show them.
If an Ultralink's pageFeedback value for the URL that the fragment lives in is less than zero, then it is not allowed to have a hit value. The pageFeedback system is a mechanism for ensuring that the right Ultralinks come up when they are supposed to and the wrong Ultralinks don't, in the event that the algorithms make the wrong decision.
Another mechanism that can prevent an Ultralink from becoming the top hit Ultralink is the matched Word's commonalityThreshold value. Every matching Ultralink has two commonality values calculated for it. The first is a commonality value relating to the current Page Ultralink: it is set to 2 if the Ultralink in question and the Page Ultralink are directly connected to each other, and to 1 if they have a second-degree connection with each other.
The second commonality value is calculated by adding all the connection values for every other potential Ultralink in the fragment (2 for every direct connection, 1 for every second-degree connection) and then adding 2 for every direct connection to other Ultralinks on the same page but not in the same fragment. Because it is possible that the rest of the page has not yet been analyzed, the influence of these common page Ultralinks can change from run to run. If the grand total of this commonality value is not larger than the commonalityThreshold, then this Ultralink is not allowed to be considered a hit. The only exception to this is if its connection value to the Page Ultralink is larger than zero.
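The two commonality values and the threshold test can be sketched in Python as follows. The connection graph is modeled as a dict mapping each Ultralink ID to the set of IDs it is directly connected to, and a second-degree connection is taken to mean that two Ultralinks share a direct neighbor. Both the data model and the function names are assumptions made for illustration.

```python
DIRECT, SECOND_DEGREE = 2, 1


def connection_value(a, b, connections):
    """2 for a direct connection, 1 for a second-degree one, 0 otherwise."""
    neighbors_a = connections.get(a, set())
    neighbors_b = connections.get(b, set())
    if b in neighbors_a:
        return DIRECT
    if neighbors_a & neighbors_b:  # assumption: a shared neighbor
        return SECOND_DEGREE
    return 0


def may_be_hit(candidate, page_ultralink, fragment_others, page_others,
               connections, threshold, page_feedback=0):
    """Decide whether a candidate is still allowed to receive hit status."""
    if page_feedback < 0:
        return False  # negative pageFeedback always rules out a hit

    # First commonality value: connection to the Page Ultralink.
    page_value = 0
    if page_ultralink is not None:
        page_value = connection_value(candidate, page_ultralink, connections)
    if page_value > 0:
        return True  # the exception: connected to the Page Ultralink

    # Second commonality value: other Ultralinks in the fragment...
    total = sum(connection_value(candidate, other, connections)
                for other in fragment_others)
    # ...plus 2 for each direct connection elsewhere on the page.
    direct = connections.get(candidate, set())
    total += sum(DIRECT for other in page_others if other in direct)
    return total > threshold
```

Because `page_others` only contains Ultralinks from fragments that have already been analyzed, the same candidate can fail the threshold on one run and pass it on a later one.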
In this way, Ultralink Words that might match too often, or match incorrectly in specific contexts, can be reined in.
Now that all the Ultralinks that should not be appearing on the page have been stripped out, we can go about figuring out which of the remaining Ultralink hit candidates should actually be given the hit status. If there is only one Ultralink candidate left standing, then there is no contest. If there are multiple remaining Ultralinks, then they are pitted against each other and run through these comparisons in this order:
- Higher pageFeedback value wins.
- If either of the Ultralinks happens to be the Page Ultralink, that Ultralink wins.
- Higher page connection value wins.
- Higher Primary Word value wins.
- Higher connection value wins.
- Lower Ultralink ID wins (This is a last resort. Older Ultralink wins).
The last Ultralink standing is granted the hit status. The Ultralink result list is then constructed with the hit Ultralink at the top and the order of the remaining Ultralinks determined by the above comparison process (the status field of each losing Ultralink contains an explanation of how it ended up in its position on the list).
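The comparison chain above can be expressed compactly as a sort with a composite key, shown here as a Python sketch. Treating the pairwise contests as a single lexicographic sort is an assumption, as is modeling every value (including the Primary Word value) as a plain number. The `Candidate` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ultralink_id: int
    page_feedback: int = 0
    is_page_ultralink: bool = False
    page_connection: int = 0
    primary_word: int = 0
    connection: int = 0


def rank(candidates):
    """Order candidates from most to least contextually relevant."""
    return sorted(
        candidates,
        key=lambda c: (
            -c.page_feedback,         # higher pageFeedback wins
            not c.is_page_ultralink,  # the Page Ultralink wins
            -c.page_connection,       # higher page connection value wins
            -c.primary_word,          # higher Primary Word value wins
            -c.connection,            # higher connection value wins
            c.ultralink_id,           # last resort: lower (older) ID wins
        ),
    )
```

The first element of the returned list is the candidate that would receive the hit status; each later rule in the key only matters when all earlier ones are tied.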
The result is a list that includes all the Ultralinks in the database that match against the string present in the fragment, in order from most contextually relevant to least contextually relevant.