When Ultralinks are enabled, a process kicks off that works out which Ultralinks to overlay and where. The particulars of this process differ depending on whether you are using Ultralinks in the context of a web page or as an app within your operating system. In either case, the default behavior is to determine automatically where the content of interest is.
In most cases, you shouldn't have to do anything and it will just work. Sometimes, though, it can make sense to target content explicitly and override the default behaviors. This page will help you understand how this process works and how to configure it for your needs.
When running Ultralinks in the context of a web page, you usually do not need or want to overlay Ultralinks in every part of the page. There is usually some central content that we intend to target for analysis and overlay. Once this content is found, it is broken up into paragraph-sized chunks called fragments. Analysis is performed on each fragment individually as opposed to the content in its entirety. Per-fragment analysis enables:
- Better parallelism. The Ultralink Server can process multiple fragments at once.
- Lower latency analysis. The smaller the fragment, the quicker we get the result.
- Better prioritization. Fragments at the top of the page in view are analyzed before those lower on the page.
- Less cache invalidation risk. If even one character of a fragment changes, it is considered a different fragment and must be re-analyzed, so smaller fragments mean a single edit invalidates less of the page.
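The cache invalidation point can be illustrated with a small sketch. It assumes each fragment is keyed by a SHA1 hash of its UTF-8 text (this page says a content hash is used, but not exactly how the input is prepared). The helper names `fragment_keys` and `changed_fragments` are hypothetical and are not part of any Ultralink API.

```python
import hashlib


def fragment_keys(fragments):
    """Return a SHA1 hex digest for each fragment's text (assumed cache key)."""
    return [hashlib.sha1(text.encode("utf-8")).hexdigest() for text in fragments]


def changed_fragments(old_fragments, new_fragments):
    """Return indexes of new fragments whose key was not seen before."""
    known = set(fragment_keys(old_fragments))
    return [
        index
        for index, key in enumerate(fragment_keys(new_fragments))
        if key not in known
    ]


before = ["First paragraph.", "Second paragraph.", "Third paragraph."]
after = ["First paragraph.", "Second paragraph!", "Third paragraph."]

# A one-character edit changes only that fragment's key, so only that
# fragment would need to be re-analyzed.
stale = changed_fragments(before, after)
```

If the whole page were hashed as a single unit instead, the same edit would invalidate the result for everything on it.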
As much as possible, the Ultralink code tries to find a happy medium between the number of fragments and average fragment size to optimize for speed and responsiveness. The performance benchmark for Ultralinks is to be fully loaded and ready for use before the user ever hovers their mouse over an Ultralink in the page.
Outlined below are the steps the Ultralink code tries (in order) to choose the right selector for the page.
- If the scanSelector option has been specified, then it just uses that.
- If there are any elements in the page of the ultralink class, then use .ultralink.
- Look in the hardcodedSites option and see if there are any entries that match the current page. If so, then use the specified selector in that option.
- Run through a built-in list of selectors that match common internet editorial patterns. Calculate the number of matching elements, the number of visual pixels they occupy on the page and the number of paragraph tags contained. Examine the selectors that have a favorable combination of those values and use the best one to construct a new selector that includes p, ul and dl elements in it. Despite the complexity, this step actually executes quite quickly.
- If a suitable selector has not been found at this point, use the current backupSelector which defaults to p.
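The order of these steps can be sketched as a single function. This is a rough illustration rather than the actual ultralink.js logic. The option names come from this page, but the data shapes are assumptions: hardcodedSites is a list of (regular expression, selector) pairs, and each candidate is a dict of measurements. The scoring in `pick_best_candidate` is a placeholder, since the real "favorable combination" heuristic is not documented here.

```python
import re


def pick_best_candidate(candidates):
    """Placeholder heuristic: most paragraphs wins, then most pixels."""
    usable = [c for c in candidates if c["matches"] > 0 and c["paragraphs"] > 0]
    if not usable:
        return None
    best = max(usable, key=lambda c: (c["paragraphs"], c["pixels"]))
    return best["selector"]


def choose_selector(options, page_url, has_ultralink_class, candidates):
    """Walk the selection steps in order and return the first selector found."""
    # Step 1: an explicit scanSelector always wins.
    if options.get("scanSelector"):
        return options["scanSelector"]
    # Step 2: the page contains elements with the ultralink class.
    if has_ultralink_class:
        return ".ultralink"
    # Step 3: the first hardcodedSites entry matching the page address.
    for pattern, selector in options.get("hardcodedSites", []):
        if re.search(pattern, page_url):
            return selector
    # Step 4: build a selector from the best built-in candidate.
    best = pick_best_candidate(candidates)
    if best is not None:
        return ", ".join(f"{best} {tag}" for tag in ("p", "ul", "dl"))
    # Step 5: fall back to backupSelector, which defaults to p.
    return options.get("backupSelector", "p")


measured = [
    {"selector": "div#content", "matches": 1, "pixels": 500000, "paragraphs": 12},
    {"selector": "div.sidebar", "matches": 1, "pixels": 80000, "paragraphs": 2},
]
selector = choose_selector({}, "https://example.com/docs", False, measured)
```

With no options set and no ultralink class on the page, the sketch falls through to the fourth step and builds a selector from the highest-ranked candidate.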
Most of the time, this process produces good results for common content found on the internet. For this page, it found that the div#content selector returned the most favorable content, so it constructed the final selector div#content p, div#content ul, div#content dl and used that to divide the page into separate fragments.
In some cases, it might be better to specify the selector explicitly. You can do this in a number of ways:
- Simply tell Ultralink exactly which selector to use with the scanSelector option.
- Use the hardcodedSites option to give it a set of sites to match against along with corresponding selectors. These values can either be passed in manually or managed and automatically delivered by the Ultralink Server.
- Use the scanningGuides option to surgically specify exactly what content on your page you want to select along with fragment-specific options for each of them.
If it makes sense, you can specify a custom selector for every page and enable it using the scanSelector option, but this can quickly become very unwieldy. The hardcodedSites option allows you to pass in a set of regular expression/selector pairs. The current page address is evaluated against the regular expressions in this set and the first match dictates what selector will be used for fragment construction in the page.
You can either pass in these value sets by hand or use the Ultralink Dashboard to manage these value pairs on a per-database basis (this feature is currently in alpha and not yet available to the public).
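Because the first match wins, the order of the pairs matters. The sketch below assumes the pairs are held as an ordered list of (pattern, selector) tuples; the real option format may differ, and `selector_for` is a hypothetical helper name.

```python
import re

# Assumed shape: ordered (regular expression, selector) pairs.
# More specific patterns are listed before more general ones.
HARDCODED_SITES = [
    (r"^https?://example\.com/blog/", "article.post p"),
    (r"^https?://example\.com/", "div#main p"),
]


def selector_for(page_url, hardcoded_sites):
    """Return the selector of the first pattern matching the page address."""
    for pattern, selector in hardcoded_sites:
        if re.search(pattern, page_url):
            return selector
    return None  # no entry matched; later steps would take over


blog_selector = selector_for("https://example.com/blog/hello", HARDCODED_SITES)
other_selector = selector_for("https://example.com/about", HARDCODED_SITES)
```

If the two entries were swapped, the general pattern would also match blog pages and the blog-specific selector would never be used.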
Scanning guides allow for extremely granular overriding of Ultralink settings on a per-fragment or per-UI element basis. For instance, you might want one set of fragments in a web page to be filtered through a specific Ultralink Database and another set of fragments to be washed through a different one. Check out the Scanning Guides documentation for details on how to use them.
Once the fragments for a page have been identified (or in the case of the Ultralink Windows app, once a fragment has been created for the targeted UI element), the next thing that happens is a URL is constructed for each fragment. The structure of that URL looks something like this:
| Part | Description |
|---|---|
| masterPath | The address of the Ultralink Server that will perform the analysis on the fragment. |
| URL Hash | A SHA1 hash of the page URL where the fragment resides. |
| Content Hash | A SHA1 hash based on the actual content of the fragment. |
| Database (optional) | A postfix identifying the Ultralink Database to wash the fragment through. If not present, it assumes the Mainline Database. |
The URL Hash and Content Hash uniquely identify (within reason for this use case) this specific fragment inside this specific page. If even one character changes in either the page URL or the fragment content, then their respective hashes will change and it is considered to be a new and distinct fragment. All these parts come together to form a URL that references the results of this specific fragment washed through the specified database.
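As a rough sketch of how these parts might be assembled, the function below joins them with slashes. The exact layout is an assumption: this page only establishes that the URL contains /fragment/, the two SHA1 hashes, and an optional database postfix. The ordering, separators, hex encoding, UTF-8 input, and the name `fragment_url` are all illustrative.

```python
import hashlib


def sha1_hex(text):
    """SHA1 of the UTF-8 encoded text, as a hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def fragment_url(master_path, page_url, fragment_text, database=None):
    """Build a fragment URL under an ASSUMED path layout (see lead-in)."""
    parts = [
        master_path.rstrip("/"),
        "fragment",
        sha1_hex(page_url),       # URL Hash
        sha1_hex(fragment_text),  # Content Hash
    ]
    url = "/".join(parts)
    if database:
        # Optional postfix; when omitted, the Mainline Database is assumed.
        url += "/" + database
    return url


page = "https://example.com/article"
original = fragment_url("https://server.example", page, "Hello world.")
edited = fragment_url("https://server.example", page, "Hello world!")
```

Here `original` and `edited` differ only in the content hash segment, which is what makes the edited text a new and distinct fragment.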
By performing a simple HTTP GET on this URL, clients like ultralink.js will receive one of two kinds of results. They will either receive a JSON object describing all the Ultralinks from the specified database present in the fragment, along with a field called type with a value of hit, or a JSON object whose type field has a value of miss, indicating that the Ultralink Server does not currently have the Ultralink result set for that fragment (essentially a cache miss).
In the case of a miss, ultralink.js will perform an HTTP POST to a similar URL on the Ultralink Server (it replaces /fragment/ with /fragmentFilter/) and pass the fragment content up to the Ultralink Server for analysis. The response to that POST request is a new JSON object containing the description of the Ultralinks present in the fragment. After that point, the originally constructed URL will return that same result set when queried (unless the nostoreSites setting is specified, which tells the Ultralink Server not to store the results).
The compact and identifying nature of the above URL format allows for a simple interface to fragment processing and is also conducive to URL-level network caching. Ultralink Servers support high-level caching services like CloudFlare in addition to their own built-in result caching system. (Check out our blog post on using CloudFlare as a DB Read Cache.)
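The GET-then-POST flow can be sketched without any networking by using an in-memory stand-in for the server. Only the type field, its hit and miss values, and the /fragment/ to /fragmentFilter/ substitution come from this page. The `ultralinks` field name, the `FakeUltralinkServer` class, and its toy analysis (picking capitalized words) are hypothetical.

```python
class FakeUltralinkServer:
    """In-memory stand-in for an Ultralink Server; not the real service."""

    def __init__(self):
        self.results = {}   # fragment URL -> stored result set
        self.analyses = 0   # how many times content was analyzed

    def get(self, url):
        if url in self.results:
            return {"type": "hit", "ultralinks": self.results[url]}
        return {"type": "miss"}

    def post(self, url, content):
        self.analyses += 1
        # Toy analysis: treat capitalized words as "Ultralinks".
        found = [word for word in content.split() if word[0].isupper()]
        # Store under the original fragment URL so later GETs are hits.
        self.results[url.replace("/fragmentFilter/", "/fragment/", 1)] = found
        return {"ultralinks": found}


def fetch_ultralinks(server, fragment_url, content):
    """Try the cached result first; on a miss, upload the content."""
    response = server.get(fragment_url)
    if response["type"] == "hit":
        return response["ultralinks"]
    filter_url = fragment_url.replace("/fragment/", "/fragmentFilter/", 1)
    return server.post(filter_url, content)["ultralinks"]


server = FakeUltralinkServer()
url = "https://server.example/fragment/aaa/bbb"
first = fetch_ultralinks(server, url, "yesterday Apple opened in Paris")
second = fetch_ultralinks(server, url, "yesterday Apple opened in Paris")
```

The second call is answered by the GET alone, so the fragment content is uploaded and analyzed only once.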
Unless otherwise specified, the fragment content and the Ultralink result set are stored in a per-database cache on the Ultralink Server. This allows the server to return result sets extremely quickly for fragments that it has already seen and allows it to re-calculate old result sets when changes are made to the relevant Ultralink Database. This cache deletes old fragment content and result sets if they have not been accessed in the last 31 days.
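The behavior described above can be sketched as a small cache that records a last-access time for each entry. This is only an illustration under stated assumptions: the class name `FragmentCache`, its methods, and the choice not to refresh the access time during re-analysis are hypothetical, and the real server's storage is not described on this page.

```python
import time

MAX_IDLE_SECONDS = 31 * 24 * 60 * 60  # 31 days, as stated above


class FragmentCache:
    """Per-database cache of fragment content and result sets (sketch)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}  # key -> (content, result, last_access)

    def put(self, key, content, result):
        self._entries[key] = (content, result, self._clock())

    def get(self, key):
        """Return the stored result and refresh its last-access time."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        content, result, _ = entry
        self._entries[key] = (content, result, self._clock())
        return result

    def reanalyze(self, analyze):
        """Recompute every result from stored content (database changed)."""
        for key, (content, _, last) in list(self._entries.items()):
            self._entries[key] = (content, analyze(content), last)

    def evict_stale(self):
        """Delete entries not accessed within the idle window."""
        cutoff = self._clock() - MAX_IDLE_SECONDS
        stale = [k for k, (_, _, last) in self._entries.items() if last < cutoff]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

Keeping the fragment content next to the result set is what makes `reanalyze` possible without asking clients to upload the content again.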