Critical Section


Outbound Trackbacks

Friday,  05/02/03  02:44 PM

For the web nerds among you...  (yeah, you!)

I implemented "outbound trackbacks" today.  Essentially a trackback is a way to tell someone: "hey, I linked to your site".  To post a trackback to somebody, their site has to support "inbound trackbacks".  This is not yet a widespread feature; I discovered that since the start of the year I've made 1188 links to other sites, of which only 28 (about 2%) were trackback-enabled.  Hardly seems worth it, except that I'm sure this will become more popular over time.

I'm still deciding whether to implement "inbound trackbacks".  This would allow me to know when someone has linked to me, but only if they have a trackback-enabled site.  I think for now I'm going to keep looking through my referer logs instead...  Not only does this cover every inbound link (including those from non-trackback-enabled sites), but it tells me when the link was used, which is actually a little more interesting than whether it exists.
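A rough sketch of the referer-log approach, assuming the web server writes the Apache "combined" log format (the post doesn't say which format is in use). The function name top_referers and the host name passed to it are made up for the example.

```shell
# Hypothetical helper: list external referers from an access log, most frequent first.
# Assumes Apache "combined" format, where the referer is the second quoted field.
# $1 = this site's own host name, so internal links can be skipped.
top_referers() {
  awk -F'"' '$4 != "" && $4 != "-" { print $4 }' |   # 4th "-delimited field = referer
  grep -v -F "://$1" |                                # drop links from my own pages
  sort | uniq -c | sort -rn                           # count and rank
}

# Example use (the log path is an assumption):
#   top_referers www.example.com < /var/log/httpd/access_log | head -20
```

Each log line also carries a timestamp, which is where the "when was the link used" information comes from; this sketch only counts hits.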

Trackbacks are pretty simple; the concept was developed by the folks at Movable Type (a popular blogging tool), and the specification is on their site.  My implementation is a script which runs once a day and processes all new posts and articles.  For each link in each post, the script retrieves the linked-to page and looks for RDF information in the page which describes the trackback.  (If there isn't any, the site isn't trackback-enabled, and you're done.)  If there is a trackback URL, you make an HTTP POST to it giving your URL, your site name, and an optional excerpt (there's a good example in the spec).  That's it.
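A minimal sketch of those two steps in shell, assuming curl is available. This is an illustration, not the script described above: the function names find_ping_url and send_ping are invented for the example, and the form field names (url, blog_name, excerpt) and the `<error>0</error>` success marker are taken from the TrackBack spec as best understood, so check them against the spec before relying on them.

```shell
# Hypothetical helper: read a page's HTML on stdin and print the first
# trackback:ping URL found in its embedded RDF (prints nothing if there is none).
find_ping_url() {
  sed -n 's/.*trackback:ping="\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: POST a ping.
# $1 = ping URL, $2 = URL of my post, $3 = my site name, $4 = excerpt (plain text).
# curl does the URL-encoding here, so the excerpt is passed unencoded.
# Succeeds if the reply reports error code 0.
send_ping() {
  curl -s \
    --data-urlencode "url=$2" \
    --data-urlencode "blog_name=$3" \
    --data-urlencode "excerpt=$4" \
    "$1" | grep -q '<error>0</error>'
}

# Example use (all URLs are placeholders):
#   ping=$(curl -s "http://example.com/their-post" | find_ping_url)
#   [ -n "$ping" ] && send_ping "$ping" "http://example.org/my-post" "My Site" "An excerpt..."
```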

The most interesting part of the script creates a reasonable "excerpt":

# Build a URL-encoded excerpt from the paragraph in $file that links to $url.
grep "$url" "$file" |
sed 's/<[^>]*>//g;s/&lt;/</g;s/&gt;/>/g;s/&amp;/\&/g' |
cut -c1-252 |
sed 's/%/%25/g;s/\$/%24/g;s/&/%26/g;s/+/%2B/g;s/=/%3D/g;s/?/%3F/g;s/ /+/g' |
sed 's/+[^+]*$//;s/.$/&.../'

Yeah, I know, nerdy.  The grep gets the paragraph containing the link.  The first sed converts the HTML into text, throwing away tags and decoding the common entities.  The cut truncates the excerpt at 252 characters.  The second sed URL-encodes the excerpt, and the final sed drops the trailing (possibly cut-off) word and appends a "..." to the end.  Voilà.
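To see what those steps produce, here is a sketch that wraps the same kind of pipeline in a function and runs it on one sample paragraph. The name make_excerpt and the sample HTML are invented for the example; it also uses grep -F (fixed-string match) and encodes "%" first, which are small assumptions on top of the pipeline above.

```shell
# Hypothetical wrapper: HTML on stdin, $1 = the URL to look for.
make_excerpt() {
  grep -F "$1" |                                            # paragraph containing the link
  sed 's/<[^>]*>//g;s/&lt;/</g;s/&gt;/>/g;s/&amp;/\&/g' |   # strip tags, decode entities
  cut -c1-252 |                                             # truncate
  sed 's/%/%25/g;s/\$/%24/g;s/&/%26/g;s/+/%2B/g;s/=/%3D/g;s/?/%3F/g;s/ /+/g' |  # URL-encode
  sed 's/+[^+]*$//;s/.$/&.../'                              # drop last word, add "..."
}

sample='<p>I liked <a href="http://example.com/post">this post</a> about Q&amp;A a lot</p>'
printf '%s\n' "$sample" | make_excerpt 'http://example.com/post'
# prints: I+liked+this+post+about+Q%26A+a...
```

Note that the last word is dropped even when the paragraph is shorter than 252 characters, since the final sed cannot tell whether the cut actually truncated anything.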

If all sites were trackback-enabled in both directions, it would have the effect of making all links two-way; for any page you would know all the links to it, from all over the web.  I doubt this will ever happen; for one thing the information is not always useful and could be huge (imagine all the inbound links to the Google home page, for example).  But it is a cool thing in the blogosphere, and I expect all the popular blogging tools will support it...
