Re: [syndication] site-wide metadata discovery
> Ok, supplementary questions :
> How did the agent 'hear' about the site?
In 'theory' an agent doing this sort of discovery is bordering on behavior that
many sites would consider bot- or spider-like. Thus it would be expected to be
looking at the robots.txt file already. That would also help stave off the
'make it use the fewest possible requests' foolishness: they'd already have the
file in hand. And there seems to be a pretty strong argument for making such
agents honor the file if they're not doing so now.
This is a subtle line. A reader program acting at a single user's request (as
in a desktop client) might arguably NOT be expected to query for robots.txt.
Browsers, depending on who you ask, aren't expected to be watching this file,
and a reader picking up a feed and presenting its content falls, more or less,
into the same category.
But when we get into spidering or other discovery techniques we're crossing
outside that sort of boundary. So we're not being unreasonable thinking a
request for robots.txt should be happening.
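To make the boundary concrete, here's a minimal sketch of a discovery agent consulting robots.txt before spidering. The robots.txt body, the user-agent string, and the URLs are all hypothetical; a real agent would fetch the file once from the site root and cache it.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt body; a real agent would fetch
# http://example.com/robots.txt once and keep the result,
# so honoring it costs only one extra request.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

# Ask before each discovery request.  A desktop reader acting on a
# single user's request might reasonably skip this check; a spider
# should not.
allowed = robots.can_fetch("ExampleDiscoveryBot",
                           "http://example.com/meta/site.xml")
blocked = robots.can_fetch("ExampleDiscoveryBot",
                           "http://example.com/private/data.xml")
```

The parse/can_fetch split also shows where the caching falls out naturally: the file is read once, then consulted per request.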
> Where does the agent look first? (...and why isn't there a link tag there?)
There is valid resistance to having the link tag as the /only/ means of
starting this process. There doesn't seem to be any resistance to having the
link present /when it's possible/ to add it. So yes, sites that can integrate a
link tag are more than welcome to do so (once we eventually agree on the rel
and type string contents).
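For sites that can add it, discovery from the HTML head might look like the sketch below. Since the thread hasn't settled the rel and type strings yet, the values used here ("meta", "application/rdf+xml") are placeholders, not a proposal.

```python
from html.parser import HTMLParser

class LinkFinder(HTMLParser):
    """Collects href values from <link> tags matching the
    (hypothetical) rel/type strings for site-wide metadata."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            a = dict(attrs)
            # Placeholder rel/type values -- to be replaced by
            # whatever the group eventually agrees on.
            if a.get("rel") == "meta" and a.get("type") == "application/rdf+xml":
                self.found.append(a.get("href"))

HTML = ('<html><head>'
        '<link rel="meta" type="application/rdf+xml" href="/site-meta.rdf">'
        '</head></html>')
finder = LinkFinder()
finder.feed(HTML)
```

The agent only needs the head section, so it could stop reading the page once </head> is seen.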
This then turns to those sites that can't add the link tag and raises the
question of whether they can affect the contents of the robots.txt file at the
root of the domain. We're essentially switching sides in the "we can't do that"
argument.
We then turn to what sort of best practices we're going to encourage.
Where is someone 'starting' in this process? What are they getting now,
data-wise, from the site, and what can we suggest adding to it? What impact
would that addition have on how the data is currently being handled?
As in: do they have the web page? Can we get to the HTML head section?
Do they have the RSS feed? Can we put a new element in its template?
Can we put something in robots.txt? Are they reading it now? Will the additions
break it?
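The questions above imply a fallback order: try whichever sources the site was actually able to touch. This is a sketch of that ordering only; the location names and the stubbed lookup are hypothetical, and every source is optional.

```python
# Hypothetical fallback cascade for metadata discovery.  Each
# location maps to the metadata URL it yielded, or None if the
# site couldn't (or didn't) add anything there.
def discover_metadata(sources):
    """Return (location, url) for the first source that answered,
    or None if no location offered a pointer."""
    for location in ("html_head", "feed_element", "robots_txt"):
        url = sources.get(location)
        if url is not None:
            return location, url
    return None

# A site that couldn't touch its HTML templates but could edit
# the feed: discovery falls through to the feed element.
found = discover_metadata({"html_head": None,
                           "feed_element": "/feed-meta.xml",
                           "robots_txt": "/robots-meta.xml"})
```

The ordering itself is a best-practices question the thread is still debating; the point here is only that no single location needs to be mandatory.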
What resource impacts will these alternatives have?
All of them will 'cost' something. What costs are involved and will they be
greater than the real world will bear? Or can we offer such a compelling
added-value that they'll readily 'budget' for making this happen?
> >Thus they're saddled with the
> > burden of 'educating' the powers-that-be. We can all imagine why
> > this is a less-than-ideal way to engender their cooperation.
>
> Ok, that makes sense.
> Thanks for reminding me why I'm not doing sysadmin any more ;-)
Yes, we all bring our own valuable knowledge of how ideas get translated into
actions.
Some of the technical purity arguments fall completely apart when REALITY is
taken into account. This isn't faulting the technical merit or accuracy, just
presenting why it may not be readily adopted.
I had to slog through this same sort of battle with people over the idea of
in-band feed redirection, so I'm pretty well versed in the full range of issues
involved here. This is much the same sort of argument: yes, technical purity
would call this a hack, but reality dictates otherwise, once again. And, once
again, we're presented with a rush-job hack that won't work.
Fortunately wiser minds are prevailing and actual research with dialog is taking
place.
-Bill Kearney
Syndic8.com