Posted: 2005-11-05 18:53
Jason Fried asks, "Would you pay $5/month for Google if it wasn't free?"
Dave Winer says, "Clone the Google API."
I'd like to join those together and ask: would you pay $80/month to own a part of the infrastructure that powers a free implementation of the Google API? In other words, would you rent a dedicated (low-end) server from the likes of Layered Technologies or EV1Servers, and use it to run a distributed share of the crawling, indexing, or querying for a full-on search engine?
Why would this be a good thing?
It seems like the right thing to do — sort of like joining the EFF.
No-one really owns the Internet. Most of us pay for little parts of the Internet's infrastructure, but what we're really paying for is simply end-to-end transport of IP packets. I'd like search infrastructure to become like that: something that those who need it pay their bit for, and that we can all use as a basic commodity.
The next generation of tools to help us do our work (and play), and to help us manage information overload, will need to be heavily search-based (in an automated, behind-the-scenes kind of way). That infrastructure needs to be open and free to use, in the same way that IP transport is now.
What would this take?
An open specification for how individual nodes in the system talk to each other. Nodes need to be able to come and go as they please: if I suddenly can't afford to devote so many of my resources to search, then the redundant index fragments that my machine(s) build need to slip away quietly, without disrupting the behaviour of the whole system.
Open-source crawling, indexing, and querying code that can work within the above constraints.
Focus on providing an API, from the beginning.
The collective will (and time) to make this work (ouch!).
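The post leaves the node protocol open. One well-known technique that fits the "nodes come and go" requirement is consistent hashing with replication: each index fragment lives on several nodes, so when one node leaves, a copy already exists elsewhere and only that node's share has to move. What follows is a minimal Python sketch under that assumption; the `Ring` class, node names, and replication factor are hypothetical and not part of any existing spec.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a position on the ring (MD5 here; any stable hash works)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring storing each fragment on `replicas` distinct nodes."""

    def __init__(self, replicas: int = 2, vnodes: int = 50):
        self.replicas = replicas
        self.vnodes = vnodes      # virtual points per node, to spread load evenly
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            p = _point(f"{node}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = node

    def remove(self, node: str) -> None:
        # A departing node just drops its points; neighbours absorb its share.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def nodes_for(self, fragment: str) -> list:
        """Walk clockwise from the fragment's position, collecting distinct nodes."""
        if not self._points:
            return []
        wanted = min(self.replicas, len(set(self._owner.values())))
        result = []
        i = bisect.bisect(self._points, _point(fragment))
        while len(result) < wanted:
            node = self._owner[self._points[i % len(self._points)]]
            if node not in result:
                result.append(node)
            i += 1
        return result
```

With three nodes and two replicas, removing the first node listed for a fragment leaves the second as its new primary, which is the "slip away quietly" property. A real system would also have to re-copy data afterwards to restore the replica count.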
What could go wrong?
Getting organized. Starting small and being open to change seems to be the only way to get anything off the ground.
Freeloaders. I don't think this would be much of a problem: anyone with a massive query volume would probably also feel pressure to contribute proportionally.
Spamming.
I know projects like Grub have tried something like this in the past, but I think the need is greater now than ever.
So would I pay?
If it's $5 for Google's web UI, then, while the others are free, maybe not.
If it's for an (almost) unfettered Google API (from Google), then probably.
If it's to be part of a universal search infrastructure, then of course :-)
So would you pay, or contribute?
Posted: 2004-11-22 06:23
I've just spent the afternoon trying to get a simple TCP connection established.
The story is that I was configuring Nagios to check some HTTP services that are behind a Cisco PIX firewall. Nagios kept on getting timeout errors — when I tried by hand with telnet, I was getting the same problem, but only from the machine running Nagios. After setting up some logging on the firewall, I could see lines like:
%PIX-6-106015: Deny TCP (no connection) from a.b.c.d/xxxx to e.f.g.h/yyyy flags SYN on interface outside
According to the Cisco documentation, this means that a packet with the flags shown (in this case SYN) has arrived that doesn't match an existing connection. But hold on: the SYN flag is set in the first packet of a TCP connection, so by definition there is no connection yet.
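When sifting through firewall logs for these denies, a small parser helps. This Python sketch assumes the message layout matches the single example line above (dotted-quad IPv4 addresses, then "flags" followed by space-separated flag names); other PIX messages and software versions may be formatted differently, and the function name is hypothetical.

```python
import re

# Layout assumed from one example line; not an official Cisco format spec.
PIX_106015 = re.compile(
    r"%PIX-6-106015: Deny TCP \(no connection\) "
    r"from (?P<src>[\d.]+)/(?P<sport>\d+) "
    r"to (?P<dst>[\d.]+)/(?P<dport>\d+) "
    r"flags (?P<flags>.+?) on interface (?P<iface>\S+)"
)


def parse_106015(line: str):
    """Return a dict for a 106015 deny message, or None if the line doesn't match.

    `search` (not `match`) tolerates a syslog timestamp/host prefix.
    """
    m = PIX_106015.search(line)
    if m is None:
        return None
    record = m.groupdict()
    record["sport"] = int(record["sport"])
    record["dport"] = int(record["dport"])
    record["flags"] = record["flags"].split()   # e.g. "RST ACK" -> ["RST", "ACK"]
    return record
```

A record whose flags are exactly `["SYN"]` is the puzzling case from this post: the firewall refusing what looks like the start of a new connection.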
After lots of packet sniffing with tcpdump, and examining the dumps with the marvelous Ethereal, I spotted that the SYN packets from the machine that couldn't connect also had the ECE (ECN-Echo) and CWR flags set, whereas SYN packets from a working machine had only SYN set.
After that, the mystery was quickly solved: it turns out that a significant number of TCP stacks and firewalls out there either drop SYN packets that have these ECN flags set, or respond to them with a RST. The software on my PIX must be one of them.
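For reference, ECN negotiation uses two flag bits that RFC 3168 added to the TCP header, ECE (ECN-Echo) and CWR (Congestion Window Reduced), and a host that wants ECN sends its opening SYN with both set. The Python sketch below decodes a raw TCP flags byte (offset 13 of the TCP header) and recognizes such an "ECN-setup SYN"; the function names are illustrative, and where the byte comes from (a capture library, a raw socket) is left open.

```python
# TCP flag bits in the flags byte (RFC 793, plus ECE/CWR from RFC 3168).
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80

FLAG_NAMES = [(FIN, "FIN"), (SYN, "SYN"), (RST, "RST"), (PSH, "PSH"),
              (ACK, "ACK"), (URG, "URG"), (ECE, "ECE"), (CWR, "CWR")]


def flag_names(flags: int) -> list:
    """Decode a flags byte into flag names, lowest bit first."""
    return [name for bit, name in FLAG_NAMES if flags & bit]


def is_ecn_setup_syn(flags: int) -> bool:
    """RFC 3168: an ECN-setup SYN has SYN, ECE and CWR set, and ACK clear."""
    mask = SYN | ACK | ECE | CWR
    return (flags & mask) == (SYN | ECE | CWR)
```

A plain SYN is `0x02`; the packets from the failing machine would have carried `0xC2`, which `flag_names` reports as `["SYN", "ECE", "CWR"]`.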
Although the ideal fix would be to update the software on the PIX, it's half-a-planet away, and I'm loath to mess with it too much. The quick fix was to add the following line to /etc/sysctl.conf on my Debian machine that was having problems:
net.ipv4.tcp_ecn = 0
Now everything's fine, but it makes me wonder what other weird network behaviour can be explained by this.
(See http://www.icir.org/floyd/ecnProblems.html for more details on the problem)
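A line in /etc/sysctl.conf is read at boot; running `sysctl -p` (or `sysctl -w net.ipv4.tcp_ecn=0`) applies it straight away. The live value is exposed as /proc/sys/net/ipv4/tcp_ecn. Here is a small Python sketch for checking which mode a Linux machine is in; the mode descriptions paraphrase the kernel's ip-sysctl documentation (recent kernels default to 2), and the function name is hypothetical.

```python
from pathlib import Path

# Paraphrased from the Linux ip-sysctl documentation; newer kernels may
# define further values, which are reported as unrecognized here.
TCP_ECN_MODES = {
    0: "disabled: never request or accept ECN",
    1: "request ECN on outgoing connections and accept it on incoming ones",
    2: "accept ECN when incoming connections request it, but never request it",
}


def tcp_ecn_mode(path: str = "/proc/sys/net/ipv4/tcp_ecn") -> str:
    """Read the current net.ipv4.tcp_ecn value and describe it."""
    value = int(Path(path).read_text().strip())
    return TCP_ECN_MODES.get(value, f"unrecognized value {value}")
```

On the Debian machine in this story, after the change, `tcp_ecn_mode()` would report the "disabled" mode.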
Posted: 2004-11-05 04:06
I'm probably the last person to see all the new North American political maps floating around the Web, but just in case, here are three, from Richard Friedman, Mena Trott, and Tim Bray.
If Australia's going to be aligning itself closely with the U.S. (it is, and it does), then perhaps we should be joining the Federation — I'm sure they'd accept representation on the western rim of the Pacific :-).
Posted: 2004-08-02 09:45
It turned into more of a political weekend than I'm accustomed to.
First, my friend and former colleague, Bill Malkin, wrote asking if he could borrow my site design for a website he was setting up in support of Brian Deegan. Brian is contesting the Adelaide Hills seat of Mayo, against the incumbent member, federal minister Alexander Downer, in the upcoming federal election.
Then, on Sunday evening, Theen had organized for us to see Michael Moore's Fahrenheit 9/11. What can I say? What a web of lies; fear, uncertainty and doubt; and corruption. The lies I'd heard of, and the FUD I knew about, but the corruption was largely new to me.
Still mulling over the film, I came home, and what should be waiting for me in my RSS reader, but Tim Bray's pointer to an all-out summary of what's wrong with the Bush administration, written by former President Reagan's son, Ron. Read it.
Oh, and while we're on the subject of US politics and foreign policy, read another Australian, Jonathon Delacour, on "Patriotism and the martial state".
Posted: 2004-07-17 16:04
Tim Bray recently likened MP3 encoding of music to vandalism, and made the point again, in more detail, when discussing a New York Times article about online music purchasing.
While I agree that 128 kbps MP3s are not really up to scratch, it's worth noting that at least one online music vendor, Magnatune, lets you download your music in a number of different formats, including two lossless formats and a couple of higher-than-usual-quality lossy formats. More vendors should follow their lead; I've not been happy with the quality of files from another online music vendor.
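To put "128 kbps" in perspective: uncompressed CD audio is 44,100 samples per second, 16 bits per sample, over two channels. A quick Python calculation of that bitrate, the resulting file sizes, and the compression ratio (helper names are illustrative; sizes ignore container overhead and use MB = 10^6 bytes):

```python
# CD audio: 44,100 samples/s x 16 bits x 2 channels.
CD_BITRATE = 44_100 * 16 * 2          # 1,411,200 bit/s


def size_mb(bitrate_bps: float, seconds: float) -> float:
    """Approximate stream size in megabytes (10**6 bytes)."""
    return bitrate_bps * seconds / 8 / 1e6


def compression_ratio(bitrate_bps: float) -> float:
    """How many times smaller than CD audio a stream at this bitrate is."""
    return CD_BITRATE / bitrate_bps


four_minutes = 240
print(f"CD audio: {size_mb(CD_BITRATE, four_minutes):.1f} MB")
print(f"128 kbps: {size_mb(128_000, four_minutes):.2f} MB")
print(f"ratio:    {compression_ratio(128_000):.1f}:1")
```

For a four-minute track that prints about 42.3 MB for CD audio against 3.84 MB at 128 kbps, so the encoder discards all but roughly one part in eleven of the original bits.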
I first became aware of Magnatune from an article in Linux Journal, by Magnatune's John Buckman, describing their infrastructure and philosophy. The article is now available from the Linux Journal site. Well worth a read if you're the music-buying geek kind.
Posted: 2004-03-03 18:55
In his now-famous essay "A Plan for Spam", Paul Graham wrote:
... the spam of the future will probably look something like this:
Hey there. Thought you should check out the following:
http://www.27meg.com/foo
because that is about as much sales pitch as content-based filtering will leave the spammer room to make. (Indeed, it will be hard even to get this past filters, because if everything else in the email is neutral, the spam probability will hinge on the url, and it will take some effort to make that look neutral.)
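Graham's point about the URL follows from how his filter combines per-token spam probabilities. In "A Plan for Spam" he combines the most "interesting" tokens of a message as P = p1···pn / (p1···pn + (1 - p1)···(1 - pn)). The Python sketch below covers only that combining step (token selection and probability estimation are omitted, and the function name is illustrative). It shows why a neutral message "hinges on the url": tokens at 0.5 cancel out, leaving the URL token's probability.

```python
from math import prod


def combined_spam_probability(probs):
    """Combine per-token spam probabilities, Graham-style.

    Assumes every p is strictly between 0 and 1; otherwise the
    denominator can be zero.
    """
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)
```

For example, `combined_spam_probability([0.95, 0.5, 0.5, 0.5])` comes out at approximately 0.95: three neutral words contribute nothing, and the verdict is whatever the filter thinks of the URL.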
Well I guess the future is here, now.
Much of the spam I get looks like this, and it now seems that spammers have started using wildcard DNS to make their URLs look more 'neutral' (or at least different for each message).
Maybe it's been going on for a while, but I just noticed a URL that looked a bit like:
http://cementarlene.coerciblewade.tabulawhereby.not-real-domain.info/
Sure enough, any combination of words prepended to not-real-domain.info resolves to the same IP address.
I suppose that, now Internet Explorer no longer supports usernames and passwords in URLs, wildcard DNS is the next best way to fill a URL's domain name with random or confusing words. I can see a few future phishing expeditions based on it too.
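One consequence for content-based filters: if the filter tokenizes the full hostname, every wildcard variant is a brand-new, neutral-looking token. A sketch of one possible countermeasure, assuming the filter can be taught to score the base domain instead of the whole hostname: collapse each URL to its last two labels so the variants count as one. The function names are hypothetical, and the two-label rule is a deliberate simplification that gets suffixes like .com.au or .co.uk wrong.

```python
from collections import Counter
from urllib.parse import urlsplit


def base_domain(url: str) -> str:
    """Naively keep the last two labels of the URL's hostname.

    Simplification: wrong for multi-label public suffixes such as .com.au;
    a real filter would consult the Public Suffix List instead.
    """
    host = (urlsplit(url).hostname or "").rstrip(".")
    return ".".join(host.split(".")[-2:])


def domain_counts(urls):
    """Count URLs per base domain, so wildcard-subdomain variants collapse together."""
    return Counter(base_domain(u) for u in urls)
```

Fed the URL above plus another random-words variant, `domain_counts` attributes both to not-real-domain.info, giving the filter one token with a history instead of two it has never seen.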