Thursday, January 06, 2005

BitTorrent, the slayer.

The first time I heard of BitTorrent, some months ago, I grinned so broadly that my grinning muscles hurt. It was as if some mythical, unslayable demon had been magically slain by the words I was reading. How rude of me! Let me introduce that demon to you.
Meet Dm. Redundant Packets (Dm = Demon ;))
Dm. Redundant Packets was the mythical kind, since he was a demon only in my head. I saw him whenever a server had to send the same data packets to multiple hosts, even when the hosts were requesting the packets simultaneously. The demon turned really nasty when a server buckled trying to send the same file to too many hosts. It was quite horrifying to watch all this, but since I could think of no solution, I was but a mute spectator.

Then a lecture introduced me to "multicasting", a form of broadcasting restricted to a predefined set of exclusive hosts. They are exclusive because, as I understood it, in order to be included in a multicast group you need to pay some money and register your "group" under a multicast IP address. Unlike broadcasting, where every Jack connected to the sender is sent a copy of the same packet, in multicast the sender sends only one copy of the data, addressed to the multicast group. The multicast IP address (Class D: 224.0.0.0 to 239.255.255.255) is needed to identify the group, since the copying happens at the router level. Though my knowledge stops here, with multicast servers and the MBone, I knew enough to believe that the demon wasn't dead yet.
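
For the curious, here is a small Python sketch that checks whether an address falls in that Class D range. The helper name `is_class_d` is my own invention for illustration; the range itself is the one quoted above, which is the same thing as the block 224.0.0.0/4.

```python
import ipaddress

# The Class D range quoted in the post, 224.0.0.0 to 239.255.255.255,
# written as a single network block.
CLASS_D = ipaddress.ip_network("224.0.0.0/4")


def is_class_d(address: str) -> bool:
    """Return True if the IPv4 address lies in the multicast (Class D) range."""
    # is_class_d is a hypothetical helper name, not part of any library.
    return ipaddress.ip_address(address) in CLASS_D


if __name__ == "__main__":
    for candidate in ("224.0.0.1", "239.255.255.255", "192.168.1.10"):
        print(candidate, is_class_d(candidate))
```

The standard library also exposes an `is_multicast` attribute on address objects, which gives the same answer for IPv4 addresses.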

Joining multicast groups is costly, and it can't be done every time a server wants to send data to multiple temporary hosts.

Enter BitTorrent.

I was reading about BitTorrent some months back and realized its potential as a new distribution system. I feel this way about every new P2P system, mainly because P2P architectures excite me, but it was different this time. That is because BitTorrent smooths over one major inevitability in P2P systems: freeloaders. Simply put, freeloaders are those who download but do not share. P2P systems have adapted to curb this problem. Kazaa implemented a "Participation Level" counter that tells other hosts how much the user is sharing. It worked somewhat, but it was not enough.

BitTorrent follows the "Give And Ye Shall Receive" mantra. BT splits a file into small, equal-sized pieces and lets people download those pieces from other people who have already downloaded them. This way, the original seed doesn't have to upload more than about one file's worth of data. To ensure that the pieces are legitimate, BT hashes each piece with SHA-1 into a 160-bit (20-byte) hash. These hashes and other meta information are stored in the torrent file, which is made publicly available. The actual file itself is spread over the world, literally. If anyone is downloading the file in question, you are sure to get it too, and by downloading it you help ensure that future seekers of the file will get it. But they will get it only as long as someone is seeding and/or downloading it.
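
To make the piece-hashing idea concrete, here is a rough Python sketch. It is not BitTorrent's actual code: the function names `piece_hashes` and `verify_piece` and the tiny piece size are made up for illustration, and a real torrent file carries more metadata than a list of hashes.

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_length: int) -> List[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each.

    The last piece may be shorter than piece_length.
    """
    return [
        hashlib.sha1(data[start:start + piece_length]).digest()
        for start in range(0, len(data), piece_length)
    ]


def verify_piece(piece: bytes, expected_digest: bytes) -> bool:
    """Check a downloaded piece against the digest published for it."""
    return hashlib.sha1(piece).digest() == expected_digest


if __name__ == "__main__":
    payload = b"so long and thanks for all the files"
    hashes = piece_hashes(payload, piece_length=8)  # toy piece size
    print(len(hashes), "pieces,", len(hashes[0]), "bytes per hash")
    print(verify_piece(payload[:8], hashes[0]))   # genuine piece
    print(verify_piece(b"tampered", hashes[0]))   # forged piece
```

A downloader who holds the published hashes can check every piece on arrival, no matter which stranger sent it.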

Unfortunately, this property of BitTorrent means that only popular content remains downloadable via BitTorrent. For anything else, the original source of the file, if it still exists, has to be found and the file downloaded in the usual client-server manner.

But fortunately, there are many applications that fit this eccentric business model. Television shows, blog posts, podcasts, and CVS or bleeding-edge code, to name a few off the top of my head. Content distribution via BitTorrent is not only fast and cheap but also far-reaching. Let me quote an article in Wired that I read recently:

Evidence that Burnham's prediction is coming true came a few weeks before the US presidential election in November, when Jon Stewart - host of Comedy Central's irreverent The Daily Show - made a now-famous appearance on CNN's Crossfire. Stewart attacked the hosts, Paul Begala and Tucker Carlson, calling them political puppets. "What you do is partisan hackery," he said, just before he called Carlson "a dick." Amusing enough, but what happened next was more remarkable. Delighted fans immediately ripped the segment and posted it online as a torrent. Word of Stewart's smackdown spread rapidly through the blogs, and within a day at least 4,000 servers were hosting the clip. One host reported having, at any given time, more than a hundred peers swapping and downloading the file. No one knows exactly how many people got the clip through BitTorrent, but this kind of traffic on the very first day suggests a number in the hundreds of thousands - and probably much higher. Another 2.3 million people streamed it from iFilm.com over the next few weeks. By contrast, CNN's audience for Crossfire was only 867,000. Three times as many people saw Stewart's appearance online as on CNN itself.

This method of broadcasting content via BitTorrent, called PeerCasting, is regarded as a nemesis by the distribution companies. Why? Imagine television with the advertisements ripped out, unnecessary scenes and credits cut, and no cost of distribution via huge TV antennas and satellite dishes. That is the nightmare distribution companies have when they want to dream about their future. But what does that mean for the movie/television industry of the future? Self-produced shows, and the techie science shows that were superseded by fashion pageants and the like, may increase and be sustained by the bloggers and emailers who patronize them. These will, of course, exist side by side with ordinary sequential television for some time at least. Maybe some other technique will take the throne?

I, along with Sriram Krishnan and Balakrishnan, who are my classmates and close friends, am currently working on integrating the BitTorrent distribution model into blogging websites. Having observed how blog servers were buckling under the current text-blog load, we felt that the bandwidth required to serve hundreds of podcasts and videocasts in the future would drive the blog servers out of money and out of the picture. We wanted to try fitting a BitTorrent-like tracker server in transparently, without modifying the existing blogging platform. Kinda like what Coral did. It seems that not many have attempted this; rather, people have been successful in replacing servers with BitTorrent trackers. The trouble we thought we would face if we attempted that was that nobody would replace their servers just because a few undergrads asked them to. BTW, the project is called "Smoke", and since a better and more in-depth description of it is available at Sriram Krishnan's website, I shall refrain from repeating it.


PS: During the project proposal phase of our Smoke project, I spent a full five minutes writing one sentence to my fellow mates on, as it turned out, a variety of topics. Check it out:

Don't bother printing, even if you have already done so, as I have changed the definition of "slashdotting" and changed the constant values and have printed it out, which is because of my lateness in answering to your late response to my earlier request to type out this project proposal as an adaptation of sriram's description of our supposedly finalized final year project involving two of my favourite topics: BitTorrent and the web, which I suggest are my favourite topics because of their simplicity and effectiveness and, last but not in anyway the least, their inherent beauty of which I am captivated the most, I should say, as will be expressed in my next blog post titled"BitTorrent, the Slayer", named because it (bittorrent) hath slain the demon of a problem namely "redundant packets" that any server running the currently old Client-Server architecture can experience due to its inherent nature of not using the client's cache and the originally small files for which the client-server architecture was developed forand which it served well and, hopefully, will serve well for quite some time into the future, that is, until our or rather a BitTorrent like architecture, supercedes it - an event I sure wish I was around to spectate - an event I am sure will happen never minding what people might tell you just because there are softwares that only support client-server models around and because companies know how to compute Cost-Benefit which will show that the company will be better off adopting the BT distribution model and eliminate nasty bandwidth requirements and slashdotting effects than wastefully serving the same file simultaneously to multiple clients, while that bandwidth could have been used, say, to launch multiple new servers all running the BT-like distribution model, which excites me so much since we are too developing one such solution which might get popular if it is really revolutionary in such a way as this seemingly endless, and not to mention 
surprisingly techie, post that started out as a simple acknowledgement post and ended(not quite) as a "dotless" , some what techy and hence not just a non-sensical rant usually just created for the sake of typing out such a monstrosity of a sentence which no one in their right minds would try to follow after the first few lines which indicate the shameful intent of the author.

So long (and thanks for all the files)!

1 comment:

Anand kumar said...

Just for kicks, the long sentence had:

Words = 394

Characters (without spaces) = 1977

Characters (with spaces) = 2372

Lines = 30

AFAIK, there were no spelling or grammatical mistakes.