Nah, I am not missing a "not" in the title of the post. This post is about the two huge ideas that went from smoke to smoke. Here's what happened:
After quite a bit of ramblin' over the topic for THE project that we are supposed to do at the end of our undergraduate degree, we had finally come to a fork in the road.
The "we" was made of THE TWO guys of IT department in my college plus me. Sure, you can easily run a word-count on the words I spoke at college and still not hit anything more than a few thousands, but it seems whatever little I said have given me some recognition. The other two were vociferous when it comes to voicing anything. Sriram krishnan and Balak are the two best brains that you can have around if you are planning a brainstorming session. I was definetely out-of-league here. I take in things bite-by-bite and come up with something...in the end.
Balak and I were fascinated by P2P and network-related research stuff, while the Microsoft employee Sriram wanted to do something that would give "instant gratification" to the masses. My suggestion was that we do something with BitTorrent to alleviate ordinary web servers' bandwidth problems. (Yeah, I like BT). Initially, my focus was on eliminating the Slashdot effect. Then I wanted a one-code-to-solve-'em-all thing. I had planned to write a replacement web server and a replacement web client (actually, a browser) which would incorporate all of BitTorrent's goodness in it.
Then when I actually sat and talked about it, Sriram wasn't exactly sure people would like to change their web servers "just because a bunch of undergrads told 'em so". So we sat down, refined the idea and pointed it at a slightly different but more appropriate angle. Since not a lot of bloggers do podcasting and videocasting, we might as well provide a distribution medium for them. But for a more short-term focus, we chose text blogs. Thus was born "Smoke". Read all about it in Sriram's blog posts here.
(There's a running gag that if Balak or I were asked "Why name it Smoke?", we would reply with "Ask Sriram's GF" ;) )
Sriram was more into employing intelligent search engines, retro-fitted with intuitive visualizations, and putting that behind an MS-legacy-usability-for-the-masses UI. But now he wanted to build a machine. A machine so powerful that...ok, it's a virtual machine :)
I was not in favour of doing a virtual machine. Partly because I wasn't sure I could crank out something as big as a virtual machine, and mainly because I wanted a P2P project on my resume. But later I agreed to help build the virtual machine, partly because I knew Sriram had no problem doing a P2P thing, so I shouldn't have one working on a VM, and mainly because neither Balak nor I was able to come up with a project-worthy P2P idea.
One interesting idea I came up with involved "TorrC"s, or TorrentContainers, which are just torrent files containing hashes of torrent files. Basically, my idea was to give the main tracker server the ability to make a tracker out of the peers, dynamically and transparently. I hoped this would halt the tracker-server shutdowns for some time and give the BT sites more robustness and more bandwidth. I didn't really go into the details, as this seemed too small a project, and it was dropped. Recently, I found that the BTHub project (@isohunt.com) is a similar, but more centralized, implementation.
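For the curious, the TorrC idea boils down to a container that lists member torrents by their hashes, so the tracker can hand the whole bundle to any peer it wants to promote. Here's a minimal Python sketch of that idea; the `TorrC` structure and field names are my own invention for illustration, not anything the BT protocol actually defines:

```python
import hashlib

def torrent_id(torrent_bytes: bytes) -> str:
    # Identify a member torrent by the SHA-1 of its file contents
    return hashlib.sha1(torrent_bytes).hexdigest()

def build_torrc(torrents: dict) -> dict:
    # A TorrC is just a torrent-like container listing member torrents
    # by hash; a peer handed this could be promoted to a tracker for
    # those swarms, dynamically and transparently
    return {
        "kind": "TorrC",
        "members": {name: torrent_id(data) for name, data in torrents.items()},
    }

torrc = build_torrc({"linux-distro.torrent": b"d8:announce35:..."})
print(len(torrc["members"]["linux-distro.torrent"]))  # 40 hex chars of SHA-1
```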
So, thus began the construction of this awesome machine. And it was thus named ....(wait for it).....Smoke! It even has a proper little home at sourceforge.net. Sriram follows the building of the SmokeVM at his end in parts: here are the first, second, third and fourth parts (for now).
Since Sriram was the authority on programming languages, virtual machines and stuff like that, we let him split the workload amongst us. Sriram enlisted more help from outside. Aarthy and Kaushik were invited to join the coding work force. Before long, each of us was assigned a task. Sriram would wrestle with the main SmokeVM engine, Balak and I would produce a Python-to-Smoke compiler, Kaushik would produce the parser that reads in the Smoke code file, and Aarthy would script out a Lisp-to-Smoke compiler.
Before a month had passed, schedules changed, and our plans followed. So, in the end, we wound up dumping the Lisp compiler and postponing a few of the ambitious goals past the college project deadline. I wound up coding the Python compiler, while Balak wrote the project documentation (for the college version of the code). Though we got it all planned and stuff a full three months early, the entire code and documentation that was submitted to the college was done in less than three weeks.
The entire project was filled with a lot of everything. I was helpless as I listened to a LOT of geek noise between the head-geeks (Sriram and Kaushik), both in chat conferences and on the SmokeVM mailing list. I had fun coding something and sending it over to Sriram to check against the VM, which he would do immediately before replying. We were initiated into the world of CVS through this project. I had my own moments when I n00b'ily created a few modules on the CVS tree and Sriram had to "patiently" delete those. I had great fun constantly updating my compiler code once I finally got the hang of the CVS thingy. I experienced responsibility when I started creating change logs for all my coding updates. We all had our own goof-ups in the project and I believe we learnt a great deal from it.
I remember being baffled at the very thought of coding a compiler for a language that I hardly knew existed (Python). But after I wound up learning that language, I was happy to write the compiler in Python itself. In retrospect, it seems only fitting to write it in Python, because it has excellent library support for examining its own bytecode output. I had promised a be-all-end-all Python disassembly guide for after I am done with the Python compiler, and it will be posted.
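As a small teaser for that guide: the library support I mean is mostly Python's standard `dis` module, which lets you inspect exactly what bytecode CPython emits for any function. A tiny sketch (note that the exact opcode names differ between CPython versions, so don't take any particular listing as gospel):

```python
import dis

def add(a, b):
    return a + b

# dis exposes the bytecode CPython compiled for add();
# a Python-to-Smoke compiler can walk these instructions directly
# instead of parsing Python source itself
ops = [ins.opname for ins in dis.get_instructions(add)]
print(ops)  # opcode names vary across CPython versions
```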
Throughout the project, sriram was kind enough to ask me to have fun doing this python stuff. I really was having a good time.
PS:
There was another project we were considering at that time, but it never had a definite goal to it. It was for the Microsoft Student Project program 2005. It was basically a contest where you register your final-year project with Microsoft and compete against students from other colleges for the best project. The winner also gets a chance to enter the Imagine Cup 2005.
We had a [Search engine + Blog + p2p = Cool!] theory that we wanted to make a project out of. Since I got to register the project, I whipped out the name DEBIAN, a backronym for Distributed Enhanced Blog Itemiser And Navigator. A cool insult to Microsoft if the project was awarded the best. I had to endure quite a few words from Sriram before he tried to forget the name I gave the project (being the pro-MS guy that he is).
Saturday, May 21, 2005
Tuesday, May 10, 2005
Drink your own spiddle!
It's always annoying when someone drinks water off a glass (or silver) tumbler and doesn't drink the entire contents. I always find the puddle of spit-ridden water at the bottom of the glass kinda disgusting and always (mostly/sometimes) get rid of that water and rinse the glass before drinking off it.
I've been doing this for quite a while, but today it suddenly hit me that there was no actual word for this kinda water. So why not make my own? So here's my word for it: SPIDDLE.
How spiddle? Spittle + puddle = SPIDDLE !
All hail Spiddle (the word, that is).
Now, no more do you have to be baffled when you are trying to refer to this obnoxious pool of stagnation at the bottom of the glass! Just say, "Drink your own spiddle!"
PS: Google says that Spiddle is the name of some Irish town. Who cares? Let them change their town name if they feel embarrassed.
Sunday, May 08, 2005
"Be 100% sure"
Those were the words that would stare at me whenever I came out of the bathroom after a ponderous bath. The words were part of a Dettol liquid soap bottle, left unused for ....quite some time. So on my birthday, which gets so much more uneventful every year that it surprises me, I finally took the Dettol liquid soap...thingy (hereafter called "squeezy") to my room and "studied" it a bit closely (much to the anxiety of my mother).
After a few months of modelling petty things like chairs, champagne glasses, tables and their ilk, I found a technique that could help me model most non-baffling objects in 3D Studio Max. The steps I follow in modelling an object are pretty simple and straightforward: look at the object and try to visualize it in terms of primitive shapes and geometry, then try to figure out how to place those primitives in the software. I don't know how long this method will stay good, but it has, so far.
Back to the squeezy. Designing the squeezy was the most fun thing I had done in quite some time. It posed some radical new challenges compared to the chairs and tables: it had curves all over its body. I tried a total of 5 to 6 different approaches to model the squeezy and all of them turned out ..er..ugly, at best. Finally, after some sleep, I got the shape I wanted.
One of the reasons I didn't start modelling humans was 'cause of the curves (no double meanings here). The next challenge was the squeezy's nozzle. It was all curves. Again, when I was suffering from unsuccessfulness, some sleep cured me. (But I really am not satisfied with the current nozzle).
The next fun part was designing the label that's stuck on the front side. I never thought I would do "real" texturing this early into my designing hobby, but there I was, firing up ol' Photoshop CS and painting the logo. *sniff*I am so proud*sniff*. After working with an excellent context-sensitive and "senseful" application like 3ds Max, Adobe Photoshop felt clumsy as hell. Found my way through it in an hour. (Actually, I was searching for the custom shapes button :( ) Struggled with gradients, got fed up with layers, wrestled with transparency and finally split my head trying to texture-map the squeezy in 3D Studio Max.
After that, my favourite renderer, Mental Ray, started acting up. It would crash just before rendering the scene. I tried changing the environment map, materials and render settings, but to no avail. Since I was reluctant to go with the default scanline renderer, I downloaded some famous renderers like Brazil and VRay, albeit as trials. Both had a LOT more controls than the default scanline. Brazil (Rio, the free version) would only output in 512x384 mode and VRay was SLOW. So I went with the default scanline renderer. After a lot of burnt-out images, I finally found the lighting that produced the fewest artifacts and settled with it.
Here are the final renders of squeezy:
Wednesday, March 02, 2005
Quantum theory + NURBS curves + P2P.
I was hallucinating while sitting there on the bed, and a few thoughts crossed my mind in a flash. Luckily, I was able to remember them quite vividly too. Actually, these were questions I had been toying with for quite some time.
So on with the questions:
1. We know that the data in a wave file can be represented in the form of waves, which in turn can be represented mathematically. The mathematical expression can be much smaller than the sampled version in the wave file.
In 3D software, there's an equivalent, mathematically efficient method of representing a 3D model: NURBS. Non-Uniform Rational B-Splines are mathematical representations of 3D geometry that can accurately describe any shape, from a simple 2D line, circle, arc or curve to the most complex 3D organic free-form surface or solid.
So the question is,
"So why not a mathematically efficient form for defining an arbitrary data file?"
Of course, I understand that wave files and 3D models have certain parameters and a basic structure to them, which is lacking in our arbitrary data file. But I want to try and see for myself why and how it WON'T work.
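To make the intuition concrete, here's a toy Python sketch of the wave-file case: a pure tone that takes 1000 samples to store literally can be regenerated exactly from just three numbers. The catch, which is the whole question, is that an arbitrary data file has no such generating formula to fall back on:

```python
import math

RATE = 8000  # samples per second

def render(amplitude, freq_hz, phase, n):
    # Regenerate n samples of a pure tone from just 3 parameters
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / RATE + phase)
            for t in range(n)]

# The "wave file": 1000 floats. The "mathematical form": 3 floats.
samples = render(0.8, 440.0, 0.0, 1000)
reconstructed = render(0.8, 440.0, 0.0, 1000)
error = max(abs(a - b) for a, b in zip(samples, reconstructed))
print(error)  # 0.0: the parametric form loses nothing here
```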
2. If you thought the first thought was crazy, wait till you hear this. I was thinking of the parallel universes described by quantum theory. The theory in its basic form tells us that if the probability of one event happening is 1 in 100, then there will be 100 parallel universes, with each unique outcome existing in its own universe. Mind-boggling as it may seem, the whole concept of quantum cryptography supposedly relies on the existence of parallel universes, and it seems to work too. So no questions there.
So, following this theory, if we look into our past, it looks like we have had a predetermined path down the timeline. So I thought,
'Is this what people mean by fate/destiny? And can we predict which branch we will be in, in the future? And if so, it is as if there was no probability in the first place.'
Whatever.
But my real musings were to trace the tree back to its root and find out what the odds of our current universe being created looked like. This looks like the famous 'sum over histories' formulation by the inimitable Richard P. Feynman, though I don't know enough of it to conclude if it is the same.
3. Now, the third question combines both quantum theory and mathematical curves with peer-to-peer communication. No, it's not encrypting a wave file with quantum cryptography and P2P-ing it.
I was involved, along with my friend Balakrishnan, in a project that aimed to create a P2P architecture that eliminated discovery servers, whether a central, Napster-like one or a distributed, Gnutella-like one. We chose to experiment with sequentially pinging the IP addresses returned by a deterministic formula throughout the ISP's IP space, hoping that a client running our P2P protocol sat at the IP pinged. As expected, this is very inefficient and unreliable, since each client is assigned a dynamic IP from a pool of IP addresses by the ISP. There is no pattern for predicting where in the ISP's IP address space an online client exists.
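For flavour, here is roughly what the "deterministic formula" part amounts to, sketched in Python. The step-based permutation below is a hypothetical reconstruction of the idea (our actual formula differed), and the pinging itself is omitted:

```python
import ipaddress
from math import gcd

def probe_order(cidr, seed=0):
    """Yield every host of an ISP-sized pool exactly once, in a
    deterministic but scattered order, so that two peers running
    the same formula probe the same sequence of addresses."""
    hosts = list(ipaddress.ip_network(cidr).hosts())
    n = len(hosts)
    # Pick a stride coprime to the pool size so the walk visits
    # every address before repeating
    step = next(s for s in range(max(2, n // 3), n + 2) if gcd(s, n) == 1)
    for i in range(n):
        yield hosts[(seed + i * step) % n]

order = list(probe_order("10.0.0.0/28"))
print(len(order), len(set(order)))  # 14 14: every usable host, no repeats
```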
Quantum theory suggests that electrons in an atom are not exactly locatable, due to Heisenberg's uncertainty principle. So they are represented as electron clouds rather than individual particles. If we consider the ISP as an atom, its IP address space as the volume of the electron cloud, and each position in that volume as a unique IP address, then each electron is a client. And at any instant, the position (IP address) of the electron (client) cannot be found out, ever. I know that I am omitting the 'momentum' factor here, but I can't find a relevant equivalent anywhere in this picture. So, I wanted to know,
'Can quantum theory be applied in P2P models (if not ours)?'
Wait...where's the mathematical curve idea in the third question? It's just that I wanted to use mathematically compressed files in P2P to reduce traffic, that's all.
So there ya go. The awesome ponderings of the idle mind.
(Warning: The above were from my own mind. Some of the information may be correct and, if so, I hold the IP rights ;) )
Saturday, January 29, 2005
All your cable modems combined, I am Captain Pirate!
I read this interesting article, and this one on music, during one of my quests into the World Wide Web. The article reflects what I had already thought could be the case with P2P and file sharing, but which I dismissed as too radical a view on the issue. Besides, being on the other side of the law, I thought maybe I was a little too biased on the topic. But now I can rest easier knowing this view was actually proved right.
I used to think that the software and media industries are not losing money at all because of P2P file sharing, or even pirated CDs. I have read many interviews in DIGIT on how various people are moving to original versions of software instead of the bootlegged ones, in order to avoid the headache of missing features, files, etc. So people who want to buy an original version and have the money to buy it still buy it. Those who want the software but don't/can't pay for it swap it with peers on the Internet. So essentially, there should be no effect of peer-to-peer pirating on the sales of such products. But the companies' quarterly postings report severe losses, mainly attributed to P2P. What's going on? Which figure is lying?
I don't know much about this, but my two cents says the music and software industries just lost the new buyers of their products: the ones who would have liked the product and had the resources to buy it, but preferred to grab a copy while idling in their chair.
There can be no doubt that P2P has started receiving all the limelight. New P2P architectures and applications are aiding faster, more comprehensive, up-to-date and anonymous file sharing. In such a scenario, existing copyright technologies fail to prevent illegal copies from being made. The media and software industries are boiling over watching their products being swapped between countless users without any royalty being paid. Sure, there were a few arrests and cease-and-desists, but that's not going to stop another P2P technique from providing more anonymity to the bootlegger. The cat-and-mouse chase can go on forever, with the media cops somehow catching up with the bootleggers, but can't this be avoided? Why is it happening?
A long time ago, when men were men, women were women, small furry creatures from Alpha Centauri were small furry creatures from Alpha Centauri and objects were real, physical objects, the concept of object ownership was simple: anything that you created, grew, planted or staked was yours. Period. Everyone was happy. The birds chirped. Leaves rustled and life was beautiful. That is, until someone came up with mangled-looking scrawls called "Programs" that would inevitably crash big boxes called "Computers". If the ancient law of ownership, which made people's lives oh-so-happy, were to be applied to them programs, no company could have made a single dollar. This was a time when security people could get away with saying things like "floppies" and "dongles" without getting weird looks from other people. This was a time when the Internet was an obscure science fiction rumored to be mentioned at some big university. This was a time when file sharing involved two environmentalists writing on a single sheet of paper.
Times definitely changed, and here we are with the same old copyright protection laws offering paper-thin resistance against copying, when multi-megabyte files can be downloaded and uploaded for a million others to download and upload. P2P and other disruptive technologies may have made many people illegal software owners.
In a recent article I read, the cops in the United States were having a problem: illegal aliens (foreigners, in case you let your imagination wander) are at large, and their huge number makes for a large suspect pool for the cops to investigate every time a crime occurs. The security advisors came up with an utterly surprising and clever plan: instead of ignoring the presence of illegal aliens among us, let us recognise them and provide them with alien driving licences. Now this may come as a shock to a lot of people, but the idea has a clever base to it. Since the aliens can obtain a licence, the good people of the lot will come forward and let themselves be certified as good citizens. The remainder of the lot will contain the bad eggs. This will at least reduce the number of fake driving licences and other documents, and at the same time give the good guys a chance to prove themselves.
Applying the same logic to our case: instead of trying to ignore the fact that illegal software owners will always be around, why not come up with copyright laws that acknowledge such users' presence and invent some kind of compromise plan to keep their numbers at bay? I am not sure such a scheme would be the best, but it sure seems a good change of perspective to me. I think Microsoft has come up with something like that for distributing future software updates to its operating systems.
PS: Okay, the title of this post doesn't have anything to do with the content, but it seemed cool :)
Thursday, January 06, 2005
BitTorrent, the slayer.
The first time I heard of BitTorrent, some unthinkable number of months ago, I was grinning so broadly my grinning muscles hurt. It was as if some mythical demon that was unslayable had been slain magically by the words I was reading. How rude of me! Let me introduce that demon to you.
Meet Dm. Redundant Packets (Dm = Demon ;))
Dm. Redundant Packets was a mythical kind, since he was a demon only in my head. I saw him whenever a server had to send the same data packets to multiple hosts, even if the hosts were requesting the packets simultaneously. The demon got really rank when a server buckled trying to send the same file to too many hosts. It was quite horrifying to watch all this, but since I could think of no solutions, I was but a mute spectator.
Then a lecture introduced me to "multicasting", a form of broadcasting but within a predefined set of exclusive hosts. These hosts are exclusive because, in order to be included in a multicast group, you need to pay some money and register your "group" under a multicast IP address. This is because, as opposed to broadcasting, where every Jack connected to the sender is sent a copy of the same packet, in multicast only one copy of the data is sent to the multicast group. The multicast IP address (Class D: 224.0.0.0 to 239.255.255.255) is needed to identify the group, since the multicasting happens at the router level. Though my knowledge stops here, with multicast servers and MBones, I knew enough to believe that the demon wasn't dead yet.
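To make the idea concrete, here is a minimal Python sketch (mine, not from the lecture) of what "joining a group" looks like at the socket level; the group address and port are made-up illustrative values, though any Class D address would do:

```python
import socket
import struct

def make_membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq struct: the Class D group to join plus the local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))

def join_group(sock: socket.socket, group: str) -> None:
    # The "join" itself: the kernel sends an IGMP membership report, which is
    # how routers learn to forward one copy of each group packet towards us.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(group))

if __name__ == "__main__":
    GROUP, PORT = "239.1.2.3", 5007  # hypothetical group address and port
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    join_group(sock, GROUP)  # now a single multicast send reaches us too
    sock.close()
```

Note that nothing here says who pays whom; the money and registration live outside the socket API, which is part of why this never felt like a solution for ad-hoc groups.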
Joining multicast groups is costly, and it isn't something that can be done every time a server wants to send data to a bunch of temporary hosts.
Enter BitTorrent.
I was reading about BitTorrent some months back and realised its potential as a new distribution system. Though I always feel this way with every new P2P system, mainly because P2P architectures excite me, it was different this time. That is because BitTorrent smooths over one major inevitability in P2P systems: freeloaders. Simply put, freeloaders are those who download but do not share. P2P systems have adapted to fight this problem: Kazaa implemented a "Participation Level" counter which tells other hosts how many files a user is sharing. It worked somewhat, but was not enough.
BitTorrent follows the "Give And Ye Shall Receive" mantra. What BT does is split files into small, equal-sized pieces and let people download the pieces from other people who have already downloaded them. This way, the original seeder doesn't have to upload much more than one file's worth of data. To ensure that the pieces are legitimate, BT uses SHA-1 to hash each piece into a 160-bit (20-byte) string. This and other meta-information are present in the torrent file, which is made publicly available. The actual file itself is spread over the world, literally. If anyone is downloading the file in question, you are sure to get it, because by downloading you ensure that future prospectors of the file will get it too. But that holds only as long as someone is seeding and/or downloading it.
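The piece splitting and hashing can be sketched in a few lines of Python (a toy illustration of mine, not actual BitTorrent client code; the piece size is made up, and real torrents typically use much larger pieces):

```python
import hashlib

PIECE_SIZE = 16 * 1024  # illustrative; real torrents commonly use 256 KiB and up

def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list:
    """Split data into fixed-size pieces and SHA-1 each piece into 20 bytes.

    The concatenation of these hashes is what ends up in the torrent file.
    """
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]

def piece_is_legitimate(piece: bytes, expected_hash: bytes) -> bool:
    # A downloader recomputes the hash of each received piece and compares
    # it against the list from the torrent file before sharing the piece on.
    return hashlib.sha1(piece).digest() == expected_hash
```

A client that receives a piece failing this check simply discards it and re-requests it from another peer, which is what keeps corrupted or malicious pieces from spreading through the swarm.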
Unfortunately, this property of BitTorrent means that only popular content remains downloadable via BitTorrent. For anything else, the original source of the file, if it still exists, has to be found and downloaded in a client-server manner.
But fortunately, there are many applications that suit this eccentric distribution model: television shows, blog posts, podcasts, CVS snapshots or bleeding-edge code, to name some off the top of my head. Content distribution via BitTorrent is not only fast and cheap but also far-reaching. Let me quote one article in Wired that I read recently:
Evidence that Burnham's prediction is coming true came a few weeks before the US presidential election in November, when Jon Stewart - host of Comedy Central's irreverent The Daily Show - made a now-famous appearance on CNN's Crossfire. Stewart attacked the hosts, Paul Begala and Tucker Carlson, calling them political puppets. "What you do is partisan hackery," he said, just before he called Carlson "a dick." Amusing enough, but what happened next was more remarkable. Delighted fans immediately ripped the segment and posted it online as a torrent. Word of Stewart's smackdown spread rapidly through the blogs, and within a day at least 4,000 servers were hosting the clip. One host reported having, at any given time, more than a hundred peers swapping and downloading the file. No one knows exactly how many people got the clip through BitTorrent, but this kind of traffic on the very first day suggests a number in the hundreds of thousands - and probably much higher. Another 2.3 million people streamed it from iFilm.com over the next few weeks. By contrast, CNN's audience for Crossfire was only 867,000. Three times as many people saw Stewart's appearance online as on CNN itself.
This method of broadcasting content via BitTorrent, called peercasting, is being regarded as a nemesis by the distribution companies. Why? Imagine television with the advertisements ripped out, unnecessary scenes and credits cut off, and nil cost of distribution via huge TV antennas and satellite dishes. That is the nightmare distribution companies have when they want to dream about their future. But what does that mean for the movie/television industry of the future? More self-produced shows; the techie science shows that were superseded by fashion pageantry may increase and be sustained by the blogs and emails that patronise them. These will, of course, exist side by side with ordinary scheduled television, for some time at least. Maybe some other technique will take the throne?
Sriram Krishnan and Balakrishnan, who are my classmates and close friends, and I are currently working on integrating the BitTorrent distribution model into blogging websites. Observing how blog servers were buckling under even the current text-blog load, we felt that the bandwidth required to serve hundreds of podcasts and video casts in the future would drive the blog servers out of money and out of the picture. We wanted to try fitting in a BitTorrent-like tracker server transparently, without modifying the existing blogging platform. Kinda like what Coral did. It seems not many have attempted this; instead, the successful efforts have replaced the servers with BitTorrent trackers outright. The trouble we thought we would face if we attempted that was that nobody will replace their servers just because a few undergrads asked them to. BTW, the project is called "Smoke", and since a better and more in-depth description of it is available at Sriram Krishnan's website, I shall refrain from repeating it.
PS: During the project-proposal phase of our Smoke project, I spent a full five minutes writing one sentence to my fellow mates on, as it turned out, a variety of topics. Check it out:
Don't bother printing, even if you have already done so, as I have changed the definition of "slashdotting" and changed the constant values and have printed it out, which is because of my lateness in answering your late response to my earlier request to type out this project proposal as an adaptation of sriram's description of our supposedly finalized final year project involving two of my favourite topics: BitTorrent and the web, which I suggest are my favourite topics because of their simplicity and effectiveness and, last but not in any way the least, their inherent beauty, by which I am captivated the most, I should say, as will be expressed in my next blog post titled "BitTorrent, the Slayer", named because it (BitTorrent) hath slain the demon of a problem namely "redundant packets" that any server running the currently old client-server architecture can experience due to its inherent nature of not using the client's cache and the originally small files for which the client-server architecture was developed and which it served well and, hopefully, will serve well for quite some time into the future, that is, until our or rather a BitTorrent-like architecture supersedes it - an event I sure wish I was around to spectate - an event I am sure will happen never minding what people might tell you just because there is software around that only supports client-server models and because companies know how to compute cost-benefit, which will show that a company will be better off adopting the BT distribution model and eliminating nasty bandwidth requirements and slashdotting effects than wastefully serving the same file simultaneously to multiple clients, while that bandwidth could have been used, say, to launch multiple new servers all running the BT-like distribution model, which excites me so much since we too are developing one such solution which might get popular if it is really revolutionary in such a way as this seemingly endless, and not to mention surprisingly techie, post that started out as a simple acknowledgement post and ended (not quite) as a "dotless", somewhat techy and hence not just a nonsensical rant usually just created for the sake of typing out such a monstrosity of a sentence which no one in their right minds would try to follow after the first few lines which indicate the shameful intent of the author.
So long (and thanks for all the files)!