World Class - In Our Backyard
It was late at night at the end of a long, long day at work. Even the lights in the fishing boats anchored in
There is a brute force way to do that: employ hundreds of people to check every major website in the world. But every time the content changed you'd have to go back and redo the work, and hiring that many people would cost too much anyway. There had to be a more elegant way: for example, write a computer program to patiently check sites throughout the world and send back snippets of information that were of interest to Indians.
We understood the first steps in doing this. For instance, you can easily get a computer program to tag an article as '
This much we had figured out on our own. The trick was how to do this economically. Here again, the brute force method is to crawl every single English-language site in the world and look for words to compare with our chosen corpus. The elegant way would be to devise a method of inspired guessing as to where to look first and where to look next.
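To make the idea of inspired guessing concrete, here is a minimal sketch in Python of a best-first crawl. It is an illustration under stated assumptions, not the method from any research paper and not the code we eventually used: the topic vocabulary, the toy in-memory web and the names relevance and focused_crawl are all made up for the example, and a link simply inherits the relevance score of the page it was found on.

```python
import heapq

# Hypothetical vocabulary standing in for the "chosen corpus" of topic words.
TOPIC_WORDS = {"india", "indian", "mumbai", "cricket", "rupee"}


def relevance(text, topic_words=TOPIC_WORDS):
    """Fraction of the words in `text` that appear in the topic vocabulary."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_words for w in words) / len(words)


def focused_crawl(pages, seeds, budget):
    """Best-first crawl over a toy in-memory web (no network access).

    `pages` maps a URL to (text, outgoing links). Links found on a page
    are queued with that page's relevance score as their priority, so the
    crawler looks next where relevant content has already turned up.
    """
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        text, links = pages[url]
        score = relevance(text)
        visited.append((url, score))
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return visited


# A made-up five-page web: "a" and "c" are on topic, "b" and "d" are not.
TOY_WEB = {
    "seed": ("world news portal", ["a", "b"]),
    "a": ("indian cricket team wins in mumbai", ["c"]),
    "b": ("local weather report", ["d"]),
    "c": ("rupee gains as india exports rise", []),
    "d": ("gardening tips for spring", []),
}
```

On this toy web, focused_crawl(TOY_WEB, ["seed"], budget=4) fetches seed, a, c and b in that order, and never spends its budget on d. A brute force crawl would have fetched every page regardless of what it had already seen.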
That night I was trawling the internet for research papers that described methods of inspired guessing.
Here was one! Accelerated Focused Crawling through Online Relevance Feedback. I skimmed through the paper; it pretty much dealt with the problem I had in mind.
It was past midnight in
I ran my eyes back up to the start of the paper to check which
I couldn't wait for the sun to rise to call him. The next day, some colleagues and I trudged to IIT Bombay to meet the professor. He was sitting at his computer in an ice-cold room in a remote corner of the Computer Science Department, which itself was in a remote corner of the IIT Bombay campus. He was very helpful and immediately gave us the computer code that we needed.
I was curious about how he got interested in this topic. He pulled out a book from the stack in his room: Mining the Web: Discovering Knowledge from Hypertext Data.
"This is a textbook I have just finished writing for US computer science students", he said, nonchalantly tossing it back into the stack. "It will be out later this year in the
It turned out that he was at Stanford at the same time as Brin and Page, the Google founders. They went and found a venture capitalist to fund their search engine efforts, and Soumen came back to India to work at IIT Bombay: "because my mother is old and does not want to leave India".
Landing the Soumen catch turned out to be the easy part. Engaging IIT Bombay in a commercial relationship was to be a near-impossible task. The process for such an engagement is uncharted territory for Indian academic institutions. We settled on a compromise: we hired two of his star graduate students (or more accurately he persuaded them to join us instead of doing what all their classmates did: immigrate to
For a long time I felt mildly guilty about this arrangement, which gave us so much know-how for so little payment. That was until I encountered the head of Sarnoff Labs at a conference in
I was much relieved to hear this. So the arrangement we stumbled into, hiring a professor's top students and getting the professor for free, seems to be the way R&D is done today. And the irony is that we found it in our own backyard.