Wednesday, September 22, 2010

If the winter is here, the Next Summer Of Code can't be far behind.


Google Summer of Code, which started on May 24th, 2010, has come to an end for me, as it has for the many other Open Source contributors from around the world who took part in it. It was an experience that can be gauged from the amount of code generated worldwide. Thanks to the StatusNet team and my mentor Brion, we put together a system that is one of a kind on the Internet: A Distributed Approach To Finding Friends You Already Know On The Federated Social Network, codenamed Talash.

A while back, a progress report covered the features that had been implemented and those still to come. The project has ended nicely and I really love the results. This blog post summarizes the final updates on the work, some interesting side products and its future.

In general, the project is a recommendation system. It recommends new friends to the user and tries to expand his online social network. Imagine yourself as a user on the federated social network. You need your 'first' friend on the internet, but you don't know where he is. This cold start problem is eliminated by the first phase of the project, named 'Quick Connect'.

Fast Friends

We tap into our user's address books, find his friends' email addresses, and search the Social Graph for their other public social profiles. If a profile is subscribable, it is recommended to our user. Suddenly you have many, many friends available for subscription, people you didn't know were on the internet.
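The Quick Connect flow above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's PHP code: `Profile`, `quick_connect` and `lookup_profiles` are hypothetical names, and the lookup function is a caller-supplied stand-in for the Social Graph query.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Profile:
    url: str
    subscribable: bool  # hypothetical flag: True if we can subscribe (e.g. OStatus)


def quick_connect(
    contact_emails: Iterable[str],
    lookup_profiles: Callable[[str], List[Profile]],
) -> List[Profile]:
    """Recommend every subscribable public profile found for the user's contacts.

    lookup_profiles stands in for the Social Graph lookup: it maps an
    email address to the public profiles linked to it.
    """
    seen = set()
    recommendations = []
    for email in contact_emails:
        for profile in lookup_profiles(email):
            # Skip profiles we cannot subscribe to, and duplicates.
            if profile.subscribable and profile.url not in seen:
                seen.add(profile.url)
                recommendations.append(profile)
    return recommendations


# Usage with a tiny in-memory "social graph".
GRAPH = {
    "ann@example.com": [
        Profile("https://identi.example/ann", True),
        Profile("https://blog.example/ann", False),
    ],
    "bob@example.com": [],
}
recs = quick_connect(["ann@example.com", "bob@example.com"], lambda e: GRAPH.get(e, []))
```

Passing the lookup in as a function keeps the recommendation logic independent of whichever graph service answers the query.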

We currently tap into Gmail, Plaxo and Twitter, using OAuth to access this information. A new OAuth wrapper has been designed over the existing in-library OAuth class. A reference is here.

What if you want to add a new service? Does your website have an API for accessing a user's contacts, and do you want it integrated into this plugin? Check out the API section of the documentation to find out how to plug in your own service. A basic framework of the endpoints is provided in the documentation and can be easily modified.
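To give a feel for what "plugging in your own service" involves, here is a hedged Python sketch of a provider interface plus a registry. The class and function names (`ContactService`, `fetch_contacts`, `register_provider`, `ExampleService`) are hypothetical and do not describe the plugin's actual PHP endpoints; the documentation remains the authority on those.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ContactService(ABC):
    """Hypothetical interface a new address-book provider would implement."""

    name: str = ""

    @abstractmethod
    def fetch_contacts(self, access_token: str) -> List[str]:
        """Return the email addresses in an authorized user's address book."""


PROVIDERS: Dict[str, ContactService] = {}


def register_provider(service: ContactService) -> None:
    PROVIDERS[service.name] = service


class ExampleService(ContactService):
    name = "example"

    def fetch_contacts(self, access_token: str) -> List[str]:
        # A real provider would call its contacts API over HTTP,
        # signing the request with the OAuth access token.
        return ["ann@example.com", "bob@example.com"]


register_provider(ExampleService())
```

The abstract base class makes a half-finished provider fail loudly at instantiation instead of at sync time.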

New Friends

Then we come to 'Delayed Connect'. The contention is that a user may take time to settle into this immediate social network before searching for newer friends whom he knows or shares common interests with. We feel that a user's social graph must be analysed only with the user's permission, hence we give the user a preference list, which also acts as a filter on our recommendations.

  1. Recommend me friends I know on other social networks

  2. Recommend me interesting friends

  3. Recommend me friends in my geographical proximity

The first option has been implemented, the framework for the second option has been created, and the third is still pending.
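One way to picture the preference list acting as a filter, as a Python sketch. Tagging each recommendation with the option that produced it, and the `Preference` enum itself, are assumptions for illustration rather than the plugin's real data model.

```python
from enum import Enum
from typing import Iterable, List, Set, Tuple


class Preference(Enum):
    KNOWN_ELSEWHERE = 1  # friends I know on other social networks
    INTERESTING = 2      # interesting friends
    NEARBY = 3           # friends in my geographical proximity


def filter_by_preferences(
    recommendations: Iterable[Tuple[str, Preference]],
    enabled: Set[Preference],
) -> List[str]:
    """Keep only recommendations produced by a source the user opted into."""
    return [url for url, source in recommendations if source in enabled]
```

With nothing enabled, nothing is analysed or shown, which matches the permission-first stance above.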

To search for friends whom you might know on other social networks, we take the list of your friends' friends and search their social graphs. If you and they follow or subscribe to each other on an external website, they are recommended. Since all these requests go over HTTP, and the number of friends of friends is huge and grows rapidly, we have a good caching mechanism in place that reduces these requests.
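A Python sketch of the friends-of-friends search with a cache in front of the remote fetches. `REMOTE_GRAPH` is hypothetical in-memory data standing in for subscription lists that would really be fetched over HTTP, and `functools.lru_cache` is used here only to show the idea; it is not necessarily the caching mechanism the plugin uses.

```python
from functools import lru_cache
from typing import FrozenSet, Iterable, Set

# Hypothetical stand-in for remote subscription lists. A real plugin would
# fetch each list over HTTP from the user's StatusNet instance.
REMOTE_GRAPH = {
    "me": {"ann", "bob"},
    "ann": {"me", "carl"},
    "bob": {"carl", "dave"},
}
FETCHES = []  # records each real (uncached) fetch


@lru_cache(maxsize=None)
def fetch_friends(user: str) -> FrozenSet[str]:
    FETCHES.append(user)
    return frozenset(REMOTE_GRAPH.get(user, ()))


def friends_of_friends(user: str) -> Set[str]:
    direct = fetch_friends(user)
    candidates: Set[str] = set()
    for friend in direct:
        candidates |= fetch_friends(friend)
    return candidates - direct - {user}


def known_elsewhere(user: str, external_contacts: Iterable[str]) -> Set[str]:
    """Friends of friends the user is already connected to on an external site."""
    return friends_of_friends(user) & set(external_contacts)
```

Calling `friends_of_friends("me")` a second time triggers no new fetches, which is exactly the saving that matters when every fetch is an HTTP round trip.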

Friend Similarity Endpoints

To allow 'Interesting Friends' recommendations, we give every plugin a chance to contribute to the recommendation process. The idea is for each plugin to generate a similarity score between two users, and then use these scores to find how close the users are.

We have a FriendSimilarity class that any plugin can register with. A plugin can then define a similarityScorer method, which takes two profiles and returns the similarity between them. This score can be used to recommend one user to another. For example, consider the Music Plugin: it might give two users a similarity score of 85 if they listen to the same music, and about 10 if they have nothing in common. This score can be used to recommend profiles that are 'interesting' to our user. This endpoint has tremendous potential for expanding users' social networks.
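The registration idea above, transposed to a Python sketch. The real FriendSimilarity class and similarityScorer methods are PHP; here, averaging the scores of all registered plugins is an assumption (the post does not say how scores are combined), and the Jaccard overlap of artists is a swapped-in example scorer, not the Music Plugin's actual formula.

```python
from typing import Callable, Dict

Profile = Dict[str, list]  # hypothetical: a profile is a dict of attributes
Scorer = Callable[[Profile, Profile], float]


class FriendSimilarity:
    """Python sketch of a registry that plugins add similarity scorers to."""

    def __init__(self) -> None:
        self._scorers: Dict[str, Scorer] = {}

    def register(self, plugin_name: str, scorer: Scorer) -> None:
        self._scorers[plugin_name] = scorer

    def score(self, a: Profile, b: Profile) -> float:
        """Average of all registered scorers, each on a 0-100 scale (assumed)."""
        if not self._scorers:
            return 0.0
        return sum(s(a, b) for s in self._scorers.values()) / len(self._scorers)


def music_similarity_scorer(a: Profile, b: Profile) -> float:
    """Jaccard overlap of the artists two users listen to, scaled to 0-100."""
    x, y = set(a.get("artists", [])), set(b.get("artists", []))
    if not x or not y:
        return 0.0
    return 100.0 * len(x & y) / len(x | y)


similarity = FriendSimilarity()
similarity.register("Music", music_similarity_scorer)
```

Identical playlists score 100 and disjoint ones score 0, so the scale lines up with the 85-versus-10 example above.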

Closure, Future And Thanks

The GSoC timeline has ended, but not the project. Talash has tremendous room for improvement. It's a one-of-a-kind implementation, since Federated Friend Finding and Recommendation algorithms can be implemented on StatusNet, the only Federated Social Network. It can distribute the processing of large social graphs and has tremendous research scope. This plugin is only the first step.

In the near future, I shall be working with Brion Vibber to integrate it completely into the StatusNet codebase. The code requires a lot of fine tuning, and will probably use Delayed Queues to delay the OAuth retries in case of failures.
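A rough Python sketch of what a delayed queue for OAuth retries could look like: jobs sit in a heap keyed by due time, and failed calls are re-queued with exponential backoff. `DelayedQueue`, `retry_delay` and the 30-second base and one-hour cap are all hypothetical choices, not StatusNet's queue API.

```python
import heapq
from typing import Any, List, Tuple


class DelayedQueue:
    """Jobs become runnable only once their due time has passed (sketch)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Any]] = []
        self._counter = 0  # tie-breaker so jobs themselves are never compared

    def push(self, job: Any, due: float) -> None:
        heapq.heappush(self._heap, (due, self._counter, job))
        self._counter += 1

    def pop_ready(self, now: float) -> List[Any]:
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


def retry_delay(attempt: int, base: float = 30.0, cap: float = 3600.0) -> float:
    """Exponential backoff: 30s, 60s, 120s, ... capped at one hour."""
    return min(cap, base * 2 ** attempt)
```

On the n-th failed OAuth call at time `t`, the job would be re-queued with `push(job, t + retry_delay(n))`, so a flaky provider is retried less and less often instead of being hammered.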

I would like to thank Evan, Brion, Zach, Rejon and Derek for making me a part of this wonderful project. And hats off to Shashi, Luke, Ian and Arunoda, who created more than awesome projects this summer!! I know we shall keep contributing to StatusNet in our own ways.

Thursday, July 15, 2010

Delayed Connect

The second phase of my Google Summer Of Code task is known as Delayed Connect. It will be a mechanism to recommend new friends to our users by analysing their social graph. Since this analysis will take time, it's a 'delayed' mechanism for finding new friends.

The Federated System
The goal is to recommend new friends. Traditional recommender systems use Collaborative Filtering and Content Based Filtering to find them. An entity can be safely recommended to a user if both of them share common traits or are similar in some respect. Different attributes are used to find the similarity between two users, for example:

Common Friends
Geographic Location

There might be many more, depending on the system being developed, but we will stick to a minimal few.
Further, a base set of recommending entities is used to develop the algorithm. This means a few of our user's friends will be selected based on how similar they are to the user on the above traits. These friends will be used to find new friends. But they may recommend hundreds of new friends, so a filtering mechanism can be used to find the top k friends our user might know.
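The base-set-then-filter procedure can be sketched in Python as follows. The similarity function is caller-supplied, and ranking candidates by how many base friends point at them is an assumed, common-friends style of filtering, not necessarily the plugin's final algorithm.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

Graph = Dict[str, Set[str]]  # user -> users they subscribe to


def recommend(
    user: str,
    graph: Graph,
    similarity: Callable[[str, str], float],
    base_size: int,
    k: int,
) -> List[str]:
    friends = graph.get(user, set())
    # 1. Base set: the friends most similar to the user (ties broken by name).
    base = sorted(friends, key=lambda f: (-similarity(user, f), f))[:base_size]
    # 2. Candidates: people the base friends follow who are not yet our friends.
    votes: Counter = Counter()
    for friend in base:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in friends:
                votes[candidate] += 1
    # 3. Filter: keep the top k by how many base friends point at them.
    return sorted(votes, key=lambda c: (-votes[c], c))[:k]
```

Keeping `base_size` small bounds how many subscription lists must be fetched, while `k` bounds what the user actually sees.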

Now that we have a federated social graph, each StatusNet instance on the web will create its own small graph of social interactions. If all the users on a particular domain stick to friends on their own domain, they can never expand their interactions to other OStatus accounts available on the internet, because they simply don't know those accounts exist. It's a cold start for our efforts. That's where Quick Connect comes to our help. Tapping into a user's contact list breaks boundaries and finds new friends right away!

Another issue is that of recommending spammer accounts. A Karma Plugin developed some time back is the best starting point. It uses a technique similar to PageRank to find important and valid users. This infrastructure can easily be used to keep spammers out of our recommended friends.
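To show why a PageRank-like score helps against spammers, here is plain power-iteration PageRank in Python, followed by a threshold filter. This is textbook PageRank swapped in for illustration; the Karma Plugin's actual computation may differ, and the threshold is a hypothetical parameter.

```python
from typing import Dict, Iterable, List, Set

Graph = Dict[str, Set[str]]  # user -> users they subscribe to


def pagerank(graph: Graph, damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Plain power-iteration PageRank over a subscription graph."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Users who subscribe to nobody spread their rank over everyone.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new = {u: (1.0 - damping + damping * dangling) / n for u in nodes}
        for u, targets in graph.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        rank = new
    return rank


def drop_low_karma(candidates: Iterable[str], rank: Dict[str, float], threshold: float) -> List[str]:
    return [c for c in candidates if rank.get(c, 0.0) >= threshold]
```

A spam account can subscribe to as many people as it likes, but if nobody subscribes back it keeps only the baseline score, so it falls below the threshold.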

A federated system will require caching to avoid re-parsing social graphs, which would considerably slow down the process and create unwanted replicated data. A caching technique is being designed to eliminate this problem.
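One simple shape such a cache could take is a time-to-live cache: a parsed graph is reused until it goes stale, and each key holds exactly one copy. This Python sketch is an assumption about the design, not the technique being built; the injectable clock exists only to make it testable.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Keep each parsed social graph for `ttl` seconds instead of re-parsing it."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh copy: no network request, no re-parse
        value = compute()
        self._store[key] = (now, value)  # one entry per key replaces stale data
        return value
```

The TTL trades freshness for load: a longer TTL means fewer remote parses but slower pickup of new subscriptions.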

This part of the project will develop a basic framework for Friend Recommendation Systems on Federated Networks. It's quite challenging and really enjoyable!!! I shall keep updating the ideas on this Wiki. All suggestions are welcome.

Wednesday, July 14, 2010

Half A World Away

A few days after the release of StatusNet 0.9.3, I have reached the halfway mark of my Google Summer Of Code project 'Finding People You Already Know'. This is an update on what's done and what's coming up soon!

The repository at Gitorious now contains a working implementation of the plugin with Quick Connect ready. To recap, Quick Connect is a mechanism to find out which of your contacts have an OStatus account. Currently we tap into Google address books and can search through your Twitter friends. Some documentation is also available on the Wiki.

The interface now allows you to log in with your Google and Twitter credentials, and syncs your address book. Remember that the wonderful OAuth authentication mechanism ensures that we don't store your passwords! Once authorized, you can browse through your address book. If any contact has an OStatus account, you can subscribe to him right away! The plugin syncs your contacts in the background, so you can skip going through all of them. They will all soon appear in a contact management interface, from where you can administer them using the options provided.

Soon the Open Social Graph API gets into action and finds all your friends' OStatus accounts. A recommendation list in the form of 'New Friends' appears, listing all contacts who have subscribable accounts. You suddenly have lots of friends you can quickly subscribe to and stay updated with!!

Coming soon is further use of the Open Social Graph API for recommending new and interesting friends by analysing your social graph deeply. I am at Half A World Away, and loving it!

Friday, May 21, 2010

Finding People You Already Know!

So this summer I shall be working with Brion Vibber on a project titled 'Finding People You Already Know'. The project is primarily meant to let new and existing users find their friends' OStatus accounts and subscribe to them. Additionally, it will recommend potential friends to them by analysing their social interactions.

The federated nature of the StatusNet social network makes it hard to find friends on other domains unless you explicitly know about them. A two-pronged approach will be used to tackle this hurdle.

The first approach, known as 'Quick Connect', will allow users to tap into their contact lists on Google and Twitter and check whether these friends are present on the same domain as the user. Further, Google's Social Graph API will be used to find his friends' other public OStatus profiles registered on other domains.

The second approach, called 'Delayed Connect', will be a framework that generates a user's social graph. We would use StatusNet's own API to retrieve friends lists on the same and external domains. The plan is to analyse this graph on parameters like subscriptions, subscribers and even location or profile information to recommend new friends to the user. This service is known as 'Delayed Connect' because the plan is to analyse the user's social graph incrementally. The user need not be recommended new friends every day; recommendations can be delayed so that the user gets enough time to understand and mingle with his existing network of friends.

There is a lot of research going on to study and understand Social Networks.

Thanks to the open source nature of StatusNet and the wonderful StatusNet team, I hope to see some theory in action! This is going to be a really cool summer!

Thursday, May 20, 2010

SN Framework..

Thanks to Brion, Shashi g0 and the SN team, I managed to thoroughly understand a wonderful framework built in PHP. Almost every time you look into an open source project, you learn immensely from it. People with lots of ideas have put their brains together to build a successful and stable product. Sometimes I still wonder how people from different parts of the world reach a consensus on open source projects.

The SN codebase is different from other SocNet codebases I had come across. It faintly reminded me of Django. MVC designs are popular and successful. Application frameworks like Django, Zend etc. give ease of design, better DB interfaces, and stable and quick deployment. For a complex and scalable application like a social network, SN's codebase is pretty much perfect.

As I see it, it basically has three components. The first is Memcached_DataObject, a database wrapper that allows object-oriented database interaction. The second is the Action class, which renders the HTML views. The third is a host of library functions that form the bulk of SN's codebase.

Coming from Django's templates, SN's HTML rendering functions seem very oddly placed. The XMLOutputter is a wrapper around PHP's XMLWriter. It gives an interface to generate HTML using predefined functions. Its element() function generates a self-contained element like P or BR in a single call, while elementStart() and elementEnd() enclose elements that are not that simple. This class is inherited by HTMLOutputter, which generates the HTML headers and provides more HTML elements like input, dropdown, checkbox, stylesheet links and JS scripts. Security was probably one reason for such an interface. During some free time, I will try to inject the code with malicious stuff; I hope I fail :D.
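To see why this style helps security, here is a Python analogue of the element()/elementStart()/elementEnd() pattern. The real XMLOutputter is PHP built on XMLWriter; the `Outputter` class and its method names are hypothetical. The point it illustrates is that every piece of text and every attribute value goes through one escaping path. Tag names are not escaped here, on the assumption that only developers supply them.

```python
from html import escape
from typing import Dict, List, Optional


class Outputter:
    """Python analogue of an element()/elementStart()/elementEnd() writer."""

    def __init__(self) -> None:
        self.parts: List[str] = []

    @staticmethod
    def _attrs(attrs: Optional[Dict[str, str]]) -> str:
        return "".join(f' {k}="{escape(str(v), quote=True)}"' for k, v in (attrs or {}).items())

    def element(self, tag: str, attrs: Optional[Dict[str, str]] = None, content: Optional[str] = None) -> None:
        """Write a complete element; user text is always escaped."""
        if content is None:
            self.parts.append(f"<{tag}{self._attrs(attrs)}/>")
        else:
            self.parts.append(f"<{tag}{self._attrs(attrs)}>{escape(content)}</{tag}>")

    def element_start(self, tag: str, attrs: Optional[Dict[str, str]] = None) -> None:
        self.parts.append(f"<{tag}{self._attrs(attrs)}>")

    def element_end(self, tag: str) -> None:
        self.parts.append(f"</{tag}>")

    def output(self) -> str:
        return "".join(self.parts)
```

A notice containing `<script>` comes out as inert text, which is the kind of injection attempt I hope to fail at.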

Then comes the Router. This is similar to the URL configuration settings in Django. The router maps a user-defined URL to an action. So at runtime we can define a /jokes/complex URL corresponding to an action that pulls the jokes out of a table and renders them. I loved this feature.
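A small Python sketch of URL-to-action routing, using the /jokes/complex example from above. The `:name` placeholder syntax and the `connect`/`dispatch` names are assumptions for illustration; this is not SN's Router API.

```python
import re
from typing import Any, Callable, List, Pattern, Tuple


class Router:
    """Map URL patterns such as /jokes/:kind to action callables (sketch)."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Pattern[str], Callable[..., Any]]] = []

    def connect(self, pattern: str, action: Callable[..., Any]) -> None:
        # Turn ":name" segments into named groups matching one path segment.
        regex = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern)
        self._routes.append((re.compile(f"^{regex}$"), action))

    def dispatch(self, path: str) -> Any:
        for regex, action in self._routes:
            m = regex.match(path)
            if m:
                return action(**m.groupdict())
        raise LookupError(f"no route for {path}")


router = Router()
router.connect("/jokes/:kind", lambda kind: f"jokes of kind {kind}")
```

Routes are tried in the order they were connected, so more specific patterns should be connected first.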

Everyone has a different strategy for letting plugins interact with a framework. SN places hooks at various locations, which a plugin can handle to inject code into SN. Events such as the end of registration, starting to enqueue a notice, the start or end of a password check, and showing navigation menus can be handled by each plugin to inject extra code.
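The hook mechanism can be pictured with this Python sketch. My understanding is that in SN a handler returning false stops further processing; the sketch mirrors that convention, but the `Event` class here and the event names used in its usage are illustrative, not SN's exact PHP API.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class Event:
    """Named hook points that plugins can attach handlers to (sketch)."""

    _handlers: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)

    @classmethod
    def add_handler(cls, name: str, handler: Callable[..., Any]) -> None:
        cls._handlers[name].append(handler)

    @classmethod
    def handle(cls, name: str, *args: Any) -> bool:
        """Run handlers in order; a handler returning False stops the chain."""
        for handler in cls._handlers[name]:
            if handler(*args) is False:
                return False
        return True
```

Core code would then wrap default behaviour as `if Event.handle("StartCheckPassword", user, password): ...`, letting a plugin replace the default by returning `False`.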

Daemons for synchronizing different user actions, publishing user notices, and for features such as syncing a user's Twitter and Facebook posts are a simple feature of SN, but an extremely powerful one. My task will primarily involve using daemons to generate and process lots of user data!
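The daemon pattern boils down to a background worker draining a queue. This Python sketch uses the standard `queue` and `threading` modules, with a `None` sentinel to stop the worker; `run_daemon` is a hypothetical name, and SN's own PHP queue daemons are not modelled beyond this general shape.

```python
import queue
import threading
from typing import Any, Callable


def run_daemon(jobs: "queue.Queue[Any]", handle: Callable[[Any], None]) -> threading.Thread:
    """Drain a job queue in a background thread until a None sentinel arrives."""

    def worker() -> None:
        while True:
            job = jobs.get()
            try:
                if job is None:  # sentinel: shut down cleanly
                    return
                handle(job)
            finally:
                jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The web request only enqueues a job (publish a notice, sync contacts) and returns immediately; the slow work happens in the worker.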

And I am just waiting to get hold of this kind of data, to generate users' social network graphs and analyze them! Hopefully SN will generate the next largest graph database.

Saturday, May 1, 2010


It's True! A blue font on a black background flashed across Apple screens when the Mac finally ran on Intel processor cores. Probably something that you could believe. But a GSoC'10 acceptance is something that you can't!

Yes, It's True. I have been accepted into Google Summer Of Code 2010 for the organization StatusNet. The project is titled 'Finding people you already know'. That was confirmed by a midnight email from Google with the subject Congratulations! Thanks to my Gmail filters, the mail never appeared in my inbox, but one of those code-google labels held it safely! I Am In! Under the mentorship of Mr. Brion Vibber.

StatusNet was a unique organization. I was searching for SocNet-related projects and came across Geeklog, Facebook and SN. Geeklog was a blogging site, and they wanted a feature to allow permissions and friend networks to be established across blogs. This had interested me a lot, but somehow the communication died down. Facebook was impressive too. HipHop was suggested to me by 'equador silvax', and it was something interesting to contribute to. But then I came across SN.

After I installed it (which was so simple compared to Thinktank, which had even suggested a project to improve its installation), boom!! there was a Twitter-like platform ready for use. Now, not everyone can run their own !twitter service, so this was a fabulous solution. I installed it and ran through its code. It looked complicated and the documentation was sparse. The version was only 0.9.2, but the codebase was pretty large. After I tried a few applications and ran through the code, I was confident of taking up the projects offered by the org.

SN projects were equally interesting, and I had specifically liked 'Social music server'. But since I didn't have much interest in music itself :), I turned to one that reduced, partially if not completely, to a graph problem. The discussions were really interesting, and one in particular with Evan made me realize some fundamental software design principles.

Brion was stuck in Europe during the application period, as volcanic ash disrupted air travel back to San Francisco. 'We have not forgotten you' was his notice to !statusnetsoc, which was really inspiring in the middle of the application period. He is great to talk to and extremely experienced. He has worked on Wikipedia and MediaWiki, and is now going to work on SN. I know I will get to learn a lot from him.

Anyway, now that I am in, I shall shift some concentration back to studies and complete my pending projects. The ANLP demo and presentation are due tomorrow, and the PA exam will haunt me soon!

Will post about the project details soon!

Let the Google Summer of Code Begin!

Sunday, March 28, 2010

Google Summer Of Code 2010 Is here again!

This is the third year I plan to apply to Google Summer Of Code. GSoC is an experience. In 2008, I applied to Drupal. My application proposal was very strong and new, but I failed in the end since I hadn't responded to their queries. It was a hard year academically and I had touched my lowest SGPA :) . Then I applied to Pidgin with a strange idea, and was astonished to see that someone was already working on it.

In 2009 I tried yet again. This time I tried to align closely with my technical interests and applied to ns-3. ns-3 had the friendliest community, and it was really kind of them to explain the problems to me and help me develop the idea. I gained a lot of insight into their simulator and the general idea of distributed computing. It is a quality effort, and I hope that ns-3 soon becomes equipped with all the functionality available in other simulators and beats them to be the best! Sometime later I hope to be able to contribute to their codebase.

This year I plan to apply to core physics-oriented organizations, or ones like Gephi for graph visualization, and to social networking projects like Geeklog, Facebook and StatusNet. It's surprising to see that almost every organization has a SocNet project.

The best thing about GSoC is the time before the application period opens. I google for 2010 project ideas and try to find out what kind of projects are being offered! This gives me time to go through the organizations and their products, and catches me up on the technologies that emerged over the year. There is a mad, crazy and intelligent set of programmers trying to beat the technology of the past and deliver a better online user experience. Let's hope for the best for GSoC'10.