I decided to play around a bit with lalita. Lalita is an irc bot developed by some members of the PyAr - Python Argentina community. What I wanted to achieve is to have some way to be notified when there is something interesting going on in the irc channel, without having to read every single message, or requiring others to actually say my name (so my client notifies me).
I thought about writing a bot (or in this case, extending one), to be able to detect "interesting" threads and notify me about them. The first approach involves detecting different "conversations". To keep it simple, I just decided to define a conversation as a sequence of messages within the same time frame (i.e. the difference between two consecutive messages is small). Thus, a conversation ends when there is a significant pause in the stream. This is a very simple approach, but it's enough for the time being.
Next, in order to determine if a conversation is interesting, I calculate the probability of an user taking part of a given conversation. This is simply the number of messages written by that user , over the total number of messages in the conversation. If the probability of a user taking part on a conversation assuming a second user has taken part of the conversation (conditional probability -- bayes theorem) is "high enough", then the conversation can be marked as "interesting" for the first user.
So basically, the idea is to keep running conditional probabilities for the active conversation in the channel, and if the conditional probabilities match, just notify the user that the current conversation might be of interest.
Future ideas include:
- fine tuning the "conversation" definition - scanning for keywords, to make interests also dependent on the actual content of the conversation - tracking interest relationships