Tuesday, 23 September 2008

Search patterns in a corporate social bookmarking service

This brilliant and useful paper, found in Riina's blog, explores the search patterns in dogear, the corporate social bookmarking service of a multinational company. The service, available only to the company's employees, allows them to bookmark internet and intranet links. Another particular feature of this system is that the employees' corporate identity cannot be hidden behind a pseudonym.

The research combines quantitative (log analysis of users' actions, followed by cluster analysis of these data) and qualitative (15 interviews with users) methods in order to explore search patterns within dogear. Three big groups are identified: Community browsing, Personal search and Explicit search.

Millen, D., Yang, M., Whittaker, S., Feinberg, J., 2007. Social bookmarking and exploratory search
URL http://dx.doi.org/10.1007/978-1-84800-031-5_2


The table (see above) shows the outcomes of the log analysis of users' actions, where events are accesses to a particular view (recent posts, another user's bookmarks, one's own bookmarks...) and clicks are clicks on one of the results in these views.
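
The events-and-clicks breakdown can be sketched in a few lines of Python. This is only an illustration: the log format, the view names and the records below are invented for the example and are not the paper's data.

```python
from collections import Counter

# Hypothetical log records: (view, action). "event" means a view was opened,
# "click" means a result inside that view was followed.
log = [
    ("recent", "event"), ("recent", "event"), ("recent", "click"),
    ("person", "event"), ("person", "click"),
    ("search", "event"), ("search", "event"), ("search", "click"),
    ("mine", "event"),
]

def click_ratio_by_view(records):
    """Return {view: clicks / events} for every view with at least one event."""
    events = Counter(view for view, action in records if action == "event")
    clicks = Counter(view for view, action in records if action == "click")
    return {view: clicks[view] / n for view, n in events.items()}

ratios = click_ratio_by_view(log)
for view, ratio in sorted(ratios.items()):
    print(f"{view}: {ratio:.0%}")
```

A high ratio for a view means that users who open it usually follow one of its results, which is the kind of figure quoted for explicit search.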

Community search is the most common search pattern. Explicit search and personal search (one's own tags) come in second and third place. The most common pattern is to look at the recent posts, but other strategies are also used, like looking at what "thought leaders" are bookmarking or what is hot in a particular topic. The interviews highlighted that these strategies rely on the trust that employees have in the community or in particular users. The fact that corporate identity (full name, contact details...) is displayed plays an important role here, and also in another search pattern: looking for experts in the company.

Personal search refers to users who come back to their own lists of bookmarks and tags. Users who carry out personal searches are frequently those who are active bookmarkers.

While community search is typical of exploratory search (where the goal is not so much to retrieve an item as to profile a user, a topic, news...), explicit search is more oriented towards retrieving an item. This explains the high percentage of clicks (39%) in these views. Again, trust in the community plays an important role: "...relatively high proportion of clicks... combined with interviews comments suggest that social bookmarking services provide a good way to capture high value pointers to information sources".

Sunday, 21 September 2008

Network effects, some links

Dion Hinchliffe reviewed the topic some time ago and explained that a project that wants to exploit the potential of network effects needs to foster them: "If you have a million people visiting your Web site but you're not leveraging network effects with them (such as by letting them contribute and letting others see and respond to those contributions), then you're probably squandering the greater part of the value of that million person audience."

There are also some links to the theories behind them: Metcalfe, Reed, and Bob Briscoe, Andrew Odlyzko, and Benjamin Tilly.
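
As a rough reminder of what these theories claim, here is a small Python sketch of the three growth laws as they are usually stated, ignoring constant factors. The function names are chosen for this example only.

```python
import math

def metcalfe(n):
    """Metcalfe's law: value grows with the number of possible pairs, ~ n^2."""
    return n * n

def reed(n):
    """Reed's law: value grows with the number of possible subgroups, ~ 2^n."""
    return 2 ** n

def briscoe_odlyzko_tilly(n):
    """Briscoe, Odlyzko and Tilly's proposal: value grows as n * log(n)."""
    return n * math.log(n)

# Compare how quickly each estimate grows with the number of users.
for n in (10, 20, 40):
    print(n, metcalfe(n), reed(n), round(briscoe_odlyzko_tilly(n), 1))
```

The three laws disagree widely even for small networks, which is why the choice between them matters when valuing an audience.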



Monday, 15 September 2008

Evolution of the percentage of edits made by active users in Wikipedia

Power of the few vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie uses a history dump of the English Wikipedia (58 million edits, 4.7 million pages). To process these data the authors used a powerful environment based on Hadoop. They calculated the number of edits made and the changes in content between edits (counting words, not lines). Users were also classified by number of edits, and the edits were grouped by type of user (from very active to sporadic).
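
The grouping of edits by type of user can be sketched as follows. This is plain Python on a toy list, not the Hadoop pipeline used in the paper; the thresholds, class labels and data are assumptions made for the example, and the word-level change measure is only a crude stand-in for whatever the authors actually computed.

```python
from collections import Counter

# Hypothetical revision log: one user name per edit.
edits = ["ana", "ana", "ana", "ana", "ana", "ben", "ben", "cho"]

# Hypothetical activity thresholds (minimum edits, class label), highest first.
CLASSES = [(5, "very active"), (2, "active"), (1, "sporadic")]

def classify(edit_count):
    """Map a user's edit count to an activity class."""
    for minimum, label in CLASSES:
        if edit_count >= minimum:
            return label
    raise ValueError("edit_count must be at least 1")

def edit_share_by_class(user_per_edit):
    """Return the fraction of all edits made by each class of user."""
    per_user = Counter(user_per_edit)
    per_class = Counter()
    for user, count in per_user.items():
        per_class[classify(count)] += count
    total = sum(per_class.values())
    return {label: count / total for label, count in per_class.items()}

def words_changed(old_text, new_text):
    """Count words present in one revision but not the other (multiset difference)."""
    old, new = Counter(old_text.split()), Counter(new_text.split())
    return sum(((old - new) + (new - old)).values())

shares = edit_share_by_class(edits)
print(shares)
print(words_changed("a b c", "a b d"))
```

Computing the same shares for successive time windows would give the evolution that the post title refers to.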

Comparing Wikipedia coverage by domain areas

An Analysis of Topical Coverage of Wikipedia uses a random sample of 3000 articles. These articles were classified by subject. The length of each article was measured by the size of its HTML page in kilobytes. The number of edits for every page was also recorded.
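
The sampling and size measurement are simple enough to sketch in Python. The helper names, the fixed seed and the use of UTF-8 byte length are assumptions for this example; the paper does not say how the sample was drawn or how the page size was obtained.

```python
import random

def sample_titles(titles, k, seed=0):
    """Draw k distinct titles at random; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(titles, k)

def size_kb(html):
    """Size of an HTML page in kilobytes, measured on its UTF-8 encoding."""
    return len(html.encode("utf-8")) / 1024

# Toy stand-ins for a list of article titles and a rendered page.
titles = [f"Article {i}" for i in range(100)]
chosen = sample_titles(titles, 10)
print(chosen)
print(size_kb("<p>" + "x" * 2041 + "</p>"))
```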

Measuring Wikipedia

Measuring Wikipedia uses dumps of the German, Japanese, Danish and Croatian Wikipedias. While the German and Japanese editions are large Wikipedias (second and third in size), the other two are examples of a medium-sized and a small one. These data made it possible to study the growth of Wikipedia (database size, number of words, internal links, users and active users...), the ratio of articles to related talk pages, the distribution of article sizes, the ratios of articles to authors and of articles to edits...

Specific patterns in web-based bibliographic annotations

Tagging tagging. Analysing user keywords in scientific bibliography management systems uses a dump containing all posts uploaded to Connotea (resource name, URL, tags). This dump was retrieved through the application's open API in order to explore tagging patterns.

Data processing:
A team of researchers examined these data and suggested linguistic and functional categories of tags. These suggestions were discussed in a workshop and integrated into a preliminary category model. In a second phase, this category model was verified.