Interesting Times

Wednesday, August 16, 2006

Social Networks and Space Planning - How Buildings Learn

Seth Earley pointed me towards a book by Stewart Brand, “How Buildings Learn”, and it’s fair to say that I’ve been consumed by it for the last few weeks. Members of my family now avoid making eye contact, with the whispered prayer: “Please God, don’t let him start up about buildings & time!”

The book is full of surprises, but perhaps its most intriguing central idea describes the interaction between buildings and their occupants through time. It seems too obvious to state that we change buildings, but in many and varied ways buildings also change how we go about our activities. Brand provides a model for the various layers of interactions and exchanges between buildings and their occupants. He makes the point that these layers can be divided into two broad categories, characterized by the timeframes in which they play out. On one hand there are aspects of a building, such as its site and foundation, that change only slowly or not at all; on the other hand there are the furnishings, draperies, and wall coverings that are expected to change relatively frequently. A critical insight: the slow-changing features of a design constrain the fast-changing elements.

Brand offers example after rich example of how this plays out in practice and opens a window onto some very juicy questions. Specifically, given that internal “spaces” will be modified to meet evolving needs, and given that through social network analysis we are beginning to understand in detail how people and organizations interact, what is the best space planning strategy for achieving better collaboration, organizational agility, creativity, productivity, etc.?

My initial searches for answers to these questions have come up short. There are quite a few resources that describe the advantages of modular space and furniture designs. However, the main measurable advantages invariably seem to relate to the lower (facilities) costs of reconfiguring work spaces as individual jobs and organizational responsibilities change. Exactly how work spaces should be designed to meet defined organizational challenges seems open to a good deal of conjecture and anecdotal commentary (e.g. communal spaces are valuable for collaboration, window views to the outside are good for morale... with little or no evidence).


It is interesting to make some guesses about the future of space planning in the context of mappable social networks. Functional responsibilities within organizations can be divided into two broad categories: PROCESS and CONTROL. The basic idea is that Process groups perform well-defined tasks within a specific area, while Control groups regulate activities and communication across multiple functions, with lots of opportunity for ambiguity and confusion!

Even though it may not be possible to predict exactly how social networks and space planning will relate in the future, it is possible to know what road signs to look for:

Link Processes and Control - that is, physically arrange organizations so that members of a Control group sit close to the Processes they are responsible for. There are many examples where this happens already: Finance and HR will often have personnel embedded in departmental offices. But how often do Marketing or Regulatory Compliance interface directly with Production or R&D?

Systematic Reviews – Organizations conduct periodic Communication reviews the same way we hold Design and Performance reviews today.

Connect Virtual and Real – Establish non-virtual connections between centralized and remote employees. Are there desks reserved for field employees so they aren’t just visitors at the corporate office?

Competitive Communication – Is there one person within a department that is responsible for communicating and updating the competitive status of the group? How are we doing against our group goals? Where are we not getting the tools and information we need?

Your turn…

Monday, June 26, 2006

Top Ten Social Networking Sites (April 2006)

Two interconnected views of the world -

By volume and year-over-year growth, here

By stock price/gains, here

Quepasa (QPSA) is a Latino-oriented version of MySpace, The Knot (KNOT) is a bridal/wedding planning site, and WebMD (WBMD) is, well... WebMD. Clearly a central message in these numbers is: if you want satisfied users, make the technology fit the need, not the other way round (as is all too often the case).

It's also worth noting that there are some 2 billion teenagers in the world - that's 500 times the number of teenagers at the peak of the baby boom! Moreover, most of these teenagers are in Asia, South America and, to some degree, Africa. If you are developing "English-only" sites or applications, you need to get your head examined.

Saturday, June 24, 2006

Sweet Holy Moses - Kawasaki interview with Gasperini

The Bay Area is generally so full of overeager VC posers and one-shot-wonder doofuses that you develop a type of blindness to it after a while. But since I'm very concerned that the "bad" money surrounding MySpace, Facebook et al. will drive out whatever crumbs of "good" money there are for sane SNA projects, I try to stay in touch with the latest opinion drivers. When I clicked through to this interview posted on Guy Kawasaki's blog my finely tuned BS-o-matic immediately redlined - although in all fairness it is well worth a look; read both the body of the interview and the posted comments. In a nutshell Gasperini gives an overview of her analysis of "youth culture"...

And since it is Friday night, make sure to click through to MySpace the Movie...mentioned in the article.

Friday, June 16, 2006

Scale-free Networks

When I recommended a few books related to social networks recently, Dan Keldsen of the Delphi Consulting Group suggested that I include two additional titles: “Linked” by Albert-Laszlo Barabasi, and “Six Degrees” by Duncan Watts. Barabasi maintains a website with candid photographs of many of the great researchers in this space. Dan also pointed me towards some podcasts that he’s made available on his blog, particularly a dialogue with Konstantin Guericke from LinkedIn, which is now three years old and probably one of the only pure-play social websites to actually turn a profit.

Although Barabasi and Watts are skilled numerical scientists, both texts are notably free of mathematics, probably because their publishers did not want to frighten away too many potential readers! The math is here.
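Because both books leave out the math, here is a minimal Python sketch of the growth rule that "Linked" describes in words. Each newcomer links to existing nodes with a probability proportional to how many links those nodes already have. This is preferential attachment, known as the Barabasi-Albert model. The function name, the fully connected starting core, and the parameter values are illustrative assumptions and do not come from either book.

```python
import random
import statistics
from collections import Counter

def preferential_attachment(n_nodes: int, m: int, seed: int = 0) -> list[tuple[int, int]]:
    """Grow a network in which every new node links to m distinct existing
    nodes, chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small, fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears here once per link it has, so a uniform pick from
    # this list is a degree-proportional pick of a node.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets: set[int] = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = preferential_attachment(2000, m=2, seed=42)
degree = Counter(v for edge in edges for v in edge)
print("links:", len(edges))
print("median degree:", statistics.median(degree.values()))
print("biggest hubs:", degree.most_common(3))
```

A typical run leaves most nodes with only two or three links, while a handful of early arrivals collect dozens. That long tail of hubs is what "scale-free" refers to.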

Wednesday, June 07, 2006

VC Social Networks II

These two images are examples of the types of maps available at LinkSV, as I described in an earlier post. I originally planned to attach them to that post, but for some reason the Blogger database seems to be going through some growing pains today and may be having a few problems!

The maps give an overview of Global Catalyst Partners -

LinkSV - very cool! Check it out.

Social Networks in Silicon Valley

In “The Silicon Valley Edge – A habitat for innovation and entrepreneurship” (Stanford University Press) there’s a great chapter – Social Networks in the Valley – by Castilla, Hwang, Granovetter and Granovetter. The regional economy of the peninsula between San Francisco and San Jose has proven to be a rich source of SNA investigation due in part to its spectacular growth and the dense interconnectedness of its business structures, but also because various writers and trade organizations have already attempted to track the genealogy of company founders and their boards. Of particular practical interest in these efforts is the role played by venture capitalists - widely viewed as the financial engine of the area.

One juicy quotation from the book: “Born in New York, nurtured in Boston, and almost smothered in Washington, venture capital did not really come of age until it moved to California and joined forces with the brash young technologists who were using bits of silicon to create an information revolution as profound as the industrial revolution a century earlier” (Wilson, 1985). Indeed, by most estimates half the venture capital firms in the United States are located here.

One might imagine therefore that an up-to-date Social Network Map of venture firms would be of particular value both in terms of understanding trends at a high level and to simply get plugged-in! With these ideas in mind I was astonished to find what must be one of the best kept secrets in this area – LinkSV.

LinkSV tracks companies, their founders, and their VCs, and displays the relationships between them as a series of navigable maps. The maps are created through a strategic relationship with GroupScope, and LinkSV can also connect to people via another key relationship with LinkedIn.

Friday, June 02, 2006

Trendwatch SNA "Profit potential laid bare in e-mail links"

Mark Newman from Morphix pointed me towards an article in The Times.

Leaving aside for a moment the peculiar tone of this article (mostly stick, few carrots), it does point out several advantages of email as a basis for SNA/ONA.

E-mail is objective. Using email as a data source provides a reliable and statistically significant estimate of the way information is flowing through an organization. Moreover it avoids the biggest problems associated with traditional interview methods where results tend to reflect a "relationship" network rather than a "communication" network. Although both are important it is ultimately collaboration, communication, and information re-use that creates value.
E-mail is topic specific. Email creates real opportunities to isolate communication networks based on specific topics - for example, by product or activity. How often do we communicate between departments when we introduce a new product? Who are the main contributors with respect to the creation of the annual report?
Results are almost instantaneous. Here the advantage of email-based SNA is as a diagnostic. Imagine your group has just merged with another organization. Take a snapshot of the network during the first few weeks of collaboration, make whatever organizational changes seem necessary, and re-examine the network in a few months' time. This way you'll be able to gauge how much things have improved and spot the areas that need further work.
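The "topic specific" and "diagnostic" points can be sketched in a few lines of Python. The log format used here, (sender, recipients, topic), is a hypothetical one, and so are the five-person team and its messages. Real data would come from message headers plus some kind of topic-tagging step. Filtering by topic covers the first point. Computing density for the same team at two points in time covers the before-and-after check.

```python
from typing import Iterable, Optional

# Hypothetical log entry: (sender, recipients, topic).
Email = tuple[str, tuple[str, ...], str]

def topic_ties(log: Iterable[Email], topic: Optional[str] = None) -> set[frozenset[str]]:
    """Undirected sender-recipient ties, optionally limited to one topic."""
    ties: set[frozenset[str]] = set()
    for sender, recipients, t in log:
        if topic is not None and t != topic:
            continue
        for r in recipients:
            if r != sender:
                ties.add(frozenset((sender, r)))
    return ties

def team_density(ties: set[frozenset[str]], team: set[str]) -> float:
    """Share of all possible pairs within `team` that exchanged mail."""
    n = len(team)
    possible = n * (n - 1) / 2
    inside = sum(1 for tie in ties if tie <= team)
    return inside / possible if possible else 0.0

team = {"ann", "bob", "cy", "dee", "eve"}
first_weeks = [
    ("ann", ("bob", "cy"), "launch"),
    ("bob", ("ann",), "launch"),
    ("dee", ("eve",), "budget"),
]
a_few_months_later = first_weeks + [
    ("cy", ("dee",), "launch"),
    ("eve", ("ann", "cy"), "launch"),
]
print(team_density(topic_ties(first_weeks, "launch"), team))         # 0.2
print(team_density(topic_ties(a_few_months_later, "launch"), team))  # 0.5
```

Filtering on "launch" leaves out the budget thread entirely. Over the two snapshots, the same five people go from 2 of 10 possible ties to 5 of 10.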

I expect to see a lot of email-based offerings enter this space in the near future.

Sunday, May 28, 2006

Social Networks and Memorial Day

If you have the day off work and you'd like to catch up on some reading, check out these materials very generously posted by the University of Melbourne (here).

Tuesday, May 16, 2006

NSA and Social Networks

I swore to myself I wouldn't say anything about this...

There's been a lot of flap about NSA and the phone companies recently (USA Today) and a couple of counterarguments have emerged. These are mainly along the lines that either the whole technique is just plain goofy and produces unreliable results or there are clearly much better approaches to uncovering clandestine networks (based on message contents). See Jonathan David Farley in NY Times, and Jeff Jonas here

Still, the initial revelations that some sort of program even existed got so many people upset that you'd have to wonder if there's a non-obvious nugget we are missing. The small number of folks I've met from NSA impressed me as being high on the sharpest-knife-in-the-drawer scale and not given to irrational analysis.

While links between phone numbers alone may not be of much use, there are other pieces of information within phone records that could be useful. At a minimum, phone logs have directionality built into them; there's also the duration of each call and the timing of related calls. For example, if a call from A to B habitually causes B to immediately make a series of short calls to C/C'/C", that might be a pretty interesting pattern to be aware of (I'm suggesting that B is acting as a "cutout"). After looking at my own phone bill for a few hours I cannot see anything of significance in this context: folks call me about as often as I call them, and the calls are of widely different durations.
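A toy version of the A -> B -> C/C'/C" check might look like the Python sketch below. Everything in it is an assumption made for illustration: the record layout, the 10-minute window, the 2-minute definition of a "short" call, the minimum of two onward numbers, and the invented call log.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Call:
    caller: str
    callee: str
    start: float     # minutes from the start of the log
    duration: float  # minutes

def cutout_candidates(calls, window=10.0, short=2.0, min_fanout=2):
    """Count, per (A, B) pair, how often a call A->B is followed, within
    `window` minutes of hanging up, by short calls from B to at least
    `min_fanout` different numbers other than A."""
    hits = Counter()
    for c in calls:
        end = c.start + c.duration
        onward = {
            d.callee
            for d in calls
            if d.caller == c.callee
            and d.callee != c.caller
            and end <= d.start <= end + window
            and d.duration <= short
        }
        if len(onward) >= min_fanout:
            hits[(c.caller, c.callee)] += 1
    return hits

log = [
    Call("A", "B", 0, 5),
    Call("B", "C", 6, 1),
    Call("B", "C'", 7, 1),
    Call("B", 'C"', 8, 0.5),
    Call("A", "B", 100, 4),
    Call("B", "C", 105, 1),
    Call("B", "C'", 106, 1),
    Call("me", "mom", 50, 30),
    Call("mom", "me", 200, 25),
]
print(cutout_candidates(log))  # Counter({('A', 'B'): 2})
```

Counting repeats per (A, B) pair is what separates a habit from a coincidence. In this log, A -> B triggers the fan-out twice, while the long back-and-forth calls with "mom" never register.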

All I'm getting at is that with sufficient processing power it should be possible to tease quite useful intelligence out of a seemingly very large and random haystack.

Added note - The New Yorker 05.22 here

05.28 Freedom of Information hah! here

06.02 Federal judge allows lawsuit against NSA here

06.06 Can Data Mining catch Terrorists? here

06.09 Pentagon sets its sites on social networking websites here

Friday, May 12, 2006

The Value of Enterprise Social Networks

How do we calculate the value of a social network? How will we know if we are increasing (or decreasing!) the value of a network? Answers to these questions are important because they ultimately determine whether companies will be willing to invest in social network tools and applications. Here’s a condensed version of where we are today (references below). Fasten your seatbelts and put your tray tables in the upright position...!

There are four types of social network: loosely connected; tightly connected internally but without strong external ties; strongly connected externally but with loose internal ties; and finally a balanced network with strong internal and external ties. It is easy to think of these in terms of archetypes: the real estate office made up of independent agents, an Amish or Mennonite cooperative, a screen writers’ guild, and a contract manufacturer, respectively. Numerous studies conclude that groups with a well-developed internal structure as well as ties to other external communities will, on average, be more competitive, successful, and creative, earn higher incomes, and adapt better to unexpected challenges.
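One rough way to place a group in one of these four types is to measure two things: how densely its members are tied to each other, and how many ties they have to outsiders. The Python sketch below does that. The cutoffs (a density of 0.5 and one external tie per member) and the example group are arbitrary assumptions. They are not values from the studies cited.

```python
def classify_group(members, ties, internal_cutoff=0.5, external_cutoff=1.0):
    """Place a group in one of the four network types using internal density
    and external ties per member. Assumes each tie (a, b) is listed once."""
    members = set(members)
    n = len(members)
    internal = sum(1 for a, b in ties if a in members and b in members)
    external = sum(1 for a, b in ties if (a in members) != (b in members))
    density = internal / (n * (n - 1) / 2) if n > 1 else 0.0
    per_member = external / n if n else 0.0
    strong_inside = density >= internal_cutoff
    strong_outside = per_member >= external_cutoff
    return {
        (False, False): "loosely connected",
        (True, False): "tight inside, weak outside",
        (False, True): "strong outside, loose inside",
        (True, True): "balanced",
    }[(strong_inside, strong_outside)]

crew = ["ann", "bob", "cy", "dee"]
inside = [("ann", "bob"), ("ann", "cy"), ("ann", "dee"),
          ("bob", "cy"), ("bob", "dee"), ("cy", "dee")]
outside = [("ann", "supplier"), ("bob", "customer"),
           ("cy", "investor"), ("dee", "guild")]
print(classify_group(crew, inside))            # tight inside, weak outside
print(classify_group(crew, outside))           # strong outside, loose inside
print(classify_group(crew, inside + outside))  # balanced
print(classify_group(crew, []))                # loosely connected
```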

The primary purpose of an enterprise social network is to save its participants time. The advantages of networks are (at least) twofold: they let members of the group quickly establish internal standards of compliance and trust, and they let members rapidly connect with external sources of critical information whenever needed. Indeed, “There is an impressive diversity of empirical evidence showing that social capital is more a function of brokerage across structural holes (connections to other networks) than closure (cooperation) within a network.”

One example of this that very much sticks in my mind was the development of the astronauts’ gloves for the Apollo program. On one hand you had a team of engineers that knew a lot about the specifications that would need to be met but without much real experience making gloves. On the other hand you had a team of women who had worked all their lives sewing baseball and ice-hockey mitts. Clearly a great deal of new value was created in the ties that linked these two very disparate networks.

Therefore, in terms of a business value calculation there are three parameters that need to be estimated: the inherent value of information resources within an organization (and how optimally this information is made available for use), the relative number of connections the organization has to external agents (customers, suppliers, professional organizations, investors, even competitors), and the nominal value of project time (how much would it be worth to shave a month from a new product introduction).

I have worked up some interactive models to capture and validate these parameters – contact me for more information.

The Network Structure of Social Capital, Ronald S. Burt, May 2000
So Many Ties, So Little Time, Hansen, Podolny, Pfeffer
Cool Vendors in Social Network Analysis, Gartner
Email as Spectroscopy, Tyler, Wilkinson, Huberman, HP Labs

Thursday, April 27, 2006

Social Networks - Books You Must Read

Pretty frequently I get asked (usually by my wife): "where do you come up with this stuff... can't you watch Deal or No Deal like a normal person?"

While there's a lot of great material on Social Networks and Social Network Analysis available, a few books seem to be referenced by others over and over again. Three that I have found pretty insightful are shown: Social Network Analysis is a great nuts-and-bolts introduction to SNA graphs, how they are constructed and what they mean. The Wisdom of Crowds is an excellent piece of writing (I wish I could write like that!) that paints a picture of how people behave in groups. Finally there's Rob Cross's The Hidden Power of Social Networks, which has recently done more than any other to bring SNA into public consciousness.

For dessert there's Nexus, another well-written page-turner with a great introduction to the "small world" effect.

I'll appreciate any and all feedback on this selection.

Social Networks Analysis in the Enterprise

The weekend, so some longish blather... If you'd like to hear more screaming than the night the orphanage burned down, start with Nick Carr's most recent post.

To be honest I'm not sure what all the screaming is about, although I appreciate Carr's tough questions! A body of information (e.g. Wikipedia) is not homogeneous: a large proportion of the contributions will be commonplace. Similarly, so-called numbskulls (Carr's choice of language) can contribute much of the mundane, and even some of the arcane. At the edges, without question, there is a need for highly qualified experts to define the most complex subjects. Basically the world looks like this (diag.):

At last count Wikipedia had 50,000 contributors responsible for 2.9 million entries, 890,000 of them in English; however, only 2,081 had contributed 100+ articles. Some are reading this as bad news if the Wikipedia model is what corporations should expect. But, guess what, this is not much different from what you might expect. Let's say the average "habitual" contributor is responsible for about 100 articles, a total of 208,100 articles. That is approximately 23% of the 890,000 total, in line with the above guess of about 25%. A small number of contributors are doing a lot of work, but the system as a whole apparently depends on the "numbskulls" for 70%+ of contributions.
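Here is the back-of-the-envelope arithmetic above, written out in Python so the assumption is easy to vary. The 100-articles-each figure is the post's own guess, not a measurement.

```python
habitual_contributors = 2_081
articles_each = 100        # the post's assumed average, not a measured figure
english_articles = 890_000

habitual_total = habitual_contributors * articles_each
share = habitual_total / english_articles
print(habitual_total)        # 208100
print(f"{share:.0%}")        # 23%
print(f"{1 - share:.0%}")    # 77%, left to everyone else
```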

It goes without saying that we all look forward to a time when online communities will be more like our networks in the real world. This vision may never be fully realized, but it seems clear that the next stage, the next proof point, in the development of social networks will be in the context of corporations: so-called Web 2.0 for the enterprise. This raises some tough questions: can Social Networks really produce tangible improvements in resource conservation, productivity, or competitive advantage? How will this work and, for a generation imbued with MySpace, Yelp, Digg, del.icio.us, Wikipedia etc., what will it look like?

Predictions -

1. Starting Now. We are currently navigating the “Trough of Disillusionment” of the hype cycle for social networks, corporate wikis and blogs, and we are beginning to see real corporate traction; the next 12 months will be both interesting and exciting. We all still have a lot of work to do, but there is light at the end of the tunnel! Corporations are already intrigued by the possibilities of Social Networks and related tools in the context of disruptive events: the introduction of new products, for example, or bringing new facilities online, mergers and acquisitions, and disaster preparedness. Using Social Networks for mentoring and expertise sharing around complex products is available in only a few places, although this seems to be an extraordinarily valuable application area.

2. Solutions not Components. Companies need solutions that integrate with their existing business processes and that carefully take into consideration any new risks they may create. New ways of capturing content will be valuable only insofar as they tangibly enable downstream processes. Almost certainly the Legal department will want to be involved: if blogs are to be shared with external readers there must be publication approval processes and clear rules of engagement; wikis must be free of personal references to other employees, etc. In short, building features that offer security and conform to well-established business standards makes it more probable that social networking tools will be accepted.

3. A little top-down will go a long way. For god’s sake never forego an opportunity to convince someone in the executive suite to participate in a wiki or start a blog! It’s true that a great many executives are unimaginative, clueless, or risk averse but this just means you have to step up your game. Today only about 4% of the Fortune 500 support a corporate blog but some of them are quite good (Ford Motors) and the field is growing. Social networks are inherently organic but, from my own direct experience, there is still a lot to be said for visionary leadership. My guess is that 10-15% of F500 with corporate blogs will be a critical turning point.

4. Education. Having great ideas that nobody knows about is not very useful. There is a huge open hole where educational books and papers on social networks in business should be. When we begin to see really useful texts in this area (written, for the love of all that’s holy, by non-academics), it will be time to fasten your spacesuit!

5. Show me the money. Business people are actually fairly easy to understand. Every strategic decision ultimately boils down to: will deploying this social networking tool cost me more (or less) than I can expect to get as a return; how risky is deploying this tool to my personal reputation and that of my company relative to the nominal reward? Social networking tools for the enterprise should focus on building communities within companies, making them more productive, agile, and competitive. When we can demonstrate how to get from here to there the warmth of good fortune will fill all our futures… more on this in a few.

Friday, April 21, 2006

Constraints on Group Size in Social Networks

How many people do you know? OK, not Brad and Angelina - but really KNOW - interact with directly and be influenced by? How many people can you know? After all we are all constrained by time - so there would seem to be limits to how many people we can interact with socially on a productive and regular basis.
Imagine living in a small village or being part of an autonomous tribe where each member has a role to play in achieving group objectives (the butcher, the baker, the candlestick maker; tinker, tailor, soldier, sailor). In this hypothetical village there are five sets of grandparents (10 people), each set has given rise to three married adults, and these in turn have three children of their own. This modest group comes to 85 individuals. This five-family village seems intuitively about as small as a community can get and still be viable (I am not offering any evidence for this just yet; I hope you'll agree that it is at least plausible). If we expand the model to a core group of nine sets of grandparents, with the same average number of married offspring and children, we get a village with about 153 members, and so on as shown.
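These head counts work out if each married adult is counted together with a spouse and each of those couples has three children. That gives 2 grandparents + 6 married adults + 9 children = 17 people per founding family. Here is a short Python check of that reading (the function and parameter names are illustrative):

```python
def village_size(core_couples: int, married_children: int = 3, kids_per_couple: int = 3) -> int:
    """Head count for the hypothetical kinship village described above."""
    grandparents = 2 * core_couples
    adult_couples = core_couples * married_children  # each married adult brings a spouse
    adults = 2 * adult_couples
    children = adult_couples * kids_per_couple
    return grandparents + adults + children

for families in (5, 7, 9, 11):
    print(families, village_size(families))
# 5 85
# 7 119
# 9 153
# 11 187
```

Each extra founding family adds 17 people, so the nine-family village lands at 153, close to the 150 threshold discussed in Dunbar's paper.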

The relationship between the number of families in the community (with kinship to the core grandparents) and the amount of time available for social interaction on a daily basis is as follows.

What matters here is the overall shape of this curve: as the community size increases, the time available for building bonds of cohesion, and for social grooming between community members, decreases dramatically. Up to five families there is so much time available that it may seem like the community is truly just "one big family". But at some point (around 9 families) there is a subtle transition where individuals will have to decide, consciously or unconsciously, to socially groom and bond with members of a sub-group (their kin) rather than the community at large.

Surprisingly (to me it was VERY surprising) this line of thinking is supported by a wide selection of examples from the real world. A lot of this has been captured in a beautiful paper by Robin Dunbar from University College London (google him).

Reference - Robin Dunbar "Neocortex size as a constraint on group size in primates"

One quote from this paper -
"...the reason given by the Hutterites for limiting their communities to 150 is particularly illuminating. They explicitly state that when the number of individuals is much larger than this, it becomes difficult to control their behavior by peer pressure alone. Rather than create a police force they prefer to split the community. Forge (1972) came to a similar conclusion on the basis of an analysis of settlement size and structure among contemporary New Guinea "neolithic" cultivators. He argued that the figure of 150 was a key threshold in community size in these societies. When communities exceed this size basic relationships of kinship and affinity were insufficient to maintain social cohesion; stability could then be maintained only if formal structures developed which defined specific roles within the group. In other words, large communities are invariably hierarchically structured in some way, whereas small communities are not."

Pretty clearly this presents opportunities for modern Social Networking tools to intelligently identify, strengthen and support working teams and communities of practice.

Saturday, April 15, 2006

Measuring Social Networks

How to calculate the number of connections in a Social Network.

A social network is presented graphically as nodes (points) and connecting lines. The nodes represent people and the lines represent the existence (or absence) of a relationship between them. Let's assume for a moment that the relationships are of equal value in both directions, that is, node A interacts with node B at the same level of intensity as B interacts with A (obviously this will not always be the case, but let's accept this simplification for now). In this case the total number of possible connections in a fully saturated network of n nodes is n(n - 1)/2: each node can interact with n - 1 others, and dividing by two avoids counting A-B and B-A separately.

For example: with five nodes there are 5x4 = 20 possible interactions, therefore 20/2 = 10 connections.

Rather obviously, and in agreement with common experience, the number of possible connections rises much more rapidly than the number of nodes. The larger the community you live in, the more difficult it becomes to objectively evaluate the relationships between its other members. The graph gives a sense of the scale of the problem: for a small company of 100 people there are almost 5,000 possible social connections. It seems clearly impossible for an individual employee to have any objective sense of how information is REALLY flowing through this social network (without some appropriate analysis tools).
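The counts in this post all follow from n(n - 1)/2. Here is a short Python check:

```python
def possible_connections(n: int) -> int:
    """Ties in a fully saturated, undirected network of n people."""
    # n * (n - 1) counts every ordered pair (A-B and B-A); halve it.
    return n * (n - 1) // 2

for n in (5, 13, 100, 1000):
    print(n, possible_connections(n))
# 5 10
# 13 78
# 100 4950
# 1000 499500
```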

How to calculate Network Density.

An important metric for social networks in the real world is their density - how well connected are the nodes in a real network relative to the theoretical number of connections possible. This measurement is intended to give a sense of how well communication pathways in the network are capable of getting information out to the network's participants. The calculation is straightforward - known connections divided by maximum possible connections. (an ideal, fully connected network would have a density of 1.00)

For example: the network in the graphic has 13 nodes and 17 known social connections. But with 13 nodes there are 78 possible connections. This means the density of the network is 17/78 = 0.22 (in a real-world network this calculation is usually completed after discounting any unconnected nodes).
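Here is the same calculation in Python, including the usual step of discounting unconnected nodes. The tiny four-person group is made up for illustration, and the sketch assumes each tie is listed only once.

```python
def density(known_ties: int, n_nodes: int) -> float:
    """Known connections divided by the maximum possible, n(n - 1)/2."""
    possible = n_nodes * (n_nodes - 1) / 2
    return known_ties / possible if possible else 0.0

def density_of_graph(nodes, ties, drop_isolates=True):
    """Density of an undirected graph given as a node list and (a, b) pairs;
    by default, nodes with no ties at all are discounted first."""
    if drop_isolates:
        linked = {v for tie in ties for v in tie}
        nodes = [v for v in nodes if v in linked]
    return density(len(ties), len(nodes))

print(round(density(17, 13), 2))  # 0.22, the example above

group = ["a", "b", "c", "d"]       # "d" has no ties
ties = [("a", "b"), ("b", "c")]
print(round(density_of_graph(group, ties), 2))                       # 0.67
print(round(density_of_graph(group, ties, drop_isolates=False), 2))  # 0.33
```

As the last two lines show, discounting the single isolated node doubles the reported density of this group. That is one reason density should be read as a rough guide rather than a hard measure.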

Common experience confirms that the density of networks goes down as the size of the group increases. Research suggests that above a few hundred nodes the maximum density will never be greater than 0.5. Network density is intended only as a rough guide to connectedness; it is not a hard-and-fast indicator of group performance and needs to be interpreted intelligently.

However, these initially simple ideas begin to open doorways to interesting questions - what inferences can be drawn from the structural shape of social networks and what would be an optimal size for a functional team/community?

My next post will discuss metrics associated with optimal community sizes.

Saturday, March 04, 2006

Connectedness in Social Networks

Over the past few years there have been signs of a widely agreed model for how information can best be connected, managed, and displayed. The umbrella name for this effort is the Semantic Web. There's an original Roadmap document by Tim Berners-Lee here, and a more recent Introduction here.
After many years in the wilderness there are now a few commercial products that show the power of this approach. I am shamelessly prepared to state that the company I am currently working for, Entopia, offers best-of-breed solutions in this space. It is very gratifying to hear prospective customers ask us if our demonstrations are "real" - because the results truly seem like magic! (see "Any sufficiently advanced technology is indistinguishable from magic")

Which reminds me - now that I've seen the semantic web I really hate keywords... More on this below.

Friday, March 03, 2006

Love and Hate on the Web

You may have wondered which is stronger, "love" or "hate". At least in terms of search engines the jury is very much IN and the results are clear: L-O-V-E triumphs by a significant margin. If you'd like to try this for yourself you can save some time (perhaps as much as 0.5 seconds) by searching both Google and Yahoo simultaneously using the new Frankenstein site GahooYoogle.

The bottom line here (please God let there be some point to this rant) is that "keywords" can be pretty darned misleading when used to navigate a large document set. Let's be honest: a BILLION results! Really, what's the point? Even if you were so inclined you'd never get through them in a lifetime.

Even more upsetting though is that within the top-10 results for "hate" we find "Stop the Hate" which, under the premise of our test, should surely be in the "love" column. There's got to be a better way - and there is! It's called the semantic web.

Don't settle for half measures - let's make these great information resources add up to something intelligent other than just a massive stack of keyword results.

Sunday, February 19, 2006

KM (Knowledge Management) KO'd

Or maybe this should be called whistling past the graveyard - but first you need to read this

The basic question is: if KM is such a boon to humanity why, after decades of blabbering and flailing about, hasn't it made more real progress? It's a fantastic question - right up there with "is the emperor wearing any clothes?" There will probably be more reactions over the coming days but if Denham Grey's lame response is typical I think we can call this war well and truly over.

The truth is this: tools, applications, and disciplines are adopted primarily on their ability to create value or produce otherwise tangible results. KM as a discipline hasn't delivered. In many ways KM wrote its own downfall when it granted itself the power to solve ALL problems, and its practitioners became all but razzle-dazzle snake oil salesmen.

My most earnest suggestion would be to see KM change its name back to the more sensible "Information Management" and that it focus on extending the reach and effectiveness of proven commercial applications.

Social Networks: It's a small world (after all)

What is a social network, and how do social networks actually work? SNA (Social Network Analysis) is the art of analyzing and, most importantly, representing the flow of information through an organization. It is an astonishingly interesting field, mainly because the flow of documents and information through digital networks can be used, for the first time, to accurately describe the level of communication and collaboration between sub-groups and individuals within an organization.

The image above shows the collaboration network within an organization based on a single topic. Each node represents a person and the size of the node represents the relative strength of collaboration between this person and all others in the network. The lines show how strongly an individual collaborates with other individual players in the group. Nodes are colored based on the department in which they work. Imagine how powerful this type of information can be when an organization is going through a strategic change, introducing a new product or responding to a competitive threat.
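
One way to compute the quantities in a picture like this is sketched below, assuming the raw material is a log of who emailed whom on the topic. The names, departments and messages are hypothetical, and treating node size as the sum of a person's link weights (their "strength") is an assumption about how such a chart might be drawn, not a description of the tool that produced the image.

```python
from collections import defaultdict

# Hypothetical email log on a single topic: (sender, recipient) pairs.
EMAILS = [
    ("ana", "ben"), ("ben", "ana"), ("ana", "cho"),
    ("cho", "dev"), ("dev", "cho"), ("ana", "dev"),
    ("eve", "ana"), ("ben", "cho"),
]
# Hypothetical departments, used for node coloring.
DEPARTMENT = {"ana": "sales", "ben": "sales", "cho": "r&d", "dev": "r&d", "eve": "legal"}


def edge_weights(emails):
    """Count messages per pair of people, ignoring direction: the line thickness."""
    weights = defaultdict(int)
    for sender, recipient in emails:
        pair = tuple(sorted((sender, recipient)))
        weights[pair] += 1
    return dict(weights)


def node_strength(weights):
    """Sum each person's link weights: one possible reading of node size."""
    strength = defaultdict(int)
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    return dict(strength)


def cross_department_share(weights, department):
    """Fraction of all messages exchanged across a department boundary."""
    total = sum(weights.values())
    crossing = sum(w for (a, b), w in weights.items() if department[a] != department[b])
    return crossing / total if total else 0.0


weights = edge_weights(EMAILS)
strength = node_strength(weights)
```

Here `strength` comes out highest for "ana", the person a chart like this would draw largest; dividing each value by the maximum gives the relative size. `cross_department_share` gives a rough sense of how much of the conversation crosses departmental lines, which is what the node colors make visible.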

Perhaps the most unexpected finding from this kind of analysis is that some nodes in the network act as "short circuits" for the flow of information. These nodes are connected in ways that dramatically reduce the time it would otherwise take to spread a message throughout an organization by following the hierarchical channels of communication. They make it a small world!
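
The "short circuit" effect can be checked with a breadth-first search for the fewest hops between people. The sketch below assumes a hypothetical two-department org chart in which messages travel only along reporting lines, then adds a single informal tie between two staff members.

```python
from collections import deque

# Hypothetical org chart: a CEO, two managers, three staff each.
HIERARCHY = [
    ("ceo", "mgr_a"), ("ceo", "mgr_b"),
    ("mgr_a", "a1"), ("mgr_a", "a2"), ("mgr_a", "a3"),
    ("mgr_b", "b1"), ("mgr_b", "b2"), ("mgr_b", "b3"),
]


def build_graph(edges):
    """Undirected adjacency sets: a message can flow either way along a link."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hops_from(graph, start):
    """Breadth-first search: fewest hops from start to every reachable person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_hops(graph):
    """Mean shortest-path length over all ordered pairs of distinct people."""
    total, pairs = 0, 0
    for person in graph:
        for other, d in hops_from(graph, person).items():
            if other != person:
                total += d
                pairs += 1
    return total / pairs


formal = build_graph(HIERARCHY)
# One informal tie between staff in different departments: the "short circuit".
informal = build_graph(HIERARCHY + [("a1", "b1")])
```

Going through channels, a message from a1 to b1 takes four hops (up to the CEO and back down); with the informal tie it takes one. The average distance across the whole organization drops as well, even though only one link was added.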

Reference stuff -

http://www.orgnet.com/email.html

The Hidden Power of Social Networks, by Robert L. Cross (see also http://www.robcross.org/sna.htm)
Social Network Analysis: A Handbook, by John P. Scott
Email as Spectroscopy, a first-rate paper from HP

Friday, February 17, 2006

The Context of Information

I earlier described the differences between ontologies and taxonomies and their importance when it comes to organizing the things we know, or would like to know. Now comes an interesting question (the first of several) - once you have settled on an area of interest, sailing ships and sealing wax for example, how will you learn the connections between the documents and sources that you are likely to run across? I imagine most of us have experienced this type of problem at one time or another - let's say you'd like to know more about bronze statues. You mention this to a friend and almost before you stop speaking you've been handed a thundering fat textbook on "metallurgy through the ages". Like, oh joy! Actually, I was looking for something with a little more art.

On its face there is no good way to know whether this is a great first step or a rat trap of confusion. What you'd really like to know is how this particular text stacks up against similar texts in roughly the same area and, perhaps, how well the author is known and revered by others with interests similar to your own.

Fortunately, librarians have crafted a solution to this type of problem - it's called a citation index. It works quite well for scientific journals. Basically, someone goes to a LOT of trouble to methodically count how often an article in a scientific journal has been referenced by other authors. You can usually also find a raw list of publications under an author's name. Taken together, this information can be used to select what seem to be the most useful/insightful articles from a stack of similar-looking papers. In fact, these scores have often been pretty good predictors of who will win Nobel prizes.
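
The counting itself is simple once someone has done the hard work of recording who cites whom. A minimal sketch, using a hypothetical handful of papers and authors (real citation indexes are far larger and more carefully curated):

```python
from collections import Counter

# Hypothetical citation data: each paper maps to the papers it references.
CITES = {
    "P1": ["P2", "P3"],
    "P2": ["P3"],
    "P4": ["P3", "P2"],
    "P5": ["P3"],
}
# Hypothetical authorship of each paper.
AUTHOR = {"P1": "Smith", "P2": "Jones", "P3": "Lee", "P4": "Smith", "P5": "Kim"}


def citation_counts(cites):
    """How often each paper is referenced by the other papers."""
    counts = Counter()
    for cited_list in cites.values():
        for cited in cited_list:
            counts[cited] += 1
    return counts


def papers_by_author(author_of, name):
    """The raw list of publications under one author's name."""
    return sorted(paper for paper, author in author_of.items() if author == name)


def rank(candidates, counts):
    """Order a stack of similar-looking papers, most-cited first."""
    return sorted(candidates, key=lambda paper: counts[paper], reverse=True)


counts = citation_counts(CITES)
```

With this data P3 is cited four times and P2 twice, so `rank(["P1", "P2", "P3"], counts)` returns `["P3", "P2", "P1"]`, and `papers_by_author(AUTHOR, "Smith")` returns `["P1", "P4"]`.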

Unfortunately, an index of this type is usually not available when one is presented with documents from the non-scientific world - take websites, for example. Without any prior knowledge of the contents of a site, is there any way to get a grip on potentially useful relationships between the documents stored there? And can these relationships be presented on the fly? With the great power of computers and modern semantics, the answer is a resounding YES.

The graphic above shows the relationships between the key concepts related to "coffee" presented on the Kraft website. The engine has clearly identified a relationship between "coffee" and "roselius" - like, what-da-f is roselius? Dr. Roselius invented the modern decaffeinating process.
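
The post doesn't say how the engine finds these links. One simple technique that can surface a pairing like "coffee" and "roselius" is counting how often two terms appear in the same sentence. The sketch below is that swapped-in technique, not the engine behind the graphic; the page snippets and stopword list are made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical snippets standing in for pages crawled from a site.
PAGES = [
    "Roselius developed a process to decaffeinate coffee beans. Decaf coffee followed.",
    "Our coffee is roasted daily. Roselius and coffee history go together.",
    "Tea and coffee are popular drinks.",
]
# Hypothetical stopword list: common words ignored as concepts.
STOPWORDS = {"a", "and", "are", "is", "our", "the", "to"}


def sentences(text):
    """Split text on sentence-ending punctuation, dropping empty pieces."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def terms(sentence):
    """Lowercased words in a sentence, minus stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


def cooccurrence(pages):
    """Count how often each pair of terms appears in the same sentence."""
    pairs = Counter()
    for page in pages:
        for sentence in sentences(page):
            for a, b in combinations(sorted(terms(sentence)), 2):
                pairs[(a, b)] += 1
    return pairs


def related(pairs, concept, top=3):
    """Terms most often found alongside the given concept."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == concept:
            scores[b] += n
        elif b == concept:
            scores[a] += n
    return scores.most_common(top)
```

On these snippets `related(cooccurrence(PAGES), "coffee", 1)` returns `[("roselius", 2)]`. Raw counts like this tend to favor very common words, so a real system would likely weight them, for example by how surprising a pairing is given each word's overall frequency.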

Don't try this on Google, where keyword search can't tell the difference between "articles by George Bush" and "articles about George Bush" ... just because they are rich doesn't mean they aren't low class!
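
A simplified sketch of why a pure keyword approach struggles, assuming that short function words such as "by" and "about" are dropped before matching. The stopword list and the `relation` helper are hypothetical illustrations, not how Google actually works.

```python
import re

# Hypothetical stopword list; real search engines use their own rules.
STOPWORDS = {"a", "about", "an", "by", "of", "on", "the"}


def keywords(query):
    """Reduce a query to the set of words a keyword index would match on."""
    words = re.findall(r"[a-z]+", query.lower())
    return frozenset(w for w in words if w not in STOPWORDS)


def relation(query):
    """Hypothetical alternative: keep the relationship as part of the query."""
    match = re.fullmatch(r"articles (by|about) (.+)", query.lower())
    return (match.group(1), match.group(2)) if match else None


by_bush = "articles by George Bush"
about_bush = "articles about George Bush"
```

Both queries reduce to the same keywords, `{"articles", "george", "bush"}`, so a keyword engine has nothing to tell them apart. Keeping "by" versus "about" as structured information, as `relation` does, is the kind of distinction a semantic approach aims to preserve.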

Sunday, February 12, 2006

Getting Organized, Ontologies and Taxonomies

It's the blooming middle of February, and my New Year resolutions are already a fading memory! One was to get organized - not just neaten up the same old crap but methodically get organized (Travis Bickle "organizized") - no more loose ends; a place for everything and everything in its place!

'Course, right out of the gate I have a problem - two problems, actually, and their names are ontology and taxonomy! The ontology for organized living means defining the classes or categories into which your "stuff" is divided. If an object meets the definition of a category, it can be managed in the same way as the other members of that group. Take, for example, my socks. By definition, socks are those objects, in pairs, that fit on my feet, inside my shoes. From now on, whenever I find socks lying about, I know that they are managed by being placed in the sock drawer.

The next step is to think about how socks should best be arranged within their category - that is, within the sock drawer. This is the taxonomy for my socks. Perhaps I will divide them by color, or by age, or by the date I purchased them...it can all be so complicated!

One point worth remembering is that there is NOT necessarily a one-to-one relationship between an ontology and a taxonomy. In fact, there may be many taxonomies associated with each ontology.
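
The sock drawer translates directly into code. In this sketch (all items and attributes are made up), the ontology is a single membership test built from the definition above, and each taxonomy is just a different way of grouping the members of that one category.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    in_pairs: bool
    on_feet: bool
    inside_shoes: bool
    color: str
    year_bought: int


def is_sock(item):
    """Ontology: socks are items, in pairs, that fit on the feet, inside shoes."""
    return item.in_pairs and item.on_feet and item.inside_shoes


def taxonomy(items, key):
    """Taxonomy: one of many possible arrangements within a category."""
    drawer = defaultdict(list)
    for item in items:
        drawer[key(item)].append(item.name)
    return dict(drawer)


# Hypothetical household stuff:
# (name, in_pairs, on_feet, inside_shoes, color, year_bought)
STUFF = [
    Item("argyle", True, True, True, "blue", 2004),
    Item("gym", True, True, True, "white", 2005),
    Item("dress", True, True, True, "blue", 2005),
    Item("sneakers", True, True, False, "white", 2005),
    Item("mittens", True, False, False, "red", 2003),
]

socks = [item for item in STUFF if is_sock(item)]
by_color = taxonomy(socks, lambda s: s.color)
by_year = taxonomy(socks, lambda s: s.year_bought)
```

Both `by_color` and `by_year` arrange the same three socks; the sneakers and mittens never make it into the drawer because they fail the definition. Adding another taxonomy is just another `key` function, while changing the ontology means changing `is_sock`.
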

This may seem like a pointless narrative, but it actually has some serious consequences. Arranging any type of information so that others can find and use it often hinges on defining the categories into which it will be divided - should a library contain only books? What about hardback versus paperback? Once we agree on the forms of the information we intend to manage, we must then consider how it should be navigated: fiction versus non-fiction, classics versus young adult, etc. Here the key is to have a structure that seems universally "intuitive" but is also unambiguous.

Next time I want to think about finding information when intuitive signposts are not available.