Clay Shirky’s Talk at Long Now Foundation

Clay Shirky gave a talk at the Long Now Foundation last Monday on “Making Digital Durable”. If you’ve read Clay’s essays, most of this won’t be new, but it was nice to hear him pull several threads together.

Things that jumped out at me

  • “Classes of errors unrelated to the mode of production.”
  • “Who can categorize?” Everyone, at least everyone you care about.
  • Tagging is an ongoing operation: not something that happens in the cataloging department once and for all time

I missed the first 15 minutes of the talk because I was coming up from Cupertino to Fort Mason.

Classification and Its Discontents

  • 1,000 to 10,000 items in a kitchen
  • not everything is labeled, however
  • items that are hard to ‘see inside’ are labeled, since a can of tomatoes weighs the same as a can of chickpeas

The ‘seeing inside the can’ problem is magnified in the library

  • classification systems roll up
  • how do systems adapt
  • Dewey 200s: Religion

    • fine-grained for Christianity,
    • but everything else is shoved into 290
    • Seattle’s library directly reflects the Dewey classification system: it’s a continuous ramp.
  • Library of Congress, a bigger namespace

    • Balkans, Asia, and Africa are given equal ‘weight’ in the scheme
    • Not designed to be biased
    • Design was an optimization for the number of books in each area
    • History gotcha: category DK still covers everything in the former Soviet Union
    • Re-shelving costs make splitting up the category prohibitive.
  • How do you history-proof this?

Books aren’t inspectable; you need labels.

  • Yahoo

    • Originally a list of links
    • Soon after, they needed lists of lists.
    • Hired an ontologist on staff.
    • Pointers: under Entertainment, Books and Literature is a pointer to a node in the tree under Humanities
    • They still needed to add the shelf back in.
  • Google

    • Dispense with the shelf
    • Look at what points at what.
    • Only the links are what’s ‘real’.
    • Their directory is built on DMOZ (the Open Directory Project), the open source version of Yahoo.

What Has Been Lost

  • What is a fertility symbol?

    • Venus of Willendorf
    • Is it a ‘magical object’ or just porn?
    • We can’t read it.
  • Several examples of things we don’t ‘read’ anymore.

    • Ancient writing and calculating systems (Rongorongo, Linear A, etc.)
    • Hieroglyphs were nearly lost as a written language until the Rosetta Stone was found
    • Three different scripts: Demotic (everyday Egyptian), hieroglyphs, and Greek
  • Degeneracy

    • More than one way to do things.
    • If you lose one, the others still work.
    • Christopher Alexander’s “A City Is Not a Tree”, on city planning
    • Cities are degenerate in the sense that they overlap.
    • The world’s non-convex.
  • A question of economics: is the money spent on classification systems worth it?

    • your current system may be a future person’s Rosetta Stone
  • Flickr

    • something happens, I go look for it on Flickr
    • type in “mermaid parade”
    • thousands of photos, hundreds of photographers
    • everyone tags photos with “mermaidparade”
    • no coordination, no ontologies, no hierarchies
    • relations and clusters let you work out that the parade is on Coney Island in Brooklyn
  • Oh, and del.icio.us too

    • “linksys router”
    • “making a paper airplane”
    • “CSS vista”
    • different distributions of tags: some things have consensus, others float at the interaction
  • Oh hell, RDF

    User asserts Tag describes Photo

    User asserts Tag describes Website
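The Flickr and del.icio.us examples, together with the RDF-style assertion above, suggest a very flat data model: a pile of (user, tag, item) triples with no hierarchy. As a minimal sketch assuming that model (not how Flickr or del.icio.us actually implemented anything), the Python below counts how often two tags land on the same item. Tags that keep co-occurring with “mermaidparade” are one way the “relations and clusters” could point you to Coney Island and Brooklyn. The `Assertion` type, the helper names, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import NamedTuple


class Assertion(NamedTuple):
    """One tagging act: `user` asserts `tag` describes `item` (hypothetical model)."""
    user: str
    tag: str
    item: str


def tag_cooccurrence(assertions):
    """Count pairs of distinct tags applied to the same item, by anyone."""
    tags_by_item = {}
    for a in assertions:
        tags_by_item.setdefault(a.item, set()).add(a.tag)
    pairs = Counter()
    for tags in tags_by_item.values():
        # sorted() gives each unordered pair a single canonical key
        for pair in combinations(sorted(tags), 2):
            pairs[pair] += 1
    return pairs


def related_tags(assertions, tag, top_n=3):
    """Tags that most often co-occur with `tag`, strongest first."""
    scores = Counter()
    for (a, b), n in tag_cooccurrence(assertions).items():
        if a == tag:
            scores[b] += n
        elif b == tag:
            scores[a] += n
    return [t for t, _ in scores.most_common(top_n)]


# Hypothetical sample data: no coordination, no ontology, no hierarchy.
assertions = [
    Assertion("alice", "mermaidparade", "photo1"),
    Assertion("alice", "coneyisland", "photo1"),
    Assertion("bob", "mermaidparade", "photo2"),
    Assertion("bob", "coneyisland", "photo2"),
    Assertion("bob", "brooklyn", "photo2"),
    Assertion("carol", "mermaidparade", "photo3"),
    Assertion("carol", "brooklyn", "photo3"),
    Assertion("dave", "linksys", "site1"),
]

print(related_tags(assertions, "mermaidparade"))
```

The “describes Photo” and “describes Website” cases share one shape here; only the item identifiers differ.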

Information Architecture is Social Architecture

  • tagging systems exist in a flat namespace

    • no sense of hierarchy
    • take three random LiveJournal (LJ) users
    • hierarchy is a second order effect of tagging
  • tag clouds over time example

    • Social Quakes: communities of practice
    • Jesse James Garrett’s (JJG’s) article on Ajax grows a tag cloud asserting it’s about “AJAX”
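One way to read “hierarchy is a second order effect of tagging” is that broader/narrower relations can be inferred after the fact from a flat namespace. The sketch below swaps in a simple co-occurrence subsumption heuristic, which is my illustration rather than anything presented in the talk: tag A is treated as broader than tag B when most items tagged B also carry A, but not the other way round. The function name `subsumptions`, the 0.8 threshold, and the sample data are hypothetical.

```python
from itertools import permutations


def subsumptions(tags_by_item, threshold=0.8):
    """Infer (broader, narrower) tag pairs from a flat tag namespace.

    `broader` subsumes `narrower` when at least `threshold` of the items
    tagged `narrower` also carry `broader`, but not vice versa.
    """
    # Invert the index: which items carry each tag?
    items_by_tag = {}
    for item, tags in tags_by_item.items():
        for tag in tags:
            items_by_tag.setdefault(tag, set()).add(item)

    edges = []
    for broad, narrow in permutations(sorted(items_by_tag), 2):
        shared = len(items_by_tag[broad] & items_by_tag[narrow])
        p_broad_given_narrow = shared / len(items_by_tag[narrow])
        p_narrow_given_broad = shared / len(items_by_tag[broad])
        if p_broad_given_narrow >= threshold and p_narrow_given_broad < threshold:
            edges.append((broad, narrow))
    return edges


# Hypothetical data: every "ajax" item is also tagged "javascript",
# but plenty of "javascript" items have nothing to do with Ajax.
tags_by_item = {
    "essay": {"ajax", "javascript"},
    "tutorial": {"ajax", "javascript"},
    "library": {"javascript"},
    "snippet": {"javascript"},
    "photo": {"cats"},
}

print(subsumptions(tags_by_item))
```

Nobody declared the hierarchy; it falls out of how people happened to tag, and it can shift as tagging continues.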

Clay’s Questions

  • how can tagging identify communities of practice
  • how should we handle the thesaurus problem
    • you have to get off the ‘thesaurus bus’ (gay politics is not “gay agenda”)
  • can we apply this to navigation
    • the VP wants a link
  • what, if anything, should we do about popularity risk
    • overwhelming other voices
  • can we detect “concept rot”
    • the “Ajax” tag adoption curve
    • things that die start to stink
  • what can we do about spam
    • we will face a well-funded and
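The “things that die start to stink” idea hints at how concept rot might be detected from a tag’s adoption curve. As a rough sketch under assumed thresholds (a hypothetical `is_rotting` function, a four-week window, and a 25% cutoff, none of which came from the talk), a tag can be flagged when its recent weekly use falls far below its peak:

```python
from statistics import mean


def is_rotting(weekly_counts, window=4, ratio=0.25):
    """Flag a tag whose recent use has fallen far below its peak.

    weekly_counts: uses of one tag per week, oldest first.
    A tag "rots" when the mean of the last `window` weeks drops
    below `ratio` times its busiest single week.
    """
    if len(weekly_counts) < window:
        return False  # not enough history to judge
    peak = max(weekly_counts)
    if peak == 0:
        return False  # never used, so nothing to rot
    recent = mean(weekly_counts[-window:])
    return recent < ratio * peak


# Hypothetical adoption curves: a spike that fades vs. steady use.
fad = [1, 5, 40, 120, 90, 30, 12, 6, 4, 2]
steady = [20, 22, 19, 25, 21, 23, 20, 24]

print(is_rotting(fad), is_rotting(steady))
```

Comparing against the tag’s own peak rather than a fixed count means a niche tag that was never popular isn’t flagged just for being small.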

Q&A

  • Attention tracking — when people stop tagging
  • Latent Semantic Analysis augmented with intelligence: Mechanical Turk
  • Links age and die
    • help find links that are broken and save them from the caches (Archive.org, Google)
    • RSS feeds are a latent resource for preserving content (on all those copies of NNW and FeedDemon)
    • what’s the germ line?
  • The whole distribution matters
    • the top five tags have the social weight
    • the rest drive the ecosystem
    • internal shelves: noise or other ‘communities of practice’
  • how does tagging deal with factions
  • how does tagging deal with spam
    • edit wars — that thesaurus problem
    • bump up the relative frequencies of the top five tags
    • return of metatag spam
    • watching obscure tags
    • friends of friends tag clouds?
  • on Wikipedia
    • classification systems aren’t as important
    • tagging is the first great post search interface
  • associative clustering is how biological memory works; is the web thinking? (Kevin Kelly)
    • we don’t know how we think
    • it’s more of a tool than a brain
  • anything from history that would have predicted the importance of tagging?
    • we knew that hierarchical systems were brittle
    • Usenet: groups like rec.pets.cats are attractors for other things, including flamewars, since antagonists have the group’s subject (cats, SF) in common (Cynthia’s LMB list)
  • how do we forget things we don’t want ‘remembered’?
    • don’t want a global delete button
    • Stewart Brand’s “You Own Your Own Words” policy caused a storm on the WELL
    • But you can’t take back a public discussion, other people heard it and may not want to forget it
    • don’t want to accidentally lose data either (I thought you were blogging this?)
    • Stewart relates the experience of the “delete everything I said” button
      • also happens on LJ
    • DRM makes things hard to remember (don’t have the magic software/hardware dongles)
    • Conversations are downloaded
  • how to add stink to software?
    • institutional fallbacks
    • a golden month to find in global and local caches
    • resource allocation
  • storage is free, what’s the cost of preservation?
    • falling storage cost increases the problem
    • there’s more of it
    • real options theory, how much to pay to postpone a decision
    • the 90-year window after which stuff becomes interesting to us: if storage costs are low, it’s easier to keep stuff across that bridge
    • the tag cloud also makes it easier to find the old stuff, and the time series is of interest in itself
  • digital isn’t durable yet, when is it a solved problem?
    • it’s a wicked problem
    • only local solutions
    • always a social layer
    • a fork between open and closed culture: Times Direct
    • attack vectors for opinion: Wingnut Daily is free, Times Direct isn’t. Guess what’s linked.
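The point that “the whole distribution matters” can be made concrete by ranking the tags on one item by relative frequency and separating the top five, which carry the social weight, from the long tail that drives the ecosystem. This is a minimal sketch; the `split_head_tail` helper and the bookmark’s tag counts are hypothetical, and only the head size of five comes from the talk.

```python
from collections import Counter


def split_head_tail(tag_uses, head_size=5):
    """Split one item's tag distribution into its head and its long tail.

    tag_uses: iterable of tag strings, one per tagging act on the item.
    Returns (head, tail, head_share): head and tail are lists of
    (tag, relative_frequency), most frequent first; head_share is the
    fraction of all tagging acts that land in the head.
    """
    counts = Counter(tag_uses)
    total = sum(counts.values())
    ranked = [(tag, n / total) for tag, n in counts.most_common()]
    head, tail = ranked[:head_size], ranked[head_size:]
    head_share = sum(freq for _, freq in head)
    return head, tail, head_share


# Hypothetical tag counts for a single bookmarked article (100 tagging acts).
uses = (["ajax"] * 40 + ["javascript"] * 25 + ["web2.0"] * 15
        + ["programming"] * 8 + ["xml"] * 5 + ["toread"] * 3
        + ["design"] * 2 + ["jjg"] + ["essay"])

head, tail, share = split_head_tail(uses)
print(f"head share: {share:.0%}")
print("tail:", [tag for tag, _ in tail])
```

Keeping the tail separate, rather than discarding it as noise, is what would make a check like “watching obscure tags” possible, while the head share gives a rough measure of consensus.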
