Using the Digg API, I grabbed the top 10 most-dugg stories of the day (by midnight) for the past year - May 24, 2007 to May 23, 2008. I then rendered a series of tree-ring-like visualizations (moving outwards in time). Rings are colored according to Digg's eight top-level categorizations (see key at bottom of page). Ring thickness is linearly proportional to the number of diggs the story received. I also made a pair of visualizations using Digg's entire archive, which goes back to December 1, 2004.
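The ring-thickness rule can be sketched in a few lines. This is only an illustration of "thickness linearly proportional to diggs"; the function name `ring_radii` and the `scale` parameter are hypothetical and not taken from the original rendering code.

```python
def ring_radii(digg_counts, scale=0.01, inner_radius=0.0):
    """Return one (inner, outer) radius pair per story, moving outwards in time.

    Assumption: `scale` converts diggs to drawing units; the value is arbitrary.
    """
    rings = []
    radius = inner_radius
    for diggs in digg_counts:
        thickness = scale * diggs  # linear in the digg count
        rings.append((radius, radius + thickness))
        radius += thickness  # the next ring starts where this one ends
    return rings
```

Each ring starts at the outer edge of the previous one, so later stories sit further from the center, as in a tree's growth rings.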
This series uses the same bigram dataset as the word spectrum visualization. To eliminate occlusion, I developed an entirely different layout. Now, instead of a continuous spectrum of words, words are bucketed into one of 25 rays. Each ray represents a different tendency of use (ranging from 0 to 100% in 4% intervals). There is a nice visual analogy at play - the "lean" of each ray represents the strength of the tendency towards one of the two terms.
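A rough sketch of the bucketing, assuming the "tendency" of a word is its share of occurrences after the first term versus the second. The function names, the 45-degree maximum lean, and the exact definition of tendency are assumptions for illustration, not details from the original project.

```python
NUM_RAYS = 25            # from the text: 0 to 100% in 4% intervals
MAX_LEAN_DEGREES = 45.0  # hypothetical maximum tilt of the outermost rays

def ray_index(count_a, count_b):
    """Bucket a word by how often it follows term A versus term B.

    Integer arithmetic avoids floating-point surprises at bucket edges.
    """
    total = count_a + count_b
    if total <= 0:
        raise ValueError("word must follow at least one of the two terms")
    # A 100% tendency would land on index 25, so clamp it into the last ray.
    return min(count_a * NUM_RAYS // total, NUM_RAYS - 1)

def ray_lean(index):
    """Map a ray index to a lean angle, with the middle ray standing upright."""
    centered = index / (NUM_RAYS - 1) * 2.0 - 1.0  # -1.0 .. 1.0
    return centered * MAX_LEAN_DEGREES
```

A word used equally with both terms falls in the middle ray with no lean, while a word used exclusively with one term lands in an outermost ray.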
Using Google's enormous bigram dataset, I produced a series of visualizations that explore word associations. Each visualization pits two primary terms against each other, for example war vs. peace, heaven vs. hell, and poor vs. rich. Words prefixed by these terms are rendered by the thousands into a dense spectrum, each positioned according to its frequency of use.
Back in late 2006, Google released a massive set of web n-gram data (basically pieces of sentences). The entire archive, which is almost 100GB uncompressed, contains usage frequencies for unigrams (n=1) through fivegrams (n=5). For this set of visualizations, I used trigrams.
A blog post on Dolores Labs led me to resurrect an old data set of color names Stacey and I had collected in February of 2007. Using more than sixteen thousand colors labeled by people online, I created a series of visualizations, my favorite of which I call Color Flower.
Aaron Swartz crawled Amazon, extracting data on 735,323 books. This included more than ten million links (edges) between books that Amazon noted as being related. I shoved this data into my old wikiviz graph layout engine for the better part of a week, and rendered the resulting spatial output as a huge mosaic of book covers.
Christoph Römhild sent me his interesting biblical cross-references data set. This led to the first of three visualizations. Intrigued by the complexity of the Bible, I derived a new data set by parsing the King James Bible and extracting people and places. One of the resulting visualizations is a biblical social network. The other shows how people and places are distributed throughout the text.
Search engines are increasingly featuring Wikipedia in their search results. This is causing people to surf onto Wikipedia not only for information, but also for entertainment and news. This unique effect allows Wikipedia to be used like the internet's "pulse." This visualization displays ten months of visit frequency data for Wikipedia's top 50 articles, from August 2006 to May 2007.
Using data provided by the Dimes Project, I produced a series of renderings that display how the Internet's routers are connected geographically. Almost 90,000 connections between cities all over the globe are shown.
This visualization shows the structure of three levels of Wikipedia category pages and their interconnections. Links between category pages are illustrated by edges. Nodes are clustered such that edge lengths are minimized. This forces highly connected groups of pages to clump together, essentially forming topical groups. The resulting structure of information is revealing about where fields intersect and diverge, and ultimately about how humans organize information.
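The idea of clustering nodes so that edge lengths are minimized is commonly implemented as a force-directed layout. Below is a minimal sketch in the spirit of the Fruchterman-Reingold algorithm: linked nodes attract, all nodes repel. It is a generic stand-in and not the layout engine used for this visualization; `force_layout` and all of its parameters are hypothetical.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, seed=0):
    """Place nodes in 2D so that linked nodes pull together and all nodes push apart."""
    rng = random.Random(seed)  # seeded so the layout is reproducible
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(max(len(nodes), 1))  # rough ideal spacing between nodes
    step = 0.1                               # maximum movement per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair keeps unrelated pages apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attraction along edges shortens links, so connected pages clump.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node along its net force, capped by the current step size.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            move = min(length, step)
            pos[n][0] += dx / length * move
            pos[n][1] += dy / length * move
        step *= 0.98  # cool down so the layout settles
    return {n: (p[0], p[1]) for n, p in pos.items()}
```

Calling `force_layout(["A", "B", "C"], [("A", "B")])` returns an (x, y) position per node. The pairwise repulsion loop is quadratic in the number of nodes, so a graph the size of Wikipedia's category tree would need an approximation such as a spatial grid or quadtree.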
Despite the Internet being a global network, the US has traditionally dominated it. This is in part due to the prevalence of American web surfers. However, the US market has become saturated. The next generation of web surfers is emerging in developing nations, where a combination of improved urban economies and falling telecommunication costs has made internet cafes on every corner, and even connections at home, possible.
I was curious about how people used the internet. Specifically, I wanted to see how internet behavior changed over the course of a day. Search engines are the gateway to the internet for most people, and so search queries provide insight into what people are doing and thinking. In order to examine millions of search queries, I built a simple, cyclical, clock-like visualization that displays the top search terms over a 24-hour period.
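The clock-like layout boils down to two steps: tally queries by hour, then place each hour's top terms at the matching angle on a 24-hour dial. The sketch below assumes the query log has already been reduced to `(hour, term)` pairs; that input format and both function names are hypothetical.

```python
from collections import Counter, defaultdict

def top_terms_by_hour(queries, n=3):
    """Return the n most frequent search terms for each hour that has data."""
    counts = defaultdict(Counter)
    for hour, term in queries:
        counts[hour % 24][term] += 1
    return {
        hour: [term for term, _ in counter.most_common(n)]
        for hour, counter in sorted(counts.items())
    }

def hour_angle(hour):
    """Angle in degrees on a 24-hour dial, with midnight at 0 and noon at 180."""
    return (hour % 24) / 24 * 360.0
```

On this dial each hour spans 15 degrees, so terms for 6 a.m. sit a quarter of the way around from midnight.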
The Royal Society recently provided access to an archive of papers published in the scientific academy's prestigious journal. Some 25 thousand scholarly works date from 1665 to the present day. Many notable scientific advancements are included in the archive, including, for example, Watson and Crick's discovery of the structure of DNA. This interesting data set was ripe for some visual tinkering.
Wikipedia is an interesting dataset for visualization. As an encyclopedia, its articles span millions of topics. Being a human-edited entity, connections between topics are diverse, interesting, and sometimes perplexing - five hops takes you from subatomic particles to Snoop Dogg. Wikipedia is revealing in how humans organize data and how interconnected seemingly unrelated topics can be.