Category Archives: Digital Humanities

Looking through the ULAN with Gephi, III

In my last two posts on using Gephi to read the Getty Union List of Artist Names (ULAN), I have been straightforwardly visualizing the relationships the dataset explicitly describes. This approach was a natural first step. The ULAN’s structure is artist-centric: creating a list of nodes meant reading the list from top to bottom, and creating a list of edges was essentially the same task.

However, it was the ULAN’s geographic information that had initially interested me. Could we usefully combine a geographic visualization with a social network visualization?

A graph of geographic locations described in the ULAN, connected by artists’ relationships. (visualization by Matthew Lincoln, underlying data © 2013 The J. Paul Getty Trust. All rights reserved.)

I rewrote my Ruby scripts to create a node list from the many geographic locations mentioned in the database, and to create an edge between two cities whenever the ULAN describes a relationship between artists who lived in them.

This meant reading against the grain of the ULAN, whose hierarchy is artist-centric, not location-centric. While each artist entry in the ULAN has all the information you need to define an edge to another artist, this is not true when defining edges between locations. Instead of reading through the ULAN from start to end, the computer would have to skip around the data out of order, climbing up and down the XML hierarchy to find the names and addresses of its artists every time it defined a new edge between nodes. To ease this process, I asked Ruby to copy the ULAN into a new, slimmed-down format (a series of nested hashes) that the computer could query much, much faster than slogging through the original XML file.
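The original scripts were written in Ruby and are not reproduced here. As a rough illustration of the same two steps, here is a minimal Python sketch, assuming a hypothetical, heavily simplified XML layout (the element and attribute names are invented and do not match the real ULAN schema). It copies the XML once into nested dictionaries keyed by artist ID, then counts an edge between two places whenever a related pair of artists lived in them.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import product

# Hypothetical, heavily simplified stand-in for an artist-centric XML export.
# These element and attribute names are invented; the real ULAN schema differs.
SAMPLE_XML = """
<artists>
  <artist id="A1" name="Artist One">
    <place>Antwerp</place>
    <related id="A2" type="teacher of"/>
  </artist>
  <artist id="A2" name="Artist Two">
    <place>Amsterdam</place>
    <place>Rome</place>
  </artist>
  <artist id="A3" name="Artist Three">
    <related id="A1" type="collaborator with"/>
  </artist>
</artists>
"""


def build_index(xml_text):
    """Copy the XML once into nested dicts keyed by artist id for fast lookups."""
    root = ET.fromstring(xml_text.strip())
    index = {}
    for artist in root.iter("artist"):
        index[artist.get("id")] = {
            "name": artist.get("name"),
            "places": [place.text for place in artist.findall("place")],
            "related": {rel.get("id"): rel.get("type") for rel in artist.findall("related")},
        }
    return index


def location_edges(index):
    """Count an edge between two places whenever related artists lived in them."""
    edges = Counter()
    for artist in index.values():
        for target_id in artist["related"]:
            target = index.get(target_id)
            if target is None:
                continue  # the related artist is not in this sample
            # An artist with no recorded place yields an empty product, so no edges.
            for a, b in product(artist["places"], target["places"]):
                if a != b:  # skip self-loops when both artists share a city
                    edges[tuple(sorted((a, b)))] += 1
    return edges
```

In this toy sample, the third artist has no recorded place and so contributes nothing, which mirrors why a location graph built from a small sample looks sparse. If the source records relationships from both ends, a fuller version would also need to avoid counting the same pair twice.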

The resulting graph looks a bit sparse. Like my earlier graphs, it draws on only a tiny subset of the full ULAN. It also represents only connections between pairs of artists who both have full geographic locations, which filters out many of the entries in this subset of the database. But it offers some exciting prospects for a full-fledged geo-graph of the ULAN. Because every artist comes with life dates, we can associate every edge in the graph with a time range, which lets us easily filter our view of the graph. With some more refining, it would be possible to animate the evolving continental (or global) links between artists over the centuries, fleshing out, or even problematizing, our current narratives of artistic communication.
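One way to attach a time range to an edge, assuming we date each relationship by the years when both artists were alive (an interpretive choice, not something the ULAN itself specifies), is to take the overlap of their life dates. This minimal Python sketch uses hypothetical years. Pairs whose lives never overlapped get no dated edge, which a real project would have to handle differently for posthumous associations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedEdge:
    source: str
    target: str
    start: int  # first year in which both artists were alive
    end: int    # last year in which both artists were alive


def timed_edge(source, target, life_a, life_b):
    """Date an edge by the overlap of two (birth, death) spans, or return None."""
    start = max(life_a[0], life_b[0])
    end = min(life_a[1], life_b[1])
    if start > end:
        return None  # the lives never overlapped
    return TimedEdge(source, target, start, end)


def active_in(edges, year):
    """Keep only the edges whose date range includes the given year."""
    return [edge for edge in edges if edge.start <= year <= edge.end]


# Hypothetical life dates, for illustration only.
edge = timed_edge("Antwerp", "Amsterdam", (1580, 1640), (1600, 1660))
print(edge.start, edge.end)      # 1600 1640
print(active_in([edge], 1650))   # []
```

Stepping the year forward through a range and redrawing only the active edges at each step is the basic idea behind animating such a graph over time.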

I admit I felt no small amount of pride after putting this all together. The initial graphs are exciting and useful, but they are very literal visualizations of the ULAN, i.e., low-hanging fruit. Reading geographically, on the other hand, pushes against the structure of the original dataset. It is a testament to the folks at the Getty that the ULAN is so consistently structured that it is possible to extract a visualization like this, even when it does not follow the dominant grain of its source database.

Looking through the ULAN with Gephi, II

In a recent guest post on the Six Degrees of Francis Bacon project, Shawn Moore discusses the curious networks of Margaret Cavendish. Moore writes:

In many ways, her networks are non-traditional in that they often exist outside of and beyond Cavendish herself…

Of significant interest is that the DNB data shows how important extra-personal (looking for a word that indicates connections beyond the intra-personal) connections are for the network Cavendish constructs, and to the networks that are constructed around the reception and reputation of “Margaret Cavendish,” thus exposing an important structure in the sociable practices at play during the period.

In other words, it seems that understanding the reception of a figure’s work through network graphs requires surveying more than the immediate neighborhood, or ego network, of figures just one edge away from Margaret Cavendish (or Rembrandt van Rijn, for that matter). It is, of course, the relationships between these first-, second-, or third-level nodes that actually constitute the public conversation about the root author or artist, the conversation we are so eager to better understand.
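Widening the view beyond the ego network amounts to a breadth-first search with a depth limit. Here is a minimal Python sketch on a hypothetical adjacency dictionary (not real ULAN data): a radius of 1 returns the ego network, while a radius of 2 or 3 pulls in the second- and third-level figures whose relationships make up the wider conversation.

```python
from collections import deque


def neighborhood(adjacency, root, max_depth):
    """Return {node: distance} for every node within max_depth edges of root."""
    distances = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if distances[node] == max_depth:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances
```

Because breadth-first search visits nodes in order of distance, each node's recorded distance is the length of its shortest path to the root.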

It was after I dove into Gephi last week that I found Scott Weingart’s excellent overview of network analysis for humanists.1 After a cogent introduction to the basics, Weingart offers some pointed warnings to humanists about creating multimodal networks, that is, networks with different classes of nodes (e.g., artists and organizations, illustrated below). These require their own analytical and layout tools.

A graph of the Black Mountain College network, with both artist and organization nodes. (visualization by Matthew Lincoln, underlying data © 2013 The J. Paul Getty Trust. All rights reserved.)

Although for my own purposes I am content to filter out organizations from the ULAN dataset, I also want to take advantage of the rich variations of relationships it describes. This means devising some scheme for weighting relationship types by their attribute (“master of” assigned a weight of 10, for example, while “collaborator with” gets a weight of 5).2
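A weighting scheme like this can be sketched as a lookup table. In the minimal Python sketch below, only the weights for “master of” (10) and “collaborator with” (5) come from the example above; the default weight of 1 for every other type, the node-type labels, and the choice to treat edges as undirected are assumptions for illustration. Organizations are dropped so that the result stays a one-mode network of artists.

```python
# Weights for "master of" and "collaborator with" follow the example in the text;
# every other relationship type falls back to a hypothetical default weight.
RELATIONSHIP_WEIGHTS = {"master of": 10, "collaborator with": 5}
DEFAULT_WEIGHT = 1


def weighted_edges(relationships, node_types):
    """Sum weights per artist pair, skipping any relationship that touches a non-person."""
    totals = {}
    for source, target, rel_type in relationships:
        if node_types.get(source) != "person" or node_types.get(target) != "person":
            continue  # drop organizations (and unknown nodes) to keep the graph one-mode
        key = tuple(sorted((source, target)))  # undirected: (A, B) and (B, A) merge
        totals[key] = totals.get(key, 0) + RELATIONSHIP_WEIGHTS.get(rel_type, DEFAULT_WEIGHT)
    return totals
```

Summing weights means a pair linked by several relationship types ends up with a heavier edge; a different research question might instead keep only the strongest tie, which is exactly the kind of interpretive choice discussed below.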

Such a scheme would have to be customized depending on the type of influence one wants to visualize; interpretation, with all its accompanying biases, will be layered on fast and thick. I predict this will be one of the biggest hurdles to overcome as we move forward with this project.

  1. Scott B. Weingart, “Demystifying Networks, Parts I & II,” Journal of Digital Humanities 1, no. 1 (Winter 2011).
  2. I am curious how the next version of Gephi (0.9), which will implement multigraph support, might aid this process.

Looking through the ULAN with Gephi, I

These past few weeks I have been sorting through the data available in the Getty’s Union List of Artist Names, learning how to parse its raw XML and create tables of relevant data.

I was initially interested in the ULAN for its geographic information, hoping to use it in visualizing on a large scale the evolving connections between Netherlandish artists and the broader European, and global, community in the sixteenth and seventeenth centuries. However, I was largely disappointed by the ULAN’s sparse geographic and chronological information. Most artists have only one, maybe two entries for their active locations; for example, poor Peter Paul Rubens, a continent-skipping artist, courtier, and diplomat/spy, is attached only to Antwerp.

The ULAN does, however, richly characterize artistic, professional, and familial relationships between its many entries. After meeting with Abram this week to learn more about his social network map of Benjamin West’s studio generated using Gephi, I was inspired to try the program out on a “small” sample of the ULAN database made available for download by the Getty.

A force-directed graph generated from the association fields contained in a small sample of the Getty’s ULAN (visualization by Matthew Lincoln, underlying data © 2013 The J. Paul Getty Trust. All rights reserved.)

There’s no denying that this nicely-styled network graph is pretty. But is it useful?

In the above graph, nodes are scaled by their degree (how many immediate connections they have to other nodes) and colored by their eigenvector centrality (a measure of their relative centrality to the network at large).1 Though the array of artists in the sample data is idiosyncratic (from Raphael to Hans Hofmann), it isn’t surprising to see big names like Jacques-Louis David appear, well, big due to their well-connectedness.

However, the coloring of this graph is a bit more interesting. A node that is important to its immediate social neighborhood (like David) does not necessarily connect as many disparate groups as a node with greater eigenvector centrality (like David’s close neighbor Jean-Baptiste Regnault).
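For readers curious how the two measures differ, here is a minimal Python sketch on a plain adjacency dictionary; it is not Gephi’s own implementation. Degree simply counts neighbors. Eigenvector centrality is computed by power iteration, so that a node scores highly when its neighbors themselves score highly. The sketch assumes a connected, undirected graph and scales scores so the top node equals 1.0.

```python
def degree(adjacency):
    """Number of immediate neighbors of each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def eigenvector_centrality(adjacency, iterations=1000, tol=1e-9):
    """Power iteration on (A + I): a node's score grows with its neighbors' scores."""
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # Adding the node's own score (A + I) prevents oscillation on bipartite
        # graphs and leaves the leading eigenvector unchanged.
        new = {node: scores[node] + sum(scores[n] for n in adjacency[node])
               for node in adjacency}
        norm = max(new.values()) or 1.0
        new = {node: value / norm for node, value in new.items()}
        if max(abs(new[node] - scores[node]) for node in adjacency) < tol:
            return new
        scores = new
    return scores
```

On a simple star, the center scores 1.0 and each leaf about 0.577 (one over the square root of three). Plain power iteration on the adjacency matrix alone would bounce back and forth on that graph, which is why the sketch iterates on A + I.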

What does this measure tell us about artists like Regnault? How should it inform the way we define and value relationships when thinking about art historical problems? The ULAN has a comprehensive vocabulary of association types, from professional associations like “teacher of” or “apprentice of”, to familial ones like “spouse of” or “child of”. On the scale of micro-art-history, we treat these relationships individually, giving more weight to relationships likely to transmit stylistic influence than to others. At this scale, our weighting can happen on a case-by-case basis; we can generate our narratives holistically.

But when trying to measure influence from a distant perspective, looking at dozens or hundreds of artists at the same time (something uniquely suited to a digital approach), we cannot make decisions case-by-case. We must instead define rules. These may be finely-tuned filters, but nonetheless they call on us to make generalizations. When I graphed the sample ULAN data, I established a filter that would only show artistic relationships, and would cut out familial ones.

A graph of Rembrandt’s social network, including only artistic relationships. (visualization by Matthew Lincoln, underlying data © 2013 The J. Paul Getty Trust. All rights reserved.)

The corner of my graph showing Rembrandt changed notably when I redrew the graph incorporating every type of relationship. Suddenly, Rembrandt was not only a central connector of individuals, but a connector of distinct communities that would otherwise not be attached.2

A graph of Rembrandt’s social network, including artistic, professional, and familial relationships. (visualization by Matthew Lincoln, underlying data © 2013 The J. Paul Getty Trust. All rights reserved.)

Much of this change is caused by the addition of several more nodes when I allowed Gephi to graph the full range of relationships described by the ULAN. However, note that non-artistic interconnections also appeared between nodes that had no other connections besides Rembrandt in the first iteration of my graph. In cleaning my data, I inadvertently undervalued nodes like Rembrandt that actually connected integrated communities, not just disparate individuals.
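The difference between connecting disparate individuals and connecting integrated communities can be checked directly: remove the central node and see how its neighbors group into connected components. Here is a minimal Python sketch, assuming a hypothetical mapping from four of the ULAN association types mentioned above to broad categories. The category assignments and the toy edges are illustrative, not ULAN data.

```python
# Hypothetical category mapping; the ULAN vocabulary is far larger.
CATEGORY = {"teacher of": "artistic", "apprentice of": "artistic",
            "spouse of": "familial", "child of": "familial"}


def filter_edges(edges, allowed):
    """Keep (a, b) pairs whose relationship type falls in an allowed category."""
    return [(a, b) for a, b, rel in edges if CATEGORY.get(rel) in allowed]


def neighbor_groups(edges, ego):
    """Group the ego's neighbors into components that stay connected without the ego."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    neighbors = adjacency.get(ego, set())
    groups, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first search that never passes through the ego
            node = stack.pop()
            if node in seen or node == ego:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(sorted(component & neighbors))
    return groups


edges = [("R", "a", "teacher of"), ("R", "b", "teacher of"),
         ("R", "c", "apprentice of"), ("a", "b", "spouse of")]
print(neighbor_groups(filter_edges(edges, {"artistic"}), "R"))              # [['a'], ['b'], ['c']]
print(neighbor_groups(filter_edges(edges, {"artistic", "familial"}), "R"))  # [['a', 'b'], ['c']]
```

With only artistic ties, R appears to join three isolated individuals; a single familial link merges two of them, so R now joins groups rather than lone figures.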

I initially thought that this visualization and analysis would require a lot of preparatory planning to establish correct filtering and weighting rules. But distant looking might instead demand that the researcher iterate through several visualizations, not to approach some platonic ideal visualization, but to generate layers to be superimposed and stitched together into a convincing narrative, much as Tim Sherratt suggested. In other words, I now wonder if distant looking might have its own kind of holistic process, distinct from, yet related to, the close-looking model.

The practice of distant looking at historical networks will need to establish its own critical methods as digital humanities fields mature (look to the folks behind Six Degrees of Francis Bacon for some deep posts on this topic.) Again and again, I think we will find that the processes of the digital humanities have much in common with the alternately-maligned-and-enshrined traditional methodologies.

  1. For more on this measure, and on network mapping in general, see Robert A. Hanneman and Mark Riddle, Introduction to Social Network Methods (Riverside, CA: University of California, Riverside, 2005), ch. 10.
  2. Bear in mind I am speaking only of the attachments described by this small, unrepresentative sample set of the ULAN. The relationships of these communities would be much more fleshed out in the full version of the ULAN.

Tim Sherratt’s keynote address to the Digisam conference on Open Heritage Data in the Nordic Region (entitled “A map and some pins”) is helping me cope with some of my own frustratingly messy data:

One of the things I love about being a historian is that the more we focus in on the past the more complicated it gets. People don’t always do what we expect them to, and that’s both infuriating and wonderful.

Likewise, while we often have to clean up or “normalise” cultural heritage data in order to do things with it, we should value its intrinsic messiness as a reminder that it is shot through with history. Invested with the complexities of human experience it resists our attempts at reduction, and that too is both infuriating and wonderful.

The glories of messiness challenge the extractive metaphors that often characterise our use of digital data. We’re not merely digging or mining or drilling for oil, because each journey into the data offers new possibilities – our horizons are opened, because our categories refuse to be closed. These are journeys of enrichment, interpretation and creation, not extraction.

Open-Access Humanities Publishing

A recent graduate of our department is exploring digital dissemination of the core discovery in his dissertation. The thought process that led him to this discovery is uniquely well-suited to a digital visualization (a hobby-horse of mine) hosted on a personal website. He is torn, however, between the imperative to market his own work and the fear of “cheapening” his scholarship through digital publication, as well as the not-totally-unjustified fear of theft.

Newly minted humanities Ph.D.s in the paranoiac job market may be right to fear resistance or disbelief from more retardataire senior scholars. But I’d argue a good number of those same scholars can be convinced of the available grant opportunities, not to mention the wide-open research possibilities in digital art history and the benefits of open-access distribution of scholarly work.

Ross Mounce recently made a cogent case for posting preprints of scholarship before they are accepted for publication: overcoming this cultural and psychological barrier results in your work being read and cited more. (hat tip: Digital Humanities Now) Ross notes:

I suspect, like in biology, this practice isn’t yet mainstream in the Arts & Humanities – perhaps just a matter of time before this cultural shift occurs… There is one important caveat to mention with respect to posting preprints – a small minority of conservative, traditional journals will not accept articles that have been posted online prior to submission.

Some very old-school scholarship: Hercules Segers, Three Books, 1615-1630 – Rijksmuseum, Amsterdam, on loan from the Rijksacademie van Beeldende Kunsten (RP-P-H-OB-867)

According to the SHERPA/RoMEO survey, The Art Bulletin doesn’t formally support the archiving of pre- or post-print PDFs of articles. But a better entry point for digital scholarship may be nascent open-access, online-only, peer-reviewed journals like the Journal of the Historians of Netherlandish Art. I am curious whether open-access humanities journals will, in these early days, trend towards specialized fields of study staked out by already-established scholarly societies, as these societies can bring their ready-made social network of peer reviewers. The JHNA’s online platform is incredibly basic, but its articles have been a solid mix of both new and established scholars (in other words, the journal seems to have broken out of the vicious cycle of prestige-chasing). With that organizational commitment already made, I look forward to seeing the journal clear some more technical barriers next, such as adopting good born-digital scholarship tools like CommentPress.

Scholars can also just circumvent this whole process and directly post their work online, critics be damned. If you are interested in getting your work in front of eyeballs, Abram Fox notes the online audience for one of his papers quickly exceeded the in-person audience at its national conference presentation.

In the same vein, I’ve just uploaded three of my projects here: a more traditional scholarly paper on Emanuel de Witte, and two digital humanities projects, one on mapping an artist’s diary and one on getting Google Earth to recognize pre-eighteenth-century dates.

Play space vs. work place

You should be able to tell from my posts on day 1 and day 2 of THATCamp Prime 2013 that I really enjoyed my time there. I came away with new contacts, fresh ideas, and some clearer signposts as to which skill paths I should pursue. That said, I also came away with tempered expectations for my next THATCamp experience and a more critical outlook on the ebullient rhetoric surrounding the un-conference model. Take a look at #THATCamp on Twitter to get a taste of the bubbly enthusiasm.

A play space does not replace a work place

THATCamp prides itself on being a play space, a zone whose deliberate lack of structure enables creativity. The un-conference model THATCamp adopts is a powerful one: the session schedule is decided during breakfast; people are encouraged to walk in and out of sessions as they please; the participants are responsible for shaping their own experience and that of all the other attendees. The un-conference model addresses the well-rehearsed issues endemic to “traditional” conferences like CAA, RSA, or MLA.

This isn’t a magical formula for productivity, however. At my well-attended session, we had a very lively discussion of visualizing geographic and temporal information. Yet it was wide-ranging and unstructured almost to a fault, from my perspective. We touched briefly on many relevant issues, but I came away without any concrete tools to use for my project. Many people chorused their interest in the problem, but we were often at cross-purposes in hashing out guidelines for such visualizations:

  • We agreed on the truism that a good visualization cannot show everything at once, and so must be designed with different facets or discovery layers.
  • We could not agree if granular data access could be married to large-scale data visualization. I firmly believe it should be, in the interest of both transparency and utility, but I and others who advocated this encountered strong pushback.
  • We also could not agree how to classify different levels of uncertainty, particularly when it comes to data drawn from secondary sources (this disagreement may have been born of disciplinary differences, which I address below.)

This debate is in part due to the intractability of the question I posed, so I probably shouldn’t be overly surprised at the outcome. And simply coming to understand that there are difficult choices to be made in these kinds of visualizations is itself a productive result, of a kind. However, the experience also clarified for me how important it is to complement the freeform play space of THATCamp with a more strictly focused group to hash out the difficult, specific labor demanded by an individual project.

Multi-disciplinarity won’t necessarily offer answers to intra-disciplinary questions

The strength of this particular THATCamp’s multi-disciplinarity was, for me, its weakness as well. As I suggested above, there were some big methodological differences between the participants in our panel. It’s good to challenge the basic assumptions of your research, but it does take up a lot of time. I found myself having to explain why dates and locations derived from works of art are as legitimate as those found in textual sources, an argument that would never have to be articulated among art historians. Similarly, I had to explain the basic art historical questions of the history and transmission of visual forms as I was trying to explain why anyone would be interested in establishing a large-scale visualization of when artists were in which cities. This is not a waste of time; an academic must always be able to explain their work to any audience. But disciplinary borders have their value in certain settings, and if anything, I came away from THATCamp better appreciating how multi-disciplinary work can complement, but not replace, deep immersion in a focused field.

How would my next session go differently?

I’d emphasize again that I don’t think either of these is a problem THATCamp ought to solve. But I am glad to draw lessons from this first experience, and I will be keeping these limitations in mind when preparing for future un-conferences. I’m eager to see if a discipline-specific version of THATCamp, like THATCamp CAA, would resolve the disciplinary problems I met this weekend. (I’m sure it would come with its own drawbacks, as well.)

If I propose another session in the future, I will also be more thoughtful in framing a smaller or narrower challenge. Far from stifling creativity, a little more specificity on my part might have allowed participants to bring their varied and valuable perspectives to bear more quickly and productively, perhaps producing something as actionable as Jeffrey McClurken’s outline for a SWAT Team for abandoned DH websites.

Which is all to say, then, that I am now equally energized both for the hard work to be done on my specific projects, as well as for the possibilities of future THATCamps.