The following is a partial lit review from January 2017 of recent conversations surrounding Linked Open Data within the humanities with a focus on what generosity means in the context of the Semantic Web. I’ve experimented with form in this post by lifting key voices out of the text and styling them as block quotes in order to better visualize the range of opinions presented in this review.
I have also set up a corresponding (open) annotated bibliography on Zotero which you can find here. The annotations can be found in both the Notes attached to each citation, as well as under Extra so that readers may quickly move through the citations, rather than having to open each Note separately. Where possible, I have processed the text of the essays, blog posts, and articles featured in this annotated bibliography using Voyant Tools and included up to ten tags of the most frequently used words.
The dream of the Semantic Web and the emergence of LOD
The Semantic Web has generated popular buzz since the early 2000s, a buzz later epitomized in a TED talk Tim Berners-Lee delivered in 2009. In his talk, Berners-Lee announced to the world that he needed some help revamping the World Wide Web into a space in which web documents and data coexist. What he asked for was a collective push towards making raw data available on the Web, by which he meant machine-readable data. The same year, Berners-Lee, Christian Bizer, and Tom Heath published a paper in the International Journal on Semantic Web and Information Systems titled “Linked Data – The Story so Far.” In it, they describe Linked Data as “a set of best practices for publishing and connecting structured data on the Web” which opens up the possibility of establishing a “global data space.” Where Open Data means data that is freely accessible on the web in non-proprietary form, Linked Open Data (or LOD) at its most basic is hyperlinked data, meaning data that references and connects to other data on the Web.
Structurally, LOD is created through the use of URIs (Uniform Resource Identifiers), the vocabularies that identify and define relationships between resources (which make up web ontologies), and RDF (Resource Description Framework). If the World Wide Web is made up of documents, then the Semantic Web is a Web made up of data. For a complete technical overview of LOD, I recommend a look at linkeddata.org, the W3C, or the paper by Berners-Lee and colleagues referenced above.
There is still much confusion surrounding the relationship between the Semantic Web and Linked Open Data. Where some believe the Semantic Web and LOD to be one and the same, others understand the Semantic Web as made up of Linked (Open) Data – this review subscribes to the latter. The lack of consensus, however, is interesting and perhaps representative of the spirit of Linked Open Data in that it reflects both its charm and difficulty, that is, the nature of LOD’s conflicting ontologies and unregulated vocabularies.
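For readers who find the building blocks above (URIs, vocabularies, and RDF statements) easier to grasp in code, here is a minimal sketch in plain Python. It does not use a real RDF library. The `example.org` URIs, the two toy datasets, and the helper names (`merge`, `objects`, `creator_names`) are all hypothetical, invented for illustration. The predicate URIs are intended to be real terms from the FOAF and Dublin Core vocabularies.

```python
# A library-free sketch of RDF-style statements as (subject, predicate, object) tuples.
# Assumption: the example.org URIs and both datasets are invented for illustration.

# Predicates borrowed from two shared vocabularies (FOAF and Dublin Core terms).
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/terms/creator"

# Hypothetical URIs identifying a person and a work.
AUTHOR = "http://example.org/people/author-1"
BOOK = "http://example.org/works/book-1"

# Dataset A, e.g. a library catalogue: it knows the work and points to its creator by URI.
library = {
    (BOOK, DC_TITLE, "An Example Novel"),
    (BOOK, DC_CREATOR, AUTHOR),
}

# Dataset B, e.g. a biographical project: it describes the same person URI.
biographies = {
    (AUTHOR, FOAF_NAME, "A. N. Author"),
}


def merge(*graphs):
    """Combine datasets by taking the union of their statements."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """Return every object stated for a given subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def creator_names(graph, work):
    """Follow the link from a work to its creator(s), then to their names."""
    names = []
    for creator in objects(graph, work, DC_CREATOR):
        names.extend(objects(graph, creator, FOAF_NAME))
    return names


linked = merge(library, biographies)
print(creator_names(library, BOOK))  # the catalogue alone cannot name the creator
print(creator_names(linked, BOOK))   # the merged data can
```

Because both toy datasets use the same URI for the person, combining them is nothing more than a set union, and the link from the book to its author's name only becomes traversable once the data is merged.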
Tempting the star-collecting achiever in us all, Berners-Lee’s Five Stars of Open Data is a LOD deployment scheme which urges users to free their information from documents that rely on proprietary software so that others may access their data. These five stages towards open data are perhaps best represented in the following graph and legend (taken from their posh site and pasted here):
At the heart of Berners-Lee’s five-star system is a desire for people to make available the data they have now and worry about refining the structure of that data later, a point made clear in his talk:
We want unadulterated data. OK, we have to ask for raw data now. And I’m going to ask you to practice that, OK? Can you say “raw”?…Can you say “data”?…Can you say “now”?
Although this approach is effective at getting data out and onto the Web, the question of how many return to refine or clean up their data, let alone work up the five-star ladder, is still up for debate (see amazing article on “metacrap”). Perhaps the most crucial moment in his talk is a reminder that “data is relationships,” where each node is connected to another and that node to another, making up a complex network of relationships. LOD, then, is a social practice that relies on shared labour for the greater good. This spirit of social responsibility fuels the collective work, a philosophy summarized in the concluding remarks of Berners-Lee’s talk:
It’s about people doing their bit to produce a little bit, and it all connecting. That’s how linked data works. You do your bit. Everybody else does theirs. You may not have lots of data which you have yourself to put on there but you know to demand it.
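The five-star ladder discussed above can also be read as a simple cumulative checklist. The sketch below is my own paraphrase of the five stages as I understand them (on the Web under an open licence, machine-readable structured data, a non-proprietary format, URIs to identify things, links to other data). The `Dataset` class, its field names, and the `star_rating` function are hypothetical and are not part of any official tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset; field names are illustrative."""
    on_web_open_licence: bool = False     # 1 star: on the Web under an open licence
    machine_readable: bool = False        # 2 stars: structured data, e.g. a spreadsheet
    non_proprietary_format: bool = False  # 3 stars: open format, e.g. CSV
    uses_uris: bool = False               # 4 stars: URIs identify the things described
    links_to_other_data: bool = False     # 5 stars: links out to other datasets


def star_rating(dataset):
    """Count stars cumulatively: each star assumes all the earlier ones are earned."""
    checks = [
        dataset.on_web_open_licence,
        dataset.machine_readable,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.links_to_other_data,
    ]
    stars = 0
    for passed in checks:
        if not passed:
            break
        stars += 1
    return stars


scanned_pdf = Dataset(on_web_open_licence=True)
spreadsheet = Dataset(True, True)
csv_file = Dataset(True, True, True)
linked_data = Dataset(True, True, True, True, True)

for name, dataset in [("scanned PDF", scanned_pdf), ("spreadsheet", spreadsheet),
                      ("CSV file", csv_file), ("linked data", linked_data)]:
    print(name, "*" * star_rating(dataset))
```

Treating the stars as cumulative is a modelling choice in this sketch: a dataset that uses URIs but is not openly available on the Web earns no stars here, which reflects the emphasis on getting raw data out first.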
The structural politics of LOD
LOD is valuable for its ability to publish interoperable data and to quickly build up networks of connectivity. The ecosystem that supports linked and open datasets, more formally known as the LOD cloud, grew 47.5-fold between its first capture in 2007 and 2014, from 12 datasets to 570.
A screenshot of the LOD cloud in 2007 featuring 12 datasets.
A screenshot of the LOD cloud in 2014 featuring a total of 570 datasets (here’s a link to an explorable graph).
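As a quick check on the growth figure, the arithmetic behind it can be reproduced from the two dataset counts in the captions above. The average yearly growth in the second half of the sketch is my own addition and assumes smooth compounding between 2007 and 2014, which the real cloud almost certainly did not follow.

```python
# Dataset counts taken from the two LOD cloud captions.
datasets_2007 = 12
datasets_2014 = 570

# Overall growth factor between the two snapshots.
growth_factor = datasets_2014 / datasets_2007
print(growth_factor)  # 47.5

# Assumption: smooth compounding over the seven years between the snapshots.
years = 2014 - 2007
average_yearly_factor = growth_factor ** (1 / years)
print(round(average_yearly_factor, 2))
```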
As these LOD cloud graphs emphasize, the structure of RDF itself represents the “data is relationships” philosophy in its subject-predicate-object statements, which describe the relationships between resources within local as well as external datasets. What’s more, LOD supports meaningful, that is, context-based, connections between data from a wide range of sources, aided by the easy integration that RDF’s forgiving non-hierarchical structure allows. In “Zen and the Art of Linked Data,” Dominic Oldman, Martin Doerr, and Stefan Gradmann praise the use of RDF for humanities-driven research, writing:
Of particular significance to humanists is that semantics can be embedded (rather than described separately) within exactly the same structure. This provides far greater potential for integrating vast repositories of data using the standard Web protocol, and provides the foundation for additional technology layers with increasingly sophisticated levels of expressivity. It also provides the type of flexibility that researchers require to quickly incorporate new information and data structures that are necessary as their research progresses, and creates the opportunity for consistent forms of knowledge representation for all research activities.
In other words, RDF serves as a kind of common language in the world of Linked Data with which to establish semantic connections across the Web. This history of a shared interest in knowledge representation is charted in James Smith’s chapter on “Working with the Semantic Web,” in which he explains
The Semantic Web and linked data are computational applications of pre-existing scholarly practices: linking to primary and secondary sources, signalling trusted vocabularies and authorities, and positioning a work in a larger conversation.
Put differently, humanities scholars are uniquely qualified to participate in the creation of the Semantic Web in that the standards of Linked Data mirror the methods and practices we employ in our own scholarly writing. Beyond how we create content, John Unsworth points out the need for increased humanist inquiry in the field of LOD, writing:
In some form, the semantic web is our future, and it will require formal representations of the human record. Those representations – ontologies, schemas, knowledge representations, call them what you will – should be produced by people trained in the humanities.
For Unsworth, the creation of “formal representations of the human record” needs humanities-authored ontologies, with a particular focus on humanists’ expertise in the mechanics of knowledge production and representation. Although the digital humanities is still an emerging field, Alan Liu reminds us that its task now is to bring the values of the humanities back into computation and consider “how the digital humanities advances, channels, or resists today’s great postindustrial, neoliberal, corporate, and global flows of information-cum-capital” as a way of addressing the lack of cultural criticism that “blocks the digital humanities from becoming a full partner of the humanities.” Digital humanists, in other words, need to get into the habit of thinking critically about their metadata, about the web applications and tools they use to conduct their research, and about the culturally-bound infrastructures that support those technologies. As Tara McPherson reminds us in her essay “Why Are the Digital Humanities So White?”, as much as computation responds to culture, “we must remember that computers are themselves encoders of culture.” With this history in mind, McPherson (and others like Amy Earhart, Lisa Nakamura, Moya Bailey, and Kim Gallon) urges that attention be paid to the white epistemologies that underlie the structures of our digital world, writing:
We need to privilege systemic modes of thinking that can understand relation and honor complexity, even while valuing precision and specificity. We need nimbler ways of linking the network and the node and digital form and content, and we need to understand that categories like race profoundly shape both form and content. (McPherson)
Corinna Bath takes the task of modelling the future of the Semantic Web as one that must rely on feminist ethics. She draws on the work of Donna Haraway and Karen Barad’s concept of diffraction as a way of facing the challenges that automatic reasoning poses in an environment that supports competing ontologies within the LOD cloud (3). Pointing to Barad’s term “onto-epistem-ology,” Bath calls for more attention to be paid to the misleading division between ontology and epistemology when creating LOD, especially when conceptualizing ontologies as representative of the “real world” (4). This call for feminist ethics as a source of knowledge modelling is echoed in Anita Gurumurthy and Nandini Chami’s “Data: The New Four-Letter Word for Feminism.” In their article, Gurumurthy and Chami argue for the importance of reclaiming data from hegemonic rule, writing:
Assuming that data can indeed enable a powerful reconstruction of reality, the process by which it constitutes knowledge for transformative change must be based in deeper ethical-political debates. Unhinged from the complexity of ethics and politics, a world of data – as we are witness to – can end up as an absolutism that endangers the very essence of democracy as feminism would know it.
What’s at stake, then, is a world of data without critical thinking – a world in which the processes by which data is generated, contained, and accessed are left unchallenged. Jeni Tennison expresses similar anxieties surrounding the social processes that govern the production and dissemination of information on the web, asking:
Is it the case that opening data simply increases the gap between the information haves and have-nots, and that leads to wider economic inequality, or does everyone benefit when information is more widely available? Are there tipping points of availability at which we start realising the benefits of open data? What is the role of government in encouraging data to be more widely available and more widely used? To what extent should government invest in data infrastructure as a public good? How can local or specialist cooperatives pool resources to maintain data?
Others like Ingrid Mason remain wary of the standards (or lack thereof) surrounding the representation of people in data. Put simply, “People matter and representing ‘people’ in data and turning that into linked open data is no small feat.” For Mason, one way of tackling the complexity of representing identity on the Web and avoiding harmful representations of people in LOD – harmful in the sense of placing people in categories that overlook the discursive categories of gender and race – is through collaboration. The organization of post-Summit meetings (i.e., Linked Open Data in Libraries, Archives, and Museums Summit 2017) is one small but crucial step towards addressing the challenges surrounding the treatment of data about people.
More than a feeling: cultural challenges, social responsibility, & LOD
If we are to promote broader engagement with LOD and widen the field to include the humanities as full partners, formal standards must be established for how we publish Linked Data on the Web (i.e., context, provenance, and data integration). Despite the incredible growth of LOD within the digital humanities and cultural heritage sectors (#LODLAM), and despite the recent technological advances that make it possible, the recycling of data, or what Michele Barbera calls “creative reuse,” has been limited (91). What this suggests, Barbera argues, is a need to shift the social and cultural habits of digital scholars from humanities and cultural heritage backgrounds. The discomfort around sharing content and collaborating online persists in the humanities. Where collaborative scholarship may be business as usual in the sciences, the humanities still have much work to do in establishing a culture that not only supports but encourages collaborative work. Digital collaboration – indeed collaboration of any kind – will likely always require an initial leap of faith. When done right, however, this kind of work, this effort to make oneself open to the possibilities of working with others, exchanging best practices, and sharing the burden of research and writing (while celebrating the pleasures too), proves powerful and worthwhile. For recent work on collaborative scholarship online, see Susan Brown’s “Towards Best Practices in Collaborative Online Knowledge Production” and Natalia Mehlman Petrzela and Sarah Manekin’s “The Accountability Partnership: Writing and Surviving in the Digital Age.”
To return to Barbera, beyond the discomforts of sharing content online, cultural heritage and digital humanities researchers remain caught in so-called “two-dimensional paper thinking,” that is, reproducing print technologies on the web rather than designing projects that derive from and are built for the Semantic Web (96). We cannot continue to rebuild old models with new technologies; we must, as Berners-Lee urges, encourage “thinking in the graph.” Likewise, technological innovation in the field of LOD cannot flourish if the shifting cultural demands of the Semantic Web are not first addressed. One way of bringing about the kind of cultural change required to support a rich and diverse linked open data economy, I propose, begins with what Kathleen Fitzpatrick calls “generous thinking.”
Generous scholarship: towards critical cyberinfrastructures
Fitzpatrick’s work (her excellent blog can be read here) is known for advocating scholarship that is open to displaying works in progress and honest about mistakes made along the way – including the countless drafts (or versions, if you like) a project goes through before “completion.” Her latest project focuses on “the possibilities that might open up for scholars not just in doing more of their work in public but in doing more of that work in conversation with the public.” Drawing on the recent critiques of criticism by Bruno Latour and Rita Felski, Fitzpatrick offers “generous thinking” as a way to encourage better practices of communication within the academy. In its most basic form, generous thinking roots the humanities in a practice of generosity, meaning “the practices of thinking with rather than reflexively against both the people and the materials with which we work” while fostering “more productive relationships and conversations not just among scholars but between scholars and the surrounding community” (Fitzpatrick). For Fitzpatrick, now is as good a time as any to tackle our institutional problems:
We have the opportunity, if we take that care seriously, to create a kind of dialogue that might help further rather than stymie the work we want to do — and that might not simply improve the standing of the humanities in the popular imagination, but dramatically transform the relationship between the university and the broader public.
This philosophy of academic life is compelling in its emphasis on cultivating small moments that effect great change, including: “a greater disposition toward listening, toward patience, toward engaging with what is actually in front of us rather than continually pressing forward to where we want to go” (Fitzpatrick). When faced with the question of what the humanities offer universities and the general public, Fitzpatrick points to the many possibilities we open up when we think generously. For her, “generosity of mind” encourages genuine dialogue that builds rather than stifles a work, an attitude that places value on listening for the sake of understanding rather than as a means to an end (Fitzpatrick). This is the difference between paying attention to your colleague during their talk and focusing on what you’re going to say during Q&A (guilty). At the core of Fitzpatrick’s model is a desire to learn and build better together, to work collectively with a reminder to pay attention to fellow collaborators, to honour the subjects we study, and to “encounter the other in all its irreducible otherness.” It’s about trying to slow down the demands of the academy and focus on true engagement, whether it’s with perspectives that are not our own or making time to revisit that project you keep putting off. It’s about hard work, yes, but of a different kind: work that cultivates the ability “to listen – to the text, to our communities, to ourselves – without attaching or rejecting” (Fitzpatrick).
Other voices have been pushing for generosity too. Mitchell Whitelaw takes up the “ethos [of] generosity” in his work on “generous interfaces,” writing:
The qualities of generosity I am interested in here are “to be liberal in giving or sharing”; also to be “large, abundant, ample”. Both of these qualities seem well aligned with the aims and missions of cultural collections. Our digital collections are certainly large, abundant and ample; and the charters of our cultural institutions place a high value on sharing these riches liberally with the public. Generosity seems to be very much in line with the aims of our cultural collections. (2)
Within this context, generosity in interface design means presenting the user with the richness of a collection and empowering them to explore its contents in ways that are both intuitive and delightful. Arguing for a different kind of generosity, Miriam Posner voices her concerns regarding how data is conceptualized within the digital humanities, noting, “most of the data and data models we have inherited deal with structures of power, like gender and race, with a crudeness that would never pass muster in a peer-reviewed humanities publication.” Returning to Fitzpatrick’s definition of “generosity,” the bulk of digital humanities work has been rather ungenerous, that is, not paying attention to the white epistemologies that continue to inform the ways in which concepts like race and gender are treated in our datasets and represented on the Web. To borrow again from Posner, we must
… stop acting as though the data models for identity are containers to be filled in order to produce meaning and recognize instead that these structures themselves constitute data. That is where the work of DH should begin… [we need to be] more ambitious, to hold ourselves to much higher standards when we are claiming to develop data-based work that depicts people’s lives.
She goes on to challenge criticism that would paint calls for more engagement with race and gender theory as “a kind of philanthropic activity.” Generous thinking too can be read in a similar light – but this is of course nonsense. Rather than scoff at attempts to rally efforts and challenge systems of oppression in all their shapes and forms, Posner reminds us that
DH needs scholarly expertise in critical race theory, feminist and queer theory, and other interrogations of structures of power in order to develop models of the world that have any relevance to people’s lived experience. Truly, it is the most complicated, challenging computing problem I can imagine, and DH hasn’t even begun yet to take it on.
What does generosity have to do with LOD?
I’d like to end with some thoughts on “this most complicated, challenging computing problem” (Posner) and imagine a Semantic Web made up of generous Linked Open Data. If the voices I’ve gathered here in this review have demonstrated anything, it’s that the academic community needs to reclaim its sense of responsibility by conducting research that builds rather than fragments, remaining ever conscious of the needs of the communities it serves, and creating a kind of digital legacy worth investing in. In this sense, the hard work that lies ahead for humanities-driven LOD has more to do with Fitzpatrick and Whitelaw’s radical application of generosity than it does with technological innovation. Here, “generosity” means generous as in linked (an abundance of meaningful connections to external resources), generous as in open (free for others to access, reuse, or build on), and generous as in thoughtfully managed data (with attention paid to how data is categorized, represented, and made explorable).
Major Works Cited
Bath, Corinna. “Towards a Feminist Ethics of Knowledge Modeling for the Future Web 3.0.” 10th IAS-STS Annual Conference. May 2011. Graz, Austria. Abstract.
Oldman, Dominic, Martin Doerr, and Stefan Gradmann. “Zen and the Art of Linked Data.” A New Companion to Digital Humanities. Ed. Susan Schreibman, Ray Siemens, and John Unsworth. John Wiley & Sons, Ltd (2015): 251–273.
Photo credit: Milada Vigerova via Unsplash