The e-flux corpus is a web corpus of English art news digests. It consists of 9,538 art announcements released from March 1998 to May 2012 and collected from e-flux.

It was collected in collaboration with David Levine and Alexander Provan as the basis of their research article, ‘International Art English’, presented in the online journal Triple Canopy and discussed in the Guardian newspaper. A follow-up debate is also available online.


  • raw HTML data of each article parsed into metadata – year, month, author (institution), title – and the textual content of the announcement
  • tokenized using unitok with the English model
  • tagged by TreeTagger using the Penn Treebank tagset
  • compiled in the Sketch Engine using the English sketch grammar for word sketches
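
The first two steps above can be illustrated with a minimal sketch. It is not the code used to build the corpus: the class `AnnouncementParser` and the functions `tokenize` and `to_vertical` are hypothetical names, the regex tokenizer is only a naive stand-in for unitok, and the assumption that the title sits in the page's `<title>` element may not match the real e-flux pages. The output is one token per line wrapped in a `<doc>` element that carries the metadata, similar in spirit to the input a corpus manager expects. Tagging and compilation are not shown.

```python
import re
from html import escape
from html.parser import HTMLParser


class AnnouncementParser(HTMLParser):
    """Collect the <title> text and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth:
            self.text_parts.append(data)


def tokenize(text):
    # Naive stand-in for unitok (assumption): runs of word characters,
    # or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def to_vertical(html, year, month, author):
    """Return one announcement as a <doc> header, one token per line, </doc>."""
    parser = AnnouncementParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    text = " ".join(parser.text_parts)
    header = '<doc year="{}" month="{}" author="{}" title="{}">'.format(
        escape(str(year)), escape(str(month)), escape(author), escape(title)
    )
    return "\n".join([header] + tokenize(text) + ["</doc>"])


if __name__ == "__main__":
    # Invented sample page, not taken from the corpus.
    sample = (
        "<html><head><title>Open Call</title></head>"
        "<body><p>The show opens, finally.</p></body></html>"
    )
    print(to_vertical(sample, 2012, 5, "Example Institute"))
```

Keeping the metadata as attributes on the document element is what would later allow queries to be restricted by year, month, or institution, assuming the corpus configuration declares those attributes.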


v. 1 (24 May 2012)

  • created, 6.2M tokens