in a word...
Currents in Australian affairs, 2003–2013

About

What can you read in a single word? Drawing on details of ABC Radio current affairs programs stored in Trove, this page presents a word a month from AM, The World Today, and PM.

Which words? The selected words are those that seem most distinctive, based on a statistical measure called TF–IDF (Term Frequency – Inverse Document Frequency). It’s a fairly simple method with numerous problems, but this isn’t intended as a rigorous statistical analysis.

The aim of this page is to provide a reminder of the people, events, stories and places that seemed significant at the time, but might now have faded from memory. It’s just one possible way of exploring the rich store of ABC data available through Trove. Hopefully it will inspire others to dig deeper.

Click on any of the words to browse related stories in Trove.

January 2003

The World Today

Sounds

PM

HIH

February 2003

AM

Blix

The World Today

Warne

PM

Warne

March 2003

AM

Basra

The World Today

Perle

PM

Baghdad

April 2003

AM

SARS

The World Today

SARS

PM

Baghdad

May 2003

The World Today

Hollingworth

June 2003

AM

Map

The World Today

Crean

PM

ONA

July 2003

The World Today

Solomons

PM

Solomon

August 2003

AM

Hambali

The World Today

Mello

PM

Amrozi

September 2003

AM

sheep

The World Today

sheep

PM

sheep

October 2003

The World Today

Mahathir

PM

Cormo

November 2003

The World Today

MedicarePlus

December 2003

AM

Saddam

The World Today

Latham

PM

Latham

January 2004

AM

Waugh

The World Today

PM

Hookes

February 2004

AM

Latham

The World Today

PM

Latham

March 2004

The World Today

Madrid

April 2004

The World Today

Flint

PM

Flint

May 2004

The World Today

Falconio

PM

Roche

June 2004

AM

Roche

The World Today

Kingsford

PM

Roche

July 2004

AM

Eadie

The World Today

Eadie

PM

Latham

August 2004

AM

Butler

The World Today

FTA

PM

Butler

September 2004

AM

Latham

The World Today

Latham

PM

Latham

October 2004

AM

Kerry

The World Today

Pitcairn

PM

Latham

November 2004

The World Today

Fallujah

December 2004

The World Today

maiden

PM

Sounds

January 2005

AM

tsunami

The World Today

Beazley

PM

Aceh

February 2005

AM

Rau

The World Today

Rau

PM

Rau

March 2005

The World Today

Lightfoot

April 2005

AM

Pope

The World Today

Pope

PM

Pope

May 2005

AM

Wood

The World Today

Solon

PM

Solon

June 2005

AM

Wood

The World Today

Tegan

PM

Lane

July 2005

AM

Vizard

The World Today

Discovery

PM

Vizard

August 2005

The World Today

Brogden

PM

Editor

September 2005

AM

Rita

The World Today

Hurricane

PM

Orleans

October 2005

AM

flu

The World Today

Bali

PM

Nguyen

November 2005

AM

Nguyen

The World Today

Nguyen

PM

Nguyen

December 2005

AM

Nguyen

The World Today

Nguyen

PM

Nguyen

January 2006

The World Today

Cole

PM

AWB

February 2006

AM

AWB

The World Today

AWB

PM

AWB

March 2006

AM

Games

The World Today

Games

PM

AWB

April 2006

AM

ANZAC

The World Today

Cole

PM

Rini

May 2006

The World Today

Beaconsfield

June 2006

The World Today

Socceroos

July 2006

AM

Lebanon

The World Today

Lebanon

PM

Lebanon

August 2006

AM

Lebanon

The World Today

Lebanon

PM

Lebanon

September 2006

AM

Title

The World Today

Hilliard

PM

Vizard

October 2006

AM

Hilali

The World Today

Moti

PM

Hilali

November 2006

AM

Gemayel

The World Today

Brimble

PM

Brimble

December 2006

AM

Fiji

The World Today

Fiji

PM

Palm

January 2007

AM

Open

The World Today

Snowtown

PM

Chapman

February 2007

AM

Jovicic

The World Today

Hicks

March 2007

AM

Hicks

The World Today

Hicks

PM

Santoro

April 2007

The World Today

Cho

May 2007

AM

volumes

The World Today

Wolfowitz

June 2007

AM

Mokbel

The World Today

Mokbel

PM

Byrne

July 2007

AM

Haneef

The World Today

Haneef

PM

Haneef

August 2007

AM

Haneef

The World Today

Haneef

September 2007

AM

APEC

The World Today

APEC

PM

APEC

October 2007

AM

Uhlmann

The World Today

Cousins

November 2007

The World Today

Musharraf

PM

Banton

December 2007

AM

Bhutto

The World Today

Centro

January 2008

AM

Suharto

The World Today

Giuliani

PM

Bucknor

February 2008

AM

Kosovo

The World Today

apology

PM

Kovco

March 2008

AM

Tibetan

The World Today

Tibet

April 2008

AM

torch

The World Today

relay

PM

torch

May 2008

AM

Nargis

The World Today

FuelWatch

PM

Nargis

June 2008

The World Today

Neal

PM

Belinda

July 2008

The World Today

Karadzic

PM

WYD

August 2008

The World Today

Georgia

PM

Georgia

September 2008

AM

Gustav

The World Today

Lehman

PM

Palin

October 2008

The World Today

Britt

PM

McCain

November 2008

AM

Mumbai

The World Today

Mumbai

December 2008

AM

cholera

The World Today

cholera

PM

Lexi

January 2009

AM

Dokic

The World Today

Cedric

February 2009

The World Today

stimulus

March 2009

AM

Bligh

The World Today

bikie

PM

Lannin

April 2009

AM

swine

The World Today

swine

PM

swine

May 2009

AM

swine

The World Today

swine

PM

swine

June 2009

AM

swine

The World Today

swine

PM

swine

July 2009

AM

Hu

The World Today

Theophanous

PM

Nuttall

August 2009

AM

ETS

The World Today

Gorgon

PM

Uighur

September 2009

The World Today

Renault

PM

Samoa

October 2009

AM

Viking

The World Today

Samoa

PM

Viking

November 2009

AM

Viking

The World Today

ETS

PM

ETS

December 2009

The World Today

Copenhagen

January 2010

AM

Haiti

The World Today

Sounds

PM

Haiti

February 2010

The World Today

insulation

March 2010

AM

Hodgman

The World Today

Chile

PM

Bain

April 2010

AM

Goldman

The World Today

Brumby

PM

Bain

May 2010

AM

profits

The World Today

Shirt

PM

Alle

June 2010

AM

BP

The World Today

Kyrgyzstan

PM

Alle

July 2010

AM

Gillard

The World Today

BP

PM

Alle

August 2010

The World Today

minority

September 2010

AM

Games

The World Today

Koran

October 2010

AM

Games

The World Today

Games

PM

Chilean

November 2010

AM

Brumby

The World Today

Greymouth

PM

Pike

December 2010

AM

Assange

The World Today

WikiLeaks

January 2011

AM

flood

The World Today

Sounds

PM

flood

February 2011

AM

Yasi

The World Today

Yasi

PM

March 2011

AM

Libya

The World Today

Gaddafi

PM

Gaddafi

April 2011

AM

Ivory

The World Today

Ivory

PM

May 2011

AM

Laden

The World Today

Laden

PM

Laden

June 2011

AM

cattle

The World Today

LulzSec

PM

cattle

July 2011

AM

hacking

The World Today

hacking

PM

carbon

August 2011

AM

Tripoli

The World Today

Gaddafi

PM

Tripoli

September 2011

The World Today

Libya

October 2011

AM

Gaddafi

The World Today

Occupy

PM

CHOGM

November 2011

AM

Roebuck

The World Today

Occupy

PM

euro

December 2011

AM

Ruxton

The World Today

Durban

January 2012

AM

Romney

The World Today

Wilkie

February 2012

AM

Romney

The World Today

diamond

March 2012

AM

Wagga

The World Today

Wagga

PM

Wagga

April 2012

AM

Slipper

The World Today

Nystrom

PM

Bahrain

May 2012

AM

Thomson

The World Today

Hastie

PM

Thomson

June 2012

AM

Melinda

The World Today

eurozone

PM

Caviar

July 2012

The World Today

Kurnell

PM

Jurrah

August 2012

AM

Ecuador

The World Today

Assange

PM

Grocon

September 2012

AM

trawler

The World Today

sheep

PM

trawler

October 2012

The World Today

Sandy

PM

Romney

November 2012

AM

AWU

The World Today

Gaza

PM

Nauru

December 2012

AM

cliff

The World Today

Slipper

January 2013

AM

Peris

The World Today

Algerian

February 2013

The World Today

Zygier

PM

Obeid

March 2013

AM

Cyprus

The World Today

Cyprus

PM

Cyprus

April 2013

AM

Boston

The World Today

Boston

PM

Boston

May 2013

AM

tornado

The World Today

NDIS

June 2013

AM

Snowden

The World Today

Gonski

July 2013

AM

Morsi

The World Today

Quigley

PM

Morsi

August 2013

The World Today

Essendon

September 2013

The World Today

Abbott

October 2013

The World Today

Mountains

November 2013

AM

Haiyan

The World Today

Haiyan

PM

Haiyan

December 2013

AM

Mandela

The World Today

Holden

Data

Trove, the National Library of Australia’s discovery service, harvests the details of 54 ABC Radio National programs. Currently there are over 200,000 RN records in Trove, including details of every segment of ABC Radio’s current affairs programs – AM, PM and The World Today – broadcast since 1999. This is an important record of Australia’s recent social and political history.

All of this metadata is available for re-use through the Trove API, opening it up for new forms of analysis and exploration.

Building a harvester to grab data from Trove is pretty straightforward. A while back I created a simple Python TroveHarvester class that handles all the basics. You just need to subclass this, replacing the process_results() method with something more useful.

The harvester I used to build a local copy of all the Radio National data is available on GitHub. It simply queries the Trove API and saves the results to a MongoDB collection.

API results are returned at the ‘work’ level, and a single work may include multiple versions. In the case of the Radio National data, the records of some regular segments have been grouped together as works because they have the same title and creator. To make it easy to get at all the individual records, the harvester opens up each work and extracts and saves the metadata for all the versions inside. As a result, the total number of records saved by the harvester will be greater than the number of work-level search results returned by Trove.
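As a rough illustration of that flattening step, here is a minimal sketch. It assumes each work arrives as a dictionary that may carry a list of versions. The field names (`id`, `title`, `version`) and the sample data are hypothetical stand-ins, not the exact shape of a Trove API response, and this is not the harvester’s actual code.

```python
def flatten_works(works):
    """Yield one record per version, tagged with the id of its parent work."""
    for work in works:
        # Hypothetical field names. A work without a 'version' list
        # is treated as a single record in its own right.
        versions = work.get("version") or [work]
        for version in versions:
            record = dict(version)            # copy so the input is not modified
            record["work_id"] = work.get("id")
            yield record


# Invented sample data: one grouped work with two versions, one ungrouped work.
works = [
    {"id": "w1", "title": "Finance report", "version": [
        {"id": "v1", "title": "Finance report"},
        {"id": "v2", "title": "Finance report"},
    ]},
    {"id": "w2", "title": "Blix reports to the UN"},
]

records = list(flatten_works(works))
print(len(works), "works ->", len(records), "records")
```

Two works go in and three records come out, which is why the saved record count can exceed the number of work-level results.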

Method

TF–IDF (Term Frequency – Inverse Document Frequency) provides a measure of how significant a word is in a particular document by comparing its frequency within that document to its frequency within a collection of similar documents. Words that are common across all documents will have a low TF–IDF value, while words that appear frequently in just a small subset of documents will have a high value.
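To make the idea concrete, here is a small sketch of one common formulation: term frequency (the word’s share of the document) multiplied by the natural log of total documents over documents containing the word. There are several TF–IDF variants, and the exact weighting used by NLTK or any other library may differ. The function name and the toy documents are invented for illustration.

```python
import math


def tf_idf(term, doc_tokens, all_docs):
    """One common TF-IDF variant: relative frequency x log(N / document frequency)."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    containing = sum(1 for doc in all_docs if term in doc)
    # A term that appears in no document gets a score of zero.
    idf = math.log(len(all_docs) / containing) if containing else 0.0
    return tf * idf


# Three toy 'documents', already split into lower-case words.
docs = [
    ["sars", "outbreak", "government", "sars"],
    ["budget", "government", "election"],
    ["drought", "government", "sheep"],
]

common = tf_idf("government", docs[0], docs)   # appears in every document
distinctive = tf_idf("sars", docs[0], docs)    # appears in only one document
print(common, round(distinctive, 3))
```

Because ‘government’ occurs in every document its score is zero, while ‘sars’, which is concentrated in a single document, scores about 0.55.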

I’d previously played around with TF–IDF during my Harold White Fellowship at the National Library of Australia. Using a set of 10,000 newspaper articles harvested from Trove, I created The Future of the Past, which used words with high TF–IDF values as a way of navigating through time (and creating your own tweetable fridge poetry!).

TF–IDF is used all over the place by search engines to calculate the similarity of documents, but what entranced me was the evocative power of the words themselves. They seemed to tell a story...

For In A Word I wanted to try and capture the ebb and flow of current affairs – particularly, those topics that seemed to capture our attention for days or weeks and then disappear. TF–IDF seemed quite well suited to identifying what was different over time.

In this case a ‘document’ is actually the combined titles and summaries of every segment of a program for a month. The TF–IDF values are calculated by comparing this ‘document’ to every other month from the complete decade, 2003–2013.

I used the Python Natural Language Toolkit (NLTK) to actually perform the calculations. So the basic method for each program went something like:

  • Loop through program records by year/month, writing titles and summaries for each month to a separate text file
  • Create a corpus of all the text files using NLTK
  • Loop through the corpus file by file, breaking the content up into individual words (known as tokenising)
  • Loop through the words calculating a TF–IDF for each (ignoring common 'stopwords', just to speed things up a bit)
  • Select the word with the highest TF–IDF for each month and write them to a data file to use in the web interface
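
The steps above can be sketched roughly as follows. To stay self-contained, this version swaps in plain Python standard-library code for NLTK and keeps everything in memory rather than writing text files, so it illustrates the approach rather than reproducing the original scripts. The record fields (`date`, `title`, `summary`), the tiny stopword list and the sample records are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list; a real list would be much longer.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for", "is", "at"}


def monthly_documents(records):
    """Combine titles and summaries into one list of words per 'YYYY-MM' month."""
    docs = defaultdict(list)
    for rec in records:
        month = rec["date"][:7]  # assumes ISO dates such as '2003-04-17'
        text = f"{rec['title']} {rec.get('summary', '')}".lower()
        docs[month].extend(re.findall(r"[a-z]+", text))  # crude tokenising
    return docs


def top_words(docs):
    """Return the highest-scoring TF-IDF word for each month."""
    n_docs = len(docs)
    # Document frequency: in how many months does each word appear?
    doc_freq = Counter(word for tokens in docs.values() for word in set(tokens))
    result = {}
    for month, tokens in docs.items():
        counts = Counter(t for t in tokens if t not in STOPWORDS)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        if scores:  # skip months with no usable words
            result[month] = max(scores, key=scores.get)
    return result


# Invented sample records covering two months.
records = [
    {"date": "2003-04-02", "title": "SARS spreads in Asia", "summary": "Governments respond to SARS"},
    {"date": "2003-04-20", "title": "SARS travel warning", "summary": "Airlines cut flights"},
    {"date": "2003-09-10", "title": "Sheep ship stranded", "summary": "Governments refuse the sheep"},
    {"date": "2003-09-25", "title": "Sheep still at sea", "summary": "Ship awaits a port"},
]

words = top_words(monthly_documents(records))
print(words)
```

With this sample data the word chosen for April is ‘sars’ and for September ‘sheep’. Note that the sketch lower-cases everything, so it would not preserve capitalisation in the selected words.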

Problems

You might be wondering about the word ‘Documentary’ that turns up in the PM column a few times in December and January. What’s going on? In the holiday period, many of the stories broadcast on PM included the words ‘Documentary Special’ in the title. So the word ‘Documentary’ is really common in December and January, but pretty rare throughout the rest of the year. As a result it earns a high TF–IDF score.

Similarly, you might be straining to recall some of the names that turn up in PM – Alle, Glanville and Woolrich for example. Were they politicians, criminals, corporate big-wigs...? They’re actually business and finance reporters. You’d generally expect that the names of journalists would be common enough across the whole corpus that their TF–IDF values would be fairly low. But PM seems to have turned over business and finance reporters on a regular basis, so they're very prominent for relatively short periods. And that means they score well.
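The effect is easy to see with some rough numbers. The sketch below looks only at the inverse document frequency part of the score, using the common log(N / document frequency) form and assuming 132 monthly documents (11 years of 12 months). The two reporters and their counts are hypothetical.

```python
import math

TOTAL_MONTHS = 132  # assumption: 11 years x 12 months, 2003-2013


def idf(months_containing):
    """Inverse document frequency for a word seen in the given number of months."""
    return math.log(TOTAL_MONTHS / months_containing)


long_serving = idf(120)  # hypothetical reporter named in 120 of the 132 months
short_stint = idf(3)     # hypothetical reporter named in only 3 months

print(round(long_serving, 2), round(short_stint, 2))
```

Under these assumptions the short-stint name gets a weighting roughly forty times larger than the long-serving one, so even a modest number of mentions in those few months can push it to the top.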

So why did I leave these oddities in? I thought about simply adding words like ‘Documentary’ to my list of stopwords so they’d be ignored. But what if there really was a case where a documentary had become controversial? Similarly, I looked at ways of extracting the names of journalists from the program metadata so I could exclude them from the calculations. But it was hard to separate journalists from the subjects of their stories.

In the end I thought that instead of massaging the data it was better to leave the problems in, and make the limitations of this method and the complications of the data obvious. Perhaps I’ll create a cleaned up version in the future for comparison.

Credits

In a word was created by Tim Sherratt (@wragge) using the Trove API.

Thanks to the developers, maintainers and communities of the following software: