I’ve pretty much given up on watching the news or reading newspapers. Yea, yea, we all know that the news is slanted towards the bad shit that is going on.
But I’ve been wondering: if I do consume mass media, who should I turn to for a more positive slant?
Enter the goodwords project. The idea is simple: I have a list of positive words, I scrape various news sources, and I see who the happy chappies are and who the grumpies are.
I’ve been dabbling with this idea on and off for quite a while, and now finally have a bit of data to start fiddling with. I was initially tracking a number of international sources, but lost the data (long boring story). And since I’ve been summering in South Africa and people are so news-conscious here, I decided to start here. I’m actually scraping a few times a day, but these initial results are based on midnight editions.
The results are based on the number of good words as a percentage of the total number of words on the page. And yea, I know words in context can mean different things, but then this was never meant to be scientific. (I was also tracking other papers, such as the Independent Newspaper Group’s papers, but they changed something on their site and my scrapes have stopped working.)
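For the curious, the scoring boils down to something like the sketch below. It is in Python rather than the PHP I actually run, and the word list, the tokenising rule and the function name are all made up for illustration, so treat it as a rough picture of the idea and not the real scraper code.

```python
import re

# Hypothetical word list for illustration only; the real goodwords list is not shown here.
GOOD_WORDS = {"good", "happy", "win", "success", "love"}


def goodword_percentage(text, good_words=GOOD_WORDS):
    """Return the good words as a percentage of all words in the text."""
    # Assumption: a "word" is any run of letters or apostrophes, compared in lowercase.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0  # an empty page scores zero instead of dividing by zero
    hits = sum(1 for word in words if word in good_words)
    return 100.0 * hits / len(words)


page = "Good news: local team scores a happy win"
print(goodword_percentage(page))  # 3 good words out of 8 -> 37.5
```

Note that this counts every occurrence of a good word and ignores context completely, which is exactly the unscientific bit.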
Enough, you say, let’s see the results.

Firstly, note the results are measured in percentage of goodwords on a page and reflect only two months of tracking so far.
Well, it seems that if I did want to read the news, I should stick to the Times and avoid the business newspapers, especially the Financial Mail (kinda expected). However, the Financial Mail also has the fewest words on the page. Google News South Africa has the most (and has Google News recently become a happier place?). The Mail and Guardian, which used to be the paper I respected the most, has kinda become a bit of a naysayer these days, and the results seem to reflect that.
Here are the top words from all the tracked papers, which probably proves I need to adjust my word list.

I’ve started tracking a number of international newspapers, but it’s too early to have interesting results.
Technical notes:
I am using PHP to pull the data via a cron job into a SQL database, and using Processing to draw the graphs. I am using SQLibrary by Florian Jenett to pull the database stuff into Processing. The code is not so exciting, and kind of messy, but I will keep releasing it anyway. I am generating the source_ids manually simply because I haven't got round to automating that yet.
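If I ever do get round to it, letting the database hand out the source_id could look roughly like this. It is a sketch only, using Python's sqlite3 module instead of my PHP setup; the table names, column names and helper functions are all assumptions made up for the example, and the numbers at the bottom are dummy values, not real results.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) tables if needed."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS sources (
            source_id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS scrapes (
            scrape_id INTEGER PRIMARY KEY AUTOINCREMENT,
            source_id INTEGER NOT NULL REFERENCES sources(source_id),
            scraped_at TEXT NOT NULL,
            total_words INTEGER NOT NULL,
            good_words INTEGER NOT NULL
        );
    """)
    return conn


def get_source_id(conn, name):
    """Return the id for a source, creating the row the first time it is seen."""
    conn.execute("INSERT OR IGNORE INTO sources (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT source_id FROM sources WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def record_scrape(conn, name, scraped_at, total_words, good_words):
    """Store the word counts for one scrape of one source."""
    conn.execute(
        "INSERT INTO scrapes (source_id, scraped_at, total_words, good_words) "
        "VALUES (?, ?, ?, ?)",
        (get_source_id(conn, name), scraped_at, total_words, good_words),
    )
    conn.commit()


def average_percentage(conn):
    """Average goodword percentage per source, happiest first."""
    return conn.execute("""
        SELECT s.name, AVG(100.0 * c.good_words / c.total_words) AS pct
        FROM scrapes c JOIN sources s USING (source_id)
        GROUP BY s.name
        ORDER BY pct DESC
    """).fetchall()


conn = open_db()
# Dummy figures purely to show the calls; these are not real scrape results.
record_scrape(conn, "Paper A", "2000-01-01T00:00", 1000, 20)
record_scrape(conn, "Paper A", "2000-01-02T00:00", 1000, 40)
record_scrape(conn, "Paper B", "2000-01-01T00:00", 500, 5)
print(average_percentage(conn))
```

The cron job would then only need to call something like `record_scrape` with the paper's name, and new papers would get their ids without me touching the table by hand.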