


Big Data and the questions we are not asking

19.01.13

 

 

Though Cristianini and his team didn't bring to bear the level of analysis that Lim applied to presidential speech, they did run a general test of news media readability using Flesch scores, which assess the complexity of writing based on the length of words and sentences. Shorter, on this account, means simpler. Short prose is not always simple prose, but for general purposes Flesch scores tend to hit the mark: writing aimed specifically at children tends to score higher (that is, as easier to read) than, say, scholarship in the humanities, and these differences in readability tend to reflect differences in substantive content.
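To make the idea of a Flesch score concrete, here is a minimal sketch of the standard Flesch Reading Ease formula in Python. It is not the researchers' code. The syllable counter is a rough vowel-group heuristic assumed for illustration, and the function names and sample sentences are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption, not a dictionary lookup):
    count runs of vowels and discount a likely silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier reading.

    score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


# Hypothetical sample texts: short words and sentences versus long ones.
simple = "The cat sat on the mat. The dog ran to the park."
dense = "Epistemological considerations necessarily complicate historiographical interpretation."

simple_score = flesch_reading_ease(simple)
dense_score = flesch_reading_ease(dense)
```

With these samples the short, monosyllabic text scores far higher than the polysyllabic one, which is the pattern the paragraph above describes. The formula only sees word and sentence length, so the score says nothing about whether a reader finds the vocabulary familiar.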

According to Cristianini et al.'s analysis of a subset of the media (eight leading newspapers from the US and seven from the UK, a total of 218,302 stories), The Guardian is considerably more complex a read than any of the other major publications, including The New York Times. Surprisingly, so is the Daily Mail, whose formula of "celebrity 'X' is happy/sad/disheveled/flirty/fat/pregnant/glowing" and so on apparently belies its complexity (when I mentioned this to a friend who is an intellectual historian and Guardian reader, she said, "Actually, sometimes I do read the Mail"). Together, according to Comscore, these three publications are the most read news sources in the world.

Cristianini et al. also measured the percentage of adjectives expressing judgments, such as "terrible" and "wonderful," in order to assess each publication's degree of linguistic subjectivity. Not surprisingly, tabloid newspapers tended to be more subjective, while the Wall Street Journal, perhaps owing to its focus on business and finance, was the most linguistically objective. Despite the seemingly complex prose of The Guardian and the Daily Mail, the researchers found that, in general, readability and subjectivity tended to go hand in hand when they cross-referenced the most popular stories with their writing styles. "While we cannot be sure about the causal factors at work here," they write, "our findings suggest the possibility, at least, that the language of hard news and dry factual reporting is as much as a deterrent to readers and viewers as the content." When political reporting was 'Flesched' out, so to speak, it was the most complex genre of news to read, and one of the least subjective.
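The subjectivity measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' pipeline: the input is assumed to be already tagged by part of speech (with "ADJ" marking adjectives), and the four-word judgment lexicon is a hypothetical stand-in for a much larger resource.

```python
# Hypothetical mini-lexicon of judgment adjectives (assumed for illustration).
JUDGMENT_ADJECTIVES = {"terrible", "wonderful", "awful", "brilliant"}


def judgment_adjective_share(tagged_tokens):
    """Return the fraction of adjectives that appear in the judgment lexicon.

    tagged_tokens is a list of (word, tag) pairs; the "ADJ" tag name is an
    assumption about how the text was tagged upstream.
    """
    adjectives = [word.lower() for word, tag in tagged_tokens if tag == "ADJ"]
    if not adjectives:
        return 0.0
    judgmental = sum(1 for word in adjectives if word in JUDGMENT_ADJECTIVES)
    return judgmental / len(adjectives)


# Hypothetical pre-tagged sentence: "The terrible storm hit the northern coast"
tagged = [
    ("The", "DET"), ("terrible", "ADJ"), ("storm", "NOUN"), ("hit", "VERB"),
    ("the", "DET"), ("northern", "ADJ"), ("coast", "NOUN"),
]
share = judgment_adjective_share(tagged)  # one of two adjectives is judgmental
```

Here one of the two adjectives ("terrible") is in the lexicon and the other ("northern") is not, so the share is one half. A measure like this depends entirely on the quality of the tagger and the lexicon behind it.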

When I asked Lim by email what he thought of the study, he said this was the pattern that stood out. "This means that at least in terms of the items included in the dataset, the media is opinionated and subjective at the same time that it is rendering these judgments in simplistic, unsubtle terms. This is not an encouraging pattern in journalistic conventions, especially given that the public appears to endorse it (given the correlation between the popularity of a story, its readability, and subjectivity)."

(...)

That all this data mining points to the importance of style is just one of the delightful ways that Big Crit can challenge our assumptions about the way markets, consumers, and the world work. Of course, we are still, in analytical terms, learning to scrawl. As Colleen Cotter (perhaps the only person to have switched from journalism to linguistics and then to have produced a deep linguistic study of the language of news) cautions, we need to be careful about reading too much into "readability."

"If 'readability,' is just a quantitative measure, like length of words or structure of sentences (ones without clauses)," she says via email, "then it's a somewhat artificial way of 'counting.' It doesn't take into account familiarity, or native or intuitive or colloquial understandings of words, phrases, and narrative structures (like news stories or recipes or shopping lists or country-western lyrics)." Nor do readability formulas take into account "the specialist or local audience," says Cotter, who is a Reader in Media Linguistics at Queen Mary University in London. "I remember wondering why we had to have bridge scores published in the Redding, CA, paper, or why we had to call grieving family members, and the managing editor's claims that people expect that."

There are other limits to algorithmic content analysis too, as Lichter notes. "A content analysis of Animal Farm can tell you what Animal Farm says about animals," he says. "But it can't tell you what it says about Stalinism."
