Whether you think they’re an outdated metric with no business judging art, or the undisputed champions of consumer rights, review scores have real power in the games industry. But how is a 7.7 really different to a 7.8? And how much is just too much water?
One group that feels the consequences of review scores more than most is the people who actually make the games: developers. A classic case of this was the critical response to Fallout: New Vegas, which led to the staff at Obsidian Entertainment missing out on thousands in bonuses over a single metascore point.
One could be forgiven for thinking that devs hate everything about review scores, which, at the end of the day, reduce what can be years of hard work to a single number. But for most, they just come with the territory.
“Reviews are an integral part of our business and provide the gaming public with critical information to aid in the purchase process,” says Ben Jones, director at Fugitive Games and ex-lead designer at DICE.
However, scores can become a problem when they’re taken out of context.
“But often,” Jones continues, “the nuance and depth of the reviews themselves are overlooked because the write-up has a number attached to it.
“The true value of a game goes well beyond a Metacritic score, and I’ve always held that if you’re interested in finding games that appeal to you, then your search must extend past a numerical value.”
This sentiment is reflected in how some large sites critique games. Speaking in 2009 about why they don’t put scores on reviews, Michael McWhertor, then of Kotaku, said:
“We’d prefer you read our review instead of just skipping to the score and forming an opinion based on a number, a number that doesn’t represent a reviewer’s assessment.”
Kotaku’s system has remained largely unchanged since then, and the point stands. What’s the point of reading three pages when you can just tell by the score, right? A score can mask a critic’s real opinion, and not give a game the praise it deserves. For example, I really like No More Heroes, a self-aware hack-and-slash from unhinged video game auteur SUDA51.
I enjoy the silly humour, its sense of style and thoughtful design. But although it’s a personal favourite, there’re so many things about it that are truly awful, like poor graphical performance, stiff movement and bizarre motorcycle mechanics. How would this be represented in a score? A panning wouldn’t do its core ideas justice, but a perfect ten would overlook some unforgivable sins.
Another issue with assigning numbers to video games is that, as they develop as a medium, they’re becoming much more diverse. This makes it even harder to quantify the elements of a game and judge them all by the same measure.
“I’d say that as game-types themselves have expanded and become more varied, the idea of a single score that can encapsulate an entire game has become obsolete,” says Ed Orman, designer at Australian studio Uppercut Games. “I’ve always preferred a ‘pros/cons’ analysis, or better yet, a verbose scoreless review that actually tries to get a sense of what the game is across to the user.”
Experimental experiences are often the most divisive, drawing rave reviews from critics, but a lukewarm response from the wider community.
“To put a review score on Proteus doesn’t even make sense,” says Stuart Maxwell, a developer working on Black Tusk Studios’ forthcoming Gears of War, as well as indie walking-sim Shape of the World. “I can’t see how a review score could help or hurt that game, because it’s very limited, but it’s also perfect, and could warrant a one or a ten, depending on how you look at it. How do you rate Gone Home? It’s not about that, you don’t rate a painting in an art gallery.”
Evaluating everything using the same homogeneous system means you’re often comparing apples to crunchy mutfruit, which all vault dwellers know is a perilous comparison to make.
But despite these concerns, there are situations where review scores help people make informed choices.
“For mainstream media, they’re pretty helpful,” says Maxwell. “Because you’ve got all of these similar products and you’re distilling which one’s achieved slightly higher quality than another. When I think about Metacritic, I think, ‘How does The Order compare to Dragon Age and how do they compare to Far Cry 4?’ In a way, they’re all very similar products separated by small details.
“In the realm of art, it doesn’t work very well. In the realm of AAA games, where you’re refining a formula game after game, it makes perfect sense.”
Metacritic, and review scores in general, offer a quick snapshot of what people who like games think is worth your time and, for all their faults, are something that I regularly look at when I want to know what’s good. However, when sweeping generalisations are made using what’s essentially a critical Polaroid, things start to become problematic.
“The complexity of game release timelines, Early Access, alphas, betas, patches, makes a ‘verdict’ obsolete in a really small amount of time,” says David Léon, designer at Spanish studio Lince Works, who’re currently working on Twin Souls: Path of Shadows. “Recommendations and a simple ‘Good’ or ‘Bad’ opinion is much more useful to both consumers and developers than a number nowadays.”
Taking this into account, the rising popularity of new media could be signalling a change in where public opinion on games is being formed. A one-off review can be made irrelevant by an update, whereas a series of YouTube videos can keep abreast of any changes.
“In the last few years there’s been a shift from trusting magazines or websites, to trusting personalities, critics or lone journalists,” Léon says. “A shift to personalization.”
Often, the critical response to a game closely matches the user rating, but this isn’t always the case. One example of this is the Call of Duty franchise, which continues to be received positively by critics, but mauled by users. This suggests that public opinion can be formed independently of professional reviewers, on forums or within YouTube and Twitch communities, and doesn’t rely on traditional gatekeepers and aggregators.
TL;DR: Review scores are like McDonald’s. Good when you’re in a hurry or just want something easy. But you can’t eat it all the time, mate! It messes with your blood! Didn’t you watch Super Size Me?