While the Xbox 360’s Kinect has proven popular with the mass consumer, developing games that accurately reflect player movement, and really take advantage of the 3D motion-sensing capabilities, has been a major challenge.
Here, David Quinn, who works at Microsoft’s Rare studio in the UK as a Kinect engineer, details the challenges he faced when developing for the system, and how he handled them, over the course of developing Kinect Sports and its sequel.
How do you do a game like darts, where most of the player’s arm is occluded by the body? How do you handle golf, when the Kinect camera loses track of the player’s arms during the swing? How do you handle different accents across the UK and the U.S.?
Since Rare is a Microsoft first party, does the stuff you write end up going back into the Kinect SDK?
DQ: There are a couple of things that Rare has done that have gone into the SDK. The avateering system, we did that at Rare; that was when you take the 20-joint skeleton and turn it into the 70-joint avatar. That was done at Rare. And this machine learning system that we’ve recently built with the platform team for Kinect Sports 2; we helped out with that, as well. They did the more mathematical side, and we worked on the tools.
Have you seen implementations of Kinect in third party games that have impressed you or that do things that you weren’t expecting?
DQ: Sure. What Mass Effect has recently done with Kinect’s speech system is an excellent use of speech. We pushed speech in Sports 2; that was always going to be a huge thing for us. It was going to be a key thing, a differentiator from Sports 1. But what the Mass Effect guys have done is bring it into a core title, showing it could be used with a controller. It doesn’t have to be the “get up and dance” kind of experience. You can use speech in Kinect in a more core title, and it really demonstrated that. I think from here on in you’ll see a lot of speech in core games.
Are you primarily concentrating on the skeleton and the visual tracking, or do you work a lot with speech as well?
DQ: I work with both of them, yeah. It’s odd; Kinect is like a brand, but it’s actually a group of technologies, really. I’m kind of the Kinect rep at the studio, so I kind of touch both. I did all the speech work for Sports 2, basically by myself, and then quite a bit of gesture work as well. The machine learning system in golf was kind of my responsibility as well.
Can you describe what that accomplishes?
DQ: For golf, the major problem is the player’s side faces the camera, so we don’t actually get a great feed off the skeleton tracking system, because the back half of the body is completely occluded. All those joints are kind of inferred, basically. It gives a good guess of where it thinks it is, but it has no real meaning.
So when the player does a backswing, it cuts their hands a little, detecting when they do a forward swing. We worked out a codey, hacky job — “hacky” is a bad word — an unscientific job of running the animation. But when the player actually hits the ball and it flies off into the air, that has to be very reliable, because it’s so detrimental to gameplay. Obviously, that’s the entire game: hitting the ball.
So, early days of golf, we kind of had it so you could do a full backswing and we’d just kind of drop your hands, because we didn’t want the ball to go, but our hand-coded system would actually release the ball.
That’s when we went to the ATG guys, the advanced tech group in Microsoft: “This is kind of what we’re seeing. We’ve got a problem with the golf swing; do you have any recommendations?” They came back with this idea of creating a machine learning system for gestures.
What we basically ended up doing was recording about 1600 clips of people doing golf swings in front of Kinect, tagging in the clip where the ball should release, and then getting the computer itself to work out what’s consistent among all those clips.
Then what happens is it creates a trainer and a classifier, and we move around that classifier at runtime, so we can pipe a live feed into the classifier, and it can go, “Yes, the ball should release now,” because it’s been trained on a load of clips. It knows when it should happen. When the golf ball flies off in golf, it’s done in that system; there’s no hand-written code. It’s all mathematical.
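As a rough illustration of the workflow Quinn describes (record clips, tag the release frame, train on them, then pipe a live feed through the result), here is a minimal Python sketch. It is not Rare's or Microsoft's system, whose internals the interview does not detail. Everything in it is an assumption made for illustration: the clips are synthetic, each frame is reduced to a single swing angle rather than a full skeleton, a 1-nearest-neighbour lookup stands in for whatever learner was actually used, and all names (`synthetic_swing`, `train`, `should_release`, `first_release_frame`, `WINDOW`) are hypothetical.

```python
from collections import deque

WINDOW = 5  # frames of history fed to the classifier (assumed value)


def synthetic_swing(backswing_frames, downswing_frames, top_angle):
    """Fake clip: per-frame swing angle in radians plus the tagged release frame.

    0 is the address/impact position; negative angles are the backswing.
    """
    angles = [0.0] * WINDOW  # player standing still at address
    for i in range(1, backswing_frames + 1):  # 0 -> -top_angle
        angles.append(-top_angle * i / backswing_frames)
    for i in range(1, 2 * downswing_frames + 1):  # -top_angle -> +top_angle
        angles.append(-top_angle + top_angle * i / downswing_frames)
    # The tag marks the frame where the downswing passes through impact.
    release_frame = WINDOW + backswing_frames + downswing_frames - 1
    return angles, release_frame


def train(clips):
    """'Training' for 1-nearest-neighbour: store every labelled window."""
    examples = []
    for angles, release_frame in clips:
        for end in range(WINDOW - 1, len(angles)):
            window = tuple(angles[end - WINDOW + 1 : end + 1])
            examples.append((window, end == release_frame))
    return examples


def should_release(examples, window):
    """Return the label of the stored window closest to the live one."""
    def squared_distance(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], window))
    return min(examples, key=squared_distance)[1]


def first_release_frame(examples, live_angles):
    """Feed frames one at a time; return the first frame flagged as a release."""
    history = deque(maxlen=WINDOW)
    for frame, angle in enumerate(live_angles):
        history.append(angle)
        if len(history) == WINDOW and should_release(examples, history):
            return frame
    return None


# 27 tagged training clips covering a range of swing speeds and sizes.
training_clips = [
    synthetic_swing(b, d, top)
    for b in (8, 10, 12)
    for d in (4, 5, 6)
    for top in (2.0, 2.2, 2.4)
]
classifier = train(training_clips)

# A "live" swing whose parameters do not appear in the training set.
live_angles, tagged_frame = synthetic_swing(9, 5, 2.1)
print(first_release_frame(classifier, live_angles), tagged_frame)
```

On this synthetic data the detected frame and the tagged frame coincide, and the classifier stays quiet through the address and the backswing. Real skeleton data is far noisier, which is presumably why the approach described above needed on the order of 1600 recorded clips rather than a few dozen.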