Kinect 2.0 for the Xbox One: Here It Comes (In Less Than Two Weeks)

Back in July, I mentioned that the upcoming "Kinect 2.0" peripheral for the next-generation Xbox One would transition from the structured light technology used in the first-generation Kinect to a time-of-flight approach. One thing I didn't explicitly note back then was that, unlike with the Xbox 360, for which Kinect 1.0 was an optional add-on, Kinect 2.0 will be bundled with every Xbox One. You might guess that a guaranteed installed base would make application developers even more enthusiastic about leveraging Kinect 2.0 than they were with the first-generation peripheral, and you'd be right, judging from comments recently published by Ars Technica:

I think having Kinect everywhere is the main feature that truly differentiates [the Xbox One]. The Xbox Live integration is cool, but it's not something that wasn't there before, and the graphics resolution is higher, but it's always higher. I think doing something with Kinect that you can't do anywhere else is a huge thing. All of us developers are figuring out "how do you really emotionally connect with people?" and I think you saw the reflection of the face [in a monkey that imitates your facial expression]. I think there's something there; I think it's the beginning of something...

I think [having the Kinect] is important in that you can go a little bit farther than you could go otherwise. I think voice is extra convenient, honestly. It's what I do—I lean back and just give my commands and it does everything. The gesture stuff—we did a bunch of Kinect games where we did gesture stuff, for example Kinectimals, and I get e-mails from people with two-year-old daughters... and Kinect couldn't pick up everybody. So that's why we did both. When you want to, you can use [gestures], but you don't have to. I think that's a better approach.

-Jorg Neumann, executive producer, Zoo Tycoon

Regarding Neumann's comments about the inability of the first-generation Kinect to always accurately discern multiple simultaneous players, the preview video at the top of this page shows how far Microsoft has come with its Canesta-acquired technology. The demonstration is conducted by Microsoft's Xbox Chief Marketing & Strategy Officer Yusuf Mehdi and Corporate VP Marc Whitten, and as I watched it I was struck by how different it was from first-generation Kinect demos.

Specifically, the only gesture interface example shown in the entire 12-plus minutes was a hang-up motion at the end of a Skype call. Conversely, Microsoft dedicated notable on-screen time to two other impressive vision-processing features. Beginning at 1:00, you'll see how the Xbox One uses facial recognition algorithms to automatically recognize the two spokespersons and log them into the console, presenting each of them with a personalized account interface.
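Microsoft hasn't published how the recognition step works, but the general idea of matching a person in view against enrolled profiles can be sketched in a few lines of Python. Everything here is my own illustrative assumption: the three-number feature vectors, the `identify` function and the `max_distance` threshold are hypothetical stand-ins, not anything taken from the Xbox One.

```python
import math

def identify(observed, profiles, max_distance=0.5):
    """Return the name of the closest enrolled profile, or None if none is close enough.

    observed     -- feature vector measured for the person in view (hypothetical)
    profiles     -- dict mapping account name to its enrolled feature vector
    max_distance -- hypothetical cutoff beyond which we refuse to log anyone in
    """
    best_name, best_dist = None, float("inf")
    for name, enrolled in profiles.items():
        dist = math.dist(observed, enrolled)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None

# Made-up enrollment data, purely for illustration.
profiles = {
    "Yusuf": [0.62, 1.81, 0.44],
    "Marc": [0.58, 1.74, 0.47],
}

print(identify([0.61, 1.80, 0.45], profiles))  # prints "Yusuf"
print(identify([0.30, 1.20, 0.90], profiles))  # prints "None" (nobody is close enough)
```

The cutoff matters as much as the nearest-neighbor search: without it, a stranger walking into the room would simply be logged in as whichever enrolled user they happened to resemble most.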

And speaking of Skype, beginning at 5:20 you'll be able to see for yourself how the camera uses object tracking to "follow" (via digital zoom, selective focus and other computational photography techniques) Harry Goodwin, Microsoft's Xbox Executive Demos and Communications Manager, who was on the other end of the Skype session. You'll likely also notice the much wider field of view that Kinect 2.0 delivers, along with its 1080p video capture capabilities. And an already-leaked setup manual (PDF) indicates that you'll be able to use Kinect 2.0 at much closer camera-to-subject distances, 1.4 meters (~4.6 feet) to be exact.
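For a feel of what a software-only "follow" might involve, here's a minimal Python sketch that picks a crop window around the people being tracked while preserving the frame's aspect ratio. It's only a sketch under my own assumptions: the `crop_for_engaged` function, the bounding-box format and the 15 percent margin are hypothetical, and a real implementation would also smooth the window from frame to frame.

```python
def crop_for_engaged(boxes, frame_w, frame_h, margin=0.15):
    """Return (left, top, width, height) of a crop that frames the given people.

    boxes  -- list of (left, top, right, bottom) pixel boxes for engaged people
    margin -- hypothetical padding added on each side, as a fraction of box size
    """
    if not boxes:
        # Nobody to follow: fall back to the full frame.
        return (0.0, 0.0, float(frame_w), float(frame_h))

    # Smallest rectangle that contains every engaged person.
    left = min(b[0] for b in boxes)
    top = min(b[1] for b in boxes)
    right = max(b[2] for b in boxes)
    bottom = max(b[3] for b in boxes)

    w = (right - left) * (1 + 2 * margin)
    h = (bottom - top) * (1 + 2 * margin)

    # Grow one dimension so the crop matches the frame's aspect ratio.
    aspect = frame_w / frame_h
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect

    # Never crop larger than the frame itself.
    scale = min(1.0, frame_w / w)
    w, h = w * scale, h * scale

    # Center on the group, then slide the window back inside the frame.
    cx, cy = (left + right) / 2, (top + bottom) / 2
    x = min(max(cx - w / 2, 0.0), frame_w - w)
    y = min(max(cy - h / 2, 0.0), frame_h - h)
    return (x, y, w, h)

# One person standing slightly left of center in a 1080p frame.
print(crop_for_engaged([(800, 300, 1100, 900)], 1920, 1080))
```

Scaling the resulting crop back up to the output resolution is what produces the digital zoom; the wider the sensor's field of view and the higher its capture resolution, the more room there is to pan and zoom without the picture turning to mush.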

Lest you think that the demo video above overstates reality, take a look at the comments from journalists at Ars Technica, The Verge and Wired, who were able to spend some advance preview time with the shipping-soon console. I'll close with some excerpted thoughts from Ars Technica's Kyle Orland (advance warning: they might compel you to grab a credit card and submit a preorder):

Skeleton detection and automatic login

Aside from voice, the most significant system-level feature enabled by the Kinect is the ability of the Xbox One to log a user in automatically based only on their visuals. The first time you set up the system, it takes you through a 30-second process where you log in to your Microsoft account. Kinect then builds a personal profile it will associate with that account based on facial recognition but also the camera's basic skeletal model of your body. This process forms a unique biometric ID that the Kinect uses to automatically identify a user, logging them in to Xbox Live and bringing up a personalized menu that includes their recent apps and favorite items.

The login process was practically instantaneous in our demo; Henshaw simply stepped in front of the Kinect and was immediately greeted with a "Hello, Jeff" atop the system menu. This recognition worked even though Henshaw was standing to the side and not directly facing the Kinect sensor. Henshaw said the skeletal and facial data Kinect uses to identify someone is "like fingerprints—no two are exactly alike" and that the system is even sensitive enough to notice minor variations between twins.

One of the coolest moments of the demo combined the Kinect's voice controls and this personal recognition capability, letting PR Manager Jose Pinero take over control of the system instantly without a controller. Pinero simply said, "Xbox, show my stuff," and the system quickly popped up a "Hello, Jose" message, repopulating the menu with his recent activity and pinned favorites. Pinero pointed out that the data for this personalized menu was stored in the cloud, so it could be brought up on any Xbox One. (Personal biometric data, on the other hand, is never uploaded to the cloud. Pinero would have to set up recognition on any new system he wanted to use.)

The Kinect didn't identify Jose by his voiceprint, but it instead recognized where in the room his command was coming from, triangulating the location using an internal array of five microphones (Henshaw said this process is "within five-inch accuracy"). The system then looked at the figure it detected in that location, recognized it was Jose, and automated the login process for him. It was an impressive, friction-free moment that I can see being used a lot in multi-user households (and abused a lot in households with annoying little brothers).


I also got a quick demo of a Kinect-powered Skype video chat with a Microsoft employee working from his home. The most interesting feature of this demo was the Kinect's ability to track the "engaged users" in each room and to do a live, software-level pan and zoom in on their location. This digital zoom feature can distinguish between people who are engaged in the conversation and others in the room who are just sitting there reading or doing something else, for instance. (Additionally, as Henshaw put it, "we know that dogs are dogs, we know that cats are cats, and we ignore them.")
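The "Xbox, show my stuff" trick that Orland describes, locating where a voice command came from and then checking who is standing there, is worth a quick illustration of my own. The Python sketch below is a heavy simplification under stated assumptions: it uses just two microphones and the standard far-field time-difference-of-arrival relation, not whatever Kinect 2.0's array actually does, and the function names, microphone spacing, positions and 10-degree tolerance are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second, approximate value at room temperature

def bearing_from_delay(delay_s, mic_spacing_m):
    """Estimate the bearing of a distant sound source, in radians.

    Uses the far-field relation sin(angle) = c * delay / spacing for a
    two-microphone pair; 0 means straight ahead of the sensor.
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp in case noise pushes it out of range
    return math.asin(ratio)

def person_at_bearing(bearing, people, tolerance=math.radians(10)):
    """Return the tracked person closest to the bearing, or None if nobody is near it.

    people -- dict mapping name to (x, z) in meters: x is sideways offset,
              z is distance straight out from the sensor (hypothetical format)
    """
    best_name, best_err = None, tolerance
    for name, (x, z) in people.items():
        err = abs(math.atan2(x, z) - bearing)
        if err <= best_err:
            best_name, best_err = name, err
    return best_name

# Made-up positions reported by the skeletal tracker.
people = {"Jeff": (-1.0, 2.5), "Jose": (0.8, 2.0)}

# A voice command arrives 0.217 ms earlier at one microphone of a 20 cm pair.
bearing = bearing_from_delay(0.000217, 0.2)
print(person_at_bearing(bearing, people))  # prints "Jose"
```

Once the speaker has been narrowed down to one tracked figure, the visual recognition step can take over, which matches Orland's observation that the console didn't need a voiceprint to know who was talking.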