[FoRK] It's Time to Start 3D Scanning the World

Eugen Leitl eugen at leitl.org
Wed Jan 11 06:59:18 PST 2012


It's Time to Start 3D Scanning the World

POSTED BY: Erin Rapacki  /  Tue, January 10, 2012

This is a guest post. The views expressed in this article are solely those of
the blogger and do not represent positions of Automaton, IEEE Spectrum, or
the IEEE.

[Image: MatterPort scan]

When Microsoft was developing its Kinect 3D sensor, a critical task was to
calibrate its algorithms to rapidly and accurately recognize parts of the
human body, especially hands, to make sure the device would work in any home,
with any age group, any clothing, and any kind of background object. Using a
computer-based approach to do the calibration had limitations, because
computers would sometimes fail to identify a human hand in a Kinect-generated
image, or would "see" a hand where none existed. So Microsoft is said to have
turned to humans for help, crowdsourcing the image-tagging job using Amazon’s
Mechanical Turk, the online service where people get paid for performing
relatively simple tasks that computers are not good at. As a result, the
Kinect now knows what all (or most) hands look like. Great!

Well, that's great if all you care about is gesture-based gaming, but from my
commercial robotics-oriented perspective, the problem is that a human hand is
just one "thing" among thousands -- millions?! -- out there that we would
like machines to be able to identify. Imagine if a robot could promptly
recognize any object in a home or office or factory: Whatever the robot saw
or picked up, it would instantly know what it was. Now that would be great.

So the question is: Can we ever achieve that goal? Can we somehow automate or
crowdsource image tagging of almost every object imaginable?

This type of data collection presents a chicken-and-egg problem: If you have
a data set with objects properly tagged, you can start to build applications
that rely on the "knowledge" stored in that set, and these applications in
turn can generate more data and you can refine the "knowledge" further. The
problem is, you need a data set in the first place! Sometimes companies
decide that there's a compelling value proposition in building such a set.
That's what Microsoft did with the Kinect. Another example is Google's "voice
actions," which let you search, email, and do other tasks using speech. Every
time you say a word and your Android phone asks, "Did you mean…?" and gives
you a list of words to select from, you're helping to improve Google's voice
recognition system. Over time, the variation and nuances of different
people’s speech patterns are captured as voice data against which a
statistical model can actually be fit. Speech to text would never be any good
without this kind of continuous improvement.

Now back to robotics. What I think the robotics community should pay more
attention to is the importance of data. Many capabilities in robotics, such
as recognizing objects, become technically feasible only with a large data
set (emphasis on large). Progress on them therefore depends less on pure
research, roboticists, and algorithms than on market trends in "tangential
technologies" such as the Web and smartphones. So, in order to make robots
"happen" one day, we need to keep an eye out for those technologies that have
the potential of collecting lots of data, for reasons other than robotics,
and apply it to robotics when the time is right.

And the type of data we need the most is 3D. So how do we collect 3D data for
every possible object? Luckily, a large hacker community formed around the
Kinect sensor, and startups like MatterPort are enabling quick 3D rendering
of objects just by taking images with the Kinect from a few angles. The results
are still crude, but as sensors and algorithms improve, you can imagine that
"3D-fying" a scene will become as easy as snapping a picture of it. In fact,
technologies like the Lytro and other "computational cameras" that capture
both intensity and angle of light, allowing users to refocus already-snapped
photos, could also help with the creation of 3D images.

[Video: demo of a Kinect-based system from MatterPort]

As I said before, roboticists alone can't do all the 3D scanning. The hope is
that other technologies would drive this trend. So here's an idea. If online
retailers saw value in showcasing detailed 3D models of objects for sale
(instead of the usual 2D photos we see on most websites), and tagged images
with descriptions such as color, weight, and function, then thousands of
objects could in principle be searchable by a robot. Google discussed a
notion similar to this at the 2010 IEEE Humanoids conference and again at
Google I/O last May. And maybe not only retailers would offer 3D scans, but
consumers would too, realizing that adding 3D views would be a more effective
way of selling stuff on eBay, for example.
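
As a rough illustration of the tagged-catalog idea above, here is a minimal
Python sketch. The record fields (color, weight_kg, function, mesh_file) and
the search helper are hypothetical names chosen for this example; they are
assumptions, not taken from any retailer's or Google's actual system.

```python
from dataclasses import dataclass


@dataclass
class CatalogItem:
    """One hypothetical retailer listing: a 3D scan plus descriptive tags."""
    name: str
    color: str
    weight_kg: float
    function: str
    mesh_file: str  # path to the 3D scan (illustrative only)


def search(catalog, **criteria):
    """Return the items whose attributes equal every given criterion."""
    return [
        item for item in catalog
        if all(getattr(item, key) == value for key, value in criteria.items())
    ]


# A tiny made-up catalog standing in for thousands of tagged listings.
catalog = [
    CatalogItem("dinner plate", "white", 0.45, "serving food", "plate.ply"),
    CatalogItem("coffee mug", "white", 0.30, "drinking", "mug.ply"),
    CatalogItem("water glass", "clear", 0.25, "drinking", "glass.ply"),
]

white_drinkware = search(catalog, color="white", function="drinking")
print([item.name for item in white_drinkware])
```

Exact-match filtering is the simplest possible choice here; a real catalog
would presumably need fuzzier queries (weight ranges, synonyms for function).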

If this scenario becomes reality, then all of the 3D images could be
aggregated into a robot-friendly database that bots would use as reference. A
robot would take 3D sensor data of an object it is seeing and check whether
it matches one or more of the reference images. Over time, and with feedback
("Yes, Rosie, this is a plate"), the robot's object-recognition capabilities
would continually improve.
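
The match-then-correct loop described above might be sketched roughly as
follows. This is a toy under stated assumptions: each scan is reduced to a
short list of numbers (a made-up "shape descriptor"), matching is a plain
nearest-neighbor lookup by Euclidean distance, and the ReferenceDatabase
class and its methods are hypothetical names, not an existing robotics API.

```python
import math


class ReferenceDatabase:
    """Toy store of labeled shape descriptors (plain lists of floats)."""

    def __init__(self):
        self.entries = []  # list of (label, descriptor) pairs

    def add(self, label, descriptor):
        """Register a reference scan under a label."""
        self.entries.append((label, list(descriptor)))

    def match(self, descriptor):
        """Return (label, distance) of the closest reference, or None if empty."""
        if not self.entries:
            return None
        label, ref = min(self.entries, key=lambda e: math.dist(e[1], descriptor))
        return label, math.dist(ref, descriptor)

    def feedback(self, descriptor, correct_label):
        """Store a human-confirmed label so that later matches improve."""
        self.add(correct_label, descriptor)


db = ReferenceDatabase()
db.add("plate", [1.0, 0.1, 0.0])  # descriptor values are invented
db.add("bowl", [0.6, 0.6, 0.0])

observed = [0.78, 0.38, 0.0]      # a deep plate, seen by the robot
guess, _ = db.match(observed)     # the closest reference is the bowl

# "No, Rosie, this is a plate": the correction becomes a new reference.
db.feedback(observed, "plate")
second_guess, _ = db.match([0.8, 0.36, 0.0])  # a similar object, seen later
print(guess, second_guess)
```

In this toy the first guess is wrong and the second is right only because the
corrected scan was stored; the point is that recognition quality tracks the
size and coverage of the reference set, not the cleverness of the matcher.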

So you want smarter robots? Then start demanding that online retailers offer
3D scans of their products -- and start creating your own scans. With this
data set, robots will finally start to be able to recognize and understand
our world of objects.

Erin Rapacki is a product marketing manager at Adept Technology. She lives in
the San Francisco Bay Area.

Image and video: MatterPort
