Alongside that potential come several unique challenges, chief among them body tracking. An AR engine needs to identify which sections of the camera feed belong to a person, as well as how and where they move; it may even need to track the position and orientation of individual body parts. And once we have that information, we can take it a step further to address an even more specific hurdle: segmentation.
What is Segmentation?
Segmentation is the process of identifying a body part or real object within a camera feed and isolating it, creating a “cutout” that can be treated as an individual object for purposes like transformation, occlusion, localization of additional effects, and so on.
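To make the idea of a “cutout” concrete, here is a minimal sketch (my own illustration, not code from this article) of what isolating a segmented region amounts to numerically: given a camera frame and a segmentation mask, keep the pixels inside the mask and zero out everything else. The `frame` and `mask` arrays are hypothetical inputs.

```python
import numpy as np

def cutout(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Isolate the masked region of a camera frame.

    frame: H x W x 3 uint8 image from the camera feed.
    mask:  H x W float array in [0, 1], where 1 means
           "this pixel belongs to the segmented object".
    """
    # Broadcast the single-channel mask across the RGB channels,
    # leaving the object intact and everything else black.
    return (frame * mask[..., np.newaxis]).astype(np.uint8)
```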
Types of Segmentation:
Hair Segmentation: Changing a user’s hairstyle requires precise segmentation of that user’s real hair so that it can be recolored, resized, or even removed from the rendered scene and replaced entirely without affecting other parts of the scene, such as the user’s face.
Body Segmentation: Allows the user’s background to be replaced without tools like a green screen, throwing the user into deep space, lush jungles, the Oval Office, or anywhere else you would like to superimpose your body outline against.
Skin Segmentation: Identifies the user’s skin. This could power an experience in which a user wears virtual tattoos that stop at the boundaries of their clothes and move along with their tracked body parts, almost perfectly lifelike.
Object Segmentation: Gives us the ability to perform occlusion, so that AR objects can be partially hidden behind or beneath real ones as they logically would be in reality, or even to “cut and paste” those real objects into virtual space (see the sketch after this list).
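As a rough sketch of the occlusion idea above (again my own illustration, under the assumption that we already have a person mask): if `composed` is the frame with a virtual object drawn over it, redrawing the person’s camera pixels on top makes the object appear to sit behind them. All three inputs are hypothetical.

```python
import numpy as np

def occlude_behind_person(camera: np.ndarray,
                          composed: np.ndarray,
                          person_mask: np.ndarray) -> np.ndarray:
    """Make a virtual object appear *behind* the person.

    camera:      H x W x 3 raw camera frame.
    composed:    H x W x 3 frame with the virtual object drawn over it.
    person_mask: H x W float mask in [0, 1], 1 = person.
    """
    m = person_mask[..., np.newaxis]
    # Redraw the person's camera pixels on top of the composed frame,
    # so the virtual object only shows where no person is present.
    return (camera * m + composed * (1.0 - m)).astype(np.uint8)
```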
Achieving Segmentation
How do we achieve segmentation? Approximating shapes from a database would never be even close to realistic. Identifying boundaries by color contrast is a no-go for people whose hair or clothes are close to their skin tone. Establishing a body position at experience start (“Strike a pose as per this outline:”) and then tracking changes over time is clunky and unreliable. We need something near-instantaneous that can recalibrate on the fly and tolerate a wide margin of approximation. We need something smarter! The answer, of course, is artificial intelligence. These days, “AI” is more often than not a buzzword thrown around to mean everything and yet nothing at all, but in this case we have a practical application for a specific form of AI: neural networks. These are machine learning algorithms that can be trained to recognize shapes or perform operations on data. By taking huge sets of data (for example, thousands and thousands of photos with and without people in them) and comparing them, neural networks have been trained to recognize hands, feet, faces, hair, horses, cars, and various other animate and inanimate entities…perfect for our use case.
[Figure: Training a neural network to identify objects and remove backgrounds. Credit to Cyril Diagne, 2020.]
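To show what this looks like in practice, here is a sketch (mine, not the article’s) of pulling a person mask out of a frame with an off-the-shelf pretrained network, torchvision’s DeepLabV3; the normalization constants and the person class index (15 in its VOC-style label set) follow that model’s documentation.

```python
import torch
import torchvision.transforms.functional as F
from torchvision.models.segmentation import deeplabv3_resnet50

# Pretrained segmentation network; class 15 is "person".
model = deeplabv3_resnet50(weights="DEFAULT").eval()
PERSON = 15

def person_mask(frame: torch.Tensor) -> torch.Tensor:
    """frame: 3 x H x W float tensor in [0, 1] -> H x W boolean person mask."""
    # Normalize with the ImageNet statistics the model was trained on.
    x = F.normalize(frame, mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]).unsqueeze(0)
    with torch.no_grad():
        logits = model(x)["out"][0]          # (num_classes, H, W)
    return logits.argmax(0) == PERSON        # True where a person was found
```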
Next step: adding an effect to the clothes. Problem: we don’t have a built-in “clothes” segmentation! Spark AR gives us “person”, “hair”, and “skin”, but nothing that will let us easily separate the clothes from the face and skin. Snapchat’s Lens Studio is nice enough to provide built-in “upper garment” segmentation, but Spark AR is not so forthcoming. We’ll have to get a little creative with the options available to us. Quick thinkers may have already seen the simple mathematical solution: person minus hair and skin is…exactly what we’re looking for. By combining the hair and skin segmentation textures and subtracting the result from the person texture, we are left with the clothes. Let’s get cracking on what exactly this looks like in patch form.
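Before the patches, here is the same arithmetic as a plain numpy sketch (my own illustration; in the patch editor this would be wired with math patches rather than written as code). The three arrays are hypothetical stand-ins for the person, hair, and skin segmentation textures.

```python
import numpy as np

def clothes_mask(person: np.ndarray,
                 hair: np.ndarray,
                 skin: np.ndarray) -> np.ndarray:
    """Derive a clothes mask from person, hair, and skin masks.

    All inputs are H x W float arrays in [0, 1].
    """
    # Union of hair and skin: a pixel is "not clothes" if either covers it.
    not_clothes = np.maximum(hair, skin)
    # Person minus (hair + skin) leaves the clothes; the clip guards
    # against pixels where hair/skin bleed outside the person mask.
    return np.clip(person - not_clothes, 0.0, 1.0)
```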