This guide outlines how to enhance and introduce new features to the HoloKit Unity SDK.
HoloKit Unity SDK optimizes render parameters based on iPhone hardware specifications, ensuring accurate viewport positioning and sizing.
To add a new iPhone model to the `Assets/ScriptableObjects/iOSPhoneModelList` asset, fill in the following fields (see the sketch below for how they map to data):

- **Model Name**: Apple's identifier for the device. Locate the model name here.
- **Description**: Provides clarity for internal developers, supplementing the less descriptive Model Name.
- **Screen Resolution**: Found on Apple's official website; list the screen height (the larger number) first, followed by the width.
- **Screen DPI**: Also obtainable from Apple's website; the screen's DPI (Dots Per Inch), synonymous with PPI (Pixels Per Inch).
- **Viewport Bottom Offset**: The offset between the device screen bottom and the viewport bottom, measured in meters. For a detailed explanation, see the Phone Calibration Guide. This property can only be measured manually; you can leave it at 0 and use Screen Bottom Border instead.
- **Camera Offset**: The 3D distance from the iPhone's main camera to the screen's bottom center, detailed in the Phone Calibration Guide. Calculate this using the Accessory Design Guidelines for Apple Devices.
- **Screen Bottom Border**: The distance between the screen's bottom border and the phone's frame bottom. Determine this using the Accessory Design Guidelines for Apple Devices.

You can use either Viewport Bottom Offset or Screen Bottom Border to set viewport positioning, but not both. Viewport Bottom Offset offers higher accuracy but is more challenging to calculate; Screen Bottom Border is easier to compute but less precise. If both values are provided, the SDK defaults to using Screen Bottom Border for viewport positioning. If Viewport Bottom Offset is set to 0, the SDK automatically falls back to Screen Bottom Border.
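The exact serialized layout lives in the SDK's phone model ScriptableObject and is edited in the Unity Inspector. As a rough illustration only, the fields above might map to data like the following simplified C# sketch (names and types here are illustrative, not the SDK's actual definitions):

```csharp
using UnityEngine;

// Illustrative stand-in for one entry of the iOSPhoneModelList asset.
// The real fields are filled in through the Unity Inspector.
[System.Serializable]
public class PhoneModelSpecExample
{
    public string ModelName;            // Apple's device identifier, e.g. "iPhone14,2"
    public string Description;          // Human-readable name for internal developers
    public Vector2 ScreenResolution;    // Height (larger number) first, then width, in pixels
    public float ScreenDpi;             // Screen DPI/PPI from Apple's specs
    public float ViewportBottomOffset;  // Meters from screen bottom to viewport bottom; 0 if unused
    public Vector3 CameraOffset;        // 3D distance from the main camera to the screen's bottom center
    public float ScreenBottomBorder;    // Distance from the screen's bottom border to the frame bottom
}
```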
The Accessory Design Guidelines for Apple Devices provide precise specifications for Apple devices, which are crucial for accurately calculating **Camera Offset** and **Screen Bottom Border**.
The HoloKit Unity SDK primarily focuses on two functionalities: stereoscopic rendering and hand pose detection. This section delves into the intricacies of these systems.
The SDK offers two rendering modes: `Mono` and `Stereo`.

- `Mono` mode: Operates as a standard ARFoundation app, utilizing `ARCameraBackground`.
- `Stereo` mode: The SDK's key feature; it renders two separate viewports on the iPhone screen using dual cameras against a black background.

`HoloKitCameraManager`, attached to the `Main Camera` GameObject, transforms an ARFoundation camera into a HoloKit camera.
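At runtime the active mode can be switched from script. The sketch below assumes the `HoloKit` namespace and a `ScreenRenderMode` property exposed by `HoloKitCameraManager`; check the class for the exact API in your SDK version.

```csharp
using UnityEngine;
using HoloKit; // assumed SDK namespace

public class RenderModeToggle : MonoBehaviour
{
    [SerializeField] private HoloKitCameraManager m_HoloKitCameraManager;

    // Toggle between Mono and Stereo rendering, e.g. from a UI button.
    public void ToggleRenderMode()
    {
        m_HoloKitCameraManager.ScreenRenderMode =
            m_HoloKitCameraManager.ScreenRenderMode == ScreenRenderMode.Mono
                ? ScreenRenderMode.Stereo
                : ScreenRenderMode.Mono;
    }
}
```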
Key Components:
- **Mono Camera**: Used in `Mono` mode.
- **Center Eye Pose**: Represents the midpoint between the user's eyes in `Stereo` mode, aligning with the iPhone camera in `Mono` mode.
- **Black Camera**: Renders the black background in `Stereo` mode.
- **Left/Right Eye Camera**: Render the respective viewports in `Stereo` mode.
- **IPD**: Inter-pupillary distance, the distance between the user's two eyes.
- **Far Clip Plane**: Sets the farthest visible boundary for the stereo cameras.
- **Show Alignment Marker In Stereo Mode**: An optional feature to display an alignment marker in the top-right corner in `Stereo` mode.
- **Supported Mono Screen Orientations**: The list of screen orientations supported in `Mono` mode. Since the screen orientation is locked to `LandscapeLeft` in `Stereo` mode, the SDK needs to know which orientations to allow when switching back to `Mono` mode.
- **HoloKit Generation**: Specifies the model of HoloKit being used.
- **iOS Phone Model List**: Specs for supported iOS devices.
- **Default Android Phone Model List**: Specs for supported Android devices.
- **Custom Android Phone Model List**: Custom list for unsupported Android models. Refer to the Phone Calibration Guide for custom model specifications.
The HoloKit Unity SDK offers two hand pose detection features: hand tracking and hand gesture recognition, both built on Apple Vision's hand pose detection algorithm. This algorithm is primarily 2D in nature: it identifies the 2D coordinates of 21 joints in the user's hands. While these coordinates are adequate for recognizing hand gestures, they fall short for 3D hand tracking, which aims to ascertain the 3D positions of those joints. To overcome this, we integrate the Apple Vision algorithm with the iPhone's LiDAR sensor: we first pinpoint the 21 joints' 2D coordinates and then match them with their depth values from the LiDAR sensor's depth map, thus achieving accurate 3D tracking.
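Conceptually, each detected 2D joint is unprojected into 3D using its sampled depth and the camera intrinsics. The sketch below illustrates the idea with a standard pinhole unprojection; the variable names are illustrative, and the SDK's native implementation differs in detail.

```csharp
using UnityEngine;

public static class HandJointUnprojectionExample
{
    // Illustrative pinhole unprojection: convert a joint's pixel coordinate and
    // its sampled depth (in meters) into a 3D point in camera space.
    // (fx, fy) are the focal lengths and (cx, cy) the principal point, in pixels.
    public static Vector3 Unproject(Vector2 pixel, float depth,
                                    float fx, float fy, float cx, float cy)
    {
        float x = (pixel.x - cx) / fx * depth;
        float y = (pixel.y - cy) / fy * depth;
        return new Vector3(x, y, depth);
    }
}
```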
Scripts related to hand pose detection are located in the `Runtime/iOS` folder, reflecting their exclusivity to iOS devices. We have written native Objective-C code to capture the user's hand pose and facilitate data marshalling from Objective-C to C#. This approach underpins the implementation of both hand tracking and hand gesture recognition functionalities.
The `HandTrackingManager` script provides the 3D positions of the user's hand joints, while the `HandGestureRecognitionManager` script identifies the user's hand gestures. Both scripts use `AppleVisionHandPoseDetector` as a conduit to fetch native data from the Objective-C side.
The `Runtime/iOS/NativeCode` folder houses all the native Objective-C code for the SDK. Each native functionality is represented by three types of files: a header file, an implementation file, and a bridge file. The header file outlines the interface of the native class, the implementation file provides its actual functionality, and the bridge file facilitates marshalling between the unmanaged (Objective-C) and managed (C#) code. For instance, the `Runtime/iOS/NativeCode/AppleVisionHandPoseDetector-C-Bridge` file contains four native C functions, which correspond to and are linked with four marshalling functions in the `AppleVisionHandPoseDetector` C# class.
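On the C# side, such a bridge is typically declared with P/Invoke. The following is a simplified illustration of the pattern; the function names are hypothetical and do not match the SDK's actual bridge functions.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeBridgeExample
{
    // On iOS, native code is statically linked into the app binary,
    // so the library name for DllImport is "__Internal".
    [DllImport("__Internal")]
    private static extern IntPtr HoloKit_ExampleDetector_Init();

    [DllImport("__Internal")]
    private static extern void HoloKit_ExampleDetector_Release(IntPtr self);

    // Each extern declaration corresponds to a C function in the
    // *-C-Bridge file, which in turn calls into the Objective-C class.
}
```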
`AppleVisionHandPoseDetector` can process video frame images in either 2D or 3D mode. `HandGestureRecognitionManager` requires only 2D hand poses, while `HandTrackingManager` requires 3D hand poses. Consequently, when `HandGestureRecognitionManager` is used alone, 2D hand poses can be obtained without activating the LiDAR sensor; using `HandTrackingManager` requires turning on the LiDAR sensor to capture the user's 3D hand poses. Both managers rely on `AppleVisionHandPoseDetector` to access hand data from native code, and when operated concurrently they share the same `AppleVisionHandPoseDetector` instance for efficiency.
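For reference, consuming the two managers from application code looks roughly like the sketch below. The member names (`OnHandGestureChanged`, `GetHandJointPosition`, `HandGesture`, `JointName`) are assumptions for illustration; consult the two manager classes for the exact API.

```csharp
using UnityEngine;
using HoloKit; // assumed SDK namespace

public class HandPoseConsumerExample : MonoBehaviour
{
    [SerializeField] private HandTrackingManager m_HandTrackingManager;
    [SerializeField] private HandGestureRecognitionManager m_HandGestureRecognitionManager;

    private void OnEnable()
    {
        // Assumed event: fires whenever the recognized gesture changes.
        m_HandGestureRecognitionManager.OnHandGestureChanged += OnHandGestureChanged;
    }

    private void OnDisable()
    {
        m_HandGestureRecognitionManager.OnHandGestureChanged -= OnHandGestureChanged;
    }

    private void OnHandGestureChanged(HandGesture handGesture)
    {
        Debug.Log($"Hand gesture changed to {handGesture}");
    }

    private void Update()
    {
        // Assumed accessor: 3D position of one joint of the first detected hand.
        Vector3 wristPosition = m_HandTrackingManager.GetHandJointPosition(0, JointName.Wrist);
        Debug.Log($"Wrist position: {wristPosition}");
    }
}
```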
See the HoloKit Low Latency Tracking repository for a detailed explanation.
This SDK, being a Unity package, requires a carrier Unity project for development and testing. If you want to contribute or modify the SDK, start by cloning the carrier project. This repository includes the SDK as a git submodule. Within the carrier project, the SDK is integrated as a local folder, allowing for direct modifications and testing.
In the `Assets/Samples` folder of the carrier project, you'll find multiple SDK samples. To update existing samples or create new ones, copy the desired sample folder into the SDK's `Samples~` folder. After making changes or additions, don't forget to update the `package.json` file in the SDK to reflect these modifications; this ensures that your contributions are properly integrated and accessible within the SDK structure.
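Unity packages expose samples through the `samples` array in `package.json`. A new entry for a copied sample might look like the fragment below; the display name, description, and folder name are placeholders.

```json
"samples": [
  {
    "displayName": "My New Sample",
    "description": "Short description of what the sample demonstrates.",
    "path": "Samples~/MyNewSample"
  }
]
```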
The inherent limitation of using Apple Vision's natively 2D hand pose detection algorithm for 3D hand tracking introduces certain inaccuracies. A notable issue arises from the misalignment between hand pose detection and the depth map. Often, fingertips are incorrectly mapped to the background of the depth map, resulting in exaggerated depth values. Currently, we employ a basic method of using the second fingertip to correct these anomalies, but this leads to occasional glitches. A more effective solution would involve interpolating the movement of each hand joint between frames to achieve smoother results. Although this correction is presently handled in the Objective-C code, it could potentially be implemented either there or on the C# side for improved performance.
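One way to realize the interpolation idea mentioned above is simple per-joint exponential smoothing between frames, sketched below in C#. This is only an illustration of the approach; the current correction lives in the Objective-C code and works differently.

```csharp
using UnityEngine;

public class HandJointSmootherExample
{
    private readonly Vector3[] m_SmoothedJoints = new Vector3[21];
    private bool m_Initialized;

    // Blend each newly detected joint position toward the previous frame's value.
    // A higher smoothingFactor follows the raw data more closely but jitters more.
    public Vector3[] Smooth(Vector3[] rawJoints, float smoothingFactor = 0.5f)
    {
        if (!m_Initialized)
        {
            rawJoints.CopyTo(m_SmoothedJoints, 0);
            m_Initialized = true;
            return m_SmoothedJoints;
        }

        for (int i = 0; i < rawJoints.Length; i++)
        {
            m_SmoothedJoints[i] = Vector3.Lerp(m_SmoothedJoints[i], rawJoints[i], smoothingFactor);
        }
        return m_SmoothedJoints;
    }
}
```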
At present, our system recognizes only two hand gestures: Pinched and Five. Theoretically, it's feasible to expand this range to include gestures like One, Two, Three, and others. However, the main challenge lies in the potential conflicts that may arise when `HandGestureRecognitionManager` supports a broader array of gestures simultaneously. For instance, distinguishing between gestures like Pinched and Four can be particularly difficult. This area calls for further research and development to improve accuracy and ensure that the system can robustly recognize a wider variety of gestures without confusion.
There is a continuous need for further optimization in our low latency tracking system. This involves refining the system's responsiveness and accuracy to minimize latency even further. For detailed information and specific areas of focus, please refer to the HoloKit Low Latency Tracking repository.