- Capture the 3D motion of a group of people engaged in a social interaction
- The studio structure
- Massively multiview system
  - 480 hardware-synchronized VGA camera views
    - 640 x 480 resolution, 25 fps
  - 31 hardware-synchronized HD camera views
    - 1920 x 1080 resolution, 30 fps
  - Camera calibration per sequence (see the projection sketch after this list)
  - 10 RGB-D sensors (Kinect II)
    - 1920 x 1080 (RGB), 512 x 424 (depth), 30 fps
    - Synchronized with the HD cameras
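Since every sequence ships a calibration file, 3D joints can be projected into any HD view. A minimal sketch, assuming the calibration JSON layout used by the official panoptic-toolbox (a top-level 'cameras' list with 'K', 'R', 't' per view) and ignoring lens distortion:

```python
import json
import numpy as np

def load_hd_cameras(calib_path):
    # Assumed layout (panoptic-toolbox style): a top-level "cameras" list,
    # each entry carrying "type", "name", "K" (3x3), "R" (3x3), "t" (3x1).
    with open(calib_path) as f:
        calib = json.load(f)
    cams = {}
    for cam in calib["cameras"]:
        if cam.get("type") == "hd":  # keep only the 31 HD views
            cams[cam["name"]] = {
                "K": np.array(cam["K"]),
                "R": np.array(cam["R"]),
                "t": np.array(cam["t"]).reshape(3, 1),
            }
    return cams

def project(points_3d, cam):
    # Project Nx3 world points into one view: x = K (R X + t), distortion ignored.
    p = cam["R"] @ np.asarray(points_3d).T + cam["t"]  # world -> camera frame
    p = cam["K"] @ (p / p[2:3])                        # normalize depth, apply intrinsics
    return p[:2].T                                     # Nx2 pixel coordinates
```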
- Multiple people
- 3D body pose
- 3D facial landmarks
- Data list examples
- VoxelPose sequences: 160422_ultimatum1, 160224_haggling1, 160226_haggling1, etc.
- Downloaded Panoptic dataset structure:
|-- 160224_haggling1
|   |-- hdImgs
|   |-- hdVideos
|   |-- hdPose3d_stage1_coco19
|   |-- calibration_160224_haggling1.json
|-- 160226_haggling1
|-- ...
- Annotations structure built by this toolkit (a directory-walk sketch follows the tree):
|-- 160224_haggling1
|   |-- calibration_160224_haggling1.json
|   |-- 00_01
|   |   |-- annotations
|   |   |   |-- 00_03_00000206_gt.json
|   |   |   |-- ...
|   |   |-- origin_images
|   |   |   |-- 00_03_00000206.jpg
|   |   |   |-- ...
|   |   |-- vis_images
|   |   |   |-- 00_03_00000206_vis.jpg
|   |   |   |-- ...
|   |-- 00_02
|   |-- ...
|-- 160226_haggling1
|-- ...
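Given the layout above, pairing each annotation file with its source frame is a simple directory walk. A minimal sketch; the root, sequence, and camera id values are placeholders:

```python
from pathlib import Path

def iter_frames(root, sequence, camera_id):
    # Walk {root}/{sequence}/{camera_id} following the tree above:
    # each annotations/*_gt.json is paired with its origin_images/*.jpg frame.
    cam_dir = Path(root) / sequence / camera_id
    for ann in sorted((cam_dir / "annotations").glob("*_gt.json")):
        frame = ann.name.replace("_gt.json", ".jpg")
        yield ann, cam_dir / "origin_images" / frame

# Example usage (placeholder paths):
# for ann_path, img_path in iter_frames("./data", "160224_haggling1", "00_01"):
#     ...
```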
{"bodies": [
{ "view_id": view id (HD camera id),
"id": person id,
"num_person": number of the people,
"input_width": image width (1920),
"input_height": image height (1080),
"transformed_joints_3d": GT transformed joints 3d,
"transformed_joints_3d_vis": visualization flags of joints 3d,
"projected_joints_2d": GT joints 2d projected by joints 3d using camera parameters in each view,
"projected_joints_2d_vis": visualization flags of joints 2d,
"bbox": bounding boxes created by adding and subtracting an offset from the min/maxvalues of x and y values of each person's GT 2D keypoint,
"bbox_clip": bbox cliped by image size,
"vis_bbox": bounding boxes created by adding and subtracting an offset from the min/max values of x and y values of each person's GT 2D keypoint that visualization flag value is true,
"vis_bbox_clip": vis_bbox cliped by image size }
, ...
]
}
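To make the box fields concrete, here is a sketch of the padding-and-clipping rule described for "bbox" and "bbox_clip"; the offset value is an assumption, and "vis_bbox" would apply the same rule to only the visible keypoints:

```python
import numpy as np

def keypoints_to_bbox(joints_2d, offset=25.0, width=1920, height=1080):
    # Pad the min/max of x and y by a fixed offset, then clip to the image.
    # The offset value here is a placeholder, not the toolkit's actual setting.
    xs, ys = joints_2d[:, 0], joints_2d[:, 1]
    bbox = [xs.min() - offset, ys.min() - offset,
            xs.max() + offset, ys.max() + offset]
    bbox_clip = [max(bbox[0], 0.0), max(bbox[1], 0.0),
                 min(bbox[2], width - 1.0), min(bbox[3], height - 1.0)]
    return bbox, bbox_clip  # [left_top_x, left_top_y, right_bottom_x, right_bottom_y]
```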
0: Neck
1: Nose
2: BodyCenter (center of hips)
3: lShoulder
4: lElbow
5: lWrist
6: lHip
7: lKnee
8: lAnkle
9: rShoulder
10: rElbow
11: rWrist
12: rHip
13: rKnee
14: rAnkle
15: lEye
16: lEar
17: rEye
18: rEar
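The same ordering as a Python constant, convenient when indexing the flat keypoint arrays described below:

```python
# COCO19 joint order used by the hdPose3d_stage1_coco19 annotations,
# matching the index list above.
PANOPTIC_COCO19_JOINTS = [
    "Neck", "Nose", "BodyCenter",
    "lShoulder", "lElbow", "lWrist", "lHip", "lKnee", "lAnkle",
    "rShoulder", "rElbow", "rWrist", "rHip", "rKnee", "rAnkle",
    "lEye", "lEar", "rEye", "rEar",
]
```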
- 3d keypoints: [x0, y0, z0, x1, y1, z1, ...]
- 2d keypoints: [x0, y0, x1, y1, ...]
- Box format: [left_top_x, left_top_y, right_bottom_x, right_bottom_y]
- A person who has 3D coordinates but is not visible in a given 2D view gets the box [0, 0, 0, 0] in that view (a parsing sketch follows these notes)
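Putting the schema and the formats together, a minimal sketch that parses one *_gt.json file into per-person arrays; field names come from the schema above, and the flat lists are reshaped per the formats just listed:

```python
import json
import numpy as np

def load_bodies(ann_path):
    # Reshape the flat keypoint lists: 3D -> Nx3 ([x, y, z] per joint),
    # 2D -> Nx2 ([x, y] per joint); boxes are kept as 4-vectors.
    with open(ann_path) as f:
        bodies = json.load(f)["bodies"]
    people = []
    for body in bodies:
        people.append({
            "id": body["id"],
            "joints_3d": np.array(body["transformed_joints_3d"]).reshape(-1, 3),
            "joints_2d": np.array(body["projected_joints_2d"]).reshape(-1, 2),
            "joints_2d_vis": np.array(body["projected_joints_2d_vis"]),
            "bbox": np.array(body["bbox_clip"]),
        })
    return people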
docker pull qbxlvnf11docker/panoptic_dataset_env:latest
nvidia-docker run -it -p 9000:9000 -e GRANT_SUDO=yes --user root --name panoptic_dataset_env --shm-size=4G -v {folder}:/workspace -w /workspace qbxlvnf11docker/panoptic_dataset_env bash
- Select the dataset and camera ids to extract annotations for by editing the config file (a hypothetical sketch follows the command below)
python main.py --panoptic_config_file_path ./Panoptic_configs/Panoptic_annotations_builder_config.yaml
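A hypothetical sketch of what the config file might contain; the key names below are assumptions, and only the choices themselves (a sequence and camera ids) come from the description above. Check the shipped ./Panoptic_configs/Panoptic_annotations_builder_config.yaml for the real keys:

```yaml
# Hypothetical config sketch -- actual key names may differ.
dataset: 160224_haggling1   # sequence to extract annotations from
camera_ids:                 # HD views to process
  - 00_01
  - 00_02
```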
- In the download scripts below, the variable 'datasets' selects the sequences to download
- The variable 'nodes' selects the camera ids to download (see the sketch after the commands)
apt-get install wget
cd ./Panoptic_download_toolbox_scripts
./getData_list.sh
./extractAll_list.sh
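A hypothetical sketch of the 'datasets' and 'nodes' variables mentioned above, as they might appear near the top of getData_list.sh; the values are examples only:

```bash
# Hypothetical sketch -- check getData_list.sh for the real syntax and defaults.
datasets=("160422_ultimatum1" "160224_haggling1" "160226_haggling1")  # sequences to download
nodes=(1 2 3)                                                         # HD camera ids to download
```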
@article{joo2016panoptic,
  title={Panoptic Studio: A Massively Multiview System for Social Interaction Capture},
  author={Joo, Hanbyul and others},
  journal={arXiv},
  year={2016}
}
https://www.cs.cmu.edu/~hanbyulj/panoptic-studio/
https://paperswithcode.com/dataset/cmu-panoptic
https://github.com/CMU-Perceptual-Computing-Lab/panoptic-toolbox
https://github.com/microsoft/voxelpose-pytorch