Data Description

All the files mentioned below can be downloaded here.

Google Drive

Baidu Netdisk

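Valid data includes:

| Dataset | Pose Estimator | 3D Pose | 2D Pose | SMPL |
| --- | --- | --- | --- | --- |
| Sub-JHMDB | SimplePose | | ✓ | |
| 3DPW | EFT | ✓ | | ✓ |
| 3DPW | PARE | ✓ | | ✓ |
| 3DPW | SPIN | ✓ | | ✓ |
| Human3.6M | FCN | ✓ | | |
| AIST++ | SPIN | ✓ | | ✓ |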

All models use the same settings as their original papers (e.g., training dataset and hyperparameters). We tested these results ourselves for a fair comparison, and we made sure that each test dataset has no overlap with the data the corresponding model was trained on. Specifically, for SimplePose we used the '384x384_pose_resnet_101_d256d256d256' architecture with weights trained on MPII.

If you want to add your own dataset, please:

  • Organize the ground-truth data following our format (we recommend using the same format as described below) to generate \data\groundtruth_poses\[new_dataset]\[new_dataset]_gt_test.npz and \data\groundtruth_poses\[new_dataset]\[new_dataset]_gt_train.npz.

  • Organize the detected data following our format (we recommend using the same format as described below) to generate \data\detected_poses\[new_dataset]\[estimator]\[new_dataset]_[estimator]_test.npz and \data\detected_poses\[new_dataset]\[estimator]\[new_dataset]_[estimator]_train.npz.

  • Add the ground-truth data path, detected data path, keypoint number, and keypoint root in lib\core\config.py.

  • Write \lib\core\dataset\[new_dataset]_dataset.py, following the existing files under \lib\core\dataset\.
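As a rough illustration of the first two steps, the sketch below writes a ground-truth file with an imgname field and a joints_3d field whose rows are flattened joints. The dataset name `mydata`, the helper names `save_gt_npz` and `roundtrip_demo`, the image name pattern, and the choice of one array per sequence inside an object array are all assumptions made for this example; inspect one of the provided files (e.g. pw3d_gt_test.npz) to confirm the exact layout before relying on it.

```python
import os
import tempfile

import numpy as np


def save_gt_npz(path, imgnames, joints_3d):
    """Store per-sequence image names and flattened 3D joints in one .npz.

    imgnames:  list of sequences, each a list of "[sequence]/[image]" strings
    joints_3d: list of arrays, each of shape (sequence_length, num_joints * 3)
    """
    assert len(imgnames) == len(joints_3d)
    for names, joints in zip(imgnames, joints_3d):
        assert len(names) == joints.shape[0]

    # Assumption: object arrays hold sequences of different lengths.
    names_obj = np.empty(len(imgnames), dtype=object)
    joints_obj = np.empty(len(joints_3d), dtype=object)
    for i, (names, joints) in enumerate(zip(imgnames, joints_3d)):
        names_obj[i] = np.asarray(names)
        joints_obj[i] = joints.astype(np.float32)

    os.makedirs(os.path.dirname(path), exist_ok=True)
    np.savez(path, imgname=names_obj, joints_3d=joints_obj)


def roundtrip_demo():
    """Write a tiny file to a temporary directory and read it back."""
    rng = np.random.default_rng(0)
    # Hypothetical sequence and image names, 17 joints per frame.
    imgnames = [[f"seq_a/{i:05d}.jpg" for i in range(3)],
                [f"seq_b/{i:05d}.jpg" for i in range(5)]]
    joints = [rng.normal(size=(len(n), 17 * 3)) for n in imgnames]
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "data", "groundtruth_poses", "mydata",
                            "mydata_gt_test.npz")
        save_gt_npz(path, imgnames, joints)
        # Object arrays are pickled, so loading needs allow_pickle=True.
        with np.load(path, allow_pickle=True) as data:
            return [seq.shape for seq in data["joints_3d"]]
```

Calling `roundtrip_demo()` returns the per-sequence shapes read back from disk, which is a quick way to check that a newly generated file has the expected number of sequences and columns.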

How to convert custom data into our format:

  • Our 3D positions are root-relative 3D positions in meters; 2D positions are normalized 2D pixel positions in an image; SMPL parameters are the original outputs of the estimators (e.g., PARE).

  • To make it easier to convert raw output data into our format, we provide the following transformation functions.

    • For 2D poses, if the input 2D positions are in pixel coordinates, you can use normalize_screen_coordinates to normalize them into [-1, 1] before feeding them to the model for training and inference. Afterwards, you can use image_coordinates to convert the positions back to pixel units for error calculation and visualization.
    • For 3D poses, if the input 3D positions are in world coordinates, you can use world_to_camera and then subtract the root 3D position to get root-relative 3D positions in meters. We calculate MPJPE and Accel on root-relative 3D positions in millimeters. You can also use camera_to_world for visualization.
    • If you need 2D positions projected from 3D positions in camera coordinates, you can use project_to_2d with distortion parameters or project_to_2d_linear without distortion parameters.
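The conversions described above can be sketched in a few lines of NumPy. This is not the repository's code: the function names below are hypothetical stand-ins, and the 2D normalization assumes the aspect-ratio-preserving convention in which x maps to [-1, 1] and y maps to [-h/w, h/w]. The actual normalize_screen_coordinates and image_coordinates may differ, so treat this only as an illustration of the idea.

```python
import numpy as np


def normalize_2d_sketch(x, w, h):
    """Map pixel coordinates so that x spans [-1, 1], keeping the aspect ratio."""
    assert x.shape[-1] == 2
    return x / w * 2 - np.array([1.0, h / w])


def denormalize_2d_sketch(x, w, h):
    """Inverse of normalize_2d_sketch: back to pixel coordinates."""
    assert x.shape[-1] == 2
    return (x + np.array([1.0, h / w])) * w / 2


def root_relative(joints, root_index=0):
    """Subtract the root joint from every joint; joints has shape (T, J, 3)."""
    return joints - joints[:, root_index:root_index + 1, :]


def mpjpe_mm(pred_m, gt_m):
    """Mean per-joint position error in millimeters for inputs in meters."""
    return float(np.linalg.norm(pred_m - gt_m, axis=-1).mean() * 1000.0)
```

A typical flow would normalize 2D detections before inference and denormalize the outputs afterwards, while 3D poses are made root-relative (in meters) first and only scaled to millimeters when the error is reported.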

3DPW

The structure of the data should look like this:

|-- data
    |-- groundtruth_poses
        |-- pw3d 
            |-- pw3d_gt_test.npz
            |-- pw3d_gt_train.npz
        |-- ...
    |-- detected_poses
        |-- pw3d
            |-- spin
                |-- pw3d_spin_test.npz
                |-- pw3d_spin_train.npz
            |-- pare
                |-- pw3d_pare_test.npz
                |-- pw3d_pare_train.npz
            |-- eft
                |-- pw3d_eft_test.npz
                |-- pw3d_eft_train.npz
        |-- ...
    |-- checkpoints
    |-- smpl
    |-- videos
  • pw3d_gt_test.npz

    For ease of use, we processed the raw test set of the 3DPW dataset and stored only its valid poses (campose_valid==1).

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Strings containing the sequence and image name in the format [sequence_name]/[image_name]. The list has 37 entries, in the sequence order shown below. A duplicated sequence name means that there are two people in that video sequence. There are 35515 frames in total. The parameters shape, pose, and joints_3d follow the same order as imgname.

      downtown_enterShop_00
      flat_packBags_00
      downtown_walkBridge_01
      downtown_bus_00
      downtown_bus_00
      downtown_weeklyMarket_00
      downtown_walkUphill_00
      downtown_warmWelcome_00
      downtown_warmWelcome_00
      office_phoneCall_00
      office_phoneCall_00
      downtown_crossStreets_00
      downtown_crossStreets_00
      downtown_upstairs_00
      downtown_stairs_00
      downtown_walking_00
      downtown_walking_00
      downtown_downstairs_00
      downtown_car_00
      downtown_car_00
      flat_guitar_01
      downtown_arguing_00
      downtown_arguing_00
      downtown_runForBus_00
      downtown_runForBus_00
      downtown_rampAndStairs_00
      downtown_rampAndStairs_00
      downtown_windowShopping_00
      downtown_cafe_00
      downtown_cafe_00
      downtown_bar_00
      downtown_bar_00
      downtown_sitOnStairs_00
      downtown_sitOnStairs_00
      downtown_runForBus_01
      downtown_runForBus_01
      outdoors_fencing_01
    • shape

      Ground-truth SMPL shape parameters. The shape of each sequence is corresponding_sequence_length*10.

    • pose

      Ground-truth SMPL pose parameters. The shape of each sequence is corresponding_sequence_length*72.

    • joints_3d

      Ground-truth 3D joint positions. The shape of each sequence is corresponding_sequence_length*(17*3). Joints are in Human3.6M format:

      'hip',  # 0
      'lhip',  # 1
      'lknee',  # 2
      'lankle',  # 3
      'rhip',  # 4
      'rknee',  # 5
      'rankle',  # 6
      'Spine (H36M)',  # 7
      'neck',  # 8
      'Head (H36M)',  # 9
      'headtop',  # 10
      'lshoulder',  # 11
      'lelbow',  # 12
      'lwrist',  # 13
      'rshoulder',  # 14
      'relbow',  # 15
      'rwrist',  # 16
  • pw3d_gt_train.npz

    For ease of use, we processed the raw training set of the 3DPW dataset and stored only its valid poses (campose_valid==1).

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Strings containing the sequence and image name in the format [sequence_name]/[image_name]. The list has 34 entries, in the sequence order shown below. A duplicated sequence name means that there are two people in that video sequence. There are 22735 frames in total. The parameters shape, pose, and joints_3d follow the same order as imgname.

      outdoors_freestyle_00
      courtyard_laceShoe_00
      courtyard_bodyScannerMotions_00
      courtyard_capoeira_00
      courtyard_capoeira_00
      courtyard_relaxOnBench_00
      courtyard_giveDirections_00
      courtyard_giveDirections_00
      courtyard_box_00
      outdoors_climbing_02
      outdoors_slalom_01
      courtyard_arguing_00
      courtyard_arguing_00
      outdoors_climbing_00
      courtyard_shakeHands_00
      courtyard_shakeHands_00
      courtyard_relaxOnBench_01
      courtyard_captureSelfies_00
      courtyard_captureSelfies_00
      courtyard_golf_00
      courtyard_backpack_00
      outdoors_climbing_01
      courtyard_goodNews_00
      courtyard_goodNews_00
      courtyard_rangeOfMotions_00
      courtyard_rangeOfMotions_00
      courtyard_dancing_01
      courtyard_dancing_01
      courtyard_basketball_00
      courtyard_basketball_00
      outdoors_slalom_00
      courtyard_jacket_00
      courtyard_warmWelcome_00
      courtyard_warmWelcome_00
    • shape

      Ground-truth SMPL shape parameters. The shape of each sequence is corresponding_sequence_length*10.

    • pose

      Ground-truth SMPL pose parameters. The shape of each sequence is corresponding_sequence_length*72.

    • joints_3d

      Ground-truth 3D joint positions. The shape of each sequence is corresponding_sequence_length*(17*3). Joints are in Human3.6M format.

  • pw3d_spin_test.npz

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Same as in pw3d_gt_test.npz.

    • shape

      The predicted SMPL shape parameter, with the same format as pw3d_gt_test.npz

    • pose

      The predicted SMPL pose parameter, with the same format as pw3d_gt_test.npz

    • camera

      The predicted camera parameter. The shape of each sequence is corresponding_sequence_length*3.

    • joints_3d

      The predicted 3D joint position, with the same format as pw3d_gt_test.npz

  • pw3d_spin_train.npz

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Same as in pw3d_gt_train.npz.

    • shape

      The predicted SMPL shape parameter, with the same format as pw3d_gt_train.npz

    • pose

      The predicted SMPL pose parameter, with the same format as pw3d_gt_train.npz

    • camera

      The predicted camera parameter. The shape of each sequence is corresponding_sequence_length*3.

    • joints_3d

      The predicted 3D joint position, with the same format as pw3d_gt_train.npz

  • pw3d_pare_test.npz

    Same as pw3d_spin_test.npz.

  • pw3d_pare_train.npz

    Same as pw3d_spin_train.npz.

  • pw3d_eft_test.npz

    Same as pw3d_spin_test.npz.

  • pw3d_eft_train.npz

    Same as pw3d_spin_train.npz.
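To work with the 3DPW fields described above, it can help to reshape each flattened row of joints_3d into named joints and to confirm that a detected file lists the same frames as its ground-truth counterpart. The sketch below assumes the 51 values per frame are stored joint by joint (x, y, z of joint 0, then joint 1, and so on), which the description does not state explicitly. All helper names are hypothetical.

```python
import numpy as np

# Joint order as listed for joints_3d above (Human3.6M format, 17 joints).
H36M_JOINTS = [
    'hip', 'lhip', 'lknee', 'lankle', 'rhip', 'rknee', 'rankle',
    'Spine (H36M)', 'neck', 'Head (H36M)', 'headtop',
    'lshoulder', 'lelbow', 'lwrist', 'rshoulder', 'relbow', 'rwrist',
]


def unflatten_joints(flat, num_joints=17, dim=3):
    """Reshape (T, num_joints * dim) rows into (T, num_joints, dim).

    Assumes a joint-major layout: x, y, z of joint 0, then joint 1, ...
    """
    flat = np.asarray(flat)
    assert flat.shape[-1] == num_joints * dim
    return flat.reshape(-1, num_joints, dim)


def joint_track(joints, name):
    """Return the (T, 3) trajectory of one named joint."""
    return joints[:, H36M_JOINTS.index(name), :]


def sequence_name(imgname):
    """Extract [sequence_name] from '[sequence_name]/[image_name]'."""
    return imgname.split('/')[0]


def same_frames(gt_imgnames, det_imgnames):
    """Check that a detected file lists the same frames in the same order."""
    return list(gt_imgnames) == list(det_imgnames)
```

Running `same_frames` on the imgname fields of, say, pw3d_gt_test.npz and pw3d_spin_test.npz before computing any metric is a cheap guard against comparing misaligned frames.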

Human3.6M

The structure of the data should look like this:

|-- data
    |-- groundtruth_poses
        |-- h36m 
            |-- h36m_gt_test.npz
            |-- h36m_gt_train.npz
        |-- ...
    |-- detected_poses
        |-- h36m
            |-- fcn
                |-- h36m_fcn_test.npz
                |-- h36m_fcn_train.npz
        |-- ...

  • h36m_gt_test.npz

    For ease of use, we processed the raw test set of the Human3.6M dataset and stored only its valid poses.

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Strings containing the subject ID, action name, camera ID, and image ID in the format S[subject_id]/[action_name]/camera[camera_id]/[image_id]. The list has 236 entries. There are 543344 frames in total. joints_3d follows the same order as imgname. The camera parameters are in the same order as in the dictionaries shown below.

      h36m_cameras_intrinsic_params = [
          {
              'id': '54138969',
              'center': [512.54150390625, 515.4514770507812],
              'focal_length': [1145.0494384765625, 1143.7811279296875],
              'radial_distortion': [-0.20709891617298126, 0.24777518212795258, -0.0030751503072679043],
              'tangential_distortion': [-0.0009756988729350269, -0.00142447161488235],
              'res_w': 1000,
              'res_h': 1002,
              'azimuth': 70,  # Only used for visualization
          },
          {
              'id': '55011271',
              'center': [508.8486328125, 508.0649108886719],
              'focal_length': [1149.6756591796875, 1147.5916748046875],
              'radial_distortion': [-0.1942136287689209, 0.2404085397720337, 0.006819975562393665],
              'tangential_distortion': [-0.0016190266469493508, -0.0027408944442868233],
              'res_w': 1000,
              'res_h': 1000,
              'azimuth': -70,  # Only used for visualization
          },
          {
              'id': '58860488',
              'center': [519.8158569335938, 501.40264892578125],
              'focal_length': [1149.1407470703125, 1148.7989501953125],
              'radial_distortion': [-0.2083381861448288, 0.25548800826072693, -0.0024604974314570427],
              'tangential_distortion': [0.0014843869721516967, -0.0007599993259645998],
              'res_w': 1000,
              'res_h': 1000,
              'azimuth': 110,  # Only used for visualization
          },
          {
              'id': '60457274',
              'center': [514.9682006835938, 501.88201904296875],
              'focal_length': [1145.5113525390625, 1144.77392578125],
              'radial_distortion': [-0.198384091258049, 0.21832367777824402, -0.008947807364165783],
              'tangential_distortion': [-0.0005872055771760643, -0.0018133620033040643],
              'res_w': 1000,
              'res_h': 1002,
              'azimuth': -110,  # Only used for visualization
          },
      ]
      
      h36m_cameras_extrinsic_params = {
          'S1': [
              {
                  'orientation': [0.1407056450843811, -0.1500701755285263, -0.755240797996521, 0.6223280429840088],
                  'translation': [1841.1070556640625, 4955.28466796875, 1563.4454345703125],
              },
              {
                  'orientation': [0.6157187819480896, -0.764836311340332, -0.14833825826644897, 0.11794740706682205],
                  'translation': [1761.278564453125, -5078.0068359375, 1606.2650146484375],
              },
              {
                  'orientation': [0.14651472866535187, -0.14647851884365082, 0.7653023600578308, -0.6094175577163696],
                  'translation': [-1846.7777099609375, 5215.04638671875, 1491.972412109375],
              },
              {
                  'orientation': [0.5834008455276489, -0.7853162288665771, 0.14548823237419128, -0.14749594032764435],
                  'translation': [-1794.7896728515625, -3722.698974609375, 1574.8927001953125],
              },
          ],
          'S5': [
              {
                  'orientation': [0.1467377245426178, -0.162370964884758, -0.7551892995834351, 0.6178938746452332],
                  'translation': [2097.3916015625, 4880.94482421875, 1605.732421875],
              },
              {
                  'orientation': [0.6159758567810059, -0.7626792192459106, -0.15728192031383514, 0.1189815029501915],
                  'translation': [2031.7008056640625, -5167.93310546875, 1612.923095703125],
              },
              {
                  'orientation': [0.14291371405124664, -0.12907841801643372, 0.7678384780883789, -0.6110143065452576],
                  'translation': [-1620.5948486328125, 5171.65869140625, 1496.43701171875],
              },
              {
                  'orientation': [0.5920479893684387, -0.7814217805862427, 0.1274748593568802, -0.15036417543888092],
                  'translation': [-1637.1737060546875, -3867.3173828125, 1547.033203125],
              },
          ],
          'S6': [
              {
                  'orientation': [0.1337897777557373, -0.15692396461963654, -0.7571090459823608, 0.6198879480361938],
                  'translation': [1935.4517822265625, 4950.24560546875, 1618.0838623046875],
              },
              {
                  'orientation': [0.6147197484970093, -0.7628812789916992, -0.16174767911434174, 0.11819244921207428],
                  'translation': [1969.803955078125, -5128.73876953125, 1632.77880859375],
              },
              {
                  'orientation': [0.1529948115348816, -0.13529130816459656, 0.7646096348762512, -0.6112781167030334],
                  'translation': [-1769.596435546875, 5185.361328125, 1476.993408203125],
              },
              {
                  'orientation': [0.5916101336479187, -0.7804774045944214, 0.12832270562648773, -0.1561593860387802],
                  'translation': [-1721.668701171875, -3884.13134765625, 1540.4879150390625],
              },
          ],
          'S7': [
              {
                  'orientation': [0.1435241848230362, -0.1631336808204651, -0.7548328638076782, 0.6188824772834778],
                  'translation': [1974.512939453125, 4926.3544921875, 1597.8326416015625],
              },
              {
                  'orientation': [0.6141672730445862, -0.7638262510299683, -0.1596645563840866, 0.1177929937839508],
                  'translation': [1937.0584716796875, -5119.7900390625, 1631.5665283203125],
              },
              {
                  'orientation': [0.14550060033798218, -0.12874816358089447, 0.7660516500473022, -0.6127139329910278],
                  'translation': [-1741.8111572265625, 5208.24951171875, 1464.8245849609375],
              },
              {
                  'orientation': [0.5912848114967346, -0.7821764349937439, 0.12445473670959473, -0.15196487307548523],
                  'translation': [-1734.7105712890625, -3832.42138671875, 1548.5830078125],
              },
          ],
          'S8': [
              {
                  'orientation': [0.14110587537288666, -0.15589867532253265, -0.7561917304992676, 0.619644045829773],
                  'translation': [2150.65185546875, 4896.1611328125, 1611.9046630859375],
              },
              {
                  'orientation': [0.6169601678848267, -0.7647668123245239, -0.14846350252628326, 0.11158157885074615],
                  'translation': [2219.965576171875, -5148.453125, 1613.0440673828125],
              },
              {
                  'orientation': [0.1471444070339203, -0.13377119600772858, 0.7670128345489502, -0.6100369691848755],
                  'translation': [-1571.2215576171875, 5137.0185546875, 1498.1761474609375],
              },
              {
                  'orientation': [0.5927824378013611, -0.7825870513916016, 0.12147816270589828, -0.14631995558738708],
                  'translation': [-1476.913330078125, -3896.7412109375, 1547.97216796875],
              },
          ],
          'S9': [
              {
                  'orientation': [0.15540587902069092, -0.15548215806484222, -0.7532095313072205, 0.6199594736099243],
                  'translation': [2044.45849609375, 4935.1171875, 1481.2275390625],
              },
              {
                  'orientation': [0.618784487247467, -0.7634735107421875, -0.14132238924503326, 0.11933968216180801],
                  'translation': [1990.959716796875, -5123.810546875, 1568.8048095703125],
              },
              {
                  'orientation': [0.13357827067375183, -0.1367100477218628, 0.7689454555511475, -0.6100738644599915],
                  'translation': [-1670.9921875, 5211.98583984375, 1528.387939453125],
              },
              {
                  'orientation': [0.5879399180412292, -0.7823407053947449, 0.1427614390850067, -0.14794869720935822],
                  'translation': [-1696.04345703125, -3827.099853515625, 1591.4127197265625],
              },
          ],
          'S11': [
              {
                  'orientation': [0.15232472121715546, -0.15442320704460144, -0.7547563314437866, 0.6191070079803467],
                  'translation': [2098.440185546875, 4926.5546875, 1500.278564453125],
              },
              {
                  'orientation': [0.6189449429512024, -0.7600917220115662, -0.15300633013248444, 0.1255258321762085],
                  'translation': [2083.182373046875, -4912.1728515625, 1561.07861328125],
              },
              {
                  'orientation': [0.14943228662014008, -0.15650227665901184, 0.7681233882904053, -0.6026304364204407],
                  'translation': [-1609.8153076171875, 5177.3359375, 1537.896728515625],
              },
              {
                  'orientation': [0.5894251465797424, -0.7818877100944519, 0.13991211354732513, -0.14715361595153809],
                  'translation': [-1590.738037109375, -3854.1689453125, 1578.017578125],
              },
          ],
      }
      
      
    • joints_3d

      Ground-truth 3D joint positions. The shape of each sequence is corresponding_sequence_length*(17*3). Joints are in Human3.6M format.

  • h36m_gt_train.npz

    For ease of use, we processed the raw training set of the Human3.6M dataset and stored only its valid poses.

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Strings containing the subject ID, action name, camera ID, and image ID in the format S[subject_id]/[action_name]/camera[camera_id]/[image_id]. The list has 600 entries. There are 1559752 frames in total. joints_3d follows the same order as imgname.

    • joints_3d

      Ground-truth 3D joint positions. The shape of each sequence is corresponding_sequence_length*(17*3). Joints are in Human3.6M format.

  • h36m_fcn_test.npz

    • imgname

      Same as in h36m_gt_test.npz.

    • joints_3d

      Predicted 3D joint position. The shape of each sequence is corresponding_sequence_length*(17*3). Joints are in Human3.6M-format.

  • h36m_fcn_train.npz

    • imgname

      Same as in h36m_gt_train.npz.

    • joints_3d

      Predicted 3D joint position. The shape of each sequence is corresponding_sequence_length*(17*3). Joints are in Human3.6M-format.
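The camera dictionaries listed for h36m_gt_test.npz can be used to move points from world to camera coordinates and to project them into the image. The sketch below is a plain NumPy illustration, not the repository's world_to_camera or project_to_2d_linear. It assumes that `orientation` is a unit quaternion in (w, x, y, z) order that rotates camera axes into world axes, that `translation` is the camera position in the same units as the points (the listed values look like millimeters, which is also an assumption), and that lens distortion is ignored.

```python
import numpy as np


def quat_rotate(q, v):
    """Rotate points v (N, 3) by a unit quaternion q = (w, x, y, z)."""
    q = np.asarray(q, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    qvec = np.broadcast_to(q[1:], v.shape)
    uv = np.cross(qvec, v)
    uuv = np.cross(qvec, uv)
    return v + 2.0 * (q[0] * uv + uuv)


def world_to_camera_sketch(points_world, orientation, translation):
    """Express world-space points (N, 3) in the camera frame.

    Assumes `orientation` rotates camera axes into world axes, so the
    inverse (conjugate) rotation is applied after removing the translation.
    """
    q = np.asarray(orientation, dtype=np.float64)
    q_inv = q * np.array([1.0, -1.0, -1.0, -1.0])
    offset = np.asarray(points_world, dtype=np.float64) - np.asarray(translation)
    return quat_rotate(q_inv, offset)


def project_linear_sketch(points_cam, focal_length, center):
    """Pinhole projection without distortion; returns pixel coordinates."""
    points_cam = np.asarray(points_cam, dtype=np.float64)
    xy = points_cam[..., :2] / points_cam[..., 2:3]
    return xy * np.asarray(focal_length) + np.asarray(center)
```

With these conventions, the camera position itself maps to the origin of the camera frame, and a point on the optical axis projects to the `center` entry of the intrinsic parameters.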

AIST++

The structure of the data should look like this:

|-- data
    |-- groundtruth_poses
        |-- aist
            |-- aist_gt_test.npz
            |-- aist_gt_train.npz
        |-- ...
    |-- detected_poses
        |-- aist
            |-- spin
                |-- aist_spin_test.npz
                |-- aist_spin_train.npz
        |-- ...

  • aist_gt_test.npz

    For ease of use, we processed the raw test set of the AIST++ dataset and stored only its valid poses.

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Strings containing the sequence name and image ID in the format [sequence_name]/[image_id]. The list has 3840 entries. There are 2882640 frames in total. The parameters pose, trans, scaling, and joints_3d follow the same order as imgname.

    • pose

      Ground-truth SMPL pose parameters. The shape of each sequence is corresponding_sequence_length*72.

    • trans

      Ground-truth 3D motion trajectory. The shape of each sequence is corresponding_sequence_length*3.

    • scaling

      Ground-truth human body scaling factor. A scalar value for each sequence.

    • joints_3d

      Ground-truth 3D joint positions. The shape of each sequence is corresponding_sequence_length*(14*3). The order of the joints is as follows:

      "rankle",    # 0
      "rknee",     # 1 
      "rhip",      # 2 
      "lhip",      # 3 
      "lknee",     # 4 
      "lankle",    # 5  
      "rwrist",    # 6 
      "relbow",    # 7  
      "rshoulder", # 8  
      "lshoulder", # 9  
      "lelbow",    # 10  
      "lwrist",    # 11  
      "neck",      # 12  
      "headtop",   # 13  
      
  • aist_gt_train.npz

    For ease of use, we processed the raw training set of the AIST++ dataset and stored only its valid poses.

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Strings containing the sequence name and image ID in the format [sequence_name]/[image_id]. The list has 7292 entries. There are 5916474 frames in total. The parameters pose, trans, scaling, and joints_3d follow the same order as imgname.

    • pose

      Ground-truth SMPL pose parameters. The shape of each sequence is corresponding_sequence_length*72.

    • trans

      Ground-truth 3D motion trajectory. The shape of each sequence is corresponding_sequence_length*3.

    • scaling

      Ground-truth human body scaling factor. A scalar value for each sequence.

    • joints_3d

      Ground-truth 3D joint positions. The shape of each sequence is corresponding_sequence_length*(14*3). The order of the joints is the same as in aist_gt_test.npz.

  • aist_spin_test.npz

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Same as in aist_gt_test.npz.

    • shape

      The predicted SMPL shape parameter.

    • pose

      The predicted SMPL pose parameter, with the same format as aist_gt_test.npz

    • camera

      The predicted camera parameter. The shape of each sequence is corresponding_sequence_length*3.

    • joints_3d

      The predicted 3D joint position, with the same format as aist_gt_test.npz

  • aist_spin_train.npz

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Same as in aist_gt_train.npz.

    • shape

      The predicted SMPL shape parameter.

    • pose

      The predicted SMPL pose parameter, with the same format as aist_gt_train.npz

    • camera

      The predicted camera parameter. The shape of each sequence is corresponding_sequence_length*3.

    • joints_3d

      The predicted 3D joint position, with the same format as aist_gt_train.npz
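Because the AIST++ files pair predicted and ground-truth joints_3d in the same 14-joint order, they are a natural input for the Accel metric mentioned earlier. The sketch below uses one common definition, the mean norm of the difference between second-order finite differences of the predicted and ground-truth joints. The repository's own implementation may differ, the helper names are hypothetical, and the inputs are assumed to be in meters with the 42 values per frame stored joint by joint.

```python
import numpy as np

# Joint order as listed for the AIST++ joints_3d field above (14 joints).
AIST_JOINTS = [
    'rankle', 'rknee', 'rhip', 'lhip', 'lknee', 'lankle', 'rwrist',
    'relbow', 'rshoulder', 'lshoulder', 'lelbow', 'lwrist', 'neck', 'headtop',
]


def unflatten_aist(flat):
    """Reshape (T, 14 * 3) rows into (T, 14, 3), assuming joint-major layout."""
    flat = np.asarray(flat, dtype=np.float64)
    assert flat.shape[-1] == len(AIST_JOINTS) * 3
    return flat.reshape(-1, len(AIST_JOINTS), 3)


def acceleration(joints):
    """Second-order finite difference over time; joints is (T, J, 3)."""
    return joints[2:] - 2.0 * joints[1:-1] + joints[:-2]


def accel_error_mm(pred_m, gt_m):
    """Mean acceleration error in mm per frame^2 for inputs in meters."""
    diff = acceleration(pred_m) - acceleration(gt_m)
    return float(np.linalg.norm(diff, axis=-1).mean() * 1000.0)
```

Under this definition, a prediction that alternates 1 mm above and below a smoothly moving ground truth on one axis yields an error of 4 mm per frame squared, which shows how strongly the metric penalizes frame-to-frame jitter.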

Sub-JHMDB

The structure of the data should look like this:

|-- data
    |-- groundtruth_poses
        |-- jhmdb
            |-- jhmdb_gt_test.npz
            |-- jhmdb_gt_train.npz
        |-- ...
    |-- detected_poses
        |-- jhmdb
            |-- simplepose
                |-- jhmdb_simplepose_test.npz
                |-- jhmdb_simplepose_train.npz
        |-- ...

  • jhmdb_gt_test.npz

    For ease of use, we processed the raw test set of the Sub-JHMDB dataset and stored only its valid poses.

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Strings containing the action name, sequence name, and image ID in the format [action_name]/[sequence_name]/[image_id]. The list has 261 entries. There are 9228 frames in total. joints_2d follows the same order as imgname.

    • joints_2d

      Ground-truth 2D joint positions. The shape of each sequence is corresponding_sequence_length*(15*2). The order of the joints is as follows:

      1: neck
      2: belly
      3: face
      4: right shoulder
      5: left  shoulder
      6: right hip
      7: left  hip
      8: right elbow
      9: left elbow
      10: right knee
      11: left knee
      12: right wrist
      13: left wrist
      14: right ankle
      15: left ankle
      
  • jhmdb_gt_train.npz

    For ease of use, we processed the raw training set of the Sub-JHMDB dataset and stored only its valid poses.

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Strings containing the action name, sequence name, and image ID in the format [action_name]/[sequence_name]/[image_id]. The list has 687 entries. There are 24372 frames in total. joints_2d follows the same order as imgname.

    • joints_2d

      Ground-truth 2D joint positions. The shape of each sequence is corresponding_sequence_length*(15*2).

  • jhmdb_simplepose_test.npz

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Same as in jhmdb_gt_test.npz.

    • joints_2d

      Predicted 2D joint position. The shape of each sequence is corresponding_sequence_length*(15*2).

  • jhmdb_simplepose_train.npz

    The .npz-file contains a dictionary with the following fields:

    • imgname

      Same as in jhmdb_gt_train.npz.

    • joints_2d

      Predicted 2D joint position. The shape of each sequence is corresponding_sequence_length*(15*2).
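The Sub-JHMDB joint list above is numbered from 1, while array indices start at 0, so a small lookup helps avoid off-by-one mistakes when slicing joints_2d. The sketch below assumes the stored arrays follow the listed order and that the 30 values per frame are stored joint by joint (x, y of joint 1, then joint 2, and so on). The helper names are hypothetical.

```python
import numpy as np

# Joint names in the order listed above; the list numbers them from 1.
JHMDB_JOINTS = [
    'neck', 'belly', 'face',
    'right shoulder', 'left shoulder', 'right hip', 'left hip',
    'right elbow', 'left elbow', 'right knee', 'left knee',
    'right wrist', 'left wrist', 'right ankle', 'left ankle',
]


def joint_index(name):
    """0-based array index of a joint that is numbered from 1 in the list."""
    return JHMDB_JOINTS.index(name)


def unflatten_2d(flat):
    """Reshape (T, 15 * 2) rows into (T, 15, 2), assuming joint-major layout."""
    flat = np.asarray(flat)
    assert flat.shape[-1] == len(JHMDB_JOINTS) * 2
    return flat.reshape(-1, len(JHMDB_JOINTS), 2)


def joint_track_2d(joints, name):
    """Return the (T, 2) pixel trajectory of one named joint."""
    return joints[:, joint_index(name), :]
```

For example, 'left ankle' is item 15 in the list but index 14 in the reshaped array.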