How do I trim padding off of a frame? #851

oczkoisse · 2021-08-11T01:43:12Z

Overview

I'm trying to render a yuvj420p frame as an OpenGL texture. In my testing, frames that do not have padding work great. However, frames with padding are a mess. To prepare the data for the texture, I'm using the following function, which is mostly the same as the useful_array function in av/video/frame.pyx.

def _remove_padding(self, plane):
    """Remove padding from a video plane.

    Args:
        plane (av.video.plane.VideoPlane): the plane to remove padding from

    Returns:
        numpy.array: an array with proper memory aligned width
    """
    buf_width = plane.line_size
    bytes_per_pixel = 1
    frame_width = plane.width * bytes_per_pixel
    arr = np.frombuffer(plane, np.uint8)
    if buf_width != frame_width:
        arr = arr.reshape(-1, buf_width)[:, :frame_width]
    return arr.reshape(-1, frame_width)

With frames that have padding, the above function does not work properly.
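One way to narrow this down is to run the same slicing logic on a synthetic buffer whose padding bytes are known. The sketch below is only an illustration: `remove_padding` is a standalone, hypothetical variant of the method above that takes `line_size` and `width` as plain integers instead of reading them from a PyAV plane, and the sizes are made up.

```python
import numpy as np

def remove_padding(buf, line_size, width):
    """Drop per-row padding from a flat byte buffer (1 byte per pixel assumed)."""
    arr = np.frombuffer(buf, dtype=np.uint8)
    if line_size != width:
        # View the buffer as rows of line_size bytes, keep the first `width` of each.
        arr = arr.reshape(-1, line_size)[:, :width]
    return arr.reshape(-1, width)

# Synthetic plane: 3 rows of 5 visible bytes, each row padded to 8 bytes with 255.
line_size, width, height = 8, 5, 3
padded = np.full((height, line_size), 255, dtype=np.uint8)
padded[:, :width] = np.arange(height * width, dtype=np.uint8).reshape(height, width)

trimmed = remove_padding(padded.tobytes(), line_size, width)
print(trimmed.shape)           # (3, 5)
print((trimmed == 255).any())  # False: no padding byte survives the slice
```

If the values come out right on a buffer like this, the slicing itself is probably not what scrambles the image, and the width values or the way the array is consumed afterwards are better suspects.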

Expected behavior

The given function should be able to remove any padding from the frame and be rendered as a texture properly.

Actual behavior

The output frame is completely messed up.

Traceback:

N/A

Investigation

I felt that the above behavior may be a result of plane.width being off by some pixels. In my case, the width is 1198 px for which 16 pixel alignment would be 1200 px. So, I adjusted the above function to align the width value:

def _remove_padding(self, plane):
    """Remove padding from a video plane.

    Args:
        plane (av.video.plane.VideoPlane): the plane to remove padding from

    Returns:
        numpy.array: an array with proper memory aligned width
    """
    buf_width = plane.line_size
    bytes_per_pixel = 1
    frame_width = plane.width * bytes_per_pixel
    arr = np.frombuffer(plane, np.uint8)
    if buf_width != frame_width:
        align_to = 16
        # Frame width that is aligned up with a 16 pixel boundary
        # See avcodec_align_dimensions2 function and FFALIGN macro
        frame_width = (frame_width + align_to - 1) & ~(align_to - 1)
        # Slice (create a view) at the aligned boundary
        arr = arr.reshape(-1, buf_width)[:, :frame_width]
    return arr.reshape(-1, frame_width)

Now at least Y plane seems to be aligned properly, but chroma planes still are not. In my case, the chroma width is 599 which when adjusted by the above function comes out to be 608.

If I adjust the chroma planes' width manually to 600, it seems to work.
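The widths mentioned above can be checked with a small sketch of the round-up used in the adjusted function. `align_up` is a hypothetical helper name; the expression is the same one as in the code above.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

luma_width, chroma_width = 1198, 599

for alignment in (4, 8, 16):
    print(alignment, align_up(luma_width, alignment), align_up(chroma_width, alignment))
# 4 1200 600
# 8 1200 600
# 16 1200 608
```

Both widths that rendered correctly (1200 and 600) are what a 4-byte or 8-byte round-up gives, while 16 only matches for the luma plane. One possible reading, which is an assumption and not something confirmed in this thread, is that the padding that matters is the row alignment expected by whatever consumes the data rather than FFmpeg's internal alignment. OpenGL's default GL_UNPACK_ALIGNMENT, for example, is 4.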

Additional context

This answer on SO seems useful.

I also tried using frame.to_ndarray() directly, with the same results.

def frame_to_ycbcr(self, frame):
    frame_data = frame.to_ndarray(format=frame.format.name)
    start_offset = 0
    end_offset = frame.height
    y_data = frame_data[start_offset:end_offset, :]
    start_offset = end_offset
    end_offset = (5 * frame.height) // 4
    cb_data = frame_data[start_offset:end_offset, :].reshape(-1, frame.width // 2)
    start_offset = end_offset
    end_offset = None
    cr_data = frame_data[start_offset:, :].reshape(-1, frame.width // 2)
    return y_data, cb_data, cr_data
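The row offsets used in frame_to_ycbcr can be sanity-checked without a video file. The sketch below assumes that the array holds the Y, Cb and Cr planes flattened back to back and then cut into rows of `width` bytes; that layout is an assumption here. `split_i420` and the tiny dimensions are hypothetical.

```python
import numpy as np

def split_i420(frame_data, width, height):
    """Split a (height * 3 // 2, width) planar 4:2:0 array into Y, Cb, Cr.

    Assumes height is divisible by 4, so that each chroma plane ends
    exactly on a row boundary of frame_data.
    """
    chroma_rows = height // 4  # full-width rows taken up by one chroma plane
    y = frame_data[:height]
    cb = frame_data[height:height + chroma_rows].reshape(height // 2, width // 2)
    cr = frame_data[height + chroma_rows:].reshape(height // 2, width // 2)
    return y, cb, cr

width, height = 8, 4
y_src = np.full((height, width), 1, dtype=np.uint8)
cb_src = np.full((height // 2, width // 2), 2, dtype=np.uint8)
cr_src = np.full((height // 2, width // 2), 3, dtype=np.uint8)

# Assumed layout: planes flattened back to back, then reshaped to rows of `width`.
frame_data = np.hstack((y_src.ravel(), cb_src.ravel(), cr_src.ravel())).reshape(-1, width)
y, cb, cr = split_i420(frame_data, width, height)
print(frame_data.shape, cb.shape)  # (6, 8) (2, 4)
```

Under that layout, the Cb plane ends at row height + height // 4, which equals (5 * height) // 4 when the height is a multiple of 4. For other heights a chroma plane ends in the middle of a row, so splitting a flattened array by byte offsets would be the safer route.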

FFALIGN macro
avcodec_align_dimensions2

oczkoisse (Author) · 2021-08-11T02:48:14Z

I'm attaching the sample video:

RGB.video.mov

jlaine (Maintainer) · 2021-12-30T15:35:54Z

Is anything expected from the maintainers here or is this just for future reference from users?

oczkoisse (Author) · 2021-12-30T18:56:31Z

I was hoping that a maintainer or someone experienced with PyAV/ffmpeg would be willing to help with how to trim padding off of frames.

It seems that the _remove_padding function, adapted from the useful_array function in PyAV, isn't trimming padding correctly because the plane width returned by PyAV is incorrect. Guessing the plane width works in this particular case, but it's not clear how to generalize this across all kinds of videos that may or may not have padded frames. It only shows that choosing a plane width different from the one currently returned by PyAV may be the right thing to do in some cases.

Now, I could be doing this all wrong in the first place, hence the request for help. But if I'm not, then it may be a bug in PyAV which may be worth looking into.

jlaine (Maintainer) · 2021-12-31T14:30:48Z

OK, I've just enabled the "discussions" feature; this seems like a good place to engage with community members.

FirefoxMetzger · 2022-02-28T08:17:51Z

Yep, alignment problems are a pain to debug if you are unfamiliar with numpy internals. Try this and see if it solves the problem:

def _remove_padding(self, plane):
    """Remove padding from a video plane.

    Args:
        plane (av.video.plane.VideoPlane): the plane to remove padding from

    Returns:
        numpy.array: an array with proper memory aligned width
    """
    buf_width = plane.line_size
    bytes_per_pixel = 1
    frame_width = plane.width * bytes_per_pixel
    arr = np.frombuffer(plane, np.uint8)
    if buf_width != frame_width:
        arr = arr.reshape(-1, buf_width)[:, :frame_width]
    return np.ascontiguousarray(arr.reshape(-1, frame_width))  # I changed this line
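To illustrate why the extra copy can matter, here is a small sketch on a synthetic buffer (made-up sizes, not a real PyAV plane). Slicing off the padding produces a strided view over the original padded memory, and reshaping it to the shape it already has does not change that. `np.ascontiguousarray` packs the rows into a fresh buffer.

```python
import numpy as np

line_size, width, height = 8, 5, 3
buf = np.arange(height * line_size, dtype=np.uint8)

# Same steps as in the function: slice off the padding, then reshape.
view = buf.reshape(-1, line_size)[:, :width].reshape(-1, width)
packed = np.ascontiguousarray(view)

print(view.flags["C_CONTIGUOUS"], view.strides)      # False (8, 1)
print(packed.flags["C_CONTIGUOUS"], packed.strides)  # True (5, 1)
```

Both arrays compare equal element by element, but anything that reads the underlying memory row after row with a stride of `width` bytes would only see the intended data in `packed`.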

Regarding the approach of using to_ndarray on a yuv420p frame: I never understood why PyAV chooses to return an array of this shape. Luma can be reconstructed somewhat easily, but chroma is very awkward to get from this shape:

def frame_to_ycbcr(self, frame):
    frame_data = frame.to_ndarray()  # no need to pass the format explicitly
    start_offset = 0
    end_offset = frame.height
    y_data = frame_data[start_offset:end_offset, :]  # y-plane gets reconstructed correctly
    start_offset = end_offset
    # each chroma plane fills frame.height // 4 full-width rows of frame_data
    end_offset = start_offset + frame.height // 4
    cb_data = frame_data[start_offset:end_offset, :].reshape(-1, frame.width // 2)
    start_offset = end_offset
    cr_data = frame_data[start_offset:, :].reshape(-1, frame.width // 2)
    return y_data, cb_data, cr_data

I didn't test this though, so take it with some salt. The logic is that chroma is subsampled by a factor of 2 in both height and width, so each chroma plane has frame.height // 2 rows of frame.width // 2 samples. Since the rows of frame_data are frame.width wide, there are 2 chroma rows in every 1 row of frame_data, so the total number of rows in frame_data that belong to one chroma plane should be (frame.height // 2) // 2, i.e. frame.height // 4. That puts the end of the Cb plane at frame.height + frame.height // 4, which is the same boundary as (5 * frame.height) // 4 in your original code whenever the height is a multiple of 4.

Edit: You could sidestep all of this by promoting your frames to YUV444p: y_data, cb_data, cr_data = list(frame.reformat(format="yuv444p").to_ndarray())
