Basic doubts and basic ideas #16

jnebrera · 2023-08-14T16:15:27Z

Hi all, sorry, I first opened this as an issue without noticing you had Discussions enabled.

I'm currently using PyAV in Google Colab to decode a video file (audio + video), process each video frame with object detection, resize it, and finally re-encode the video while keeping the original audio.

Unfortunately, by using PyAV we have been hit with a big performance drop compared to the VidGear OpenCV video grabber (which lacks audio support).

I'm looking for a Python alternative that fully exploits hardware acceleration in both the decoding (NVDEC) and encoding (NVENC) parts, but also allows me to work with the audio (no processing needed, but I cannot lose it).

Is python-ffmpegio the right library to achieve this?

If yes, in the future I hope to grab the video from an RTSP camera and push the stream to an RTMP server like YouTube and such. Is this doable too?

And a final question, if both are OK: one thing I have in mind is to overlay HTML content onto the video stream. I'm aware ffmpeg allows image and text overlays, but I wonder if anybody knows of a similar capability for HTML content in ffmpeg.

Thank you very much for your help.

tikuma-lsuhsc · 2023-08-14T18:00:03Z

Maintainer

@jnebrera - I'm surprised PyAV lags OpenCV in performance. I thought both are based on the FFmpeg codebase (at one point at least).

If performance is the most important aspect of your project, ffmpegio isn't likely a solution for you because it trades off (small) overhead for convenience features. Also, the package is currently written for processing a single media stream. Handling multiple streams requires you to use the low-level functions, at which point you might as well use the built-in subprocess package yourself to eliminate the overhead entirely.

Is ffmpeg faster than pyav/opencv? Advocates of the latter will say they are faster because you can read/write more efficiently (less memory copying), but you may find that letting ffmpeg implicitly run multiple threads yields better results. I don't know. Worth comparing.

You need to learn FFmpeg and identify the command arguments that suit your needs, then spawn two sp.Popen processes, one to decode and the other to encode. If you're interested in learning more, I'd be happy to lay out a bit more info later.
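A minimal sketch of the two-process idea, video only, using plain `subprocess` rather than ffmpegio. Assumptions not taken from the thread: `ffmpeg` is on PATH and built with `h264_nvenc`, the input is 1920x1080 at 30 fps, and `process_frame` is a hypothetical callback that takes one raw rgb24 frame as bytes and returns a frame of the same size (wrap it with `numpy.frombuffer` if you need an array).

```python
import subprocess as sp

WIDTH, HEIGHT = 1920, 1080        # assumed frame size; probe the real file in practice
FRAME_BYTES = WIDTH * HEIGHT * 3  # rgb24 uses 3 bytes per pixel


def decoder_args(src):
    """FFmpeg command that decodes `src` and writes raw rgb24 frames to stdout."""
    return ["ffmpeg", "-loglevel", "error", "-i", src,
            "-map", "0:v:0", "-f", "rawvideo", "-pix_fmt", "rgb24", "pipe:1"]


def encoder_args(dst, fps=30):
    """FFmpeg command that reads raw rgb24 frames from stdin and encodes them with NVENC."""
    return ["ffmpeg", "-loglevel", "error", "-y",
            "-f", "rawvideo", "-pix_fmt", "rgb24",
            "-s", f"{WIDTH}x{HEIGHT}", "-r", str(fps), "-i", "pipe:0",
            "-c:v", "h264_nvenc", "-pix_fmt", "yuv420p", dst]


def run(src, dst, process_frame):
    """Decode -> Python callback -> encode, one frame at a time."""
    dec = sp.Popen(decoder_args(src), stdout=sp.PIPE)
    enc = sp.Popen(encoder_args(dst), stdin=sp.PIPE)
    while True:
        buf = dec.stdout.read(FRAME_BYTES)
        if len(buf) < FRAME_BYTES:  # end of stream (or a truncated last frame)
            break
        enc.stdin.write(process_frame(buf))
    enc.stdin.close()  # EOF tells the encoder to finalize the file
    dec.wait()
    enc.wait()
```

For example, `run("myvideo.mp4", "output.mp4", lambda frame: frame)` would simply re-encode the video; the audio is dropped in this sketch.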

If yes, in the future I hope to grab the video from an RTSP camera and push the stream to an RTMP server like YouTube and such. Is this doable too?

If FFmpeg can do it and you can formulate the command line options, this package can as well.

And a final question, if both are OK: one thing I have in mind is to overlay HTML content onto the video stream. I'm aware ffmpeg allows image and text overlays, but I wonder if anybody knows of a similar capability for HTML content in ffmpeg.

Yes.

Do some research online about ffmpeg and see what kind of info you can gather. I'd be happy to help you put it together.

jnebrera · 2023-08-14T20:16:58Z

Author

Hi Kesh, comments inline

@jnebrera - I'm surprised PyAV lags OpenCV in performance. I thought both are based on the FFmpeg codebase (at one point at least).

I think the reason for the performance drop is that PyAV is not using hardware acceleration, and the drop is significant, roughly 5 times slower. The author states they will not support hardware decoding (and have no intention of doing so), and hardware encoding is quite difficult to get working because it requires a custom build and such. I wonder why the author made both decisions.

If the performance is the most important aspect of your project, ffmpegio isn't likely a solution for you because it trades off (small) overhead for convenience features.

No, performance is not the main goal (we wouldn't be using Python otherwise), but we clearly can't afford such a hit.

Also, the package is currently written for processing a single media stream. Handling multiple streams requires you to use the low-level functions, at which point you might as well use the built-in subprocess package yourself to eliminate the overhead entirely.

This is not an issue. Our goal is just a single stream, but with both audio and video.

Is ffmpeg faster than pyav/opencv? Advocates of the latter will say they are faster because you can read/write more efficiently (less memory copying), but you may find that letting ffmpeg implicitly run multiple threads yields better results. I don't know. Worth comparing. You need to learn FFmpeg and identify the command arguments that suit your needs, then spawn two sp.Popen processes, one to decode and the other to encode.

Sure, the thing is that PyAV doesn't seem to allow for that. In the past I used VidGear and was able to fully exploit the hardware, but it doesn't support audio.

If you're interested in learning more, I'd be happy to lay out a bit more info later.

Sure. My project is https://github.com/jnebrera/basketball-broadcaster. We will check how to make an all-hardware solution work in Colab. Actually, our goal is to keep the video processing pipeline fully on the GPU: decoding with NVDEC, then object detection on some frames, cropping and resizing with CV-CUDA, and finally encoding with NVENC. The decision logic on where to crop will run on the CPU, based on the object detection data and a multi-object tracker.

If yes, in the future I hope to grab the video from an RTSP camera and push the stream to an RTMP server like YouTube and such. Is this doable too?

If FFmpeg can do it and you can formulate the command line options, this package can as well.

OK, I have to confirm that ffmpeg can push data to RTMP. I recall that GStreamer can.

Yes. Do some research online about ffmpeg and see what kind of info you can gather. I'd be happy to help you put it together.

That would be really appreciated. At some point I hope the project will be able to live stream games, and in that case it would be a huge gain to be able to overlay a live scoreboard. There are some interesting providers that let you manage scoreboards as HTML content to overlay in tools like OBS, but I didn't find an open source alternative for ffmpeg (I found one for GStreamer from RidgeRun).

Kind regards

tikuma-lsuhsc · 2023-08-15T02:27:06Z

Maintainer

Also, the package is currently written for processing a single media stream. Handling multiple streams requires you to use the low-level functions, at which point you might as well use the built-in subprocess package yourself to eliminate the overhead entirely.
This is not an issue. Our goal is just a single stream, but with both audio and video.

From a media processing perspective, you have 2 streams: 1 video stream and 1 audio stream. The top-level ffmpegio functions and classes (e.g., the stream classes and the video and audio submodules) are written to handle one stream at a time. You can copy the audio stream from the source video while encoding the new video frames. But this won't work for your eventual streaming goal.
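For the file-based case, "copy the audio from the source while encoding the new frames" can be expressed as one FFmpeg encoder command with two inputs: the processed frames arriving on stdin and the original file supplying only its audio. This is a sketch of plain FFmpeg arguments (not an ffmpegio API); the rgb24 pixel format and `h264_nvenc` are assumptions. Audio stays in sync only if you write exactly as many frames as were decoded.

```python
def encoder_with_source_audio(src, dst, width, height, fps):
    """Encode raw rgb24 frames from stdin and copy the audio track of `src`."""
    return [
        "ffmpeg", "-loglevel", "error", "-y",
        # input 0: processed video frames piped in from Python
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(fps), "-i", "pipe:0",
        # input 1: the original file, used only for its audio
        "-i", src,
        "-map", "0:v:0",   # video from the pipe
        "-map", "1:a:0?",  # first audio stream of the source, if there is one
        "-c:v", "h264_nvenc",
        "-c:a", "copy",    # pass the audio through without re-encoding
        "-shortest",       # stop when the shorter of the two streams ends
        dst,
    ]
```

Start it with `subprocess.Popen(encoder_with_source_audio(...), stdin=subprocess.PIPE)` and write frames to its `stdin`.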

If you're interested in learning more, I'd be happy to lay out a bit more info later.
1 Decoding with NVDEC,
2 then in some frames object detection,
3 cropping and resizing with CV-CUDA and
4 finally encoding with NVENC

Steps 1, 3, & 4 can be done with ffmpeg.

A good first place to look is the official FFmpeg wiki:

https://trac.ffmpeg.org/wiki

Then, NVIDIA has a page dedicated to FFmpeg as well:

https://docs.nvidia.com/video-technologies/video-codec-sdk/12.0/ffmpeg-with-nvidia-gpu/index.html

Skip the compile and install part, as the precompiled binary likely comes with CUDA support. This page gives several example ffmpeg commands.
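As a sketch of steps 1 and 4 plus the resize half of step 3, the arguments below follow the pattern of the full-GPU examples on NVIDIA's page: decode with `-hwaccel cuda`, keep frames in GPU memory with `-hwaccel_output_format cuda`, resize with `scale_cuda`, and encode with `h264_nvenc`. The output size and bitrate are assumptions, the dynamic crop is not shown, and your FFmpeg build must include these CUDA components.

```python
def gpu_transcode_args(src, dst, width=1280, height=720, bitrate="5M"):
    """Decode with NVDEC, resize with scale_cuda, encode with NVENC, copy audio."""
    return [
        "ffmpeg", "-y",
        "-hwaccel", "cuda",                     # decode on the GPU (NVDEC)
        "-hwaccel_output_format", "cuda",       # keep decoded frames in GPU memory
        "-i", src,
        "-vf", f"scale_cuda={width}:{height}",  # resize without leaving the GPU
        "-c:v", "h264_nvenc", "-b:v", bitrate,  # encode on the GPU (NVENC)
        "-c:a", "copy",                         # keep the original audio untouched
        dst,
    ]
```

Run it with `subprocess.run(gpu_transcode_args("input.mp4", "output.mp4"), check=True)`.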

The decision logic on where to crop will be done in the CPU based on the object detection data and a Multi Object Tracker

Forget the audio stream for now; to run this detection, you can use the ffmpegio stream reader class. I would let ffmpeg do the full transcoding from the input file to the output (for the initial file-based development).

Here is a sketch of how I'd approach it:

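import ffmpegio as ff

blocksize = 100
cudadec_options = {}  # fill in: FFmpeg input options for CUDA decoding

with ff.open('myvideo.mp4', 'rv', blocksize=blocksize, **cudadec_options) as fin:
    for i, frames in enumerate(fin):  # reads blocksize frames at a time
        j = my_detector(frames)  # hypothetical: index of the first frame with a target, or -1
        if j >= 0:
            print(f'my detector found a target on frame {i * blocksize + j}!')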

Once you have figured out the frame range to extract, use the transcode function:

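i0 = 300  # start frame (hypothetical value; use your detector's result)
i1 = 900  # end frame (hypothetical value)

fs = ff.probe.video_streams_basic('myvideo.mp4', 0)[0]['frame_rate']  # assume the first video stream
ss = float(i0 / fs)  # start time in seconds
to = float(i1 / fs)  # end time in seconds

video_cuda_options = {}  # fill in: FFmpeg arguments for your CUDA-based processing

ff.transcode('myvideo.mp4', 'output.mp4', ss=ss, to=to, **video_cuda_options)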

Keeping the transcoding out of Python is probably the fastest.

If yes, in the future I hope to grab the video from an RTSP camera and push the stream to an RTMP server like YouTube and such. Is this doable too?

I'd think so. The key is how to buffer the video and audio streams while you run the detector. Actually, I'd create a transcoder process that the detection process starts and kills. The two can read the input stream separately, each through its own ffmpeg process.
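A minimal sketch of such a start/stop transcoder, relaying an RTSP camera to an RTMP server with one FFmpeg process. The URLs are placeholders, and the codec choices (`h264_nvenc` for video, AAC audio in an FLV container, the usual combination for RTMP ingest) are assumptions. The detection process would call `start()` and `stop()`.

```python
import subprocess as sp


class Transcoder:
    """Start/stop an FFmpeg process that relays an RTSP camera to an RTMP server."""

    def __init__(self, rtsp_url, rtmp_url):
        self.rtsp_url = rtsp_url
        self.rtmp_url = rtmp_url
        self.proc = None

    def args(self):
        return [
            "ffmpeg", "-loglevel", "error",
            "-rtsp_transport", "tcp", "-i", self.rtsp_url,  # read the camera over TCP
            "-c:v", "h264_nvenc",                           # re-encode video on the GPU
            "-c:a", "aac",                                  # AAC audio for the FLV container
            "-f", "flv", self.rtmp_url,                     # RTMP carries FLV
        ]

    def running(self):
        return self.proc is not None and self.proc.poll() is None

    def start(self):
        if not self.running():
            self.proc = sp.Popen(self.args(), stdin=sp.DEVNULL)

    def stop(self, timeout=5):
        if self.running():
            self.proc.terminate()  # on POSIX, SIGTERM lets FFmpeg shut down cleanly
            try:
                self.proc.wait(timeout=timeout)
            except sp.TimeoutExpired:
                self.proc.kill()   # last resort if FFmpeg does not exit in time
                self.proc.wait()
        self.proc = None
```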

allowing you to manage scoreboards as html content to overlay in tools like OBS

FFmpeg won't do that unless you're OK with overlaying the scoreboard on the video stream. If so, you need to figure out a way to create a scoreboard image in PNG or another supported image format.
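A sketch of the image-overlay route: FFmpeg's `overlay` filter composites a second input (here a PNG assumed to have been rendered beforehand from your scoreboard) at a fixed position on every frame. The file names and position are hypothetical, and this handles a single static image only, not one that changes during the stream.

```python
def overlay_args(video_src, png, dst, x=10, y=10):
    """Overlay a PNG (e.g. a rendered scoreboard) at (x, y) on every video frame."""
    return [
        "ffmpeg", "-y",
        "-i", video_src,   # input 0: the video (and its audio)
        "-i", png,         # input 1: the scoreboard image
        "-filter_complex", f"[0:v][1:v]overlay={x}:{y}[out]",
        "-map", "[out]",   # the composited video
        "-map", "0:a?",    # keep the audio, if present
        "-c:a", "copy",
        dst,
    ]
```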

FFmpeg can do a lot of things, and you must learn how to use it by studying its documentation, wiki, and other online resources. I cannot do this part for you, but if you have an FFmpeg command you are testing and need help, feel free to ask, especially about running the command with ffmpegio.

jnebrera · 2023-08-15T07:29:48Z

Author

Comments inline

From a media processing perspective, you have 2 streams: 1 video stream and 1 audio stream. The top-level ffmpegio functions and classes (e.g., the stream classes and the video and audio submodules) are written to handle one stream at a time. You can copy the audio stream from the source video while encoding the new video frames. But this won't work for your eventual streaming goal.

Keeping audio in the final video is key for a pleasant viewing experience. It is true I could do this in a two-stage process, first generating the cropped video and then adding the audio from the original video, but as you say, that goes against our goal of live streaming games.

Steps 1, 3, & 4 can be done with ffmpeg. ... Keeping the transcoding out of Python is probably the fastest.

Yes, everything could be left to ffmpeg, but it is a bit trickier. On each frame, the object detector or the multi-object tracker locates the players and the ball. Based on that info, a cropping decision is made to simulate pan, tilt, and zoom camera movement on the original ultra-panoramic video. Thus, neither the cropping nor the resizing is known beforehand; they are completely dynamic.
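To make the "cropping decision" concrete, here is a hypothetical sketch (not the project's actual logic): center a fixed-size crop window on the mean of the detected positions, clamp it to the panoramic frame, and smooth it over time so the virtual camera pans instead of jumping. It assumes detections are (x, y) pixel centers; zoom (a changing window size) is left out.

```python
from dataclasses import dataclass


@dataclass
class CropWindow:
    x: float
    y: float
    w: int
    h: int


def target_window(points, frame_w, frame_h, w, h):
    """Center a w x h window on the mean of the detected (x, y) points, clamped to the frame."""
    if not points:
        # nothing detected: fall back to the frame center
        cx, cy = frame_w / 2, frame_h / 2
    else:
        cx = sum(p[0] for p in points) / len(points)
        cy = sum(p[1] for p in points) / len(points)
    x = min(max(cx - w / 2, 0), frame_w - w)
    y = min(max(cy - h / 2, 0), frame_h - h)
    return CropWindow(x, y, w, h)


def smooth(prev, target, alpha=0.2):
    """Exponential smoothing: move a fraction alpha of the way toward the target."""
    if prev is None:
        return target
    return CropWindow(prev.x + alpha * (target.x - prev.x),
                      prev.y + alpha * (target.y - prev.y),
                      target.w, target.h)
```

The resulting `x`, `y` per frame could then drive a crop step, whether in CV-CUDA or in an FFmpeg filter.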

FFmpeg won't do that unless you're OK overlaying the scoreboard on the video stream.

That's exactly the goal.

If so, you need to figure out a way to create a scoreboard image in PNG or another supported image format.

Yes, but as before, the process is fully dynamic: the image will change every 1/10th of a second.

Thank you very much for your help. I will keep investigating.

tikuma-lsuhsc Aug 15, 2023
Maintainer

Keeping audio in the final video is key for a pleasant viewing experience.

Yes, and even more important is keeping the audio in sync with the video. Having an independent transcoding process (which accepts commands from a main process) minimizes that potential headache.

Thus, neither the cropping nor the resizing is known beforehand; they are completely dynamic.

What I had in mind is that FFmpeg supports real-time commands to modify its filters via stdin (or any input pipe). So, once you figure out how to format the video, you can send in the new configuration. For example,

support the commands which you can inject via the sendcmd filter.
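A sketch of the sendcmd route, using the CPU `crop` filter, which accepts `x` and `y` commands. It writes a command script where each line is `TIME TARGET COMMAND ARG, ...;` and builds a filter chain in which `sendcmd` feeds the `crop` filter that follows it. The keyframe list is assumed to be precomputed; for truly live updates, FFmpeg's `zmq` filter is the usual alternative. Simple file names are assumed (filter arguments with special characters need escaping).

```python
def sendcmd_script(keyframes, target="crop"):
    """Build a sendcmd script that moves a crop window at given timestamps.

    keyframes: iterable of (time_seconds, x, y) tuples.
    """
    lines = []
    for t, x, y in keyframes:
        # one interval per timestamp: "TIME TARGET COMMAND ARG, TARGET COMMAND ARG;"
        lines.append(f"{t:.3f} {target} x {int(x)}, {target} y {int(y)};")
    return "\n".join(lines) + "\n"


def crop_filter(cmd_file, w, h):
    """CPU filter chain: sendcmd sends commands to the crop filter that follows it."""
    return f"sendcmd=f={cmd_file},crop=w={w}:h={h}:x=0:y=0"
```

Write the script to `crop.cmd` and pass `crop_filter("crop.cmd", 1280, 720)` as the `-vf` value.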

There is a caveat, and a fairly big one: none of the CUDA versions of these filters support these commands. So, if a GPU solution is a prerequisite, you're out of luck here.

OK, so if you want to pass everything through Python, I'd go the subprocess.Popen route and spawn 2 concurrent FFmpeg processes: the decoder process with 2 pipes feeding Python, and the encoder process with 2 pipes fed by Python. I'd run a thread for each I/O pipe pair (i.e., one for video, one for audio).
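A minimal sketch of the "one thread per pipe pair" part. `pump` copies fixed-size chunks (one video frame, or one block of audio samples) from a reader to a writer, optionally transforming each chunk, and closes the writer at EOF so the encoder sees the end of that stream. In practice the readers would be the decoder's pipes and the writers the encoder's; wiring a second (audio) pipe into each FFmpeg process needs extra file descriptors and is not shown here.

```python
import threading


def pump(reader, writer, chunk_size, transform=None):
    """Copy fixed-size chunks from reader to writer, optionally transforming each one.

    Intended to run in its own thread, one per stream (video or audio).
    """
    try:
        while True:
            chunk = reader.read(chunk_size)
            if not chunk:  # EOF from the decoder side
                break
            writer.write(transform(chunk) if transform else chunk)
    finally:
        writer.close()  # signals EOF to the encoder for this stream


def start_pumps(pairs):
    """pairs: list of (reader, writer, chunk_size, transform); returns the started threads."""
    threads = [threading.Thread(target=pump, args=p, daemon=True) for p in pairs]
    for t in threads:
        t.start()
    return threads
```

For the video pair, `transform` would be your per-frame processing; for the audio pair, pass `None` to forward the bytes unchanged.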

overlaying the scoreboard in PNG

What I meant there is that as the scoreboard gets updated, you need to create a new image to overlay.

Good luck
