@jnebrera - I'm surprised PyAV lags OpenCV in performance. I thought both were based on the FFmpeg codebase (at one point at least).
If performance is the most important aspect of your project, ffmpegio likely isn't a solution for you, because it trades a (small) overhead for convenience features. Also, the package is currently written for processing a single media stream. Handling multiple streams requires you to use the low-level functions, at which point you might as well use the subprocess built-in package yourself to 100% eliminate the overhead.
Is ffmpeg faster than pyav/opencv? Advocates of the latter will say they are faster because you can read/write more efficiently (less memory copying), but you may find that letting ffmpeg implicitly run multiple threads yields better results. I don't know. Worth comparing. You need to learn FFmpeg and identify the command arguments that suit your needs, then spawn 2 sp.Popen processes, one to decode and the other to encode. If you're interested in learning more, I'd be happy to lay out a bit more info later.
If FFmpeg can do it and you can formulate the command line options, this package can as well.
Yes. Do some research around the interweb on ffmpeg and see what kind of info you can gather. I'd be happy to help you put them together.
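As a rough illustration of the "2 sp.Popen processes" idea above, here is a minimal sketch. It is not taken from ffmpegio. The helper names (`decode_cmd`, `encode_cmd`, `run`) are made up for this example, the frame size and rate are assumed to be known in advance, and the video is processed on the CPU as raw RGB bytes. The audio is copied from the original file by giving it to the encoder as a second input.

```python
import subprocess as sp


def decode_cmd(src, width, height):
    """ffmpeg command that writes raw RGB frames of `src` to stdout."""
    return ["ffmpeg", "-v", "error", "-i", src, "-an",
            "-f", "rawvideo", "-pix_fmt", "rgb24",
            "-s", f"{width}x{height}", "pipe:1"]


def encode_cmd(src, dst, width, height, fps):
    """ffmpeg command that reads raw RGB frames from stdin and copies
    the audio (if any) from the original file `src`."""
    return ["ffmpeg", "-v", "error", "-y",
            "-f", "rawvideo", "-pix_fmt", "rgb24",
            "-s", f"{width}x{height}", "-framerate", str(fps), "-i", "pipe:0",
            "-i", src,
            "-map", "0:v:0", "-map", "1:a:0?",
            "-c:a", "copy", dst]


def run(src, dst, width, height, fps, process=lambda frame: frame):
    """Pipe frames from a decoder process to an encoder process.
    `process` is a hypothetical hook that receives and returns the bytes
    of one frame; it must not change the frame size."""
    frame_bytes = width * height * 3  # rgb24 uses 3 bytes per pixel
    dec = sp.Popen(decode_cmd(src, width, height), stdout=sp.PIPE)
    enc = sp.Popen(encode_cmd(src, dst, width, height, fps), stdin=sp.PIPE)
    try:
        while True:
            frame = dec.stdout.read(frame_bytes)
            if len(frame) < frame_bytes:  # end of stream
                break
            enc.stdin.write(process(frame))
    finally:
        enc.stdin.close()
        dec.stdout.close()
        enc.wait()
        dec.wait()
```

Calling something like `run('myvideo.mp4', 'output.mp4', 1280, 720, 30)` would re-encode the video unchanged. Hardware options would go into the two command lists, and whether this beats PyAV or OpenCV is something to measure rather than assume.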
-
Hi Kesh, comments inline.
> @jnebrera <https://github.com/jnebrera> - I'm surprised PyAV lags OpenCV
> in performance. I thought both are based on FFmpeg codebase (at one point
> at least).
I think the reason for the performance drop is that PyAV is not using
hardware acceleration, and the drop is quite significant, about 5 times lower.
The author says they will not support hardware decoding (and have no
intention of doing so), and hardware encoding is quite difficult to get
working, as it requires custom compilation and such.
I wonder why the author made both decisions.
> If the performance is the most important aspect of your project, ffmpegio
> isn't likely a solution for you because it trades off (small) overhead for
> convenience features.
No, performance is not the main goal (I wouldn't be using Python then), but
such a big hit is clearly not acceptable.
> Also, the package currently is written for processing a single media
> stream. Handling multiple streams requires you to use the low level
> functions, at which point you might as well use the subprocess built-in
> package yourself to 100% eliminate the overhead.
This is not an issue. Our goal is just a single stream, but with both audio
and video.
> Is ffmpeg faster than pyav/opencv? Advocates of the latter will say they
> are faster because you can read/write more efficiently (less memory
> copying) but you may find letting ffmpeg implicitly run multiple
> threads yields better results. I don't know. Worth comparing.
> You need to learn FFmpeg and identify the command arguments that suit your
> need, and you spawn 2 sp.Popen processes, one to decode, the other to encode.
Sure, the thing is PyAV doesn't seem to allow for that. In the past I used
Vidgear and was able to fully exploit the hardware, but it doesn't support audio.
> If you're interested in learning more, I'd be happy to lay out a bit more
> info later.
Sure. My project is https://github.com/jnebrera/basketball-broadcaster
We will check how to work with an all-hardware solution in Colab.
Actually, our goal is to keep the video processing pipeline fully on
the GPU: decoding with NVDEC, then object detection on some frames,
cropping and resizing with CV-CUDA, and finally encoding with NVENC.
The decision logic on where to crop will be done on the CPU, based on the
object detection data and a Multi Object Tracker.
>> If yes, in the future I hope to grab the video from a RTSP camera and push
>> the stream to a RTMP server like Youtube and such. Is this doable too?
> If FFmpeg can do it and you can formulate the command line options, this
> package can as well.
OK, I have to confirm that ffmpeg can push data to RTMP. I recall gstreamer did.
> Yes.
> Do some research around the interweb on ffmpeg and see what kind of info
> you can gather. I'd be happy to help you put them together.
That would be really appreciated. At some point I hope the project will be
able to live stream games, and in that case it would be a huge gain to be
able to overlay a live scoreboard. There are some interesting providers
that let you manage scoreboards as HTML content to overlay in tools like
OBS, but I didn't find an open source alternative for ffmpeg (I found one
for gstreamer from RidgeRun).
Kind regards
-
From a media processing perspective, you have 2 streams: 1 video stream and 1 audio stream. The top-level ffmpegio functions and classes (e.g., stream classes, video and audio submodules) are written to handle one stream at a time. You can copy an audio stream from the source video while encoding the new video frames. But this won't work for your eventual streaming goal.
Steps 1, 3, & 4 can be done with ffmpeg. A good first place to look is the official ffmpeg wiki: https://trac.ffmpeg.org/wiki
Then, NVIDIA has a page dedicated to ffmpeg as well: https://docs.nvidia.com/video-technologies/video-codec-sdk/12.0/ffmpeg-with-nvidia-gpu/index.html
Skip the compile and install part, as the precompiled binary likely comes with CUDA support. This page gives several example ffmpeg commands.
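For orientation, the kind of command shown on that NVIDIA page can be assembled in Python as below. This is only a sketch: the function name `nvidia_transcode_cmd` is made up for this example, the options follow the pattern of the NVIDIA guide's transcoding examples (`-hwaccel cuda`, `scale_cuda`, `h264_nvenc`), and it assumes an ffmpeg build with NVDEC/NVENC support and an H.264 output.

```python
import shlex


def nvidia_transcode_cmd(src, dst, width=None, height=None):
    """Build an ffmpeg command that decodes on the GPU, optionally resizes
    on the GPU, encodes with NVENC, and copies the audio untouched."""
    cmd = ["ffmpeg", "-y",
           "-hwaccel", "cuda",                # decode with NVDEC
           "-hwaccel_output_format", "cuda",  # keep decoded frames in GPU memory
           "-i", src]
    if width and height:
        cmd += ["-vf", f"scale_cuda={width}:{height}"]  # GPU-side resize
    cmd += ["-c:v", "h264_nvenc",  # encode with NVENC
            "-c:a", "copy",        # keep the original audio stream
            dst]
    return cmd


# Print the command line; run it with subprocess.run(cmd, check=True).
print(shlex.join(nvidia_transcode_cmd("myvideo.mp4", "output.mp4", 1280, 720)))
```

Once a command like this works in a terminal, the same options can be carried over to whichever Python wrapper is used.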
Forget the audio stream, and if you want to do this detection, you can use the ffmpegio stream reader class. I would let ffmpeg do the full transcoding from the input file to the output (for the file-based initial dev). Here is a sketch of how I'd approach it:
    import ffmpegio as ff
    # cudadec_options: your dict of FFmpeg hardware-decoding options
    with ff.open('myvideo.mp4', 'rv', blocksize=100, **cudadec_options) as fin:
        for i, frames in enumerate(fin):  # reads blocksize frames at a time
            j = my_detector(frames)  # index of the target within the block, negative if none
            if j >= 0:
                print(f'my detector found a target on {i*100+j}th frame!')
Once you figure out the target range to extract the movie, then use the transcode function:
    # i0: start frame, i1: end frame (found with the detector)
    fs = ff.probe.video_streams_basic('myvideo.mp4', 0)[0]['frame_rate']  # assume the first video stream
    ss = i0 / fs  # start time in seconds
    to = i1 / fs  # end time in seconds
    video_cuda_options = {...}  # arguments to do your CUDA based processing
    ff.transcode('myvideo.mp4', 'output.mp4', ss=ss, to=to, **video_cuda_options)
Leaving the transcoding out of Python will probably be the fastest.
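The index arithmetic in the sketch above (block number times block size plus the offset within the block, then frames divided by the frame rate) can be checked on its own. The helper names below are hypothetical and not part of ffmpegio, and a constant frame rate is assumed.

```python
from fractions import Fraction


def frame_index(block, offset, blocksize=100):
    """Global frame number of the `offset`-th frame in the `block`-th block."""
    return block * blocksize + offset


def frame_range_to_times(i0, i1, frame_rate):
    """Convert start/end frame numbers to start/end times in seconds.
    `frame_rate` may be an int, a Fraction, or a string like '30000/1001'."""
    fs = Fraction(frame_rate)
    return float(i0 / fs), float(i1 / fs)


ss, to = frame_range_to_times(frame_index(3, 0), frame_index(6, 0), 30)
print(ss, to)  # 10.0 20.0
```

Using `Fraction` keeps rates such as 30000/1001 exact until the final conversion to seconds.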
I'd think so. The key is how to cache the video and audio streams as you run the detector. Actually, I'd create a transcoder process that the detection process starts and kills. These processes can read the input stream separately, each with its own ffmpeg process.
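A minimal sketch of such a start/kill transcoder, assuming an RTSP source restreamed to an RTMP server: the `restream_cmd` and `Transcoder` names and the example URLs are placeholders for illustration, and the codec choice (libx264 video, AAC audio in an FLV container) is only an assumption about what the receiving server expects.

```python
import subprocess as sp


def restream_cmd(rtsp_url, rtmp_url, video_codec="libx264"):
    """ffmpeg command that pulls an RTSP stream and pushes it to RTMP."""
    return ["ffmpeg", "-rtsp_transport", "tcp", "-i", rtsp_url,
            "-c:v", video_codec, "-c:a", "aac",
            "-f", "flv", rtmp_url]


class Transcoder:
    """Wraps one ffmpeg process that another piece of code starts and stops."""

    def __init__(self, cmd):
        self.cmd = cmd
        self.proc = None

    def start(self):
        # Only launch if no process is currently running.
        if self.proc is None or self.proc.poll() is not None:
            self.proc = sp.Popen(self.cmd, stdin=sp.DEVNULL)

    def stop(self, timeout=5):
        # Ask ffmpeg to terminate, and kill it if it does not exit in time.
        if self.proc is not None and self.proc.poll() is None:
            self.proc.terminate()
            try:
                self.proc.wait(timeout)
            except sp.TimeoutExpired:
                self.proc.kill()
                self.proc.wait()


# Placeholder URLs; nothing is launched until start() is called.
transcoder = Transcoder(restream_cmd("rtsp://camera.local/stream",
                                     "rtmp://example.com/live/STREAM_KEY"))
```

The detection code would call `transcoder.start()` when a target appears and `transcoder.stop()` when it is gone.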
FFmpeg won't do that unless you're OK with overlaying the scoreboard on the video stream. If so, you need to figure out a way to create the scoreboard image in PNG or another supported image format. FFmpeg can do a lot of things, and you must learn how to use it by studying its documentation, wiki, and other online resources. I cannot do this part for you, but if you have an FFmpeg command you are testing and need help, feel free to ask, especially about running the command with ffmpegio.
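For the overlay itself, a static PNG can be composited with FFmpeg's `overlay` filter. The sketch below only builds the command line. The `overlay_cmd` name, the file names, and the pixel offsets are assumptions for illustration, and a scoreboard that changes over time would need a different input arrangement than a single PNG file.

```python
import shlex


def overlay_cmd(video, scoreboard_png, dst, x=20, y=20):
    """ffmpeg command that draws `scoreboard_png` on top of `video` at
    pixel offset (x, y) and copies the audio stream, if there is one."""
    return ["ffmpeg", "-y", "-i", video, "-i", scoreboard_png,
            "-filter_complex", f"[0:v][1:v]overlay={x}:{y}[out]",
            "-map", "[out]", "-map", "0:a?",
            "-c:a", "copy", dst]


print(shlex.join(overlay_cmd("myvideo.mp4", "scoreboard.png", "output.mp4")))
```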
-
Comments inline.
> From media processing perspective, you have 2 streams, 1 video stream and 1
> audio stream. The top-level ffmpegio functions and classes (e.g., stream
> classes, video and audio submodules) are written to handle one stream at
> a time. You can copy an audio stream from the source video while encoding
> the new video frames. But, this won't work for your eventual streaming goal.
Keeping the audio in the final video is key for a pleasant viewing
experience.
It is true I could do this in a two-stage process, first generating the
cropped video and then adding the audio from the original video, but as you
say, that goes against our goal of live streaming games.
> Steps 1, 3, & 4 can be done with ffmpeg.
> A good first place to look is the official ffmpeg wiki:
> https://trac.ffmpeg.org/wiki
> Then, nvidia has a page dedicated for the ffmpeg as well:
> https://docs.nvidia.com/video-technologies/video-codec-sdk/12.0/ffmpeg-with-nvidia-gpu/index.html
> Skip the compile and install part as the precompiled binary likely comes
> with cuda support. This page gives several example ffmpeg commands
>> The decision logic on where to crop will be done on the CPU, based on the
>> object detection data and a Multi Object Tracker
> Forget the audio stream and if you want to do this detection, you can use
> the ffmpegio stream reader class. I would let ffmpeg do the full
> transcoding from the input file to the output (for the file-based initial
> dev).
> Here is a sketch of how I'd approach it:
>     import ffmpegio as ff
>     with ff.open('myvideo.mp4', 'rv', blocksize=100, **cudadec_options) as fin:
>         for i, frames in enumerate(fin):  # reads blocksize frames at a time
>             j = my_detector(frames)
>             if j >= 0:
>                 print(f'my detector found a target on {i*100+j}th frame!')
> Once you figure out the target range to extract the movie, then use
> transcode function
>     # i0: start frame, i1: end frame
>     fs = ff.probe.video_streams_basic('myvideo.mp4', 0)[0]['frame_rate']  # assume the first video stream
>     ss = i0 / fs
>     to = i1 / fs
>     video_cuda_options = {...}  # arguments to do your CUDA based processing
>     ff.transcode('myvideo.mp4', 'output.mp4', ss=ss, to=to, **video_cuda_options)
> Leaving the transcoding out of Python will probably be the fastest.
Yes, everything could be left to ffmpeg, but it is a bit more tricky.
On each frame, the object detector or the multi-object tracker locates the
players and the ball. Based on that info, a cropping decision is made to
simulate pan, tilt, and zoom camera movement on the original ultra-panoramic
video.
Thus, neither the cropping nor the resizing is known beforehand; both
are completely dynamic.
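To make the dynamic cropping concrete, here is a small sketch of how a per-frame crop window could be derived from a tracked target position. It is purely illustrative and not taken from the project: the function name, the `zoom` parameter, and the 16:9 output aspect ratio are all assumptions.

```python
def crop_window(cx, cy, zoom, frame_w, frame_h, aspect=16 / 9):
    """Return (x, y, w, h) of a crop centred on (cx, cy) and clamped so that
    it stays inside the frame. zoom=1 gives the largest window with the
    requested aspect ratio; larger values give a tighter crop."""
    max_w = min(frame_w, frame_h * aspect)
    w = int(round(max_w / max(zoom, 1.0)))
    h = min(int(round(w / aspect)), frame_h)
    w -= w % 2  # many encoders expect even dimensions
    h -= h % 2
    x = int(round(cx - w / 2))
    y = int(round(cy - h / 2))
    x = min(max(x, 0), frame_w - w)  # clamp horizontally (pan)
    y = min(max(y, 0), frame_h - h)  # clamp vertically (tilt)
    return x, y, w, h


# Target near the right edge of a 3840x1080 panoramic frame, no zoom.
print(crop_window(3700, 540, 1, 3840, 1080))  # (1920, 0, 1920, 1080)
```

Because the window changes on every frame, the crop has to be applied where the frames are available (for example on the GPU with CV-CUDA, as described above) before the result is handed to the encoder at a fixed output size.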
> FFmpeg won't do that unless you're OK overlaying the scoreboard on the
> video stream.
That's exactly the goal.
> If so, you need to figure out a way to create scoreboard image in PNG or
> other supported image format.
Yes, but as before, the process is fully dynamic. The image will change
every 1/10th of a second.
Thank you very much for your help. I will keep investigating.
-
Hi all, sorry, I first opened this as an issue without noticing you had discussions enabled.
I'm currently using PyAV in Google Colab to decode a video file (audio + video), run object detection on the video frames, resize them, and finally re-encode the video while keeping the original audio.
Unfortunately, by using PyAV we have been hit with a big performance drop compared to the Norgear OpenCV video grabber (which lacks the audio part).
I'm looking for a Python alternative that fully exploits hardware acceleration in both the decoding (NVDEC) and encoding (NVENC) parts, but also lets me work with the audio (no processing of the audio itself is needed, but I cannot lose it).
Is python-ffmpegio the right library to achieve this?
If yes, in the future I hope to grab the video from an RTSP camera and push the stream to an RTMP server like YouTube and such. Is this doable too?
And a final question: if both are OK, one thing I have in mind is to overlay HTML content onto the video stream. I'm aware ffmpeg allows for image and text overlays, but I wonder if anybody knows of an HTML overlay capability for ffmpeg.
Thank you very much for your help.