[DNM][WiP][PoC] userspace LL scheduling: LLEXT & multicore#10945
Draft
lyakh wants to merge 95 commits into
Add a build option HOST_DMA_IPC_POSITION_UPDATES to control whether the functionality to send IPC stream position updates is enabled. Most platforms provide more efficient means for the host to monitor DMA state, so this code is unnecessary in most cases. The current IPC sending code (called from audio context) also assumes kernel context, so making this functionality user-space compatible would require extra work. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
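A minimal sketch of how such an option compiles the position-update path out entirely. It assumes the option surfaces to C as `CONFIG_HOST_DMA_IPC_POSITION_UPDATES`; `host_dma_copy_done_sketch()` and `send_posn_ipc_sketch()` are hypothetical stand-ins, not SOF functions.

```c
#include <assert.h>

/* Assumption: the build option surfaces to C as this macro. Leave it
 * undefined (as here) to compile position updates out entirely. */
/* #define CONFIG_HOST_DMA_IPC_POSITION_UPDATES 1 */

static int posn_ipcs_sent; /* IPC messages "sent" by this sketch */

#ifdef CONFIG_HOST_DMA_IPC_POSITION_UPDATES
/* Hypothetical stand-in for the kernel-context IPC send path. */
static void send_posn_ipc_sketch(unsigned int bytes_copied)
{
	(void)bytes_copied;
	posn_ipcs_sent++;
}
#endif

/* Hypothetical hook called after each host DMA copy. */
static void host_dma_copy_done_sketch(unsigned int bytes_copied)
{
#ifdef CONFIG_HOST_DMA_IPC_POSITION_UPDATES
	send_posn_ipc_sketch(bytes_copied);
#else
	/* Host monitors DMA state by other means: nothing to send. */
	(void)bytes_copied;
#endif
}
```

With the option disabled, the kernel-only send path is not even compiled, so it cannot be reached from a user-space audio thread.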
Drop the IRQ disable/enable in ipc4_search_for_drv(). The driver list is only modified at FW boot and when a new driver is registered at runtime via SOF_IPC4_GLB_LOAD_LIBRARY IPC. ipc4_search_for_drv() is only used when processing IPC messages. As IPC processing is serialized, it is not possible for the driver list to be modified concurrently with a call to ipc4_search_for_drv(). Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
The component driver list is only modified at FW boot and at runtime when a library is loaded. At boot, module init runs serially on the primary core (Zephyr SYS_INIT at APPLICATION level, before secondary cores are started; .initcall walked on a single core for XTOS). At runtime, registration happens from the IPC thread, which is serialized with only one command processed at a time. These two phases never overlap, as IPC message processing only begins after boot completes, so the list can never be modified concurrently. The lock was also already incoherent: comp_set_adapter_ops() iterates the list without holding the lock, so it provided no real mutual exclusion. Drop the spinlock from comp_register() and comp_unregister(), and from the UUID search in the IPC3 get_drv() reader. Remove the now-unused lock field from struct comp_driver_list and its initialization. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Add support for registering user-space LL tasks, and the ability to use the task scheduling functions from user-space. If SOF is built with CONFIG_SOF_USERSPACE_LL, the implementation splits the scheduler list into kernel and user portions. A scheduler type is maintained either in kernel or in user space, never both. With this patch, SOF_SCHEDULE_LL_TIMER becomes user-managed when CONFIG_SOF_USERSPACE_LL is used. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
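The "kernel or user, never both" rule can be sketched as two disjoint lists with a single routing function. The enum values, arrays and helpers below are hypothetical, not the SOF scheduler structures; only the rule that the LL timer scheduler moves to user space under userspace LL comes from the commit text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scheduler type IDs for this sketch only. */
enum sched_type { SCHED_LL_TIMER, SCHED_LL_DMA, SCHED_EDF, SCHED_TYPE_COUNT };

struct sched_entry {
	enum sched_type type;
	bool registered;
};

/* Two disjoint lists: one in kernel memory, one in user-accessible memory. */
static struct sched_entry kernel_scheds[SCHED_TYPE_COUNT];
static struct sched_entry user_scheds[SCHED_TYPE_COUNT];

/* With userspace LL the LL timer scheduler is user-managed;
 * every other type stays in the kernel list. */
static bool sched_is_user(enum sched_type type, bool userspace_ll)
{
	return userspace_ll && type == SCHED_LL_TIMER;
}

static struct sched_entry *sched_list_for(enum sched_type type, bool userspace_ll)
{
	return sched_is_user(type, userspace_ll) ? user_scheds : kernel_scheds;
}

static void sched_register(enum sched_type type, bool userspace_ll)
{
	struct sched_entry *list = sched_list_for(type, userspace_ll);

	list[type].type = type;
	list[type].registered = true;
}

/* True if the type is registered in exactly one of the two lists. */
static bool sched_in_one_list(enum sched_type type)
{
	return kernel_scheds[type].registered != user_scheds[type].registered;
}
```

Routing every lookup through one function keeps the invariant in one place, so a type can never end up registered in both lists.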
Ensure the scheduler objects and lists of schedulers are allocated such that they can be used with both kernel and user-space LL scheduler implementations. The SOF_MEM_FLAG_KERNEL flag is removed. This flag has been a no-op for a while, and given that the scheduler list is no longer always in kernel memory, keeping it would be highly confusing. When CONFIG_SOF_USERSPACE_LL is set, the context of all schedulers is managed in the LL user-space domain. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
The real fix is to remove the locking around dai_get_properties() altogether, but this depends on fixes in Zephyr DAI drivers. To unblock user-space work, remove the spinlock calls for now. This opens up the possibility of hitting issues in concurrent playback and capture cases on multiple cores, so this commit remains a WIP until the fixes in Zephyr drivers land. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Modify the code to allocate the DAI properties object on the stack and use dai_get_properties_copy(). This is required when DAI code runs in user-space and a syscall is needed to talk to the DAI driver. It's not possible to return a pointer to kernel memory, so instead the data needs to be copied to the caller's stack. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
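The copy-out pattern can be sketched as follows. The struct fields and `dai_props_copy_sketch()` are hypothetical and do not reproduce the signature of the real dai_get_properties_copy(); the point is only that the caller supplies stack storage and receives a copy instead of a kernel pointer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of DAI properties, for illustration only. */
struct dai_props_sketch {
	unsigned int fifo_address;
	unsigned int fifo_depth;
	unsigned int dma_hs_id;
};

/* Stands in for driver-owned (kernel) memory that user code must not touch. */
static const struct dai_props_sketch driver_props = { 0x1000, 64, 3 };

/* Copy-out accessor: the caller provides the storage, so no kernel
 * pointer ever crosses the privilege boundary. */
static int dai_props_copy_sketch(int dai_index, struct dai_props_sketch *out)
{
	if (dai_index != 0 || !out)
		return -1; /* only one DAI exists in this sketch */
	memcpy(out, &driver_props, sizeof(*out));
	return 0;
}

static unsigned int caller_fifo_depth(void)
{
	struct dai_props_sketch props; /* lives on the caller's stack */

	if (dai_props_copy_sketch(0, &props) < 0)
		return 0;
	return props.fifo_depth;
}
```

In a real syscall the kernel side would also validate that `out` is writable by the calling thread before copying.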
Turn the pdata->sem into a dynamic object in userspace LL builds, implemented with Zephyr k_sem. Add POSIX no-op stubs for sys_sem to maintain testbench build compatibility. Keep statically allocated semaphore for kernel LL builds. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Add function scheduler_get_data_for_core() to look up scheduler data for a particular type of scheduler. This variant takes the core number as an argument, so it can be called from unprivileged code. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
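A sketch of the lookup shape, assuming a per-core, per-type table. `get_data_for_core_sketch()`, the table and the enum are hypothetical and do not reproduce the real scheduler_get_data_for_core() signature; the idea taken from the commit is that the caller passes the core instead of the function reading it from hardware.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CORES_SKETCH 4

enum sched_type_sketch { SKETCH_LL_TIMER, SKETCH_DP, SKETCH_TYPES };

/* Hypothetical per-core, per-type scheduler data table. */
static void *sched_data[NUM_CORES_SKETCH][SKETCH_TYPES];

/* The core is an argument instead of a CPU register read, so an
 * unprivileged caller can use this lookup without faulting. */
static void *get_data_for_core_sketch(enum sched_type_sketch type, unsigned int core)
{
	if (core >= NUM_CORES_SKETCH || type >= SKETCH_TYPES)
		return NULL;
	return sched_data[core][type];
}
```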
Add function user_ll_grant_access() to allow other threads to access the scheduler mutex. This is needed if work is submitted from other threads to the scheduler. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
When the LL scheduler runs in user-space, use a different Zephyr thread name. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
The COHERENT_CHECK_NONSHARED_CORES debug macros call cpu_get_id() which invokes arch_proc_id() - a privileged hardware register read that faults in user-space context. Disable the entire debug block at compile time when CONFIG_SOF_USERSPACE_LL is enabled. This also fixes the same latent issue in CORE_CHECK_STRUCT and CORE_CHECK_STRUCT_INIT. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
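A sketch of compiling a core-check debug macro out in user-space LL builds. `CORE_CHECK_SKETCH` and `read_core_id_sketch()` are hypothetical stand-ins for the SOF macros and cpu_get_id(); the sketch selects the user-space LL build unconditionally so the disabled branch is the one exercised.

```c
#include <assert.h>

/* For this sketch the user-space LL build is selected unconditionally. */
#define CONFIG_SOF_USERSPACE_LL 1

static int privileged_reads; /* how often the "privileged" read ran */

/* Hypothetical stand-in for cpu_get_id()/arch_proc_id(); in a real user
 * thread this register read would fault. Non-static to avoid an
 * unused-function warning when the check is compiled out. */
unsigned int read_core_id_sketch(void)
{
	privileged_reads++;
	return 0;
}

#ifdef CONFIG_SOF_USERSPACE_LL
/* User-space LL build: the check expands to an empty statement. */
#define CORE_CHECK_SKETCH(expected_core) do { } while (0)
#else
/* Kernel LL build: verify the object is touched from the expected core. */
#define CORE_CHECK_SKETCH(expected_core) \
	assert(read_core_id_sketch() == (unsigned int)(expected_core))
#endif

static void touch_shared_object_sketch(void)
{
	CORE_CHECK_SKETCH(0);
	/* ... access the shared object here ... */
}
```

Disabling at compile time, rather than checking the privilege level at run time, means no code path containing the faulting instruction exists in the user-space build.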
Add new functions to lock/unlock the LL scheduler for a given core. This is intended for audio application code that needs to modify the audio pipelines and needs an interface to get exclusive access to the pipelines on a particular core. This interface is specific to SOF builds with CONFIG_SOF_USERSPACE_LL. If the LL scheduler is running in kernel space, interrupts can be disabled for a similar effect. For now these code paths are kept separate. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
In user-space LL builds (CONFIG_SOF_USERSPACE_LL), the IPC user thread cannot block interrupts while making modifications to the audio graph. To work around this limitation, one could either protect each pipeline object with locks, or keep the LL-level lock held while executing LL tasks. This patch implements support for the latter approach. If building SOF for user LL, do not release the lock when running a task. This reduces the number of syscalls during an LL iteration, and allows IPC handlers that need to modify the audio graph to be implemented safely. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Modify the locking approach for CONFIG_SOF_USERSPACE_LL builds. The kernel LL implementation relies heavily on the ability to disable interrupts while the IPC handler is modifying the graph. This ensures that a new LL tick, and execution of a new graph cycle, does not start before the graph modifications done by the IPC handler are complete. In user-space this approach is not available, as a user-space thread cannot disable interrupts. In commit 1e59ce2 ("pipeline: protect component connections with a mutex"), sys_mutex based locking was implemented to protect the component list and modifications to it. This approach does not scale, as it would require taking the mutex for each component of each pipeline on every LL tick, which results in significant system call overhead. Additionally, Zephyr sys_mutex does not work correctly if the lock object is placed in dynamically allocated user memory. In this commit, locking of the LL graph is moved to a higher level. A single lock protects the whole LL graph and is taken at the start of each LL tick. The same lock is taken by the IPC handlers when making modifications to the graph. The mutex interface supports priority inheritance, so this usage is safe even if an LL timer tick occurs while IPC processing is still in progress. The patch only changes behaviour for userspace LL SOF builds. If LL scheduling is kept in kernel, locking is done as before. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
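The single graph lock can be sketched with a POSIX mutex standing in for the Zephyr mutex (priority inheritance is not modelled here). `ll_tick()`, `ipc_set_task_count()` and `struct task_sketch` are hypothetical; the pattern is one lock/unlock pair per tick instead of one per component, with the IPC handler taking the same lock.

```c
#include <assert.h>
#include <pthread.h>

/* One lock for the whole LL graph. POSIX stand-in for the Zephyr mutex;
 * its priority inheritance is not modelled in this sketch. */
static pthread_mutex_t graph_lock = PTHREAD_MUTEX_INITIALIZER;

struct task_sketch {
	void (*run)(struct task_sketch *t);
	int runs;
};

static void count_run(struct task_sketch *t)
{
	t->runs++;
}

/* LL tick: take the lock once and hold it while every task runs,
 * instead of locking per component inside each task. */
static void ll_tick(struct task_sketch *tasks, int n)
{
	pthread_mutex_lock(&graph_lock);
	for (int i = 0; i < n; i++)
		tasks[i].run(&tasks[i]);
	pthread_mutex_unlock(&graph_lock);
}

/* IPC handler: the same lock guarantees that no tick is in progress
 * while the graph is being changed. */
static void ipc_set_task_count(int *n_tasks, int new_count)
{
	pthread_mutex_lock(&graph_lock);
	*n_tasks = new_count;
	pthread_mutex_unlock(&graph_lock);
}
```

The trade-off is coarser granularity: an IPC handler may wait up to one full tick, but the per-tick syscall count no longer grows with the number of components.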
Modify the checks in zephyr_ll_assert_core() to make them safe to call from user-space LL threads. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Needs more review, but makes the tests pass again. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
The build fails when CONFIG_THREAD_NAME is disabled. Fix the issue by conditionally compiling the code that uses CONFIG_THREAD_MAX_NAME_LEN. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Add support to run pipeline_schedule_triggered() in user-space. Use the user_ll_lock/unlock_sched() interface if building with CONFIG_SOF_USERSPACE_LL. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
In user-space LL builds the low-latency scheduler runs its work in a dedicated privileged domain thread, created together with its timer and access grants by scheduler_init_context() (zephyr_ll_init_context() -> domain_thread_init()). This context is per-core and must exist on every core that runs LL tasks. So far it was only established for the primary core, so LL tasks could not be scheduled on secondary cores when CONFIG_SOF_USERSPACE_LL is enabled. Allocate an LL task in secondary_core_init() and run scheduler_init_context() on it, giving each secondary core its own LL domain thread. A dedicated sec_core_init UUID is registered for the task. The whole block is compiled in only for CONFIG_SOF_USERSPACE_LL. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Copier set_chmap() blocks IRQs to atomically update the converters. This code is not safe to be moved to user-space, so replace the locks with calls to block LL scheduler execution. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Place the pipeline position lookup table in the sysuser memory partition and replace k_spinlock with a dynamically allocated k_mutex when CONFIG_SOF_USERSPACE_LL is enabled. Spinlocks disable interrupts which is a privileged operation unavailable from user-mode threads. The mutex pointer is stored in a separate APP_SYSUSER_BSS variable outside the SHARED_DATA struct so Zephyr's kernel object tracking can recognize it for syscall verification. Move pipeline_posn_init() from task_main_start() to primary_core_init() before platform_init(), so the mutex is allocated before ipc_user_init() grants thread access to it. In pipeline_posn_get(), bypass the sof_get() kernel singleton and access the shared structure directly when running in user-space. Grant the ipc_user_init thread access to the pipeline position mutex via new pipeline_posn_grant_access() helper. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
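A sketch of a position lookup table guarded by a sleeping lock instead of a spinlock, assuming the table tracks which position slots are in use. The slot count, `posn_get_sketch()` and `posn_put_sketch()` are hypothetical, and a POSIX mutex stands in for the dynamically allocated k_mutex described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define POSN_SLOTS_SKETCH 4 /* hypothetical table size */

/* Slot ownership flags guarded by a sleeping lock rather than a spinlock,
 * since user threads cannot mask interrupts. */
static bool posn_used[POSN_SLOTS_SKETCH];
static pthread_mutex_t posn_lock = PTHREAD_MUTEX_INITIALIZER;

/* Claims the first free slot; returns its index, or -1 if the table is full. */
static int posn_get_sketch(void)
{
	int slot = -1;

	pthread_mutex_lock(&posn_lock);
	for (int i = 0; i < POSN_SLOTS_SKETCH; i++) {
		if (!posn_used[i]) {
			posn_used[i] = true;
			slot = i;
			break;
		}
	}
	pthread_mutex_unlock(&posn_lock);
	return slot;
}

/* Releases a previously claimed slot; out-of-range indices are ignored. */
static void posn_put_sketch(int slot)
{
	pthread_mutex_lock(&posn_lock);
	if (slot >= 0 && slot < POSN_SLOTS_SKETCH)
		posn_used[slot] = false;
	pthread_mutex_unlock(&posn_lock);
}
```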
…pace task zephyr_ll_task_sched_free() frees an active (RUNNING/RESCHEDULE) task by setting pdata->freeing and waiting on pdata->sem for the scheduler thread to remove the task from its run list before the memory is released. Under CONFIG_SOF_USERSPACE_LL this function runs in kernel context while pdata->sem is a sys_sem allocated on the user heap. sys_sem_take() returns -EINVAL immediately when called from kernel context, so the wait is a no-op: pdata is freed (and the struct task is subsequently freed by pipeline_free()) while the task is still linked in sch->tasks with n_tasks != 0 and the scheduling domain handler still set. Because n_tasks is non-zero, schedule_free() does not stop the LL timer, and the next timer tick runs zephyr_ll_run() over the dangling task, dereferencing freed memory and taking a load/store-privilege exception (EXCCAUSE 26) in the user-space LL thread. Stop relying on the cross-privilege semaphore handshake in this path. When the task must be waited for, mark it cancelled so that, should it actually be mid-execution on the scheduler's temporary list, it is removed via the cancel path without re-running task->run() on resources the caller may already have freed. If the task is still linked on the run list, the scheduler thread is provably not executing it (a running task is moved off sch->tasks with the lock dropped), so remove it directly and skip the wait. This guarantees the task is delisted (n_tasks -> 0, handler -> NULL) before pdata is freed, eliminating both the dangling list entry and the stray timer wakeups. Verified on PTL with the standalone user-space LL boot tests: the userspace_ll suite, including pipeline_two_components_user, now passes without the fatal exception at teardown. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
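The free-path decision can be sketched as below, assuming the caller holds the scheduler lock. The state enum, list layout and `task_free_sketch()` are hypothetical simplifications of the zephyr_ll structures: the task is marked cancelled first, and if it is still linked on the run list it is unlinked directly instead of waiting on a semaphore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task states and run list, reduced to what the fix needs. */
enum task_state_sketch { TS_QUEUED, TS_RUNNING, TS_CANCEL, TS_FREE };

struct ll_task_sketch {
	enum task_state_sketch state;
	bool on_run_list;             /* linked on the scheduler run list */
	struct ll_task_sketch *next;
};

struct ll_sched_sketch {
	struct ll_task_sketch *tasks; /* run list */
	int n_tasks;
};

static void list_remove(struct ll_sched_sketch *sch, struct ll_task_sketch *t)
{
	struct ll_task_sketch **pp = &sch->tasks;

	while (*pp && *pp != t)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = t->next;
		t->next = NULL;
		t->on_run_list = false;
		sch->n_tasks--;
	}
}

/* Called with the scheduler lock held. Returns true if the task was delisted
 * here, false if the scheduler thread drops it via the cancel path. */
static bool task_free_sketch(struct ll_sched_sketch *sch, struct ll_task_sketch *t)
{
	/* Mark cancelled first so a task caught mid-execution is not re-run. */
	t->state = TS_CANCEL;

	if (t->on_run_list) {
		/* A running task is moved off the run list, so this one is idle:
		 * unlink it directly instead of waiting on a semaphore. */
		list_remove(sch, t);
		t->state = TS_FREE;
		return true;
	}
	return false;
}
```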
If SOF is built with CONFIG_SOF_USERSPACE_LL, the IPC user handler will require access to coldrodata sections to initialize audio modules. This logic is not required for LLEXT modules, which have existing code to add access to coldrodata (and other sections). This commit is needed for builds where LLEXT is not used. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
This is a set of temporary changes to audio code to remove calls to privileged interfaces that are not mandatory to run simple audio tests. These need proper solutions to be able to run all use-cases in user LL version. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
The .coldrodata partition can be empty, avoid a failure in such cases. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Add a missing header for the zephyr_ll_(un)lock_sched() functions. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Extract a privileged LLEXT-related part from lib_manager_module_create() into a separate function and make it a system call. At the same time, ilib_manager_mod_free_priv() already executes privileged operations; to make it callable in userspace, convert lib_manager_free_module() to a system call too. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
ll_schedule_domain.h is needed for user_ll_lock_sched() and user_ll_unlock_sched(). Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Use a pointer type-cast instead of copying a structure. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Use "%p" to log a pointer. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
scheduler_get_task_info_ll() and zephyr_ll_domain() are only needed when CONFIG_SOF_USERSPACE_LL=y. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Map library data in DRAM to the LL memory domain but only with kernel access. This is needed for LLEXT ELF linking. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
When CONFIG_SOF_USERSPACE_USE_DRIVER_HEAP isn't selected, dynamically allocated driver objects should still be accessible to the userspace. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
When loading and linking LLEXT modules map them automatically for the LL memory domain, unless they belong to the DP domain. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
IPC syscalls can be enabled even if userspace LL is disabled, as long as generic userspace is enabled. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Drivers are now accessible to userspace LL, remove now superfluous copies. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
LLEXT is now working with userspace LL and can be enabled. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
"Trace context" isn't used any more, no need to warn about it. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Memory zones are only used with IPC3, mark them as such. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Eliminate multiple instances of IPC4 data copying, use simple type-casts instead. This removes stack objects and replaces run-time copying with compile-time pointer substitution. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
When initialising in userspace use the userspace heap for channel memory allocations. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Instead of allocating semaphores during global initialisation, do that later when initialising the domain for specific cores. This also automatically grants access rights to the allocating thread. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
This is needed at least to set the .priv_data pointer to NULL. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Also, when userspace is used, scheduler instances have to be allocated uncached. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Prepare for multi-core support: allocate the IPC thread dynamically and extract thread initialisation into a separate function. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Userspace IPC context is global, allocate it uncached. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Use current core when calling scheduler_get_data_for_core(). Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Sometimes 10ms isn't enough for userspace IPC processing, so increase it to 20ms. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Make scheduling LL thread and synchronisation objects per-core and forward IPCs and scheduling events accordingly. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
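A sketch of per-core routing, assuming each core owns a context that IPCs are forwarded to. `struct core_ll_ctx_sketch`, `core_ctx_init()` and `forward_ipc_sketch()` are hypothetical; in the actual change each context would hold that core's LL thread and synchronisation objects rather than a counter.

```c
#include <assert.h>

#define MAX_CORES_SKETCH 3 /* hypothetical core count */

/* Hypothetical per-core LL context; only a message counter here. */
struct core_ll_ctx_sketch {
	unsigned int core;
	unsigned int ipcs_handled;
};

static struct core_ll_ctx_sketch core_ctx[MAX_CORES_SKETCH];

static void core_ctx_init(void)
{
	for (unsigned int i = 0; i < MAX_CORES_SKETCH; i++) {
		core_ctx[i].core = i;
		core_ctx[i].ipcs_handled = 0;
	}
}

/* Route an IPC to the context of the core that owns the target pipeline.
 * Returns 0 on success, -1 for an invalid core. */
static int forward_ipc_sketch(unsigned int target_core)
{
	if (target_core >= MAX_CORES_SKETCH)
		return -1;
	core_ctx[target_core].ipcs_handled++;
	return 0;
}
```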
This includes #10558 and my patches on top of it to enable LLEXT and multicore. Current status: passes simple tests with nocodec, with both core 0 and core 1 streaming. Two simultaneous streams run into a problem when the first of them terminates. WiP.