Refactor installation and plugin management system #219

@zwimer

This issue is mostly the 'meeting notes' of a long conversation between @mahaloz and me about binsync's current plugin management issues.

The Problem

There are multiple common issues with the current system as of: f6eaf70

  1. Plugin / core version desyncs. Right now the plugins are not really versioned, but we require their installed code to come from the same commit that the core package came from.
  2. Installation into the wrong environment. It is a common problem to install binsync into the wrong python environment, as each plugin needs itself and a copy of binsync to be installed into the environment of the interpreter of the desired decompiler.
  3. Inflexibility in adding decompiler support; plugins must be first party.
  4. Inability to easily update the binsync of users running angr.
  5. People running old versions of binsync core despite upgraded plugins, and vice versa.
  6. No ability to add a true plugin system to binsync due to the inability to reliably install python packages.
  7. Difficulties with editable installs on Windows due to symlink issues: @mahaloz

Multiple of the above issues compound. For example, multiple installations of binsync might be on different versions simultaneously without the user knowing. Another simpler example: updating may be avoided due to installation tribulations.
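
To make the "wrong environment" and "silently different versions" problems concrete, here is a minimal sketch that reports which interpreter is running and which binsync version, if any, that interpreter can see. It assumes Python 3.8+ for `importlib.metadata`; `describe_install` is a hypothetical helper name, not an existing binsync function.

```python
import sys
from importlib import metadata


def describe_install(package="binsync"):
    """Return (interpreter path, installed version or None) for this interpreter."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # Not installed in *this* interpreter's environment
    return sys.executable, version


interpreter, version = describe_install()
print(f"{interpreter}: binsync {version or 'not installed'}")
```

Running this from inside each decompiler's python console would show whether the copies differ.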

Why this is hard

  1. Each decompiler uses its own interpreter, each of which requires binsync support
  2. A decompiler's interpreter may be bundled or may be 'the first found system interpreter'
  3. Some interpreters may not come with pip or any package managing tool
  4. Some decompilers (like single file angr-management) do not have a site-packages directory to edit
  5. Some decompilers (like single file angr-management) require binsync code be copied and included directly in the main project: https://github.com/angr/angr-management/tree/6751e9831e5758e74f8079f019731ca13e7d4741/angrmanagement/plugins/angr_binsync . Thus updates must be done on the decompiler's end.
  6. Installing and updating should be out-of-the-box trivially easy for inexperienced users; i.e. should be almost entirely automated and somewhat idiot proof (for end users).
  7. Developers should easily be able to edit / tinker on any given install
  8. Some decompilers do not use a fixed python interpreter; the version can change either because the system version changes, the user changes the version, etc.
  9. Users may accidentally clobber items within their environment
  10. Should work across OSes
  11. We intentionally store a global config that all binsync's share, and desire to continue to do so
  12. Uninstallation should be relatively easy
  13. Various decompilers require some sort of entry point for binsync to be installed; i.e. a user must tell IDA to execute a specific function as a plugin. This needs to be automated as well.
  14. Some decompilers need additional non-python code (like Ghidra, which currently needs https://github.com/mahaloz/binsync-ghidra-plugin/releases/tag/v1.1.0 )
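
Several of the points above (no pip, no site-packages, a changing interpreter) can be detected at runtime. The following is a rough sketch of such a probe using only the standard library; `probe_interpreter` is a hypothetical name, and inside an embedded interpreter `sys.executable` may be empty or point at the decompiler binary rather than at a python executable.

```python
import importlib.util
import site
import sys


def probe_interpreter():
    """Collect facts about the running interpreter that affect installation."""
    # Some environments do not define site.getsitepackages at all
    get_site_packages = getattr(site, "getsitepackages", lambda: [])
    return {
        "executable": sys.executable,
        "version": tuple(sys.version_info[:3]),
        "has_pip": importlib.util.find_spec("pip") is not None,
        "site_packages": list(get_site_packages()),
    }
```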

Proposal Desires

  1. We want to use pip for packages; even if something wraps it, pip underneath
  2. Each plugin should become a stand-alone pip-installable package on pypi.org; these packages would depend on binsync>=a.b.c as needed.
  3. On plugin install, binsync (or rather the plugin itself, likely using a function defined in binsync) should install hook files into each decompiler.*
  4. binsync's API / library functions should be usable and ideally installable into other python environments without worrying about concurrent installs clobbering each other's global state.
  5. No longer require a version sync between binsync and plugins; we want a simple dependency.
  6. We are leaning towards namespaced packages for the various binsync packages to-be; but this is an implementation detail we can address later.
  7. A unified interface plugins can use to check for newer versions and warn the user, an interface binsync could use for itself (likely using the https://pypi.org/project/outdated/ package)

* This might be hard, since installing a hook may require manually querying the user. We do not want people to have to run pip install binsync.plugins.ida && binsync install-hook ida as separate steps if possible; likewise, pip should not have to query the user. So we might want binsync itself to manage the plugin installation.
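
For the update-warning interface in desire 7, a minimal sketch of the comparison step might look like the following. How the latest version is fetched (for example via the outdated package) is left out, and both function names are hypothetical. It only handles plain numeric `a.b.c` versions.

```python
import warnings


def parse_version(text):
    """Parse a plain 'a.b.c' version string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def warn_if_newer(name, installed, latest):
    """Warn and return True when `latest` is newer than `installed`."""
    if parse_version(latest) > parse_version(installed):
        warnings.warn(f"{name} {installed} is installed, but {latest} is available")
        return True
    return False
```

Both binsync and each plugin could call the same function with their own name and `__version__`.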

The desired hook would look something like:

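from pathlib import Path
import sys
# Adjust sys.path as needed
# sys.path.insert(0, str(Path("abc")))

entrypoint = None
try:
    # Only bind the entrypoint here; call it below so that errors raised
    # inside the plugin itself are not swallowed by the except clause
    from binsync.plugins.ida import entrypoint
except ModuleNotFoundError:
    # Plugin is not installed; the hook silently does nothing
    pass
# finally:
# Restore sys.path if desired

if entrypoint is not None:
    entrypoint()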

Each plugin would then have to define .entrypoint() which could be as simple as binsync.core.main(name, __version__) or something.
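
As a rough illustration of how small a plugin's entrypoint could be, consider the sketch below. `core_main` is a stand-in defined here for the hypothetical `binsync.core.main(name, __version__)` mentioned above; its behavior is invented purely for the example.

```python
__version__ = "0.0.0"  # Placeholder plugin version


def core_main(name, version):
    """Stand-in for the hypothetical binsync.core.main(name, version)."""
    return f"binsync core started for {name} plugin {version}"


def entrypoint():
    """What a decompiler hook would call once the plugin is importable."""
    return core_main("ida", __version__)
```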

Proposals

The following proposals are listed in increasing order of complexity with later proposals building off the former proposals.

Per-interpreter user managed default environments

For each interpreter, the user would be expected to manually pip install binsync, the plugin for the given interpreter, and the plugin hook.
We do not want users manually installing a plugin hook themselves so we would expose a CLI like: binsync plugin hook ida
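
What a command like binsync plugin hook ida might do underneath can be sketched as writing a small hook file into the decompiler's plugin directory. The function name, the hook file name, and the idea that the caller supplies the directory are all assumptions; the real location differs per decompiler and OS.

```python
from pathlib import Path

HOOK_TEMPLATE = """\
try:
    from binsync.plugins.{name} import entrypoint
except ModuleNotFoundError:
    entrypoint = None

if entrypoint is not None:
    entrypoint()
"""


def install_hook(name, plugins_dir):
    """Write a hook file for decompiler `name` into `plugins_dir`; return its path."""
    plugins_dir = Path(plugins_dir).expanduser()
    plugins_dir.mkdir(parents=True, exist_ok=True)
    hook_path = plugins_dir / f"binsync_{name}_hook.py"
    hook_path.write_text(HOOK_TEMPLATE.format(name=name))
    return hook_path
```

Returning the path lets the caller record where the hook went, which matters later for uninstalling.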

Downsides

  1. May not work for all decompilers; some vendor python interpreters without a pip package; we can potentially vendor our own pip or just use https://bootstrap.pypa.io/get-pip.py to bypass this issue.
  2. May not work for all decompilers; (at least) some versions of angr-management have no site-packages directory to install our binsync libraries into.
  3. We do not want to require the user to manage this for each interpreter, at least not manually.
  4. Ideally we would also prefer not to require an additional step after the install or along with an uninstall or reinstall.
  5. This uses the default environment of the interpreter; things can clobber and users can uninstall dependencies accidentally; likewise, the install can break if the decompiler's interpreter changes
  6. We have distinct copies of binsync per-decompiler, potentially with version desyncs
  7. Because this is per-interpreter, updates are significantly more effort, as they must also be done per interpreter

Per-interpreter binsync managed default environments

For each interpreter, the user would pip install binsync then binsync install plugins.ida (or whichever decompiler is desired); this would wrap a pip install and installing the hook. binsync install would support -U for upgrade and -e for editable installs, as it would just pass these arguments along to pip.
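
A minimal sketch of the "wrap pip and pass the flags along" idea follows. `build_pip_install` is a hypothetical helper; it only builds the argument list, so that the interpreter whose environment should receive the package can be chosen explicitly.

```python
import sys


def build_pip_install(package, upgrade=False, editable=False, python=sys.executable):
    """Build the argv for a `pip install` run by the given interpreter."""
    command = [python, "-m", "pip", "install"]
    if upgrade:
        command.append("-U")
    if editable:
        command.append("-e")  # For editable installs, `package` is a local path
    command.append(package)
    return command
```

The result could then be handed to `subprocess.run(command, check=True)`, followed by the hook installation step.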

A CLI possibility:

$ binsync --help
  --version
  --config   # Print global config and path

  list       # Lists all installed packages (core, plugins.ida, etc)
  install    # `pip install` or update a binsync package; install hooks for programs as needed
    --help
    -e, --editable  # pip install -e wrapper
    -U, --update    # pip install -U wrapper
    arg [arg2...]   # pip install wrapper
  reinstall [arg...]  # Reinstall a binsync package and hooks
  uninstall [arg...]  # Uninstall a binsync package
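
The CLI layout above maps fairly directly onto `argparse` subcommands. This is only a sketch of the parser, with no behavior attached; option names follow the listing, everything else is an assumption.

```python
import argparse


def build_parser():
    """Build an argument parser mirroring the proposed binsync CLI."""
    parser = argparse.ArgumentParser(prog="binsync")
    parser.add_argument("--version", action="store_true", help="print versions")
    parser.add_argument("--config", action="store_true",
                        help="print global config and path")
    commands = parser.add_subparsers(dest="command")

    commands.add_parser("list", help="list installed binsync packages")

    install = commands.add_parser("install", help="install or update packages")
    install.add_argument("-e", "--editable", action="store_true")
    install.add_argument("-U", "--update", action="store_true")
    install.add_argument("packages", nargs="+")

    for name in ("reinstall", "uninstall"):
        sub = commands.add_parser(name, help=f"{name} binsync packages")
        sub.add_argument("packages", nargs="*")
    return parser
```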

The install command would be interactive and might prompt a user for information; it would also save such information to the global config for use by reinstall and uninstall; for example, where a hook is installed.

In this case, plugins would:

  1. Be pip installable packages that should never be manually installed by the user.
  2. Fail when installed manually by a user, to avoid confusion; perhaps the install could fail if PIP_INSTALL_BY_BINSYNC is not set in the environment?
  3. Depend on binsync as a package dependency
  4. Contain a custom hook.py to be installed by binsync, or if not, binsync can use a default version
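
The PIP_INSTALL_BY_BINSYNC idea could be sketched as follows: binsync sets the variable for its pip subprocess, and the plugin's build script refuses to continue without it. All function names are hypothetical. One caveat we are sure of: a check placed in setup.py only runs when pip builds the package from source; installing a prebuilt wheel executes no package code, so the guard would be bypassed there.

```python
import os
import sys

GUARD_VARIABLE = "PIP_INSTALL_BY_BINSYNC"


def installed_by_binsync(environ=os.environ):
    """True when the guard variable is present in the environment."""
    return GUARD_VARIABLE in environ


def binsync_pip_environment(environ=os.environ):
    """Environment binsync would pass to its pip subprocess."""
    child = dict(environ)
    child[GUARD_VARIABLE] = "1"
    return child


def refuse_manual_install(environ=os.environ):
    """Call from a plugin's setup.py to abort installs not driven by binsync."""
    if not installed_by_binsync(environ):
        sys.exit("Install this plugin with `binsync install`, not pip directly.")
```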

Downsides

  1. May not work for all decompilers; some vendor python interpreters without a pip package; we can potentially vendor our own pip or just use https://bootstrap.pypa.io/get-pip.py to bypass this issue.
  2. May not work for all decompilers; (at least) some versions of angr-management have no site-packages directory to install our binsync libraries into.
  3. This uses the default environment of the interpreter; things can clobber and users can uninstall dependencies accidentally; likewise, the install can break if the decompiler's interpreter changes
  4. We have distinct copies of binsync per-decompiler, potentially with version desyncs
  5. Because this is per-interpreter, updates are significantly more effort, as they must also be done per interpreter

Single binsync CLI concurrently managing per-interpreter binsync-managed default environments

Building atop the binsync package manager concept, we additionally break binsync out into:

  1. binsync: A package containing the CLI / plugin-manager
  2. binsync.api: The binsync api / data interface (lets programs read binsync binary files, etc)
  3. binsync.core: The core logic of binsync, i.e. the part of binsync that plugins utilize; depends on binsync.api for writing out data files and such.

Using binsync to install / upgrade / uninstall things now applies the operation to every environment, keeping them in sync.
Version output may look like:

$ binsync --version
CLI: 1.0.1
Core: 3.0.1
Plugins:
  ida: 2.0.4
  angr: 5.2.9
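
Producing that report is mostly formatting once the versions are known. A small sketch, where `format_versions` is a hypothetical helper and the versions are passed in rather than discovered:

```python
def format_versions(cli, core, plugins):
    """Render a version report in the layout shown above."""
    lines = [f"CLI: {cli}", f"Core: {core}", "Plugins:"]
    lines += [f"  {name}: {version}" for name, version in plugins.items()]
    return "\n".join(lines)


print(format_versions("1.0.1", "3.0.1", {"ida": "2.0.4", "angr": "5.2.9"}))
```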

Downsides

  1. May not work for all decompilers; some vendor python interpreters without a pip package; we can potentially vendor our own pip or just use https://bootstrap.pypa.io/get-pip.py to bypass this issue.
  2. May not work for all decompilers; (at least) some versions of angr-management have no site-packages directory to install our binsync libraries into.
  3. This uses the default environment of the interpreter; things can clobber and users can uninstall dependencies accidentally; likewise, the install can break if the decompiler's interpreter changes

Single binsync CLI concurrently managing per-interpreter binsync-managed virtualenvs

Building upon the binsync package manager concept with a broken out binsync.api and binsync.core:

The binsync cli would create and install packages into per-interpreter virtualenvs, such as ~/.binsync/ida/venv/ for IDA.
Decompiler hooks would import this code; it could be done by appending (or prepending) (temporarily or permanently) this to their sys.path, or perhaps with importlib; the method of importing is an implementation detail.
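
One possible shape for the sys.path approach is sketched below: locate the virtualenv's site-packages directory and prepend it temporarily while the plugin is imported. Both helper names are hypothetical. Note that modules imported inside the `with` block stay in `sys.modules` after the path is restored, which is what makes the temporary variant workable.

```python
import sys
from contextlib import contextmanager
from pathlib import Path


def venv_site_packages(venv_dir):
    """Return the site-packages directories found inside a virtualenv."""
    venv_dir = Path(venv_dir).expanduser()
    # POSIX layout: lib/pythonX.Y/site-packages; Windows layout: Lib/site-packages
    found = list(venv_dir.glob("lib/python*/site-packages"))
    found += list(venv_dir.glob("Lib/site-packages"))
    return [str(path) for path in found if path.is_dir()]


@contextmanager
def temporary_sys_path(entries):
    """Prepend `entries` to sys.path, restoring the original list on exit."""
    original = list(sys.path)
    sys.path[:0] = entries
    try:
        yield
    finally:
        sys.path[:] = original
```

A hook would wrap its plugin import in `with temporary_sys_path(venv_site_packages("~/.binsync/ida/venv")):`.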

Benefits:

  1. These hooks would persist and remain functional even if a decompiler changed python interpreters.
  2. Decompilers that have no site-packages directory to install binsync plugins to would be supported, as the code would be installed in the virtualenv

Downsides

  1. May not work for all decompilers; some vendor python interpreters without a pip package; we can potentially vendor our own pip or just use https://bootstrap.pypa.io/get-pip.py to bypass this issue.
  2. All binsync packages would be installed here, thus there would be no need to keep synchronized multiple concurrent copies.
  3. Potential version clobbering issues: If we require toml as a dependency of a plugin, and toml already exists in the interpreter's path, and has been loaded, ours in the virtualenv may be ignored, which might cause issues if we need specific features from specific versions.*

* This may not be an issue in practice. The current binsync and all of its plugins require python3.6 and have a small list of dependencies; hitting this issue would also require a distinct module that runs in the same decompiler interpreter, loads before binsync, and requires a version of one of our dependencies that does not support 3.6. @mahaloz doubted we would have to worry about it, but it is still a possible issue worth mentioning. We could also just vendor dependencies.
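
If we wanted to at least detect the clobbering case, a hook could check which of our dependencies were already imported before it touches sys.path. A minimal sketch, with a hypothetical function name; not every module exposes `__version__`, so the version may come back as None.

```python
import sys


def preloaded(dependencies, loaded=None):
    """Return {name: version or None} for dependencies already imported."""
    loaded = sys.modules if loaded is None else loaded
    return {
        name: getattr(loaded[name], "__version__", None)
        for name in dependencies
        if name in loaded
    }
```

An empty result means none of the listed dependencies were loaded ahead of binsync.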

Single binsync CLI managing single binsync-managed virtualenv

Building upon the binsync virtualenv manager idea:

The binsync cli would create and install packages into a single virtualenv: ~/.binsync/venv/, which would again be hooked by the decompiler hooks.
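
Creating the shared environment could be as small as the following sketch. It uses the standard library `venv` module rather than the third-party virtualenv package, which is a substitution on our part; `ensure_venv` is a hypothetical name.

```python
import venv
from pathlib import Path


def ensure_venv(path="~/.binsync/venv", with_pip=True):
    """Create the shared virtualenv if it does not exist yet; return its path."""
    path = Path(path).expanduser()
    if not (path / "pyvenv.cfg").exists():
        venv.create(path, with_pip=with_pip)
    return path
```

Because the environment is created by whichever interpreter runs the binsync CLI, it does not depend on any decompiler shipping pip.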

Benefits over multiple virtualenvs:

  1. We no longer need to keep multiple version-synced copies of binsync.
  2. Since we aren't necessarily using the exact interpreter of the decompilers, we can utilize pip

Technically, a plugin installed this way might not support the decompiler's python version (i.e. pip could theoretically select a release for the interpreter running pip rather than for the decompiler's interpreter).
This is really not an issue though, since the plugin to be loaded would be coded specifically to work with the given decompiler, so we simply choose not to require python3.8 if we know IDA might use 3.6.
We could also add runtime checks to prevent this, if desired.
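
Such a runtime check could be a few lines at the top of a hook or entrypoint. A sketch, with a hypothetical function name and a minimum version chosen by the plugin author:

```python
import sys


def check_python(minimum, current=None):
    """Raise RuntimeError if the running interpreter is older than `minimum`."""
    current = tuple(sys.version_info[:2]) if current is None else tuple(current)
    if current < tuple(minimum):
        raise RuntimeError(
            "This plugin needs python >= {}.{} but is running on {}.{}".format(
                *minimum, *current
            )
        )
```

For example, `check_python((3, 6))` passes silently on any interpreter at 3.6 or newer.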

Downsides

  1. Potential version clobbering issues: If we require toml as a dependency of a plugin, and toml already exists in the interpreter's path, and has been loaded, ours in the virtualenv may be ignored, which might cause issues if we need specific features from specific versions.*

* This may not be an issue in practice. The current binsync and all of its plugins require python3.6 and have a small list of dependencies; hitting this issue would also require a distinct module that runs in the same decompiler interpreter, loads before binsync, and uses a version of one of our dependencies that does not support 3.6. @mahaloz doubted we would have to worry about it, but it is still a possible issue worth mentioning. We could also just vendor dependencies.

Overall

I think we should have binsync manage binsync packages; in my opinion, though, the biggest question is whether we want:

  1. A CLI which manages the binsync installs across each interpreter environment
  2. A CLI that manages a singular binsync virtualenv that decompilers load code from

In my opinion, the biggest downside of a virtualenv is possible dependency version collisions due to our misuse of virtualenvs; though given the requirements to hit this issue, and after my conversation with @mahaloz, it seems rather unlikely to occur. In that case, the benefit is that we no longer need to keep multiple version-synced copies of this package across multiple environments that a user could easily mess up / clobber.
