Permit mounting via volumes-from by passing orchestrator ID #924

cheald · 2019-08-26T21:57:56Z

tl;dr This helps CodeClimate engines avoid needing intimate knowledge of the Docker host, which permits using CodeClimate outside of Docker-in-Docker setups. In particular, it makes it easy to run CodeClimate checks in GitLab while retaining Docker layer caching, vastly improving the runtime of each build.

In contexts like self-hosted GitLab, an invoking runner such as GitLab CI may use the Docker executor, which exposes the Docker socket to the running job so that the job can start its own Docker containers on the host. GitLab's top-level job sets up some filesystem context (/builds, mounted as a Docker volume, in the GitLab case).

Right now, GitLab can only support CodeClimate in a Docker-in-Docker runner, because CodeClimate mounts volumes for the individual engines via Docker's --volume flag, which mounts a path on the Docker host rather than the path from the invoking container. This requires that the path passed to CodeClimate as the CODECLIMATE_CODE variable match the real host path. In the GitLab CI case we don't want that, so we have to "hide" the host with a DinD approach. However, this means we also don't get any layer caching between jobs, which makes running CodeClimate prohibitively expensive, as all the engines etc. have to be downloaded for each job.

By supporting Docker's volumes-from mounting option, we can instead tell the engines to inherit any mounts from the invoking orchestrator. This lets the top-level context set up a Docker volume and bind it to the orchestrator, which then passes it on to the children it invokes. This sidesteps the issue of the engines needing to know the actual host path; as long as the orchestrator's /code directory is mounted, the children can just use it as-is.
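
As a rough illustration of the choice between the two mounting modes (a sketch only, not the code changed in this PR; the function name engine_mount_args is hypothetical), the decision could be expressed in shell like this:

```shell
#!/bin/sh
# Hypothetical helper: prints the mount arguments an orchestrator might
# pass to `docker run` when starting an engine container, one per line.
# Assumptions: CODECLIMATE_ORCHESTRATOR holds the orchestrator's container
# name, and CODECLIMATE_CODE holds the code path as seen by the Docker host.
engine_mount_args() {
  if [ -n "${CODECLIMATE_ORCHESTRATOR:-}" ]; then
    # Inherit every mount of the named container, including /code.
    printf '%s\n' "--volumes-from" "${CODECLIMATE_ORCHESTRATOR}"
  else
    # Fall back to a bind mount, which requires a real host path.
    printf '%s\n' "--volume" "${CODECLIMATE_CODE}:/code"
  fi
}
```

With CODECLIMATE_ORCHESTRATOR set, the engine never sees a host path at all; without it, the behavior stays a plain --volume bind mount.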

To accomplish this, we just a) name the top-level container, and b) pass that name via the CODECLIMATE_ORCHESTRATOR env var:

    docker run \
      --interactive --tty --rm \
      --name codeclimate_orchestrator \
      --env CODECLIMATE_ORCHESTRATOR="codeclimate_orchestrator" \
      --env CODECLIMATE_CODE="/code" \
      --volume "$PWD":/code \
      --volume /var/run/docker.sock:/var/run/docker.sock \
      --volume /tmp/cc:/tmp/cc \
      codeclimate/codeclimate-wrapped analyze

In the bare-metal case, this doesn't change anything - we're mounting the real host path at /code, and the individual children then inherit that /code mount.

While not immediately pertinent to the CodeClimate PR, in GitLab we can invoke the GitLab codequality image like so:

    script:
      # Find the container running the current job.
      - CONTAINER_ID=$(docker ps -q -f "label=com.gitlab.gitlab-runner.job.id=${CI_JOB_ID}")
      # Find the name of the volume mounted at $CI_BUILDS_DIR. The format string is
      # double-quoted so that the shell expands the variable.
      - BUILDS_VOLUME_ID=$(docker inspect "$CONTAINER_ID" --format "{{ range .Mounts }}{{ if eq .Destination \"${CI_BUILDS_DIR}\" }}{{ .Name }}{{ end }}{{ end }}")
      # Translate the project path under $CI_BUILDS_DIR to the matching path under /code.
      - SOURCE_CODE="/code/${CI_PROJECT_DIR#$CI_BUILDS_DIR}"
      - docker run
          --rm
          --name "codeclimate_orchestrator_${CI_JOB_ID}"
          --env SOURCE_CODE="$SOURCE_CODE"
          --env REPORT_FILENAME="gl-code-quality-report.json"
          --env CODECLIMATE_IMAGE="codeclimate:latest"
          --env ORCHESTRATOR_ID="codeclimate_orchestrator_${CI_JOB_ID}"
          --volume /var/run/docker.sock:/var/run/docker.sock
          --volume "${BUILDS_VOLUME_ID}":/code
          gitlab/codequality:latest "$SOURCE_CODE"

Because this job must be executed in a context that is visible to Docker, we can query Docker to get the current job's container ID, and from there get the ID of the volume mounted as $CI_BUILDS_DIR. We then mount that volume as /code, and specify /code as the "host" location of the code to be evaluated. The orchestrator uses the passed volume as /code, which is then passed on to the engine jobs, allowing the entire process to run against an ephemeral Docker volume rather than requiring a known path on the host.
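
The path translation in that script relies on shell prefix stripping. A minimal sketch of it in isolation (the helper name to_code_path and the example paths are hypothetical, chosen only for illustration):

```shell
#!/bin/sh
# Hypothetical helper: maps a path under the builds directory to the
# matching path under /code, where the builds volume is mounted.
to_code_path() {
  builds_dir=$1   # e.g. the value of CI_BUILDS_DIR
  project_dir=$2  # e.g. the value of CI_PROJECT_DIR
  # Strip the builds-directory prefix, keeping the leading slash of the rest.
  printf '/code%s\n' "${project_dir#"$builds_dir"}"
}

# Example with made-up paths: prints /code/group/project
to_code_path /builds /builds/group/project
```

Since the engines only ever see the volume at /code, this is the only place where the original builds path needs to be known.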


CLAassistant · 2019-08-26T21:58:07Z

All committers have signed the CLA.

HenningCash · 2019-09-04T10:25:08Z

We are facing the same issue: Our application has no access to the Docker host, only to the Daemon itself via remote API. With this approach we could create a codeclimate container, copy all files to /code using the Docker API and run the analysis without touching the host's filesystem.

👍 Would love to see this PR merged in the near future.

bufferoverflow · 2020-07-07T02:42:01Z

@efueger I think this is worth looking at, WDYT?

frakman1 · 2020-08-18T18:58:14Z

@cheald I can't find codeclimate/codeclimate-wrapped anywhere. Where are you getting this from?

HenningCash · 2020-08-19T08:27:12Z

> @cheald I can't find codeclimate/codeclimate-wrapped anywhere. Where are you getting this from?

I guess the image was built locally from this PR's branch and tagged codeclimate/codeclimate-wrapped.

codeclimate-hermes assigned efueger Aug 26, 2019

Remove trailing comma to conform with CC checks

c1c0b00

erikhofer mentioned this pull request Sep 4, 2019

Use original codeclimate image codefreak/codefreak#149

Open
