#### coq-speed
+ runs in some VM somewhere; serves the Grafana frontend reachable at `coq-speed.mpi-sws.org`, which visualizes the timing data
+ PostgreSQL + Grafana + a home-made Rust service to fill the DB with timing info (`https://gitlab.mpi-sws.org/iris/ci-timing-webhook`)
+ stateful (data in PostgreSQL, plus the fairly tricky Grafana config), so it has a backup job
+ Grafana config:
  - queries that put the data in the right format for Grafana, configured through the Grafana frontend
  - hopefully won't have to touch those
+ if something goes wrong with the webhook, the systemd log for the webhook service has the error output: `systemctl status webhook.service`
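To read more of the log than `systemctl status` shows, `journalctl` can be used on the same unit:
```
# full log of the webhook service, jumping to the end
journalctl -u webhook.service -e
```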
#### coop-normal
coop-normal is running on a physical server owned by Derek's group, last upgraded in mid 2024.
Its main job is running CI jobs tagged with `fp`.
The hardware of the server is optimized for Coq's requirements:
- large caches
- high memory bandwidth
Setup:
+ uses an S3 server for caching (served by MPI)
+ server doesn't use the custom MPI kernel, but rather the standard Debian kernel (to enable `perf` in Docker)
+ server is stateless, can be wiped anytime
+ may need to clean up storage from time to time; the local copies of the caches on this server can run full (see [Cleaning old CI cache files on coop-normal and coop-speed](#cleaning-old-ci-cache-files-on-coop-normal-and-coop-speed)). A lot of space is used somewhere in `/local/docker`; deleting old files usually works pretty well.
#### coop-speed
coop-speed is running on a physical server owned by Derek's group, last upgraded in mid 2024.
Its main job is running CI jobs tagged with `fp-timing`, the results of which are then submitted to coq-speed.
The basic setup is similar to coop-normal, but getting accurate timings requires some work wrt hardware & config.
+ for timing measurements, the number of instructions is measured with `perf` (less noisy than exact times)
##### Core pinning
The CPU consists of 8 dies (CCDs) which each have their own mostly independent caches (except for L3 cache snooping).
Each of the 8 dies has 8 physical cores, of which only 2 each are activated for our CPU.
The mapping of this to Linux's processor enumeration can be observed with `cat /proc/cpuinfo`, looking at the `core id` attribute (see the example below).
To avoid running into memory contention, the CPU has 12 memory channels.
We have pinned the operating system to the first CCD including its hyperthreads (cores 0, 1, 16, 17).
The rest of the cores are available for CI workloads.
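For example, the processor-to-core mapping can be listed like this (standard tools only; `lscpu --extended` shows the same information as a table):
```
# pair each logical CPU ("processor") with its physical "core id"
grep -E '^(processor|core id)' /proc/cpuinfo | paste - -
```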
To configure this:
+ `/etc/systemd/system.conf` sets the CPU affinity for the system with `CPUAffinity=0,1,16,17`
+ for some reason, this only works with the boot option `systemd.unified_cgroup_hierarchy=false`
+ for the gitlab-runner, we set the CPU pinning in its config file `/etc/gitlab-runner/config.toml` (see the sketch below)
  * currently, we have two runners on different CCDs, with `cpuset_cpus = "4,5,6,7,8,9"` and `cpuset_cpus = "10,11,12,13,14,15"`
  * this effectively disables hyperthreading, for more reproducible timing.
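For reference, a hypothetical sketch of what one of the runner entries in `/etc/gitlab-runner/config.toml` could look like; only `cpuset_cpus` and `cap_add` are taken from this page, the other fields are illustrative:
```
[[runners]]
  name = "coop-speed-runner-1"      # illustrative name
  executor = "docker"
  [runners.docker]
    image = "ralfjung/opam-ci:opam2"
    cpuset_cpus = "4,5,6,7,8,9"     # pin CI jobs to a dedicated set of CCDs
    cap_add = ["PERFMON"]           # allow perf inside the container (see below)
```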
#### Running perf in docker
To run `perf` in `docker`, we use `cap_add = ["PERFMON"]` in the gitlab-runner config.
Alternatively, once https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27235 is fixed, one could instead use a seccomp filter that allows the `perf_event_open` syscall in Docker, see https://stackoverflow.com/a/44748260 .
(Previously, the job was running with `privileged = true` to enable `perf`.)
We need to make sure that the kernel version of the Docker images matches the host kernel version so that `perf` works (in Debian, the `perf` tool is tied to a specific kernel version).
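A quick smoke test for this setup (assuming the timing image ships a matching `perf`):
```
# counts instructions for a trivial command; fails with a permission
# error if the PERFMON capability is not passed through correctly
docker run --rm --cap-add PERFMON ralfjung/opam-ci:opam2 perf stat -e instructions true
```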
#### History
In the past, we used a single server for coop-normal and coop-speed with two sockets, where one socket was used for timing jobs.
This gave the CPU running the timing jobs its own memory controller, which wasn't shared with any other processes.
However, in practice we do not seem to be losing any timing accuracy on the current setup, where the system processes run on the same CPU (but a different CCD) than the timing jobs, as long as enough memory bandwidth is available.
#### CI scripts
- base-CI-image is on github.com/RalfJung/opam-ci (branch opam2)
- CI config (.gitlab-ci.yml) is identical for most projects
+ `tags` are used for setting the runner (`fp` (default), `fp-timing`)
+ `variables`: sets environment variables for CI; the list of variables is at the top of https://gitlab.mpi-sws.org/iris/ci/-/blob/master/buildjob
+ `OPAM_PINS` variable: uses space-separated triples to configure opam pins (e.g. `coq-core.dev git git+https://....`); there's a shortcut for git repos containing `.opam` configs: just use `git+https://...` instead of the triple
+ `CI_RUNNER_DESCRIPTION`: parsed to decide whether to activate `perf`; sets the `TIMECMD` env variable (which is read by Coq). The `perf` script calls `perf` with the right parameters and concatenates the output files into one, which is parsed by the frontend (this concatenated info is also shown in the CI output log on Gitlab); see the sketch after this list.
+ see `README.md` for more details
+ for creating/updating opam packages: https://gitlab.mpi-sws.org/iris/opam-updater (private due to secrets) runs CI jobs for updating.
+ updating coqdocs: uses a `coqdoc` user which can only use rsync for the specific coqdoc directory
+ `prepare-opam.sh`: uses a few opam workarounds; in case there are problems with the opam cache, use the `Clear Runner Caches` button on the pipeline overview in Gitlab.
This will not delete anything, just bump up a counter so that the old caches are no longer used.
- iris: the `iris-bot` script does two things: checking whether reverse dependencies still build, and checking two commits for timing differences. For that, there are `trigger-iris.timing`/`trigger-iris.dev` CI jobs; CI uses the `TIMING_AD_HOC_ID` variable to decide which one to run. Again, see `README.md` for more details.
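The actual `perf` wrapper lives in the `iris/ci` repo; as a rough, hypothetical sketch of the measuring part only (names, paths, and flags here are assumptions, not the real script):
```
#!/bin/sh
# TIMECMD-style wrapper sketch: run the given command (e.g. coqc) under
# perf, counting instructions, and append machine-readable output to a
# log that a later step can concatenate and parse.
log="${PERF_LOG:-/tmp/perf.csv}"   # assumed log location
perf stat -x, -e instructions -o "$log" --append -- "$@"
```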
## Cleaning old CI cache files on coop-normal and coop-speed
Once in a while, CI jobs might fail due to there being no space on the CI runner anymore. This happens because we regularly change the Coq versions we test in CI, and this changes the job names. So after a while, we end up with a bunch of caches for jobs that are not even run anymore because the Coq version has been phased out.
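The cleanup command itself is not recorded on this page; a plausible sketch, assuming the runner's local cache volumes live under `/local/docker/volumes` (an assumption, verify the path and names before deleting anything):
```
# remove runner cache volumes untouched for more than 150 days (path/name are assumptions)
find /local/docker/volumes -mindepth 1 -maxdepth 1 -name 'runner-*cache*' -mtime +150 -exec rm -rf {} +
```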
This deletes caches which are more than 150 days old.
## Setting up coop-normal after a re-install
1. Restore visible files in home directory (optional, the files are not strictly needed, but useful).
In particular, `.docker/config.json` is useful for the credentials for Docker hub.
2. Add Debian package repos for gitlab-runner and Docker and install them according to their respective instructions.
3. Restore /etc/gitlab-runner/config.toml
4. Symlink /var/lib/docker/ to /local/docker. This may work with the following script, but last time the move actually destroyed the docker installation (which could be fixed by reinstalling docker-ce).
```
systemctl stop docker
mv /var/lib/docker /local/docker
ln -s /local/docker /var/lib/docker
systemctl start docker
```
5. Check if everything is running.
## Setting up coop-speed after a re-install
Perform the same steps as for coop-normal, plus:
1. In /etc/systemd/system.conf: set CPU affinity
2. After a reboot, sanity-check in htop that only the cores that are configured in system.conf are running system processes.
If not, we may need the following hack:
1. In `/etc/default/grub`, add the following kernel option: `GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=false"`
2. Disable the MPI config which disables kernel commandline parameters: `mv /etc/default/grub.d/disable.cfg /etc/default/grub.d/disable.cfg_BAK`
3. `update-grub`
4. `grub-install`
5. Reboot
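Besides htop, the affinity of the init process can be checked directly; with the pinning above it should report exactly the configured cores:
```
taskset -cp 1
# expected: pid 1's current affinity list: 0,1,16,17
```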
3. Restore `/etc/sysctl/perf.conf`
4. Update the docker image at Docker Hub (the Debian version for the timing CI image needs to be the same as the host version, to have perf working), `ralfjung/opam-ci`.
Since the automatic pull from the Github repo doesn't work anymore, directly push to Docker Hub:
```
git clone https://gitlab.mpi-sws.org/iris/ci.git
...
docker login
docker push ralfjung/opam-ci:opam2
```
## Setting up coq-speed after a re-install
0. Restore visible files in home directory (optional, the files are not strictly needed, but useful).
1. Set up Grafana. Grafana uses its own SQLite DB, but we will add a PostgreSQL DB that is filled by the webhook as an external data source.
* Install Grafana by adding the apt repository: https://grafana.com/docs/grafana/latest/setup-grafana/installation/debian/#install-from-apt-repository. Pay attention to the signing part.
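As of writing, those instructions boil down to the following (check the linked docs in case the key or repo location changed):
```
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update && sudo apt-get install grafana
```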
* Enable the systemd services
```
sudo systemctl enable grafana-server
```
* Restore/update `/etc/grafana/grafana.ini`
* Restore `/var/lib/grafana.db`, making sure the user/group is correctly set to `grafana`, e.g.:
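```
chown grafana:grafana /var/lib/grafana.db
```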