Do you guys expose the docker socket to any of your containers, or is that a strict no-no? If you don’t, what’s your reasoning? If you do, how do you justify the decision from a security standpoint?

I am still fairly new to docker, but I like the idea of something like Watchtower. Even though I am not a fan of auto-updates and probably wouldn’t use that feature, I still find it useful to get a notification when a container needs an update. However, Watchtower needs access to the docker socket to do its work, and I have read in several places that this is a bad idea because it can result in root access to your host filesystem from within a container.

There are probably other containers as well, especially in the whole monitoring and maintenance category, that need this privilege, so I wanted to ask how other people handle the situation.

Cheers!

  • glizzyguzzler@piefed.blahaj.zone · 15 hours ago

    Per this guide https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html I do not. I have a cron/service script that updates containers automatically (‘docker compose pull’, I think) for services where I don’t care if they fail for a bit (pdf converter, RSS reader, etc.) or that are exposed to the internet directly (Authentik, caddy).
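
    A minimal sketch of what such an update script might look like, assuming each compose stack lives in its own directory under /opt/stacks (a hypothetical layout - adjust to yours):

    #!/bin/sh
    # Pull newer images and restart each compose stack with them.
    for dir in /opt/stacks/*/; do
        (cd "$dir" && docker compose pull && docker compose up -d)
    done
    # Clean up the superseded images afterwards.
    docker image prune -f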

    Note that smart peeps say the docker socket is not safe even when mounted read-only. That sadly makes Watchtower inherently untenable, and Traefik too (trusting a docker-socket-proxy container with giga root permissions only made sense to me if you could audit the whole thing and keep auditing it with every update, and I cannot). https://stackoverflow.com/a/52333163 https://blog.quarkslab.com/why-is-exposing-the-docker-socket-a-really-bad-idea.html

    For things with oodles of breaking changes (Immich) or things I’d care about if they broke suddenly (paperless), I just have scripts that I run manually to do the ‘docker compose pull’.

    Overall, I’ve only had a few break over a few years - and that’s because I also run all services (per the link above) as a user, read-only, and with no capabilities (none that aren’t required; as far as I know, none need any). And while some containers are well coded, many are not: if an update suddenly wants to write to ‘/npm/staging’, the read-only setting torches that until I can figure it out and put in a tmpfs fix. The few failures are worth the peace of mind that it’s locked the fuck down.

    I hope to move to podman sometime to eliminate the last security risk: the docker daemon that runs the containers runs as root. Rootless docker seems to be a significant hassle at any scale, so I haven’t bothered with it.

    Edit: this effort is to prevent the attack vector of “someone hacks or buys access to a well-used project known to have docker socket access (e.g., Watchtower, last updated 2 years ago, or a commonly used docker socket proxy) and then pushes a malicious update that encrypts and ransoms your server via root-access escalation from the docker socket”. As long as no container has root (and the container doesn’t breach the docker daemon…), the fallout from a good container turned bad is limited to the newly bad container.

    • chameleon@fedia.io · 15 hours ago

      All true, wanted to add on to this:

      Note that smart peeps say that the docker socket is not safe as read-only.

      That’s true, and it’s not just something mildly imperfect; read-only straight up does nothing. For connecting to a socket, Linux ignores the read-only mount state and only checks write permission on the socket file itself. Read-only would only make it impossible to create a new socket at that path. Once you do have a connection, that connection can write anything it wants to the socket. Traefik and other “read-only” users still have to send GET queries for the data they need, so those writes happen in legitimate use cases too.
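
      You can convince yourself quickly with the official docker:cli image - mount the socket with :ro, and the Docker CLI inside the container still reaches the daemon just fine:

      # The :ro mount doesn't stop a client from connecting to the
      # socket; this happily lists the host's containers.
      docker run --rm -v /var/run/docker.sock:/var/run/docker.sock:ro \
          docker:cli docker ps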

      If you really need a “GET-only” Docker socket, it has to be done with some other kind of mechanism, and frankly the options aren’t very good. Docker has authorization plugins, which seem like too much of a headache to set up, and the socket proxies don’t seem very good to me either.

      Or, TL;DR: :ro or stripping off permission bits doesn’t do anything aside from potentially breaking all uses of the socket. If a container can connect at all, it’s root-equivalent (or has all the privileges of your rootless user) unless you’ve taken other steps. That might or might not be a massive problem for your setup, but it’s something you should know when doing this.

    • 5ymm3trY@discuss.tchncs.de (OP) · 15 hours ago

      Thank you for your comment and the resources you provided - I will definitely look into them. I like your approach of minimizing the attack surface. As I said, I am still new to all of this, and I only came across the user: option of docker compose recently, when I installed Jellyfin. However, I thought the container image itself has to be built in a way that makes this possible; otherwise you can run into permission errors and such. Do you just specify a non-root user and see if it still works?

      And while we’re at it, how would you set up something like Jellyfin with regard to read/write permissions? I currently haven’t restricted it to read-only, and in my current setup I most certainly need write permissions as well, because I store the artwork in the respective directories inside my media folder. Would you just save those files to the non-persisted storage inside the container, since you can re-download them anyway, and keep the media volume read-only?

      • glizzyguzzler@piefed.blahaj.zone · 12 hours ago

        So I’ve found that if you use the user: option with a user name (user: UserName), it requires that user name to also exist inside the container. If you do it with a UID/GID (user: 1500:1500), it maps the container’s default user (likely root, 0) to the UID/GID you provide. For many containers it just works; for linuxserver containers (a group that produces container images for lots of stuff) I think it biffs it - those are way jacked up. I put the containers that won’t play ball in an LXC container (via the Incus GUI), or for simple permission fixes I just make a permissions-fixing version of the container (runs as root, but only executes commands I provide) to fill a volume with data that has the right permissions, then load that volume into the real container - see the sketch below. Luckily jellyfin doesn’t need that.
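
        A rough sketch of that permissions-fixing trick, with a hypothetical named volume app-data and UID/GID 1500: a throwaway root container runs only the single command given to it, and afterwards the unprivileged app container mounts the correctly-owned volume.

        # One-shot container as root; it only executes the chown below.
        docker run --rm --user 0 -v app-data:/data alpine \
            chown -R 1500:1500 /data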

        I give jellyfin read-only access (via :ro in the volumes:) to my media stuff because it doesn’t need to write to it. I think it’s fine if your use case needs :rw - just keep a backup (even if you use :ro!).

        Here’s my docker-compose.yml; I gave jellyfin its own IP with macvlan. It’s pretty janky and I’m still working on it, but you can have jellyfin use your server’s IP instead by deleting everything after jellyfin-nw: (but keep the jellyfin-nw: line!) in both the networks: section and the services: section. Delete the mac_address: in the services: section too. In the ports: part, that 10.0.1.69 would be the IP of your server (or in this case, what I declare the jellyfin container’s IP to be) - it makes it so the container can only bind to the IP you provide; otherwise it can bind to anything the server has access to (as far as I understand).

        And of course, I have GPU acceleration working here with an embedded Intel iGPU. Hope this helps!

        # --- NETWORKS ---  
        networks:  
          jellyfin-nw:  
            # In docker, `macvlan` gives the container its own interface on the LAN, like a separate physical NIC  
            driver: macvlan  
            driver_opts:  
                parent: 'br0'  
            #    mode: 'l2'  
            name: 'doc0'  
            ipam:  
                config:  
                  - subnet: "10.0.1.0/24"  
                    gateway: "10.0.1.1"  
        
        # --- SERVICES ---  
        services:  
            jellyfin:  
                container_name: jellyfin  
                image: ghcr.io/jellyfin/jellyfin:latest  
                environment:  
                  - TZ=America/Los_Angeles  
                  - JELLYFIN_PublishedServerUrl=https://jellyfin.guzzlezone.local/  
                ports:  
                  - '10.0.1.69:8096:8096/tcp'  
                  - '10.0.1.69:7359:7359/udp'  
                  - '10.0.1.69:1900:1900/udp'  
                devices:  
                  - '/dev/dri/renderD128:/dev/dri/renderD128'  
                #  - '/dev/dri/card0:/dev/dri/card0'  
                volumes:  
                  - '/mnt/ssd/jellyfin/config:/config:rw,noexec,nosuid,nodev,Z'  
                  - '/mnt/cache/jellyfin/log:/config/log:rw,noexec,nosuid,nodev,Z'  
                  - '/mnt/cache/jellyfin/cache:/cache:rw,noexec,nosuid,nodev,Z'  
                  - '/mnt/cache/jellyfin/config-cache:/config/cache:rw,noexec,nosuid,nodev,Z'  
                  # Media links below  
                  - '/mnt/spinner/movies:/data/movies:ro,noexec,nosuid,nodev,z'  
                  - '/mnt/spinner/shows:/data/shows:ro,noexec,nosuid,nodev,z'  
                  - '/mnt/spinner/music:/data/music:ro,noexec,nosuid,nodev,z'  
                restart: unless-stopped  
                # Security stuff  
                read_only: true  
                tmpfs:  
                  - /tmp:uid=2200,gid=2200,rw,noexec,nosuid,nodev  
                # MAC address is 02:42 followed by 10.0.1.69 in hex, each number between the .s mapped to a pair in the MAC (10.0.1.69 -> 0A:00:01:45)  
                # this matches how docker assigns MACs, so there will never be a MAC address collision  
                mac_address: 02:42:0A:00:01:45  
                networks:  
                    jellyfin-nw:  
                        # Docker is pretty jacked up and can't get an IP via DHCP so manually specify it  
                        ipv4_address: 10.0.1.69  
                user: 2200:2200  
                # GPU acceleration needs the render group; find its GID for your server with `getent group render | cut -d: -f3`  
                group_add:  
                  - "109"  
                security_opt:  
                  - no-new-privileges:true  
                cap_drop:  
                  - ALL  
        

        Lastly, I thought I should add the external stuff needed to get the hardware acceleration working and the user created:

        # For jellyfin low power (LP) intel QSV stuff  
        # if trouble see https://jellyfin.org/docs/general/administration/hardware-acceleration/intel/#configure-and-verify-lp-mode-on-linux  
        sudo apt install -y firmware-linux-nonfree #intel-opencl-icd  
        sudo mkdir -p /etc/modprobe.d  
        sudo sh -c "echo 'options i915 enable_guc=2' >> /etc/modprobe.d/i915.conf"  
        sudo update-initramfs -u  
        sudo update-grub  
        
        APP_NAME="jellyfin"  
        # UID of the unprivileged user the container runs as (matches user: 2200:2200)  
        APP_UID=2200  
        sudo useradd -u $APP_UID $APP_NAME  
        

        Note that the jellyfin user isn’t added to the render group; rather, the group is added to the container via group_add: in the docker-compose.yml file.

        • 5ymm3trY@discuss.tchncs.de (OP) · 2 hours ago

          I have set all of this up on my Asustor NAS, so things like apt install are not applicable in my case. Nevertheless, thank you very much for your time and expertise with regard to users and volumes. What is your strategy for networks in general? Do you set up a separate network for each and every container unless the services have to communicate with each other? I am not sure I understand the network setup of your Jellyfin container.

          In the ports: part that 10.0.1.69 would be the IP of your server (or in this case, what I declare the jellyfin container’s IP to be) - it makes it so the container can only bind to the IP you provide, otherwise it can bind to anything the server has access to (as far as I understand).

          With the macvlan driver, the virtual network interface of your container behaves like its own physical network interface, which you can assign a separate IP to, right? What advantage does this have exactly, or what potential problems does it solve?

  • Eirikr70@jlai.lu · 18 hours ago

    I use Watchtower just to notify me of updates, so the docker socket is mounted read-only.
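
    (For reference, a notification-only Watchtower run presumably looks something like the below; notification settings are omitted. Note the :ro is a generic bind-mount flag, not a Watchtower feature - and per the other comments in this thread, it doesn’t actually restrict what the container can do with the socket.)

    # Watchtower in monitor-only mode: checks for updates and notifies,
    # but never pulls or restarts anything itself.
    docker run -d --name watchtower \
        -v /var/run/docker.sock:/var/run/docker.sock:ro \
        containrrr/watchtower --monitor-only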

    • 5ymm3trY@discuss.tchncs.de (OP) · 17 hours ago

      Interesting. I just skimmed through the documentation again and couldn’t find anything about read-only. How exactly did you set it up? Just because it isn’t auto-updating, i.e. writing something, doesn’t necessarily mean it doesn’t have write privileges.

    • 5ymm3trY@discuss.tchncs.de (OP) · 20 hours ago

      That is actually a pretty good idea. I wanted to try out FreshRSS anyway, so this might be one more reason to do that. Thanks!

  • gonzo-rand19@moist.catsweat.com · 20 hours ago

    I use Podman with Diun (like Watchtower, but no auto-updates), and I think that’s the only time I’ve had to mount the socket into a container. Maybe also CrowdSec. Podman is rootless, so I feel a bit better about it.

    • 5ymm3trY@discuss.tchncs.de (OP) · 19 hours ago

      I don’t know anything about Podman, but I think Docker also has a rootless mode; I don’t really know any details about that either. Maybe I should read more about it.

      Yeah, I think I also saw some fancy dashboard with Grafana and Prometheus where some part also required access to the socket (can’t remember which), so I guess it might be more common to do that than I originally thought.

  • LainTrain@lemmy.dbzer0.com · 21 hours ago

    Is the container exposed to the internet?

    If yes, do not.

    If no, I think it will be OK so long as it’s actually not exposed to the internet - e.g., ideally behind NAT with no port forwards and all IPv6 traffic turned off, or some other deny-all-inbound firewall outside the system itself, sitting between the internet and the system the container runs on.

    In the worst-case scenario you’ve given someone a file share on your root partition, but if the host is not exposed to the internet, the chance of that happening is extremely remote.

    • 5ymm3trY@discuss.tchncs.de (OP) · 20 hours ago

      No, none of my containers are exposed to the internet, and I don’t intend to change that - I leave that to people with more experience. I have, however, set up the WireGuard VPN feature of my router to access my home network from outside, which I need occasionally. As far as I have read, that is considered one of the safest options IF you have to make it available at all. No outside access is of course always preferred.

  • brewery@feddit.uk · 21 hours ago

    Sorry, this doesn’t really answer your question, but I’ve had issues when I used to auto-update containers, so I stopped doing that. Some things have breaking changes; others just had issues in a particular release that kept me from accessing stuff when not at home. Now I update every so often, when I have ten minutes: I check the release notes and deal with any issues if they arise, or roll back to the previous version. I spin up What’s Up Docker to see what’s changed, then, when finished, stop the container so it doesn’t keep polling Docker Hub and using up my free allowance.

    In short, it could be an option to spin it up, let it run, then stop the container, so there’s less risk of it being used for an attack.
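
    Assuming a compose service named wud for What’s Up Docker (the name is hypothetical), that routine is just:

    # Start the update checker, review what it found, then stop it
    # again so the socket-mounted container isn't left running.
    docker compose up -d wud
    docker compose stop wud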

    • 5ymm3trY@discuss.tchncs.de (OP) · 20 hours ago

      That is the exact reason why I wouldn’t use the auto-update feature. I just thought about setting it up to check for updates and give me some sort of notification. I feel like a reminder every now and then helps me keep everything up to date and avoid a “never change a running system” mentality.

      Your idea of setting it up and only letting it run occasionally is definitely one to consider. It would at least avoid manually checking the releases of each container, similar to the RSS suggestion from /u/InnerScientist.

      • brewery@feddit.uk · 20 hours ago

        To be honest, you would get notifications far more often than you need a reminder - and if you’re like me, you’ll just end up ignoring them anyway! There are a lot of small updates to a lot of software, most often not for security reasons but simply because people are developing their projects. I update every week if I can, but sometimes it stretches to a couple of weeks, at which point I start to feel “guilty” - when it builds up like that, I know I have to do it.

        • 5ymm3trY@discuss.tchncs.de (OP) · 20 hours ago

          Fair point. It is probably best to keep it simple. I can always set up a reminder in my calendar twice a month if I really have to.

    • i_am_not_a_robot@discuss.tchncs.de · 20 hours ago

      Mounting the docker socket into Watchtower is fine from a security perspective, but automatic updates can definitely cause problems. I used to use Renovate, which would open a pull request to update the version instead.

      • 5ymm3trY@discuss.tchncs.de (OP) · 19 hours ago

        There are lots of articles out there that say the opposite - not about Watchtower per se, but giving a container access to the socket is generally considered a bad idea from a security point of view.

        • i_am_not_a_robot@discuss.tchncs.de · 16 hours ago

          Giving a container access to the docker socket allows container escapes, but if you’re doing it on purpose, with a service designed for that purpose, there is no problem. Either you trust Watchtower to manage the other containers on your system or you don’t. Whether it manages the containers through a mounted docker socket or through direct socket access makes no difference to security.

          I don’t know if anybody seriously uses Watchtower, but I wouldn’t be surprised. I know that companies use tools like Argo CD, which has a larger attack surface and a similar level of system access via its Kubernetes service account.

          • 5ymm3trY@discuss.tchncs.de (OP) · 14 hours ago

            I think I get where you’re coming from. In the specific case of Watchtower it is not a security flaw; it just uses the socket to do what it is supposed to do. You either trust it and live with the risks that come with that, or you don’t and find another solution. I used Watchtower as the example because it was the first one I came across that needs this access, but there are probably a lot of other containers out there that use it, which is why I wanted to hear people’s opinions on this topic and their approaches.