Many Deis users are running Deis quite successfully in production. When readying a Deis deployment for production workloads, consider the following additional (but optional) recommendations.
The Deis Control Plane, Data Plane, and Router Mesh components all depend on an etcd cluster for service discovery and configuration.
Whether built for evaluation or to host production applications, when managing a small Deis cluster (three to five nodes), it is reasonable to accept the platform’s default behavior wherein etcd runs on every node within the cluster.
In larger Deis clusters, however, running etcd on every node can degrade overall cluster performance, since it increases the time required for nodes to reach consensus on writes and leader elections. In such cases, it is beneficial to isolate etcd to a small, fixed number of nodes. All other nodes in the Deis cluster may run an etcd proxy. Proxies forward read and write requests to active participants in the etcd cluster (leader or followers) without affecting the time required for etcd nodes to reach consensus on writes or leader elections.
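The cost of adding voting members can be made concrete with a little arithmetic. etcd uses the Raft consensus algorithm, in which a write is committed only once a majority (quorum) of voting members has acknowledged it. The sketch below is illustrative only; the function names are hypothetical and not part of Deis or etcd.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting etcd nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """Voting members that can be lost while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    # Compare small clusters with a large one where etcd runs on every node.
    for n in (3, 5, 9, 25):
        print(f"{n} members: quorum {quorum(n)}, "
              f"tolerates {failures_tolerated(n)} failures")
```

A five-member cluster needs three acknowledgements per write, while a 25-member cluster needs thirteen. Proxies do not vote, so adding them leaves the quorum size unchanged.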
The benefit of running an etcd proxy on any node not running a full etcd process is that any container or service depending on etcd can connect to etcd easily via localhost from any node in the Deis cluster.
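As a rough illustration of what connecting via localhost looks like from a client's point of view, the following sketch reads a key through the etcd v2 HTTP keys API. The endpoint address and port (`127.0.0.1:4001`), the helper names, and the example key are assumptions made for illustration; check the actual client address configured on your hosts.

```python
import json
import urllib.request

# Assumed local client address; the same URL works whether the node runs
# a full etcd member or only a proxy.
ETCD_ENDPOINT = "http://127.0.0.1:4001"


def etcd_key_url(key: str, endpoint: str = ETCD_ENDPOINT) -> str:
    """Build the etcd v2 keys API URL for `key`."""
    return f"{endpoint.rstrip('/')}/v2/keys/{key.lstrip('/')}"


def get_value(key: str, endpoint: str = ETCD_ENDPOINT, timeout: float = 2.0) -> str:
    """Fetch the value stored at `key` from the local etcd endpoint."""
    with urllib.request.urlopen(etcd_key_url(key, endpoint), timeout=timeout) as resp:
        body = json.load(resp)
    # A v2 GET response wraps the result in a "node" object.
    return body["node"]["value"]


if __name__ == "__main__":
    # Hypothetical key, used only as an example.
    print(get_value("/deis/platform/domain"))
```

Because the client only ever talks to localhost, the same code runs unchanged on every node, regardless of which nodes host the actual etcd members.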
See also the CoreOS cluster architecture documentation for further details.
See Isolating etcd for further details.
The Deis Control Plane makes use of Ceph to provide persistent storage for the Registry, Database, and Logger components. The additional operational complexity of Ceph is tolerated because of the need for persistent storage for platform high availability.
Alternatively, persistent storage can be achieved by running an external S3-compatible blob store, PostgreSQL database, and log service. For users on AWS, the convenience of Amazon S3 and Amazon RDS makes the prospect of running a Ceph-less Deis cluster quite reasonable.
Running a Deis cluster without Ceph provides several advantages:
See Running Deis without Ceph for details on removing this operational complexity.
When a host in your CoreOS cluster fails or becomes unresponsive, the CoreOS scheduler will relocate any cluster services on that machine to another host. These services come up on the new host just fine, but a component’s first task is to pull the corresponding Docker image from Docker Hub. Depending on factors such as available bandwidth, network latency, and performance of the Docker Hub platform, this can take some time. Failover is not finished until the pull completes and the component starts.
To minimize component downtime should failover occur, it is recommended to preseed the Docker images for Deis on all hosts in a cluster. This will pull all the images to the host’s local Docker graph, so if failover should occur, a component can start quickly.
A preseed script is already loaded on CoreOS hosts.
On all hosts in the cluster, run:
This will pull all component images for the installed version of Deis.
There are some additional security-related considerations when running Deis in production. Users can consider enabling a firewall on the CoreOS hosts as well as on the router component.
See Security considerations for details.
Backing up data regularly is recommended. See Backing Up and Restoring Data for steps.
Changing the registration process is highly recommended in production. By default, registrations for a new cluster are open to anyone with the proper URL. Once the admin user has registered with a new cluster, it is recommended that you either turn off registrations entirely or enable the admin-only registration feature.
See Customizing controller for details.