Lily deployment: Kubernetes
The Sentinel team operates a production deployment of Lily on Kubernetes and maintains Helm charts for it. This guide provides an overview of how to achieve the same.
Helm charts for Lily are available for ease of deployment. The following steps use Helm to deploy Lily to an existing Kubernetes cluster.
We assume users are already familiar with the basic operation of Helm and Kubernetes clusters.
Helm deployment
1. Add the Filecoin Helm charts repo.

```
$ helm repo add filecoin https://filecoin-project.github.io/helm-charts
```
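To confirm the chart is visible locally before continuing, you can refresh the repo index and search it (standard `helm repo update` and `helm search repo` commands):

```
$ helm repo update
$ helm search repo filecoin/lily
```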
2. Copy the default values YAML out of the Helm chart into a new `custom-values.yaml` file to use locally for your deployment, and adjust the values to meet your deployment's needs.
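One way to extract the chart defaults as a starting point (using the standard `helm show values` command; the filename just needs to match what you pass to `-f` later):

```
$ helm show values filecoin/lily > custom-values.yaml
```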
3. If needed, set up the database credentials. They must be provided in a `secret` within the cluster. The `<deployed_secret_name>` is the key which is referred to in `custom-values.yaml`.
```
# export the following envvars (whose names have no significance other
# than to indicate their placement in the `kubectl` and `helm` commands below)
KUBE_CONTEXT=<kube_context>      # otherwise as inferred by kubectl
KUBE_NAMESPACE=<kube_namespace>  # otherwise as inferred by kubectl
LILY_SECRET_NAME=<deployed_secret_name>
LILY_SECRET_HOST=<resolvable_hostname>
LILY_SECRET_PORT=<port>
LILY_SECRET_DATABASE=<database_name>
LILY_SECRET_USER=<username>
LILY_SECRET_PASS=<password>

# create the secret
$ kubectl create secret generic \
    --context="${KUBE_CONTEXT}" \
    --namespace "${KUBE_NAMESPACE}" \
    "${LILY_SECRET_NAME}" \
    --from-literal=url="postgres://${LILY_SECRET_USER}:${LILY_SECRET_PASS}@${LILY_SECRET_HOST}:${LILY_SECRET_PORT}/${LILY_SECRET_DATABASE}?sslmode=require"
```
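To sanity-check the secret before wiring it into `custom-values.yaml`, you can decode the stored connection URL (standard `kubectl get secret` with a `jsonpath` output; `url` is the key created above):

```
$ kubectl get secret "${LILY_SECRET_NAME}" \
    --context="${KUBE_CONTEXT}" \
    --namespace "${KUBE_NAMESPACE}" \
    -o jsonpath='{.data.url}' | base64 --decode
```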
4. Be sure to configure your `custom-values.yaml` with the `<deployed_secret_name>` like so:

```
...
storage:
  postgresql:
    - name: db
      secretName: <deployed_secret_name> # <- goes here
      secretKey: url
      schema: lily
      applicationName: lily
...
```
5. Deploy your release with `helm install`. (Make sure you are using the right Kubernetes context for your intended cluster.) The following example uses `helm upgrade --install`, which works for both `install` and `upgrade` (change-in-place) operations.

```
$ helm upgrade --install \
    --kube-context="${KUBE_CONTEXT}" \
    --namespace="${KUBE_NAMESPACE}" \
    "${RELEASE_NAME}" filecoin/lily -f ./custom-values.yaml
```
NOTE: The flags `--wait` and `--timeout` can be added to make this a blocking request, instead of returning immediately after successful delivery of the install/upgrade request.
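For example, to block until the rollout succeeds or times out (the 15-minute timeout here is an arbitrary choice):

```
$ helm upgrade --install \
    --kube-context="${KUBE_CONTEXT}" \
    --namespace="${KUBE_NAMESPACE}" \
    --wait --timeout 15m \
    "${RELEASE_NAME}" filecoin/lily -f ./custom-values.yaml
```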
6. Monitor the deployment of your release.

```
# get logs of the Lily container
$ export CONTAINER_NAME=daemon
$ kubectl --context="${KUBE_CONTEXT}" \
    --namespace="${KUBE_NAMESPACE}" \
    logs "${RELEASE_NAME}-lily-0" -c "${CONTAINER_NAME}" \
    --follow

# get an interactive shell in the Lily container
$ kubectl --context="${KUBE_CONTEXT}" \
    --namespace="${KUBE_NAMESPACE}" \
    exec -it "${RELEASE_NAME}-lily-0" -c "${CONTAINER_NAME}" -- bash
```
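You can also watch the pods come up while the release rolls out (standard `kubectl` commands; the `-lily-0` name follows the StatefulSet naming used above):

```
$ kubectl --context="${KUBE_CONTEXT}" --namespace="${KUBE_NAMESPACE}" get pods --watch
$ kubectl --context="${KUBE_CONTEXT}" --namespace="${KUBE_NAMESPACE}" describe pod "${RELEASE_NAME}-lily-0"
```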
Upgrades
You can update the `custom-values.yaml` file and deploy changes with:
```
# same helm upgrade --install command as before
$ helm upgrade --install \
    --kube-context="${KUBE_CONTEXT}" \
    --namespace="${KUBE_NAMESPACE}" \
    "${RELEASE_NAME}" filecoin/lily -f custom-values.yaml
```
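If an upgrade misbehaves, the release history can be inspected and rolled back to an earlier revision (standard `helm history` and `helm rollback` commands; `<revision>` is a revision number taken from the history output):

```
$ helm history --kube-context="${KUBE_CONTEXT}" --namespace="${KUBE_NAMESPACE}" "${RELEASE_NAME}"
$ helm rollback --kube-context="${KUBE_CONTEXT}" --namespace="${KUBE_NAMESPACE}" "${RELEASE_NAME}" <revision>
```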
Controlling Helm chart versions used in deployment
If you want to control the specific version of the Helm chart used, a `--version` flag may be passed to `helm upgrade|install`:

```
$ helm upgrade --install "${RELEASE_NAME}" filecoin/lily --version "M.N.R"
```

Omitting the `--version` argument will use the latest available version.
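To see which chart versions are available to pin to, you can list them (standard `helm search` flag):

```
$ helm search repo filecoin/lily --versions
```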
Troubleshooting
deployment error: “N node(s) had volume node affinity conflict”
Background: Sentinel Visor has Persistent Volume Claims (PVCs) which are lazily created as they are bound to the pod that requests them. When a release is destroyed without cleaning up these lazily created PVCs, or an upgrade causes a pod to be assigned to a node different from the one it was previously scheduled on, the existing PVC can cause the deployment to become stuck.
To resolve:
1. Describe the pod which is having the scheduling conflict to identify the name of the PVC it is bound to. (Note: the examples below assume your release name is `analysis`.)

```
kubectl describe pod <releasename>-visor-0
```

Example:

```
$ kubectl describe pod analysis-visor-0
Volumes:
  datastore-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datastore-volume-analysis-visor-0
    ReadOnly:   false
```
2. Delete the PVC.

```
kubectl delete pvc datastore-volume-<releasename>-visor-0
```

Example:

```
$ kubectl delete pvc datastore-volume-analysis-visor-0
persistentvolumeclaim "datastore-volume-analysis-visor-0" deleted
```
3. Restart the pod.

```
kubectl delete pod <releasename>-visor-0
```

Example:

```
$ kubectl delete pod analysis-visor-0
pod "analysis-visor-0" deleted
```
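If you are unsure which claims a destroyed release left behind, listing the PVCs in the namespace is a quick check (standard `kubectl get pvc`; `analysis` is the example release name from above):

```
$ kubectl get pvc --namespace "${KUBE_NAMESPACE}" | grep analysis
```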
deployment error: Multi-Attach error for volume “XXX”
Background: Deployment is in progress and the new pod attempts to start up but blocks with the error: “Multi-Attach error for volume "pvc-a41e35bd-d1dc-11e8-9b2b-fa163ef89d28": Volume is already exclusively attached to one node and can't be attached to another.” This is likely a timing issue where the termination of a pod on one node has not released its volume before the replacement pod is scheduled and spun up on a different node. The quickest fix is to delete the pod, allowing the StatefulSet to restore it and bind the volume to the new node.
To resolve:
Delete the pod and wait for the scheduler to start it again. Generally, it only takes a little time for the volume to release.

```
$ kubectl delete pod <releasename>-visor-<N>
```

Example:

```
$ kubectl delete pod analysis-visor-1
pod "analysis-visor-1" deleted
```
Observe the pod being rescheduled and the volume properly attaching to the new node.
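One way to watch this happen (standard `kubectl` commands; the pod name is the example from above):

```
$ kubectl get pod analysis-visor-1 --watch
$ kubectl get events --field-selector involvedObject.name=analysis-visor-1
```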