Lily deployment: Kubernetes
The Sentinel team operates a production deployment of Lily using Kubernetes and maintain Helm charts for it. This guide provides an overview on how to achieve the same.
Helm charts for Lily are available for ease of deployment. The following steps will use helm to deploy Lily to an existing Kubernetes cluster.
Add Filecoin helm charts repo.
$ helm repo add filecoin https://filecoin-project.github.io/helm-charts
Copy default values YAML out of the Helm chart into a new
values.yamlfile to use locally for your deployment and adjust the values to meet the needs of your deployment.
If needed, set up the database credentials. They must be provided in a
secretwithin the cluster. The
<deployed_secret_name>is the key which is referred to in
# export the following envvars (whose names have no significance other # than to indicate their placement in the `kubectl` and `helm` commands below) KUBE_CONTEXT=<kube_context> # otherwise as inferred by kubectl KUBE_NAMESPACE=<kube_namespace> # otherwise as inferred by kubectl LILY_SECRET_NAME=<deployed_secret_name> LILY_SECRET_HOST=<resolveable_hostname> LILY_SECRET_PORT=<port> LILY_SECRET_DATABASE=<database_name> LILY_SECRET_USER=<username> LILY_SECRET_PASS=<password> # create the secret $ kubectl create secret generic \ --context="$(KUBE_CONTEXT)" \ --namespace "$(KUBE_NAMESPACE)" \ "$(LILY_SECRET_NAME)" \ --from-literal=url="postgres://$(LILY_SECRET_USER):$(LILY_SECRET_PASS)@$(LILY_SECRET_HOST):$(LILY_SECRET_PORT)/$(LILY_SECRET_DATABASE)?sslmode=require"
Be sure to configure your
... storage: postgresql: - name: db secretName: <deployed_secret_name> # <- goes here secretKey: url schema: lily applicationName: lily ...
Deploy your release with
helm install. (Make sure you are using the right kubernetes context for your intended cluster.) The following example uses
helm upgrade --installwhich universally works for
$ helm upgrade --install \ --kube-context="$(KUBE_CONTEXT)" \ --namespace="$(KUBE_NAMESPACE)" \ $(RELEASE_NAME) filecoin/lily -f ./custom-values.yaml
NOTE: The flags
--timeoutcan be added to make this a blocking request, instead of returning immediately after successful delivery of the install/upgrade request.
Monitor the deployment of your release.
# get logs of Lily container $ export CONTAINER_NAME=daemon $ kubectl --context="$(KUBE_CONTEXT)" \ --namespace="$(KUBE_NAMESPACE)" logs $(RELEASE_NAME)-lily-0 $(CONTAINER_NAME) \ --follow # get interactive shell in Visor container $ kubectl --context="$(KUBE_CONTEXT) \ --namespace="$(KUBE_NAMESPACE)" \ exec -it $(RELEASE_NAME)-lily-0 $(CONTAINER_NAME) -- bash
You can update the
custom-values.yaml file and deploy changes with:
# same helm upgrade --install command as before $ helm upgrade \ --install \ --kube-context="$(KUBE_CONTEXT)" \ --namespace="$(KUBE_NAMESPACE)" \ $(RELEASE_NAME) filecoin/lily -f custom-values.yaml
Controlling Helm chart versions used in deployment
If you want to control the specific version of Helm chart used, a
flag may be passed into
$ helm upgrade --install $(RELEASE_NAME) filecoin/lily --version "M.N.R"
--version argument will use the latest available version.
deployment error: “N node(s) had volume node affinity conflict”
Background: Sentinel Visor has Persistent Volume Claims (PVC) which are lazily assigned as they bound to the pod they are requested from. When a release has been destroyed without cleaning these lazily created PVC or an upgrade causes a pod to be assigned to a new node (different from what it was previously scheduled) the existing PVC can cause the deployment to become stuck.
Describe the pod which is having the schedule confict to identify the PVC name it is bound to. _(Note: Assuming your release name is
kubectl describe pod <releasename>-visor-0
$ kubectl describe pod analysis-visor-0 Volumes: datastore-volume: Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) ClaimName: datastore-volume-analysis-visor-0 ReadOnly: false
Delete the PVC.
kubectl delete pvc datastore-volume-<releasename>-visor-0
$ kubectl delete pvc datastore-volume-analysis-visor-0 persistentvolumeclaim "datastore-volume-analysis-visor-1" deleted
Restart the pod.
kubectl delete pod <releasename>-visor-0
$ kubectl delete pod analysis-visor-0 pod "analysis-visor-1" deleted
deployment error: Multi-Attach error for volume “XXX”
Background: Deployment is in progress and the new pod attempts to start up but blocks with the error “Multi-Attach error for volume “pvc-a41e35bd-d1dc-11e8-9b2b-fa163ef89d28” Volume is already exclusively attached to one node and can’t be attached to another.” This is likely a timing issue where termination of a pod on one node has not allowed the volume to be released before the scheduling and spin-up of the replacement pod on a different node. The quickest fix is to delete the pod and allow the StatefulSet to restore the pod and be able to bind the volume to the new node.
Delete the pod and wait for the scheduler to start it again. Generally, it only takes a little time for the volume to release.
$ kubectl delete pod <releasename>-visor-<N>
$ kubectl delete pod analysis-visor-1 pod "analysis-visor-1" deleted
Observe the pod being rescheduled and the volume properly attaching to the new node.