How we set up Prometheus multi-AZ persistent storage

We run a couple of large, heavily used EKS clusters inside our AWS accounts, all monitored with Prometheus. Some of these clusters span multiple Availability Zones (AZs) and some run in a single AZ, depending on requirements. A basic Prometheus setup creates an EBS volume to persist its data on disk, and this has a limitation: if the cluster scales down and no instance is left in the AZ where the Prometheus server was running, Prometheus cannot be restarted, because an EBS volume cannot be mounted across AZs. To resolve this, we updated our Prometheus settings to use EFS instead of EBS for persistent storage. EFS allows us to mount the same file system across multiple AZs at the same time, with multiple mount targets. The steps to use EFS as Prometheus persistent volumes are as follows.

1. Create EFS Access Points: An EFS file system is owned by root:root, and Prometheus forbids running its pods as root, so we have to create EFS access points before we can use the file system for Prometheus persistent volumes. We created the access points with the following configurations.
{
  "Name": "Prometheus Server",
  "FileSystemId": "fs-123123123",
  "PosixUser": {
    "Uid": 500,
    "Gid": 500,
    "SecondaryGids": [
      2000
    ]
  },
  "RootDirectory": {
    "Path": "/prometheus/server",
    "CreationInfo": {
      "OwnerUid": 500,
      "OwnerGid": 500,
      "Permissions": "0755"
    }
  }
}
{
  "Name": "Prometheus Alert Manager",
  "FileSystemId": "fs-123123123",
  "PosixUser": {
    "Uid": 501,
    "Gid": 501,
    "SecondaryGids": [
      2000
    ]
  },
  "RootDirectory": {
    "Path": "/prometheus/alertmanager",
    "CreationInfo": {
      "OwnerUid": 501,
      "OwnerGid": 501,
      "Permissions": "0755"
    }
  }
}
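Assuming the two JSON documents above are saved as server-access-point.json and alertmanager-access-point.json (hypothetical file names), the access points can be created with the AWS CLI, for example:

```shell
# Create the access point for the Prometheus server
aws efs create-access-point --cli-input-json file://server-access-point.json

# Create the access point for Alertmanager
aws efs create-access-point --cli-input-json file://alertmanager-access-point.json

# Note the "AccessPointId" (fsap-...) in each response; it is needed
# for the volumeHandle of the persistent volumes in the next step.
```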

2. Create Persistent Volumes in EKS Cluster: After creating the EFS access points, we have to create persistent volumes in the EKS cluster that will be used by the Prometheus deployment.

pv-server.yml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-server
spec:
  capacity:
    storage: 8Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-123123123::fsap-123abc123abc

pv-alertmanager.yml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-alertmanager
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs
  csi:
    driver: efs.csi.aws.com
    # <file system ID>::<access point ID> -- use the Alertmanager access point here
    volumeHandle: fs-123123123::fsap-123abc123abc

After creating these files, apply them to the EKS cluster to create the persistent volumes.
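Applying the manifests is a standard kubectl step; a quick check afterwards confirms the volumes exist:

```shell
kubectl apply -f pv-server.yml
kubectl apply -f pv-alertmanager.yml

# Both volumes should show STATUS "Available" until a claim binds them
kubectl get pv
```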

3. Install/Update Prometheus to use EFS: Now you can install Prometheus with Helm using the following command:

helm upgrade -i prometheus prometheus-community/prometheus \
  -n prometheus \
  --set alertmanager.persistentVolume.storageClass="efs" \
  --set server.persistentVolume.storageClass="efs"

As soon as you run the above command, it creates the Prometheus deployment in the prometheus namespace, using EFS as the persistent volume for both the server and Alertmanager.
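Once the Helm release is up, you can verify that the chart's claims bound to the EFS-backed volumes and that the pods are running:

```shell
# The PVCs created by the chart should be in STATUS "Bound",
# each matched to one of the persistent volumes from step 2
kubectl get pvc -n prometheus

# The server and Alertmanager pods should reach Running,
# regardless of which AZ they are scheduled into
kubectl get pods -n prometheus
```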