Ceph RBD Volume Header Recovery

Background

Ceph is a massively scalable, failure-resistant, and self-healing distributed object storage system. As with all computer systems, a series of unfortunate events can affect Ceph clusters and in some situations, objects can be damaged beyond repair or lost.

Volumes are represented through the RBD (RADOS Block Device) interface. A volume can be either format 1 or format 2; this guide covers format 2 volumes only.

In the rare and unfortunate event of losing the rbd_header object that describes an RBD volume, you will no longer be able to perform operations on the volume or attach it to any instance. It may, however, be possible to recover from such a situation, and this document describes how.

Please note that if the rbd_header object is lost while a volume is attached to an instance, the instance retains access to the volume until it is terminated.

Synopsis

The RADOS Block Device (RBD) is a way of mapping objects into what an operating system would view as a regular block device. The objects are grouped by volume and each volume has a set of objects describing the properties of the volume itself.

Object Prefix    Description
rbd_id.NAME      Contains the internal ID used for naming the objects
                 that belong to this volume.
rbd_header.ID    This object's object map (omap metadata) holds the
                 characteristics of the volume (features, object_prefix,
                 size, etc.).
rbd_data.ID      The data in the volume is stored in objects usually
                 named with this prefix (the actual prefix is defined
                 in the header).

Gathering information

Example Ceph Pool: cinder-ceph
RBD Volume Name:   volume-c103a237-2226-4f1d-8c8e-357a06511adf

Get internal ID of a Volume

sudo rados -p cinder-ceph get rbd_id.volume-c103a237-2226-4f1d-8c8e-357a06511adf -|strings
10db74b0dc51
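The pipe through `strings` is there because the rbd_id object stores the ID as a length-prefixed string: a 4-byte little-endian length followed by the ASCII ID. A sketch of decoding the raw object contents directly (my own helper, assuming that encoding):

```python
import struct

def decode_rbd_id(raw: bytes) -> str:
    """Decode the contents of an rbd_id.* object: a 4-byte
    little-endian length prefix followed by the ASCII internal ID."""
    (length,) = struct.unpack_from("<I", raw, 0)
    return raw[4 : 4 + length].decode("ascii")

# Bytes as stored for the example volume above (0x0c = 12 = len of ID):
raw = b"\x0c\x00\x00\x00" + b"10db74b0dc51"
print(decode_rbd_id(raw))  # → 10db74b0dc51
```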

Get Volume metadata from rbd_header with rados CLI

sudo rados -p cinder-ceph listomapvals rbd_header.10db74b0dc51
features
value: (8 bytes) :
0000 : 01 00 00 00 00 00 00 00                         : ........

object_prefix
value: (25 bytes) :
0000 : 15 00 00 00 72 62 64 5f 64 61 74 61 2e 31 30 64 : ....rbd_data.10d
0010 : 62 37 34 62 30 64 63 35 31                      : b74b0dc51

order
value: (1 bytes) :
0000 : 16                                              : .

size
value: (8 bytes) :
0000 : 00 00 00 40 00 00 00 00                         : ...@....
 
snap_seq
value: (8 bytes) :
0000 : 00 00 00 00 00 00 00 00                         : ........
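The integer values in the dump above are little-endian, and object_prefix uses the same length-prefixed string encoding as rbd_id. A short sketch decoding the example values (my own decoding of the hex dumps, not a librbd API):

```python
import struct

def le64(b: bytes) -> int:
    """Decode an 8-byte little-endian unsigned integer."""
    return struct.unpack("<Q", b)[0]

features = le64(bytes.fromhex("0100000000000000"))  # bit 0 = layering
size     = le64(bytes.fromhex("0000004000000000"))  # volume size in bytes
order    = 0x16                                     # single byte, 22 decimal

# object_prefix: 4-byte LE length (0x15 = 21) + the prefix string
prefix_raw = bytes.fromhex("15000000") + b"rbd_data.10db74b0dc51"
(plen,) = struct.unpack_from("<I", prefix_raw, 0)
prefix = prefix_raw[4 : 4 + plen].decode("ascii")

print(features)             # → 1 (layering)
print(size)                 # → 1073741824 (1 GiB)
print(2 ** order)           # → 4194304 (4 MiB per data object)
print(size // 2 ** order)   # → 256 objects
print(prefix)               # → rbd_data.10db74b0dc51
```

Note how these decoded values line up with the `rbd info` output in the next section: 1024 MB in 256 objects, order 22 (4096 kB objects), features layering.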

Get Volume metadata using rbd CLI

sudo rbd -p cinder-ceph info volume-c103a237-2226-4f1d-8c8e-357a06511adf

rbd image 'volume-c103a237-2226-4f1d-8c8e-357a06511adf':
   size 1024 MB in 256 objects
   order 22 (4096 kB objects)
   block_name_prefix: rbd_data.10db74b0dc51
   format: 2
   features: layering

Simulate missing Volume rbd_header

sudo rados -p cinder-ceph rm rbd_header.10db74b0dc51
sudo rados -p cinder-ceph listomapvals rbd_header.10db74b0dc51
error getting omap keys cinder-ceph/rbd_header.10db74b0dc51: (2) No such file or directory

sudo rbd -p cinder-ceph info volume-c103a237-2226-4f1d-8c8e-357a06511adf
rbd: error opening image volume-c103a237-2226-4f1d-8c8e-357a06511adf: (2) No such file or directory
2017-04-17 09:03:30.634415 10db74b0dc51 -1 librbd::ImageCtx: error reading immutable metadata: (2) No such file or directory

Create new rbd_header for existing Volume

  • Use the hexadecimal values from ‘rados listomapvals’ on a different volume as input.
  • Create a new volume with the same characteristics as the volume whose RBD header you are recovering, and copy the values from there.
  • Please note that for volumes created by OpenStack Cinder, a plain ‘rbd create’ from the command line will probably produce a volume with different characteristics. Create a new volume with Cinder and use that as the template.
  • If your cluster has evolved over time, the default features for volumes might have changed. Cross-check that the new volume has features on par with a volume created at the same point in history as the volume you are recovering.

First verify that the header we are about to create does not already exist

sudo rados -p cinder-ceph stat rbd_header.10db74b0dc51
error stat-ing cinder-ceph/rbd_header.10db74b0dc51: (2) No such file or directory

Adapt the values in the following section to the ones you have gathered

echo -en \\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00 | sudo rados -p cinder-ceph setomapval rbd_header.10db74b0dc51 features
echo -en \\x15\\x00\\x00\\x00rbd_data.10db74b0dc51 | sudo rados -p cinder-ceph setomapval rbd_header.10db74b0dc51 object_prefix
echo -en \\x16 | sudo rados -p cinder-ceph setomapval rbd_header.10db74b0dc51 order
echo -en \\x00\\x00\\x00\\x40\\x00\\x00\\x00\\x00 | sudo rados -p cinder-ceph setomapval rbd_header.10db74b0dc51 size
echo -en \\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00 | sudo rados -p cinder-ceph setomapval rbd_header.10db74b0dc51 snap_seq
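If you prefer not to hand-craft the escape sequences, the same byte strings can be generated programmatically. A sketch using the example volume's values (adjust `volume_id`, size, order, and features to the ones you gathered; the script itself is my own helper, not a Ceph tool):

```python
import struct

volume_id = "10db74b0dc51"            # internal ID gathered earlier
prefix = f"rbd_data.{volume_id}"

omap = {
    "features": struct.pack("<Q", 1),                               # layering only
    "object_prefix": struct.pack("<I", len(prefix)) + prefix.encode(),
    "order": struct.pack("B", 22),                                  # 4 MiB objects
    "size": struct.pack("<Q", 1 << 30),                             # 1 GiB
    "snap_seq": struct.pack("<Q", 0),
}

for key, value in omap.items():
    # Emit the value as \xNN escapes, ready for `echo -en | rados setomapval`
    escaped = "".join(f"\\x{b:02x}" for b in value)
    print(f"{key}: {escaped}")
```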

NB! A recent version of the Ceph client tools is required to set omap values with data from STDIN. If your cluster is not running a recent version of Ceph, access with recent tools can be achieved by adding an empty machine (container, VM, or physical host) to the cluster purely for this purpose. There is no need to upgrade your existing Ceph cluster.

Verify result

Verify that you can get information about volume again

sudo rbd -p cinder-ceph info volume-c103a237-2226-4f1d-8c8e-357a06511adf
rbd image 'volume-c103a237-2226-4f1d-8c8e-357a06511adf':
   size 1024 MB in 256 objects
   order 22 (4096 kB objects)
   block_name_prefix: rbd_data.10db74b0dc51
   format: 2
   features: layering

Export / backup Volume

  • I recommend performing this operation on a system with good connectivity and available bandwidth towards the storage network, typically a nova-compute (hypervisor) or cinder node. Temporarily copy ‘/etc/ceph/ceph.client.admin.keyring’ from one of the ceph-mon nodes to gain access, and make sure to remove it afterwards.
sudo rbd -p cinder-ceph export volume-c103a237-2226-4f1d-8c8e-357a06511adf OUTPUT_FILE

You may now attempt to attach the volume to an instance again.