
Ansible: automating a backup-restore workflow

Ansible can also be used to simplify the routine, everyday tasks of a backup-restore process.

We chose Ansible and tuned it for the following regular backup-restore workflow.

server1 was a heavily loaded dedicated server in the cloud. Its operating system was quite old, and upgrading to the latest release was impossible without several hours or days of downtime. Backups were being created, but they had never been verified by an actual restore.

Tools and workflow were chosen based on the server components, network channels, required operating system, and related software.

Input source data:

  • Dedicated server in the cloud;
  • OS: an Ubuntu release past its LTS support period; upgrades and installation of new packages through the apt repositories are impossible;
  • Publicly accessible with a static IP;
  • Used services: Jira, Fisheye, Apache Subversion, MySQL DB server, Apache Tomcat, JDK7, SSL.

To do:

  • Perform regular full and incremental backups of server1 every 8 hours;
  • Keep backed-up content for a month;
  • Deploy the latest Ubuntu LTS with predefined system users, permissions, Apache Subversion, the latest MySQL database server, a firewall, and a public IP as a virtual machine inside vSphere ESXi, regularly every 8 hours;
  • Restore content from server1 to server2 regularly every 8 hours;
  • All communication between servers should use passwordless RSA key authentication;
  • The start and finish of each run should be reported to a Slack channel;
  • The operation of server1 must not be interrupted during the process.

Content on the two instances will therefore differ by up to one backup period (8 hours).
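
The schedule and retention requirements above reduce to simple arithmetic. The following is a minimal Python sketch, not part of the original setup: the names `BACKUP_PERIOD`, `RETENTION`, `backup_times`, and `expired` are hypothetical, and a month is approximated as 30 days.

```python
from datetime import datetime, timedelta

# Assumed values taken from the requirements list: one run every 8 hours,
# content kept for a month (approximated here as 30 days).
BACKUP_PERIOD = timedelta(hours=8)
RETENTION = timedelta(days=30)


def backup_times(start, count):
    """Return the start times of `count` consecutive backup runs."""
    return [start + i * BACKUP_PERIOD for i in range(count)]


def expired(times, now):
    """Return the runs that are older than the retention window."""
    return [t for t in times if now - t > RETENTION]


# Number of backup runs that fit into the retention window.
runs_kept = RETENTION // BACKUP_PERIOD

if __name__ == "__main__":
    start = datetime(2020, 1, 1, 0, 0)
    times = backup_times(start, 100)
    now = times[-1]
    print("runs kept per month:", runs_kept)
    print("runs to prune:", len(expired(times, now)))
```

Under these assumptions, about 90 backup runs fall inside the retention window at any time, which gives a rough idea of the storage the backup host has to provide.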

Backup-restore model:

As the basis for the process, we chose the following model built on Ansible and BackupPC. Chef or Puppet could also be used for a similar workflow.

[Figure: Ansible backup-restore workflow schema]

The following components are used in the workflow:

Before starting the process, additional parameters should be defined:

  • If vSphere ESXi uses its default self-signed SSL certificate, certificate verification can raise exceptions on the host where the Ansible playbooks are executed. To prevent this, add the following lines to /usr/local/lib/python2.7/dist-packages/pysphere/vi_server.py (note that this disables certificate verification):
     ###============
     import ssl
     ssl._create_default_https_context = ssl._create_unverified_context
     ###============
  • If you don't need strict SSH host key checking during RSA authentication, set the following in /etc/ansible/ansible.cfg:
    # uncomment this line to disable SSH key host checking
    host_key_checking = False
  • If a user with a password has to be created on the VM via Ansible, generate the password hash with the command:
    mkpasswd --method=sha-512
  • To send notifications to Slack, create a webhook in the Slack admin panel, connect the Ansible Slack module to it, and use a shell script for messages sent outside Ansible.
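
To illustrate the Slack webhook item, here is a minimal Python sketch of posting a message to an incoming webhook. It is not the shell script or the Ansible module used in the original setup. `WEBHOOK_URL` is a placeholder, and `build_slack_request` and `notify` are hypothetical helper names.

```python
import json
import urllib.request

# Placeholder only: the real incoming-webhook URL is issued in the Slack admin panel.
WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


def build_slack_request(text, webhook_url=WEBHOOK_URL):
    """Build (but do not send) a POST request carrying a Slack message."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(text, webhook_url=WEBHOOK_URL):
    """Send the message and return the HTTP status code of the response."""
    request = build_slack_request(text, webhook_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Building the request separately from sending it makes it possible to check the payload without network access. A call such as `notify("Backup of server1 started")` would only succeed with a real webhook URL.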

In general, the resulting workflow consists of the following steps:

  1. BackupPC – 15 minutes before the scheduled synchronization starts, sends a message to Slack announcing the backup start;
  2. BackupPC – dumps all MySQL databases on server1;
  3. BackupPC – runs a full/incremental synchronization of server1's content on a predefined schedule and stores it for a month;
  4. BackupPC – prepares the archive for restore, merging all copies into a single up-to-date archive to_restore_date.tar.gz;
  5. BackupPC – runs the Ansible playbook;
  6. Ansible – creates a virtual machine on the vSphere ESXi host with a predefined hardware configuration (CPU, RAM, HDD, public/private network interfaces, boot image). The boot image is a prepared unattended installer for the latest Ubuntu LTS;
  7. Ansible – after the VM has been created and installed, obtains its IP address, which the router assigns from its DHCP pool;
  8. Ansible – establishes an SSH connection with RSA keys to the newly created VM, creates system users, and copies the archive to_restore_date.tar.gz from BackupPC to the VM;
  9. Ansible – unpacks the content of the archive into place;
  10. Ansible – performs MySQL optimization to eliminate structural differences between the MySQL servers of server1 and server2;
  11. Ansible – starts the system daemons on the restored server2 and deletes the initial archive;
  12. Ansible – sends a notification to the Slack channel that the restore has finished, including the new public IP address of server2;
  13. In an emergency, DNS records are updated manually, and the BackupPC sync process is stopped at the same time.
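
The archive handling in steps 4, 8 and 9 can be sketched in Python as follows. This only illustrates packing content into a dated `to_restore_<date>.tar.gz` archive and unpacking it on the target. It is not how BackupPC or the original playbook does it. The date format and the helper names `archive_name`, `pack`, and `unpack` are assumptions.

```python
import tarfile
import tempfile
from datetime import date
from pathlib import Path


def archive_name(day):
    """Build a dated archive name; the exact date format is an assumption."""
    return "to_restore_{}.tar.gz".format(day.isoformat())


def pack(source_dir, out_dir, day):
    """Pack source_dir into a gzip-compressed tar archive and return its path."""
    archive = Path(out_dir) / archive_name(day)
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." stores member paths relative to source_dir.
        tar.add(source_dir, arcname=".")
    return archive


def unpack(archive, target_dir):
    """Extract the archive into target_dir."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "server1"
        dst = Path(tmp) / "server2"
        src.mkdir()
        dst.mkdir()
        (src / "dump.sql").write_text("-- MySQL dump placeholder\n")
        archive = pack(src, tmp, date(2020, 1, 1))
        unpack(archive, dst)
        print(archive.name, sorted(p.name for p in dst.iterdir()))
```

A real restore would also have to preserve file ownership and permissions, which this sketch does not address.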

As a result, we obtained a complete live workflow for the periodic backup-restore process between server1 and server2.