Puppet : All-way near realtime file synchronization
Here is an example of how to obtain all-way near realtime file synchronization on a cluster of server nodes managed by Puppet. It uses the old-ish but very useful csync2 to track all files, combined with the excellent lsyncd to trigger synchronization based on inotify events.
Advantages :
- Scalable read performance thanks to local storage being used
- No single point of failure (no NAS, no SAN)
- No new low-level technology involved (simple block devices, typical filesystems)
- Content deletion is reliably supported
Disadvantages :
- Can scale only as far as csync2 with its sqlite database can (millions of files will be a problem)
- Can scale only as far as the RAM can, since roughly one inode’s worth of memory is used for each directory being watched
- Can scale only as far as the cluster’s smallest disk, since each node has the full content replicated
- Constantly modified files (log files, for example) cannot be reliably synced
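Related to the RAM point above: inotify is not recursive, so lsyncd needs one watch per directory, and the kernel caps the number of watches per user through the fs.inotify.max_user_watches sysctl. On a large tree that cap may need raising. Here is a minimal sketch of one way to do that from Puppet. The file name, the resource titles and the value 524288 are illustrative assumptions, not part of the original setup, and it assumes a distribution which reads /etc/sysctl.d at boot.

```puppet
# Hypothetical sketch: raise the per-user inotify watch limit on the web nodes.
# The file name and the value 524288 are assumptions chosen for illustration.
file { '/etc/sysctl.d/90-inotify.conf':
  ensure  => file,
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
  content => "fs.inotify.max_user_watches = 524288\n",
}

# Apply the setting right away, but only when the file above changes.
exec { 'reload-inotify-sysctl':
  command     => 'sysctl -p /etc/sysctl.d/90-inotify.conf',
  path        => ['/sbin', '/usr/sbin', '/bin', '/usr/bin'],
  refreshonly => true,
  subscribe   => File['/etc/sysctl.d/90-inotify.conf'],
}
```

The value should be sized to the number of directories actually being watched, keeping in mind that every watch costs kernel memory.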
The required Puppet modules are csync2 and lsyncd.
Here is a real-world example of some web servers which have a lot of static data occasionally managed by users logged into any of the web servers (the wwwdata content), as well as a small set of application-specific and environment-specific configuration files which aren’t managed by Puppet (the wwwshared content). The two are kept separate so that a slow sync of wwwdata does not delay wwwshared.
    node /^web/ {

      # Make sure resources are in the working order
      Csync2::Key <| |> -> Csync2::Cfg <| |> -> Class['Lsyncd']

      # Csync2
      class { 'csync2': }
      csync2::cfg { 'wwwdata':
        source => "puppet:///modules/mycluster/csync2_wwwdata.cfg",
      }
      csync2::key { 'wwwdata':
        source => "puppet:///modules/mycluster/csync2_wwwdata.key",
      }
      csync2::cfg { 'wwwshared':
        source => "puppet:///modules/mycluster/csync2_wwwshared.cfg",
      }
      csync2::key { 'wwwshared':
        source => "puppet:///modules/mycluster/csync2_wwwshared.key",
      }

      # Lsyncd
      # Each of these MUST also be in csync2_*.cfg files
      $lsyncd_csync2_sources = {
        '/var/www/shared'   => 'wwwshared',
        '/var/www/data/ini' => 'wwwdata',
        '/var/www/data/png' => 'wwwdata',
        '/var/www/data/zip' => 'wwwdata',
      }
      class { 'lsyncd':
        config_content => template('lsyncd/lsyncd-csync2.conf.erb'),
      }

    }
The key files contain content generated with the csync2 -k command.
The cfg files are pre-written csync2 configuration files, as follows :
    # csync2_wwwdata.cfg
    nossl * *;
    group web {
        host web1.example.com@192.168.1.1;
        host web2.example.com@192.168.1.2;
        host web3.example.com@192.168.1.3;
        key /etc/csync2/csync2_wwwdata.key;
        # Each directory MUST also be in $lsyncd_csync2_sources!
        include /var/www/data/ini;
        include /var/www/data/png;
        include /var/www/data/zip;
    }

    # csync2_wwwshared.cfg
    nossl * *;
    group web {
        host web1.example.com@192.168.1.1;
        host web2.example.com@192.168.1.2;
        host web3.example.com@192.168.1.3;
        key /etc/csync2/csync2_wwwshared.key;
        include /var/www/shared;
        exclude .*.swp;
    }
This could easily be improved by using templates to build the cfg files. The csync2 module could also be modified to use exported resources and the concat module to have the host lines automatically generated based on the nodes where this manifest is applied, but that would be a major change in the module’s requirements. Still, do feel free to make these kinds of changes!
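To give an idea of what the exported resources approach might look like, here is a rough sketch for the wwwdata cfg file. It is not how the csync2 module works today: it bypasses csync2::cfg entirely and assumes the puppetlabs-concat module, PuppetDB (or another storeconfigs backend) for exported resources, and the modern $facts hash. The cfg file path, the resource titles and the csync2_wwwdata tag are also assumptions made for the example.

```puppet
# Hypothetical sketch: build csync2_wwwdata.cfg from exported host lines.
# Assumes puppetlabs-concat and PuppetDB; none of this is in the csync2 module.
node /^web/ {

  # Assumed location, next to the key file referenced in the cfg
  $cfg = '/etc/csync2/csync2_wwwdata.cfg'

  concat { $cfg:
    owner => 'root',
    group => 'root',
    mode  => '0600',
  }

  # Static opening of the group block
  concat::fragment { 'csync2_wwwdata_header':
    target  => $cfg,
    order   => '10',
    content => "nossl * *;\ngroup web {\n",
  }

  # Each node exports its own host line...
  @@concat::fragment { "csync2_wwwdata_host_${facts['networking']['fqdn']}":
    target  => $cfg,
    order   => '20',
    content => "  host ${facts['networking']['fqdn']}@${facts['networking']['ip']};\n",
    tag     => 'csync2_wwwdata',
  }

  # ...and collects the host lines exported by every node, itself included
  Concat::Fragment <<| tag == 'csync2_wwwdata' |>>

  # Static remainder: key and included directories
  concat::fragment { 'csync2_wwwdata_footer':
    target  => $cfg,
    order   => '30',
    content => "  key /etc/csync2/csync2_wwwdata.key;\n  include /var/www/data/ini;\n  include /var/www/data/png;\n  include /var/www/data/zip;\n}\n",
  }

}
```

One thing to keep in mind with this approach: a new node only shows up in the other nodes' cfg files after it has completed a Puppet run and the others have run again, so the cluster takes a couple of runs to converge.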
I’ve built and released a Puppet module to do just this:
https://github.com/scottsb/puppet-clustersync
Cool! Thanks a lot for sharing :-)