Slogging in Swift

slog informal verb
1. work hard over a period of time.
2. hit (someone or something) forcefully and typically wildly, especially in boxing or cricket.

The Slogging project has, to my knowledge, nothing to do with hitting things wildly or forcefully. I suppose it must have been named for the intensive nature of regular data collection. I'll admit that "Swift Stats System", while more descriptive, does not have quite the same ring to it.

Swift Slogging is a sub-project I've been directed to as a good starting place to learn about data gathering, and some of the code here may be reusable for my purposes. The Slogging project in its current form exists as a smallish git repository, quite separate from Swift itself, so using it requires an independent installation and setup. This isn't too intensive: it mostly involves editing configuration files and preparing the system to generate and store hourly logs. Setting up a new account to store the slogging logs is probably the trickiest bit.

The Slogging system itself is made up of three main stages:

  1. Log creation
  2. Log uploading
  3. Log processing

There are two broad types of logs: access logs and account/container database stats. Access logs are generated from proxy server data about requests. Account and container database stats are generated by crawling the account servers for databases and pulling data from each one as the crawl goes along. One CSV file is produced for every run of swift-account-stats-logger on each storage node, every hour. All of this is driven by cron jobs: one for each of the first two stages (creation and uploading) for each kind of log (access, account, and container), and one for the final stage (processing). You end up with seven cron jobs:

  • Access Data Creation
  • Account Data Creation
  • Container Data Creation
  • Access Data Uploading
  • Account Data Uploading
  • Container Data Uploading
  • Processing of All Data
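
To make the arithmetic explicit (two stages times three kinds of log, plus one processing job), here is a tiny Python sketch that enumerates the jobs. The names it prints are just labels I made up to mirror the list above; they are not Slogging commands or crontab entries.

```python
from itertools import product

# The two stages that run separately for every kind of log.
STAGES_PER_LOG = ("creation", "uploading")
# The three kinds of log that Slogging handles.
LOG_TYPES = ("access", "account", "container")


def cron_job_names():
    """Return illustrative labels for the seven cron jobs."""
    jobs = [
        f"{log} data {stage}"
        for stage, log in product(STAGES_PER_LOG, LOG_TYPES)
    ]
    # The final stage runs once over everything that was uploaded.
    jobs.append("processing of all data")
    return jobs


if __name__ == "__main__":
    for name in cron_job_names():
        print(name)
```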

The results of this data collection are stored inside Swift. Slogging docs recommend setting up a new account especially for the logged data. I set up a Swift-All-In-One (or SAIO for short) for my development needs and added a new loop-back device to use as the new Slogging Account Storage Node.

Installing the slogging functionality currently means following the steps outlined in the Slogging documentation. I found some of the steps a little confusing, not because of their complexity but because of the terminology. In Swift an account has nothing to do with users or identity; it's more like a bank account, a top-tier storage location. The term "account hash" also does not refer to the MD5 ring hash, a hash of the partition UUID, or anything fancy like I had originally suspected. It's really just the name of the account.

Examining the Slogging code reveals its inner workings. My project is about account data, so I spent a while analysing the AccountStatsCollector class. Directories are scanned bottom-up using Python's os.walk function, and when a database file is found the account details (account hash, container count, object count, bytes used) are logged. Container stats collection works in the same way. It's quite straightforward, but could take a very long time if there are a lot of files in the storage system.
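
To get a feel for the scanning approach, I wrote a minimal sketch of it. This is not the AccountStatsCollector code: it opens each database file directly with sqlite3 and assumes a table called account_stat with the four columns named below, which may not match the schema on a real node. The function names and the choice to skip unreadable files are my own.

```python
import csv
import os
import sqlite3
import sys

# Assumed query: table and column names are guesses for illustration.
STATS_QUERY = (
    "SELECT account, container_count, object_count, bytes_used "
    "FROM account_stat"
)


def collect_account_stats(root_dir):
    """Yield one tuple of stats per readable .db file under root_dir."""
    # topdown=False makes os.walk visit the deepest directories first.
    for dirpath, _dirnames, filenames in os.walk(root_dir, topdown=False):
        for filename in filenames:
            if not filename.endswith(".db"):
                continue
            conn = sqlite3.connect(os.path.join(dirpath, filename))
            try:
                row = conn.execute(STATS_QUERY).fetchone()
            except sqlite3.Error:
                # Not a database, or not the layout assumed above: skip it.
                row = None
            finally:
                conn.close()
            if row is not None:
                yield row


def write_stats_csv(root_dir, out_file):
    """Write one CSV line per account database; return the line count."""
    writer = csv.writer(out_file)
    count = 0
    for row in collect_account_stats(root_dir):
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Usage: python scan_sketch.py /path/to/devices
    write_stats_csv(sys.argv[1], sys.stdout)
```

Even in this toy form the cost is obvious: every file under the root gets visited, so the run time grows with the total number of files rather than with the number of accounts.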

While it is possible to use this method to get account data for my project, there may be a better way, so over the next few days I will be tinkering with slogging, looking at alternatives, and trying to come up with a few different options.


Sources:

  1. Swift Slogging Docs
  2. SAIO Set Up