Deep dive into using Heat Alarms

So this is just some output from me debugging how Heat’s alarms work with the other parts
of OpenStack (warts-n-all).

First some history

Way back, Heat needed alarms to be able to do autoscaling (because it is really useful). The only problem
was we didn’t have Ceilometer back then. So we wrote a minimalist version of Cloud Watch. It was not
supposed to last very long, as we fully expected some monitoring-as-a-service to come along. Alas, it has lasted a
lot longer than expected. Also note that we didn’t have alarms based on notifications at this point, so
all metrics were generated by an agent (cfn-push-stats).

Using Cloud Watch alarms

First off, how to choose the implementation (Ceilometer or Heat).

We use the global environment to configure this. Look in /etc/heat/environment.d/default.yaml
and just uncomment the one you want (we HIGHLY recommend Ceilometer).

# Choose your implementation of AWS::CloudWatch::Alarm
"AWS::CloudWatch::Alarm": "file:///etc/heat/templates/AWS_CloudWatch_Alarm.yaml"
#"AWS::CloudWatch::Alarm": "OS::Heat::CWLiteAlarm"

Note: the default will change this cycle to use Ceilometer for Cloud Watch Alarms.

There are unfortunately some differences depending on the implementation and whether or not
you are using cfn-push-stats.

A note about cfn-push-stats “--watch”
Basically this was a mistake in implementation, as it causes a circular dependency. This was hidden at
the time of implementation because way back then Ref returned only the resource name, not a resource instance
(AWS does not do this; they use tags, like we now do).
Basically you should just drop the “--watch” option (it’s just not needed anymore).

Some Cloud Watch examples

CWLiteAlarm with in-guest agent

AWS_CloudWatch_Alarm.yaml with built-in Ceilometer metrics

AWS_CloudWatch_Alarm.yaml with in-guest agent

Some Ceilometer Examples

How do these alarms work?

type: OS::Ceilometer::Alarm
properties:
  counter_name: cpu_util
  statistic: avg
  period: '60'
  evaluation_periods: '1'
  threshold: '50'
  alarm_actions:
  - {"Fn::GetAtt": [ServerScaleUpPolicy, AlarmUrl]}
  matching_metadata: {'metadata.user_metadata.server_group': 'ServerGroup'}
  comparison_operator: gt

You will notice the matching_metadata; this is how Ceilometer finds the applicable samples to
calculate the alarm. Whenever we create a server we add metering.InstanceId=x, and whenever
we create an autoscaling group we add to the server’s metadata.
Then Ceilometer looks for metadata with the prefix metering. and adds that to the sample metadata under user_metadata.
Have a look here:
Just remember, if you are using cfn-push-stats or even the ceilometer client directly, this renaming does not happen
and you will need to search for metering. and not user_metadata.
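
To make the renaming concrete, here is a minimal Python sketch of the idea. It is not Ceilometer's actual code: the function names and the flattened dictionary layout are assumptions made for illustration. It only shows why the same alarm key matches samples built from server metadata but not samples pushed by an agent.

```python
# Illustrative sketch only; these helpers are hypothetical, not Ceilometer APIs.
METERING_PREFIX = "metering."


def sample_metadata_from_server(server_metadata):
    """Copy 'metering.X' server metadata into sample metadata as 'user_metadata.X'."""
    renamed = {}
    for key, value in server_metadata.items():
        if key.startswith(METERING_PREFIX):
            renamed["user_metadata." + key[len(METERING_PREFIX):]] = value
    return renamed


def matches(matching_metadata, sample_metadata):
    """Return True if every matching_metadata entry is found in the sample metadata."""
    for key, wanted in matching_metadata.items():
        # Alarm keys carry a leading 'metadata.' that the sample keys here do not.
        name = key[len("metadata."):] if key.startswith("metadata.") else key
        if sample_metadata.get(name) != wanted:
            return False
    return True


# Sample built from a server's metadata: the prefix has been renamed.
builtin = sample_metadata_from_server({"metering.server_group": "ServerGroup"})

# Sample pushed by an agent: the metadata is stored as sent, no renaming.
pushed = {"metering.server_group": "ServerGroup"}

alarm_key = {"metadata.user_metadata.server_group": "ServerGroup"}
print(matches(alarm_key, builtin))  # True
print(matches(alarm_key, pushed))   # False
print(matches({"metadata.metering.server_group": "ServerGroup"}, pushed))  # True
```

So for agent-pushed samples the alarm would need a key along the lines of metadata.metering.server_group. That exact key is an assumption here; check what your samples really carry.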

With built-in metrics

With in-guest agent

Some potential reasons that your Alarms don’t work:

1) cfn-push-stats is out of date and has a bug: at one point it had a bug that prevented samples from being sent
2) the Ceilometer pipeline interval is too slow (it needs to be ~60 secs)
3) the matching_metadata is not correct (use ceilometer -d sample-list -m ) and check what the metadata actually is
4) we have a bug in Heat :-O, so head over to #heat and ask for help.
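
Reason 2 is easier to see with a toy model. The sketch below is a heavily simplified stand-in for alarm evaluation, not Ceilometer's evaluator. The evaluate function and its arguments are hypothetical, with defaults borrowed from the alarm shown earlier (period 60, one evaluation period, threshold 50, comparison gt, statistic avg). It shows that when samples arrive much less often than the alarm period, the evaluation window is often empty and the alarm cannot fire.

```python
# Toy model of threshold-alarm evaluation; a sketch under stated assumptions.
def evaluate(samples, now, period=60, evaluation_periods=1, threshold=50.0):
    """samples is a list of (timestamp_seconds, value) pairs.

    Returns 'alarm', 'ok' or 'insufficient data'.
    """
    start = now - period * evaluation_periods
    averages = []
    for i in range(evaluation_periods):
        lo = start + i * period
        hi = lo + period
        values = [v for t, v in samples if lo <= t < hi]
        if not values:
            # No sample landed in this period, so there is nothing to average.
            return "insufficient data"
        averages.append(sum(values) / len(values))
    # 'gt' comparison: every period's average must exceed the threshold.
    return "alarm" if all(a > threshold for a in averages) else "ok"


# One sample every 60 seconds: each 60 second window contains a sample.
fast = [(t, 80.0) for t in range(0, 601, 60)]
print(evaluate(fast, now=600))  # alarm

# One sample every 600 seconds: most windows contain nothing.
slow = [(0, 80.0), (600, 80.0)]
print(evaluate(slow, now=590))  # insufficient data
```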

Things we need to do

1) make this more consistent
2) add all those combinations to integration tests
3) look at adding Monasca Alarms (if possible)
4) remove the built-in Cloud Watch implementation (it’s now deprecated)
