Amazon Certification SysOps Administrator Associate

Demonstrate ability to monitor availability and performance

Creating CloudWatch Alarms

The default metrics for EC2 are host-level metrics, such as CPU and disk read/write, but not memory usage or disk space usage. For those we need to install something inside the machine.

A topic is the destination where a notification ends up. The topic can deliver it by email, HTTP, SQS, SMS, etc.
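As a rough sketch of how an alarm and a topic fit together, the following builds the parameters for a CPU alarm that notifies a topic. The keys follow the parameter names of boto3's `put_metric_alarm`; the alarm name, threshold, instance ID, and topic ARN are hypothetical placeholders, not values from these notes.

```python
def cpu_alarm_params(instance_id, topic_arn, threshold=80.0):
    """Build keyword arguments for a CloudWatch CPU alarm on one instance."""
    return {
        "AlarmName": "high-cpu-" + instance_id,  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per data point
        "EvaluationPeriods": 2,   # consecutive periods over the threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # the topic that receives the notification
    }

# Placeholder instance ID and topic ARN, for illustration only
params = cpu_alarm_params(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
print(params["AlarmName"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.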

Installing And Configuring Monitoring Scripts for Amazon EC2 Instances

It is a Perl script that needs to communicate with CloudWatch.
You can authenticate with a role or with access keys, but it should always be done with a role.

Create a role with CloudWatch full access.

Log in to the machine:
sudo apt-get install unzip libperl … and others

Download the package.

Run mon-put-instance-data.pl every minute or every 5 minutes with the parameters.
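To make the idea of an in-instance metric concrete, here is a minimal Python sketch of how memory utilization can be derived from `/proc/meminfo`-style text. It is only an illustration of the kind of value such a script reports, not the Perl script's actual logic; the function names and the sample text are made up for the example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values

def memory_utilization(meminfo):
    """Return used memory as a percentage of total memory."""
    total = meminfo["MemTotal"]
    # Assumption: free memory plus buffers and page cache counts as available
    available = (meminfo["MemFree"]
                 + meminfo.get("Buffers", 0)
                 + meminfo.get("Cached", 0))
    return 100.0 * (total - available) / total

# Made-up sample in the /proc/meminfo layout
sample = """MemTotal:        1000000 kB
MemFree:          200000 kB
Buffers:           50000 kB
Cached:           250000 kB
"""
print(memory_utilization(parse_meminfo(sample)))  # 50.0
```

On a real instance the text would come from reading `/proc/meminfo`, and the result would be published to CloudWatch as a custom metric.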

Elastic Compute Cloud (EC2) Instance and System Status Checks

There are status checks enabled by default:

System Status Check:

  • loss of network connectivity
  • loss of system power
  • software issue on the physical host

Resolution: the best fix is to stop and start the instance, or rebuild it, because, except on dedicated-tenancy hardware, it will usually come back up on a different host.

Instance Status Checks:

  • misconfigured networking on startup
  • exhausted memory
  • corrupt file system (needs a rebuild of the instance)

A stop and start, not a reboot, allows the instance to run on a different host.
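The distinction between the two checks can be summarized as a small decision helper. This is only a sketch of the reasoning above: the function name is hypothetical, and the status strings assume the values the EC2 API reports ("ok", "impaired", and so on).

```python
def recommended_action(system_status, instance_status):
    """Map the two status checks to the recovery steps described in the notes."""
    if system_status == "impaired":
        # Problem is on the AWS side: move to different hardware
        return "stop and start the instance so it runs on a different host"
    if instance_status == "impaired":
        # Problem is inside the instance: it must be fixed by us
        return "repair the instance configuration or rebuild the instance"
    return "no action needed"

print(recommended_action("impaired", "ok"))
print(recommended_action("ok", "impaired"))
```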

Creating And Scripting Automation For EC2 Snapshots

yum install python-pip
pip install boto
vi backup.py

List all the EC2 volumes and create a snapshot for each one.

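#!/usr/bin/env python
import boto.ec2
# Placeholder keys; on EC2 an instance role avoids hard-coding credentials
ec2 = boto.ec2.connect_to_region('us-east-1',
                                 aws_access_key_id='mysecretkey',
                                 aws_secret_access_key='mysecretaccesskey')
volumes = ec2.get_all_volumes()
for volume in volumes:
    print(volume.id)  # list the ID of every volume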

run the script

chmod u+x backup.py
./backup.py

This prints the list of all volume IDs.

If I replace the print with this command, I get a snapshot of every volume:

ec2.create_snapshot(volume.id,"backup_script")

# Only look at snapshots owned by this account
snapshots = ec2.get_all_snapshots(owner='self')
for snapshot in snapshots:
    # Match the description passed to create_snapshot
    if snapshot.description == 'backup_script':
        print("deleting " + snapshot.id)
        snapshot.delete()

Building IAM Policies

An explicit Deny overrides an explicit Allow.

We can't allow every EC2 action except the terminate action by selecting them one by one, because there are too many actions and we receive an error: "The maximum policy size is"

Instead, we need to allow all the actions with * and deny the action that terminates an instance.
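A minimal sketch of such a policy, built as a Python dictionary, together with a deliberately simplified evaluator that shows the "explicit Deny wins" rule. The evaluator is an illustration only: real IAM evaluation also considers resources, conditions, and multiple policies, and `is_allowed` is a hypothetical helper that handles only string `Action` values.

```python
import json
from fnmatch import fnmatchcase

policy = {
    "Version": "2012-10-17",
    "Statement": [
        # Allow every EC2 action with a wildcard
        {"Effect": "Allow", "Action": "ec2:*", "Resource": "*"},
        # Explicitly deny terminating instances
        {"Effect": "Deny", "Action": "ec2:TerminateInstances", "Resource": "*"},
    ],
}

def is_allowed(policy, action):
    """Simplified check: any matching Deny wins, otherwise a matching Allow is needed."""
    allowed = False
    for statement in policy["Statement"]:
        if fnmatchcase(action, statement["Action"]):
            if statement["Effect"] == "Deny":
                return False
            allowed = True
    return allowed

print(json.dumps(policy, indent=2))
print(is_allowed(policy, "ec2:RunInstances"))        # True
print(is_allowed(policy, "ec2:TerminateInstances"))  # False
```

The wildcard keeps the document small, which sidesteps the policy size limit, while the single Deny statement removes the one unwanted action.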

It is possible to use a variable inside the policy. This way, I can access only my own home directory with this variable:

${aws:username}
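As an illustration, here is a sketch of a policy that uses the variable, assuming the home directories live under a `home/` prefix in an S3 bucket. The bucket name, the prefix, and the `resolve` helper are hypothetical; the helper only mimics the substitution that IAM performs when it evaluates the policy.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

home_policy = {
    "Version": "2012-10-17",  # policy variables need this version
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:*",
        # IAM replaces ${aws:username} with the name of the calling user
        "Resource": "arn:aws:s3:::" + BUCKET + "/home/${aws:username}/*",
    }],
}

def resolve(resource, username):
    """Mimic the variable substitution for a given user name."""
    return resource.replace("${aws:username}", username)

print(json.dumps(home_policy, indent=2))
print(resolve(home_policy["Statement"][0]["Resource"], "alice"))
# arn:aws:s3:::example-bucket/home/alice/*
```

One policy can then be attached to a whole group, and each user only reaches their own directory.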

Monitoring ElastiCache For Performance And Availability

Two cache engines:

  • memcached
  • redis

Monitoring EBS For Performance And Availability

What determines whether a volume can burst or not:

I/O Credits
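A worked example of the credit arithmetic. The notes do not say which volume type is meant, so this sketch assumes General Purpose SSD (gp2) volumes and the figures AWS documented for them: a baseline of 3 IOPS per GiB with a floor of 100 IOPS, bursts up to 3,000 IOPS, and a bucket of 5.4 million I/O credits. The function names are made up for the example.

```python
BUCKET_CREDITS = 5400000  # assumed gp2 credit bucket size
BURST_IOPS = 3000         # assumed gp2 burst ceiling

def baseline_iops(size_gib):
    """Baseline rate: 3 IOPS per GiB, never below 100 IOPS."""
    return max(100, 3 * size_gib)

def burst_seconds(size_gib):
    """Seconds a full bucket lasts at the burst rate, or None if bursting does not apply."""
    base = baseline_iops(size_gib)
    if base >= BURST_IOPS:
        return None  # baseline already meets the burst rate
    # Credits drain at (burst rate - baseline refill rate) per second
    return BUCKET_CREDITS / float(BURST_IOPS - base)

print(burst_seconds(100))   # 2000.0
print(burst_seconds(1000))  # None
```

Under these assumptions a 100 GiB volume can sustain the burst rate for 2,000 seconds from a full bucket, while a 1,000 GiB volume does not burst at all.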

status checks

  • ok = normal
  • warning = degraded (below expected) or severely degraded (well below expected)
  • impaired = stalled (severely impacted) or not available (I/O disabled)
  • insufficient data

Monitoring RDS For Performance And Availability

swap usage
memcached: if swap usage exceeds 50 MB, we need to increase memcached_connections_overhead

cpu utilization

evictions
An eviction is the replacement of an old element with a new one because space needs to be freed.

CHECK THIS LATER

Amazon EBS status check

  • OK
  • WARNING: degraded, severely degraded
  • IMPAIRED: stalled, not available (I/O disabled)
  • INSUFFICIENT DATA

ElastiCache Monitoring

Swap Usage

  • memcached: should stay at 0 most of the time and not exceed 50 MB. It is possible to increase memcached_connections_overhead, the amount of memory reserved for connections and other resources/overhead
  • Redis: no need to set a monitor for swap

CPU

  • memcached: multithreaded, so CPU up to 90% is no problem; if it goes higher, add more nodes or use a larger instance
  • Redis: not multithreaded, so the CPU utilization threshold is 90 / number of cores; application specific
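The two CPU rules above can be written as a small helper. This is a sketch of the arithmetic in these notes only; the function name is hypothetical.

```python
def cpu_alarm_threshold(engine, cores):
    """Return the CPU utilization percentage at which to alarm."""
    if engine == "memcached":
        # Multithreaded: all cores are used, so 90% applies as is
        return 90.0
    if engine == "redis":
        # Single-threaded: one busy core shows up as 1/cores of the total
        return 90.0 / cores
    raise ValueError("unknown engine: " + engine)

print(cpu_alarm_threshold("memcached", 4))  # 90.0
print(cpu_alarm_threshold("redis", 4))      # 22.5
```

On a 4-core node, a Redis process saturating its single core would therefore show up as only about 22.5% total CPU utilization.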

evictions

They occur when a new item is added and an older one is removed for lack of free space.
choose a threshold is based

  • memcached: if the instance surpasses the threshold, increase the node size or add nodes to the cluster
  • Redis: scale the cluster by adding read replicas

network bottleneck

Instance size and network bandwidth in Mbps:

  • small 250
  • large 500
  • xlarge 1000

iperf is a tool to test bandwidth.

continue from pag 66

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License