Monitor

Monitoring talks

6th March, Engine Yard, Dublin

  1. Shinken: slides http://shaicoleman.com/slides/shinken/#/
  2. OMD and Check_MK: slides http://www.slideshare.net/arturmartins/omd-and-checkmk , code https://github.com/arturmartins/omd-check_mk_starter

Performance

Goal: monitor and log memory and CPU usage, plus disk writes and swap activity.

Monitoring memory, I/O and CPU

The command is vmstat.

Running "vmstat 5" prints a report every 5 seconds. The most interesting format is the one produced by the -a option:

"vmstat 3 -a"

procs -----------memory------------ ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   inact  active   si   so    bi    bo   in    cs us sy id wa st
 2  0  12040 115788 2086292 1562456    0    0   507   271 1345 13452 15  8 44 33  0
 1  0  12040 118060 2085720 1560596    0    0   628   121 1607 16013 17 11 44 28  0
 0  1  12040 116140 2087416 1560844    0    0   643    29 1429 17052 17 12 45 26  0

The active memory value corresponds to the "consumed" value shown by the VMware hypervisor in the properties of the individual virtual machine.

The formatting of the command's output is not ideal, but the data is meant to be analysed by a log tool afterwards, so this should not matter much.
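
Since the output is meant to feed a log tool, a minimal sketch of how the lines could be prepared is shown here. It assumes that a timestamp prefix is wanted and that header lines should be dropped. The function name stamp_vmstat and the log path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Prefix every vmstat data line with a timestamp and drop header lines,
# so that each logged line is self-describing.
stamp_vmstat() {
    while IFS= read -r line; do
        case "$line" in
            # Header lines contain letters, data lines contain only digits and blanks.
            ''|*[a-z]*) continue ;;
        esac
        printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$line"
    done
}

# Hypothetical usage (the log path is an assumption):
#   vmstat -a -n 5 | stamp_vmstat >> /var/log/vmstat.log

# Demo on a header line and one data line from the sample output.
printf '%s\n' \
    ' r  b   swpd   free   inact  active   si   so    bi    bo   in    cs us sy id wa st' \
    ' 2  0  12040 115788 2086292 1562456    0    0   507   271 1345 13452 15  8 44 33  0' \
    | stamp_vmstat
```

Only the data line is printed, preceded by the current date and time.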

Some excerpts from the vmstat man page follow.

First column group: processes

Procs
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.

Memory (values should be in kilobytes)
swpd: the amount of virtual memory used.
free: the amount of idle memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.
inact: the amount of inactive memory. (-a option)
active: the amount of active memory. (-a option)

So the log above makes sense: there are about 2 GB of inactive memory, i.e. allocated but not recently used, because sysdat allocates everything for itself even when it does not use it, and about 1.5 GB of active memory, i.e. in use.
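
The conversion behind those figures can be sketched as follows. It assumes the default vmstat unit (KiB, no -S option) and the "vmstat -a" column order shown in the sample, where inact and active are columns 5 and 6. The function name mem_gib is hypothetical.

```shell
#!/bin/sh
# Convert the inact and active columns of "vmstat -a" from KiB to GiB.
# Only lines whose first field is numeric are treated as data lines.
mem_gib() {
    awk '$1 ~ /^[0-9]+$/ {
        printf "inact=%.2f GiB active=%.2f GiB\n", $5 / 1048576, $6 / 1048576
    }'
}

# Demo on the first data line of the sample output.
echo ' 2  0  12040 115788 2086292 1562456    0    0   507   271 1345 13452 15  8 44 33  0' | mem_gib
# prints: inact=1.99 GiB active=1.49 GiB
```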

Swap
si: Amount of memory swapped in from disk (/s).
so: Amount of memory swapped to disk (/s).

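si and so are both 0, so nothing is being swapped in or out during the sample, although swpd shows that about 12 MB are already in swap.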
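
A rough way to check a longer log for swap activity is sketched here. It assumes the "vmstat -a" column order from the sample, with si and so in columns 7 and 8. The function name swap_activity is hypothetical.

```shell
#!/bin/sh
# Count how many vmstat samples show any swap-in or swap-out traffic.
swap_activity() {
    awk '$1 ~ /^[0-9]+$/ {
             n++
             if ($7 + $8 > 0) busy++
         }
         END { printf "samples=%d swapping=%d\n", n, busy }'
}

# Demo on the three data lines of the sample output.
swap_activity <<'EOF'
 2  0  12040 115788 2086292 1562456    0    0   507   271 1345 13452 15  8 44 33  0
 1  0  12040 118060 2085720 1560596    0    0   628   121 1607 16013 17 11 44 28  0
 0  1  12040 116140 2087416 1560844    0    0   643    29 1429 17052 17 12 45 26  0
EOF
# prints: samples=3 swapping=0
```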

IO
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).

There is some block read and write activity, but it is not very high.

System
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.

CPU
These are percentages of total CPU time.
us: Time spent running non-kernel code. (user time, including nice time)
sy: Time spent running kernel code. (system time)
id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.

The CPU percentages indicate low processor usage: only about a quarter of the time is spent in user and system code, while roughly 70-77% of the total is idle time (id) or time waiting for I/O (wa).
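
The sums behind that reading can be sketched like this, assuming the column order of the sample, with us, sy, id and wa in columns 13 to 16. The function name cpu_summary is hypothetical.

```shell
#!/bin/sh
# For each vmstat data line, add up busy time (us + sy) and
# non-busy time (id + wa), both as percentages of total CPU time.
cpu_summary() {
    awk '$1 ~ /^[0-9]+$/ {
        printf "busy=%d%% idle+wait=%d%%\n", $13 + $14, $15 + $16
    }'
}

# Demo on the first data line of the sample output.
echo ' 2  0  12040 115788 2086292 1562456    0    0   507   271 1345 13452 15  8 44 33  0' | cpu_summary
# prints: busy=23% idle+wait=77%
```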

Interesting options:

The -a switch displays active/inactive memory, given a 2.5.41 kernel or better.

The -f switch displays the number of forks since boot. This includes the fork, vfork, and clone system calls, and is equivalent to the
total number of tasks created. Each process is represented by one or more tasks, depending on thread usage. This display does not repeat.

The -m displays slabinfo.

The -n switch causes the header to be displayed only once rather than periodically.

The -s switch displays a table of various event counters and memory statistics. This display does not repeat.

delay is the delay between updates in seconds. If no delay is specified, only one report is printed with the average values since boot.

count is the number of updates. If no count is specified and delay is defined, count defaults to infinity.

The -d reports disk statistics (2.5.70 or above required)

The -D reports some summary statistics about disk activity.

The -p followed by some partition name for detailed statistics (2.5.70 or above required)

The -S followed by k or K or m or M switches outputs between 1000, 1024, 1000000, or 1048576 bytes

The -V switch results in displaying version information.

Monitoring processes

A useful command

ps h -e -o %cpu,pid,user,state,start,time,etime,%mem,cmd|sort -rn | grep runcbl

This produces lines like these:

0.7 23984 sysdat S 11:21:51 00:02:54 06:44:29 0.4 /bin/runcbl
0.4 8288 sysdat S 17:36:16 00:00:08 30:04 0.1 /bin/runcbl
0.3 8327 sysdat Z 17:37:31 00:00:06 28:49 0.0 [runcbl] <defunct>

sort -rn compares the lines numerically (-n) and prints them in reverse order (-r), because we want the highest numbers first. Since %cpu is the first column, the processes that consume the most CPU come first.
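
To rank by memory instead of CPU, the sort key can be moved to the %mem column. This is a sketch: it assumes the column order of the command above, with %mem as field 8, and that every column before it is a single token, which may not hold if the start column is printed as a date. The function name by_mem is hypothetical.

```shell
#!/bin/sh
# Sort ps output lines by the 8th field (%mem), highest value first.
by_mem() {
    sort -k8,8 -rn
}

# Hypothetical usage with the command from this page:
#   ps h -e -o %cpu,pid,user,state,start,time,etime,%mem,cmd | by_mem | grep runcbl

# Demo on made-up lines in the same layout (not real measurements).
by_mem <<'EOF'
0.1 101 sysdat S 10:00:00 00:00:01 01:00 0.2 /bin/runcbl
0.9 102 sysdat S 10:00:00 00:00:01 01:00 0.1 /bin/runcbl
0.5 103 sysdat S 10:00:00 00:00:01 01:00 3.5 /bin/runcbl
EOF
```

In the demo the process with pid 103 is listed first, because it has the highest %mem even though it does not have the highest %cpu.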

Meaning of the state column, from the ps man page

PROCESS STATE CODES
Here are the different values that the s, stat and state output
specifiers (header "STAT" or "S") will display to describe the state of
a process.
D Uninterruptible sleep (usually IO)
R Running or runnable (on run queue)
S Interruptible sleep (waiting for an event to complete)
T Stopped, either by a job control signal or because it is being
traced.
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z Defunct ("zombie") process, terminated but not reaped by its
parent.
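
With these codes, a quick count of processes per state (for example to spot zombies) can be sketched as follows. It assumes the column order of the ps command used on this page, where state is field 4. The function name count_states is hypothetical.

```shell
#!/bin/sh
# Count how many ps output lines are in each process state (field 4).
count_states() {
    awk '{ n[$4]++ } END { for (s in n) print s, n[s] }' | sort
}

# Demo on the sample lines shown earlier on this page.
count_states <<'EOF'
0.7 23984 sysdat S 11:21:51 00:02:54 06:44:29 0.4 /bin/runcbl
0.4 8288 sysdat S 17:36:16 00:00:08 30:04 0.1 /bin/runcbl
0.3 8327 sysdat Z 17:37:31 00:00:06 28:49 0.0 [runcbl] <defunct>
EOF
# prints:
# S 2
# Z 1
```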

Meaning of time (or cputime)

time TIME cumulative CPU time, "[dd-]hh:mm:ss" format. (alias cputime).

Meaning of etime

etime elapsed time since the process was started, in the form
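
For a log tool it is usually easier to work with seconds. The following sketch converts etime values, assuming the [[DD-]hh:]mm:ss layout documented in the procps ps man page and visible in the sample lines (06:44:29, 30:04). The function name etime_to_seconds is hypothetical. Newer procps versions may also offer an etimes specifier that prints seconds directly.

```shell
#!/bin/sh
# Convert etime values (one per line, [[DD-]hh:]mm:ss) to seconds.
etime_to_seconds() {
    awk '{
        days = 0; t = $1
        # An optional "DD-" prefix carries the number of days.
        if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
        n = split(t, p, ":")
        if (n == 3) s = p[1] * 3600 + p[2] * 60 + p[3]   # hh:mm:ss
        else        s = p[1] * 60 + p[2]                 # mm:ss
        print days * 86400 + s
    }'
}

# Demo on the two etime layouts seen in the sample output.
printf '%s\n' '06:44:29' '30:04' | etime_to_seconds
# prints:
# 24269
# 1804
```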

New Relic

The time to load a page is divided into 5 parts:

  • request queuing: the time spent between the web server and the application code
  • web application: the time spent in the application code
  • network: the round-trip time across the internet
  • DOM processing: the time the web browser spends parsing and interpreting the HTML
  • page rendering: the time spent displaying the HTML, loading images and executing JavaScript

New Relic can measure all of these times using JavaScript embedded in the page, to show where the problem is, and it can do so separately for requests coming from each part of the world.
