Friday, December 16, 2011

Tmux swap panes

Category: tmux — SkyHi @ Friday, December 16, 2011

SESSION MANAGEMENT

Sessions are useful for completely separating work environments. I have a ‘Work’ session and a ‘Play’ session; in ‘Work’, I keep everything open that I need during my day-to-day development, while in ‘Play’, I keep open current open-source gems or other work I hack on at home.

tmux new -s session_name: creates a new tmux session named session_name
tmux attach -t session_name: attaches to an existing tmux session named session_name
tmux switch -t session_name: switches to an existing session named session_name
tmux list-sessions: lists existing tmux sessions
tmux detach (prefix + d): detach the currently attached session
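
The session commands above combine naturally into an "attach if it exists, otherwise create it" helper. This is only a minimal sketch: the function name attach_or_create is made up for illustration, and it assumes you run it from outside tmux (from inside an existing session you would use tmux switch instead of attach).

```shell
# attach_or_create is a hypothetical helper name, not a tmux command.
# $1 = session name
attach_or_create() {
    name="$1"
    # has-session exits 0 when a session with this name already exists
    if tmux has-session -t "$name" 2>/dev/null; then
        tmux attach -t "$name"
    else
        tmux new -s "$name"
    fi
}
```

With this in a shell profile, attach_or_create work would land in the 'Work' session whether or not it is already running.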

WINDOWS

Tmux has a tabbed interface, but it calls its tabs “Windows”. To stay organized, I rename all the windows I use; if I’m hacking on a gem, I’ll name the window that gem’s name. The same thing goes for client applications. That way, I can recognize windows by context rather than by the application they’re running.

tmux new-window (prefix + c): create a new window
tmux select-window -t :0-9 (prefix + 0-9): move to the window based on index
tmux rename-window (prefix + ,): rename the current window

PANES

Panes take my development time from bland to awesome. They’re the reason I was able to uninstall MacVim and develop solely in iTerm2. I don’t have to switch applications to switch contexts (editing, reading logs, IRB, etc.) - everything I do, I do in a terminal now. People argue that OS X’s Cmd+Tab is just as fast, but I don’t think so.

tmux split-window (prefix + "): splits the current pane into two panes stacked top and bottom
tmux split-window -h (prefix + %): splits the current pane into two panes side by side
tmux swap-pane -[UD] (prefix + { or }): swaps the current pane with the previous (-U) or next (-D) pane
tmux select-pane -[UDLR]: selects the next pane in the specified direction
tmux select-pane -t :.+: selects the next pane in numerical order

HELPFUL TMUX COMMANDS

tmux list-keys: lists out every bound key and the tmux command it runs
tmux list-commands: lists out every tmux command and its arguments
tmux info: lists out every session, window, pane, its pid, etc.
tmux source-file ~/.tmux.conf: reloads the tmux configuration from ~/.tmux.conf (the default config file location)

MUST-HAVES

These are some of my must-haves in my tmux config:

# remap prefix to Control + a
set -g prefix C-a
unbind C-b
bind C-a send-prefix

# force a reload of the config file
unbind r
bind r source-file ~/.tmux.conf

# quick pane cycling
# (^A is the same key as C-a, so this binding replaces the send-prefix binding above)
unbind ^A
bind ^A select-pane -t :.+

WORKFLOW

During the day, I’ll work on one or two Rails apps, work on my dotfiles, run irssi, and maybe run vim in another window to take notes for myself. As I mentioned, I run all of this inside one tmux session (named work) and switch between the different windows throughout the day.

When I’m working on any Ruby work specifically, I’ll have a 75%/25% vertical split for vim and a terminal so I can run tests, interact with git, and code. If I run tests or ‘git diff’ and want to see more output than the 25% allots me, I’ll use tmux to swap the panes and then move into copy mode to see whatever I need to see.
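
The split-and-swap routine described above could be scripted roughly like this. It is a sketch under a few assumptions: the function names are hypothetical, "vertical split" is read as an editor pane on top with a 25% terminal pane underneath, and the -l 25% size syntax needs tmux 3.1 or later (older releases used -p 25 instead).

```shell
# dev_layout and swap_and_scroll are hypothetical helper names.

# Split the current pane 75%/25% (top/bottom) and return focus to the top pane.
dev_layout() {
    tmux split-window -v -l 25% &&
    tmux select-pane -U
}

# Swap the current pane with the previous one, then enter copy mode
# in whichever pane is active so the output can be scrolled.
swap_and_scroll() {
    tmux swap-pane -U &&
    tmux copy-mode
}
```

Both functions are meant to be run from a shell inside a tmux window, since they act on the current pane.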

Finally, I run iTerm2 in full-screen mode. Switching between OS X apps for an editor and a terminal is for chumps!

REFERENCES

http://robots.thoughtbot.com/post/2641409235/a-tmux-crash-course
http://myhumblecorner.wordpress.com/2011/08/30/screen-to-tmux-a-humble-quick-start-guide/

Nagios Icinga related

Category: Nagios Icinga — SkyHi @ Friday, December 16, 2011

icinga on Ubuntu bash install script

http://blog.kyodium.net/search/label/icinga

Related Articles
http://www.kernelhardware.org/category/nagios/
http://www.thegeekstuff.com/tag/nagios/

Nagios Email Notification Frequency or Interval

Category: Nagios Icinga — SkyHi @ Friday, December 16, 2011

Question:

How do I configure how often Nagios emails notifications about problems?

Solution:

The notification interval setting for services and hosts allows you to configure how often email notifications get sent by Nagios.

Nagios Notifications

Before Nagios sends out a notification the program runs a series of filters.

1. Are notifications enabled?

2. Is there scheduled downtime for the host/service?

3. Is the service flapping?

4. Notification period (is Nagios allowed to send notifications at this time?)

Some techs don’t want their cellphone pinging at 4 am with a new email alert

5. Notification interval (has the time that has passed since the last notification went out met or exceeded the notification interval?)

Nagios Notification Interval

The notification interval filter only applies once two conditions have been met:

1. A notification was already sent out about the problem

2. The host/service has remained in a non-OK state since the last notification went out
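
The interval filter can be pictured as a simple comparison. The sketch below is not Nagios code, only an illustration: the function name renotify_due is made up, the times are assumed to be in minutes (Nagios' default interval unit), and an interval of 0 is treated as "never re-notify", which is how Nagios handles that value.

```shell
# renotify_due is a hypothetical helper, not part of Nagios.
# $1 = current state (e.g. OK, WARNING, CRITICAL)
# $2 = minutes since the last notification was sent
# $3 = notification_interval in minutes (0 = no repeat notifications)
renotify_due() {
    state="$1"
    elapsed="$2"
    interval="$3"
    # Still in a problem state, repeats enabled, and enough time has passed
    [ "$state" != "OK" ] && [ "$interval" -gt 0 ] && [ "$elapsed" -ge "$interval" ]
}
```

For example, renotify_due CRITICAL 120 120 succeeds, while renotify_due CRITICAL 60 120 does not, because only 60 of the 120 minutes have passed.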

Configure Nagios Notification Interval Setting

The setting for notification interval can be defined within the host definition section of the configuration files.

The easiest way to modify this setting is to edit the main generic-host template in “templates.cfg”, found in

# cd /usr/local/nagios/etc/objects/

Edit the file “templates.cfg”

# vi /usr/local/nagios/etc/objects/templates.cfg

Find the line (the line below appears in each type of host defined in the template)

define host {

notification_interval     120       ; Resend notification every 2 hours

}

Change “120” to the number of minutes you want to wait before receiving additional notifications

Save your changes

Next, verify your Nagios configuration files and restart Nagios.

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

# service nagios restart
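
The verify-then-restart steps above can be chained so that Nagios is only restarted when the configuration check passes. This is a minimal sketch: safe_restart is a made-up name, and the NAGIOS_BIN and NAGIOS_CFG variables are assumptions that default to the paths used in this post.

```shell
# safe_restart is a hypothetical helper; adjust the paths to your install.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

safe_restart() {
    # nagios -v exits non-zero when the configuration has errors
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        service nagios restart
    else
        echo "config check failed; not restarting" >&2
        return 1
    fi
}
```

Run it as root after editing the template. If the check fails, run the nagios -v command by hand to see the errors.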

REFERENCES
http://www.kernelhardware.org/nagios-email-notification-frequency-or-interval/

Thursday, December 15, 2011

How should I run fsck on a Linux file system

Category: Hard Drive, Hardware — SkyHi @ Thursday, December 15, 2011

Force fsck on boot using /forcefsck

By creating /forcefsck file you will force the Linux system (or rc scripts) to perform a full file system check.

Log in as root:
$ su -
Change to the root (/) directory:
# cd /
Create a file called forcefsck:
# touch /forcefsck
Now reboot the system:
# reboot

Scenario / Question:

I need to check a file system for errors using fsck. Can I run fsck on a mounted file system?

Solution / Answer:

Running fsck on a mounted file system can result in data corruption. The two options are:

1) Change the running state of the system to single user mode and unmount the file system

What if you need to run fsck on the root / file system?

2) Boot the computer into Rescue Mode using the installation CD

1) Single User Mode and umount the file system

Change to run level 1 (single user mode) and unmount the /home file system, which lives on /dev/sda2

# init 1
# umount /home

Run fsck:

# fsck /dev/sda2
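
Since running fsck on a mounted file system is the thing to avoid, a small guard can check first. The sketch below uses made-up function names and assumes the device appears in /proc/mounts under the same name you pass in (the root file system and LVM volumes are sometimes listed under a different name, so treat this as a rough check only).

```shell
# is_mounted and safe_fsck are hypothetical helper names.
# $1 = device, $2 = optional mounts file (defaults to /proc/mounts)
is_mounted() {
    # The first field of each /proc/mounts line is the device
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

safe_fsck() {
    if is_mounted "$1" "${2:-/proc/mounts}"; then
        echo "$1 is mounted; unmount it first" >&2
        return 1
    fi
    fsck "$1"
}
```

After umount /home, safe_fsck /dev/sda2 would go ahead and run fsck; before the umount it would refuse.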

2) Rescue Mode using installation CD ( to run fsck on root /)

Insert the Installation CD into the drive and reboot your system:

# shutdown -r now

After booting from the installation CD, type the following at the installation boot prompt:

linux rescue nomount

Once you are at the system command prompt you need to run mknod. Because we started Rescue Mode with the “nomount” option, no file systems were initialized and no device files were created. If we try to run fsck on a file system it will fail. We need to use mknod to create the block or character special file.

To use mknod we need to know the Minor and Major numbers of the device.

# ls -l /dev/sda
8 0
# ls -l /dev/sda2
8 2

# mknod /dev/sda b 8 0
# mknod /dev/sda2 b 8 2
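
If /proc is available in the rescue environment (an assumption, it may not be in every setup), the major and minor numbers can also be read from /proc/partitions. The helper names below are hypothetical. The second function only prints the mknod command so it can be reviewed before running it.

```shell
# dev_numbers and mknod_command are hypothetical helper names.
# $1 = device name without /dev/ (e.g. sda2)
# $2 = optional partitions file (defaults to /proc/partitions)

# Print "major minor" for the named device.
dev_numbers() {
    # /proc/partitions columns: major minor #blocks name
    awk -v dev="$1" '$4 == dev { print $1, $2 }' "${2:-/proc/partitions}"
}

# Print the mknod command for the named block device.
mknod_command() {
    set -- "$1" $(dev_numbers "$1" "${2:-/proc/partitions}")
    # Expect exactly: name, major, minor
    [ $# -eq 3 ] || return 1
    echo "mknod /dev/$1 b $2 $3"
}
```

For the disk in this example, mknod_command sda2 would print the same mknod line as above.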

Run fsck and force the check and attempt to automatically repair:

-y — cause the fs-specific fsck to always attempt to fix any detected filesystem corruption automatically.

-f — force a check even if reported in a clean state

-v — Produce verbose output, including all file system-specific commands that are executed.

# fsck -yvf /dev/sda2

LVM Partitions

To run fsck on LVM partitions, we first need to find the physical volumes, volume groups and logical volumes, and activate them.

# lvm pvscan
# lvm vgscan
# lvm lvchange -ay /dev/VolGroup00/LogVol_home
# lvm lvscan

# fsck -yfv /dev/VolGroup00/LogVol_home

LUKS Partition

To access an encrypted LUKS partition, use cryptsetup.

cryptsetup luksOpen DEVICE NAME

DEVICE – is the device path

NAME – is the name of the unencrypted mapping, which can then be accessed at /dev/mapper/NAME

# cryptsetup luksOpen /dev/VolGroup00/LogVol_home home
Enter LUKS passphrase for /dev/VolGroup00/LogVol_home
# fsck -yvf /dev/mapper/home
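
The open-then-check sequence can be wrapped so that the mapping is closed again afterwards. This is just a sketch: fsck_luks is a made-up name, and closing the mapping with luksClose is my addition, not a step from the walkthrough above.

```shell
# fsck_luks is a hypothetical helper name.
# $1 = encrypted device, $2 = mapping name
fsck_luks() {
    device="$1"
    name="$2"
    # luksOpen prompts for the passphrase and creates /dev/mapper/$name
    cryptsetup luksOpen "$device" "$name" || return 1
    fsck -yvf "/dev/mapper/$name"
    status=$?
    # Close the mapping so the decrypted device is not left open
    cryptsetup luksClose "$name"
    return "$status"
}
```

The example above would become fsck_luks /dev/VolGroup00/LogVol_home home.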

REFERENCES

http://www.cyberciti.biz/faq/linux-force-fsck-on-the-next-reboot-or-boot-sequence/

http://www.kernelhardware.org/how-should-run-fsck-linux-file-system/
http://www.centos.org/docs/5/html/Installation_Guide-en-US/s1-rescuemode-boot.html
http://smartmontools.sourceforge.net/badblockhowto.html

Smartd Error: Currently unreadable (pending) sectors

Category: Hardware, smartd — SkyHi @ Thursday, December 15, 2011

I am encountering the following error in /var/log/messages:

Aug 15 03:55:42 hostname smartd[2366]: Device: /dev/sda, 1 Currently unreadable (pending) sectors

This causes the / partition to be mounted read-only. The server is still accessible, but you can’t do much on it. Let’s troubleshoot this.

Collecting Information/Troubleshooting

I can see that the filesystem is mounted read-only when I try to create a test file in the /root directory:

$ touch /root/testfile
touch: cannot touch `/root/testfile': Read-only file system

What is SMART daemon (smartd)?

Self-Monitoring, Analysis and Reporting Technology (SMART) is a system built into many ATA-3 and later ATA, IDE and SCSI-3 hard drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive self-tests. We will use the smartctl command to help us find out what is wrong with the disk.

Let’s check the overall health of disk /dev/sda:

$ smartctl -H /dev/sda
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
 === START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

It passed, but that is only general information. We need to dig deeper by checking the disk’s self-test and error logs:

$ smartctl -q errorsonly -H -l selftest -l error /dev/sda
ATA Error Count: 2
Error 2 occurred at disk power-on lifetime: 36795 hours (1533 days + 3 hours)
Error 1 occurred at disk power-on lifetime: 31542 hours (1314 days + 6 hours)
 
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       60%     39255         -
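
The "days + hours" figures in that output are just the power-on hour count divided by 24. A tiny helper (the name hours_to_days is made up) reproduces the arithmetic, which is handy for the LifeTime(hours) column that smartctl leaves unconverted:

```shell
# hours_to_days is a hypothetical helper name.
# $1 = power-on hours as reported by smartctl
hours_to_days() {
    echo "$(( $1 / 24 )) days + $(( $1 % 24 )) hours"
}
```

hours_to_days 36795 gives 1533 days + 3 hours, matching Error 2 above, and hours_to_days 39255 shows the failed self-test ran at 1635 days + 15 hours.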

When I Googled the error above, it seemed that the hard disk might have a hardware problem. fsck alone might not help much, since it only fixes logical errors in the file system, not hardware errors.

The errors reported by smartd are stamped with the disk’s power-on lifetime, an attribute that is explained as follows (reference):

Count of hours in power-on state. The raw value of this attribute shows total count of hours (or minutes, or seconds, depending on manufacturer) in power-on state. A decrease of this attribute value to the critical level (threshold) indicates a decrease of the MTBF (Mean Time Between Failures).
However, in reality, even if the MTBF value falls to zero, it does not mean that the MTBF resource is completely exhausted and the drive will not function normally.

Backup

Since the hard disk is in read-only mode, we had better make a backup before proceeding with any fix. In this case, copying to another server with scp is a good idea because we cannot write to the local disk at the moment. For me, the “home” partition is the most important one to save:

$ scp -r /home user1@remoteserver:/home/user1/home_backup

Problem Solving Process

1. Remount the / partition:

$ mount -n -o remount /
mount: block device /dev/sda2 is write-protected, mounting read-only

2. Run the e2fsck command to check the ext3 file system online:

$ e2fsck /dev/sda2
e2fsck 1.39 (29-May-2006)
/: recovering journal
Clearing orphaned inode 31672817 (uid=0, gid=0, mode=0100755, size=157913)
Clearing orphaned inode 31672803 (uid=0, gid=0, mode=0100755, size=3532999)
Clearing orphaned inode 31666625 (uid=0, gid=0, mode=0100755, size=150604)
Clearing orphaned inode 31666619 (uid=0, gid=0, mode=0100755, size=383872)
Clearing orphaned inode 27885882 (uid=0, gid=0, mode=0100755, size=1011760)
Clearing orphaned inode 31666617 (uid=0, gid=0, mode=0100755, size=1141532)
Clearing orphaned inode 31665420 (uid=0, gid=0, mode=0100755, size=398180)
Clearing orphaned inode 31665416 (uid=0, gid=0, mode=0100755, size=71852)
Clearing orphaned inode 31671503 (uid=0, gid=0, mode=0100755, size=1250176)
/: clean, 80179/38273024 files, 2990728/38258797 blocks

I tried remounting the partition again as in step 1, but the same error occurred, so I proceeded to the next step.

3. Run a full file system check using fsck from the rescue environment:

$ fsck -f -y /dev/sda2

Even though the box remounted correctly after that, the smartd status kept haunting me. This forced me to make a final decision as my next step.

4. To avoid any sudden breakdown (the disk has already been running for more than 1000 days), I decided to replace the hard disk and reinstall the box. It’s better for me to do this as part of my regular maintenance so I won’t have to worry about ‘urgent’ maintenance when the disk breaks down over a weekend or while I’m asleep!

REFERENCES
http://blog.secaserver.com/2011/08/smartd-error-1-unreadable-pending-sectors/

Command Center