CloudWatch INSUFFICIENT_DATA for Linux System Metric

I recently had to recreate the images for our production systems on EC2 because they didn't have the ephemeral storage we need for our temporary tcpdump captures. Since they are EC2 instances, this was quite easy.

We use mon-get-instance-stats.pl to monitor system metrics such as memory utilization and disk space.

Naturally, I copied the alarms from the old instances and just replaced the InstanceId with the new ones. However, I was baffled to see CloudWatch reporting INSUFFICIENT_DATA for the alarms. When I tried to verify, mon-get-instance-stats.pl --verify showed the wrong InstanceId.

It wasn't until I had ransacked the whole filesystem that I realized the Perl scripts cache information, including the instance ID, in /var/tmp/aws-mon, and that this cache had been carried over into the new images. Remove (or move) that directory and all is well again.
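A minimal sketch of that fix as a shell function, assuming a POSIX shell. The name reset_aws_mon_cache and the timestamped .bak suffix are my own choices, not part of the AWS scripts; it moves the cache aside instead of deleting it, so the old values can still be inspected.

```shell
#!/bin/sh
# Hypothetical helper (not part of the AWS monitoring scripts).
# Moves the scripts' cache aside so the next run looks up the instance ID again.
# Usage: reset_aws_mon_cache [CACHE_DIR]   (defaults to /var/tmp/aws-mon)
reset_aws_mon_cache() {
    cache_dir=${1:-/var/tmp/aws-mon}
    if [ -d "$cache_dir" ]; then
        # Keep a timestamped copy instead of deleting it outright
        mv "$cache_dir" "$cache_dir.bak.$(date +%Y%m%d%H%M%S)"
    else
        echo "no cache directory at $cache_dir" >&2
    fi
}
```

Run it as root on the new instance (a plain rm -rf /var/tmp/aws-mon works just as well), then run mon-get-instance-stats.pl --verify again to confirm the new InstanceId. Clearing the directory before creating an image should keep the next batch of instances from inheriting the old ID.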

I hope this saves someone some time.

DD-WRT: OpenVPN Server Using Certificates

GUIs confuse me sometimes, so I prefer to do my configuration in text files. On DD-WRT, the OpenVPN server is available in the OpenVPN, OpenVPN Small, Big, Mega, and Giga builds (see K2.6 Build Features). Since I have never used a router with USB storage, I can't be sure, but I think OpenVPN can also be installed using ipkg.

For this post I am going to assume you’re an OS X user, but Windows procedures shouldn’t be too different.

1. Generating certificates and keys

  1. Get Easy-RSA. You can either clone the git repository or download the package as a zip. Navigate to the folder where you downloaded/cloned Easy-RSA and change into the easy-rsa/2.0 directory.
  2. Edit the vars file. Below are the variables you might want to change. Take note of KEY_SIZE: if you're paranoid like me, set it to 2048, as shown below. Generating the DH parameters takes longer, but not that long.
    # Increase this to 2048 if you
    # are paranoid.  This will slow
    # down TLS negotiation performance
    # as well as the one-time DH parms
    # generation process.
    export KEY_SIZE=2048
     
    # In how many days should the root CA key expire?
    export CA_EXPIRE=3650
     
    # In how many days should certificates expire?
    export KEY_EXPIRE=3650
     
    # These are the default values for fields
    # which will be placed in the certificate.
    # Don't leave any of these fields blank.
    export KEY_COUNTRY="MY"
    export KEY_PROVINCE="SELANGOR"
    export KEY_CITY="Puchong"
    export KEY_ORG="AdyRomantika"
    export KEY_EMAIL="[email protected]"
    export KEY_OU="RomantikaName"
     
    # X509 Subject Field
    export KEY_NAME="MYKEY1"
  3. Import the variables into the current shell:
    $ source vars
  4. Clean existing keys if any (WARNING: This deletes all existing certificates and keys)
    $ ./clean-all
  5. Build the certificate authority (CA). The script will still prompt for the values you entered in vars, so just press ENTER to accept them.
    • This will produce 2 files: ca.key and ca.crt
    $ ./build-ca
  6. Generate the Diffie-Hellman parameters
    • This will produce the file: dh{n}.pem where {n} is the key size specified in the vars file.
    $ ./build-dh
  7. Generate the key and certificate for the server.
    • When asked for a challenge password, just press ENTER. That prompt only adds an attribute to the certificate request; build-key-server does not encrypt the key, so nothing will be asked when the service is brought up.
    • When asked whether to sign the certificate, say Yes.
    • This will produce 3 files: server1.crt, server1.csr, server1.key
    $ ./build-key-server server1
  8. Generate a key and certificate for each client. This step can be repeated later for more clients as needed.
    • The challenge password prompt here does not protect the key either. If you want a password to be asked whenever the client connects (I recommend this for better security), run ./build-key-pass client1 instead of the command below; it encrypts the client key with a pass phrase.
    • When asked whether to sign the certificate, say Yes.
    • This will produce 3 files: client1.crt, client1.csr, client1.key
    $ ./build-key client1
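To sanity-check the output of the steps above, here is a small shell sketch that lists any expected file missing from the keys directory. It assumes the stock Easy-RSA 2.0 vars file, which exports KEY_DIR (keys/ by default), and the names server1 and client1 used above; check_easy_rsa_output is a hypothetical helper, not part of Easy-RSA.

```shell
#!/bin/sh
# Hypothetical helper: report Easy-RSA output files missing from KEY_DIR.
# File names follow the steps above (CA, DH parameters, server, clients).
# Usage: check_easy_rsa_output KEY_DIR KEY_SIZE SERVER_NAME [CLIENT_NAME...]
# Returns 0 if everything is present, 1 otherwise.
check_easy_rsa_output() {
    key_dir=$1
    key_size=$2
    server_name=$3
    shift 3
    # Build the expected list; names are assumed to contain no spaces
    expected="ca.crt ca.key dh${key_size}.pem $server_name.crt $server_name.key"
    for client in "$@"; do
        expected="$expected $client.crt $client.key"
    done
    missing=0
    for f in $expected; do
        if [ ! -f "$key_dir/$f" ]; then
            echo "missing: $key_dir/$f"
            missing=1
        fi
    done
    return $missing
}
```

For example, in easy-rsa/2.0 after source vars: check_easy_rsa_output "$KEY_DIR" "$KEY_SIZE" server1 client1. It only checks the .crt and .key files plus the CA and DH files, since those are what the OpenVPN server and clients use; the .csr files are not needed by OpenVPN.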

CrashPlan 3.5.3 Headless Upgrade

A headless installation of CrashPlan will fail when it tries to update itself.

This short post assumes that you already have it set up and running successfully, and is only meant to save you some time by identifying the important files to copy.

Running the installer again will also work, but we actually spent more time fixing the scripts, and the identity file might get overwritten, costing even more time to figure out what happened.

So here goes. This is how we extract the tar archive and the cpio archive within it.

# tar -xzf CrashPlan_3.5.3_Linux.tgz
# cd CrashPlan-install
# cat CrashPlan_3.5.3.cpi | gzip -dc - | cpio -i --no-preserve-owner

The files that changed from 3.4.1 to 3.5.3 (found thanks to rsync) are:

lang/txt.properties
lang/txt_sv.properties
lang/txt_th.properties
lang/txt_tr.properties
lang/txt_zh.properties
lib/com.backup42.desktop.jar
lib/com.jniwrapper.jniwrap.jar
lib/com.jniwrapper.winpack.jar

All I did was replace those files, and my CrashPlan installation is working fine.
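If you have several boxes to update, the replacement can be scripted. A minimal sketch, assuming you run it from the CrashPlan-install directory where cpio extracted the files and pass your installation directory as the second argument (/usr/local/crashplan below is an assumption; on the DNS-32x it lived under /ffp). copy_changed_files is a hypothetical helper that keeps a .orig copy of every file it replaces.

```shell
#!/bin/sh
# Hypothetical helper: copy changed files from the extracted installer into an
# existing installation, backing up each replaced file as FILE.orig.
# Usage: copy_changed_files EXTRACTED_DIR INSTALL_DIR FILE...
copy_changed_files() {
    src=$1
    dest=$2
    shift 2
    for f in "$@"; do
        if [ ! -f "$src/$f" ]; then
            echo "not found in $src: $f" >&2
            return 1
        fi
        # Back up the file being replaced, if there is one
        if [ -f "$dest/$f" ]; then
            cp -p "$dest/$f" "$dest/$f.orig" || return 1
        fi
        cp -p "$src/$f" "$dest/$f" || return 1
    done
}
```

For example: copy_changed_files . /usr/local/crashplan lib/com.backup42.desktop.jar lib/com.jniwrapper.jniwrap.jar lib/com.jniwrapper.winpack.jar, plus the lang/ files listed above. Stop the CrashPlan engine before copying and start it again afterwards.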

If you actually arrived here looking for information on installing CrashPlan for the first time, this post can help you if you're using a D-Link DNS-32X series NAS. Follow it from start to end (adapting the paths) and you'll be fine.

However, you might have to change paths and also do extra steps to get it working. At one point, CrashPlan will run fine but you’ll see that it’s not uploading files.

This post can help you troubleshoot the Java issues by replacing libraries.

Off the top of my head, I remember having to insert a new library with the correct architecture into jna-3.2.5.jar, replace libmd5.so, and replace libjtux.so. I also had to link /ffp/usr/local/crashplan/libffi.so.5 to a location accessible to the system loader.

Good luck!

Leverage Browser Caching

In the previous post I wrote about enabling compression for your pages so that they would load faster to the visitor. Today I’m going to write about how you can make use of browser caching to save some bandwidth.

Some people told me that their ISP or hosting provider requested that they upgrade the hosting plan or subscribe for more bandwidth. Since this site doesn’t have that much traffic, I wouldn’t know.

However, I recently helped with a website that has a lot more visitors than this site: around 14-18 visitors per minute on a working day, with very high bandwidth usage of more than a gigabyte per day.

On that website, I saw that there were many requests for images (photos). The images aren't that big, around 100KB each, but the number of requests made the total significant.

Armed with knowledge of mod_expires, I added the following clauses to .htaccess, hoping that the server had the module installed. The following configuration is minimal; Google PageSpeed actually suggests 1 week.

<IfModule mod_expires.c>
        ExpiresActive On
        ExpiresByType image/gif "access plus 2 hours"
        ExpiresByType image/png "access plus 2 hours"
        ExpiresByType image/jpg "access plus 2 hours"
        ExpiresByType image/jpeg "access plus 2 hours"
        ExpiresByType text/css "access plus 2 hours"
        ExpiresByType application/javascript "access plus 2 hours"
        ExpiresByType application/x-javascript "access plus 2 hours"
</IfModule>

Although I know why Google Analytics sets its expiry to 2 hours, it's kind of amusing, since the 1-week suggestion comes from another Google product. Oh well, I'm allowed to be amused, right?

So let’s get to the results. Here are the bandwidth graphs from both days. I enabled mod_expires at around 6PM on 5 January 2012.

We can’t really see the difference by looking at the graphs. Google Analytics shows that there are at least 200 more visits on 6 January 2012. The numbers? Here you go:

At least 400MB was saved by this technique. You can also use specific settings for each folder of your website. For example, 2 hours is fine for cosmetic images that may change frequently, but not for photos. If you run a photography website, you can even make your photos expire in 1 year!

What mod_expires actually does is tell the browser that a resource (such as an image) will expire at a specific time, by sending the Expires header and the max-age directive of Cache-Control. It's flexible enough to compute that time from the access time, as in "access plus 2 hours". Here is the link to the official manual page for mod_expires.
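To check that the module is actually active (the clauses sit inside an IfModule block, so nothing complains if it isn't loaded), you can look at the response headers. A minimal sketch, assuming curl is available; show_cache_headers and hours_to_max_age are illustrative helpers.

```shell
#!/bin/sh
# Hypothetical helper: filter an HTTP header dump (e.g. from curl -sI) down to
# the caching headers that mod_expires sets: Expires and Cache-Control.
show_cache_headers() {
    # HTTP header lines end in CRLF; drop the CR, then match case-insensitively
    tr -d '\r' | grep -i -e '^expires:' -e '^cache-control:'
}

# Hypothetical helper: the max-age (seconds) to expect for "access plus N hours"
hours_to_max_age() {
    echo $(( $1 * 3600 ))
}
```

For example, curl -sI http://www.example.com/images/photo.jpg | show_cache_headers (the URL is a placeholder). With "access plus 2 hours" you should see Cache-Control: max-age=7200 (hours_to_max_age 2 gives the same number) and an Expires header two hours after the request time. If neither header appears, mod_expires is probably not loaded.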

Please note carefully that this is not a quick fix for the lazy. You must think hard about how long images should stay cached; otherwise regular visitors will not see your changes or updates to an image until the cached copy in their browser expires!

Good luck!

Error Compiling djbdns and daemontools

While attempting to compile djbdns 1.05 and daemontools 0.76 on CentOS 5.5, I received this error:

/usr/bin/ld: errno: TLS definition in /lib/libc.so.6 section .tbss mismatches non-TLS reference in envdir.o

The problem can be eliminated by adding:

-include /usr/include/errno.h

to the end of the first line (the compiler command) of the conf-cc file in each tarball. Don't forget to install gcc first if you have a basic installation.
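If you build these packages more than once, the conf-cc edit can be scripted. A minimal sketch, assuming GNU sed (as on CentOS); add_errno_include is a hypothetical helper. It relies on the fact that only the first line of conf-cc is used as the compile command, and it leaves the file alone if the flag is already there.

```shell
#!/bin/sh
# Hypothetical helper: append the errno.h workaround to the compiler line
# (line 1) of a djb-style conf-cc file. Safe to run more than once.
# Usage: add_errno_include path/to/conf-cc
add_errno_include() {
    conf=$1
    if head -n 1 "$conf" | grep -q -- '-include /usr/include/errno.h'; then
        return 0    # already patched
    fi
    # GNU sed: edit the first line in place
    sed -i '1s|$| -include /usr/include/errno.h|' "$conf"
}
```

For example, add_errno_include djbdns-1.05/conf-cc. In the daemontools package the file should be under src/, i.e. admin/daemontools-0.76/src/conf-cc.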

By the way, please remember to follow the daemontools installation instructions exactly as described, or you'll end up with the software somewhere undesirable. You can change /package to be elsewhere, but I stupidly put it in /root as a test, so the svscanboot process was unable to execute programs in the /root directory: they run as unprivileged users.

Although this software felt really old-school to me, it has a very small memory footprint and runs very fast. If you're also looking into DNS, consider PowerDNS too, as it has very good statistics capabilities.

Do I need to reboot the machine after increasing the maximum number of open files at /etc/security/limits.conf?

No, you don't. This morning I struggled to convince someone that the server did not need a reboot. It was all because of this article: Increasing the number of file handles on Linux workstations.

ulimit – Provides control over the resources available to the shell and to processes started by it, on systems that allow such control.

limits.conf is the configuration file for the pam_limits module.

It takes effect immediately upon re-login. It's hard to explain things that you only understand intuitively. I wish I had formal Red Hat training so that I could explain it better.
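Why a re-login is enough: pam_limits applies limits.conf when a session is opened, so new logins get the new values right away, while processes that were already running keep the limits they started with until they are restarted. In a fresh shell, ulimit -n (soft) and ulimit -Hn (hard) show the new values. For a running process, Linux (2.6.24 and later) exposes its limits in /proc/PID/limits; the sketch below reads them. open_files_limit is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the soft and hard "Max open files" limits of a
# running process, read from /proc/PID/limits (Linux 2.6.24 and later).
# Usage: open_files_limit PID   -> prints "SOFT HARD"
open_files_limit() {
    # Line format: "Max open files   <soft>   <hard>   files"
    awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}
```

For example, open_files_limit $$ prints the limits of your current shell. Pass a daemon's PID to see whether it still runs with the old value; if so, restarting that service from a new login session is enough, not the whole server.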

In the end, I just rebooted the system so that the other person, who thinks he or she knows everything, would be satisfied. Now he or she is.

If you're struggling to find proof, you can probably forward this URL, which should carry more weight than this blog.

Setting DD-WRT Cron Job Through Command Line

I managed to get OpenVPN running on my DD-WRT v2.2 router, with the instructions from the wiki.

However, after a few reboot tests I saw that OpenVPN died immediately after starting, for no apparent reason.

Sep 12 00:51:10 192.168.xx.xx openvpn[3940]: TUN/TAP device tap0 opened
. . .
Sep 12 00:51:11 192.168.xx.xx openvpn[3949]: Initialization Sequence Completed

I suspect it has to do with my ppp0 (ADSL) connection taking some time to come up.

So I thought of doing a check using cron – if OpenVPN is not running, run it.

The command I wrote was:

But the bad news is that when I enter this command in the cron box of the Web Administration GUI, the single quotes get converted into HTML entities, and the mangled version is stored permanently in nvram and also in /tmp/cron.d/cron_jobs. Damn.

So I thought of using the command line. Here’s what I did in the SSH shell:

At this point, if you don't want to reboot your router, add the same lines to /tmp/cron.d/cron_jobs and restart cron using stopservice cron && startservice cron.
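A minimal sketch of such a check, not necessarily the exact command used here. It assumes your DD-WRT build's BusyBox includes pidof, which needs no quotes and so survives the GUI; ensure_running is a hypothetical helper and /tmp/openvpn/openvpn.conf a hypothetical config path.

```shell
#!/bin/sh
# Hypothetical helper: run a command only if no process with the given name
# is running (pidof needs no quotes, so the GUI cannot mangle it).
# Usage: ensure_running PROCESS_NAME COMMAND [ARGS...]
ensure_running() {
    name=$1
    shift
    if ! pidof "$name" >/dev/null 2>&1; then
        "$@"
    fi
}

# Cron cannot see shell functions, so a cron line spells the check out inline.
# Assumption: entries in /tmp/cron.d/cron_jobs include the user field (root).
# The config path is hypothetical:
# */5 * * * * root pidof openvpn >/dev/null || openvpn --config /tmp/openvpn/openvpn.conf --daemon
```

The same pattern works for the vpnc daemon mentioned below; only the process name and the start command change.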

And I’m all set!

I hope the IT team from my company is not reading this, but I also have a vpnc daemon running on the router to connect to my company network and I do the same check as above 😉

OpenVZ On Ubuntu Or Debian

As a SysAdmin I have been using OpenVZ since it was introduced, and trust me, it has not always been this easy. I used to take care of 20 physical servers, replacing about 5 machines every year. Since the servers ran different Linux distributions on different hardware, we decided to standardize them all by deploying OpenVZ, so that every one of them runs Debian stable.

OpenVZ is container-based virtualization for Linux: it only separates the guest servers in terms of resources. This differs from implementations such as VMware, Xen, and VirtualBox, which virtualize the hardware. Because of this, the guests (called VEs or VPSs) share the host's kernel and can only run Linux. Which distribution do you run as a guest? The choice is yours.

Undoubtedly most of you have heard of Virtuozzo: OpenVZ is its open-source core. As a matter of fact, the company that produces Virtuozzo is the one funding and supporting the development of OpenVZ.

Being able to run any distribution you like means you can study and learn how to maintain different distributions. Even the smallest difference can confuse a rookie SysAdmin, for example:

  • Debian's Apache init script is installed as /etc/init.d/apache or /etc/init.d/apache2 (depending on the package), while in CentOS it's called /etc/init.d/httpd
  • In Debian we use update-rc.d to manage init scripts and runlevels, while in CentOS we use chkconfig, even though both do essentially the same thing

There are many other implementation differences that I'd rather not discuss here.
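When you look after guests from both families, a tiny helper can pick the right "enable at boot" command. A minimal sketch, assuming Debian-family systems have /etc/debian_version and Red Hat-family ones (including CentOS) have /etc/redhat-release; boot_enable_cmd and its optional ROOT argument are illustrative, not OpenVZ tools. It only prints the command, so nothing changes by accident.

```shell
#!/bin/sh
# Hypothetical helper: print the command that enables SERVICE at boot on a
# Debian- or Red Hat-style system. ROOT optionally points at a guest's files.
# Usage: boot_enable_cmd SERVICE [ROOT]
boot_enable_cmd() {
    svc=$1
    root=${2:-}
    if [ -f "$root/etc/debian_version" ]; then
        echo "update-rc.d $svc defaults"
    elif [ -f "$root/etc/redhat-release" ]; then
        echo "chkconfig $svc on"
    else
        echo "unknown distribution under '${root:-/}'" >&2
        return 1
    fi
}
```

On a Debian guest, boot_enable_cmd apache2 prints update-rc.d apache2 defaults; on a CentOS guest, boot_enable_cmd httpd prints chkconfig httpd on.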

OSCC: The Silent Mirror

All hyped up about sharing Linux knowledge with friends, especially dirn, I wanted to download Ubuntu for my own use, mainly because I am a strict Debian user. Browsing the mirror list on the official Ubuntu site, I was disappointed by the speed of most of the mirrors I selected; the best ETA I could get was 4 hours.

Then a bell rang in my head and I went to look at OSCC. It is the closest mirror I can reach over my ADSL. The problem with this mirror is that it sometimes has outdated files, especially in the Debian and CentOS repositories, so it is always my last choice when looking for files. I don't blame them, as rsyncing from the first-level mirrors must be really slow.

The speed is very satisfactory because the mirror is located in Cyberjaya, Malaysia as you can see below:

I turn to this mirror every time I'm disappointed with the speed of overseas mirrors, as it mirrors some other projects too.

Back in 2002, I almost became an employee of OSCC after scoring good marks in a Linux test held at DRB-Hicom, and went to an interview at OSCC. I failed to get the position because when asked “How do you change init scripts and levels in Red Hat?”, I answered “I use ln to make symbolic links from /etc/init.d to /etc/rc.{0-6}”. They said, “No, you should use chkconfig“. As a self-taught Linux user, my answer was not incorrect, but by-the-book users would feel otherwise. I was annoyed, but I don't hold any grudge against them. I do, however, feel lucky I didn't get the job.
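For the curious, this is roughly what the "manual" answer looks like as a shell function. It is a sketch assuming the Red Hat layout, with init scripts in /etc/rc.d/init.d and start links in /etc/rc.d/rcN.d; enable_with_symlinks, the priority number, and the ROOT prefix are illustrative. chkconfig does the same job using the script's own "# chkconfig:" header, and also creates the K (kill) links for the other runlevels, which this sketch does not.

```shell
#!/bin/sh
# Hypothetical helper: create S (start) links for an init script in the given
# Red Hat-style runlevel directories. ROOT defaults to the real filesystem.
# Usage: enable_with_symlinks SERVICE PRIORITY "LEVELS" [ROOT]
enable_with_symlinks() {
    svc=$1
    prio=$2
    levels=$3
    root=${4:-}
    for lvl in $levels; do
        # e.g. /etc/rc.d/rc3.d/S85httpd -> ../init.d/httpd
        ln -sf "../init.d/$svc" "$root/etc/rc.d/rc$lvl.d/S$prio$svc" || return 1
    done
}
```

For example, enable_with_symlinks httpd 85 "3 5" creates S85httpd in rc3.d and rc5.d.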

PHP 5 In CentOS 4.5

Just a short note for users of CentOS 4.5 who are looking to update PHP to version 5 instead of the default 4.3.9: there is a clean and easy way to upgrade.

  1. Open up /etc/yum.repos.d/CentOS-Base.repo and look for the section centosplus:

    [centosplus]
    name=CentOS-$releasever - Plus
    mirrorlist=http://mirrorlist.centos.org/...
    #baseurl=http://mirror.centos.org/...
    gpgcheck=1
    enabled=0
    gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-centos4
    priority=2
    protect=1

  2. Change enabled=0 to enabled=1
  3. Save the file
  4. Run yum update 'php*' (the quotes stop the shell from expanding the wildcard against files in the current directory)
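Steps 1 to 3 can also be done with a single sed command. A minimal sketch, assuming GNU sed (as shipped with CentOS); enable_centosplus is a hypothetical helper that changes enabled=0 to enabled=1 only between the [centosplus] header and the next section header, keeping a backup with a .bak suffix.

```shell
#!/bin/sh
# Hypothetical helper: enable only the [centosplus] section of a yum repo file.
# Usage: enable_centosplus [REPO_FILE]
enable_centosplus() {
    repo=${1:-/etc/yum.repos.d/CentOS-Base.repo}
    # Address range: from the [centosplus] header to the next "[" header line
    sed -i.bak '/^\[centosplus\]/,/^\[/ s/^enabled=0/enabled=1/' "$repo"
}
```

Alternatively, yum can enable the repository for a single run without touching the file: yum --enablerepo=centosplus update 'php*'. The flip side is that later updates to those PHP packages also need the flag.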

And the rest is up to you… when it finishes restart Apache (service httpd restart) and you’ll be up and running with PHP 5.

How to check PHP version on the server?

Use rpm -qa | grep php and you'll see the list of installed PHP packages. In this case, the output below shows that PHP on the server has been upgraded to PHP 5 (5.1.6).

php-pdo-5.1.6-3.el4s1.7
php-cli-5.1.6-3.el4s1.7
php-pear-1.4.11-1.el4s1.1
php-ncurses-5.1.6-3.el4s1.7
php-mbstring-5.1.6-3.el4s1.7
php-pgsql-5.1.6-3.el4s1.7
php-gd-5.1.6-3.el4s1.7
php-odbc-5.1.6-3.el4s1.7
php-common-5.1.6-3.el4s1.7
php-5.1.6-3.el4s1.7
php-snmp-5.1.6-3.el4s1.7
php-ldap-5.1.6-3.el4s1.7
php-mysql-5.1.6-3.el4s1.7
php-devel-5.1.6-3.el4s1.7
php-xmlrpc-5.1.6-3.el4s1.7
php-imap-5.1.6-3.el4s1.7
php-xml-5.1.6-3.el4s1.7

Good luck!

Compiz Fusion

Have you seen or used Windows Vista? Like the eye candy and effects it provides? Wished something like that were available on your other favorite *NIX OS? Wish no more: Compiz Fusion is here.

This project is a merger of Compiz and Beryl. Beryl was a fork of Compiz, and there is some history behind why Beryl split off in the first place. However, that is not our concern now; let bygones be bygones. The merger has been announced under the new name, Compiz Fusion. Feast your eyes on this video:

Unfortunately for me, I have no machine with enough processing power and graphics acceleration to test it out.

More info:
www.opencompositing.org
compiz.org – down at the time of writing. Perhaps they are building the new site?
beryl-project.org – original page of the Beryl project

This proves that open source projects can also achieve what the big boys in the industry are capable of doing.

Iceweasel

Have you ever heard of the browser named Iceweasel? Probably not, if you're not using Debian. One of my machines at home runs Debian Etch (my torrent box), and a few days ago I ran apt-get upgrade to upgrade the packages.

I was quite annoyed at first, as it was trying to install a new package (a huge one, too), but I let it anyway. Earlier today I launched the web browser in Xfce and Iceweasel came up…

Iceweasel is a rebranded Firefox and exists as two independent projects: one by Gnuzilla and the other by Debian.

Iceweasel was created because Mozilla demanded that Debian comply with some policies and terms that Debian found unacceptable.

The other products were rebranded too: Thunderbird became Icedove and SeaMonkey became Iceape.

The current release of Gnuzilla IceWeasel is based on the 1.5.0.7 version of Mozilla Firefox, while the current version of Debian Iceweasel is based on the 2.0.0.1 release of Firefox.

The most obvious reason for this name change was that Mozilla demanded that Debian retain all of Mozilla's branding if it were to continue using the Firefox name. However, because the Debian Free Software Guidelines do not allow non-free artwork and plugins, Debian was unable to comply. Instead, a generic, non-branded globe icon was used for Firefox in Debian.

What I can see so far is that only the name has changed. All of my plugins still work and upgrade normally. On my active machines, however, I always use the package extracted from Mozilla's own download, so I would never have noticed Iceweasel there.

Iceweasel. Cute name?

ALSA Support in Skype

Finally, Skype has released a beta version with ALSA support: 1.3.0.30_API

Skype Beta with ALSA

Hopefully all the trouble with “Problem with Sound Device” will be history. For users with very old kernels who prefer OSS, the option is still there. The problem with Skype using OSS on modern systems is that it keeps failing to close /dev/dsp after using it, and the only way to make it work again is to restart Skype. It's a hassle and a headache. Believe me, until a week ago I was the SysAdmin for a 99.99% Linux desktop company, with Skype as one of its primary communication tools.

Storage Emergency

My 17-day-old Seagate Barracuda 7200.9 300GB disk started giving a lot of errors two days ago. There were a bunch of them in my syslog:

ata1: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
ata1: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key: Medium Error
Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sda, sector 212833665
Buffer I/O error on device sda1, logical block 106416801
ata1: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
ata1: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04

Yes, that's right: after 17 days, which means I can't get a one-to-one replacement from the shop.

SMARTD Logs:

Error 6892 occurred at disk power-on lifetime: 427 hours (17 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.
 
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 b4 95 af e0  Error: UNC at LBA = 0x00af95b4 = 11507124
 
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
25 00 d0 b0 95 af e0 00      01:47:04.861  READ DMA EXT
25 00 d0 b0 95 af e0 00      01:47:03.048  READ DMA EXT
25 00 d0 b0 95 af e0 00      01:47:01.243  READ DMA EXT
25 00 d0 b0 95 af e0 00      01:46:59.447  READ DMA EXT
25 00 d8 a8 95 af e0 00      01:46:57.650  READ DMA EXT
 
Error 6891 occurred at disk power-on lifetime: 427 hours (17 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.
 
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 b4 95 af e0  Error: UNC at LBA = 0x00af95b4 = 11507124
 
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
25 00 d0 b0 95 af e0 00      01:47:04.861  READ DMA EXT
25 00 d0 b0 95 af e0 00      01:47:03.048  READ DMA EXT
25 00 d0 b0 95 af e0 00      01:47:01.243  READ DMA EXT
25 00 d8 a8 95 af e0 00      01:46:59.447  READ DMA EXT
25 00 d8 a8 95 af e0 00      01:46:57.650  READ DMA EXT
 
Error 6890 occurred at disk power-on lifetime: 427 hours (17 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.
 
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 b4 95 af e0  Error: UNC at LBA = 0x00af95b4 = 11507124
 
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
25 00 d0 b0 95 af e0 00      01:47:04.861  READ DMA EXT
25 00 d0 b0 95 af e0 00      01:47:03.048  READ DMA EXT
25 00 d8 a8 95 af e0 00      01:47:01.243  READ DMA EXT
25 00 d8 a8 95 af e0 00      01:46:59.447  READ DMA EXT
25 00 d8 a8 95 af e0 00      01:46:57.650  READ DMA EXT

Here’s the disk label:

Seagate Disk 300GB

I blamed the disk. My friend Azidin had a different idea. He said that it might be the SATA controller card that I installed on my computer that’s causing the errors. I didn’t believe him.

That night I tested the disk with Azidin. There were a lot of bad sectors!!!!! But still, I refused to blame the SATA controller card.

Seatools

After work on 23 June, I rushed to the shop, hoping they would help me or keep my disk for checking over the weekend, but they (C-Zone) turned me away, saying their service center was closed and asking me to come back the next day. I was disappointed. But I didn't leave Low Yat Plaza without buying a 200GB Maxtor disk from Startec, just in case it took months to get my disk repaired.

Maxtor 200GB

Back home, I installed the new disk on the same SATA controller card. The next day, I found these in my syslog:

end_request: I/O error, dev sda, sector 132826840
Buffer I/O error on device sda2, logical block 8210
lost page write due to I/O error on sda2
ATA: abnormal status 0xD0 on port 0x9807
ATA: abnormal status 0xD0 on port 0x9807
ATA: abnormal status 0xD0 on port 0x9807
ReiserFS: sda2: warning: journal-837: IO error during journal replay
REISERFS: abort (device sda2): Write error while updating journal header in flush_journal_list
REISERFS: Aborting journal for filesystem on sda2
ata1: command 0x25 timeout, stat 0xd0 host_stat 0x1
ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata1: status=0xd0 { Busy }
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key: Aborted Command
Additional sense: Scsi parity error
end_request: I/O error, dev sda, sector 133810704

I started to believe that the controller card might be causing the problems. What are the odds that all my disks would produce errors like these? I decided to buy a new motherboard with a built-in SATA controller, without spending too much. I also had an unused Socket 478 Celeron, so after some research I decided on an ASUS P4P800-MX, which was still available at Cycom. That very night, I ran SeaTools Desktop on my older disk to do a low-level format (zero fill). It took hours, but it was totally worth it. This morning, when the process finished, I ran another surface scan of the 300GB disk and all the bad sectors were gone. Pheww! I decided not to send it to the shop, but to continue using it with caution. It carries a 5-year warranty anyway.
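Since the disk stays in service "with caution", it is worth watching its SMART counters. A minimal sketch, assuming smartmontools (the package whose smartd produced the log above) is installed; sector_health is a hypothetical helper that reads smartctl -A output and prints the raw values of Reallocated_Sector_Ct and Current_Pending_Sector.

```shell
#!/bin/sh
# Hypothetical helper: read "smartctl -A" output on stdin and print the raw
# values of the attributes that count remapped and pending (unreadable) sectors.
# Attribute rows have 10 columns; RAW_VALUE is the 10th.
# Usage: smartctl -A /dev/sda | sector_health
sector_health() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" {
        print $2 "=" $10
    }'
}
```

If those numbers keep climbing between runs, the disk is still developing bad sectors and it is time to use that warranty. smartctl -t long /dev/sda starts the drive's extended self-test, and smartctl -l selftest /dev/sda shows the result.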

Seatools 2 all ok

So today I went and bought a P4P800-MX from Cycom, with two sticks of 512MB DDR (to utilize dual-channel memory bus). I have just finished installing the 300GB Seagate disk plus the 200GB Maxtor disk on the new motherboard. Everything looks good.

The culprit? Here it is:

Sata controller

I don’t think it’s the chip. Maybe the card is defective. I bought it at Sri, in a plastic package (they hang such packages on a wall like in a supermarket). I thought of returning it, but I’m too tired to argue with the shop.

Oh well. I am all happy now. Thanks to Azidin for his help, and of course to my dear wife for her understanding of this matter.