When a program forks and the child finishes before the parent, the kernel still keeps some information about the child in case the parent needs it; for example, the parent may need to check the child's exit status. To get this information, the parent calls wait(). In the interval between the child terminating and the parent calling wait(), the child is said to be a "zombie". (If you run ps, the child will have a "Z" in its status field to indicate this.)
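The zombie window described above can be reproduced with a small script. This is only a sketch: the function names are made up for illustration, and reading the state letter from /proc/<pid>/stat assumes a Linux system (on anything else the state comes back as None).

```python
import os
import time


def child_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    # The state letter is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def fork_and_reap(exit_code=7):
    """Fork a child that exits at once; return (state seen before wait, exit status)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: terminate immediately

    # Parent: give the child a moment to exit, but do not wait for it yet.
    state = None
    for _ in range(50):
        state = child_state(pid)
        if state in ("Z", None):
            break
        time.sleep(0.02)

    _, status = os.waitpid(pid, 0)  # reaping the child removes the zombie entry
    return state, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(fork_and_reap())
```

Until waitpid() is called the child normally shows up with state "Z"; after it returns, the process table entry is gone and the parent has the exit status.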
Category Archives: Linux Operating System
Linux OS issues, Different flavors, Patches.. etc
Useful tools for techies, especially developers and sysadmins
There are many situations in programming and testing where these tools help get the work done faster and more effectively.
1) Firebug
A very interesting tool. I cannot live without it for JavaScript and CSS testing. It also helps with request tracking and cookie management.
2) FireCookie
Another interesting Firefox add-on for cookie management. You can change a cookie on the fly and add new cookies whenever required. Very useful if your site uses cookies intensively.
3) YSlow
A Firefox add-on that is very useful when you have to assess the performance of your site. The recommendations and the site score from YSlow are especially useful for improving overall site performance.
4) Web Developer
A Firefox add-on. You can do a ton of things with it, from debugging JavaScript to changing and testing CSS and HTML on the fly. A must-have tool for HTML developers.
5) HTTP Watch
A very useful tool for both IE and Firefox for inspecting HTTP traffic on a site and debugging performance issues. You can watch AJAX requests and responses and debug them. You can also use the Net tab in Firebug for the same purpose, but sometimes I feel the Net tab doesn't work; HTTP Watch is more reliable.
6) Fiddler
Debugging traffic and web issues in IE is really difficult. Fiddler is one of those tools that make it easy to watch the traffic on a site.
7) Samurai Thread Dump Analyzer
A very useful tool for analyzing thread dumps. If your site has performance issues (for example 100% CPU usage), you can use this tool to analyze all the waiting threads. You can take a thread dump with kill -3 followed by the PID of the Java process.
8) JadEclipse (with the JAD executable)
A useful tool for decompiling class files in Eclipse. After installing JadEclipse, go to Window -> Preferences -> JadEclipse and set "Path to decompiler" to C:\JAD\jad.exe
9) JMeter
A very useful tool for load testing. Since it is free, you can load test your site whenever you want. It is also very easy to set up and configure.
10) HTML Parser
Another useful free Java API, this one for parsing HTML. The documentation is not good, but with some inspection you will find the API interesting and easy to use.
11) Regular expression check
If you use regular expressions a lot, this web site will help you create and test them. I use it quite often to test my regexes.
12) Key Notes
This is not really a tool as such, but it is very useful for keeping your notes.
13) Java code analyzer tool (also available for Eclipse)
A very useful tool for analyzing Java code performance. Plug-ins are available for many IDEs. The tool also tells you about problems in your code (potential null pointer exceptions and the like). Very useful for developing quality code.
14) Message POST tool (Wget)
Wget is a very handy message POST tool and can be used to POST XML across applications.
15) VisualVM (Java profiling tool)
A very nice and neat free Java profiling tool. For enterprise applications I would even recommend YourKit, but for quick and free investigation of memory issues you can use this tool effectively. You need Java 6 to run it.
16) AnyEdit plugin for Eclipse
If JSP pages contain a lot of white space or tabs, the page may take longer to load and require more network bandwidth. AnyEdit is a nice tool for removing unnecessary spaces from the page.
17) Heap Dump Analyzer (MAT)
Sometimes your application suffers from memory issues, for example out-of-memory errors, and you have no idea what is going on. There are many different reasons for an out-of-memory error, but the most common is a memory leak. Eclipse Memory Analyzer (MAT) is a powerful tool for analyzing a heap dump and narrowing down the problem. Please note that you need the -XX:+HeapDumpOnOutOfMemoryError parameter set to collect a heap dump. Java 1.6 also comes with a tool called jmap (memory map) to force a heap dump.
Delete mails from exchange server
First you will need to install fetchmail.
Then create a hidden file named ".fetchmailrc" containing the mail user's details:
poll YOUR_MAIL_SERVER_HERE.com protocol IMAP:
user YOUR_USER_ID_HERE with password YOUR_PASSWORD_HERE some text
Then run the following command to fetch the mail and flush it from your Exchange server:
# /usr/local/bin/fetchmail -a -K -v -F --limitflush --limit 5
Linux high I/O load: what to check when troubleshooting?
When you look at the CPU activity of your computer, one of the parameters is iowait. This value shows how much time your CPU spends idle while it waits for I/O operations to complete. These include disk read/write operations, network, IPC, etc. Is this behavior a problem and, if so, what causes it and how do you fix it? On one of the popular Unix-related forums one "genius" wrote:
The iowait “problem” is funny. It’s like when people complain that Linux is “using all my memory”. Yeah, no shit. You should be upset if you are copying files and your computer is /not/ in 100% iowait.
In reality, 100% iowait indicates that there is a problem, and in most cases a big problem that may even lead to data loss. Essentially, there is a bottleneck somewhere in the system. Maybe one of your disks is getting ready to die; or, perhaps, the NIC firmware is having problems with the latest kernel upgrade you installed. The troubleshooting process starts with the potentially more serious possibility: a bad disk.
Take a quick look at /var/log/messages, /var/log/dmesg, /var/log/boot.log and any other system log files. You are looking for disk I/O errors, failed read/write operations, bad sectors – anything that indicates a hardware problem with a disk. If you don't find anything, look for IRQ and disk controller errors. Also look for memory errors and kernel panics. The three most likely culprits of high iowait are: a bad disk, faulty memory and network problems.
If you still see nothing relevant, it is time to test your system. If possible, kick all the users off the box, shut down Web server, database and any other user application. Log in via command line and stop XDM.
Open three shell windows: run "top" in one, "iostat -x 1" in the second and "find /etc -type f -print" in the third. Make sure you can see all three windows at the same time. This is a simple test that should generate some I/O activity on the system disk. Repeat this process for other disks. If you see iowait hovering near 100%, chances are you have a problem, but we don't know what it is yet. However, now we do know that the network is probably not the cause.
deathstar:/ # iostat -x 1
Linux 2.6.5-7.201-default (deathstar) 12/20/08
avg-cpu: %user %nice %sys %iowait %idle
2.83 0.42 1.45 9.11 86.20
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
hda 40.63 66.34 27.45 6.04 936.50 581.23 468.25 290.61 45.32 2.42 72.16 2.22 7.42
hdc 0.01 0.00 0.01 0.00 0.03 0.00 0.02 0.00 4.02 0.00 1.17 1.17 0.00
sda 0.09 2.32 4.15 1.33 71.56 29.23 35.78 14.62 18.37 0.65 118.49 6.39 3.51
sdb 3.47 0.00 1.90 0.00 15.32 0.01 7.66 0.01 8.08 0.74 391.31 5.68 1.08
fd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 45.00 45.00 0.00
deathstar:/ # top
top - 21:28:28 up 1:22, 2 users, load average: 0.09, 0.14, 0.16
Tasks: 77 total, 1 running, 76 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.8% us, 1.3% sy, 0.4% ni, 86.2% id, 9.1% wa, 0.1% hi, 0.0% si
Mem: 508644k total, 503612k used, 5032k free, 34052k buffers
Swap: 1020088k total, 458980k used, 561108k free, 16012k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 16 0 640 56 28 S 0.0 0.0 0:05.14 init
2 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
3 root 5 -10 0 0 0 S 0.0 0.0 0:00.09 events/0
4 root 5 -10 0 0 0 S 0.0 0.0 0:00.00 khelper
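The iowait percentage that top and iostat print comes from the counters on the "cpu" line of /proc/stat. Here is a rough sketch of that arithmetic, assuming the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are just for illustration.

```python
import time


def cpu_fields(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    parts = stat_line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    a, b = cpu_fields(before), cpu_fields(after)
    # Only the first eight counters are summed; guest time is already part of user.
    deltas = [y - x for x, y in zip(a, b)][:8]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat (the aggregate over all CPUs)."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(1)
    print("iowait: %.1f%%" % iowait_percent(first, read_cpu_line()))
```

Sampling twice and taking the difference matters: the counters are cumulative since boot, so a single reading only gives the long-term average, much like the first report iostat prints.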
Next, let's stress your CPU but not the disks. The command below compresses an endless stream of zeros and throws the result away in /dev/null. This generates no disk activity, but loads the CPU. Continue running "top" and "iostat -x 1" in the other two windows.
cat /dev/zero | bzip2 -c > /dev/null
If you see high CPU load but low iowait, we can eliminate CPU issues, IRQ conflicts, and faulty memory. Just to be on the safe side, let’s test memory anyway:
deathstar:/ # free
total used free shared buffers cached
Mem: 508644 503504 5140 0 37036 48968
-/+ buffers/cache: 417500 91144
Swap: 1020088 516196 503892
This server has 508644 KB of RAM. Use the corresponding value for the following test:
deathstar:/ # dd if=/dev/hda2 bs=508644 of=/backups/memtest count=1050
1050+0 records in
1050+0 records out
deathstar:/ # md5sum /backups/memtest ; md5sum /backups/memtest ; md5sum /backups/memtest
04762ff36b2231aac75754ab9c1a564a /backups/memtest
04762ff36b2231aac75754ab9c1a564a /backups/memtest
04762ff36b2231aac75754ab9c1a564a /backups/memtest
The three MD5 values above should be identical. If they are not – your system has a faulty RAM chip.
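The same repeated-checksum comparison can be scripted, which helps if you want more than three passes. This is a minimal sketch; the function names are made up, and the path in the usage note is the test file created by dd above.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reads_are_consistent(path, passes=3):
    """Hash the same file several times; True if every pass gave the same digest."""
    digests = {md5_of(path) for _ in range(passes)}
    return len(digests) == 1
```

For example, reads_are_consistent("/backups/memtest", passes=5) returns False as soon as one pass disagrees with the others.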
When you have eliminated hardware problems as possible causes of high iowait, the next step is to review firmware and drivers. You are particularly interested in disk controller firmware: unstable performance and no error messages are the signs of a firmware problem. Try really hard to remember if you made any system changes recently, especially something that required a reboot, like a kernel upgrade. If this is the case, roll back the upgrade or search for upgraded firmware. You should grab a copy of Sysinfo (free 30-day trial) to help you identify the makes and models of your disks, controllers, etc.
While your disks and controllers may be tip-top, you may have a problem with a filesystem. Even if you see high iowait when accessing any filesystem, you should still check the partition where /var is mounted and the swap partition: if there is a problem there, it will manifest itself regardless of what your system is doing. But here you will run into a little problem: fsck will not scan a mounted partition and you cannot unmount /var. Let's say these are your partitions:
deathstar:/ # more /etc/fstab
/dev/hda2 / reiserfs acl,user_xattr 1 1
/dev/hda1 swap swap pri=42 0 0
You need to fsck /dev/hda2 because this is where your /var is mounted. Download a KNOPPIX or Ubuntu LiveCD, boot from the CD (without installing) and run "fsck /dev/hda2" from there. If everything looks clean, shut down your system, take the CD out and boot normally. The next step is to check swap. If you just run fsck on the swap partition, it will fail:
deathstar:/ # fsck /dev/hda1
fsck 1.34 (25-Jul-2003)
fsck: fsck.swap: not found
fsck: Error 2 while executing fsck.swap for /dev/hda1
You need to disable swap on /dev/hda1 before you can scan it. Before you can do this, you need to add another swap area: this system is actively using swap (see the output of free above), so it should not be left without any. To add swap on the fly, create a swap file (1 GB in this example):
deathstar:/ # dd if=/dev/zero of=/swapfile bs=1024 count=1048576
1048576+0 records in
1048576+0 records out
deathstar:/ # chmod 600 /swapfile
deathstar:/ # ls -lash /swapfile
1.1G -rw------- 1 root root 1.0G Dec 20 22:48 /swapfile
Now you can set up and activate the new swap file:
deathstar:/ # mkswap /swapfile
Setting up swapspace version 1, size = 1073737 kB
deathstar:/ # free
total used free shared buffers cached
Mem: 508644 500996 7648 0 38912 147332
-/+ buffers/cache: 314752 193892
Swap: 1020088 521784 498304
deathstar:/ # swapon /swapfile
deathstar:/ # free
total used free shared buffers cached
Mem: 508644 502232 6412 0 39400 147392
-/+ buffers/cache: 315440 193204
Swap: 2068656 521784 1546872
Now we need to deactivate the original swap partition. This operation may take a couple minutes to complete:
deathstar:/ # swapoff /dev/hda1
deathstar:/ # free
total used free shared buffers cached
Mem: 508644 501624 7020 0 31712 10416
-/+ buffers/cache: 459496 49148
Swap: 1048568 167032 881536
The next step is to create a standard filesystem on the old swap partition, so that fsck has something to scan:
deathstar:/ # mke2fs -c /dev/hda1
mke2fs 1.34 (25-Jul-2003)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
127744 inodes, 255024 blocks
12751 blocks (5.00%) reserved for the super user
First data block=0
8 block groups
32768 blocks per group, 32768 fragments per group
15968 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376
Checking for bad blocks (read-only test): done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
The -c option made mke2fs scan the partition for bad blocks, so, if you see no errors, you can now re-activate your original swap space and remove the temporary swap you created:
deathstar:/ # mkswap /dev/hda1
Setting up swapspace version 1, size = 1044574 kB
deathstar:/ # swapon /dev/hda1
deathstar:/ # swapoff /swapfile
deathstar:/ # rm /swapfile
deathstar:/ # free
total used free shared buffers cached
Mem: 508644 503172 5472 0 33668 9256
-/+ buffers/cache: 460248 48396
Swap: 1020088 156300 863788
Another command commonly used for analyzing system bottlenecks is vmstat. The following example runs vmstat five times at 2-second intervals:
deathstar:~ # vmstat -S M 2 5
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 15 174 70 58 0 0 189 50 5 6 1 3 94 1
0 0 15 174 70 58 0 0 0 0 1005 35 4 0 96 0
0 1 15 174 70 58 0 0 0 258 1515 45 0 6 88 7
0 0 15 173 71 58 0 0 0 194 1083 24 0 1 83 16
0 0 15 173 71 58 0 0 0 0 1003 19 0 0 100 0
Explanation of vmstat columns:
(a) procs: the process-related fields are:
* r: The number of processes waiting for run time.
* b: The number of processes in uninterruptible sleep.
(b) memory: the memory-related fields are:
* swpd: the amount of virtual memory used.
* free: the amount of idle memory.
* buff: the amount of memory used as buffers.
* cache: the amount of memory used as cache.
(c) swap: the swap-related fields are:
* si: Amount of memory swapped in from disk (/s).
* so: Amount of memory swapped to disk (/s).
(d) io: the I/O-related fields are:
* bi: Blocks received from a block device (blocks/s).
* bo: Blocks sent to a block device (blocks/s).
(e) system: the system-related fields are:
* in: The number of interrupts per second, including the clock.
* cs: The number of context switches per second.
(f) cpu: the CPU-related fields are:
These are percentages of total CPU time.
* us: Time spent running non-kernel code. (user time, including nice time)
* sy: Time spent running kernel code. (system time)
* id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
* wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.
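If you collect vmstat output in a log, the columns explained above are easy to turn into named fields. A small sketch, assuming the 16-column layout shown in this example (newer vmstat versions add an "st" column, which this does not handle); the helper names are hypothetical.

```python
VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                  "bi", "bo", "in", "cs", "us", "sy", "id", "wa"]


def parse_vmstat_row(line):
    """Turn one numeric vmstat line into a dict keyed by column name."""
    values = [int(v) for v in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError("expected %d columns, got %d"
                         % (len(VMSTAT_COLUMNS), len(values)))
    return dict(zip(VMSTAT_COLUMNS, values))


def busiest_iowait(rows):
    """Return the sample with the highest 'wa' (time waiting for IO) value."""
    return max(rows, key=lambda row: row["wa"])


if __name__ == "__main__":
    sample = [
        "0 0 15 174 70 58 0 0 0 0 1005 35 4 0 96 0",
        "0 1 15 174 70 58 0 0 0 258 1515 45 0 6 88 7",
        "0 0 15 173 71 58 0 0 0 194 1083 24 0 1 83 16",
    ]
    worst = busiest_iowait([parse_vmstat_row(s) for s in sample])
    print(worst["wa"], worst["bo"])
```

Looking at "wa" together with "bo", "si" and "so" in the same sample is what tells you whether the waiting is caused by ordinary writes or by swapping.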
If you failed to identify the cause of the iowait problem, you should consider the possibility that there is no problem: perhaps your system is handling extra load and running short on resources. Take a look at the running processes and see what’s eating up memory. Perhaps you upgraded an application and now it is using more RAM, which leads to high swapping, which leads to high disk activity, which leads to high iowait.
The solutions are simple:
1. Install more RAM
2. Move swap to another disk or – even better – move it to another disk on a separate controller.
3. Move user applications to another disk/controller and specify default log locations outside of the system disk.
– Jayesh
../../../libraries/libldap/error.c:273: ldap_parse_result: Assertion `r != ((void *)0)' failed
If you get the error shown below while running commands in the bash shell on your Linux server:
../../../libraries/libldap/error.c:273: ldap_parse_result: Assertion `r != ((void *)0)' failed
then it is caused by the nss_ldap software running on your server. In my case the nscd service was down on the server, and restarting it fixed the issue.
The errors I saw in the logs were:
/var/log/messages:
Oct 28 03:01:27 HOSTNAME nscd: nss_ldap: reconnected to LDAP server ldap://domain.com/ after 1 attempt
Nov 10 02:49:58 HOSTNAME nscd: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)…
Nov 10 02:50:14 HOSTNAME nscd: nss_ldap: reconnected to LDAP server ldap://domain.com/ after 2 attempts
Jan 18 07:45:09 HOSTNAME kernel: nscd[5114]: segfault at 00002b1c735dee78 rip 00002b1b6d4fe885 rsp 000000004185c6d0 error 4
Fix :
[root@HOSTNAME webdocs]# /etc/init.d/nscd status
nscd dead but subsys locked
You have new mail in /var/spool/mail/root
[root@HOSTNAME webdocs]# /etc/init.d/nscd restart
Stopping nscd: [FAILED]
Starting nscd: [ OK ]
[root@HOSTNAME webdocs]# /etc/init.d/nscd status
nscd (pid 30292) is running…
You have new mail in /var/spool/mail/root
[root@HOSTNAME webdocs]#
strings: '/lib/libc.so.6': No such file (CentOS)
If you get the above error while installing the SiteMinder agent, it is because the glibc library the installer needs is not present; a CentOS system set up with the "minimal install" option does not include it.
# ./nete-wa-6qmr5-cr035-rhas30-x86-64.bin -i console
Preparing to install…
Extracting the JRE from the installer archive…
Unpacking the JRE…
Extracting the installation resources from the installer archive…
Configuring the installer for this system’s environment…
strings: ‘/lib/libc.so.6’: No such file
Launching installer…
./nete-wa-6qmr5-cr035-rhas30-x86-64.bin: /tmp/install.dir.18984/Linux/resource/jre/bin/java: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory
./nete-wa-6qmr5-cr035-rhas30-x86-64.bin: line 2479: /tmp/install.dir.18984/Linux/resource/jre/bin/java: Success
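Fix: install yum and then run "yum install glibc". The missing /lib/ld-linux.so.2 in the output above is the 32-bit loader, so on a 64-bit system it is most likely the 32-bit glibc package (glibc.i686 on recent CentOS releases) that needs to be installed.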
– Cheers
Restrict access for tomcat application server from IP or hosts
To restrict access to a standalone Tomcat instance by IP address, add a Valve like this:
<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="127.0.0.1"/>
The above will restrict access to the surrounding Engine, Host, or Context element in TOMCAT_HOME/conf/server.xml. You may also specify a comma separated list of IP addresses instead of a single address. If you want to deny access to one or more IP addresses, you would do something like this:
<Valve className="org.apache.catalina.valves.RemoteAddrValve" deny="127.0.0.1"/>
To restrict by host name:
<Valve className="org.apache.catalina.valves.RemoteHostValve" allow="yahoo.com"/>
You use the same allow or deny attributes and the RemoteHostValve class instead of RemoteAddrValve.
Note that the values are matched as regular expressions, and from Tomcat 7 onward each attribute takes a single expression (use | to separate alternatives) rather than a comma separated list, so check the valve documentation for your Tomcat version.
How to catch 500 error from error logs in apache
A. Enable CGI for your Apache. Add the following:
1) LoadModule cgid_module modules/mod_cgid.so
2)
<Directory "/appl/apache2/cgi-bin">
AllowOverride None
Options ExecCGI
Order allow,deny
Allow from all
</Directory>
3)
ScriptAlias /cgi-bin/ "/appl/apache2/cgi-bin/"
AddHandler cgi-script .cgi
ErrorDocument 413 /cgi-bin/error.cgi
ErrorDocument 500 /cgi-bin/error.cgi
4) Restart Apache.
B. Set up the following Python script to catch these errors, send an email to the admin and show a custom message to users. Save it as
/appl/apache2/cgi-bin/error.cgi
chmod +x /appl/apache2/cgi-bin/error.cgi
#!/usr/bin/env python3
# Python 3 version of the handler. Apache sets REDIRECT_STATUS when it
# runs this script as an ErrorDocument.
import html
import os

SENDMAIL = "/usr/sbin/sendmail"  # sendmail location

print("Content-Type: text/html\n")
status = os.environ.get("REDIRECT_STATUS", "")
if status in ("413", "500"):
    # Build an HTML table of the request environment for the admin mail.
    stats = "<table border=1><tr><td>Variable</td><td>Value</td></tr>"
    for name, value in os.environ.items():
        stats += "<tr><td>%s</td><td>%s</td></tr>" % (html.escape(name), html.escape(value))
    stats += "</table>"
    p = os.popen("%s -t" % SENDMAIL, "w")
    p.write("From: %s\n" % "error-reporter@domain.com")
    p.write("To: %s\n" % "mail@domain.com")
    p.write("Content-Type: text/html\n")
    p.write("Subject: Error %s in accessing \n" % status)
    p.write("\n")  # blank line separating headers from body
    p.write(stats)
    p.close()
    # print("<H3><center>Inconvenience Regretted. Team has been notified of this issue</center></h3>")
    # JavaScript that expires every cookie whose name contains "MyLinks".
    cookieclearjs = """
<script language='JavaScript'>
var todate = new Date();
todate.setTime(todate.getTime() - 100000);  // a date in the past expires the cookie
var domain_Name_del = window.location.hostname;
var cookieList = document.cookie.split(';');
for (var i = 0; i < cookieList.length; i++)
{
  var name = cookieList[i].split('=')[0].trim();  // cookie name without its value
  if (name.indexOf("MyLinks") != -1)
  {
    document.cookie = name + '=; path=//APPLICATION/PATH; domain=.' + domain_Name_del + '; expires=' + todate.toGMTString();
  }
}
</script>
"""
    print(cookieclearjs)
    # Send the browser back to the original URL; fall back to "/" if it is not set.
    print("<script language='JavaScript'>window.location='%s'</script>" % os.environ.get("REDIRECT_SCRIPT_URI", "/"))
else:
    print("<H3><center>What you are looking for, is not here</center></h3>")
How to find the number of CPUs and cores, and whether the CPU uses hyper-threading (HT)
Finding Physical Processors
$ grep 'physical id' /proc/cpuinfo | sort | uniq | wc -l
Finding Virtual Processors
$ grep ^processor /proc/cpuinfo | wc -l
Finding CPU cores
$ grep 'cpu cores' /proc/cpuinfo
On a machine with two physical processors, "2" indicates that both are dual-core, resulting in 4 virtual processors.
If "1" is returned, the two physical processors are single-core.
If the processors are single-core, and the number of virtual processors is greater than the number of physical processors, the CPUs are using hyper-threading.
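The three grep commands can be combined into one small script. This is a sketch under a few assumptions: it expects the x86 style of /proc/cpuinfo with "physical id" and "cpu cores" lines (some virtual machines and other architectures omit them, in which case it falls back to 1), and it generalizes the single-core rule above to "more virtual processors than physical processors times cores per processor".

```python
def summarize_cpuinfo(text):
    """Count sockets, cores per socket and logical CPUs from /proc/cpuinfo text."""
    physical_ids = set()
    logical = 0
    cores_per_socket = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processors
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_ids.add(value)
        elif key == "cpu cores":
            cores_per_socket = int(value)
    physical = len(physical_ids) or 1
    cores = cores_per_socket or 1
    return {
        "physical": physical,
        "cores_per_socket": cores,
        "logical": logical,
        # More logical CPUs than real cores suggests hyper-threading.
        "hyperthreading": logical > physical * cores,
    }


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(summarize_cpuinfo(f.read()))
```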
Performance Tools and Tuning Tips for Java Technology-Based Server Applications on the Solaris OS
Introduction
This article presents a set of tools, system settings, and tuning tips for Java server applications that run on and scale across 2 to 64 CPU Sun Enterprise servers. This information was assembled by engineers with many years of experience tuning a variety of commercial server-side Java applications on Solaris.
Analysis Tools
The table below lists the performance analysis tools covered in this article. The tools are distinguished by software layer. In addition to performance issues, many of these tools can be used to detect other types of bottlenecks.
Many tool descriptions provide sample output, suggestions for interpreting output results, tips on improving output results, and links to related sites.
Solaris 8 Tools
mpstat
The mpstat utility is a useful tool to monitor CPU utilization, especially with multithreaded applications running on multiprocessor machines, which is a typical configuration for enterprise solutions.
Running mpstat with an interval argument of 5 to 10 seconds is fairly non-intrusive; larger intervals, such as 60 seconds, might be suitable for certain applications. Statistics are gathered for each clock tick.
An interval that is smaller than 5 or 10 seconds will be more difficult to analyze. A larger interval might provide a means of smoothing the data by removing spikes that could mislead you during analysis.
mpstat output
#mpstat 10
CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
  0    1   0 5529  442  302  419  166   12  196   0   775  95   5  0   0
  1    1   0  220  237  100  383  161   41   95   0   450  96   4  0   0
  4    0   0   27  192  100  178   94   38   44   0   100  99   1  0   0
  5    1   0  160  255  100  566  202   28  162   0  1286  87   8  0   5
  8    0   0  131  283  100  684  238   30  203   0  1396  81  11  0   8
  9    1   0  165  263  100  579  212   23  162   0  1260  86  10  0   4
 10    1   0  208  255  100  553  213   12  179   0  1430  88  11  0   1
 11    0   0  116  255  100  698  207   48  221   0  1310  76  14  0  10
 12    2   0  239  252  100  584  215    8  152   0  1529  90   8  0   2
 13    0   0  110  275  100  459  200   36  100   0   619  96   4  0   0
 14    1   0  145  263  100  583  218   18  165   0  1389  88   7  0   4
 15    1   0  165  254  100 1404  587   26  179   0  2117  82  11  0   7
 16    0   0  133  278  100  523  215   26  130   0  1068  93   6  0   2
 17    0   0   77  292  100  506  219   35  117   0   657  94   4  0   2
 18    1   0  235  257  100  655  218   25  185   0  1722  85   9  0   5
 19    1   0  193  255  100  576  212   14  164   0  1485  89   8  0   2
 20    0   0  363 5731 5686  727  177   62  532   0   423  36  46  0  18
 21    1   0  174  256  100  608  220   24  174   0  1444  85  10  0   5
 22    0   0  125  259  100  566  216   12  192   0  1645  85  11  0   4
 23    0   0   46  317  100  457  216   39   93   0   118  99   1  0   0
 24    0   0   47  298  100  406  198   48   76   0   123  98   2  0   0
 25    3   0  414  270  100  882  340    8  158   0  1736  91   8  0   0
 26    1   0  155  261  100  564  213   18  190   0  1330  87  11  0   2
 27    1   0  217  257  100  552  220    2  160   0  1699  91   8  0   0
 28    3   0  423  259  100  840  287   13  177   0  1683  88  10  0   2
 29    0   0  752 1218 1113  666  127   77  346   0   637  56  25  0  19
 30    0   0  103  294  100  468  211   31   98   0   552  96   4  0   0
 31    1   0  109  252  100  570  207   16  190   0  1501  86  10  0   4
What to look for
- Note the much higher intr and ithr values for CPU#20 and CPU#29. Solaris will select some CPUs to handle the system interrupts. Which CPUs and the number that are chosen depend on the I/O devices attached to the system, the physical location of those devices, and whether interrupts have been disabled on a CPU (psradm command).
- intr – interrupts.
- ithr – thread interrupts (not including the clock interrupts).
- csw – Voluntary context switches. When this number slowly increases, and the application is not IO bound, it may indicate mutex contention.
- icsw – Involuntary context switches. When this number increases past 500, the system is under a heavy load.
- smtx – If smtx increases sharply, for instance from 50 to 500, it is a sign of a system resource bottleneck (e.g., network or disk).
- usr, sys and idl – Together, all three columns represent CPU saturation. A well-tuned application under full load (0% idle) should fall within 80% to 90% usr, and 20% to 10% sys times, respectively. A smaller percentage value for sys reflects more time for user code and fewer preemptions, which result in greater throughput for a Java application.
Things to try
- Do not include CPU(s) handling interrupts in processor binds or processor sets. In the above example, CPU#20 and CPU#29 are handling interrupts. If you wanted to run 14 instances of your application, and you get the best performance for one instance from 2 CPUs, then it is reasonable to expect that creating 14 2-CPU processor sets would yield the best performance. The ideal solution would be to create 13 processor sets, which don't include the interrupt-handling CPUs, and bind 13 of the processes to the 13 processor sets. The last process would be started and allowed to run on the remaining CPUs. It is important to make available to your application as many CPUs as it can efficiently use.
- Do you see increasing csw? For a Java application, an increasing csw value will most likely have to do with network use. A common cause for a high csw value is the result of having created too many socket connections, either by not pooling connections or by handling new connections inefficiently. If this is the case you would also see a high TCP connection count when executing netstat -a | wc -l (refer to the netstat section).
- Do you see increasing icsw? A common cause of this is preemption, most likely because of an end of time slice on the CPU. For a Java application, this could be a sign that there is room for improvement in code optimization.
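Picking out the interrupt-handling CPUs by eye gets tedious on a 32-CPU box. The sketch below parses mpstat output with the 16 columns shown above and flags CPUs whose ithr is far above the median; the factor of 5 and the function names are arbitrary choices for illustration, not anything mpstat itself defines.

```python
import statistics

MPSTAT_COLUMNS = ["CPU", "minf", "mjf", "xcal", "intr", "ithr", "csw", "icsw",
                  "migr", "smtx", "srw", "syscl", "usr", "sys", "wt", "idl"]


def parse_mpstat(text):
    """Parse mpstat output into a list of dicts, skipping header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(MPSTAT_COLUMNS) or not fields[0].isdigit():
            continue  # header, prompt or blank line
        rows.append(dict(zip(MPSTAT_COLUMNS, map(int, fields))))
    return rows


def interrupt_cpus(rows, factor=5):
    """Return the CPU ids whose ithr is more than `factor` times the median."""
    median_ithr = statistics.median(row["ithr"] for row in rows)
    return [row["CPU"] for row in rows if row["ithr"] > factor * median_ithr]


if __name__ == "__main__":
    sample = """CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
 19 1 0 193 255 100 576 212 14 164 0 1485 89 8 0 2
 20 0 0 363 5731 5686 727 177 62 532 0 423 36 46 0 18
 21 1 0 174 256 100 608 220 24 174 0 1444 85 10 0 5
 29 0 0 752 1218 1113 666 127 77 346 0 637 56 25 0 19
 30 0 0 103 294 100 468 211 31 98 0 552 96 4 0 0"""
    print(interrupt_cpus(parse_mpstat(sample)))
```

The resulting list is the set of CPUs you would leave out when creating processor sets.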
iostat
The iostat tool gives statistics on the disk I/O subsystem. The iostat command has many options. More information can be found in the man pages. The following options provide information on locating I/O bottlenecks.
iostat Output
#iostat -xn 10
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 fd0
2.7 58.2 14.6 2507.0 0.0 1.4 0.0 23.0 0 52 d0
47.3 0.0 2465.6 0.0 0.0 0.4 0.0 8.8 0 30 d1
0.0 0.1 0.0 0.1 0.0 0.0 0.0 13.1 0 0 c0t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t9d0
0.1 58.2 0.1 801.9 0.0 1.5 0.0 25.7 0 29 c1t10d0
2.1 64.4 10.5 818.8 0.0 1.6 0.0 23.5 0 38 c1t11d0
0.5 71.7 4.0 887.1 0.0 1.6 0.0 21.8 0 41 c1t12d0
92.0 0.0 1242.5 0.0 0.0 0.7 0.0 8.1 0 24 c1t13d0
84.7 0.0 1223.1 0.0 0.0 0.7 0.0 8.4 0 22 c1t14d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 thirdeye:vold(pid268)
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 fd0
2.5 94.3 14.3 2372.5 0.0 4.0 0.0 41.8 0 85 d0
50.8 2.8 2000.3 22.4 0.0 0.7 0.0 13.8 0 29 d1
0.4 2.3 2.5 17.7 0.0 0.2 0.0 82.4 0 3 c0t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t9d0
0.0 62.6 0.0 736.0 0.0 1.6 0.0 25.2 0 46 c1t10d0
1.9 60.6 9.5 746.9 0.0 2.6 0.0 41.5 0 45 c1t11d0
0.6 80.0 4.8 888.8 0.0 2.6 0.0 32.6 0 65 c1t12d0
74.8 2.4 1014.2 19.2 0.0 0.9 0.0 11.4 0 22 c1t13d0
75.7 0.4 986.1 3.2 0.0 0.5 0.0 6.7 0 20 c1t14d0
What to look for
- %b – Percentage of time the disk is busy (transactions in progress). Average %b values over 25 could be a bottleneck.
- %w – Percentage of time there are transactions waiting for service (queue non-empty).
- asvc_t – Reports on average response time of active transactions, in milliseconds. It is mislabeled asvc_t; it is the time between a user process issuing a read and the read completing. Consistent values over 30ms could indicate a bottleneck.
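The two thresholds above (%b over 25, asvc_t over 30 ms) are simple to check mechanically. A rough sketch, assuming the 11-column "iostat -xn" layout shown in the sample output; the function names are hypothetical, and a real check should average several intervals rather than judge a single sample.

```python
IOSTAT_COLUMNS = ["r/s", "w/s", "kr/s", "kw/s", "wait", "actv",
                  "wsvc_t", "asvc_t", "%w", "%b", "device"]


def parse_iostat_xn(text):
    """Parse 'iostat -xn' output into dicts, skipping banner and header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(IOSTAT_COLUMNS) or fields[-1] == "device":
            continue
        try:
            numbers = [float(v) for v in fields[:-1]]
        except ValueError:
            continue  # not a data line
        rows.append(dict(zip(IOSTAT_COLUMNS, numbers + [fields[-1]])))
    return rows


def suspect_devices(rows, busy_pct=25.0, asvc_ms=30.0):
    """Return devices whose %b or asvc_t exceeds the given thresholds."""
    return [r["device"] for r in rows
            if r["%b"] > busy_pct or r["asvc_t"] > asvc_ms]
```

Fed the second interval of the sample output, this would flag d0 (85% busy), d1 and the c1t10d0 to c1t12d0 disks, plus c0t0d0 for its 82.4 ms service time.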
Things to try
- For a Java application, disk bottlenecks can often be addressed by using software caches. An example of a software cache would be a JDBC result set cache, or a generated pages cache. Disk reads and writes are slow; therefore, limiting disk access is a sure way to improve performance. Problems with too much disk access are often hidden when running on Solaris because of its own file system caches. Even with Solaris file system caches, using software caches to avoid file system and operating system overhead is recommended.
- Mount file systems with options. (Refer to the mount_ufs man page.) Several mount options may eliminate some disk load. Which options to try depends highly on the type of data. One possible option is noatime, which tells the ufs file system not to update the access time on files. This may reduce load on systems accessing read-only files or doing error logging.
# mount -F ufs -o noatime /<your_volume>
- Add more disks to the file system. If you are using a single disk file system, upgrading to a hardware or software RAID is the next logical step. Hardware RAID is significantly faster than software RAID and is highly suggested. A software RAID solution would add additional computational (CPU) load to the system.
- Change block size. Depending on storage hardware and application behavior, there may be a better block size to use besides the ufs default of 8192 bytes. Look at the man pages for mkfs and newfs to determine ways to change block size.
netstat
The netstat tool gives statistics on the network subsystem. It can be used to analyze many aspects of the network subsystem, two of which are the TCP/IP kernel module and the interface bandwidth. An overview of both uses is below.
netstat -I hme0 10
These netstat options are used to analyze interface bandwidth. The upper bound (max) of the current throughput can be calculated from the output. Only an upper bound can be given because netstat reports packet counts, and the packets are not necessarily of maximum size. The upper bound of the bandwidth can be calculated using the following equation:
Bandwidth Used = ((Total number of Packets) / (Polling Interval (10))) * MTU (1500 default)
The current MTU for an interface can be found with:
ifconfig -a
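As a worked example of the equation above, here is a small helper that turns a packet count into an upper bound in bytes per second. The function name is made up, and the packet counts in the usage lines are illustrative input and output totals for one 10-second interval.

```python
def bandwidth_upper_bound(packets, interval_s=10, mtu_bytes=1500):
    """Upper bound on throughput in bytes per second for one polling interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    # Every packet is assumed to be a full MTU in size, hence "upper bound".
    return packets * mtu_bytes / interval_s


if __name__ == "__main__":
    packets = 84144 + 107695  # input plus output packets in one interval
    bytes_per_s = bandwidth_upper_bound(packets)
    print("%.0f bytes/s, about %.1f Mbit/s" % (bytes_per_s, bytes_per_s * 8 / 1e6))
```

Real traffic contains many packets smaller than the MTU (ACKs, for instance), so the actual figure is lower, sometimes much lower.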
netstat -I hme0 10 Output
#netstat -I hme0 10
    input   hme0      output              input  (Total)    output
packets    errs  packets    errs  colls   packets    errs  packets    errs  colls
122004816  272   159722061  0     0       348585818  2582  440541305  2     2
0          0     0          0     0       84144      0     107695     0     0
0          0     0          0     0       96144      0     123734     0     0
0          0     0          0     0       89373      0     114906     0     0
0          0     0          0     0       84568      0     108759     0     0
0          0     0          0     0       84720      0     108800     0     0
0          0     0          0     0       87911      0     112803     0     0
0          0     0          0     0       99046      0     126866     0     0
0          0     0          0     0       105500     0     134260     0     0
0          0     0          0     0       96404      0     123158     0     0
0          0     0          0     0       86732      0     111010     0     0
0          0     0          0     0       87753      0     112309     0     0
0          0     0          0     0       88752      0     114405     0     0
0          0     0          0     0       96240      0     123425     0     0
0          0     0          0     0       107527     0     136866     0     0
0          0     0          0     0       100686     0     128385     0     0
0          0     0          0     0       92745      0     118790     0     0
0          0     0          0     0       95187      0     122041     0     0
0          0     0          0     0       95105      0     122998     0     0
0          0     0          0     0       104498     0     134284     0     0
0          0     0          0     0       113289     0     144882     0     0
0          0     0          0     0       103227     0     132159     0     0
0          0     0          0     0       98239      0     125220     0     0
What to look for
- colls – collisions. If your network is not switched, then a low level of collisions is expected. As the network becomes increasingly saturated, collisions will increase and eventually will become a bottleneck. The best solution for collisions is a switched network.
- errs – errors. The presence of errors could indicate device errors. If your network is switched, errors indicate that you are nearly consuming the bandwidth capacity of your network. The solution to this problem is to give the system more bandwidth, which can be achieved through more network interfaces or a network bandwidth upgrade. This is highly dependent on your particular network architecture.
Things to try
- For a Java application, network saturation is difficult to address other than by increasing bandwidth. If network saturation occurs early (for example, at fewer than 8 CPUs for an application server running on 100 Mbit Ethernet), then checking that the application uses the network conservatively is a good first step.
- Increase network bandwidth. If your network is not switched, the best step to take is to upgrade to a switched network. If your network is switched, first check if more network interfaces are a possible solution, otherwise upgrade to a higher bandwidth network.
netstat -sP tcp
These netstat options are used to analyze the TCP kernel module. Many of the fields reported represent fields in the kernel module that indicate bottlenecks. These bottlenecks can be addressed using the ndd command and the tuning parameters referenced in the /etc/rc2.d/S69inet section.
netstat -sP tcp Output
#netstat -sP tcp
TCP tcpRtoAlgorithm = 4 tcpRtoMin = 400
tcpRtoMax = 60000 tcpMaxConn = -1
tcpActiveOpens = 34773 tcpPassiveOpens = 9015
tcpAttemptFails = 110 tcpEstabResets = 145
tcpCurrEstab = 106 tcpOutSegs =2338097
tcpOutDataSegs =1363583 tcpOutDataBytes =730037068
tcpRetransSegs = 531 tcpRetransBytes =139481
tcpOutAck =974222 tcpOutAckDelayed =388421
tcpOutUrg = 0 tcpOutWinUpdate = 96
tcpOutWinProbe = 53 tcpOutControl = 87975
tcpOutRsts = 666 tcpOutFastRetrans = 47
tcpInSegs =2302712
tcpInAckSegs =1148145 tcpInAckBytes =729808007
tcpInDupAck = 76300 tcpInAckUnsent = 0
tcpInInorderSegs =1828170 tcpInInorderBytes =995767266
tcpInUnorderSegs = 15155 tcpInUnorderBytes =113298
tcpInDupSegs = 1144 tcpInDupBytes =132520
tcpInPartDupSegs = 1 tcpInPartDupBytes = 416
tcpInPastWinSegs = 0 tcpInPastWinBytes = 0
tcpInWinProbe = 46 tcpInWinUpdate = 48
tcpInClosed = 251 tcpRttNoUpdate = 344
tcpRttUpdate =1105386 tcpTimRetrans = 989
tcpTimRetransDrop = 5 tcpTimKeepalive = 818
tcpTimKeepaliveProbe= 183 tcpTimKeepaliveDrop = 0
tcpListenDrop = 0 tcpListenDropQ0 = 0
tcpHalfOpenDrop = 0 tcpOutSackRetrans = 56
What to look for
- tcpListenDrop – If, after several looks at the command output, tcpListenDrop continues to increase, it could indicate a problem with queue size.
Things to try
- Increase Java application thread count. A possible cause of increasing tcpListenDrop is the application throughput being bottlenecked by the number of executing threads. At this point increasing application threads may be a good thing to try.
- Increase queue size. Increase the request queue sizes using ndd. More information on other ndd commands is referenced in the /etc/rc2.d/S69inet section.
- ndd -set /dev/tcp tcp_conn_req_max_q 1024
- ndd -set /dev/tcp tcp_conn_req_max_q0 4096
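Watching tcpListenDrop over several runs can be done by hand, but a small helper makes the comparison less error-prone. The sketch below assumes the "name = value" layout shown in the netstat -sP tcp output above. The class and method names are hypothetical. It parses two captured outputs and reports how much a counter grew between them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: parse "name = value" counters from captured netstat -sP tcp text.
// TcpStatsParser is an illustrative helper, not part of Solaris or the JDK.
class TcpStatsParser {

    // Matches pairs such as "tcpListenDrop = 0" and "tcpOutSegs =2338097".
    private static final Pattern PAIR = Pattern.compile("(\\w+)\\s*=\\s*(-?\\d+)");

    /** Returns the counters in the order in which they appear in the text. */
    static Map<String, Long> parse(String netstatOutput) {
        Map<String, Long> counters = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(netstatOutput);
        while (m.find()) {
            counters.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return counters;
    }

    /** Growth of one counter between two samples; a missing counter counts as 0. */
    static long delta(Map<String, Long> earlier, Map<String, Long> later, String name) {
        return later.getOrDefault(name, 0L) - earlier.getOrDefault(name, 0L);
    }
}
```

A positive delta for tcpListenDrop across several consecutive samples is the pattern described under "What to look for".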
netstat -a | grep <your_hostname> | wc -l
Running this command gives a rough count of socket connections on the system. There is a limit of how many connections can be open at one time; therefore, it is a good tool to use when looking for bottlenecks.
netstat -a | grep <your_hostname> | wc -l Output
#netstat -a | wc -l
34567
What to look for
- socket count – If the number returned is greater than 20,000 then the number of socket connections could be a possible bottleneck.
Things to try
- For a Java application, a common cause of too many sockets is inefficient use of sockets. It is common practice in Java applications to create a socket connection each time a request is made. Creating and destroying socket connections is not only expensive, but can cause unnecessary system overhead by creating too many sockets. Creating a connection pool may be a good solution to investigate. For an example of connection pool use, refer to Advanced Programming for the Java 2 Platform, Chapter 8.
- Decrease the port number at which anonymous (ephemeral) socket connections start.
- ndd -set /dev/tcp tcp_smallest_anon_port 1024
- Decrease the time a TCP connection stays in TIME_WAIT.
- ndd -set /dev/tcp tcp_time_wait_interval 60000
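As a rough illustration of the connection pool idea mentioned above, the sketch below keeps a fixed set of reusable objects in a blocking queue instead of creating and destroying one per request. It is a minimal sketch, not the implementation from the referenced book. SimplePool is a hypothetical name, and the pooled type is left generic, so a real socket or JDBC connection factory would have to be supplied by the application.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Sketch of a fixed-size pool of reusable resources (for example, connections).
// SimplePool is an illustrative class; it does no validation or reconnection.
class SimplePool<T> {

    private final BlockingQueue<T> idle;

    /** Creates all pooled objects up front using the supplied factory. */
    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Blocks until a pooled object is free, then hands it to the caller. */
    T borrow() throws InterruptedException {
        return idle.take();
    }

    /** Returns an object to the pool so that another caller can reuse it. */
    void release(T resource) {
        idle.offer(resource);
    }

    /** Number of objects currently not in use. */
    int available() {
        return idle.size();
    }
}
```

A caller would typically borrow inside a try block and release in the matching finally block, so that an exception does not leak a pooled object.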
verbose:gc
The java -verbose:gc option is a great tool for quickly diagnosing garbage collection (GC) bottlenecks. Calculate the total time spent in GC by adding up the times reported in the -verbose:gc output. If the fraction (time in GC)/(elapsed time) is greater than 0.2, then GC is most likely a problem. If this fraction is less than 0.2, then GC is probably not the issue. For more detailed information about JVM garbage collection, see Tuning Garbage Collection with the 1.3.1 Java Virtual Machine.
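The arithmetic can be automated. The sketch below assumes GC log lines of the form "[GC 325407K->83000K(776768K), 0.2300771 secs]", which is how JVMs of that era printed -verbose:gc output. Newer JVMs use different formats, so the pattern would need adjusting. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: sum the pause times in -verbose:gc output and apply the 0.2 rule.
// GcFraction is an illustrative helper; the log format is an assumption.
class GcFraction {

    // Captures the seconds value in ", 0.2300771 secs]".
    private static final Pattern PAUSE = Pattern.compile(",\\s*(\\d+(?:\\.\\d+)?) secs\\]");

    /** Adds up every pause time found in the captured log text. */
    static double totalGcSeconds(String gcLog) {
        double total = 0.0;
        Matcher m = PAUSE.matcher(gcLog);
        while (m.find()) {
            total += Double.parseDouble(m.group(1));
        }
        return total;
    }

    /** True when (time in GC) / (elapsed time) exceeds the 0.2 threshold. */
    static boolean gcIsLikelyBottleneck(double gcSeconds, double elapsedSeconds) {
        return gcSeconds / elapsedSeconds > 0.2;
    }
}
```

The elapsed time has to come from elsewhere, for example from wall-clock timestamps taken at the start and end of the measured run.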
Java Application
Tnf traces
This is a great tool for both profiling and debugging a Java Application. On a Solaris system refer to the Manual pages for tracing, TNF_PROBE, tnfdump, tnfmerge and prex. This will help to get an overall understanding of inserting the probes in the source code. The manual pages have been written with C/C++ sources in view.
Here are the steps to take for a Java source:
Step 1: Insert the probes as shown in the short example below.
import java.io.*;
import java.util.*;

class probedObject {
    // Native methods implemented in libjavaProbe.so; each one fires a TNF probe.
    public native void objectCreateStart();
    public native void objectCreateEnd();

    static {
        System.loadLibrary("javaProbe");
    }
}

class Main {
    public static void main(String[] arg) throws Throwable {
        probedObject obj = new probedObject();
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 1000; i++) {
            // Bracket each object creation with a start probe and an end probe.
            obj.objectCreateStart();
            obj = new probedObject();
            obj.objectCreateEnd();
        }
        // Elapsed time in milliseconds for the probed loop.
        System.out.println(System.currentTimeMillis() - startTime);
    }
}
Step 2: Compile Main.java
#javac Main.java
Step 3: Generate .h file
Step 2 produces a class file called probedObject.class (along with Main.class). Use this class to generate the .h file using JNI as follows:
#javah -jni probedObject
Step 4: Write the C routine javaProbe.c
#include <jni.h>
#include "probedObject.h"
#include <tnf/probe.h>

JNIEXPORT void JNICALL Java_probedObject_objectCreateStart(JNIEnv *env, jobject obj) {
    TNF_PROBE_0(object_create_start, "object creation", "");
}

JNIEXPORT void JNICALL Java_probedObject_objectCreateEnd(JNIEnv *env, jobject obj) {
    TNF_PROBE_0(object_create_end, "object creation", "");
}
Step 5: Generate the shared library
#cc -G -I/usr/java/include -I/usr/java/include/solaris javaProbe.c -o libjavaProbe.so
Step 6: Run the program under prex.
Please note that prex has a circular buffer as mentioned in the man pages for prex. Use the -o and -s options for prex, as needed.
darwin 69 =>prex java Main
Target process stopped
Type "continue" to resume the target, "help" for help ...
prex> enable $all
prex> continue
Target process exec'd
Step 7: Use tnfdump on the output trace file to get the ASCII output, or use tnfmerge to merge trace files. For information on TNF (Trace Normal Form), including TNFView and tnfmerge, refer to Performance Profiling Using TNF.
JVMPI
The JVMPI (Java Virtual Machine Profiler Interface) is a two-way function call interface between the Java virtual machine and an in-process profiler agent. On one hand, the virtual machine notifies the profiler agent of various events, corresponding to, for example, heap allocation, thread start, etc. On the other hand, the profiler agent issues controls and requests for more information through the JVMPI. For example, the profiler agent can turn on/off a specific event notification based on the needs of the profiler front-end. A detailed overview of JVMPI can be found at Java Virtual Machine Profiler Interface (JVMPI).
Commercial Profiling Tools
Commercial and public source profiling tools are mentioned here. All of them use the JVMPI.
Tuning Parameters
Solaris 8 Tuning Parameters
Below are the Solaris 8 and JVM tuning parameters found to work best with server-side Java applications. The tuning parameters are listed with a brief description. A more in-depth look at when to use these parameters is discussed in the Analysis Tools and Tuning Process sections.
/etc/system
The table below is a list of /etc/system tuning parameters used during the performance study. The changes are applied by appending each to the /etc/system file and rebooting the system.
A description of all /etc/system parameters can be found in the Solaris Tunable Parameters Reference Manual.
/etc/rc2.d/S69inet
Below is a list of TCP kernel tuning parameters. These are known TCP tuning parameters for high throughput Java servers. The parameters can be applied by executing each line individually with root privileges, or appending each to the /etc/rc2.d/S69inet file and rebooting the system.
A detailed description of each of these parameters can be found in the Solaris Tunable Parameters Reference Manual.
Java Application Tuning Parameters
Brief suggestions for basic Java server applications are listed below.
Number of Execution Threads
A general rule for thread count is to use as few threads as possible. The JVM performs best with the fewest busy threads. A good starting point for thread count can be found with the following equations.
(Number of Java Execution Threads) = Number of Transactions / Time(in seconds)
or
(Number of Execution Threads)=Throughput(transactions/sec)
It is important to remember that these equations give a good starting point for thread count tuning, not the best thread count for your application. The number of execution threads can greatly influence performance; therefore, sizing this value properly is very important.
Number of Database Connections
The number of database connections, commonly known as a connection or resource pool, is closely tied to the number of execution threads. A rule of thumb is to match the number of database connections to the number of execution threads. This is a good starting point for finding the correct number of database connections. Over-configuring this value could cause unnecessary overhead to the database, while under-configuring could tie up all execution threads waiting on database I/O.
(Number of Database Connections) = (Number of Execution Threads)
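The two starting-point rules above amount to a small calculation, sketched here in Java. The class and method names are hypothetical, and rounding up and using at least one thread are assumptions of this sketch that the equations themselves do not spell out.

```java
// Sketch: starting values for execution threads and database connections.
// PoolSizing is an illustrative helper; treat its results as a first guess only.
class PoolSizing {

    /** Threads = transactions / seconds, rounded up, never less than 1. */
    static int startingThreadCount(long transactions, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return (int) Math.max(1.0, Math.ceil(transactions / seconds));
    }

    /** Rule of thumb: one database connection per execution thread. */
    static int startingDbConnections(int executionThreads) {
        return executionThreads;
    }
}
```

For instance, 3000 transactions measured over 60 seconds would suggest starting with 50 threads and 50 database connections, and tuning from there.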
Software Caches
Many server-side Java applications implement some type of software cache, commonly for JDBC result sets, or commonly generated, dynamic pages. Software caches are the most likely part of an application to cause unnecessary garbage collection overhead resulting from the software cache architecture and the replacement policy of the cache.
Most middle tier applications will have some sort of caching. These caches should be studied with GC in mind to see if they result in greater GC. Choose the architecture and replacement strategy that has lower GC. Careful implementation of caches with garbage collection in mind greatly improves performance simply by limiting garbage.
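One way to keep a software cache from growing without limit is to bound its size and evict the least recently used entry. The sketch below does this with the standard LinkedHashMap hook. LruCache is a hypothetical name, the class is not thread-safe as written, and whether an LRU policy actually yields less GC than the alternatives for a given workload has to be measured.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a size-bounded cache with least-recently-used replacement.
// LruCache is an illustrative class; wrap it or synchronize for concurrent use.
class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    LruCache(int maxEntries) {
        // The third argument (true) orders entries by access rather than insertion.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each put; evicts once the bound is exceeded. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

Capping the entry count limits how many long-lived objects the cache can hold, which is the kind of property worth checking against -verbose:gc output when comparing cache designs.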
Java Virtual Machine Tuning Parameters
Below are a few Java Virtual Machine tuning parameters that have been found to improve performance. There are many more tuning parameters; the following are examples of what has worked for us. A detailed list of all tuning parameters can be found in Java HotSpot VM Options.
| Java VM Option | Description |
| -XX:+UseLWPSynchronization | Use LWP-based instead of thread-based synchronization (SPARC only). |
| -XX:SurvivorRatio=40 | Ratio of eden/survivor space size [Solaris: 64, Linux/Windows: 8]. |
| -XX:NewSize=128m | Disable young generation resizing. To do this on HotSpot, simply set the size of the young generation to a constant. |
| -Xms512m | Initial size of the heap. |
REF: http://developers.sun.com/solaris/articles/performance_tools.html