A Wider View: Experimentations and ruminations on Oracle performance analysis

Subscribe to A Wider View: Experimentations and ruminations on Oracle performance analysis feed
Oracle Database performance tuning and analysis experimentation and ruminations.Craig Shallahamer, President/Founder, OraPubhttp://www.blogger.com/profile/04109635337570098781noreply@blogger.comBlogger95125
Updated: 2 weeks 6 days ago

Four Options For Oracle DBA Tuning Training

Mon, 2014-07-28 10:00
This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.Four Options For Oracle DBA Tuning Training
Oracle DBAs are constantly solving problems... mysteries. That requires a constant knowledge increase. I received more personal emails from my Oracle DBA Training Options Are Changing posting than ever before. Many of these were from frustrated, angry, and "stuck" DBAs. But in some way, almost all asked the question, "What should I do?"

In response to the "What should I do?" question, I came up with four types of Oracle DBA performance tuning training that are available today. Here they are:

Instructor Led Training (ILT) 
Instructor Led Training (ILT) is the best because you have a personal connection with the teacher. I can't speak for other companies, but I strive to connect with every student and every student knows they can personally email or call me...even years after the training. In fact, I practically beg them to do what we do in class on their production systems and send me the results so I can continue helping them. To me being a great teacher is more than being a great communicator. It's about connection. ILT makes connecting with students easy.

Content Aggregators
Content Aggregators are the folks who pull together free content from various sources, organize and display it. Oh yeah... and they profit from it. Sometimes the content value is high, sometimes not. I tend to think of content aggregators like patent trolls, yet many times they can be a great resource. The problem is you're not dealing with the creator of the content. However, the creator of the content actually knows the subject matter. You can somtimes contact them...as I encourage my students and readers to do.

Content Creators
Content Creators are the folks who create content based on their experiences. We receive that content through their blogs, videos, conference presentations and sometimes through their training. I am a content creator but with an original, almost child-like curiosity, performance research twist. Content creators rarely directly profit from their posted content, but somehow try to transform it into a revenue stream. I can personally attest, it can be a risky financial strategy...but it's personally very rewarding. Since I love do research, it's easy and enjoyable to post my findings so others may benefit.

Online Training (OLT)
Online Training (OLT) is something I have put off for years. The online Oracle training I have seen is mostly complete and total crap. The content is usually technically low and mechanical. The production quality is something a six year old can do on their PC. The teaching quality is ridiculous and the experience puts you to sleep. I do not ever want to be associated with that kind of crowd.

I was determined to do something different. It had to be the highest quality. I have invested thousands of dollars in time, labor, and equipment to make online video training
Craig teaching in an OraPub Online Institute Seminarwork. Based on the encouraging feedback I receive it's working!

This totally caught me by surprise. I have discovered that I can do things through special effects and a highly organized delivery that is impossible to do in a classroom. (Just watch my seminar introductions on YouTube and you'll quickly see what I mean.) This makes the content rich and highly compressed. One hour of OraPub Online Institute training is easily equivalent to two to four hours of classroom training. Easily. I have also strive to keep the price super low, the production at a professional level and ensure the video can be streamed anywhere in the world and on any device. Online training is an option, but you have to search for it.

So there you have it. Because of economics and the devaluation of DBAs as human beings coupled with new technologies, the Oracle DBA still has at least four main sources of training and knowledge expansion. Don't give up learning!

Some of you reading may be surprised that I'm writing about this topic because it will hurt my traditional instructor led training (public or on-site) classes. I don't think so. If people can attend my classes in person, they will. Otherwise, I hope they will register for an OraPub Online Institute seminar. Or, at least subscribe to my blog (see upper left of page).

All the best in your quest to do great work,

Categories: DBA Blogs

Oracle DBA Training Options Are Changing

Tue, 2014-07-15 08:43
This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.Oracle DBA Training Options Are Changing
Training options for Oracle Database DBAs are changing. Generally, I don't think they are for the better. Companies don't value Oracle Database Administrators like they used to. And, it shows in the lack of their professional development investment.

When I travel a long way from home, I tend to get very reflective about life, death and beyond. On my way home from teaching an onsite two-day Oracle performance tuning seminar coupled with a one-day predictive analysis (forecasting) class in Ghana (yes, Ghana as in AFRICA) I started thinking about how fortunate the Ghana DBAs I taught are. Clearly their management is willing to invest in their DBAs' future. This is very, very rare.

Today most Oracle DBAs receive what I call, "Training By Google." You know, blog posts, YouTube videos and various syntax websites. While these are all valuable (I am a content creator myself with my blog posts and videos), they are no substitute for instructor led training. Not even close! So what is happening that is forcing Oracle DBAs to change their training habits?

So Why The Change? Three Reasons
1. Training Budget. Over the past five years I have been disappointed (more like disturbed) that most companies do not provide the training DBAs need. They just won't do it. IT managers (not typically DBA managers) believe their staff can get by with "Training By Google." It's stupid and foolish. It tells DBAs they are worthless and leaves them unprepared to perform at their best. And, of course, that ends up hurting the company they work for. Stupid and foolish.

Are we then surprised with the results from poor performing systems, down production systems, massive security breaches, and DBAs hopping from one company to another?

2. Travel Budget. A nasty tactic many companies use is to provide a minimal training budget but without a travel budget. If you want specialized and advanced training, you'll probably have to travel to get to it. Maybe not hundreds or thousands of miles, but probably more than you want to commute each day.

Essentially the company is splitting the training cost with the DBA and ensuring the DBA really, really wants the training. OK, I can respect that. But, I think a company that does not truly provide training for its employees (human beings that spend a significant portion of their lives doing whatever it takes to get the job done) is cruel and frankly immoral.

3. More Training Options. The good news for Oracle DBAs is there is more information and training options available today than ever before. When the orapub.com website began in 1995, doing a "tail -f" on the web log was a lesson in world geography. It was amazing watching line after line stream by as DBAs from all over the world were looking for Oracle performance materials through the web. Now there is much more available. Training options for Oracle DBAs now include traditional instructor led training (ILT), web sites from content aggregators (people who pull together content for us), content creators (like myself), and online training. I'm very excited about online training and have made a significant investment in OraPub's Online Institute.

So there you have it. Because of economics, the devaluation of DBAs as human beings and the increase in training options, the Oracle DBA training landscape is changing. If you believe this, the next question is, "What is good content?" That will be the subject of my next posting!

Enjoy your work and thanks for reading!

Categories: DBA Blogs

How The Oracle Database Determines Wait Time When It's Not Set

Mon, 2014-06-30 07:41
This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.Have you ever wondered how the Oracle Database 12c (and earlier versions) determines the wait time when it has absolutely no control over how long the wait will take? If so, then read on!

The Back Story
Using wait time is part of an Oracle Time Based Analysis (OTBA). While Oracle process CPU consumption is a big part of the analysis, the other category is non-idle wait time.

You can see the two categories of time clearly "in action" with my Real-Time Session Sampler script, rss.sql. It's part of my OraPub System Monitor (OSM) toolkit, that can be downloaded for free. Here's an example of the output:

I need to  explain a little more about CPU time and then Oracle wait time.

An Oracle processes wants to burn CPU. Without consuming CPU an Oracle process, can well...not process work! Oracle keeps track of the CPU consumption time. It's the actual CPU time consumed, for example, 500 ms or 3 seconds.

When an Oracle process can not burn CPU, the process, in Oracle terms, it must wait. For example, when an Oracle process waits, it's like it yells out details about why it's waiting. We call this yell a wait event. Each wait event has a name. And the name provides clues about why the process can't burn CPU and is therefore waiting.

There Are Different Reasons Why An Oracle Process Waits
There are three broad categories Oracle must time, when a process is not consuming CPU:
  1. When the wait time is predetermined and not interruptible. Perhaps when a latch can not be acquired through repeated attempts so the process takes a break (i.e., sleeps) for a fix period of time, say 10ms.
  2. When the wait time is predetermined but the process can be woken by another process. Perhaps a log writer is in the middle of its three second sleep and then a server process commits. The log writer will be signaled to wake and do some work.
  3. When Oracle has no idea how long the wait may last. Perhaps a process submitted a block to the IO subsystem for a synchronous read. Oracle has no idea how long this may take. This situation is what I'm focusing on in this posting and I show in the video below.
Exploring When Oracle Has No Idea How Long The Wait Will Take
Let's say when I work it's like an Oracle process consuming CPU. And if I have to stop working to drive to a meeting, it's like an Oracle process waiting. And in this situation,  I really don't know how long the wait will take. It's out of my control. There could be an accident along the way (think: table level lock)! An Oracle process can experience this same kind of thing.

If I'm an Oracle server process and I discover a block I need is not in the buffer cache, I'm going to need to make a call to the OS for that block. When I make the synchronous IO call, I really do not know how long it will take and I have to wait, that is, the Oracle process must wait.

From a DBA perspective, when I perform an Oracle Time Based Analysis (OTBA) I need know how long the IO, that is, the wait took. How does Oracle figure this out? It's pretty simple actually. It's basiclly like this:
  • Get the current time, start time
  • Make the synchronous IO call and wait until IO received
  • Get the current time, end time
  • Calculate the delta, end time - start time
The "delta" is the wait time for the single block read. If the single block read is a synchronous read and then placed into the buffer cache, Oracle will tag or name the wait time calling it a "db file sequential read".

Actually Watching An Oracle Process Figuring Out the Wait Time
Some say that seeing is believing. For sure it helps one to learn quickly. If you want to see with your own eyes an Oracle server process determine wait time for a multiple block synchronous read (event name is, db file scattered read), watch the below video.

Pretty cool, eh? By the way, the processing of timing processes and events has a special name, called instrumentation. As we can see, Oracle has instrumented its kernel code.

Enjoy your work and thanks for reading!

Categories: DBA Blogs

The LGWR Three Second Rule. Really?

Sun, 2014-06-08 09:39
This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.Does the Oracle Database 12c Log Writer Really Sleep For Three Seconds?I have learned the part of Oracle Database performance tuning is checking if what you have been taught is actually true. What I'm writing about today has been done before in Oracle Database 10g and 11g, but I wanted to document this using 12c.

When I was first learning about the Oracle Database process architecture, the instructor said there are a number of reasons the log writer (LGWR) background process will wake up and do "something." In addition to this, if the LGWR was not signaled to wake up, every three seconds it would wake up by itself and check if there was something it needed to do.

Is this "three second rule" really true? That's what this posting is all about.

If you don't want to read the details below, I created a short video for you to watch.

Here is what you just saw in static text. First let's get the LGWR process PID. Here's one way to do this.
$ ps -eaf | grep lgwr
oracle 41977 1 0 May16 ? 00:00:50 ora_lgwr_prod35
oracle 46294 46110 0 07:39 pts/0 00:00:00 grep lgwr
oracle 60264 1 0 Jan13 ? 00:14:14 ora_lgwr_prod30
My Oracle Database 12c instance is prod35, so the process I'm interested in 41977.

Note: Since this is a 12c instance, there are also two additional log writers, lg00 and lg01. But that's a topic for another posting!

At this point, for simplicity sake it's important there be little instance activity. It will make it much simpler to find what we are looking for... the three second sleep

To watch the LGWR process, I'm going to use the Linux strace command. To easily see the system call timing, I'm going to use the -r option. I'm also going to redirect the output, using the -o option to the out.txt file. I'll let the below strace run for about 15 seconds. OK... here we go:
$ strace -rp 41977 -o out.txt
Process 41977 attached - interrupt to quit
^CProcess 41977 detached
To break out of the strace, I simply did a control-c and the result is what you see above. Let's look at the first three lines of the out.txt file so I can tell you what we will be looking for.
$ head -3 out.txt
0.000000 clock_gettime(CLOCK_MONOTONIC, {19940373, 720653971}) = 0
0.000186 gettimeofday({1401979552, 853214}, NULL) = 0
0.000207 gettimeofday({1401979552, 853444}, NULL) = 0
With the -r option, the first column (far left) shows how long the system call took the for call directly above. For example, in the snipped above, the first clock_gettime call took 0.000186 seconds. The first gettimeofday call took 0.000207 seconds. Got it? Read on!

You may wonder why the system call time is the next line down? It actually makes since, because strace displays information as it becomes available. When the call starts obviously it does not know the call duration, so it just displays the call itself. When the call completes and therefore has the time, it shows return details (e.g., "= 0"), line feeds, prints the call time, the next call is displayed, and repeat. OK, back to our objective.

What we are looking for is a three second sleep. In other words, we are looking for a call that takes three seconds. And since a call's duration shown in the first column of the next line down, we want to look for values in the first column that start with 3.0.

Below is an example of what I found in the out.txt file.
0.000033 clock_gettime(CLOCK_MONOTONIC, {19940376, 746821505}) = 0
0.000033 semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) = -1 EAGAIN
                   (Resource temporarily unavailable)
3.000436 clock_gettime(CLOCK_MONOTONIC, {19940379, 747351774}) = 0
0.000188 gettimeofday({1401979558, 879922}, NULL) = 0
0.000171 clock_gettime(CLOCK_MONOTONIC, {19940379, 747708976}) = 0
Looking at the above output snippet, notice the time of 3.000436. Then look at the call directly above it. It's the semtimedop call. This call can be used to put a process to sleep, but the process can be woken up. In other words, the LGWR can set an alarm for three seconds, go to sleep, and if it is not disturbed, it will wake up in three seconds. Shown above is an example of the LGWR setting the alarm for three seconds, not being woken up until the alarm fired. This is an example of the three second rule!

If you do what I did and look at your output file, you'll see lots of these "three seconds." If you do this on a pretty-much idle Oracle Database 12c instance's LG01 process (one of the two default LGWR children), it may contain mostly semtimedop calls, each with a three second sleep.

As you might have guessed, the sleep time is part of the system call. Look closely at the semtimedop call below:
semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0})
Notice the "{3, 0}" near the far right? The defines the maximum delay, that is, the sleep. The 3 is the seconds and the 0 is the number of nanoseconds. If you're interested in other sleep options and the actual code to make this work, check out this posting.

Oracle could have used the select system call (not "select * from...") but that would have been a bad idea because then the process could not be woken before the three seconds limit. This topic touches very closely on Oracle Time Based Analysis and more specifically about Oracle's wait interface. I really get into this, including sample C code in my online video seminar entitled, Introduction To Oracle Time Based Analysis.

So there you go! The Oracle kernel code developers set the default sleep time to three seconds along with the ability for the process to be awoken, should the need arise.

Give It A Try
Check this out for yourself on a test or QA system. Once you see it for yourself, you'll never forget about it!

But it gets even better! If you really want to dig into this, do the same but on your database writer (DBWR). Does the DBWR have a three second rule? Does Oracle use the same system call? I'm not going to tell you the answer. You'll have to figure that out for yourself!

Always A Three Second Delay?
You may have noticed that sometimes the process does not sleep the entire three seconds and at other times Oracle sets the sleep time less than three seconds. What up with that? That will be my next posting!

Thanks for reading,

Categories: DBA Blogs

It's All About CPU But There Is NO CPU Bottleneck! What?

Thu, 2014-05-29 17:00
This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.It's All About CPU But There Is NO CPU Bottleneck...
Diagnosing Oracle Database performance is like solving a puzzle. But what I really enjoy is coming up with performance solutions that are anti-intuitive for most people. You know, the ones when you can see people stop talking and actually think, "Why would Craig say that!?" In this posting, I delve into one of these situations.

The Situation
Just over a month ago I received an email from a former Australian Oracle Performance Firefighting student about their "CPU bottleneck." Based on an one-hour AWR report's Time Model Statistics section (which is based on v$sys_time_model view), the  DB CPU was 95% of DB Time, leaving 5% for Oracle wait time. The top CPU consuming SQL statements were easily identified and work had begun with tuning them. By "tuning" I mean a small group of DBAs were working to reduce their CPU consumption. But something didn't feel right to him, so he emailed me about it.

My first question was, "Is there an OS CPU bottleneck?" That is, is the host (or virtual machine if you like) out of CPU resources? I was wondering if the average CPU utilization was 80% or higher. Why was I asking this? Read on...

DB Time Math
Over an interval of time, DB Time is all the Oracle server process CPU time (i.e., DB CPU) plus all the non-idle wait time.

Note: Oracle does not include background process CPU in DB Time. Why they don't include background process CPU time is an entirely different subject. And of course, we are trusting Oracle that what they consider "non-idle" wait time should actually be "non-idle" wait time. Again, that's an entirely different subject.

My former student knew the DB Time math and therefore understood that since DB CPU was pretty close to DB Time, there was little non-idle wait time. It was "all about CPU." While I still wanted to know the wait event situation because it provides performance hindering clues and forces the DBA to review their work and think a little more.

OS CPU Utilization Math
In my Utilization On Steroids online video seminar, I demonstrate utilization math by shrinking myself down to the size of a water glass and also the size of a printer cartridge. (Like the motto says, "What Online Training Should Be!") In a nutshell, utilization can be viewed/understood as requirements divided by capacity. If a 500ml glass contains 250ml of water, the glass is 50% utilized that is, 50% full.

As I present in my online seminar, there are two simple methods to figure OS CPU utilization using only the v$osstat view. One of these methods as based on the BUSY_TIME and the IDLE_TIME statistics from the Operating System Statistics section of any AWR and Statspack report. The BUSY_TIME is the instance CPU consumption, that is, the CPU requirements. As I carefully explain in my seminar and in my classes, the capacity is actually equal to the BUSY_TIME plus the IDLE_TIME. That is:

Utilization = Requirements / Capacity

Utilization =  BUSY_TIME / ( BUSY_TIME + IDLE_TIME )

This is true unless the OS is updating OS structures incorrectly (e.g., /proc) or Oracle is not gathering the data from the OS correctly or frequently enough. And yes, I talk about how to check this in my seminar.

My former student did the utilization math. The average utilization was around 65%. Since this is the only Oracle instance doing any real work on the box, is was not a surprise to him that vmstat also indicated the OS CPU utilization was 65%.

Why I Asked About A CPU Bottleneck?
Suppose there is plenty of available CPU resources AND the SQL users are complaining about are doing nothing but buffer gets (i.e., logical reads). In other words, from a resource consumption perspective, it's all about CPU consumption. Is this possible? The answer is, YES!

If there was a single server process running this SQL statement, parallel query operations were not occurring and there are four CPU cores, what would the CPU utilization be? Probably around 25% (plus overhead between 5% to 15%). That is because on average only one CPU core was processing the SQL statement!

If this SQL statement took "too long to run" adding more CPU power (specifically more cores) would not help.

As strange at it may seem, one performance improving solution is to use the available CPU! Read on.

Use More CPU! Now!
Suppose the average CPU utilization of a 10 CORE box (or VM if you prefer) was running at an average of 50% utilization. This means that on average 5 of the 10 cores are busy. Now suppose a SQL statement takes 4 minutes to process without parallel query or very-cool (but perhaps costly) application design. How long would the SQL statement take to run if it ran in four parallel streams?

It depends. We need to ensure there is available CPU power. Assuming each parallel stream consumes one CPU core, the box has 10 cores with 5 being available (remember the average utilization is 50%), that leaves with us 5 "extra" cores. For our SQL statement to run in 4 parallel streams, we need three more CPU cores... and we have them!

If there is no parallelism overhead (ha! don't count on that!), the SQL statement would run in around 1 minutes, 8 CPU cores would be active (5+3), and the average CPU utilization would be around 80% (8 cores required / 10 cores of capacity).

So by creatively using available CPU resources, we were able to consume 4 minutes of CPU in a single minute! ...a beautiful example of parallelism. And of course, the users will be delighted and I may get a free pizza.

Be Anti-Intuitive
When a SQL statement, batch process or an Oracle process is constrained by CPU, yet there is plenty of CPU power available, the problem is probably a serialization issue. (Another possibility is some type of locking or blocking problem.) The solution can be found by parallelizing, thereby using the available CPU resources which will reduce the wall/clock/run time of the process.

If the above paragraph makes sense, then you understand what I'm talking about. If it does not make sense, stop, breath, and re-read the previous paragraph or paragraphs. It will be worth your time.

This next paragraph is really important to understand:

It is important to understand we have NOT reduced the total SQL statement's CPU requirements. We have NOT increased the total SQL statement's CPU requirements. We have NOT tuned the SQL statement in the traditional sense by reducing the buffer gets (logical reads, logical IOs). What we have done is simply altered the situation by consuming the necessary CPU requirements within a shorter duration. (In our example, the four minutes of CPU time was consumed within a single minute.) We did this taking a serial process and parallelizing it to use more of the available CPU resources.

Expand Our View Into The OS
Without understanding the OS situation we can easily misdiagnose the problem. And we can miss creative and powerful solutions.

Most DBAs immediately jump to tuning the SQL and tuning Oracle. While those are both valid technical solutions, there are others... like parallelizing to reduce run time while consuming more resources (in a shorter period of time).

Some DBAs but more likely OS Administrator but for sure hardware vendors may insist on getting more CPU power. While faster CPU will help somewhat, adding more of the same CPUs cores will likely do absolutely nothing...unless we can increase parallelism to use them.

Being able to determine the OS bottleneck is so important, I devote an entire chapter in my Oracle Performance Firefighting book on the subject. When I was learning to optimize Oracle systems, learning how to find the OS bottleneck significantly increased my value in the Oracle DBA market. Later when I ruminated on serialization and parallelization, I turned situations upside down by placing a larger (yet shorter-term) load on the system as a valid performance improving solution.

If performance is not what it needs to be AND there are available OS resources, consider using them to improve performance. It's a creative and rarely used solution.

Thanks for reading!

Categories: DBA Blogs