Reading the Alert Log Using V$DIAG_ALERT_EXT (SQL*Plus, 11g+)

Here is a small snippet to read the alert log from SQL*Plus in 11g and later without any extra setup, which is quite useful when you are firefighting. Previously this data was available only through an X$ view (X$DBGALERTEXT, on which V$DIAG_ALERT_EXT is based).

SELECT TO_CHAR(a.originating_timestamp, 'dd.mm.yyyy hh24:mi:ss') AS message_time,
       a.message_text, a.host_id, a.inst_id, a.adr_home
  FROM v$diag_alert_ext a
 WHERE a.originating_timestamp > SYSDATE - 1
   AND a.component_id = 'rdbms'
   AND a.message_text LIKE '%ORA-%'
 ORDER BY a.originating_timestamp DESC;  -- order by the real timestamp, not its string form

Tablespace Utilization Trend and Forecasting

As DBAs, we have a daily responsibility to keep sufficient free space in tablespaces across the fleet, and when the fleet is large you have thousands of tablespaces to monitor. Growth can also be unpredictable due to changing business requirements, bad code, or poor planning by application teams, which results in incidents. Everyone has their own workflow and tools: some use Enterprise Manager, others use custom solutions.

To ease the monitoring and forecasting, I use the following workflow:

    1. Data Collection – collect the utilization every day from all the databases and store it in a repository (a sketch follows this list).
    2. Visualization – the collected data is then visualized in a lattice graph (one graph per instance, with the top tablespaces as series in each plot). You can use Excel, Google Charts, etc. for this.
    3. Monitoring – you can manually review the plots and take any necessary actions.
    4. Drill-down and automatic prediction using linear regression – when I find anything suspicious while monitoring, I drill down further and use Oracle's linear regression functions to predict the date a tablespace will hit 100% utilization. You can get the SQL from my GitHub repository; it generates the plot and gives you the top objects along with the predicted date. One can also create reports such as Top 20 tablespaces, Top 20 fastest-growing tablespaces, etc.
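
A minimal sketch of the daily collection step, assuming a repository table MON_TABLESPACE(ddate, instance_name, tablespace_name, megs_alloc, megs_used) matching the forecasting query later in this post. DBA_TABLESPACE_USAGE_METRICS reports sizes in blocks, so we convert to megabytes:

INSERT INTO mon_tablespace (ddate, instance_name, tablespace_name, megs_alloc, megs_used)
SELECT TRUNC(SYSDATE),                                      -- collection date
       (SELECT instance_name FROM v$instance),              -- collecting instance
       m.tablespace_name,
       ROUND(m.tablespace_size * t.block_size / 1048576),   -- allocated MB
       ROUND(m.used_space      * t.block_size / 1048576)    -- used MB
  FROM dba_tablespace_usage_metrics m
  JOIN dba_tablespaces t ON t.tablespace_name = m.tablespace_name;
COMMIT;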


The math behind the prediction is quite simple. I am using Oracle's analytic regression functions, REGR_INTERCEPT and REGR_R2, to perform linear regression on time-series data such as tablespace utilization. Note that the regression is deliberately inverted: we regress the date (converted to a plain number of days since a fixed epoch, because the regression functions only accept numeric arguments) on the free space, so that REGR_INTERCEPT returns the date value at which free space reaches zero.
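
A toy illustration with made-up numbers: three daily samples where free space drops by 100 MB per day from 500 MB on day 100, so the fitted line crosses zero on day 105.

-- REGR_INTERCEPT(y, x) returns the fitted value of y at x = 0,
-- here the day number on which free_mb reaches zero
WITH samples AS (
  SELECT 100 AS day_no, 500 AS free_mb FROM dual UNION ALL
  SELECT 101, 400 FROM dual UNION ALL
  SELECT 102, 300 FROM dual
)
SELECT REGR_INTERCEPT(day_no, free_mb) AS predicted_day_no,  -- 105
       REGR_R2(day_no, free_mb)        AS fit_quality        -- 1 = perfect fit
  FROM samples;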

The visualization is done using Google Charts. I was inspired by Carlos Sierra's blog post on creating line charts with the Google Charts API.

SELECT ddate, instance_name, tablespace_name,
       megs_alloc AS number_column_1,
       megs_used  AS number_column_2,
       -- regress the day number (days since the 1970 epoch) on free space;
       -- the intercept is the day number at which free space reaches zero
       (TO_DATE('19700101 00:00:00', 'YYYYMMDD HH24:MI:SS')
          + REGR_INTERCEPT(ddate - TO_DATE('19700101 00:00:00', 'YYYYMMDD HH24:MI:SS'),
                           megs_alloc - megs_used)
              OVER (PARTITION BY tablespace_name)) AS predicted_date,
       REGR_R2(ddate - TO_DATE('19700101 00:00:00', 'YYYYMMDD HH24:MI:SS'),
               megs_alloc - megs_used)
           OVER (PARTITION BY tablespace_name) AS coeff_of_determination
  FROM mon_tablespace
 WHERE instance_name   = SUBSTR('&&metric_selected.', 1, INSTR('&&metric_selected.', ':') - 1)
   AND tablespace_name = SUBSTR('&&metric_selected.', INSTR('&&metric_selected.', ':') + 1)
   AND ddate > SYSDATE - NVL(&&report_period., 1)
 ORDER BY ddate ASC;

Generating ADDM Reports Automatically

Database tuning is an ongoing effort; it never stops. Similarly, routine health checks are a gold mine: you never know what you will find. From Oracle Database 10g onwards we have had the facility to generate ADDM reports, and it is often neglected. With a growing number of databases under my watch, I had become a bit negligent about reviewing the performance of the less-used databases, or those that did not come into the picture very often. So I decided to automate the tuning reviews, which helps me monitor database performance efficiently and keep track of what is going on in my databases.

So here is my workflow.

1. Automatically generate the previous day's ADDM report for each database.

2. Send the report by email to yourself and your team.

3. Ask team members to take a quick look at these reports every day.

4. Run the SQL Tuning Advisor and Segment Advisor and review the results.

5. Take action.

Hopefully, we will reap this gold mine of performance-tuning data.

Here is the script to generate the previous day's ADDM report.

/*
 * Purpose: To generate ADDM report manually
 * 
 * Author: Anjul Sahu
 * 
 * History: 
 * 25-SEP-2014	anjul	initial version
 * 
 */
 

SET LONG 1000000
SET LINES 1000 PAGES 0
SET FEEDBACK ON TIMING OFF VERIFY OFF
SPOOL last_day_addm_report.txt

DECLARE
   dbid           NUMBER;
   bid            NUMBER;
   eid            NUMBER;
   db_unique_name VARCHAR2(30);  
   host_name      VARCHAR2(64);
   status         VARCHAR2(11);
   starttime      CHAR (5);
   endtime        CHAR (5);
   output         VARCHAR2 (32000);
   tname varchar2(50);
   tid   number;  
BEGIN
   --starttime := '00:00';
   --endtime := '23:00';
	DBMS_OUTPUT.ENABLE(NULL);  -- unbounded output buffer
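   -- Find the AWR snapshot range spanning the previous day
   -- (widened by one snapshot on each side)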
   SELECT MIN (snap_id)-1, MAX (snap_id)+1
     INTO bid, eid
     FROM dba_hist_snapshot
    WHERE 
      --TO_CHAR (begin_interval_time, 'hh24:mi') >= starttime
      --AND TO_CHAR (end_interval_time, 'hh24:mi') <= endtime
      --AND 
      TRUNC (begin_interval_time) = TRUNC (SYSDATE-1)
      AND TRUNC (end_interval_time) = TRUNC (SYSDATE-1);
 
   SELECT dbid, db_unique_name
     INTO dbid, db_unique_name
     FROM v$database;
 
   SELECT host_name INTO host_name
     FROM v$instance;
    
    -- Create an ADDM task over the chosen snapshot range and execute it
    DBMS_ADVISOR.CREATE_TASK('ADDM', tid, tname,
        'ADDM Report( ' || bid || ',' || eid || ' )');
    DBMS_ADVISOR.SET_TASK_PARAMETER(tname, 'START_SNAPSHOT', bid);
    DBMS_ADVISOR.SET_TASK_PARAMETER(tname, 'END_SNAPSHOT', eid);
    DBMS_ADVISOR.EXECUTE_TASK(tname);
    
    -- Wait until the task reports COMPLETED (EXECUTE_TASK normally runs
    -- synchronously, so this loop is just a safety net)
    LOOP
       SELECT status INTO status FROM dba_advisor_tasks WHERE task_id = tid;
       EXIT WHEN status = 'COMPLETED';
       DBMS_LOCK.SLEEP(5);
    END LOOP;

 
 
   -- Fetch and print the report; note the VARCHAR2 buffer caps it at
   -- 32000 characters (longer reports would raise ORA-06502)
   SELECT DBMS_ADVISOR.GET_TASK_REPORT(tname) INTO output FROM DUAL;
   DBMS_OUTPUT.PUT_LINE(output);

EXCEPTION
   WHEN OTHERS THEN
      dbms_output.put_line('Error occurred: ' || SQLERRM);
END;
/
SPOOL OFF
exit
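
To schedule this from inside the database rather than from cron, one hedged option is a daily DBMS_SCHEDULER job wrapping the same PL/SQL logic; the job name, wrapper procedure, and schedule below are purely illustrative:

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'DAILY_ADDM_REPORT',   -- illustrative name
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'GEN_ADDM_REPORT',     -- hypothetical wrapper procedure
    repeat_interval => 'FREQ=DAILY;BYHOUR=6',
    enabled         => TRUE);
END;
/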

SQL Tuning: Where to START?

I found the following Oracle Support notes to be a good starting point when tuning SQL:

  • TROUBLESHOOTING: Tuning a New Query (Doc ID 372431.1)
  • Tuning Queries: ‘Quick and Dirty’ Solutions (Doc ID 207434.1)
  • TROUBLESHOOTING: Advanced Query Tuning (Doc ID 163563.1)
  • Troubleshooting: High CPU Utilization (Doc ID 164768.1)
  • Potentially Expensive Query Operations (Doc ID 162142.1)

Cache Buffer Chains

Recently I have been seeing cache buffers chains (CBC) latch contention in our OLTP database, and we have been working to resolve it for more than two weeks now. I came across a few interesting articles on the topic while researching it.

We have received some recommendations from Oracle, and we are also trying hash partitioning on our hot indexes to spread the hot blocks across more buffer chains. Let's see how it goes. A hedged example is sketched below.
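
A minimal sketch of the idea, with purely illustrative names (ORDERS, ORDER_ID) and partition count: a hot index on a monotonically increasing key is rebuilt as a global hash-partitioned index so that consecutive key values land in different partitions, and therefore in different leaf blocks:

-- Illustrative only: table, column, and partition count are assumptions
CREATE INDEX orders_id_ix ON orders (order_id)
  GLOBAL PARTITION BY HASH (order_id) PARTITIONS 16;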


Oracle Parallelism: Handle with care

Recently, I have been investigating a performance issue in one of my production data warehouse environments and started reading about controlling and tuning parallelism on the database server.

Here are two very good links I hit during this process:

1. http://oracledoug.com/px.pdf

2. https://blogs.oracle.com/datawarehousing/entry/workload_management_statement
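
In the meantime, here is a hedged sketch of some of the standard knobs for reining in parallelism (the degrees are illustrative, not recommendations, and SALES is a hypothetical table):

-- Cap the degree of parallelism for the current session
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 4;

-- Set a default DOP on a table instead of leaving it unbounded
ALTER TABLE sales PARALLEL 8;

-- Request a specific DOP for a single statement via a hint
SELECT /*+ PARALLEL(s, 4) */ COUNT(*) FROM sales s;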

This post will be updated with more details and resources on parallel tuning.


High Redo Logs or Archives?

It is sometimes crazy to see the archive destination filling up fast while your RMAN job cannot keep up with the redo generation rate. Cleaning up logs is not a solution if your data is critical and you require point-in-time recovery (PITR). A DBA has a responsibility to identify the root cause. Here I take one such case and show how to identify the culprit.

Approach

  • Identify the log generation rate and timing: how many logs are generated, and when? Use the script below to see the trend.
    select to_char(FIRST_TIME,'MM/DD') day,
    to_char(sum(decode(to_char(first_time,'hh24'),'00',1,0)),'99') h00,
    to_char(sum(decode(to_char(first_time,'hh24'),'01',1,0)),'99') h01,
    to_char(sum(decode(to_char(first_time,'hh24'),'02',1,0)),'99') h02,
    to_char(sum(decode(to_char(first_time,'hh24'),'03',1,0)),'99') h03,
    to_char(sum(decode(to_char(first_time,'hh24'),'04',1,0)),'99') h04,
    to_char(sum(decode(to_char(first_time,'hh24'),'05',1,0)),'99') h05,
    to_char(sum(decode(to_char(first_time,'hh24'),'06',1,0)),'99') h06,
    to_char(sum(decode(to_char(first_time,'hh24'),'07',1,0)),'99') h07,
    to_char(sum(decode(to_char(first_time,'hh24'),'08',1,0)),'99') h08,
    to_char(sum(decode(to_char(first_time,'hh24'),'09',1,0)),'99') h09,
    to_char(sum(decode(to_char(first_time,'hh24'),'10',1,0)),'99') h10,
    to_char(sum(decode(to_char(first_time,'hh24'),'11',1,0)),'99') h11,
    to_char(sum(decode(to_char(first_time,'hh24'),'12',1,0)),'99') h12,
    to_char(sum(decode(to_char(first_time,'hh24'),'13',1,0)),'99') h13,
    to_char(sum(decode(to_char(first_time,'hh24'),'14',1,0)),'99') h14,
    to_char(sum(decode(to_char(first_time,'hh24'),'15',1,0)),'99') h15,
    to_char(sum(decode(to_char(first_time,'hh24'),'16',1,0)),'99') h16,
    to_char(sum(decode(to_char(first_time,'hh24'),'17',1,0)),'99') h17,
    to_char(sum(decode(to_char(first_time,'hh24'),'18',1,0)),'99') h18,
    to_char(sum(decode(to_char(first_time,'hh24'),'19',1,0)),'99') h19,
    to_char(sum(decode(to_char(first_time,'hh24'),'20',1,0)),'99') h20,
    to_char(sum(decode(to_char(first_time,'hh24'),'21',1,0)),'99') h21,
    to_char(sum(decode(to_char(first_time,'hh24'),'22',1,0)),'99') h22,
    to_char(sum(decode(to_char(first_time,'hh24'),'23',1,0)),'99') h23,
    count(*) Total
    from v$log_history
    where first_time > sysdate - 8      -- last week's trend
    group by to_char(first_time,'MM/DD')
    order by 1 desc;
    
  • For the given time period, take the AWR or Statspack report and identify the SQL generating the most redo. If Statspack/AWR is not configured, try to find it in v$sql by searching for high-volume DML, e.g. statements matching '%INSERT%', '%UPDATE%' or '%DELETE%' with rows_processed > 1000 (see the sketch after this list).
  • If you are not lucky enough to find it in the library cache, then as a last resort you have the LogMiner option, which is guaranteed to identify the culprit SQL.
  • The most systematic approach is to use the LogMiner utility provided by Oracle, but it requires additional setup. It is described in detail in MOS Doc ID 300395.1.
  • You can also look at the top sessions doing the most block changes; more changes mean more redo information.
    --  to see which session is doing how many block changes.
    SELECT
    s.sid,
    s.serial#,
    s.username,
    s.program,
    i.block_changes
    FROM v$session s, v$sess_io i
    WHERE s.sid = i.sid
    ORDER BY 5 desc;
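
As referenced above, here is a hedged sketch of the v$sql fallback when AWR/Statspack is unavailable; the rows_processed threshold is arbitrary:

-- Likely redo-heavy DML still present in the library cache
SELECT sql_id, executions, rows_processed,
       SUBSTR(sql_text, 1, 80) AS sql_text_start
  FROM v$sql
 WHERE (UPPER(sql_text) LIKE '%INSERT%'
     OR UPPER(sql_text) LIKE '%UPDATE%'
     OR UPPER(sql_text) LIKE '%DELETE%')
   AND rows_processed > 1000
 ORDER BY rows_processed DESC;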
    

 

Update: Using “Snapper”

I found an article by Tanel Poder on the same topic, and his Snapper script is quite useful for identifying the sessions generating huge amounts of redo. Quick tip: use the snapredo script he provides.

Related MOS Notes:

Note 188691.1  : How to Avoid Generation of Redolog Entries
Note 167492.1  : How to Find Sessions Generating Lots of Redo
Note 199298.1  : Diagnosing excessive redo generation

If you have any other ideas, please share them in the comments. :-)

