A nightly ETL job had been running longer and longer recently and its SLA was in jeopardy. We analyzed the query performance and reduced the job duration by a factor of five. Three changes were made; the major improvement came from defragmenting the table's subpartitions.
The first step is to find out which queries are the top consumers of CPU and elapsed time in the nightly job. I use the following script, sqljournal.sql, to identify the queries responsible for the slowness.
undefine mnits
undefine begin_hh24mi
undefine end_hh24mi
set linesize 300
col sample_time format a25
col username format a12
col hh24miss format a8
col sid format 9999
col blksid format 9999
col event format a30
col object_name format a15
col current_obj format 99999999
col entry_obj format 9999999
col program format a15
col module format a15
col machine format a17
col MIN(SAMPLE_TIME) format a10
col MAX(SAMPLE_TIME) format a10
col program format a30
col inst# format 99
col session_id format 9999
col username format a10
col topsqltext format a40
col cnt format 99999
col MNITS format 999
accept mnits default 1 prompt 'Please input the threshold of Mnits (default 1; how many minutes has the query been running?) :'
select ash.inst_id inst#,ash.session_id sid,(select username from dba_users where user_id=ash.user_id) username,
ash.sql_id,top_level_sql_id,count(*) cnt,to_char(min(sample_time),'hh24:mi') BeginTime,to_char(max(sample_time),'hh24:mi') EndTime
, extract(hour from (max(sample_time)-min(sample_time)))*60+extract(minute from (max(sample_time)-min(sample_time))) Mnits,program
,dbms_lob.substr(sql_fulltext,40,1) topsqltext
--,event
from gv$active_session_history ash,gv$sql
where to_char(sample_time,'hh24mi') between '&&begin_hh24mi' and '&&end_hh24mi' and trunc(sysdate)=trunc(sample_time)
and user_id not in (select user_id from dba_users where username in ('SYS','DBSNMP'))
and ash.top_level_sql_id=gv$sql.sql_id
and ash.inst_id=gv$sql.inst_id
group by ash.inst_id,ash.session_id,ash.sql_id,top_level_sql_id,program,user_id,dbms_lob.substr(sql_fulltext,40,1)
--,event
having extract(hour from (max(sample_time)-min(sample_time)))*60+extract(minute from (max(sample_time)-min(sample_time)))>=&&Mnits
--order by count(*)
order by BeginTime;
This query identifies the sql_id (the individual SQL statement) and the top_level_sql_id (the wrapping ETL SQL).
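For example, to report only statements that were active for ten minutes or more during a 01:00-05:00 ETL window (all values below are placeholders), the prompts would be answered like this:
SQL> @sqljournal
Please input the threshold of Mnits (default 1; how many minutes has the query been running?) :10
Enter value for begin_hh24mi: 0100
Enter value for end_hh24mi: 0500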
The second step is to analyze the query statistics. I use sqlid_exec_history_sum.sql to see the buffer gets, CPU, I/O wait, etc. for a particular query:
col start_time format a5
col end_time format a5
col exec format 9999
select sql_id,PLAN_HASH_VALUE plan, to_char(BEGIN_INTERVAL_TIME,'yyyymmdd') rundate,
sum(EXECUTIONS_DELTA) as exec,
to_char(min(BEGIN_INTERVAL_TIME),'hh24mi') start_time,to_char(max(end_intervAL_TIME),'hh24mi') end_time,
sum(BUFFER_GETS_DELTA) as buffer,
round(sum(cpu_time_delta)/1000000) "cpu(s)",round(sum(ELAPSED_TIME_DELTA)/1000000) "elp(s)",sum(disk_READs_DELTA) read, sum(iowait_delta) iowait, sum(DIRECT_WRITES_DELTA) direct
from dba_hist_sqlstat st, DBA_HIST_SNAPSHOT sn
where st.snap_id=sn.snap_id
and st.instance_number=sn.instance_number
and BEGIN_INTERVAL_TIME > sysdate-9
and sql_id='&sql_id'
group by sql_id,PLAN_HASH_VALUE,to_char(BEGIN_INTERVAL_TIME,'yyyymmdd')
order by to_char(min(BEGIN_INTERVAL_TIME),'YYYYMMDD-hh24mi')
/
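To drill into one of the statements reported by sqljournal.sql, run the script with that sql_id (the value below is only a placeholder). The output has one row per plan_hash_value per day, so a plan change or a jump in buffer gets per execution stands out immediately:
SQL> @sqlid_exec_history_sum
Enter value for sql_id: 0abcd1efgh2jk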
We found that the query was doing a lot of buffer gets and I/O waits against a partitioned table.
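To confirm which segment the buffer gets and I/O waits actually land on, a minimal sketch like the one below can group the ASH samples for the slow sql_id by wait event and by the object being accessed (current_obj# joined to dba_objects):
select nvl(ash.event,'ON CPU') event, o.owner, o.object_name, o.subobject_name, count(*) samples
from gv$active_session_history ash, dba_objects o
where ash.sql_id='&&sql_id'
and trunc(sample_time)=trunc(sysdate)
and ash.current_obj#=o.object_id
group by nvl(ash.event,'ON CPU'), o.owner, o.object_name, o.subobject_name
order by count(*) desc;
undefine sql_id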
Then we check the query text and the query plan.
Use sqlinfo.sql to check the query text:
set long 10000
select sql_id,LOADS,FIRST_LOAD_TIME,DISK_READS,BUFFER_GETS,EXECUTIONS,
round(buffer_gets/decode(executions,0,1,executions)) "gets/exec"
from gv$sql
where sql_id='&&sql_id'
;
select
SQL_FULLTEXT
from gv$sql
where sql_id='&&sql_id'
;
undefine sql_id
We found that the problematic query uses the APPEND hint. Direct-path inserts always load new blocks above the high water mark and never reuse the free space in existing blocks, so the repeated APPEND inserts caused the fragmentation:
INSERT /*+ APPEND */ INTO table...
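The wasted space can be quantified with dbms_space.space_usage, which breaks down the blocks below the high water mark by how full they are (ASSM tablespaces only; the owner, table and subpartition names below are placeholders):
set serveroutput on
declare
  l_unf   number; l_unf_b  number;
  l_fs1   number; l_fs1_b  number;  -- blocks 0-25%   free
  l_fs2   number; l_fs2_b  number;  -- blocks 25-50%  free
  l_fs3   number; l_fs3_b  number;  -- blocks 50-75%  free
  l_fs4   number; l_fs4_b  number;  -- blocks 75-100% free
  l_full  number; l_full_b number;  -- full blocks
begin
  dbms_space.space_usage(
    segment_owner      => 'WOSTAGE',            -- placeholder
    segment_name       => 'TABLE_NAME',         -- placeholder
    segment_type       => 'TABLE SUBPARTITION',
    unformatted_blocks => l_unf,  unformatted_bytes => l_unf_b,
    fs1_blocks => l_fs1, fs1_bytes => l_fs1_b,
    fs2_blocks => l_fs2, fs2_bytes => l_fs2_b,
    fs3_blocks => l_fs3, fs3_bytes => l_fs3_b,
    fs4_blocks => l_fs4, fs4_bytes => l_fs4_b,
    full_blocks => l_full, full_bytes => l_full_b,
    partition_name     => 'SUBPARTITION_NAME'); -- placeholder
  dbms_output.put_line('full blocks: '||l_full||', mostly empty blocks (75-100% free): '||l_fs4);
end;
/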
Use planh.sql to check the query plan:
col action for a75
col timestamp for a20
SET VERIFY OFF
select DISTINCT TIMESTAMP,sql_id,PLAN_HASH_VALUE from dba_hist_sql_plan where sql_ID='&&sql_id' order by timestamp;
select * from table(dbms_xplan.DISPLAY_AWR('&&sql_id'));
undefine sql_id
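If the cursor is still in the shared pool, the actual row-source statistics (buffers and rows per plan step) can also be pulled with dbms_xplan.display_cursor; note the A-Rows/Buffers columns are only populated when the statement ran with the gather_plan_statistics hint or statistics_level=all:
select * from table(dbms_xplan.display_cursor('&sql_id', null, 'ALLSTATS LAST'));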
To check the segment fragmentation, I use the queries below to compare the expected block count with the actual block count; in this case the actual block count was about 10 times the expected one:
exec dbms_stats.gather_table_stats(ownname=>'table_owner',TABNAME=>'table_name',granularity=>'SUBPARTITION');
select subpartition_name, num_rows, avg_row_len,
round((num_rows * avg_row_len)/(8*1024)) expected_blocks, blocks
from dba_tab_subpartitions
where table_owner='WOSTAGE' and table_name='Table_name'...
select partition_name, bytes, blocks from dba_segments
where segment_name='table'
and partition_name in (select subpartition_name from dba_tab_subpartitions
                       where table_owner='owner' and table_name='table')
--group by partition_name
order by 1
To defragment the fragmented subpartitions:
select subpartition_name from dba_tab_subpartitions where table_name='tablename';
alter table tablename move subpartition &subpartition parallel (degree 2);
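Note that moving a subpartition leaves the corresponding local index subpartitions UNUSABLE, so they have to be rebuilt afterwards and the statistics refreshed (owner, index and table names below are placeholders):
select index_name, subpartition_name, status from dba_ind_subpartitions
where index_owner='OWNER' and status='UNUSABLE';
alter index owner.index_name rebuild subpartition &subpartition;
exec dbms_stats.gather_table_stats(ownname=>'OWNER',tabname=>'TABLENAME',granularity=>'SUBPARTITION');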
After that, we check everything again; the job ran about 5 times faster.
I use the query below to find out which tables are fragmented and need to be reorganized:
select owner,table_name,round((num_rows * avg_row_len) /(8*1024)) expected_blocks, blocks,
blocks/((num_rows * avg_row_len) /(8*1024)) ratio from dba_tables
where blocks > 10240 and num_rows * avg_row_len >0
and blocks/((num_rows * avg_row_len) /(8*1024)) > 1.5
order by blocks/((num_rows * avg_row_len) /(8*1024));