A nightly ETL job had been running longer and longer recently and its SLA was in jeopardy. We analyzed the query performance and reduced the job duration by a factor of five. Three changes were made; the major improvement came from defragmenting the table's subpartitions.
The first step is to find out which queries are the top consumers of CPU and elapsed time in the nightly job. I use the following script, sqljournal.sql, to identify the queries responsible for the slowness.
undefine mnits
undefine begin_hh24mi
undefine end_hh24mi
set linesize 300
col sample_time format a25
col username format a12
col hh24miss format a8
col sid format 9999
col blksid format 9999
col event format a30
col object_name format a15
col current_obj format 99999999
col entry_obj format 9999999
col program format a15
col module format a15
col machine format a17
col MIN(SAMPLE_TIME) format a10
col MAX(SAMPLE_TIME) format a10
col program format a30
col inst# format 99
col session_id format 9999
col username format a10
col topsqltext format a40
col cnt format 99999
col MNITS format 999
accept mnits default 1 prompt 'Please input the threshold of Mnits (default 1; how many minutes has the query been running?) :'
select ash.inst_id inst#,ash.session_id sid,(select username from dba_users where user_id=ash.user_id) username,
ash.sql_id,top_level_sql_id,count(*) cnt,to_char(min(sample_time),'hh24:mi') BeginTime,to_char(max(sample_time),'hh24:mi') EndTime
, extract(hour from (max(sample_time)-min(sample_time)))*60+extract(minute from (max(sample_time)-min(sample_time))) Mnits,program
,dbms_lob.substr(sql_fulltext,40,1) topsqltext
--,event
from gv$active_session_history ash,gv$sql
where to_char(sample_time,'hh24mi') between '&&begin_hh24mi' and '&&end_hh24mi' and trunc(sysdate)=trunc(sample_time)
and user_id not in (select user_id from dba_users where username in ('SYS','DBSNMP'))
and ash.top_level_sql_id=gv$sql.sql_id
and ash.inst_id=gv$sql.inst_id
group by ash.inst_id,ash.session_id,ash.sql_id,top_level_sql_id,program,user_id,dbms_lob.substr(sql_fulltext,40,1)
--,event
having extract(hour from (max(sample_time)-min(sample_time)))*60+extract(minute from (max(sample_time)-min(sample_time)))>=&&Mnits
--order by count(*)
order by BeginTime;
This query identifies the sql_id (the individual SQL statement) and the top_level_sql_id (the wrapping ETL SQL).
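For example, to report only statements that were active for ten minutes or more during a 01:00-05:00 ETL window (all values below are placeholders), the prompts would be answered like this:
SQL> @sqljournal
Please input the threshold of Mnits (default 1; how many minutes has the query been running?) :10
Enter value for begin_hh24mi: 0100
Enter value for end_hh24mi: 0500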
The second step is to analyze the query statistics. I use sqlid_exec_history_sum.sql to see the buffer gets, CPU, I/O wait, etc. for a particular query:
col start_time format a5
col end_time format a5
col exec format 9999
select sql_id,PLAN_HASH_VALUE plan, to_char(BEGIN_INTERVAL_TIME,'yyyymmdd') rundate,
sum(EXECUTIONS_DELTA) as exec,
to_char(min(BEGIN_INTERVAL_TIME),'hh24mi') start_time,to_char(max(end_intervAL_TIME),'hh24mi') end_time,
sum(BUFFER_GETS_DELTA) as buffer,
round(sum(cpu_time_delta)/1000000) "cpu(s)",round(sum(ELAPSED_TIME_DELTA)/1000000) "elp(s)",sum(disk_READs_DELTA) read, sum(iowait_delta) iowait, sum(DIRECT_WRITES_DELTA) direct
from dba_hist_sqlstat st, DBA_HIST_SNAPSHOT sn
where st.snap_id=sn.snap_id
and st.instance_number=sn.instance_number
and BEGIN_INTERVAL_TIME > sysdate-9
and sql_id='&sql_id'
group by sql_id,PLAN_HASH_VALUE,to_char(BEGIN_INTERVAL_TIME,'yyyymmdd')
order by to_char(min(BEGIN_INTERVAL_TIME),'YYYYMMDD-hh24mi')
/
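To drill into one of the statements reported by sqljournal.sql, run the script with that sql_id (the value below is only a placeholder). The output has one row per plan_hash_value per day, so a plan change or a jump in buffer gets per execution stands out immediately:
SQL> @sqlid_exec_history_sum
Enter value for sql_id: 0abcd1efgh2jk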
We found that the query was doing a lot of buffer gets and I/O waits against a partitioned table.
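To confirm which segment the buffer gets and I/O waits actually land on, a minimal sketch like the one below can group the ASH samples for the slow sql_id by wait event and by the object being accessed (current_obj# joined to dba_objects):
select nvl(ash.event,'ON CPU') event, o.owner, o.object_name, o.subobject_name, count(*) samples
from gv$active_session_history ash, dba_objects o
where ash.sql_id='&&sql_id'
and trunc(sample_time)=trunc(sysdate)
and ash.current_obj#=o.object_id
group by nvl(ash.event,'ON CPU'), o.owner, o.object_name, o.subobject_name
order by count(*) desc;
undefine sql_id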
Then we check the query text and the query plan.
Use sqlinfo.sql to check the query text:
set long 10000
select sql_id,LOADS,FIRST_LOAD_TIME,DISK_READS,BUFFER_GETS,EXECUTIONS,
round(buffer_gets/decode(executions,0,1,executions)) "gets/exec"
from gv$sql
where sql_id='&&sql_id'
;
select
SQL_FULLTEXT
from gv$sql
where sql_id='&&sql_id'
;
undefine sql_id
We found that the problematic query uses the APPEND hint. Direct-path inserts always load new blocks above the high water mark and never reuse the free space in existing blocks, so the repeated APPEND inserts caused the fragmentation:
INSERT /*+ APPEND */ INTO table...
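The wasted space can be quantified with dbms_space.space_usage, which breaks down the blocks below the high water mark by how full they are (ASSM tablespaces only; the owner, table and subpartition names below are placeholders):
set serveroutput on
declare
  l_unf   number; l_unf_b  number;
  l_fs1   number; l_fs1_b  number;  -- blocks 0-25%   free
  l_fs2   number; l_fs2_b  number;  -- blocks 25-50%  free
  l_fs3   number; l_fs3_b  number;  -- blocks 50-75%  free
  l_fs4   number; l_fs4_b  number;  -- blocks 75-100% free
  l_full  number; l_full_b number;  -- full blocks
begin
  dbms_space.space_usage(
    segment_owner      => 'WOSTAGE',            -- placeholder
    segment_name       => 'TABLE_NAME',         -- placeholder
    segment_type       => 'TABLE SUBPARTITION',
    unformatted_blocks => l_unf,  unformatted_bytes => l_unf_b,
    fs1_blocks => l_fs1, fs1_bytes => l_fs1_b,
    fs2_blocks => l_fs2, fs2_bytes => l_fs2_b,
    fs3_blocks => l_fs3, fs3_bytes => l_fs3_b,
    fs4_blocks => l_fs4, fs4_bytes => l_fs4_b,
    full_blocks => l_full, full_bytes => l_full_b,
    partition_name     => 'SUBPARTITION_NAME'); -- placeholder
  dbms_output.put_line('full blocks: '||l_full||', mostly empty blocks (75-100% free): '||l_fs4);
end;
/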
Use planh.sql to check the query plan:
col action for a75
col timestamp for a20
SET VERIFY OFF
select DISTINCT TIMESTAMP,sql_id,PLAN_HASH_VALUE from dba_hist_sql_plan where sql_ID='&&sql_id' order by timestamp;
select * from table(dbms_xplan.DISPLAY_AWR('&&sql_id'));
undefine sql_id
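If the cursor is still in the shared pool, the actual row-source statistics (buffers and rows per plan step) can also be pulled with dbms_xplan.display_cursor; note the A-Rows/Buffers columns are only populated when the statement ran with the gather_plan_statistics hint or statistics_level=all:
select * from table(dbms_xplan.display_cursor('&sql_id', null, 'ALLSTATS LAST'));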
To check the segment fragmentation, I use the queries below to compare the expected block count with the actual block count; in this case the actual block count was about 10 times the expected one:
exec dbms_stats.gather_table_stats(ownname=>'table_owner',TABNAME=>'table_name',granularity=>'SUBPARTITION');
select subpartition_name, num_rows, avg_row_len,
round((num_rows * avg_row_len)/(8*1024)) expected_blocks, blocks
from dba_tab_subpartitions
where table_owner='WOSTAGE' and table_name='Table_name'...
select partition_name, bytes, blocks from dba_segments
where segment_name='table'
and partition_name in (select subpartition_name from dba_tab_subpartitions
                       where table_owner='owner' and table_name='table')
--group by partition_name
order by 1
To defragment the fragmented subpartitions:
select subpartition_name from dba_tab_subpartitions where table_name='tablename';
alter table tablename move subpartition &subpartition parallel (degree 2);
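Note that moving a subpartition leaves the corresponding local index subpartitions UNUSABLE, so they have to be rebuilt afterwards and the statistics refreshed (owner, index and table names below are placeholders):
select index_name, subpartition_name, status from dba_ind_subpartitions
where index_owner='OWNER' and status='UNUSABLE';
alter index owner.index_name rebuild subpartition &subpartition;
exec dbms_stats.gather_table_stats(ownname=>'OWNER',tabname=>'TABLENAME',granularity=>'SUBPARTITION');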
After that, we check everything again; the job ran about 5 times faster.
I use the query below to find out which tables are fragmented and need to be reorganized:
select owner,table_name,round((num_rows * avg_row_len) /(8*1024)) expected_blocks, blocks,
blocks/((num_rows * avg_row_len) /(8*1024)) ratio from dba_tables
where blocks > 10240 and num_rows * avg_row_len >0
and blocks/((num_rows * avg_row_len) /(8*1024)) > 1.5
order by blocks/((num_rows * avg_row_len) /(8*1024));