Skip to main content

Big Query performance improvement from defraging table partition segment

An nightly ETL job run longer and longer recently and the SLA is jeopardized, we analyzed the query performance and reduced the job duration 5 times. three changes are made, the major improvement came from the de-fragmentation of the table subpartition.

The first step is to find out which queries are the top consumer of CPU and elapsed time of the nightly job, I use the following query sqljournal to find out what query is responsible to the slowness.

 undefine mnits
  undefine begin_hh24mi
   undefine end_hh24mi
 set linesize 300
 col sample_time format a25
 col username format a12
 col hh24miss format a8
 col sid format 9999
 col blksid format 9999
 col event format a30
 col object_name format a15
 col current_obj format 99999999
 col entry_obj format 9999999
 col program format a15
 col module format a15
 col machine format a17
 col MIN(SAMPLE_TIME) format a10
 col MAX(SAMPLE_TIME) format a10
 col program format a30
 col inst# format 99
 col session_id format 9999
 col username format a10
 col topsqltext format a40
 col cnt format 99999
 col MNITS format 999
 accept duration default 1 prompt 'please input the threshold of Mnits (default 1, how many minutes the query is run?) :'
 select ash.inst_id inst#,ash.session_id sid,(select username from dba_users where user_id=ash.user_id) username,
 ash.sql_id,top_level_sql_id,count(*) cnt,to_char(min(sample_time),'hh24:mi') BeginTime,to_char(max(sample_time),'hh24:mi') EndTime
 , extract(hour from (max(sample_time)-min(sample_time)))*60+extract(minute from (max(sample_time)-min(sample_time))) Mnits,program
 ,dbms_lob.substr(sql_fulltext,40,1) topsqltext
 --,event
 from gv$active_session_history ash,gv$sql
 where to_char(sample_time,'hh24mi') between '&&begin_hh24mi' and '&&end_hh24mi' and trunc(sysdate)=trunc(sample_time)
 and user_id not in (select user_id from dba_users where username in ('SYS','DBSNMP'))
 and ash.top_level_sql_id=gv$sql.sql_id
 and ash.inst_id=gv$sql.inst_id
 group by ash.inst_id,ash.session_id,ash.sql_id,top_level_sql_id,program,user_id,dbms_lob.substr(sql_fulltext,40,1)
 --,event
 having extract(hour from (max(sample_time)-min(sample_time)))*60+extract(minute from (max(sample_time)-min(sample_time)))>=&&Mnits
 --order by count(*)
 order by BeginTime

This query identify the sql_id (individual SQL) and top_level_sql (the ETL wrapping SQL).


The second step is to analyze the query statistics, I use the sqlid_exec_history_sum.sql to see the buffer_get, cpu, iowait etc for particular query:

col start_time format a5
col end_time format a5
col exec format 9999
select sql_id,PLAN_HASH_VALUE plan, to_char(BEGIN_INTERVAL_TIME,'yyyymmdd') rundate,
sum(EXECUTIONS_DELTA) as exec,
to_char(min(BEGIN_INTERVAL_TIME),'hh24mi') start_time,to_char(max(end_intervAL_TIME),'hh24mi') end_time,
sum(BUFFER_GETS_DELTA) as buffer,
round(sum(cpu_time_delta)/1000000) "cpu(s)",round(sum(ELAPSED_TIME_DELTA)/1000000) "elp(s)",sum(disk_READs_DELTA) read, sum(iowait_delta) iowait, sum(DIRECT_WRITES_DELTA) direct
from dba_hist_sqlstat st, DBA_HIST_SNAPSHOT sn
where st.snap_id=sn.snap_id
and st.instance_number=sn.instance_number
and
BEGIN_INTERVAL_TIME > sysdate-9
and sql_id='&sql_id'
group by sql_id,PLAN_HASH_VALUE,to_char(BEGIN_INTERVAL_TIME,'yyyymmdd')
order by to_char(min(BEGIN_INTERVAL_TIME),'YYYYMMDD-hh24mi')
/


we found that the query is dealing with a lot of buffer and iowait on a partitioned table.


Then we check the query and query plan:

use sqlinfo.sql to check the query:

set long 10000
select sql_id,LOADS,FIRST_LOAD_TIME,DISK_READS,BUFFER_GETS,EXECUTIONS,
round(buffer_gets/decode(executions,0,1,executions)) "gets/exec"
from gv$sql
where sql_id='&&sql_id'
;
select
SQL_FULLTEXT
from gv$sql
where sql_id='&&sql_id'
;
undefine sql_id

we found that that the problematic query has a hint to append, which caused the fragmentation.

INSERT /*+ APPEND */ INTO table...

use planh.sql to check the query plan:

col action for a75 
col timestamp for a20 
SET VERIFY OFF
select DISTINCT TIMESTAMP,sql_id,PLAN_HASH_VALUE from dba_hist_sql_plan where sql_ID='&&sql_id' order by timestamp;
select * from table(dbms_xplan.DISPLAY_AWR('&&sql_id'));
undefine sql_id

To check the segment fragmentation, I use this query to compare expected block number and real block number, the real block number is 10 times bigger than expected block number:

exec dbms_stats.gather_table_stats(ownname=>'table_owner',TABNAME=>'table_name',granularity=>'SUBPARTITION');

select subpartition_name, num_rows,avg_row_len,round((num_rows * avg_row_len) /(8*1024)) expected_blocks, blocks   from dba_tab_subpartitions where table_owner='WOSTAGE' and table_name='Table_name'...

select partition_name,(bytes),(blocks) from dba_segments where
segment_name='table' and partition_name in (select SUBPARTITION_NAME from dba_tab_subpartitions where
table_owner='owner' and table_name='table')
--group by partition_name
order by 1

To defrag the partition segment:

select subPARTITION_NAME from dba_tab_subpartitions  where table_name='tablename';

alter table tablename move subpartition &subpartition parallel (degree 2);

After that, check everything again, the performance is improved about 5 times:



I use the query below to find out which table is fragmented and need to be re-orged:

select owner,table_name,round((num_rows * avg_row_len) /(8*1024)) expected_blocks, blocks,
blocks/((num_rows * avg_row_len) /(8*1024)) ratio  from dba_tables
where blocks > 10240 and num_rows * avg_row_len >0
and blocks/((num_rows * avg_row_len) /(8*1024)) > 1.5
order by blocks/((num_rows * avg_row_len) /(8*1024))


Comments

Popular posts from this blog

Opatch apply/lsinventory error: oneoff is corrupted or does not exist

I am applying the quarterly patch for 19c RDBMS, I tried using napply but failed, but somehow it corrupted the inventory though nothing applied. further apply and lsinventory command ran into error like this: $ ./OPatch/opatch lsinventory Oracle Interim Patch Installer version 12.2.0.1.21 Copyright (c) 2020, Oracle Corporation.  All rights reserved. Oracle Home       : /u02/app/oracle/19.0.0 Central Inventory : /u01/app/oraInventory    from           : /u02/app/oracle/19.0.0/oraInst.loc OPatch version    : 12.2.0.1.21 OUI version       : 12.2.0.7.0 Log file location : /u02/app/oracle/19.0.0/cfgtoollogs/opatch/opatch2020-09-08_13-35-59PM_1.log Lsinventory Output file location : /u02/app/oracle/19.0.0/cfgtoollogs/opatch/lsinv/lsinventory2020-09-08_13-35-59PM.txt -------------------------------------------------------------------------------- Inventory load failed... OPatch cannot load inventory ...

oracle dba_hist_sysmetric_summary

found this blog is helpful to get CPU and IO statistics on oracle database. http://shob-dbadmin.blogspot.ca/2012/12/how-to-find-total-io-of-database.html courtesy to  Shomil Bansal , below are hist writing, not mine. How to find total IO of the database instance Total IO of database instance is sum of the physical reads, physical writes and redo writes. There are several views to find these values. v$sysmetric  - Reports metric values for only the most current time sample 60 secs. v$sysmetric_summary  - Reports metric values for time sample of 1 hour. v$sysmetric_history  - Reports metric values every 60 sec from the time instance is up. Better way to analyse IO using this view to take deltas between two time periods. dba_hist_sysmetric_history  - All the above views are refreshed when the instance is restarted. This view, part of AWR, stores the historical stats. I have used this view for my report. Query: ====== set lines 350...

ORA_RMAN_SGA_TARGET

assume that we lost all the files of oracle database but we do have rman backup, when trying to bring up a dummy database before restore start, I get this error. RMAN> startup nomount force; WARNING: cannot translate ORA_RMAN_SGA_TARGET value startup failed: ORA-01078: failure in processing system parameters ORA-01565: error in identifying file '+DATA/PROD/spfilePROD.ora' ORA-17503: ksfdopn:2 Failed to open file +DATA/PROD/spfilePROD.ora ORA-15056: additional error message ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +DATA/prod/spfileprod.ora ORA-17503: ksfdopn:2 Failed to open file +DATA/prod/spfileprod.ora ORA-15173: entry 'spfileprod.ora' does not exist in directory 'prod' ORA-06512: at line 4 starting Oracle instance without parameter file for retrival of spfile RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =================================...