
Big query performance improvement from defragmenting table partition segments

A nightly ETL job had been running longer and longer recently, jeopardizing its SLA. We analyzed the query performance and reduced the job duration by a factor of five. Three changes were made; the major improvement came from de-fragmenting the table subpartitions.

The first step is to find out which queries are the top consumers of CPU and elapsed time in the nightly job. I use the following script, sqljournal.sql, to identify the queries responsible for the slowness.

 undefine mnits
 undefine begin_hh24mi
 undefine end_hh24mi
 set linesize 300
 col sample_time format a25
 col username format a12
 col hh24miss format a8
 col sid format 9999
 col blksid format 9999
 col event format a30
 col object_name format a15
 col current_obj format 99999999
 col entry_obj format 9999999
 col program format a15
 col module format a15
 col machine format a17
 col MIN(SAMPLE_TIME) format a10
 col MAX(SAMPLE_TIME) format a10
 col program format a30
 col inst# format 99
 col session_id format 9999
 col username format a10
 col topsqltext format a40
 col cnt format 99999
 col MNITS format 999
 accept Mnits default 1 prompt 'please input the threshold of Mnits (default 1: minimum minutes the query ran) :'
 select ash.inst_id inst#,ash.session_id sid,(select username from dba_users where user_id=ash.user_id) username,
 ash.sql_id,top_level_sql_id,count(*) cnt,to_char(min(sample_time),'hh24:mi') BeginTime,to_char(max(sample_time),'hh24:mi') EndTime
 , extract(hour from (max(sample_time)-min(sample_time)))*60+extract(minute from (max(sample_time)-min(sample_time))) Mnits,program
 ,dbms_lob.substr(sql_fulltext,40,1) topsqltext
 --,event
 from gv$active_session_history ash,gv$sql
 where to_char(sample_time,'hh24mi') between '&&begin_hh24mi' and '&&end_hh24mi' and trunc(sysdate)=trunc(sample_time)
 and user_id not in (select user_id from dba_users where username in ('SYS','DBSNMP'))
 and ash.top_level_sql_id=gv$sql.sql_id
 and ash.inst_id=gv$sql.inst_id
 group by ash.inst_id,ash.session_id,ash.sql_id,top_level_sql_id,program,user_id,dbms_lob.substr(sql_fulltext,40,1)
 --,event
 having extract(hour from (max(sample_time)-min(sample_time)))*60+extract(minute from (max(sample_time)-min(sample_time)))>=&&Mnits
 --order by count(*)
 order by BeginTime
/

This query identifies the sql_id (the individual SQL) and the top_level_sql_id (the wrapping ETL SQL).


The second step is to analyze the query statistics. I use sqlid_exec_history_sum.sql to see the buffer gets, CPU, I/O wait, etc. for a particular query:

col start_time format a5
col end_time format a5
col exec format 9999
select sql_id,PLAN_HASH_VALUE plan, to_char(BEGIN_INTERVAL_TIME,'yyyymmdd') rundate,
sum(EXECUTIONS_DELTA) as exec,
to_char(min(BEGIN_INTERVAL_TIME),'hh24mi') start_time,to_char(max(end_intervAL_TIME),'hh24mi') end_time,
sum(BUFFER_GETS_DELTA) as buffer,
round(sum(cpu_time_delta)/1000000) "cpu(s)",round(sum(ELAPSED_TIME_DELTA)/1000000) "elp(s)",sum(disk_READs_DELTA) read, sum(iowait_delta) iowait, sum(DIRECT_WRITES_DELTA) direct
from dba_hist_sqlstat st, DBA_HIST_SNAPSHOT sn
where st.snap_id=sn.snap_id
and st.instance_number=sn.instance_number
and BEGIN_INTERVAL_TIME > sysdate-9
and sql_id='&sql_id'
group by sql_id,PLAN_HASH_VALUE,to_char(BEGIN_INTERVAL_TIME,'yyyymmdd')
order by to_char(min(BEGIN_INTERVAL_TIME),'YYYYMMDD-hh24mi')
/


We found that the query incurs a large number of buffer gets and heavy I/O wait on a partitioned table.


Then we check the query and query plan:

Use sqlinfo.sql to check the query:

set long 10000
select sql_id,LOADS,FIRST_LOAD_TIME,DISK_READS,BUFFER_GETS,EXECUTIONS,
round(buffer_gets/decode(executions,0,1,executions)) "gets/exec"
from gv$sql
where sql_id='&&sql_id'
;
select
SQL_FULLTEXT
from gv$sql
where sql_id='&&sql_id'
;
undefine sql_id

We found that the problematic query uses the APPEND hint, which caused the fragmentation:

INSERT /*+ APPEND */ INTO table...
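The APPEND hint forces a direct-path insert, which always writes new blocks above the segment high-water mark and never reuses free space left behind by deletes, so repeated insert/delete cycles keep growing the segment. Assuming the load does not actually need direct-path semantics, a minimal sketch of the fix is simply a conventional-path insert (table and column names here are placeholders, not the ETL's real ones):

```sql
-- Conventional-path insert: reuses free space below the high-water mark
-- instead of always allocating new blocks above it.
INSERT INTO target_table (col1, col2)
SELECT col1, col2
FROM   staging_table;
COMMIT;
```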

Use planh.sql to check the query plan:

col action for a75 
col timestamp for a20 
SET VERIFY OFF
select DISTINCT TIMESTAMP,sql_id,PLAN_HASH_VALUE from dba_hist_sql_plan where sql_ID='&&sql_id' order by timestamp;
select * from table(dbms_xplan.DISPLAY_AWR('&&sql_id'));
undefine sql_id

To check the segment fragmentation, I use the queries below to compare the expected block count with the actual block count. In our case, the actual block count was 10 times the expected block count:

exec dbms_stats.gather_table_stats(ownname=>'table_owner',TABNAME=>'table_name',granularity=>'SUBPARTITION');

select subpartition_name, num_rows,avg_row_len,round((num_rows * avg_row_len) /(8*1024)) expected_blocks, blocks   from dba_tab_subpartitions where table_owner='WOSTAGE' and table_name='Table_name'...

select partition_name,(bytes),(blocks) from dba_segments where
segment_name='table' and partition_name in (select SUBPARTITION_NAME from dba_tab_subpartitions where
table_owner='owner' and table_name='table')
--group by partition_name
order by 1
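The two checks above can be combined into a single query that reports a fragmentation ratio per subpartition directly from dba_tab_subpartitions. This is a sketch assuming an 8 KB block size and freshly gathered subpartition-level statistics; owner and table name are substitution-variable placeholders:

```sql
-- Ratio of actual to expected blocks per subpartition (8 KB blocks assumed);
-- the worst-fragmented subpartitions come first.
select subpartition_name,
       round((num_rows * avg_row_len) / (8 * 1024)) expected_blocks,
       blocks,
       round(blocks / ((num_rows * avg_row_len) / (8 * 1024)), 1) ratio
from   dba_tab_subpartitions
where  table_owner = '&owner'
and    table_name  = '&table_name'
and    num_rows * avg_row_len > 0
order  by 4 desc;
```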

To defrag the partition segment:

select subPARTITION_NAME from dba_tab_subpartitions  where table_name='tablename';

alter table tablename move subpartition &subpartition parallel (degree 2);
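Note that ALTER TABLE ... MOVE SUBPARTITION marks the corresponding local index subpartitions UNUSABLE unless the move is run with the UPDATE INDEXES clause. A sketch of the follow-up check and rebuild (owner, index, and subpartition names are placeholders to be filled from the first query's output):

```sql
-- Find index subpartitions invalidated by the move ...
select index_name, subpartition_name
from   dba_ind_subpartitions
where  status = 'UNUSABLE';

-- ... and rebuild each one (repeat per row returned).
alter index owner.index_name rebuild subpartition subpartition_name parallel 2;
```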

After that, we checked everything again; the performance improved by about 5 times.



I use the query below to find out which tables are fragmented and need to be reorganized:

select owner,table_name,round((num_rows * avg_row_len) /(8*1024)) expected_blocks, blocks,
blocks/((num_rows * avg_row_len) /(8*1024)) ratio  from dba_tables
where blocks > 10240 and num_rows * avg_row_len >0
and blocks/((num_rows * avg_row_len) /(8*1024)) > 1.5
order by blocks/((num_rows * avg_row_len) /(8*1024))
/
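For the tables this query flags, the reorg statements can be generated the same way. A sketch restricted to non-partitioned tables (partitioned ones should be moved per (sub)partition as shown earlier, and any indexes must be rebuilt afterwards since MOVE invalidates them):

```sql
-- Generate MOVE statements for fragmented non-partitioned tables (sketch;
-- review the output before running it).
select 'alter table ' || owner || '.' || table_name || ' move;' ddl_text
from   dba_tables
where  blocks > 10240
and    num_rows * avg_row_len > 0
and    blocks / ((num_rows * avg_row_len) / (8 * 1024)) > 1.5
and    partitioned = 'NO';
```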

