Skip to main content

RAC database restore from netbackup tape without RMAN catalog

The Oracle RAC databases were being backed up regularly without using catalog, even worse, the database controlfile autobackup was turned off.  That configuration compromise the RPO. I restored and recovered the database to the latest scn that the last controlfile backup knows, but not the latest scn that available in the archivelogs. Below are the steps:

The assumption is that all database files, controlfile are gone.

1. list available backup handles in netbackup after a particular date:

/usr/openv/netbackup/bin/bplist -l -S $netbackupserver -C $client -t 4 -R -s 11/29/2018 /
output:
...
-rw-rw---- oracle    asmadmin     19922944 Nov 29 11:55 /ctrl_dAPIMGRS_u16tje9lj_s1062_p1_t993470131
-rw-rw---- oracle    asmadmin    142082048 Nov 29 11:54 /arch_dAPIMGRS_u15tje9kd_s1061_p1_t993470093
...

From the above output, we guess the latest controlfile backup is in /ctrl_dAPIMGRS_u16tje9lj_s1062_p1_t993470131

2.  restore controlfile from this handle:

shutdown immediate
start nomount
run
{
allocate channel ch10 type 'SBT_TAPE' SEND 'NB_ORA_CLIENT=$client,NB_ORA_POLICY=$policy,NB_ORA_SERV=$netbackupserver';
restore controlfile from 'ctrl_dAPIMGRS_u16tje9lj_s1062_p1_t993470131';
}

alter database mount

3.  Find the scn that should be used for restore and recover:

RMAN> list backup of archivelog time between 'sysdate-2' and 'sysdate';


List of Backup Sets
===================


BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -----------------------------
1058    135.50M    SBT_TAPE    00:00:16     29-november -2018 11:55:09 am
        BP Key: 1058   Status: AVAILABLE  Compressed: NO  Tag: TAG20181129T115453
        Handle: arch_dAPIMGRS_u15tje9kd_s1061_p1_t993470093   Media: @aaaac

  List of Archived Logs in backup set 1058
  Thrd Seq     Low SCN    Low Time                      Next SCN   Next Time
  ---- ------- ---------- ----------------------------- ---------- ---------
  1    1855    178307993  29-november -2018 09:00:40 am 178324870  29-november -2018 11:54:52 am
  1    1856    178324870  29-november -2018 11:54:52 am 178324888  29-november -2018 11:54:53 am
  2    2803    178305976  29-november -2018 08:40:47 am 178324361  29-november -2018 11:53:11 am
  3    2759    178302578  29-november -2018 08:00:35 am 178324772  29-november -2018 11:53:46 am

find the max next_scn of the three RAC threads, then choose the smallest of the threee, which should be 178324772, but I accidentally chose 178324361 in this test, which is acceptable for test database, minor data lose, better than cannot recovering database, which error RMAN-06054 is shown at the end of this post.

Though there might be more later archivelogs backup shown in bplist in step 1, however, they are taken after this scn, so controlfile will not be able to use them. if the backup use catalog, that would be different story, because the catalog will record all the archivelogs handles and should be able to use them.

4. restore database using this scn:

run
{
allocate channel ch10 type 'SBT_TAPE' SEND 'NB_ORA_CLIENT=$client,NB_ORA_POLICY=$policy,NB_ORA_SERV=$netbackupserver';
...
restore database until scn 178324361 ;
}

5. verify the scn in datafile header: they should be all smaller than 178324361
SQL> select file#,FUZZY,CHECKPOINT_CHANGE# from V$DATAFILE_HEADER order by 3;

     FILE# FUZ CHECKPOINT_CHANGE#
---------- --- ------------------
         2 NO           178137599
         5 NO           178137622
         6 NO           178137658
         7 NO           178323950
         8 NO           178323951
         3 NO           178324160
         9 NO           178324164
        10 NO           178324197
         1 NO           178324228
         4 NO           178324271

10 rows selected.

6. recover database
run
{
allocate channel ch10 type 'SBT_TAPE' SEND 'NB_ORA_CLIENT=$client,NB_ORA_POLICY=$policy,NB_ORA_SERV=$netbackupserver';
...
recover database until scn 178324361 ;
}

7. open database
SQL> alter database open resetlogs;

8. verify database:

SQL> select scn_to_timestamp(178324361) from dual;

SCN_TO_TIMESTAMP(178324361)
---------------------------------------------------------------------------
29-NOV-18 11.53.08.000000000 AM
...

The key is to find out what archivelogs are available and known by the controlfile, choose scn number wisely, otherwise the restore database can have datafile header bigger than scn available in the known archivelogs, ends up the database cannot be recovered and open resetlogs.
the error will be:

unable to find archived log
archived log thread=1 sequence=1837
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 11/29/2018 16:08:20

RMAN-06054: media recovery requesting unknown archived log for thread 1 with sequence 1837 and starting SCN of 17xxxxxxx ( a bigger scn)

Comments

Popular posts from this blog

Opatch apply/lsinventory error: oneoff is corrupted or does not exist

I am applying the quarterly patch for 19c RDBMS, I tried using napply but failed, but somehow it corrupted the inventory though nothing applied. further apply and lsinventory command ran into error like this: $ ./OPatch/opatch lsinventory Oracle Interim Patch Installer version 12.2.0.1.21 Copyright (c) 2020, Oracle Corporation.  All rights reserved. Oracle Home       : /u02/app/oracle/19.0.0 Central Inventory : /u01/app/oraInventory    from           : /u02/app/oracle/19.0.0/oraInst.loc OPatch version    : 12.2.0.1.21 OUI version       : 12.2.0.7.0 Log file location : /u02/app/oracle/19.0.0/cfgtoollogs/opatch/opatch2020-09-08_13-35-59PM_1.log Lsinventory Output file location : /u02/app/oracle/19.0.0/cfgtoollogs/opatch/lsinv/lsinventory2020-09-08_13-35-59PM.txt -------------------------------------------------------------------------------- Inventory load failed... OPatch cannot load inventory for the given Oracle Home. LsInventorySession failed: Unable to create patchObject Possible ca

non-existent process lock port on windows server

I have a database link created between oracle and sqlserver using oracle tg4odbc, the product is installed on windows server and run as service "OracleOraGtw11g_home1TNSListener", but sometime the service cannot started, the root cause of this problem is that the port number 1521 is used by an non-existent process. The first step is to use netstat -bano|find "1521" to get the process id, in my case it's 5844, which shows the connection is from my oracle server 10.8.0.169 H:\>netstat -bano|find "1521"   TCP    0.0.0.0:1521           0.0.0.0:0              LISTENING       5844   TCP    10.14.45.33:1521       10.8.0.169:42987       ESTABLISHED     5844 however the process id does not show in either task manager or process explorer. The next step is to run tcpview, which shows non-existent under process column, there are three rows, two show status as "listening", the other one shows status "established", right click and k

shell script to clean up oracle dumpfile

https://github.com/iacosta/personal/blob/master/shells/cleanhouse.sh #!/bin/ksh # # Script used to cleanup any Oracle environment. # # Cleans:      audit_file_dest #              background_dump_dest #              core_dump_dest #              user_dump_dest #              Clusterware logs # # Rotates:     Alert Logs #              Listener Logs # # Scheduling:  00 00 * * * /networkdrive/dba/scripts/purge/cleanup.sh -d 7 > /u01/dba/bin/cleanup.log 2>&1 # # Created By:  Lei Dao # # # RM="rm -f" RMDIR="rm -rf" LS="ls -l" MV="mv" TOUCH="touch" TESTTOUCH="echo touch" TESTMV="echo mv" TESTRM=$LS TESTRMDIR=$LS SUCCESS=0 FAILURE=1 TEST=0 HOSTNAME=`hostname` ORAENV="oraenv" TODAY=`date +%Y%m%d` ORIGPATH=/usr/local/bin:$PATH ORIGLD=$LD_LIBRARY_PATH export PATH=$ORIGPATH # Usage function. f_usage(){   echo "Usage: `basename $0` -d DAYS [-a DAYS] [-b DAYS] [