Skip to main content

RAC database restore from netbackup tape without RMAN catalog

The Oracle RAC databases were being backed up regularly without using catalog, even worse, the database controlfile autobackup was turned off.  That configuration compromise the RPO. I restored and recovered the database to the latest scn that the last controlfile backup knows, but not the latest scn that available in the archivelogs. Below are the steps:

The assumption is that all database files, controlfile are gone.

1. list available backup handles in netbackup after a particular date:

/usr/openv/netbackup/bin/bplist -l -S $netbackupserver -C $client -t 4 -R -s 11/29/2018 /
output:
...
-rw-rw---- oracle    asmadmin     19922944 Nov 29 11:55 /ctrl_dAPIMGRS_u16tje9lj_s1062_p1_t993470131
-rw-rw---- oracle    asmadmin    142082048 Nov 29 11:54 /arch_dAPIMGRS_u15tje9kd_s1061_p1_t993470093
...

From the above output, we guess the latest controlfile backup is in /ctrl_dAPIMGRS_u16tje9lj_s1062_p1_t993470131

2.  restore controlfile from this handle:

shutdown immediate
start nomount
run
{
allocate channel ch10 type 'SBT_TAPE' SEND 'NB_ORA_CLIENT=$client,NB_ORA_POLICY=$policy,NB_ORA_SERV=$netbackupserver';
restore controlfile from 'ctrl_dAPIMGRS_u16tje9lj_s1062_p1_t993470131';
}

alter database mount

3.  Find the scn that should be used for restore and recover:

RMAN> list backup of archivelog time between 'sysdate-2' and 'sysdate';


List of Backup Sets
===================


BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -----------------------------
1058    135.50M    SBT_TAPE    00:00:16     29-november -2018 11:55:09 am
        BP Key: 1058   Status: AVAILABLE  Compressed: NO  Tag: TAG20181129T115453
        Handle: arch_dAPIMGRS_u15tje9kd_s1061_p1_t993470093   Media: @aaaac

  List of Archived Logs in backup set 1058
  Thrd Seq     Low SCN    Low Time                      Next SCN   Next Time
  ---- ------- ---------- ----------------------------- ---------- ---------
  1    1855    178307993  29-november -2018 09:00:40 am 178324870  29-november -2018 11:54:52 am
  1    1856    178324870  29-november -2018 11:54:52 am 178324888  29-november -2018 11:54:53 am
  2    2803    178305976  29-november -2018 08:40:47 am 178324361  29-november -2018 11:53:11 am
  3    2759    178302578  29-november -2018 08:00:35 am 178324772  29-november -2018 11:53:46 am

find the max next_scn of the three RAC threads, then choose the smallest of the threee, which should be 178324772, but I accidentally chose 178324361 in this test, which is acceptable for test database, minor data lose, better than cannot recovering database, which error RMAN-06054 is shown at the end of this post.

Though there might be more later archivelogs backup shown in bplist in step 1, however, they are taken after this scn, so controlfile will not be able to use them. if the backup use catalog, that would be different story, because the catalog will record all the archivelogs handles and should be able to use them.

4. restore database using this scn:

run
{
allocate channel ch10 type 'SBT_TAPE' SEND 'NB_ORA_CLIENT=$client,NB_ORA_POLICY=$policy,NB_ORA_SERV=$netbackupserver';
...
restore database until scn 178324361 ;
}

5. verify the scn in datafile header: they should be all smaller than 178324361
SQL> select file#,FUZZY,CHECKPOINT_CHANGE# from V$DATAFILE_HEADER order by 3;

     FILE# FUZ CHECKPOINT_CHANGE#
---------- --- ------------------
         2 NO           178137599
         5 NO           178137622
         6 NO           178137658
         7 NO           178323950
         8 NO           178323951
         3 NO           178324160
         9 NO           178324164
        10 NO           178324197
         1 NO           178324228
         4 NO           178324271

10 rows selected.

6. recover database
run
{
allocate channel ch10 type 'SBT_TAPE' SEND 'NB_ORA_CLIENT=$client,NB_ORA_POLICY=$policy,NB_ORA_SERV=$netbackupserver';
...
recover database until scn 178324361 ;
}

7. open database
SQL> alter database open resetlogs;

8. verify database:

SQL> select scn_to_timestamp(178324361) from dual;

SCN_TO_TIMESTAMP(178324361)
---------------------------------------------------------------------------
29-NOV-18 11.53.08.000000000 AM
...

The key is to find out what archivelogs are available and known by the controlfile, choose scn number wisely, otherwise the restore database can have datafile header bigger than scn available in the known archivelogs, ends up the database cannot be recovered and open resetlogs.
the error will be:

unable to find archived log
archived log thread=1 sequence=1837
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 11/29/2018 16:08:20

RMAN-06054: media recovery requesting unknown archived log for thread 1 with sequence 1837 and starting SCN of 17xxxxxxx ( a bigger scn)

Comments

Popular posts from this blog

Opatch apply/lsinventory error: oneoff is corrupted or does not exist

I am applying the quarterly patch for 19c RDBMS, I tried using napply but failed, but somehow it corrupted the inventory though nothing applied. further apply and lsinventory command ran into error like this: $ ./OPatch/opatch lsinventory Oracle Interim Patch Installer version 12.2.0.1.21 Copyright (c) 2020, Oracle Corporation.  All rights reserved. Oracle Home       : /u02/app/oracle/19.0.0 Central Inventory : /u01/app/oraInventory    from           : /u02/app/oracle/19.0.0/oraInst.loc OPatch version    : 12.2.0.1.21 OUI version       : 12.2.0.7.0 Log file location : /u02/app/oracle/19.0.0/cfgtoollogs/opatch/opatch2020-09-08_13-35-59PM_1.log Lsinventory Output file location : /u02/app/oracle/19.0.0/cfgtoollogs/opatch/lsinv/lsinventory2020-09-08_13-35-59PM.txt -------------------------------------------------------------------------------- Inventory load failed... OPatch cannot load inventory ...

oracle dba_hist_sysmetric_summary

found this blog is helpful to get CPU and IO statistics on oracle database. http://shob-dbadmin.blogspot.ca/2012/12/how-to-find-total-io-of-database.html courtesy to  Shomil Bansal , below are hist writing, not mine. How to find total IO of the database instance Total IO of database instance is sum of the physical reads, physical writes and redo writes. There are several views to find these values. v$sysmetric  - Reports metric values for only the most current time sample 60 secs. v$sysmetric_summary  - Reports metric values for time sample of 1 hour. v$sysmetric_history  - Reports metric values every 60 sec from the time instance is up. Better way to analyse IO using this view to take deltas between two time periods. dba_hist_sysmetric_history  - All the above views are refreshed when the instance is restarted. This view, part of AWR, stores the historical stats. I have used this view for my report. Query: ====== set lines 350...

ORA_RMAN_SGA_TARGET

assume that we lost all the files of oracle database but we do have rman backup, when trying to bring up a dummy database before restore start, I get this error. RMAN> startup nomount force; WARNING: cannot translate ORA_RMAN_SGA_TARGET value startup failed: ORA-01078: failure in processing system parameters ORA-01565: error in identifying file '+DATA/PROD/spfilePROD.ora' ORA-17503: ksfdopn:2 Failed to open file +DATA/PROD/spfilePROD.ora ORA-15056: additional error message ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +DATA/prod/spfileprod.ora ORA-17503: ksfdopn:2 Failed to open file +DATA/prod/spfileprod.ora ORA-15173: entry 'spfileprod.ora' does not exist in directory 'prod' ORA-06512: at line 4 starting Oracle instance without parameter file for retrival of spfile RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =================================...