One day, not so very long ago, I was at a client site looking through the “passive” half of an AIX HACMP clustered server to tidy it up a little as we were experiencing pressure on space. There was a test database on there with a very large amount of historic archive logs. I thought it would be a good idea to check the database backups in RMAN and maybe do some tidying up through that route. This, it turned out, was not the most sensible thing I have done. The test database was a straight binary copy of the Production database. It had received no subsequent changes, especially the most important one from an RMAN perspective: the Database ID. Without a warning, RMAN immediately assumed that this database, with its more recent resetlogs and matching ID, was a new Incarnation of Production and promptly amended the catalog to that effect. Let’s just see that in action:
[oracle]$ export ORACLE_SID=PROD [oracle]$ rman target / catalog rman/rman@rman_db Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:01:41 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. connected to target database: PROD (DBID=1099918981) connected to recovery catalog database RMAN> list incarnation; List of Database Incarnations DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time ------- ------- -------- ---------------- --- ---------- ---------- 1 8 PROD 1099918981 PARENT 1 30-JUN-05 1 2 PROD 1099918981 CURRENT 446075 04-APR-10 RMAN> list backup summary; List of Backups =============== Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag ------- -- -- - ----------- --------------- ------- ------- ---------- --- 1737 B F A DISK 27-FEB-11 1 1 YES FULL BACKUP 1752 B F A DISK 27-FEB-11 1 1 NO TAG20110227T144855 RMAN> exit Recovery Manager complete. [oracle]$ export ORACLE_SID=TEST [oracle]$ rman target / catalog rman/rman@rman_db Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:23 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. connected to target database: TEST (DBID=1099918981) connected to recovery catalog database RMAN> list backup summary; new incarnation of database registered in recovery catalog starting full resync of recovery catalog full resync complete List of Backups =============== Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag ------- -- -- - ----------- --------------- ------- ------- ---------- --- 1737 B F A DISK 27-FEB-11 1 1 YES FULL BACKUP 1752 B F A DISK 27-FEB-11 1 1 NO TAG20110227T144855 RMAN> list incarnation; List of Database Incarnations DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time ------- ------- -------- ---------------- --- ---------- ---------- 1 8 PROD 1099918981 PARENT 1 30-JUN-05 1 2 TEST 1099918981 PARENT 446075 04-APR-10 1 1789 TEST 1099918981 CURRENT 3023938 27-FEB-11 RMAN> exit Recovery Manager complete. And lets see what happens when we go into RMAN for Production [oracle]$ export ORACLE_SID=PROD [oracle]$ rman target / catalog rman/rman@rman_db Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:59 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. connected to target database: PROD (DBID=1099918981) connected to recovery catalog database RMAN> list backup summary; RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of list command at 02/27/2011 21:03:04 RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog RMAN> list incarnation; RMAN> exit Recovery Manager complete.
So where does that leave me? With the current production datawarehouse unable to access RMAN as it’s not the right Incarnation. One quick look at the clock, and you know what time it is. 30 minutes to the start of a tight backup window, which will fail. It’s inevitable that this sort of thing never happens with 8 hours of free time to work out the best way to resolve the problem, but with scant time to sort it out with no impact on the Production system. After some thought, and some Google, it became apparent that the only solution was to hack manually edit the RMAN catalog to remove the new incarnation.
EDIT! Before trying the catalog mod below you should look at the My Oracle Support document 412113.1, and check out the rman commands:
RMAN> list incarnation; RMAN> reset database to incarnation <dbinc_key>; RMAN> resync catalog; RMAN> list incarnation;
OK. Proceed at your own risk!
To remove the bad incarnation record from the recovery catalog:
[oracle]$ sqlplus rman/rman@rman_db
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;
DB_KEY DBID DBINC_KEY NAME RESETLOGS_CHANGE# RESETLOGS CUR PARENT_DB INC_KEY PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS ---------- ---------- ---------- -------- ----------------- --------- --- ---------------- ----------------------- --------- -------- 1 1099918981 8 PROD 1 30-JUN-05 NO PARENT 1 1099918981 2 PROD 446075 04-APR-10 NO 8 1 30-JUN-05 PARENT 1 1099918981 1789 TEST 3023938 27-FEB-11 YES 2 446075 04-APR-10 CURRENT
RMAN @ RMAN_DB > select * from db;
DB_KEY DB_ID HIGH_CONF_RECID LAST_KCCDIVTS HIGH_IC_RECID CURR_DBINC_KEY ---------- ---------- --------------- ------------- ------------- -------------- 1 1099918981 744219186 2 1789
RMAN @ RMAN_DB > update db set curr_dbinc_key = 2;
1 row updated.
RMAN @ RMAN_DB > delete from dbinc where dbinc_key = 1789;
1 row deleted.
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;
DB_KEY DBID DBINC_KEY NAME RESETLOGS_CHANGE# RESETLOGS CUR PARENT_DB INC_KEY PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS ---------- ---------- ---------- -------- ----------------- --------- --- ---------------- ----------------------- --------- -------- 1 1099918981 8 PROD 1 30-JUN-05 NO PARENT 1 1099918981 2 PROD 446075 04-APR-10 YES 8 1 30-JUN-05 PARENT
RMAN @ RMAN_DB > commit;
Commit complete.
And let's see if we can use RMAN again...
[oracle]$ rman target / catalog rman/rman@rman_db
Recovery Manager: Release 10.2.0.1.0 - Production on Mon Mar 21 22:08:45 2011
Copyright (c) 1982, 2005, Oracle. All rights reserved.
connected to target database: PROD (DBID=1099918981) connected to recovery catalog database
RMAN> list backup summary;
starting full resync of recovery catalog full resync complete
List of Backups =============== Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag ------- -- -- - ----------- --------------- ------- ------- ---------- --- 1737 B F A DISK 27-FEB-11 1 1 YES TAG20110227T144639 1752 B F A DISK 27-FEB-11 1 1 NO TAG20110227T144855
And so, we are just about back where we started before some idiot messed up the RMAN catalog, and the backups work just fine. Now we need to change the dbid on the TEST database, using the nid command before another DBA does the same thing.
The last thing to do was to ensure that the recovery worked too.
NOTE: 11G Update to this blog entry
Filed under: Backups, RMAN Tagged: 10G, 11G, Incarnation, oracle, problem, RMAN, RMAN-06004, RMAN-20011
