Channel: problem – Neil Chandler's DBA Blog

Problems with RMAN and incarnations.


One day, not so very long ago, I was at a client site looking through the “passive” half of an AIX HACMP clustered server to tidy it up a little, as we were running short of disk space. There was a test database on there with a very large number of historic archive logs. I thought it would be a good idea to check the database backups in RMAN and maybe do some tidying up through that route. This, it turned out, was not the most sensible thing I have done.

The test database was a straight binary copy of the Production database. It had received no subsequent changes, not even the most important one from an RMAN perspective: a new Database ID (DBID). Without any warning, RMAN assumed that this database, with its more recent resetlogs and matching DBID, was a new incarnation of Production and promptly amended the catalog to that effect. Let’s just see that in action:

[oracle]$ export ORACLE_SID=PROD
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:01:41 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: PROD (DBID=1099918981)
connected to recovery catalog database

RMAN> list incarnation;

List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       8       PROD     1099918981       PARENT  1          30-JUN-05
1       2       PROD     1099918981       CURRENT 446075     04-APR-10

RMAN> list backup summary;

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        FULL BACKUP
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

RMAN> exit

Recovery Manager complete.

[oracle]$ export ORACLE_SID=TEST
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:23 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: TEST (DBID=1099918981)
connected to recovery catalog database

RMAN> list backup summary;

new incarnation of database registered in recovery catalog
starting full resync of recovery catalog
full resync complete

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        FULL BACKUP
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

RMAN> list incarnation;

List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       8       PROD     1099918981       PARENT  1          30-JUN-05
1       2       TEST     1099918981       PARENT  446075     04-APR-10
1       1789    TEST     1099918981       CURRENT 3023938    27-FEB-11

RMAN> exit

Recovery Manager complete.
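
A check along the following lines, run before pointing a copied database at a shared catalog, might have flagged the clash. It is only a sketch: `dbid_clash` is a made-up helper name, and the commented query assumes the `RC_DATABASE` catalog view and the `rman/rman@rman_db` login used in this post.

```shell
#!/bin/sh
# dbid_clash NAME DBID
# Reads catalog rows ("NAME DBID", one per line) on stdin and prints any
# database registered with the same DBID under a different name.
# Exit status 0 means a clash was found, 1 means none.
dbid_clash() {
  awk -v name="$1" -v dbid="$2" '
    $2 == dbid && $1 != name { print $1; found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Feeding it from the catalog might look like this (not run here):
#   sqlplus -s rman/rman@rman_db <<'EOF' | dbid_clash TEST 1099918981
#   set heading off feedback off pagesize 0
#   select name, dbid from rc_database;
#   EOF
```

If a clash is reported, connecting with `rman target / nocatalog` keeps the session away from the shared catalog altogether.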

And let's see what happens when we go into RMAN for Production:

[oracle]$ export ORACLE_SID=PROD
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:59 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: PROD (DBID=1099918981)
connected to recovery catalog database

RMAN> list backup summary;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of list command at 02/27/2011 21:03:04
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog

RMAN> list incarnation;
RMAN> exit

Recovery Manager complete.

So where does that leave me? With the current Production data warehouse unable to use RMAN, because the catalog no longer treats its incarnation as current. One quick look at the clock, and you know what time it is: 30 minutes to the start of a tight backup window, in which the backup will fail. Inevitably, this sort of thing never happens with 8 hours of free time to work out the best way to resolve the problem, only with scant time to sort it out without impacting the Production system. After some thought, and some Google, it appeared that the only solution was to hack (or rather, manually edit) the RMAN catalog to remove the new incarnation.

EDIT! Before trying the catalog modification below, you should look at My Oracle Support document 412113.1 and check out these RMAN commands:

RMAN> list incarnation;
RMAN> reset database to incarnation <dbinc_key>;    
RMAN> resync catalog;
RMAN> list incarnation;

OK. Proceed at your own risk!
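
Before editing catalog tables by hand, it may be worth keeping a copy of the rows that are about to change. The sketch below only prints SQL. The `*_before_fix` table names are invented for illustration, while `db` and `dbinc` are the catalog base tables touched in the session that follows.

```shell
#!/bin/sh
# Print SQL that copies the two catalog tables touched by the manual fix.
# The *_before_fix table names are hypothetical; pick anything unused.
catalog_snapshot_sql() {
  cat <<'EOF'
create table db_before_fix as select * from db;
create table dbinc_before_fix as select * from dbinc;
exit
EOF
}

# Review the output first, then pipe it to SQL*Plus as the catalog owner:
#   catalog_snapshot_sql | sqlplus -s rman/rman@rman_db
```

A full export of the catalog schema would be the heavier and safer alternative if time allows.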

To remove the bad incarnation record from the recovery catalog:

[oracle]$ sqlplus rman/rman@rman_db
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;

    DB_KEY       DBID  DBINC_KEY NAME     RESETLOGS_CHANGE# RESETLOGS CUR PARENT_DBINC_KEY PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS
---------- ---------- ---------- -------- ----------------- --------- --- ---------------- ----------------------- --------- --------
         1 1099918981          8 PROD                     1 30-JUN-05 NO                                                     PARENT
         1 1099918981          2 PROD                446075 04-APR-10 NO                 8                       1 30-JUN-05 PARENT
         1 1099918981       1789 TEST               3023938 27-FEB-11 YES                2                  446075 04-APR-10 CURRENT

RMAN @ RMAN_DB > select * from db;

    DB_KEY      DB_ID HIGH_CONF_RECID LAST_KCCDIVTS HIGH_IC_RECID CURR_DBINC_KEY
---------- ---------- --------------- ------------- ------------- --------------
         1 1099918981                     744219186             2           1789

RMAN @ RMAN_DB > update db set curr_dbinc_key = 2 where db_key = 1;
1 row updated.
RMAN @ RMAN_DB > delete from dbinc where dbinc_key = 1789;
1 row deleted.
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;

    DB_KEY       DBID  DBINC_KEY NAME     RESETLOGS_CHANGE# RESETLOGS CUR PARENT_DBINC_KEY PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS
---------- ---------- ---------- -------- ----------------- --------- --- ---------------- ----------------------- --------- --------
         1 1099918981          8 PROD                     1 30-JUN-05 NO                                                     PARENT
         1 1099918981          2 PROD                446075 04-APR-10 YES                8                       1 30-JUN-05 PARENT

RMAN @ RMAN_DB > commit;

Commit complete.

And let's see if we can use RMAN again...

[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Mon Mar 21 22:08:45 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: PROD (DBID=1099918981)
connected to recovery catalog database

RMAN> list backup summary;

starting full resync of recovery catalog
full resync complete

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        TAG20110227T144639
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

And so we are just about back where we started before some idiot messed up the RMAN catalog, and the backups work just fine. Now we need to change the DBID of the TEST database using the nid command, before another DBA does the same thing.
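
For reference, the nid (DBNEWID) step could be sketched roughly as below. The `change_dbid` function name is made up, OS authentication (`/ as sysdba`, `target=/`) is an assumption about the environment, and none of this was part of the original session.

```shell
#!/bin/sh
# Sketch: give a cloned database its own DBID with DBNEWID (nid).
# Assumes OS authentication works for the given SID. The body runs in a
# subshell so ORACLE_SID does not leak into the calling shell.
change_dbid() (
  ORACLE_SID="$1"; export ORACLE_SID

  # nid needs the database mounted but not open.
  sqlplus -s "/ as sysdba" <<'EOF'
shutdown immediate
startup mount
exit
EOF

  # nid asks for confirmation, rewrites the DBID in the control files and
  # datafile headers, then shuts the instance down. Stop if it fails.
  nid target=/ || exit 1

  # A changed DBID means the database has to be opened with RESETLOGS.
  sqlplus -s "/ as sysdba" <<'EOF'
startup mount
alter database open resetlogs;
exit
EOF
)

# Usage (not run here): change_dbid TEST
```

Backups and archive logs taken before the change no longer belong to the renumbered database, so a fresh backup of TEST afterwards would be a sensible follow-up.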

The last thing to do was to ensure that the recovery worked too.
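
One lightweight way to check this, as a sketch rather than a record of what was actually run, is to ask RMAN to validate a restore: it reads the backup pieces a restore would need without writing any datafiles. The `validate_restore` name is hypothetical, and the connection string is the one used throughout this post.

```shell
#!/bin/sh
# Sketch: have RMAN read the backups needed to restore the database,
# without restoring anything. Runs in a subshell to keep ORACLE_SID local.
validate_restore() (
  ORACLE_SID="$1"; export ORACLE_SID
  rman target / catalog rman/rman@rman_db <<'EOF'
restore database validate;
exit
EOF
)

# Usage (not run here): validate_restore PROD
```

A real test restore onto a scratch server remains the stronger proof, but this at least confirms that the catalog and the backup pieces agree again.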

NOTE: 11G Update to this blog entry


Filed under: Backups, RMAN Tagged: 10G, 11G, Incarnation, oracle, problem, RMAN, RMAN-06004, RMAN-20011
