Oracle, Linux, stuff: What to do when you forgot about cleaning the fast recovery area on your standby

We assume, from the scratch, we've got a standalone standby database over an asm instance. If the total amount of archived logs, came from primary site, meets the capacity of the FRA, sometimes this could lead to very big issues.
Test the level of emptiness on FRA:

col name format a30;
SELECT
NAME,
TO_CHAR(SPACE_LIMIT, '999,999,999,999') AS SPACE_LIMIT,
TO_CHAR(SPACE_LIMIT - SPACE_USED + SPACE_RECLAIMABLE,
'999,999,999,999') AS SPACE_AVAILABLE,
ROUND((SPACE_USED - SPACE_RECLAIMABLE)/SPACE_LIMIT * 100, 1)
AS PERCENT_FULL
FROM V$RECOVERY_FILE_DEST;

If the percent_full column indicates
Normally, you can connect with rman and delete some archivelogs by issuing:

rman target /
delete archivelog until time 'sysdate-30';

where "30" is the number of days you want to remain safe by preserving the archived logs.

Well, let's say the operation above will not succeed and the result is a a frozen prompt. Believe me, it's a very posible situation.
What next? The last resort seems to be the access of archived logs through the file system. If this is the ASM Oracle proprietary one, you have to go under the grid user and run asmcmd utility:

asmcmd
ASMCMD> ls
CRS/
DATA/
ASMCMD>cd data
.....
and so on, until you reach the archivelogs folder. Well, now the last ls could end, also, in a frozen blinking prompt, waiting undefinitely to see the content of the archivelogs folder. Again, trust me, sounds very likely, don't say "this won't happen to me".
Next step could not be else than restarting the machine. The both ways that can be used to delete archivelogs are shut.
After reboot, 99% probably the cluster services are not ok. Probably te cssd daemon is down. This means you cannot even mount the database, no way to open it. Of course, you cannot open neither the ASM instance, through the traditional sqlplus way.
A solution can be to verify and start the crs services and asm instance. This can be done in one tempo.

srvctl status asm
srvctl stop asm
srvctl start asm
sqlplus / as sysasm
startup

then:

su - oracle
sqlplus / as sysdba
startup
exit
rman target /
delete archivelog until time 'sysdate-30';

if, after:

tail -f -n 50 $ORACLE_BASE/..../alert_$ORACLE_SID.log

you will notice that the gap is not recovered yet and the transport is still not established, probably the listener is the cause. If, after lsnrctl start command, the problem persist, maybe the database is not registered, this is a known issue, when the standby was made by grid control wizard. So:

sqlplus / as sysdba
alter system register;
exit

and if the problem persists, cool down and send a message to me.

Oracle, Linux, stuff

miercuri, 12 martie 2014

What to do when you forgot about cleaning the fast recovery area on your standby

Niciun comentariu:

Trimiteți un comentariu