Today an application team had a problem with a production system and they wanted the database restored to a point a couple of days ago. It is a configuration management system and they figured they could lose any changes . Their call.
I did all the good things
- Checked we had a change in place
- Blacked out primary and standby in OEM
- Checked flashback was on –
select flashback_on from v$database;
- Checked how far we could flashback to
set lines 120 col oldest_flashback_scn for 99999999999999999999 col oldest_flashback_time for a30 select oldest_flashback_scn, to_char(oldest_flashback_time, 'DD-MM-YYYY HH24:MI:SS') from v$flashback_database_log;
- Checked that standby had flashback on.
I stopped managed recovery on standby, shutdown that database, stopped primary and mounted and flashed back to my desired time. Everything was fine.
I went to standby, mounted it and ran
flashback standby database to timestamp to_timestamp ('19/09/2016 12:00:00','dd/mm/yyyy hh24:mi:ss');
ERROR at line 1: ORA-38729: Not enough flashback database log data to do FLASHBACK.
That stumped me. Obviously I knew what the message meant and I knew I had not run the script to see how far we could flashback to on standby but why did we have a problem?
Flashback retention was set to 1440 – 24 hours on both primary and standby but there had been no space pressure on primary so it had retained sufficient flashback logs to go back 6 days if required
It did not take a lot of digging around to find the explanation. Normally we have one database per server (physical or virtual) in production. We then map that to a dedicated standby server. In this case we had 3 databases running. Due to performance considerations we run 2 primaries and a standby on one side and 2 standbys and a primary on the other. This database was not the key/busiest database and so the standby was sharing with another standby and the primary for the biggest database. That had generated a lot of logfiles which had caused space to be consumed and the flashback logs had been tidied to meet the 1440 minute parameter setting.
A rebuild of standby is under way currently.
I always try to learn lessons from incidents like this. In this case there were two unusual requirements – flashback production and go back 48 hours.
My lessons learned
Use db_recovery_file_dest_size for each database where we have more than one and set them appropriately to the size of each database.
Could we have some sort of automated routine that creates a guaranteed restore point every hour and then drops it 48 hours later. That needs thinking about as we could risk running out of space.