Real-World DBA Implementation Scenarios
10 end-to-end, hands-on scenarios covering Memory Tuning, Performance Optimization, Security Hardening, High Availability, Multitenant Architecture, Storage Management, and Backup & Recovery.
Scenario 01 — AMM Setup

Business Problem
A financial services company migrated its core banking application to Oracle 19c on a new server with 32 GB RAM. The DBA team manually set SGA and PGA values but observed frequent ORA-04031 errors during peak hours. Memory was not being redistributed efficiently between SGA and PGA. The decision was made to implement Automatic Memory Management (AMM) to let Oracle self-tune memory allocation.
-- Step 1: Baseline check
SELECT name, value, description
FROM v$parameter
WHERE name IN ('memory_target','memory_max_target',
'sga_target','pga_aggregate_target');
-- Step 2: Disable manual SGA/PGA settings first
ALTER SYSTEM SET SGA_TARGET=0 SCOPE=SPFILE;
ALTER SYSTEM SET PGA_AGGREGATE_TARGET=0 SCOPE=SPFILE;
-- Step 3: Set AMM parameters
ALTER SYSTEM SET MEMORY_MAX_TARGET=28G SCOPE=SPFILE;
ALTER SYSTEM SET MEMORY_TARGET=24G SCOPE=SPFILE;
-- Step 4: Restart database
SHUTDOWN IMMEDIATE;
STARTUP;
-- Step 5: Verify AMM is active
SELECT component, current_size/1024/1024 AS size_mb,
min_size/1024/1024 AS min_mb
FROM v$memory_dynamic_components
ORDER BY current_size DESC;
AMM requires /dev/shm (tmpfs) to be sized >= MEMORY_TARGET on Linux; check with df -h /dev/shm. Note that AMM is incompatible with HugePages on Linux. Ensure MEMORY_MAX_TARGET >= MEMORY_TARGET, or the instance will fail to start.
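The tmpfs prerequisite can be pre-checked from a script before the restart. A minimal Python sketch, assuming the usual `df -h` data-line format (the function name and parsing are illustrative, not part of any Oracle tooling):

```python
import re

def shm_fits_memory_target(df_line: str, memory_target_gb: float) -> bool:
    """Check whether the tmpfs size in a `df -h /dev/shm` data line
    is at least MEMORY_TARGET. Expects a size token like '28G' or '512M'."""
    size_token = df_line.split()[1]          # e.g. '28G'
    match = re.fullmatch(r"([\d.]+)([MG])", size_token)
    if not match:
        raise ValueError(f"unexpected size token: {size_token}")
    value, unit = float(match.group(1)), match.group(2)
    size_gb = value if unit == "G" else value / 1024
    return size_gb >= memory_target_gb

# Example df data line for a 28 GB tmpfs vs MEMORY_TARGET=24G:
line = "tmpfs            28G   11G   17G  40% /dev/shm"
print(shm_fits_memory_target(line, 24))   # True
```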
Outcome
| Metric | Before | After |
|---|---|---|
| ORA-04031 errors | 15–20 per peak hour | Zero after AMM enabled |
| Manual tuning effort | Weekly SGA/PGA review | None — Oracle self-tunes |
| Peak PGA usage | Capped, causing disk sorts | Dynamically expanded |
| Memory waste | 30% SGA unused at night | Oracle reclaims automatically |
Scenario 02 — Cursor Sharing

Business Problem
An ERP application generates SQL with literal values rather than bind variables, causing massive hard parsing. The shared pool fills with single-use cursors. CPU on the DB server is at 95%, with library cache: mutex X as the top wait event. The application code cannot be changed immediately.
-- Identify hard parse rate
SELECT name, value FROM v$sysstat
WHERE name IN ('parse count (total)','parse count (hard)');
-- Apply cursor sharing (dynamic, no restart)
ALTER SYSTEM SET CURSOR_SHARING=FORCE SCOPE=BOTH;
-- Increase shared pool minimum
ALTER SYSTEM SET SHARED_POOL_SIZE=512M SCOPE=SPFILE;
-- Increase session cursor cache
ALTER SYSTEM SET SESSION_CACHED_CURSORS=200 SCOPE=BOTH;
-- Increase PGA for sort operations
ALTER SYSTEM SET PGA_AGGREGATE_TARGET=4G SCOPE=BOTH;
-- Flush shared pool (maintenance window only!)
ALTER SYSTEM FLUSH SHARED_POOL;
Warning: CURSOR_SHARING=FORCE may affect execution plans in rare cases where different literal values have very different selectivities. Monitor AWR Top SQL after enabling. The preferred long-term fix is application-level bind variables.
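The effect of CURSOR_SHARING=FORCE can be illustrated outside the database. The Python sketch below mimics the idea of the rewrite (Oracle's real implementation differs): literals become system binds named :SYS_B_n, so statements that differ only in literal values normalize to identical text and share one cursor:

```python
import re

# One pass over both literal kinds so binds are numbered in order of appearance.
LITERAL = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")

def normalize_literals(sql: str) -> str:
    """Replace string and numeric literals with :SYS_B_n placeholders,
    mimicking what CURSOR_SHARING=FORCE does before cursor matching."""
    counter = -1
    def bind(_match):
        nonlocal counter
        counter += 1
        return f":SYS_B_{counter}"
    return LITERAL.sub(bind, sql)

a = normalize_literals("SELECT * FROM orders WHERE id = 42 AND status = 'OPEN'")
b = normalize_literals("SELECT * FROM orders WHERE id = 99 AND status = 'HOLD'")
print(a)        # SELECT * FROM orders WHERE id = :SYS_B_0 AND status = :SYS_B_1
print(a == b)   # True -> one shared cursor instead of two hard parses
```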
Outcome
| Metric | Before | After |
|---|---|---|
| Hard parses/second | 8,500+ | < 50 (99% reduction) |
| DB CPU utilisation | 95% peak | 42% peak |
| Library cache mutex waits | #1 wait event | < 0.1% of waits |
| Shared pool free memory | < 5 MB | 180+ MB sustained |
Scenario 03 — RMAN / FRA

Business Problem
A retail company had no structured backup policy. RMAN backups were ad hoc scripts to local disk with no retention policy. Archived logs were filling the filesystem randomly. The DBA was asked to implement a proper FRA-based backup strategy meeting RPO 1 hour and RTO 4 hours.
SQL/RMAN — FRA & Backup Setup
-- Enable ARCHIVELOG mode
SHUTDOWN IMMEDIATE; STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG; ALTER DATABASE OPEN;
-- Configure FRA
ALTER SYSTEM SET DB_RECOVERY_FILE_DEST='/fra' SCOPE=BOTH;
ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE=500G SCOPE=BOTH;
ALTER SYSTEM SET ARCHIVE_LAG_TARGET=3600 SCOPE=BOTH;
-- RMAN: Configure retention policy
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;
CONFIGURE BACKUP OPTIMIZATION ON;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
-- Weekly full backup (Sunday 22:00)
RUN {
ALLOCATE CHANNEL c1 DEVICE TYPE DISK;
BACKUP AS COMPRESSED BACKUPSET
INCREMENTAL LEVEL 0 DATABASE
FORMAT '/fra/backup/full_%d_%T_%s_%p.bkp'
TAG 'WEEKLY_FULL';
BACKUP ARCHIVELOG ALL DELETE INPUT
FORMAT '/fra/backup/arch_%d_%T_%s_%p.bkp';
RELEASE CHANNEL c1;
}
-- Verify backup integrity
RESTORE DATABASE VALIDATE;
RESTORE ARCHIVELOG ALL VALIDATE;
Best Practice: Always run RESTORE DATABASE VALIDATE after backup jobs to catch corrupt backups before they are needed. Set up alerting on V$RMAN_BACKUP_JOB_DETAILS for STATUS != 'COMPLETED'.
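A recovery window of 7 days is only satisfiable if a level-0 backup old enough to anchor it still exists. That check can be scripted; a small Python sketch (an illustrative helper, assuming archived logs are continuous from the oldest needed backup onward):

```python
from datetime import datetime, timedelta

def recovery_window_covered(level0_backups, now, window_days=7):
    """A recovery window of N days is satisfiable only if at least one
    level-0 backup exists at or before now - N days; archived logs then
    roll the database forward to any point inside the window."""
    point_of_recoverability = now - timedelta(days=window_days)
    return any(b <= point_of_recoverability for b in level0_backups)

now = datetime(2024, 3, 24, 22, 0)
weekly_fulls = [datetime(2024, 3, 10, 22, 0), datetime(2024, 3, 17, 22, 0)]
print(recovery_window_covered(weekly_fulls, now))  # True
```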
Outcome
| Metric | Before | After |
|---|---|---|
| Backup strategy | Ad hoc, undocumented | Documented, automated, tested |
| FRA monitoring | None | Alert at 80% usage |
| RPO achieved | Unknown (days) | < 1 hour (ARCHIVE_LAG_TARGET=3600) |
| RTO validated | Never tested | 2.5 hours (tested restore) |
Scenario 04 — Security

Business Problem
A payment processing company faced an upcoming PCI-DSS audit. Security review revealed: default passwords in use, listener accessible without password, dictionary accessible to all users, no audit trail, and network traffic unencrypted. A 30-day hardening plan was required.
SQL — Security Hardening Checklist
-- 1. Enforce case-sensitive passwords
ALTER SYSTEM SET SEC_CASE_SENSITIVE_LOGON=TRUE SCOPE=SPFILE;
-- 2. Limit failed login attempts
ALTER SYSTEM SET SEC_MAX_FAILED_LOGIN_ATTEMPTS=3 SCOPE=SPFILE;
-- 3. Lock dictionary from regular users
ALTER SYSTEM SET O7_DICTIONARY_ACCESSIBILITY=FALSE SCOPE=SPFILE;
-- 4. Create unified audit policy for DML on sensitive tables
CREATE AUDIT POLICY pci_data_access
ACTIONS SELECT ON payments.card_transactions,
INSERT ON payments.card_transactions,
UPDATE ON payments.card_transactions,
DELETE ON payments.card_transactions;
-- Enable it for users holding the sensitive roles
AUDIT POLICY pci_data_access
BY USERS WITH GRANTED ROLES dba, app_user;
-- 5. Audit privilege use
CREATE AUDIT POLICY priv_audit
PRIVILEGES CREATE USER, DROP USER, ALTER USER,
GRANT ANY PRIVILEGE, GRANT ANY ROLE;
AUDIT POLICY priv_audit;
-- 6. Enable SIEM integration
ALTER SYSTEM SET UNIFIED_AUDIT_COMMON_SYSTEMLOG=
'SYSLOG:LOCAL1:WARNING' SCOPE=SPFILE;
-- 7. Verify audit records
SELECT event_timestamp, db_username, action_name, object_name
FROM unified_audit_trail
WHERE unified_audit_policies = 'PCI_DATA_ACCESS'
ORDER BY event_timestamp DESC FETCH FIRST 50 ROWS ONLY;
Outcome
| Metric | Before | After |
|---|---|---|
| Audit trail | None | Unified audit — every DML on card data |
| Password policy | Case-insensitive | Case-sensitive + lockout after 3 fails |
| Network encryption | Cleartext | AES-256 enforced |
| PCI-DSS findings | 14 critical items | 0 critical — audit passed |
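The SEC_MAX_FAILED_LOGIN_ATTEMPTS=3 setting makes the server process drop a connection after three failed authentication attempts. The semantics can be sketched in Python (an illustrative model, not Oracle internals):

```python
class ConnectionAttemptGuard:
    """Illustrative sketch of SEC_MAX_FAILED_LOGIN_ATTEMPTS semantics:
    a server process drops the connection once N authentications fail."""

    def __init__(self, max_failed: int = 3):
        self.max_failed = max_failed
        self.failures = 0
        self.dropped = False

    def authenticate(self, password_ok: bool) -> str:
        if self.dropped:
            return "connection dropped"
        if password_ok:
            return "authenticated"
        self.failures += 1
        if self.failures >= self.max_failed:
            self.dropped = True
            return "connection dropped"
        return "invalid credentials, retry allowed"

guard = ConnectionAttemptGuard(max_failed=3)
print(guard.authenticate(False))   # invalid credentials, retry allowed
print(guard.authenticate(False))   # invalid credentials, retry allowed
print(guard.authenticate(False))   # connection dropped
```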
Scenario 05 — Parallel Query

Business Problem
Nightly ETL jobs on a 10 TB data warehouse were taking 14 hours to complete, breaching the 6-hour maintenance window. The DBA team needed to leverage the 16-core server's parallel processing capabilities. Parallel execution was manually hinted but inconsistently applied.
SQL — Parallel Query Configuration
-- Enable AUTO parallel degree policy
ALTER SYSTEM SET PARALLEL_DEGREE_POLICY=AUTO SCOPE=BOTH;
ALTER SYSTEM SET PARALLEL_MAX_SERVERS=128 SCOPE=SPFILE;
ALTER SYSTEM SET PARALLEL_MIN_SERVERS=16 SCOPE=SPFILE;
ALTER SYSTEM SET PARALLEL_DEGREE_LIMIT=CPU SCOPE=BOTH;
-- Set large ETL tables for parallel
ALTER TABLE sales_fact PARALLEL 8;
ALTER TABLE product_dim PARALLEL 4;
ALTER TABLE customer_dim PARALLEL 4;
-- Enable adaptive statistics for accurate cardinalities
ALTER SYSTEM SET OPTIMIZER_ADAPTIVE_STATISTICS=TRUE SCOPE=BOTH;
ALTER SYSTEM SET OPTIMIZER_MODE=ALL_ROWS SCOPE=BOTH;
-- Monitor active parallel sessions
SELECT qcsid, server#, degree, req_degree, sql_id
FROM v$px_session ORDER BY qcsid;
Outcome
| Metric | Before | After |
|---|---|---|
| ETL completion time | 14 hours | 4.5 hours (within window) |
| Server CPU utilisation | 22% avg | 78% avg (efficient use) |
| Parallel degree | Manual hints | Fully automatic via AUTO policy |
| Plan regressions | N/A | Zero — adaptive stats corrected plans |
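How the parallel settings above interact can be sketched numerically. The Python helper below is an illustration only (the names and the producer/consumer doubling are simplifications, and 2 is the typical PARALLEL_THREADS_PER_CPU default), not Oracle's actual AUTO DOP algorithm:

```python
def effective_dop(requested_dop, cpu_count, threads_per_cpu=2,
                  parallel_max_servers=128, active_px_servers=0):
    """Illustrative clamp of a requested degree of parallelism:
    PARALLEL_DEGREE_LIMIT=CPU caps DOP at CPU_COUNT * PARALLEL_THREADS_PER_CPU,
    and free PX slaves bound it further (a DOP-n query may use up to 2n
    slaves for its producer and consumer sets)."""
    cpu_cap = cpu_count * threads_per_cpu
    dop = min(requested_dop, cpu_cap)
    slaves_available = parallel_max_servers - active_px_servers
    dop = min(dop, slaves_available // 2)   # producer + consumer sets
    return max(dop, 1)

# On the scenario's 16-core box, a requested DOP of 64 is capped at 32:
print(effective_dop(64, cpu_count=16))   # 32
```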
Scenario 06 — ORA-01555

Business Problem
A telecoms company's reporting application received ORA-01555 errors nightly when long-running aggregate reports (2–3 hour queries) ran alongside heavy OLTP transactions. The undo tablespace was 8 GB with UNDO_RETENTION=900 seconds (15 minutes). Undo was being overwritten before long queries finished reading consistent data.
SQL — ORA-01555 Diagnosis & Fix
-- Step 1: Diagnose undo extent turnover
SELECT begin_time, end_time, undoblks, txncount,
maxquerylen, ssolderrcnt -- ORA-01555 count
FROM v$undostat
ORDER BY begin_time DESC FETCH FIRST 24 ROWS ONLY;
-- Step 2: Extend UNDO_RETENTION to 4 hours
ALTER SYSTEM SET UNDO_RETENTION=14400 SCOPE=BOTH;
-- Step 3: Resize undo tablespace
ALTER TABLESPACE UNDOTBS1 ADD DATAFILE
'/oradata/undo02.dbf' SIZE 20G AUTOEXTEND ON MAXSIZE 30G;
-- Step 4: Enable RETENTION GUARANTEE
ALTER TABLESPACE UNDOTBS1 RETENTION GUARANTEE;
-- Step 5: Verify new undo sizing
SELECT ROUND(tuned_undoretention/3600,1) AS retention_hr,
activeblks, unexpiredblks, expiredblks
FROM v$undostat WHERE rownum=1;
Outcome
| Metric | Before | After |
|---|---|---|
| ORA-01555 per night | 40–60 errors | Zero |
| UNDO_RETENTION | 900 seconds (15 min) | 14400 seconds (4 hours) |
| Undo tablespace size | 8 GB | 28 GB (auto-extend) |
| Report success rate | 60% | 100% |
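Undo sizing for a given retention follows the classic estimate: undo space ≈ UNDO_RETENTION × undo blocks generated per second × DB_BLOCK_SIZE, where peak UPS comes from v$undostat.undoblks. A Python sketch of the arithmetic (the 100 blocks/sec figure is a made-up example):

```python
def undo_tablespace_bytes(undo_retention_s, undo_blocks_per_s, db_block_size=8192):
    """Classic undo sizing estimate:
    UndoSpace = UNDO_RETENTION * undo blocks generated per second * block size."""
    return undo_retention_s * undo_blocks_per_s * db_block_size

# 4-hour retention at a peak of ~100 undo blocks/sec with 8K blocks:
gb = undo_tablespace_bytes(14400, 100) / 1024**3
print(f"{gb:.1f} GB")   # 11.0 GB
```

Higher observed peak rates scale the requirement linearly, which is why the scenario's tablespace grew well beyond the original 8 GB.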
Scenario 07 — PDB Provisioning

Business Problem
A software company runs Oracle 19c Multitenant with production PDBs in a CDB. The DevOps team requested a new PDB for a development team in 4 hours, cloned from the production PDB but isolated. The DBA needed to clone the PDB, apply a lockdown profile, and set resource limits.
SQL — PDB Clone & Lockdown
-- Step 1: Clone PROD PDB to DEV (hot clone, prod stays open)
CREATE PLUGGABLE DATABASE DEVPDB01
FROM PRODPDB01
FILE_NAME_CONVERT=('/oradata/PRODPDB01/','/oradata/DEVPDB01/')
NO DATA; -- Schema only, no production data
-- Step 2: Open the new PDB
ALTER PLUGGABLE DATABASE DEVPDB01 OPEN;
ALTER PLUGGABLE DATABASE DEVPDB01 SAVE STATE;
-- Step 3: Create lockdown profile for dev restrictions
CREATE LOCKDOWN PROFILE dev_lockdown;
ALTER LOCKDOWN PROFILE dev_lockdown
DISABLE STATEMENT=('ALTER SYSTEM');
ALTER LOCKDOWN PROFILE dev_lockdown
DISABLE FEATURE=('XStream');
-- Step 4: Apply limits to DEVPDB01
ALTER SESSION SET CONTAINER=DEVPDB01;
ALTER SYSTEM SET PDB_LOCKDOWN='dev_lockdown';
ALTER SYSTEM SET CPU_COUNT=4;
ALTER SYSTEM SET SGA_TARGET=2G;
ALTER SYSTEM SET PGA_AGGREGATE_TARGET=1G;
Outcome
| Metric | Before | After |
|---|---|---|
| Provisioning time | Days (manual) | < 2 hours (cloned from prod) |
| Dev isolation | Shared server | CPU/SGA capped, lockdown applied |
| Production impact | Risk of shared | Zero — PDB fully isolated |
| Rollback option | Complex | DROP PLUGGABLE DATABASE in seconds |
Scenario 08 — Data Guard

Business Problem
A government agency required a disaster recovery solution with RPO < 30 minutes and RTO < 1 hour. The existing cold standby (tape restore) could not meet these SLAs. Oracle Data Guard physical standby was selected as the HA solution.
SQL — Data Guard Setup (Primary + Standby)
-- PRIMARY: Enable force logging
ALTER DATABASE FORCE LOGGING;
-- PRIMARY: Configure DG
ALTER SYSTEM SET DB_UNIQUE_NAME='PRODDB' SCOPE=SPFILE;
ALTER SYSTEM SET LOG_ARCHIVE_CONFIG=
'DG_CONFIG=(PRODDB,DRDB)' SCOPE=BOTH;
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2=
'SERVICE=DRDB ASYNC
VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
DB_UNIQUE_NAME=DRDB' SCOPE=BOTH;
-- PRIMARY: Add standby redo logs
ALTER DATABASE ADD STANDBY LOGFILE GROUP 10
'/oradata/stdby_redo10.log' SIZE 500M;
-- RMAN: Create standby from active database
DUPLICATE TARGET DATABASE FOR STANDBY
FROM ACTIVE DATABASE
DORECOVER
SPFILE SET db_unique_name='DRDB';
-- STANDBY: Start managed recovery
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE DISCONNECT;
-- Verify Data Guard lag
SELECT name, value, unit, time_computed
FROM v$dataguard_stats
WHERE name IN ('transport lag','apply lag');
Outcome
| Metric | Before | After |
|---|---|---|
| RPO (data loss) | Hours (tape) | < 5 seconds (ASYNC) or 0 (SYNC) |
| RTO (failover time) | 4–8 hours | < 30 minutes (managed failover) |
| Standby type | Cold (tape restore) | Physical standby — always current |
| Archive log gaps | Manual detection | FAL auto-resolves gaps |
Scenario 09 — In-Memory

Business Problem
A retail analytics dashboard was taking 45–90 seconds to render KPI charts querying 500M row fact tables. The server had 128 GB RAM with Oracle only using 32 GB for the buffer cache. The Oracle In-Memory Column Store (IMCS) was licensed but not configured.
SQL — In-Memory Column Store Setup
-- Step 1: Allocate In-Memory area (requires restart)
ALTER SYSTEM SET INMEMORY_SIZE=20G SCOPE=SPFILE;
SHUTDOWN IMMEDIATE; STARTUP;
-- Step 2: Populate analytics tables into IMCS
ALTER TABLE sales_fact INMEMORY PRIORITY HIGH
MEMCOMPRESS FOR QUERY HIGH;
ALTER TABLE product_dim INMEMORY PRIORITY MEDIUM;
ALTER TABLE date_dim INMEMORY PRIORITY HIGH;
-- Step 3: Trigger population (full scan loads table)
SELECT /*+ FULL(f) NO_MERGE */ COUNT(*) FROM sales_fact f;
-- Step 4: Monitor population progress
SELECT segment_name, inmemory_size/1024/1024 AS imcs_mb,
bytes/1024/1024 AS table_mb, populate_status
FROM v$im_segments ORDER BY inmemory_size DESC;
-- Step 5: Enable approximate aggregation for dashboards
ALTER SYSTEM SET APPROX_FOR_AGGREGATION=TRUE SCOPE=BOTH;
-- Look for: TABLE ACCESS INMEMORY FULL in EXPLAIN PLAN
Outcome
| Metric | Before | After |
|---|---|---|
| Dashboard query time | 45–90 seconds | 0.3–1.2 seconds (50–100× faster) |
| Full table scan I/O | Physical disk I/O | In-memory columnar — zero I/O |
| OLTP impact | Concern | None — IMCS is read-only copy |
| Compression ratio | 1:1 | 4:1 (MEMCOMPRESS FOR QUERY HIGH) |
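Planning INMEMORY_SIZE is essentially on-disk segment size divided by the expected compression ratio, then verified against v$im_segments after population. A Python sketch with an assumed 4:1 QUERY HIGH ratio (the table sizes below are made-up examples, not from the scenario):

```python
def imcs_footprint_gb(table_gb: float, compression_ratio: float = 4.0) -> float:
    """Rough planning estimate of a table's columnar footprint in the
    In-Memory area, given an assumed MEMCOMPRESS ratio (~4:1 is typical
    for QUERY HIGH on fact data; always verify with v$im_segments)."""
    return table_gb / compression_ratio

tables_gb = {"sales_fact": 60.0, "product_dim": 2.0, "date_dim": 0.1}
total = sum(imcs_footprint_gb(gb) for gb in tables_gb.values())
print(f"estimated IMCS need: {total:.1f} GB of the 20 GB INMEMORY_SIZE")
```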
Scenario 10 — Flashback Recovery

Business Problem
A developer accidentally ran an UPDATE without a WHERE clause on the production ORDERS table, corrupting 2.3 million rows. The incident occurred at 14:32. The DBA had to recover the table to 14:25 (7 minutes before the incident) with minimal downtime using Flashback Database.
-- Step 1: Confirm Flashback Database is enabled
SELECT flashback_on, log_mode FROM v$database;
-- Step 2: Find SCN just before the incident
SELECT timestamp_to_scn(
TIMESTAMP '2024-03-20 14:25:00') AS target_scn
FROM dual;
-- Step 3: Flashback the ENTIRE DATABASE
SHUTDOWN IMMEDIATE; STARTUP MOUNT;
FLASHBACK DATABASE TO TIMESTAMP
TO_TIMESTAMP('2024-03-20 14:25:00','YYYY-MM-DD HH24:MI:SS');
-- Step 4: Open with resetlogs
ALTER DATABASE OPEN RESETLOGS;
-- ALTERNATIVE: Flashback just the table (less downtime;
-- requires ALTER TABLE orders ENABLE ROW MOVEMENT beforehand)
FLASHBACK TABLE orders
TO TIMESTAMP TO_TIMESTAMP('2024-03-20 14:25:00',
'YYYY-MM-DD HH24:MI:SS');
-- Step 5: Re-apply legitimate transactions via LogMiner
EXECUTE DBMS_LOGMNR.START_LOGMNR(
STARTTIME => TO_DATE('2024-03-20 14:25:00','YYYY-MM-DD HH24:MI:SS'),
ENDTIME => TO_DATE('2024-03-20 14:32:00','YYYY-MM-DD HH24:MI:SS'),
OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
-- Step 6: Create restore point for future
CREATE RESTORE POINT before_release_deploy
GUARANTEE FLASHBACK DATABASE;
Pre-requisites: Flashback Database requires (1) ARCHIVELOG mode, (2) FRA configured, (3) DB_RECOVERY_FILE_DEST_SIZE sufficient for flashback logs, (4) ALTER DATABASE FLASHBACK ON executed. Enable this BEFORE incidents occur.
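Before attempting the flashback, also confirm the target time is still covered by the flashback logs (v$flashback_database_log.oldest_flashback_time). The check is trivial but worth scripting into the runbook; a Python sketch with illustrative timestamps:

```python
from datetime import datetime

def can_flashback_to(target: datetime, oldest_flashback_time: datetime) -> bool:
    """Flashback Database can only rewind to points covered by the
    flashback logs: the target must not be older than
    v$flashback_database_log.oldest_flashback_time."""
    return target >= oldest_flashback_time

oldest = datetime(2024, 3, 19, 9, 0)     # from v$flashback_database_log
target = datetime(2024, 3, 20, 14, 25)   # 7 minutes before the bad UPDATE
print(can_flashback_to(target, oldest))  # True
```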
Outcome
| Metric | Before | After |
|---|---|---|
| Recovery method | RMAN restore (2–4 hrs) | Flashback Database (18 min) |
| Data restored | N/A | 2.3M rows to correct state |
| Downtime | Estimated 4 hours | 18 minutes total |
| Legitimate data lost | N/A | 5 orders re-applied via LogMiner |
Scenario Summary
| Scenario | Key Parameters | Category |
|---|---|---|
| 01 — AMM Setup | MEMORY_TARGET, MEMORY_MAX_TARGET, SGA_TARGET, PGA_AGGREGATE_TARGET | Memory |
| 02 — Cursor Sharing | CURSOR_SHARING, SHARED_POOL_SIZE, SESSION_CACHED_CURSORS | Perf |
| 03 — RMAN / FRA | DB_RECOVERY_FILE_DEST, DB_RECOVERY_FILE_DEST_SIZE, ARCHIVE_LAG_TARGET | Storage |
| 04 — Security | SEC_CASE_SENSITIVE_LOGON, SEC_MAX_FAILED_LOGIN_ATTEMPTS, UNIFIED_AUDIT | Security |
| 05 — Parallel Query | PARALLEL_DEGREE_POLICY, PARALLEL_MAX_SERVERS, OPTIMIZER_MODE | Perf |
| 06 — ORA-01555 | UNDO_RETENTION, UNDO_TABLESPACE, UNDO_MANAGEMENT | Recovery |
| 07 — PDB Provisioning | ENABLE_PLUGGABLE_DATABASE, MAX_PDBS, PDB_LOCKDOWN, CPU_COUNT | Multitenant |
| 08 — Data Guard | DB_UNIQUE_NAME, LOG_ARCHIVE_DEST_n, STANDBY_FILE_MANAGEMENT, FAL_SERVER | HA |
| 09 — In-Memory | INMEMORY_SIZE, INMEMORY_QUERY, APPROX_FOR_AGGREGATION | Perf |
| 10 — Flashback Recovery | DB_RECOVERY_FILE_DEST, FLASHBACK DATABASE, DB_FLASHBACK_RETENTION_TARGET | Recovery |
Always capture an AWR baseline (DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE) before implementing any parameter change. Test in DEV/QA first. Track parameter changes over time via DBA_HIST_PARAMETER.