alert日志報(bào)錯(cuò)信息: Wed Jun 18 09:03:51 2008 原因分析: DBA_2PC_PENDING Oracle會(huì)自動(dòng)處理分布事務(wù),保證分布事務(wù)的一致性,所有站點(diǎn)全部提交或全部回滾。一般情況下,處理過(guò)程在很短的時(shí)間內(nèi)完成,根本無(wú)法察覺(jué)到。但是,如果在commit或rollback的時(shí)候,出現(xiàn)了連接中斷或某個(gè)數(shù)據(jù)庫(kù)站點(diǎn)CRASH的情況,則提交操作可能會(huì)無(wú)法繼續(xù),此時(shí)DBA_2PC_PENDING和DBA_2PC_NEIGHBORS中會(huì)包含尚未解決的分布事務(wù)。 對(duì)于絕大多數(shù)情況,當(dāng)恢復(fù)連接或CRASH的數(shù)據(jù)庫(kù)重新啟動(dòng)后,會(huì)自動(dòng)解決分布式事務(wù),不需要人工干預(yù)。只有分布事務(wù)鎖住的對(duì)象急需被訪問(wèn),鎖住的回滾段阻止了其他事務(wù)的使用,網(wǎng)絡(luò)故障或CRASH的數(shù)據(jù)庫(kù)的恢復(fù)需要很長(zhǎng)的時(shí)間等情況出現(xiàn)時(shí),才使用人工操作的方式來(lái)維護(hù)分布式事務(wù)。
SQL> conn /as sysdba LOCAL_TRAN_ID STATE MIX A FAIL_TIME RETRY_TIME SQL> col LOCAL_TRAN_ID for a15 LOCAL_TRAN_ID IN_OUT DATABASE INTERFACE SQL> COMMIT FORCE '21.3.5270155';
SQL> select count(*) from pending_trans$; COUNT(*) SQL> select LOCAL_TRAN_ID,STATUS,STATE,TOP_DB_USER from pending_trans$; LOCAL_TRAN_ID S STATE TOP_DB_USER SQL> COMMIT FORCE '21.3.5270155'; SQL> COMMIT FORCE '21.3.5270155';
*
LOCAL_TRAN_ID
*
提交完成。 SQL> alter session set "_smu_debug_mode" = 4; 會(huì)話已更改。 SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('21.3.5270155'); PL/SQL 過(guò)程已成功完成。 SQL> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID FROM DBA_2PC_PENDING; 未選定行 SQL> SELECT LOCAL_TRAN_ID, IN_OUT, DATABASE, INTERFACE FROM DBA_2PC_NEIGHBORS; 未選定行 SQL> select LOCAL_TRAN_ID,STATE from dba_2pc_pending; 未選定行 SQL> commit; 提交完成。 SQL> exit
In this Document
--------------------------------------------------------------------------------
Applies to: Symptoms SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.23.2386'); BEGIN DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.23.2386'); END; SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('37.16.108');
If the remote database no longer exists then the transaction will have to be Follow the instructions on how to purge a 1. Identify the id of the transaction: SQL> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID FROM DBA_2PC_PENDING; 2. Purge the transaction: SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY(''); 3. Confirm that the transaction has been purged: SQL> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID FROM DBA_2PC_PENDING;
BEGIN DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.23.2386'); END;
Use the following workaround: You have to use local_tran_id..... Issue commit before alter system set "_smu_debug_mode" = 4; Follow the steps, SQL>commit; SQL> alter session set "_smu_debug_mode" = 4; SQL>execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');
When executing the following procedure(dbms_transaction.purge_lost_db_entry) SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('37.16.108'); ==>For example.. Fix: The transaction to be deleted is in the prepared state and has to be either SQL> select LOCAL_TRAN_ID,STATE from dba_2pc_pending; LOCAL_TRAN_ID STATE SQL> rollback force '37.16.108'; ==>For example Rollback complete. SQL> select LOCAL_TRAN_ID,STATE from dba_2pc_pending; LOCAL_TRAN_ID STATE SQL>Commit; SQL>alter system set "_smu_debug_mode" = 4; SQL> exec dbms_transaction.purge_lost_db_entry('37.16.108'); ==>For example References ============ 其它參考: http://blog./u/12960/showart_457785.html 最近數(shù)據(jù)庫(kù)突然出現(xiàn)RECO進(jìn)程不停的報(bào)ORA-02068和ORA-03113的錯(cuò)誤:
Errors in file /oracle/admin/UBISP/bdump/ubisp_reco_23401.trc:
ORA-02068: following severe error from DBNAME
ORA-03113: end-of-file on communication channel 檢查trace文件發(fā)行tran號(hào)總是那幾個(gè),執(zhí)行語(yǔ)句:
select * from dba_2pc_pending;
可以看到記錄的所有LOCAL_TRAN_ID與trace中的一樣,這太奇怪了,從現(xiàn)象上看只能問(wèn)題RECO進(jìn)程無(wú)法清除這些失敗后的事物,這個(gè)問(wèn)題在RAC環(huán)境有可能出現(xiàn),但單節(jié)點(diǎn)上為什么出現(xiàn)這個(gè)問(wèn)題還不清楚。解決方法如下:
1. select * from dba_2pc_pending;
2. alter system disable distributed recovery ; 3. alter session set "_smu_debug_mode" = 4 ;
4. execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('<transaction id>');
5. select * from dba_2pc_pending; 6. alter system enable distributed recovery; 如果有多個(gè)事物,需要在第四步后面執(zhí)行commit;
_smu_debug_mode缺省是0,可以在完成之后改回來(lái),查看該隱藏參數(shù)的命令是:
set linesize 132 column name format a30 column value format a25 select x.ksppinm name, y.ksppstvl value, y.ksppstdf isdefault, decode(bitand(y.ksppstvf,7),1,'MODIFIED',4,'SYSTEM_MOD','FALSE') ismod, decode(bitand(y.ksppstvf,2),2,'TRUE','FALSE') isadj from sys.x$ksppi x, sys.x$ksppcv y where x.inst_id = userenv('Instance') and y.inst_id = userenv('Instance') and x.indx = y.indx and x.ksppinm like '%_smu_debug_mode%' order by translate(x.ksppinm, ' _', ' '); 另外: http://www./dodd/2008/02/ora03113.html 大家知道,ORA-03113錯(cuò)誤是Oracle數(shù)據(jù)庫(kù)常見(jiàn)的錯(cuò)誤,導(dǎo)致這個(gè)錯(cuò)誤的原因比較復(fù)雜,各種各樣的原因??赡苁蔷W(wǎng)絡(luò)中斷引起的、也可能是數(shù)據(jù)庫(kù)本身出現(xiàn)了問(wèn)題。
下面就一個(gè)案例,分析一下ORA-03113錯(cuò)誤。
故障現(xiàn)象:
開(kāi)始alert文件提示錯(cuò)誤: Error 1013 trapped in 2PC on transaction 1.60.1257421. Cleaning up. Error stack returned to user: ORA-02050: transaction 1.60.1257421 rolled back, some remote DBs may be in-doubt ORA-01013: user requested cancel of current operation ORA-06553: PLS-103: Encountered the symbol "EXCEPTION" when expecting one of the following: begin case declare exit for function goto if loop mod null package pragma procedure raise return select separate type update while with <an identifier> <a double-quoted delimited-identifier> <a bind variable> << form table call close current define delete fetch lock Mon Feb 18 09:07:19 2008 DISTRIB TRAN SMSBOSS.09aad41c.1.60.1257421 is local tran 1.60.1257421 (hex=01.3c.132fcd) insert pending collecting tran, scn=8914343855672 (hex=81b.884c8638) 然后時(shí)不時(shí)的會(huì)提示下面錯(cuò)誤: ERROR, tran=1.60.1257421, session#=1, ose=0: ORA-03113: end-of-file on communication channel *** 2008-02-18 09:45:25.919 ERROR, tran=1.60.1257421, session#=1, ose=0: ORA-03113: end-of-file on communication channel *** 2008-02-18 10:19:42.891
Oracle數(shù)據(jù)庫(kù)只有這些錯(cuò)誤提示,其余狀態(tài)均正常。
從錯(cuò)誤提示看,應(yīng)該是由于分布事務(wù)由于人為cancel中止,引起的事務(wù)失敗,下面查看相關(guān)信息:
SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,MIXED,HOST,COMMIT# from dba_2pc_pending;
LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE MIX HOST COMMIT# ---------------------- -------------------- ---------------- --- -------------------- ---------------- 1.60.1257421 SMSBOSS.09aad41c.1.60.1257421 collecting no WORKGROUP\LIUQING 8914343855672
SQL> select * from DBA_2PC_NEIGHBORS;
LOCAL_TRAN_ID IN_ DATABASE DBUSER_OWNER I DBID SESS# BRANCH ---------------------- --- --------------- ------------------------------ ----------------- ---------- -------------------- 1.60.1257421 in BOSSMGR N 1 0000
1.60.1257421 out SMSDBN BOSSMGR N cc3ddb9b 1 4
select * from DBA_2PC_PENDING@smsdbn;
no rows selected
select * from DBA_2PC_NEIGHBORS@smsdbn;
no rows selected
dba_2pc_pending視圖記錄等待恢復(fù)的分布式事務(wù)的信息 dba_2pc_neighbors視圖記錄未決的分布式事務(wù)的輸入輸出連接信息
有上述信息分析原因,1.60.1257421事務(wù)的狀態(tài)為collecting,本機(jī)數(shù)據(jù)庫(kù)數(shù)據(jù)流向?yàn)?FONT face="Times New Roman">in,遠(yuǎn)端smsdbn數(shù)據(jù)庫(kù)流向?yàn)?FONT face="Times New Roman">out。
詢問(wèn)業(yè)務(wù)人員,確實(shí)運(yùn)行過(guò)這么個(gè)一存儲(chǔ),中途手工中止了。并且是從smsdbn數(shù)據(jù)庫(kù)里select數(shù)據(jù)然后update本地?cái)?shù)據(jù)庫(kù)。 這基本證實(shí)了我們的猜測(cè)。
下面嘗試force commit或者 force rollback此事務(wù),
SQL> commit force '1.60.1257421'; commit force '1.60.1257421' * ERROR at line 1: ORA-02058: no prepared transaction found with ID 1.60.1257421
上述錯(cuò)誤的原因是由于collecting狀態(tài)的事務(wù)不需要commit/rollback force。 我們現(xiàn)在需要做的就是:
(1) Disable分布式恢復(fù) SQL> ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY; System altered.
(2)Puege(清空)in-doubt transaction entry:
SQL> exec DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('1.60.1257421'); PL/SQL procedure successfully completed.
(3)然后enable 分布式恢復(fù): SQL> ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY;
參考信息/更多閱讀:
Note:1012842.102 ORA-2019 ORA-2058 ORA-2068 ORA-2050: Failed Distributed Transactions
Note:100664.1 How to Troubleshoot Distributed Transactions
Note:274321.1
Note:126069.1 Manually Resolving In-Doubt Transactions: Different Scenarios |
|
來(lái)自: guolijiegg > 《oracle》