中午接到电话,客户的核心系统Oracle数据库实例宕机,远程过去在告警日志中看到大量的如下报错,报错很频繁: ...... Fri Jul 25 13:20:14 2014 Errors in file /u01/app/oracle/diag/rdbms/d012band/d012band/trace/d012band_smon_5964354.trc (incident=43361): ORA-00600: internal error code, arguments: [13013], [5001], [268], [8452274], [7], [8452274], [17], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/d012band/d012band/incident/incdir_43361/d012band_smon_5964354_i43361.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Non-fatal internal error happenned while SMON was doing logging scn->time mapping. SMON encountered 1 out of maximum 100 non-fatal internal errors. Fri Jul 25 13:20:17 2014 Dumping diagnostic data in directory=[cdmp_20140725132017], requested by (instance=1, osid=5964354 (SMON)), summary=[incident=43361]. Starting background process SMCO ...... 一段时间后,最后实例被PMON进程终止,导致核心系统宕机。 客户环境:Oracle Database 11.2.0.3.0 for IBM AIX 6.1,单机,IBM HACMP主备模式保护。 参考《 New and Improved: ORA-600 [13013] "Unable to get a Stable set of Records" (文档 ID 1438920.1)》和《ORA-600 [13013] "Unable to get a Stable set of Records" (文档 ID 28185.1)》文章;该报错是由于对某个表执行DML操作,该表对应的某个索引损坏导致的,解决的办法是找出操作的表和受损的索引,重建索引即可。 下面是对ORA-600的参数说明: 6 Argument format ================= This format relates to Oracle Server 8.0.3 and above Arg [a] Passcount Arg [b] Data Object number Arg [c] Tablespace Decimal Relative DBA (RDBA) of block containing the row to be updated Arg [d] Row Slot number Arg [e] Decimal RDBA of block being updated (Typically same as [c]) Arg [f] Code 由此执行下面的SQL语句查询找出DML操作的表,以及表对应的索引: SQL> SELECT OWNER,OBJECT_NAME,OBJECT_TYPE FROM DBA_OBJECTS WHERE DATA_OBJECT_ID=268; 发现问题对象是SYS用户下面的SMON_SCN_TIME表,SYS.SMON_SCN_TIME是Oracle更新非常频繁的一张表,该数据字典用于维护SCN和时间的关系。 SQL> SELECT OWNER,INDEX_NAME FROM DBA_INDEXES WHERE TABLE_NAME='SMON_SCN_TIME'; 发现这个表对应有SMON_SCN_TIME_TIM_IDX和SMON_SCN_TIME_SCN_IDX两个索引。 对表进行分析,找出有问题的索引: SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE; Table analyzed. SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE; table/Index Cross Reference Failure - see trace file 能够正常的分析表,但是不能正常的分析表上的索引。 参考《OERR: ORA-1499 table/Index Cross Reference Failure - see trace file (文档 ID 1499.1)》文章,尝试找出受影响的索引: 查看trace file: BH (0x70000078fe2df98) file#: 2 rdba: 0x0081483e (2/84030) class: 1 ba: 0x70000078ce52000 set: 174 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 2272,19 dbwrid: 5 obj: 272 objn: 272 tsn: 1 afn: 2 hint: f hash: [0x700000801ff36a0,0x700000801ff36a0] lru: [0x70000074bf81f40,0x70000074bf81ce0] ckptq: [NULL] fileq: [NULL] objq: [0x7000007f59b4138,0x7000007f59b4138] objaq: [0x7000007f59b4128,0x7000007f59b4128] use: [NULL] wait: [NULL] fast-cr-pins: 2 st: XCURRENT md: NULL fpin: 'kdiwh15: kdifxs' tch: 12 flags: LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535] buffer tsn: 1 rdba: 0x0081483e (2/84030) scn: 0x0000.2f4752d8 seq: 0x01 flg: 0x06 tail: 0x52d80601 frmt: 0x02 chkval: 0x0778 type: 0x06=trans data 0x0081483e要去掉0x,0x表示十六进制。 执行下面的SQL语句: SELECT owner, segment_name, segment_type, partition_name FROM DBA_SEGMENTS WHERE header_file = (SELECT file# FROM v$datafile WHERE rfile# = dbms_utility.data_block_address_file(to_number('0080f8b2','XXXXXXXX')) AND ts#= 1) AND header_block = dbms_utility.data_block_address_block(to_number('0080f8b2','XXXXXXXX')); 可惜没有任何的查询结果。 继续参考《Instance Terminated With Error ORA-00474: SMON Process Terminated With Error (文档 ID 1361872.1)》文章,文章有下面一段描述: CAUSE ORA600 [13011] is raised due to indexes corruption. To verify the corruption run the following statements: SQL> conn / as sysdba SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE; Table analyzed. -- It should come out clean giving message table analyzed. SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE; -- it should fail with Ora-1499 if at least one index is corrupted ORA-1499就是我们前面遇到的报错: Error: ORA 1499 Text: table/Index Cross Reference Failure - see trace file ------------------------------------------------------------------------------- Cause: An error occurred when validating an index or a table using the ANALYZE command. One or more entries does not point to the appropriate cross-reference. Action: Check the trace file for more descriptive messages about the problem. Correct these errors. 如果在分析表的时候收到ORA-1499的错误,表示至少有一个索引是损坏的。 SOLUTION Rebuild corrupted indexes: SQL> conn as sysdba SQL> ALTER INDEX SMON_SCN_TIME_TIM_IDX REBUILD ONLINE; SQL> ALTER INDEX SMON_SCN_TIME_SCN_IDX REBUILD ONLINE; then re-run SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE; Table analyzed. Note: The last statement should not report any errors. 两个索引重建之后,ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE语句能顺利的分析,告警日志不再有报错产生。 总结:要学会充分利用MOS,根据不同的线索搜索不同的文章来看,不要在一篇文章上吊死。 --end-- (责任编辑:IT) |