Environment: Oracle Database 11gR2 (11.2.0.2) on Linux
Symptom: a GROUP BY statement fails with ORA-00979.
The SQL in question:
SQL> select a.d1,a.EXIT_type,round(a.cnt1/b.cnt2*100,2) from
2 (select substr(LOGIN_DATE,1,8) d1,EXIT_type,count(*) cnt1 from xxx_connect_log group by substr(LOGIN_DATE,1,8),EXIT_type) a,
3 (select substr(LOGIN_DATE,1,8) d2 ,count(*) cnt2 from xxx_connect_log group by substr(LOGIN_DATE,1,8) ) b
4 where A.d1=B.d2 order by a.d1,a.EXIT_type;
(select substr(LOGIN_DATE,1,8) d2 ,count(*) cnt2 from xxx_connect_log group by substr(LOGIN_DATE,1,8) ) b
*
ERROR at line 3:
ORA-00979: not a GROUP BY expression
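This statement runs fine on the DBs of all the other regions and fails only on the preview-server DB. Syntactically, there is nothing wrong with it.
When you hit an ORA- error like this, you can set an ErrorStack event to trace the error stack. It dumps fairly detailed background information about the error into a trace file for analysis.
The four ErrorStack levels: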
0 Error stack only
1 Error stack and function call stack
2 As level 1 plus the process state
3 As level 2 plus the context area
An ErrorStack dump is triggered only when the specified error occurs. It can be set at either the instance level or the session level.
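The trace in this post is enabled instance-wide with alter system. As a minimal sketch, a session-level variant would confine the dump to the one session that reproduces the error. This is an assumption for illustration, not something the original troubleshooting ran, and it presumes the session holds the ALTER SESSION privilege.

```sql
-- Session-level variant (assumption: not part of the original troubleshooting run).
-- Dump the error stack at level 3 whenever ORA-00979 is raised in this session.
alter session set events '979 trace name errorstack level 3';

-- ... re-run the failing statement here ...

-- Switch the event off again once the trace has been captured.
alter session set events '979 trace name errorstack off';
```

Scoping the event to a session avoids producing trace files for unrelated sessions that happen to raise the same error.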
Now let's run an ErrorStack trace for error 979:
SQL> alter system set events='979 trace name errorstack forever,level 3';

System altered.

SQL> select a.d1,a.EXIT_type,round(a.cnt1/b.cnt2*100,2) from
2 (select substr(LOGIN_DATE,1,8) d1,EXIT_type,count(*) cnt1 from xxx_connect_log group by substr(LOGIN_DATE,1,8),EXIT_type) a,
3 (select substr(LOGIN_DATE,1,8) d2 ,count(*) cnt2 from xxx_connect_log group by substr(LOGIN_DATE,1,8) ) b
4 where A.d1=B.d2 order by a.d1,a.EXIT_type;
(select substr(LOGIN_DATE,1,8) d2 ,count(*) cnt2 from xxx_connect_log group by substr(LOGIN_DATE,1,8) ) b
*
ERROR at line 3:
ORA-00979: not a GROUP BY expression

SQL> alter system set events='979 trace name errorstack off';

System altered.
Contents of the alert log:
Fri May 31 13:19:20 2013
OS Pid: 29652 executed alter system set events '979 trace name errorstack forever,level 3'
Errors in file /u/ora11g/diag/rdbms/xxxtest/xxxtest/trace/xxxtest_ora_29652.trc:
ORA-00979: not a GROUP BY expression
Fri May 31 13:19:27 2013
Dumping diagnostic data in directory=[cdmp_20130531131927], requested by (instance=1, sid=29652), summary=[abnormal process termination].
Fri May 31 13:19:52 2013
OS Pid: 29652 executed alter system set events '979 trace name errorstack off'
The corresponding trace file is: /u/ora11g/diag/rdbms/xxxtest/xxxtest/trace/xxxtest_ora_29652.trc
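Here the trace file name was read from the alert log. As a hedged alternative, 11g also exposes the path through the V$DIAG_INFO view. The query below is a sketch that assumes it is run from the same session that hit the error and that the user can select from V$DIAG_INFO.

```sql
-- Full path of the trace file that the current session writes to (11g and later).
-- Assumption: executed in the session that raised ORA-00979.
select value
  from v$diag_info
 where name = 'Default Trace File';
```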
Let's look at what the trace file contains:
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=12, mask=0x0)
----- Error Stack Dump -----
ORA-00979: not a GROUP BY expression
----- Current SQL Statement for this session (sql_id=d2ccw741whuh0) -----
select a.d1,a.EXIT_type,round(a.cnt1/b.cnt2*:"SYS_B_0",:"SYS_B_1") from
(select substr(LOGIN_DATE,:"SYS_B_2",:"SYS_B_3") d1,EXIT_type,count(*) cnt1 from xxx_connect_log group by substr(LOGIN_DATE,:"SYS_B_4
",:"SYS_B_5"),EXIT_type) a,
(select substr(LOGIN_DATE,:"SYS_B_6",:"SYS_B_7") d2 ,count(*) cnt2 from xxx_connect_log group by substr(LOGIN_DATE,:"SYS_B_8",:"SYS_B
_9") ) b
where A.d1=B.d2 order by a.d1,a.EXIT_type
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+36 call kgdsdst() 000000000 ? 000000000 ?
7FFFA7A0B468 ? 000000001 ?
000000001 ?
...
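In the trace file we can see that Oracle has internally replaced the literals in our SQL with system-generated bind variables (:"SYS_B_n"). This happens because we have cursor_sharing=FORCE set.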
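To double-check that literal replacement is really active, the parameter can be queried directly. This is a small sketch, assuming the user can read V$PARAMETER, which shows the value in effect for the current session.

```sql
-- Value of cursor_sharing in effect for the current session.
-- EXACT means literals are left alone; FORCE or SIMILAR means they may be
-- replaced with system-generated binds such as :"SYS_B_0".
select name, value
  from v$parameter
 where name = 'cursor_sharing';

-- SQL*Plus shortcut for the same check:
-- show parameter cursor_sharing
```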
So, for this SQL, is the cursor_sharing setting what makes the GROUP BY fail?
A search on Metalink shows that there is indeed a cursor_sharing bug that produces this error:
8913729 | 11.2.0.2, 12.1.0.0 | ORA-979 with CURSOR_SHARING=SIMILAR or FORCE |
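The document claims this bug is fixed in 11.2.0.2. But our environment is exactly 11.2.0.2. What gives?
The Oracle call stack confirmed that this is bug 8913729.
The matching stack signature looks like this: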
kgesev <- kgesec0 <- qcuerroer <- qcuerroep <- erroep <- qecgoc <- qecsel <- qecpqbcheck <- qecdrv <- kkqcttcalo <- kkqctdrvit <- apadrv <- pitca
Solution:
Any of the following resolves the problem:
1: Set CURSOR_SHARING=EXACT;
2: Use the hint /*+ CURSOR_SHARING_EXACT */ in the SQL statement;
3: Set the optimizer_features_enable parameter to 10.2.0.5 or 11.1.0.7 (the current value is 11.2.0.2).
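Options 1 and 3 are parameter changes. The post does not say at which scope they would be applied, so the following is only a sketch that assumes session scope, which limits the impact to the session running the report. Either statement on its own is meant as the workaround, not both together.

```sql
-- Option 1 (assumed session scope): stop literal replacement for this session only.
alter session set cursor_sharing = exact;

-- Option 3 (assumed session scope): fall back to an older optimizer feature set.
-- '10.2.0.5' is the other value listed in the post.
alter session set optimizer_features_enable = '11.1.0.7';
```

Changing either parameter with alter system would affect every session on the instance, so a narrower scope is usually the more cautious starting point.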
We chose option 2.
The modified SQL is as follows:
select /*+ CURSOR_SHARING_EXACT */ a.d1,a.EXIT_type,round(a.cnt1/b.cnt2*100,2) from
(select substr(LOGIN_DATE,1,8) d1,EXIT_type,count(*) cnt1 from xxx_connect_log group by substr(LOGIN_DATE,1,8),EXIT_type) a,
(select substr(LOGIN_DATE,1,8) d2 ,count(*) cnt2 from xxx_connect_log group by substr(LOGIN_DATE,1,8) ) b
where A.d1=B.d2 order by a.d1,a.EXIT_type;
This has been confirmed to work.