Jonathan Lewis

Just another Oracle weblog

Temp space

Fri, 2019-12-06 06:18

A question about hunting down the source of the error “ORA-01652 unable to extend temp segment by NNN in tablespace XXX” shows up on the Oracle-L mailing list or the Oracle developer community forum from time to time. In most cases the tablespace referenced is the temporary tablespace, which means the session reporting the error was probably trying to allocate some space for sorting, or doing a hash join, or instantiating a GTT (global temporary table) or a CTE (common table expression / “with” subquery). The difficulty in cases like this is that the session reporting the error might be the victim of some other session’s greed – so looking at what the session was doing won’t necessarily point you to the real problem.

Of course you then run into a further difficulty when tracking down the source of the problem. By the time you hear the complaint (even if it’s only seconds after the error appeared) the session that had been hogging the temporary tablespace may have finished what it was doing, leaving a huge amount of free space in the temporary tablespace and suggesting (to the less experienced and cynical user) that there’s something fundamentally wrong with the way Oracle has been accounting for space usage.

If you find yourself in this situation remember that (if you’re licensed to take advantage of it) the active session history may be able to help.  One of the columns in v$active_session_history is called temp_space_allocated with the slightly misleading description: “Amount of TEMP memory (in bytes) consumed by this session at the time this sample was taken”. A simple query against v$active_session_history may be enough to identify the session and SQL  statement that had been holding the temporary space when the error was raised, for example:


column pga_allocated        format 999,999,999,999
column temp_space_allocated format 999,999,999,999

break on session_id skip 1 on session_serial#

select
        session_id, session_serial#, 
        sample_id, 
        sql_id, 
        pga_allocated,
        temp_space_allocated
from
        v$active_session_history
where
        sample_time between sysdate - 5/1440 and sysdate
and     nvl(temp_space_allocated,0) != 0
order by
        session_id, sample_id
/

All I’ve done for this example is query v$active_session_history for the last 5 minutes reporting a minimum of information from any rows that show temp space usage. As a minor variation on the theme you can obviously change the time range, and you might want to limit the output to rows reporting more than 1MB (say) of temp space usage.
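For example, a version that looks back 15 minutes and reports only sessions holding more than 1MB of temp space (the only changes are the time range and the predicate on temp_space_allocated) might look like this:

select
        session_id, session_serial#, 
        sample_id, 
        sql_id, 
        pga_allocated,
        temp_space_allocated
from
        v$active_session_history
where
        sample_time between sysdate - 15/1440 and sysdate
and     temp_space_allocated > 1048576          -- more than 1MB
order by
        session_id, sample_id
/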

You’ll notice that I’ve also reported the pga_allocated (Description: Amount of PGA memory (in bytes) consumed by this session at the time this sample was taken) in this query; this is just a little convenience – a query that’s taking a lot of temp space will probably start by acquiring a lot of memory so it’s nice to be able to see the two figures together.

There are plenty of limitations and flaws in the usefulness of this report and I’ll say something about that after showing an example of usage. Let’s start with a script to build some data before running a space-consuming query:


rem
rem     Script:         allocate_tempspace.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1 as 
select * from all_objects
;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

execute dbms_stats.gather_table_stats(null,'t1')

execute dbms_lock.sleep(20)

set pagesize  60
set linesize 255
set trimspool on
set serveroutput off
alter session set statistics_level = all;

with ttemp as (
        select /*+ materialize */ * from t1
)
select 
        /*+ no_partial_join(@sel$2 t1b) no_place_group_by(@sel$2) */ 
        t1a.object_type,
        max(t1a.object_name)
from
        ttemp t1a, ttemp t1b
where
        t1a.object_id = t1b.object_id
group by
        t1a.object_type
order by
        t1a.object_type
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

My working table t1 consists of 16 copies of the view all_objects – so close to 1 million rows in my case – and the query is hinted to avoid any of the clever transformations that the optimizer could use to reduce the workload so it’s going to do a massive hash join and aggregation to report a summary of a couple of dozen rows. Here’s the execution plan (in this case from 12.2.0.1, though the plan is the same for 19.3 with some variations in the numbers).


SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------

Plan hash value: 1682228242

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |      1 |        |     29 |00:00:10.03 |   47413 |  21345 |  12127 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |     29 |00:00:10.03 |   47413 |  21345 |  12127 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D665B_E2772D3 |      1 |        |      0 |00:00:01.51 |   28915 |      0 |   9217 |  2068K|  2068K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                         |      1 |    989K|    989K|00:00:00.24 |   19551 |      0 |      0 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                            |      1 |     29 |     29 |00:00:08.51 |   18493 |  21345 |   2910 |  6144 |  6144 | 6144  (0)|         |
|*  5 |    HASH JOIN                             |                            |      1 |     15M|     15M|00:00:03.93 |   18493 |  21345 |   2910 |    48M|  6400K|   65M (1)|   25600 |
|   6 |     VIEW                                 |                            |      1 |    989K|    989K|00:00:00.36 |    9233 |   9218 |      0 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D665B_E2772D3 |      1 |    989K|    989K|00:00:00.35 |    9233 |   9218 |      0 |       |       |          |         |
|   8 |     VIEW                                 |                            |      1 |    989K|    989K|00:00:00.40 |    9257 |   9217 |      0 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D665B_E2772D3 |      1 |    989K|    989K|00:00:00.39 |    9257 |   9217 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")

Critically this plan shows us two uses of the temp space but only reports one of them as Used-Tmp. The “hash join” at operation 5 tells us that it reached 65MB of (tunable PGA) memory before going “1-pass”, eventually dumping 25,600 KB to disc. This space usage is corroborated by the 2,910 writes (which, at an 8KB block size, would be 23,280 KB). The missing Used-Tmp, however, is the space taken up by the materialized CTE. We can see that operation 2 is a “load as select” that writes 9,217 blocks to disc (subsequently read back twice – the tablescans shown in operations 7 and 9). That’s roughly 74,000 KB of temp space that doesn’t get reported as Used-Tmp.
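Just to make the block-to-space arithmetic explicit (simple multiplication by the 8KB block size):

        2,910 blocks * 8 KB = 23,280 KB    (the hash join spill, close to the 25,600 KB reported as Used-Tmp)
        9,217 blocks * 8 KB = 73,736 KB    (the materialized CTE, the roughly 74,000 KB that goes unreported)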

If we take a look at the plan from 19.3 we see different numbers, but the same “error of omission”:

SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------

Plan hash value: 1682228242

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |      1 |        |     25 |00:00:08.15 |   34905 |  13843 |   8248 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |     25 |00:00:08.15 |   34905 |  13843 |   8248 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6624_E259E68 |      1 |        |      0 |00:00:01.26 |   23706 |      0 |   5593 |  2070K|  2070K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                         |      1 |    907K|    907K|00:00:00.21 |   18024 |      0 |      0 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                            |      1 |     25 |     25 |00:00:06.89 |   11193 |  13843 |   2655 |  6144 |  6144 | 6144  (0)|         |
|*  5 |    HASH JOIN                             |                            |      1 |     14M|     14M|00:00:03.55 |   11193 |  13843 |   2655 |    44M|  6400K|   64M (1)|      23M|
|   6 |     VIEW                                 |                            |      1 |    907K|    907K|00:00:00.26 |    5598 |   5594 |      0 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6624_E259E68 |      1 |    907K|    907K|00:00:00.25 |    5598 |   5594 |      0 |       |       |          |         |
|   8 |     VIEW                                 |                            |      1 |    907K|    907K|00:00:00.34 |    5595 |   5594 |      0 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6624_E259E68 |      1 |    907K|    907K|00:00:00.33 |    5595 |   5594 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")


With slightly fewer rows in t1 (907K vs. 989K) we write 5,593 blocks for the materialized CTE  (instead of 9,217) and spill 2,655 blocks during the hash join (instead of 2,910). But again it’s only the hash join spill that is reported under Used-Tmp. Note, by the way, that the Used-Tmp in 12.2 was reported in KB when it’s reported in MB in 19.3.0.0.

Side note: comparing the number of rows created and blocks written for the CTE, it looks as if 19.3 is using the data blocks much more efficiently than 12.2. There’s no obvious reason for this (though a first guess would be that the older mechanism writes a GTT with pctfree=10 while the newer mechanism avoids any free space and transactional details) so, as ever, I now have another draft for a blog note reminding me to investigate (eventually) what differences there are in CTE storage on the upgrade. It’s something that might make a difference in a few special cases.
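As a rough check of that impression (simple division, using the rounded row counts and write counts from the two plans):

        12.2.0.1:  989K rows / 9,217 blocks  =  roughly 107 rows per block
        19.3.0.0:  907K rows / 5,593 blocks  =  roughly 162 rows per block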

With the figures from the execution plans in mind we can now look at the results of the query against v$active_session_history. Conveniently the queries took a few seconds to complete, so we’re going to see several rows for each execution.

First the results from 12.2.0.1

SESSION_ID SESSION_SERIAL#  SAMPLE_ID SQL_ID           PGA_ALLOCATED TEMP_SPACE_ALLOCATED
---------- --------------- ---------- ------------- ---------------- --------------------
        14           22234   15306218 1cwabt12zq6zb       95,962,112            1,048,576
                             15306219 1cwabt12zq6zb       97,731,584           37,748,736
                             15306220 1cwabt12zq6zb      148,194,304           77,594,624
                             15306221 1cwabt12zq6zb      168,117,248           85,983,232
                             15306222 1cwabt12zq6zb      168,117,248           90,177,536
                             15306223 1cwabt12zq6zb      168,117,248           95,420,416
                             15306224 1cwabt12zq6zb      168,117,248           98,566,144
                             15306225 1cwabt12zq6zb      168,117,248          102,760,448
                             15306226 1cwabt12zq6zb      116,933,632          103,809,024
                             15306227 1cwabt12zq6zb      116,933,632          103,809,024
                             15306228 b66ycurnwpgud        8,602,624            1,048,576

I pointed out that we had 25,600 KB reported as Used-Tmp and roughly 74,000 KB unreported – a total of nearly 100,000 KB, which is reasonably close to the 103,800,000 bytes reported by ASH. Moreover the timing of the plan (loading the CTE in the first 2 seconds) seems to agree with the growth of temp_space_allocated to 77,590,000 bytes by the time we get to sample_id 15306220 in ASH. Then we have several seconds of slow growth as the hash join takes place and feeds its results up to the sort group by. At the end of the query we happen to have been lucky enough to catch one last sample just before the session had released all its temp space and ceased to be active. (Note, however, that the sql_id at that sample point was not the sql_id of our big query – and that’s a clue about one of the limitations of using ASH to find the greedy SQL.)

We see the same pattern of behaviour in 19.3.0.0:


SESSION_ID SESSION_SERIAL#  SAMPLE_ID SQL_ID           PGA_ALLOCATED TEMP_SPACE_ALLOCATED
---------- --------------- ---------- ------------- ---------------- --------------------
       136           42767    2217500 1cwabt12zq6zb      143,982,592           46,137,344
                              2217501 1cwabt12zq6zb      193,527,808           54,525,952
                              2217502 1cwabt12zq6zb      193,527,808           57,671,680
                              2217503 1cwabt12zq6zb      193,527,808           61,865,984
                              2217504 1cwabt12zq6zb      197,722,112           67,108,864
                              2217505 1cwabt12zq6zb      150,601,728           70,254,592
                              2217506 1cwabt12zq6zb      150,601,728           70,254,592

We start with an almost instantaneous jump to 46MB of temp_space_allocated in the first second of the query – that’s the 5,593 blocks of the CTE being materialized, then the slow growth of temp space as the hash join runs, spills to disc, and passes its data up to the sort group by. Again we can see that the peak usage was the CTE (46MB) plus the reported spill of 23MB (plus rounding errors and odd bits).

Preliminary Observations

Queries against ASH (v$active_session_history) can show us sessions that were holding space in the temporary tablespace at the moment a sample of active sessions was taken. This may allow us to identify greedy sessions that were causing other sessions to fail with ORA-01652 (unable to extend temp segment).

We have seen that there is at least one case where we get better information about temp space allocation from ASH than we do from the variants on v$sql_plan that include the SQL Workarea information (v$sql_workarea, v$sql_workarea_active) because the space acquired during materialization of CTEs is not reported as a “tunable SQL workarea” but does appear in the ASH temp_space_allocated.

At first sight it looks as if we may be able to use the query against ASH to identify the statement (by sql_id) that was the one being run by the greedy session when it consumed all the space. As we shall see in a further article, there are various reasons why this may over-optimistic, however in many cases there’s a fair chance that when you see the same sql_id appearing in a number of consecutive rows of the report then that statement may be the thing that is responsible for the recent growth in temp space usage – and you can query v$sql to find the text and call dbms_xplan.display_cursor() to get as much execution plan information as possible.
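As a sketch of that follow-up step (using the sql_id from the example above; in a real incident you would substitute whichever sql_id the ASH report highlighted):

select sql_fulltext from v$sql where sql_id = '1cwabt12zq6zb' and rownum = 1;

select * from table(dbms_xplan.display_cursor('1cwabt12zq6zb', null, 'allstats last'));

Remember that the 'allstats last' option will only report rowsource execution statistics if the statement ran with statistics_level set to all (or with the gather_plan_statistics hint); otherwise you will just get the basic plan.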

Further questions
  • When does a session release the temp_space_allocated? Will the space be held (locked) as long as the cursor is open, or can it be released when it is no longer needed? Will it be held, but releasable, even after the cursor has (from the client program’s perspective) been closed?
  • Could we be fooled by a report that said a session was holding a lot of space when it didn’t need it and would have released it if the demand had appeared?
  • Under what conditions might the temp_space_allocated in an ASH sample have nothing to do with the sql_id reported in the same sample?
  • Are there any other reasons why ASH might report temp_space_allocated when an execution plan doesn’t?
  • Is temp_space_allocated only about the temporary tablespace, or could it include information about other (“permanent”) tablespaces?

Stay tuned for further analysis of the limitations of using v$active_session_history.temp_space_allocated to help identify the source of a space management ORA-01652 issue.

 

 

E-rows / A-rows

Wed, 2019-12-04 07:17

This note was prompted by an error I made at the UKOUG TechFest19 yesterday. It’s fairly well-known that when you read an execution plan that includes the rowsource execution stats – so you get the E-rows (estimated) and A-rows (Actual) reported – then a sensible check of the quality of the optimizer’s calculations is to compare the estimates and actuals allowing for the fact that the E-rows is “per start” and the A-rows is “cumulative”, so A-rows = E-rows * Starts.

The error I made yesterday was to forget that this relationship isn’t always true. In particular, partitioning and parallel query introduced the need for a little flexibility in reading the numbers – which I’ll demonstrate with a couple of simple examples running under 12.2.0.1.


rem
rem     Script:         estimate_actual.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem

create table pt_composite_1 (
        id,
        grp,
        small_vc,
        padding
)
nologging
partition by range(id) 
subpartition by hash (grp)
subpartitions 4
(
        partition p1 values less than (  4000),
        partition p2 values less than (  8000),
        partition p3 values less than ( 16000),
        partition p4 values less than ( 32000),
        partition p5 values less than ( 64000),
        partition p6 values less than (128000)
)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                            id,
        trunc(rownum/100)                                 grp,
        cast(to_char(trunc(rownum/20)) as varchar2(10))   small_vc,
        cast(rpad('x',100) as varchar2(100))              padding
from
        generator       g1,
        generator       g2
where 
        rownum <= 1e5 -- > comment to avoid WordPress format issue
/

create table t3 
nologging pctfree 80
storage (initial 1M next 1M)
as
select * from pt_composite_1
/

All I’ve done is create a couple of tables with 100,000 rows each – and now I’m going to count the rows and see what I get from the execution plans with rowsource execution stats enabled:


set serveroutput off
alter session set statistics_level = all;

prompt  =================
prompt  Partition effects
prompt  =================

select count(id) from pt_composite_1;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  ================
prompt  Parallel effects
prompt  ================

select /*+ parallel (t3 4) */ count(id) from t3;
select * from table(dbms_xplan.display_cursor(null,null,'allstats'));

With a little cosmetic tidying, here are the two execution plans (note that I haven’t used the “last” format option when reporting the parallel plan):


=================
Partition effects
=================

select count(id) from pt_composite_1

-------------------------------------------------------------------------------------------------
| Id  | Operation            | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                |      1 |        |      1 |00:00:00.04 |    1866 |
|   1 |  SORT AGGREGATE      |                |      1 |      1 |      1 |00:00:00.04 |    1866 |
|   2 |   PARTITION RANGE ALL|                |      1 |    100K|    100K|00:00:00.04 |    1866 |
|   3 |    PARTITION HASH ALL|                |      6 |    100K|    100K|00:00:00.04 |    1866 |
|   4 |     TABLE ACCESS FULL| PT_COMPOSITE_1 |     24 |    100K|    100K|00:00:00.04 |    1866 |
-------------------------------------------------------------------------------------------------


================
Parallel effects
================

select /*+ parallel (t3 4) */ count(id) from t3

------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |      1 |        |      1 |00:00:00.04 |      20 |      0 |
|   1 |  SORT AGGREGATE        |          |      1 |      1 |      1 |00:00:00.04 |      20 |      0 |
|   2 |   PX COORDINATOR       |          |      1 |        |      4 |00:00:00.04 |      20 |      0 |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
|   4 |     SORT AGGREGATE     |          |      4 |      1 |      4 |00:00:00.11 |    8424 |   7692 |
|   5 |      PX BLOCK ITERATOR |          |      4 |    100K|    100K|00:00:00.11 |    8424 |   7692 |
|*  6 |       TABLE ACCESS FULL| T3       |     61 |    100K|    100K|00:00:00.06 |    8424 |   7692 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)

As you can see, the lines specifying partition selection report E-Rows for the whole table, not for any partition-level approximation, so for operations 3 and 4 we shouldn’t multiply Starts by E-rows to compare with A-rows. (Starts = 6 for operation 3 because we have 6 partitions, and Starts = 24 for operation 4 because at the lowest level we have a total of 24 data segments).
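As a quick cross-check of that segment count (a trivial query, assuming the table was created in your own schema):

select count(*) from user_tab_subpartitions where table_name = 'PT_COMPOSITE_1';

-- 6 range partitions x 4 hash subpartitions = 24 (sub)partition segments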

For the parallel query we see the same pattern – every parallel slave reports the expected total number of rows, and every “block iterator” (rowid range) reports the expected total number of rows. Again we see that using multiplication to compare E-rows and A-rows would not be valid.

In fact it’s not just partitioning and parallelism that can cause confusion. Even something as simple as a serial nested loop join has a couple of surprises (largely thanks to the evolution of the mechanics – without a matching adjustment to the execution plans – over time). Here’s a script to generate a couple of tables, which we will then join – hinting various mechanisms for the nested loop.


create table t1
as
select
        rownum           id,
        mod(rownum,100)  n1,
        cast(lpad(rownum,20) as varchar2(20)) v1 
from
        dual
connect by
        level <= 1000 -- > comment to avoid WordPress format issue
;

create table t2
as
select  * from t1
union all
select  * from t1
union all
select  * from t1
;

create index t2_i1 on t2(id);

It’s not a subtle test – as you can see we have 3 rows in table t2 for every row in t1. So let’s pick some t1 rows and join to t2 on id. Again it’s 12.2.0.1:

set serveroutput off
alter session set statistics_level = all;

prompt  ==============================
prompt  Nested loop join (traditional)
prompt  ==============================

select
        /*+ 
                leading(t1 t2) use_nl_with_index(t2 t2_i1) 
                opt_param('_nlj_batching_enabled', 0)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 4
and
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  ==============================
prompt  Nested loop join with prefetch
prompt  ==============================

select
        /*+ 
                leading(t1 t2) use_nl_with_index(t2 t2_i1) 
                nlj_prefetch(t2) 
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 4
and
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  ==============================
prompt  Nested loop join with batching
prompt  ==============================

select
        /*+ 
                leading(t1 t2) use_nl_with_index(t2 t2_i1) 
                nlj_batching(t2) 
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 4
and
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

Here are the three plans, with a couple of comments after each – all three queries returned the same 30 (= 10 * 3) rows.


==============================
Nested loop join (traditional)
==============================

--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |     30 |00:00:00.01 |      60 |
|   1 |  NESTED LOOPS                        |       |      1 |     30 |     30 |00:00:00.01 |      60 |
|*  2 |   TABLE ACCESS FULL                  | T1    |      1 |     10 |     10 |00:00:00.01 |      15 |
|   3 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     10 |      3 |     30 |00:00:00.01 |      45 |
|*  4 |    INDEX RANGE SCAN                  | T2_I1 |     10 |      3 |     30 |00:00:00.01 |      15 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."N1"=4)
   4 - access("T2"."ID"="T1"."ID")

This is the original nested loop structure (apart from the “batched” option that appeared in 12c) and follows the rule/guideline:

  • Operation 2 operates once and returns the 10 rows predicted.
  • Operation 3 is started 10 times by operation 1, with a prediction of 3 rows per start – and the actual comes out at 30 rows.
  • Operation 4 is started 10 times (once for each start of operation 3), with a prediction of 3 rowids per start – and the actual comes out at 30 rows.
==============================
Nested loop join with prefetch
==============================

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |      3 |     30 |00:00:00.01 |      60 |
|   2 |   NESTED LOOPS                      |       |      1 |     30 |     30 |00:00:00.01 |      30 |
|*  3 |    TABLE ACCESS FULL                | T1    |      1 |     10 |     10 |00:00:00.01 |      15 |
|*  4 |    INDEX RANGE SCAN                 | T2_I1 |     10 |      3 |     30 |00:00:00.01 |      15 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T1"."N1"=4)
   4 - access("T2"."ID"="T1"."ID")
 

Again in the order of rowsource generation

  • Operation 3 starts once with a prediction of 10 rows and the rule works.
  • Operation 4 is started 10 times by operation 2, with a prediction of 3 rows (rowids) per start, and the rule works.
  • Operation 2 was started once by operation 1, with a prediction of 30 rows (rowids), and the rule works.
  • Operation 1 starts once, but the prediction is reported as the value you would have got from the original NLJ shape – and breaks the rule.
==============================
Nested loop join with batching
==============================

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |     30 |00:00:00.01 |      60 |
|   1 |  NESTED LOOPS                |       |      1 |     30 |     30 |00:00:00.01 |      60 |
|   2 |   NESTED LOOPS               |       |      1 |     30 |     30 |00:00:00.01 |      30 |
|*  3 |    TABLE ACCESS FULL         | T1    |      1 |     10 |     10 |00:00:00.01 |      15 |
|*  4 |    INDEX RANGE SCAN          | T2_I1 |     10 |      3 |     30 |00:00:00.01 |      15 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T2    |     30 |      3 |     30 |00:00:00.01 |      30 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T1"."N1"=4)
   4 - access("T2"."ID"="T1"."ID")


In the order in which rowsources are created

  • Operation 3 starts once with a prediction of 10 rows – and the A-rows matches the rule
  • Operation 4 is started 10 times by operation 2, with a prediction of 3 rows per start – and the A-rows matches the rule.
  • Operation 5 is started 30 times by operation 1, with a prediction of 3 rows per start – again this is the value that you would have seen from the original representation of the NLJ; the prediction obviously should be 1, so the rule is broken again.

tl;dr

It is important to remember that the basic rule of “A-rows = starts * E-rows” does not hold for the partition-related lines or the PX related lines of partitioned and parallel execution plans.

You may also find a few other cases where you need to be a little cautious about trusting the rule without first thinking carefully about the mechanics of what the shape of the plan is telling you.

Parse Time

Sun, 2019-11-17 13:37

This is a note I started drafting in October 2012. It’s a case study from an optimizer (10053) trace file someone emailed to me, and it describes some of the high-level steps I went through to see if I could pinpoint what the optimizer was doing that fooled it into spending a huge amount of time optimising a statement that ultimately executed very quickly.

Unfortunately I never finished my notes and I can no longer find the trace file that the article was based on, so I don’t really know what I was planning to say to complete the last observation I had recorded.

I was prompted a couple of days ago to publish the notes so far because I was reminded in a conversation with members of the Oak Table Network about an article that Franck Pachot wrote a couple of years ago. In 12c Oracle Corp. introduced a time-reporting mechanism for the optimizer trace. If some optimisation step takes “too long” (1 second, by default) then the optimizer will write a “TIMER:” line into the trace file telling you what the operation was, how long it took to complete, and how much CPU time it used. The default for “too long” can be adjusted by setting a “fix control”. This makes it a lot easier to find out where the time went if you see a very long parse time.
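For example (a sketch only, but it uses the same fix control that appears in the 19c test at the end of this note), you could drop the reporting threshold to 1/100th of a second before enabling the 10053 trace:

alter session set "_fix_control"='16923858:4';        -- the value is a power of 10: 4 means 1e4 microseconds
alter session set events '10053 trace name context forever, level 1';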

But let’s get back to the original trace file and drafted blog note. It started with a question on OTN and an extract from a tkprof output to back up a nasty  performance issue.

=============================================================================================

 

What do you do about a parse time of 46 seconds ? That was the question that came up on OTN a few days ago – and here’s the tkprof output to demonstrate it.

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1     46.27      46.53          0          5          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.33       0.63        129      30331          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     46.60      47.17        129      30336          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 144  
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=30331 pr=129 pw=0 time=637272 us)
       863        863        863   VIEW  VM_NWVW_1 (cr=30331 pr=129 pw=0 time=637378 us cost=1331 size=10 card=1)
       ... and lots more lines of plan

According to tkprof, it takes 46 seconds – virtually all CPU time – to optimise this statement, then 0.63 seconds to run it. You might spot that this is 11gR2 (in fact it’s 11.2.0.3) from the fact that the second line of the “Row Source Operation” includes a report of the estimated cost of the query, which is only 1,331.

Things were actually worse than they seem at first sight; when we saw more of the tkprof output the following also showed up:

SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE 
  NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') 
  NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"), 
  NVL(SUM(C2),:"SYS_B_01") 
FROM
 (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("VAL_000002") FULL("VAL_000002") 
  NO_PARALLEL_INDEX("VAL_000002") */ :"SYS_B_02" AS C1, 
  CASE WHEN
    ...
  END AS C2 FROM "BISWEBB"."RECORDTEXTVALUE" 
  SAMPLE BLOCK (:"SYS_B_21" , :"SYS_B_22") SEED (:"SYS_B_23") "VAL_000002" 
  WHERE ... 
 ) SAMPLESUB
 
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        5      0.00       0.00          0          0          0           0
Execute      5      0.00       0.00          0          0          0           0
Fetch        5     21.41      24.14      11108      37331          0           5
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       15     21.41      24.15      11108      37331          0           5
 
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 144     (recursive depth: 1)
Number of plan statistics captured: 3
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=7466 pr=3703 pw=0 time=5230126 us)
   3137126    3137126    3137126   PARTITION HASH ALL PARTITION: 1 128 (cr=7466 pr=3703 pw=0 time=2547843 us cost=18758 size=131597088 card=3133264)
   3137126    3137126    3137126    TABLE ACCESS SAMPLE RECORDTEXTVALUE PARTITION: 1 128 (cr=7466 pr=3703 pw=0 time=2372509 us cost=18758 size=131597088 card=3133264)

This piece of SQL executed five times as the query was optimised, adding a further 24 seconds elapsed time and 21 CPU seconds which, surprisingly, weren’t included in the headline 46 seconds. The total time spent in optimising the statement was around 70 seconds, of which about 68 seconds were spent on (or waiting for) the CPU.

This is unusual – I don’t often see SQL statements taking more than a few seconds to parse – not since 8i, and not without complex partition views – and I certainly don’t expect to see a low cost query in 11.2.0.3 taking anything like 70 (or even 46) seconds to optimise.

The OP had enabled the 10046 and the 10053 traces at the same time – and since the parse time was sufficiently unusual I asked him to email me the raw trace file – all 200MB of it.

Since it’s not easy to process 200MB of trace the first thing to do is extract a few headline details, and I thought you might be interested to hear about some of the methods I use on the rare occasions when I decide to look at a 10053.

My aim is to investigate a very long parse time and the tkprof output had already shown me that there were a lot of tables in the query, so I had the feeling that the problem would relate to the amount of work done testing possible join orders; I’ve also noticed that the dynamic sampling code ran five times – so I’m expecting to see some critical stage of the optimisation run 5 times (although I don’t know why it should).

Step 1: Use grep (or find if you’re on Windows) to do a quick check for the number of join orders considered. I’m just searching for the text “Join order[” appearing at the start of line and then counting how many times I find it:

[jonathan@linux01 big_trace]$ grep "^Join order\[" orcl_ora_25306.trc  | wc -l
6266

That’s 6,266 join orders considered – let’s take a slightly closer look:

[jonathan@linux01 big_trace]$ grep -n "^Join order\[" orcl_ora_25306.trc >temp.txt
[jonathan@linux01 big_trace]$ tail -2 temp.txt
4458394:Join order[581]:  RECORDTYPEMEMBER[RTM]#9  RECORD_[VAL_000049]#13  ...... from$_subquery$_008[TBL_000020]#2
4458825:Join order[1]:  VM_NWVW_1[VM_NWVW_1]#0

The line of dots represents another 11 tables (or similar objects) in the join order. But there are only 581 join orders (apparently) before the last one in the file (which is a single view transformation). I’ve used the “-n” option with grep, so if I wanted to look at the right bit of the file I could tail the last few thousand lines, but my machine is happy to use vi on a 200MB file, and a quick search (backwards) through the file finds the number 581 in the following text (which does not appear in all versions of the trace file):

Number of join permutations tried: 581

So a quick grep for "join permutations" might be a good idea. (In the absence of this line I'd have got to the same result by directing the earlier grep for "^Join order\[" to a file and playing around with the contents of the file.)

[jonathan@linux01 big_trace]$ grep -n "join permutations" orcl_ora_25306.trc
11495:Number of join permutations tried: 2
11849:Number of join permutations tried: 1
12439:Number of join permutations tried: 2
13826:Number of join permutations tried: 2
14180:Number of join permutations tried: 1
14552:Number of join permutations tried: 2
15938:Number of join permutations tried: 2
16292:Number of join permutations tried: 1
16665:Number of join permutations tried: 2
18141:Number of join permutations tried: 2
18550:Number of join permutations tried: 2
18959:Number of join permutations tried: 2
622799:Number of join permutations tried: 374
624183:Number of join permutations tried: 2
624592:Number of join permutations tried: 2
624919:Number of join permutations tried: 1
625211:Number of join permutations tried: 2
1759817:Number of join permutations tried: 673
1760302:Number of join permutations tried: 1
1760593:Number of join permutations tried: 2
1760910:Number of join permutations tried: 1
1761202:Number of join permutations tried: 2
2750475:Number of join permutations tried: 674
2751325:Number of join permutations tried: 2
2751642:Number of join permutations tried: 1
2751933:Number of join permutations tried: 2
2752250:Number of join permutations tried: 1
2752542:Number of join permutations tried: 2
3586276:Number of join permutations tried: 571
3587133:Number of join permutations tried: 2
3587461:Number of join permutations tried: 1
3587755:Number of join permutations tried: 2
3588079:Number of join permutations tried: 1
3588374:Number of join permutations tried: 2
4458608:Number of join permutations tried: 581
4458832:Number of join permutations tried: 1

The key thing we see here is that there are five sections of long searches, and a few very small searches. Examination of the small search lists shows that they relate to some inline views which simply join a couple of tables. For each of the long searches we can see that the first join order in each set is for 14 “tables”. This is where the work is going. But if you add up the number of permutations in the long searches you get a total of 2,873, which is a long way off the 6,266 that we found with our grep for “^Join order[“ – so where do the extra join orders come from ? Let’s take a closer look at the file where we dumped all the Join order lines – the last 10 lines look like this:

4452004:Join order[577]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4452086:Join order[577]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4453254:Join order[578]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4453382:Join order[578]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4454573:Join order[579]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4454655:Join order[579]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4455823:Join order[580]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4455905:Join order[580]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4457051:Join order[581]:  RECORDTYPEMEMBER[RTM]#9  RECORD_[VAL_000049]#13  ...
4458394:Join order[581]:  RECORDTYPEMEMBER[RTM]#9  RECORD_[VAL_000049]#13  ...
4458825:Join order[1]:  VM_NWVW_1[VM_NWVW_1]#0

Every single join order seems to have appeared twice, and doubling the counts we got for the sum of the permutations gets us close to the total we got for the join order search. Again, we could zoom in a little closer, does the text near the start of the two occurrences of join order 581 give us any clues ? We see the following just before the second one:

****** Recost for ORDER BY (using join row order) *******

The optimizer has tried to find a way of eliminating some of the cost by letting the table join order affect the order of the final output. Let’s do another grep to see how many join orders have been recosted:

[jonathan@linux01 big_trace]$ grep "Recost for ORDER BY" orcl_ora_25306.trc | sort | uniq -c
    452 ****** Recost for ORDER BY (using index) ************
   2896 ****** Recost for ORDER BY (using join row order) *******

So we’ve done a huge amount of recosting. Let’s check the arithmetic: 452 + 2,896 + 2,873 = 6,221, which is remarkably close to the 6,266 we needed (and we have ignored a few dozen join orders that were needed for the inline views, and the final error is too small for me to worry about).

We can conclude, therefore, that we did a huge amount of work costing a 14 table join a little over 6,000 times. It’s possible, of course, that we discarded lots of join orders very early on in the cost stage, so we could count the number of times we see a “Now joining” message – to complete a single pass on a 14 table join the optimizer will have to report “Now joining” 13 times.

[jonathan@linux01 big_trace]$ grep -n "Now joining" orcl_ora_25306.trc | wc -l
43989

Since the message appeared 44,000 times from 6,200 join orders we have an average of 7 steps evaluated per join order. Because of the way that the optimizer takes short-cuts I think this is a fairly strong clue that most of the join order calculations actually completed, or got very close to completing, over the whole 14 tables. (The optimizer remembers “partial results” from previous join order calculations, so doesn’t have to do 13 “Now joining” steps on every single join order.)

We still need to know why the optimizer tried so hard before supplying a plan – so let’s look for the “Best so far” lines, which the trace file reports each time the optimizer finds a better plan than the previous best. Here’s an example of what we’re looking for:

       Cost: 206984.61  Degree: 1  Resp: 206984.61  Card: 0.00 Bytes: 632
***********************
Best so far:  Table#: 0  cost: 56.9744  card: 1.0000  bytes: 30
              Table#: 3  cost: 59.9853  card: 0.0000  bytes: 83
              Table#: 6  cost: 60.9869  card: 0.0000  bytes: 151
              Table#:10  cost: 61.9909  card: 0.0000  bytes: 185
              Table#: 5  cost: 62.9928  card: 0.0000  bytes: 253
              Table#: 2  cost: 65.0004  card: 0.0000  bytes: 306
              Table#: 1  cost: 122.4741  card: 0.0000  bytes: 336
              Table#: 8  cost: 123.4760  card: 0.0000  bytes: 387
              Table#: 4  cost: 125.4836  card: 0.0000  bytes: 440
              Table#: 7  cost: 343.2625  card: 0.0000  bytes: 470
              Table#: 9  cost: 345.2659  card: 0.0000  bytes: 530
              Table#:11  cost: 206981.5979  card: 0.0000  bytes: 564
              Table#:12  cost: 206982.6017  card: 0.0000  bytes: 598
              Table#:13  cost: 206984.6055  card: 0.0000  bytes: 632
***********************

As you can see, we get a list of the tables (identified by their position in the first join order examined) with details of accumulated cost. But just above this tabular display there’s a repeat of the cost that we end up with. So let’s write, and apply, a little awk script to find all the “Best so far” lines and then print the line two above. Here’s a suitable script, followed by a call to use it:

{
        # When the current line contains "Best so far", print the current line
        # number followed by the line that appeared two lines earlier (held in m2)
        if (index($0,"Best so far") != 0) {print NR m2}
        # shuffle the buffers: m2 remembers two lines back, m1 one line back
        m2 = m1; m1 = $0;
}

awk -f cost.awk orcl_ora_25306.trc >temp.txt

There was a bit of a mess in the output – there are a couple of special cases (relating, in our trace file, to the inline views and the appearance of a “group by placement”) that cause irregular patterns to appear, but the script was effective for the critical 14 table join. And looking through the list of costs for the various permutations we find that almost all the options show a cost of about 206,000 – except for the last few in two of the five “permutation sets” that suddenly drop to costs of around 1,500 and 1,300. The very high starting cost explains why the optimizer was prepared to spend so much time trying to find a good path and why it kept working so hard until the cost dropped very sharply.

Side bar: I have an old note from OCIS (the precursor, or the precursor of the precursor, of MOS) that the optimizer will stop searching when the number of join order tests * the number of “non-single-row” tables (according to the single table access path) * 0.3 is greater than the best cost so far.  I even have a test script (run against 8.1.7.4, dated September 2002) that seems to demonstrate the formula.  The formula may be terribly out of date by now and the rules of exactly how and when it applies may have changed – the model didn’t seem to work when I ran it against 19.3 – but the principle probably still holds true.
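Purely as an illustration of how that old formula would play out here (and guessing that 13 of the 14 tables would count as "non-single-row"), the stopping condition would look like this:

        0.3 * 13 * (number of join orders tested) > best cost so far

        best cost ~ 206,000   =>   the search could run to roughly 52,800 join orders
        best cost ~   1,300   =>   the search would stop after roughly 330 join orders

That would at least be consistent with each of the long searches running to several hundred permutations and stopping fairly soon after the cost dropped sharply.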

At this point we might decide that we ought to look at the initial join order and at the join order where the cost dropped dramatically, and try to work out why Oracle picked such a bad starting join order, and what it was about the better join order that the optimizer had missed. This might allow us to recognise some error in the statistics for either the “bad” starting order or the “good” starting order and allow us to solve the problem by (e.g.) creating a column group or gathering some specific statistics. We might simply decide that we’ll take a good join order and pass it to the optimizer through a /*+ leading() */ hint, or simply take the entire outline and attach it to the query through a faked SQL Profile (or embedded set of hints).

However, for the purposes of this exercise (and because sometimes you have to find a strategic solution rather than a “single statement” solution) I’m going to carry on working through mechanisms for dissecting the trace file without looking too closely at any of the fine detail.

The final “high-level” target I picked was to pin down why there were 5 sets of join orders. I had noticed something particular about the execution plan supplied – it showed several occurrences of the operation “VIEW PUSHED PREDICATE” so I wondered if this might be relevant. So I did a quick check near the start of the main body of the trace file for anything that might be a clue, and found the following just after the “QUERY BLOCK SIGNATURE”.

QUERY BLOCK SIGNATURE
---------------------
  signature(): NULL
***********************************
Cost-Based Join Predicate Push-down
***********************************
JPPD: Checking validity of push-down in query block SEL$6E5D879B (#4)
JPPD:   Checking validity of push-down from query block SEL$6E5D879B (#4) to query block SEL$C20BB4FE (#6)
Check Basic Validity for Non-Union View for query block SEL$C20BB4FE (#6)
JPPD:     JPPD bypassed: View has non-standard group by.
JPPD:   No valid views found to push predicate into.
JPPD: Checking validity of push-down in query block SEL$799AD133 (#3)
JPPD:   Checking validity of push-down from query block SEL$799AD133 (#3) to query block SEL$EFE55ECA (#7)
Check Basic Validity for Non-Union View for query block SEL$EFE55ECA (#7)
JPPD:     JPPD bypassed: View has non-standard group by.
JPPD:   No valid views found to push predicate into.
JPPD: Checking validity of push-down in query block SEL$C2AA4F6A (#2)
JPPD:   Checking validity of push-down from query block SEL$C2AA4F6A (#2) to query block SEL$799AD133 (#3)
Check Basic Validity for Non-Union View for query block SEL$799AD133 (#3)
JPPD:     Passed validity checks
JPPD:   Checking validity of push-down from query block SEL$C2AA4F6A (#2) to query block SEL$6E5D879B (#4)
Check Basic Validity for Non-Union View for query block SEL$6E5D879B (#4)
JPPD:     Passed validity checks
JPPD:   Checking validity of push-down from query block SEL$C2AA4F6A (#2) to query block SEL$FC56C448 (#5)
Check Basic Validity for Non-Union View for query block SEL$FC56C448 (#5)
JPPD:     Passed validity checks
JPPD: JPPD:   Pushdown from query block SEL$C2AA4F6A (#2) passed validity checks.
Join-Predicate push-down on query block SEL$C2AA4F6A (#2)
JPPD: Using search type: linear
JPPD: Considering join predicate push-down
JPPD: Starting iteration 1, state space = (3,4,5) : (0,0,0)

As you can see we are doing cost-based join-predicate pushdown, and there are three targets which are valid for the operation. Notice the line that says “using search type: linear”, and the suggestive “starting iteration 1” – let’s look for more lines with “Starting iteration”

[jonathan@linux01 big_trace]$ grep -n "Starting iteration" orcl_ora_25306.trc
9934:GBP: Starting iteration 1, state space = (20,21) : (0,0)
11529:GBP: Starting iteration 2, state space = (20,21) : (0,C)
11562:GBP: Starting iteration 3, state space = (20,21) : (F,0)
12479:GBP: Starting iteration 4, state space = (20,21) : (F,C)
12517:GBP: Starting iteration 1, state space = (18,19) : (0,0)
13860:GBP: Starting iteration 2, state space = (18,19) : (0,C)
13893:GBP: Starting iteration 3, state space = (18,19) : (F,0)
14587:GBP: Starting iteration 4, state space = (18,19) : (F,C)
14628:GBP: Starting iteration 1, state space = (16,17) : (0,0)
15972:GBP: Starting iteration 2, state space = (16,17) : (0,C)
16005:GBP: Starting iteration 3, state space = (16,17) : (F,0)
16700:GBP: Starting iteration 4, state space = (16,17) : (F,C)
16877:JPPD: Starting iteration 1, state space = (3,4,5) : (0,0,0)
622904:JPPD: Starting iteration 2, state space = (3,4,5) : (1,0,0)
1759914:JPPD: Starting iteration 3, state space = (3,4,5) : (1,1,0)
2750592:JPPD: Starting iteration 4, state space = (3,4,5) : (1,1,1)

There are four iterations for state space (3,4,5) – and look at the huge gaps between their “Starting iteration” lines. In fact, let’s go a little closer and combine their starting lines with the lines above where I searched for “Number of join permutations tried:”


16877:JPPD: Starting iteration 1, state space = (3,4,5) : (0,0,0)
622799:Number of join permutations tried: 374

622904:JPPD: Starting iteration 2, state space = (3,4,5) : (1,0,0)
1759817:Number of join permutations tried: 673

1759914:JPPD: Starting iteration 3, state space = (3,4,5) : (1,1,0)
2750475:Number of join permutations tried: 674

2750592:JPPD: Starting iteration 4, state space = (3,4,5) : (1,1,1)
3586276:Number of join permutations tried: 571

4458608:Number of join permutations tried: 581

At this point my notes end and I don’t know where I was going with the investigation. I know that I suggested to the OP that the cost-based join predicate pushdown was having a huge impact on the optimization time and suggested he experiment with disabling the feature. (Parse time dropped dramatically, but query run-time went through the roof – so that proved a point, but wasn’t a useful strategy). I don’t know, however, what the fifth long series of permutations was for, so if I could find the trace file one of the things I’d do next would be to look at the detail a few lines before line 4,458,608 to see what triggered that part of the re-optimization. I’d also want to know whether the final execution plan came from the fifth series and could be reached without involving all the join predicate pushdown work, or whether it was a plan that was only going to appear after the optimizer had worked through all 4 iterations.
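If the trace file ever turned up again, pulling out the lines just before that point would be a simple one-liner (standard sed; the exact range is just a guess at how much context would be needed):

sed -n '4458300,4458610p' orcl_ora_25306.trc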

The final plan did involve all 3 pushed predicates (which looks like it might have been from iteration 4), so it might have been possible to find a generic strategy for forcing unconditional predicate pushing without doing all the expensive intermediate work.

Version 12c and beyond

That was then, and this is now. And something completely different might have appeared in 12c (or 19c) – but the one thing that is particularly helpful is that you can bet that every iteration of the JPPD state spaces would have produced a "TIMER:" line in the trace file, making it very easy to run grep -n "TIMER:" (or -nT as I recently discovered) against the trace file to pinpoint the issue very quickly.

Here’s an example from my "killer_parse.sql" query after setting "_fix_control"='16923858:4' (1e4 microseconds = 1/100th second) in an instance of 19c:


$ grep -nT TIMER or19_ora_21051.trc

16426  :TIMER:      bitmap access paths cpu: 0.104006 sec elapsed: 0.105076 sec
252758 :TIMER:     costing general plans cpu: 0.040666 sec elapsed: 0.040471 sec
309460 :TIMER:      bitmap access paths cpu: 0.079509 sec elapsed: 0.079074 sec
312584 :TIMER: CBQT OR expansion SEL$765CDFAA cpu: 10.474142 sec elapsed: 10.508788 sec
313974 :TIMER: Complex View Merging SEL$765CDFAA cpu: 1.475173 sec elapsed: 1.475418 sec
315716 :TIMER: Table Expansion SEL$765CDFAA cpu: 0.046262 sec elapsed: 0.046647 sec
316036 :TIMER: Star Transformation SEL$765CDFAA cpu: 0.029077 sec elapsed: 0.026912 sec
318207 :TIMER: Type Checking after CBQT SEL$765CDFAA cpu: 0.220506 sec elapsed: 0.219273 sec
318208 :TIMER: Cost-Based Transformations (Overall) SEL$765CDFAA cpu: 13.632516 sec elapsed: 13.666360 sec
328948 :TIMER:      bitmap access paths cpu: 0.093973 sec elapsed: 0.095008 sec
632935 :TIMER: Access Path Analysis (Final) SEL$765CDFAA cpu: 7.703016 sec elapsed: 7.755957 sec
633092 :TIMER: SQL Optimization (Overall) SEL$765CDFAA cpu: 21.539010 sec elapsed: 21.632012 sec

The closing 21.63 seconds (line 633,092) is largely the 7.756 seconds of final access path analysis (line 632,935) plus the 13.666 seconds of Cost-Based Transformation time (line 318,208); and that 13.666 seconds is mostly the 1.475 seconds of Complex View Merging (line 313,974) plus the 10.508 seconds of CBQT OR expansion (line 312,584) – so let's try disabling OR expansion (alter session set "_no_or_expansion"=true;) and try again:


$ grep -nT TIMER or19_ora_22205.trc
14884  :TIMER:      bitmap access paths cpu: 0.062453 sec elapsed: 0.064501 sec
15228  :TIMER: Access Path Analysis (Final) SEL$1 cpu: 0.256751 sec elapsed: 0.262467 sec
15234  :TIMER: SQL Optimization (Overall) SEL$1 cpu: 0.264099 sec elapsed: 0.268183 sec

Not only was optimisation faster, the runtime was quicker too.

Warning – it’s not always that easy.

 

Library Cache Stats

Sun, 2019-11-17 03:36

In response to a comment that one of my notes references a call to a package "snap_libcache", I've posted this version of the SQL that can be run by SYS to create the package, with a public synonym, and privileges granted to public to execute it. The package doesn't report the DLM (RAC) related activity, and is suitable only for 11g onwards (older versions require a massive decode of an index value to convert indx numbers into names).

rem
rem     Script:         snap_11_libcache.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2001 (updated for 11g)
rem     Purpose:        Package to get snapshot start and delta of library cache stats
rem
rem     Notes
rem             Lots of changes needed by 11.2.x.x where x$kglst holds
rem             two types - TYPE (107) and NAMESPACE (84) - but no
rem             longer needs a complex decode.
rem
rem             Has to be run by SYS to create the package
rem
rem     Usage:
rem             set serveroutput on size 1000000 format wrapped
rem             set linesize 144
rem             set trimspool on
rem             execute snap_libcache.start_snap
rem             -- do something
rem             execute snap_libcache.end_snap
rem

create or replace package snap_libcache as
        procedure start_snap;
        procedure end_snap;
end;
/

create or replace package body snap_libcache as

cursor c1 is
        select
                indx,
                kglsttyp        lib_type,
                kglstdsc        name,
                kglstget        gets,
                kglstght        get_hits,
                kglstpin        pins,
                kglstpht        pin_hits,
                kglstrld        reloads,
                kglstinv        invalidations,
                kglstlrq        dlm_lock_requests,
                kglstprq        dlm_pin_requests,
--              kglstprl        dlm_pin_releases,
--              kglstirq        dlm_invalidation_requests,
                kglstmiv        dlm_invalidations
        from x$kglst
        ;

type w_type1 is table of c1%rowtype index by binary_integer;
w_list1         w_type1;
w_empty_list    w_type1;

m_start_time    date;
m_start_flag    char(1);
m_end_time      date;

procedure start_snap is
begin

        m_start_time := sysdate;
        m_start_flag := 'U';
        w_list1      := w_empty_list;

        for r in c1 loop
                w_list1(r.indx).gets          := r.gets;
                w_list1(r.indx).get_hits      := r.get_hits;
                w_list1(r.indx).pins          := r.pins;
                w_list1(r.indx).pin_hits      := r.pin_hits;
                w_list1(r.indx).reloads       := r.reloads;
                w_list1(r.indx).invalidations := r.invalidations;
        end loop;

end start_snap;

procedure end_snap is
begin

        m_end_time := sysdate;

        dbms_output.put_line('---------------------------------');
        dbms_output.put_line('Library Cache - ' ||
                to_char(m_end_time,'dd-Mon hh24:mi:ss')
        );

        if m_start_flag = 'U' then
                dbms_output.put_line('Interval:-      ' ||
                        trunc(86400 * (m_end_time - m_start_time)) ||
                        ' seconds'
                );
        else
                dbms_output.put_line('Since Startup:- ' ||
                        to_char(m_start_time,'dd-Mon hh24:mi:ss')
                );
        end if;

        dbms_output.put_line('---------------------------------');

        dbms_output.put_line(
                rpad('Type',10) ||
                rpad('Description',41) ||
                lpad('Gets',12) ||
                lpad('Hits',12) ||
                lpad('Ratio',6) ||
                lpad('Pins',12) ||
                lpad('Hits',12) ||
                lpad('Ratio',6) ||
                lpad('Invalidations',14) ||
                lpad('Reloads',10)
        );

        dbms_output.put_line(
                rpad('-----',10) ||
                rpad('-----',41) ||
                lpad('----',12) ||
                lpad('----',12) ||
                lpad('-----',6) ||
                lpad('----',12) ||
                lpad('----',12) ||
                lpad('-----',6) ||
                lpad('-------------',14) ||
                lpad('-------',10)
        );

        for r in c1 loop
                if (not w_list1.exists(r.indx)) then
                        w_list1(r.indx).gets          := 0;
                        w_list1(r.indx).get_hits      := 0;
                        w_list1(r.indx).pins          := 0;
                        w_list1(r.indx).pin_hits      := 0;
                        w_list1(r.indx).invalidations := 0;
                        w_list1(r.indx).reloads       := 0;
                end if;

                if (
                           (w_list1(r.indx).gets          != r.gets)
                        or (w_list1(r.indx).get_hits      != r.get_hits)
                        or (w_list1(r.indx).pins          != r.pins)
                        or (w_list1(r.indx).pin_hits      != r.pin_hits)
                        or (w_list1(r.indx).invalidations != r.invalidations)
                        or (w_list1(r.indx).reloads       != r.reloads)
                ) then

                        dbms_output.put(rpad(substr(r.lib_type,1,10),10));
                        dbms_output.put(rpad(substr(r.name,1,41),41));
                        dbms_output.put(to_char(
                                r.gets - w_list1(r.indx).gets,
                                '999,999,990')
                        );
                        dbms_output.put(to_char(
                                r.get_hits - w_list1(r.indx).get_hits,
                                '999,999,990'));
                        dbms_output.put(to_char(
                                (r.get_hits - w_list1(r.indx).get_hits)/
                                greatest(
                                        r.gets - w_list1(r.indx).gets,
                                        1
                                ),
                                '999.0'));
                        dbms_output.put(to_char(
                                r.pins - w_list1(r.indx).pins,
                                '999,999,990')
                        );
                        dbms_output.put(to_char(
                                r.pin_hits - w_list1(r.indx).pin_hits,
                                '999,999,990'));
                        dbms_output.put(to_char(
                                (r.pin_hits - w_list1(r.indx).pin_hits)/
                                greatest(
                                        r.pins - w_list1(r.indx).pins,
                                        1
                                ),
                                '999.0'));
                        dbms_output.put(to_char(
                                r.invalidations - w_list1(r.indx).invalidations,
                                '9,999,999,990')
                        );
                        dbms_output.put(to_char(
                                r.reloads - w_list1(r.indx).reloads,
                                '9,999,990')
                        );
                        dbms_output.new_line;
                end if;

        end loop;

end end_snap;

begin
        select
                startup_time, 'S'
        into
                m_start_time, m_start_flag
        from
                v$instance;

end snap_libcache;
/

drop public synonym snap_libcache;
create public synonym snap_libcache for snap_libcache;
grant execute on snap_libcache to public;

You’ll note that there are two classes of data, “namespace” and “type”. The dynamic view v$librarycache reports only the namespace rows.
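If you want to see the two classes for yourself, a quick check (run as SYS since it references an x$ structure – just a sketch) is:


select  kglsttyp, count(*)
from    x$kglst
group by
        kglsttyp
order by
        kglsttyp
;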

Nested Tables

Sat, 2019-11-09 05:28

This note is a little side effect of answering a question about the total space associated with a table, including its indexes, LOB columns and nested tables. The first suggestion I published failed to take account of the fact that nested tables can contain their own nested tables so I had to modify the code.

The basic change was easy – in the original code I had joined to the view dba_nested_tables to translate between a table_name and its parent_table_name. The only change I needed to make was to replace the reference to the view with a different view that joined a table_name to its “oldest ancestor”. To achieve this I had to do two things: first, create a table with multiple levels of nesting, then create a suitable view definition.  For reference – because it may help somebody – I’ve published the two stages here.

A revolting nested table:

The following code creates three table types, but the second table type is defined as a table of an object type that includes a column of the third table type, and the first is defined as a table of an object type that includes a column of the second – giving us three levels of nesting:


rem
rem     Script:         nested_tables.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create or replace type jpl_item3 as object (v1 varchar2(10), n3 number);
/

create or replace type jpl_tab3_type as table of jpl_item3;
/

create or replace type jpl_item2 as object(v2 varchar2(10), jpl3 jpl_tab3_type);
/

create or replace type jpl_tab2_type as table of jpl_item2;
/

create or replace type jpl_item1 as object(v3 varchar2(10), jpl2 jpl_tab2_type);
/

create or replace type jpl_tab1_type as table of jpl_item1;
/

create table demo_nest_2 (
        id      number  constraint d2_pk primary key,
        jpl1    jpl_tab1_type
)
segment creation immediate
nested table jpl1 store as t_jpl1
        (
        nested table jpl2  store as t_jpl2
                (
                nested table jpl3 store as t_jpl3 
                return as locator
                )
        return as locator
        )
return as locator
/

I’ve never seen nested tables manually created in a production system though I believe they are likely to appear (along with varrays and LOBs) as a side effect of some XML or JSON mechanisms, but many years ago (in Practical Oracle 8i, possibly) I discussed the pros and cons of returning them “by value” or (as I have here) “by reference”. As you can see, you need to exercise some care with brackets and locating the text as you try to refine multiple levels of nesting.

The Ancestor View

We'll take this in three steps – first, report from user_nested_tables. (The final script for reporting space used dba_nested_tables with a predicate available on the owner column, but I don't give myself DBA privileges while writing quick and dirty bits of SQL.)


select 
        parent_table_name, parent_table_column, table_name, 
        ltrim(storage_spec) storage_spec,       -- the DBA view definition includes lpad( , 30) !!
        ltrim(return_type ) return_type         -- the DBA view definition includes lpad( , 20) !!
from 
        user_nested_tables
order by
        parent_table_name, parent_table_column
/

PARENT_TABLE_NAME    PARENT_TABLE_COLUMN  TABLE_NAME           STORAGE_SPEC         RETURN_TYPE
-------------------- -------------------- -------------------- -------------------- --------------------
DEMO_NEST_2          JPL1                 T_JPL1               USER_SPECIFIED       LOCATOR
T_JPL1               JPL2                 T_JPL2               USER_SPECIFIED       LOCATOR
T_JPL2               JPL3                 T_JPL3               DEFAULT              LOCATOR

You'll notice the odd ltrim() – I have no idea why the view defines these columns to be left-padded the way they are; possibly it dates back to the time when something like cast(… as varchar2(30)) wasn't a possible option.

Next, a simple "connect by" query that uses the above list in a materialized "with subquery" (CTE):


with my_nested_tables as (
select
        /*+ materialize */
        parent_table_name, table_name
from
        user_nested_tables
)
select
        parent_table, child_table, pathname
from    (
        select
                level,
                sys_connect_by_path(table_name, '/')    pathname,
                connect_by_root parent_table_name parent_table,
                table_name child_table
        from
                my_nested_tables
        connect by
                parent_table_name = prior table_name
        )
order by
        parent_table, child_table, pathname
/

PARENT_TABLE         CHILD_TABLE          PATHNAME
-------------------- -------------------- ----------------------------------------
DEMO_NEST_2          T_JPL1               /T_JPL1
DEMO_NEST_2          T_JPL2               /T_JPL1/T_JPL2
DEMO_NEST_2          T_JPL3               /T_JPL1/T_JPL2/T_JPL3
T_JPL1               T_JPL2               /T_JPL2
T_JPL1               T_JPL3               /T_JPL2/T_JPL3
T_JPL2               T_JPL3               /T_JPL3


As required this shows me demo_nest_2 as the owning ancestor of t_jpl1, t_jpl2 and t_jpl3. Unfortunately it has also produced three rows that we don’t want to see in our final space-summing code. But it’s easy enough to get rid of those – the only rows we want are the rows with a parent_table that doesn’t appear as a child_table:


with my_nested_tables as (
select
        /*+ materialize */
        parent_table_name, table_name
from
        user_nested_tables
)
select  parent_table, child_table, pathname
from    (
        select
                level,
                sys_connect_by_path(table_name, '/')    pathname,
                connect_by_root parent_table_name parent_table,
                table_name child_table
        from
                my_nested_tables
        connect by
                parent_table_name = prior table_name
        )
where   (parent_table) not in (
                select table_name
                from   my_nested_tables
        )
order by
        parent_table, child_table, pathname
/

PARENT_TABLE         CHILD_TABLE          PATHNAME
-------------------- -------------------- ----------------------------------------
DEMO_NEST_2          T_JPL1               /T_JPL1
DEMO_NEST_2          T_JPL2               /T_JPL1/T_JPL2
DEMO_NEST_2          T_JPL3               /T_JPL1/T_JPL2/T_JPL3

3 rows selected.


And there’s the result we wanted to see. A first simple corroboration of the fact is that the output corresponds with the “NESTED TABLE” segments reported by user_segments.
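The corroboration itself is a one-liner (a sketch):


select  segment_name, segment_type, bytes
from    user_segments
where   segment_type = 'NESTED TABLE'
order by
        segment_name
;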

Of course, having written a query that gets the right result from a table definition we used to help us define the query in the first place, we now ought to create a few more tables with different structures to see if the query continues to give the right results in more complex cases.

What happens, for example, if I create a table with two columns of nested tables, and one of the nested table types also contains two nested tables? What happens if the base table is an index organized table?

It’s easy to do the second test – just add “organization index” immediately after “segment creation immediate” in the table creation statement. The correct results drop out.

As for the first test – here’s the SQL to create one example (and the query still gets the right answers – even if you change the table to be index organized).


drop type jpl_tab1_type;
drop type jpl_item1;

drop type jpl_tab2_type;
drop type jpl_item2;

drop type jpl_tab3_type;
drop type jpl_item3;

purge recyclebin;

create or replace type jpl_item3 as object (v1 varchar2(10), n3 number);
/

create or replace type jpl_tab3_type as table of jpl_item3;
/

create or replace type jpl_item2 as object(v2 varchar2(10), jpl3 jpl_tab3_type, jpl3x jpl_tab3_type);
/

create or replace type jpl_tab2_type as table of jpl_item2;
/

create or replace type jpl_item1 as object(v3 varchar2(10), jpl2 jpl_tab2_type)
/

create or replace type jpl_tab1_type as table of jpl_item1;
/

create table demo_nest_3 (
        id      number  constraint d2_pk primary key,
        jpl1    jpl_tab1_type,
        jpl1a   jpl_tab1_type
)
segment creation immediate
-- organization index
nested table jpl1 store as t_jpl1
        (
        nested table jpl2  store as t_jpl2
                (
                nested table jpl3 store as t_jpl3 
                return as locator
                nested table jpl3x store as t_jpl3x 
                return as locator
                )
        return as locator
        )
return as locator
nested table jpl1a store as t_jpl1a
        (
        nested table jpl2  store as t_jpl2a
                (
                nested table jpl3 store as t_jpl3a
                return as locator
                nested table jpl3x store as t_jpl3xa
                return as locator
                )
        return as locator
        )
return as locator
/

All that remains now is to modify the code to use the equivalent DBA views, with the addition of the owner column, then slot the resulting code into the original query in place of the simple references to dba_nested_tables. If you go to the original posting you’ll see that I’ve done this by wrapping the code into a CTE so that the only changes to the rest of the code appear as two (flagged) changes where the CTE is then used.
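The original posting has the definitive version of that CTE, but as a rough sketch of its shape (switching to dba_nested_tables and carrying the owner column through the connect by) it looks something like this:


with my_nested_tables as (
        select
                /*+ materialize */
                owner, parent_table_name, table_name
        from
                dba_nested_tables
)
select
        owner, parent_table, child_table
from    (
        select
                owner,
                connect_by_root parent_table_name parent_table,
                table_name child_table
        from
                my_nested_tables
        connect by
                owner             = prior owner
        and     parent_table_name = prior table_name
        )
where   (owner, parent_table) not in (
                select owner, table_name
                from   my_nested_tables
        )
order by
        owner, parent_table, child_table
/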

Table Space

Wed, 2019-11-06 11:59

There’s a request on the Oracle Developer Forum for assistance to write a report that shows the total space allocation for a table – including its indexes and LOB columns. There is no requirement to allow for partitioned tables, but it’s not entirely clear whether the code should allow for index-organized tables and clustered tables, and there’s no comment about nested tables.

The OP has made an attempt to write a suitable query, but the query joins dba_tables to dba_indexes then to dba_lobs then three times to dba_segments (all the joins being outer joins) before aggregating on table name. Unsurprisingly this produces the wrong results because (for example) if a table has two indexes the join from dba_tables to dba_indexes will double the bytes reported for the table. As Andrew Sayer points out in the thread, it would be better to aggregate on each of the separate component types before joining.

This introduces an important, generic, principle to writing code.

  • If it's difficult to write a single statement, can you break the task down into separate components that are easier to handle?
  • If you can express the problem as a small set of simpler components, can you then combine the components?
  • If this approach works, is any loss of efficiency balanced by a gain in clarity and maintainability?

In the case of the OP, this isn't a query that's going to be running every few seconds – maybe it will be once per day, maybe three or four times per day – so there's no need to be worried about making it as efficient as possible; let's go for simplicity.

Here's a query that reports the space allocated for a simple heap table. It references dba_tables and dba_segments, so has to be run by a user with DBA privileges, and at the moment it's restricted to reporting a single user.

Wait – before I write the query I ought to create a testbed to see if the query is working. So let’s take an empty schema and create a few objects. Let’s create

  • a simple heap table with two indexes and two LOB columns.
  • an Index Organized Table (IOT) with an overflow and two LOB columns.
  • a table with two columns that are a nested table type

Here’s some suitable code:


rem
rem     Script:         sum_table_space.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

prompt  ============================================================
prompt  Simple heap table - copy of all_objects - then add two CLOBs
prompt  ============================================================

create table test_user.t1 as select * from all_objects;

alter table test_user.t1 add (c1 clob) lob (c1) store as basicfile t1_lob1 (disable storage in row);
alter table test_user.t1 add (c2 clob) lob (c2) store as basicfile t1_lob2 (disable storage in row);

update test_user.t1 set c1 = 'x', c2 = 'x' where rownum <= 125;

create index test_user.t1_i1 on test_user.t1(object_name, object_type, object_id);
create index test_user.t1_i2 on test_user.t1(object_type, object_name, object_id);

prompt  =========================================================
prompt  Simple IOT table - with a couple of CLOBs in the overflow
prompt  =========================================================

create table test_user.iot1 (
        id              number          not null,
        ind_pad         varchar2(20)    not null,
        n2              number          not null,
        n3              number          not null,
        padding         varchar2(50),
        ct1             clob,
        ct2             clob,
        constraint iot1_pk primary key (id, ind_pad)
)
organization index
        overflow
        including n3
lob(ct1) store as basicfile iot_lob1(
        disable storage in row
)
lob(ct2) store as basicfile iot_lob2(
        disable storage in row
)
;

insert into test_user.iot1
select
        rownum,
        rpad(rownum,20),
        rownum,
        rownum,
        rpad('x',50,'x'),
        case when mod(rownum,100) = 0 then rpad('x',100,'x') end,
        case when mod(rownum,100) = 0 then rpad('x',100,'x') end
from
        all_objects
where
        rownum <= 10000
;

commit;

prompt  ====================================================
prompt  Now create types to allow creation of a nested table
prompt  ====================================================

create type test_user.jpl_item as object (n1 number, v1 varchar2(32));
/

create type test_user.jpl_tab_type as table of jpl_item;
/

create table test_user.demo_nest (
        id      number,
        nest1   test_user.jpl_tab_type,
        nest2   test_user.jpl_tab_type
)
nested table nest1 store as t_nested1
return as locator
nested table nest2 store as t_nested2
return as locator
;

create unique index test_user.tn1_pk on test_user.t_nested1(nested_table_id, n1) compress 1;
create unique index test_user.tn2_pk on test_user.t_nested2(nested_table_id, n1) compress 1;

create index  test_user.tn1_v1 on test_user.t_nested1(v1);
create index  test_user.tn2_v1 on test_user.t_nested2(v1);

insert into test_user.demo_nest values (
        1,
        test_user.jpl_tab_type(
                test_user.jpl_item(1,'Item One one'),
                test_user.jpl_item(2,'Item One two')
        ),
        test_user.jpl_tab_type(
                test_user.jpl_item(1,'Item One one'),
                test_user.jpl_item(2,'Item One two')
        )
);

insert into test_user.demo_nest values (
        2,
        test_user.jpl_tab_type(
                test_user.jpl_item(1,'Item Two one'),
                test_user.jpl_item(2,'Item Two two')
        ),
        test_user.jpl_tab_type(
                test_user.jpl_item(1,'Item Two one'),
                test_user.jpl_item(2,'Item Two two')
        )
);

commit;

You’ll notice that I’ve prefixed every table, index and type name with a schema name. This is because I set up this test to run as a DBA so I’m creating the objects while connected with a different id.

Of course, before trying to report allocations summed by base table, it would be sensible to produce a simple list of the segments we should see so that we know when we’ve reported all of them. So let’s start with that very simple report:


column bytes format 999,999,999,999
column segment_name format a25

break on report on owner on object_name skip 1
compute sum of bytes on report

select
        segment_name, segment_type, bytes
from
        dba_segments
where
        owner = 'TEST_USER'
order by
        segment_type, segment_name
;

SEGMENT_NAME              SEGMENT_TYPE                  BYTES
------------------------- ------------------ ----------------
IOT1_PK                   INDEX                     1,048,576
SYS_C008380               INDEX                     1,048,576
SYS_C008381               INDEX                     1,048,576
SYS_FK0000075579N00002$   INDEX                     1,048,576
SYS_FK0000075579N00004$   INDEX                     1,048,576
T1_I1                     INDEX                     5,242,880
T1_I2                     INDEX                     5,242,880
TN1_PK                    INDEX                     1,048,576
TN1_V1                    INDEX                     1,048,576
TN2_PK                    INDEX                     1,048,576
TN2_V1                    INDEX                     1,048,576
SYS_IL0000075565C00027$$  LOBINDEX                  1,048,576
SYS_IL0000075565C00028$$  LOBINDEX                  1,048,576
SYS_IL0000075572C00006$$  LOBINDEX                  1,048,576
SYS_IL0000075572C00007$$  LOBINDEX                  1,048,576
IOT_LOB1                  LOBSEGMENT                1,048,576
IOT_LOB2                  LOBSEGMENT                1,048,576
T1_LOB1                   LOBSEGMENT                2,097,152
T1_LOB2                   LOBSEGMENT                2,097,152
T_NESTED1                 NESTED TABLE              1,048,576
T_NESTED2                 NESTED TABLE              1,048,576
DEMO_NEST                 TABLE                     1,048,576
SYS_IOT_OVER_75572        TABLE                     1,048,576
T1                        TABLE                    12,582,912
                                             ----------------
sum                                                47,185,920

So we have a list of segments, and we have a sum of bytes to aim for. One thing you might notice, though, is that the name "IOT1" has "disappeared"; instead the thing we see as a "TABLE" is its overflow segment, called SYS_IOT_OVER_75572 (the number being the object_id of the table we originally defined). We will have to do something in our code to translate that sys_iot_over_75572 to iot1 if we want to make our final report easy to comprehend.

There are a few other oddities in the list, but some of them we may be familiar with already – the indexes with names like SYS_IL0000075565C00027$$ are the LOB indexes for the four LOBs we created. (I deliberately created basicfile LOBs in case the OP wasn't using the securefile LOBs that most systems are now using.) The 75565 in the sample here is the object_id of the base table; the C00027 tells us that the LOB is column 27 of the table definition.
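If you don't want to decode the names by hand, here's a sketch of the cross-check against dba_lobs, which reports both the LOB segment and its LOB index against the owning table and column:


select  table_name, column_name, segment_name, index_name
from    dba_lobs
where   owner = 'TEST_USER'
order by
        table_name, column_name
;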

You may be wondering about the indexes like SYS_C008380 and SYS_FK0000075579N00002$. We may need to work on those – but I’ll tell you the answer straight away – the former is a unique index on the “nest1” column  in the demo_nest table, the latter is a “foreign key” index on the (invisible) nested_table_id column in the nested tables.

Side note: although the sys_fk0000075579N00002$ index looks as if it's supposed to be a "foreign key" index, as suggested by the name, the dependency, and the function, there's no foreign key constraint declared between the parent table and the nested table. Also, if you check dba_indexes, the generated column has the value 'N', even though the name of the index was clearly generated by Oracle internal code. (The index at the other end of the join – sys_c008380 – is, however, flagged with generated = 'Y'.)
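A quick way to check those flags (a sketch – the system-generated names will, of course, be different on any other system):


select  table_name, index_name, uniqueness, generated
from    dba_indexes
where   table_owner = 'TEST_USER'
and     table_name in ('DEMO_NEST', 'T_NESTED1', 'T_NESTED2')
order by
        table_name, index_name
;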

Let’s start building our solution with the easiest bit.


select 
        tab.owner, 
        cast('TABLE' as varchar2(11))   object_type,
        tab.table_name,
        nvl(seg_tab.bytes,0)            bytes  
from
        dba_tables        tab  
left outer join
        dba_segments      seg_tab  
on
        seg_tab.owner      = tab.owner
and     seg_tab.segment_name = tab.table_name
where
        tab.owner          = 'TEST_USER'
/


OWNER           OBJECT_TYPE TABLE_NAME                      BYTES
--------------- ----------- -------------------- ----------------
TEST_USER       TABLE       T1                         12,582,912
                TABLE       SYS_IOT_OVER_75572          1,048,576
                TABLE       DEMO_NEST                   1,048,576
                TABLE       IOT1                                0
***************                                  ----------------
sum                                                    14,680,064

Immediately we have to correct two problems – we can see the original iot1 table, but it doesn't have an associated data segment; it has a "top", which is the primary key index segment, and an "overflow", which is designated by Oracle as a table segment. So we need to check the columns in dba_tables to work out how to get the more appropriate name reported for the table segment, and add a predicate to hide the original name. Here's "tables only mark 2":

select 
        tab.owner, 
        cast('TABLE' as varchar2(11))   object_type,
        case
                 when tab.iot_type = 'IOT_OVERFLOW' 
                        then tab.iot_name
                else tab.table_name
        end                             object_name,
        case
                 when tab.iot_type = 'IOT_OVERFLOW' 
                        then tab.table_name
        end                             auxiliary_name,
        nvl(seg_tab.bytes,0)            bytes  
from
        dba_tables        tab  
left outer join
        dba_segments      seg_tab  
on
        seg_tab.owner        = tab.owner
and     seg_tab.segment_name = tab.table_name
where
        tab.owner            = 'TEST_USER'
and     nvl(tab.iot_type,'NORMAL') != 'IOT'


OWNER           OBJECT_TYPE OBJECT_NAME          AUXILIARY_NAME                       BYTES
--------------- ----------- -------------------- ------------------------- ----------------
TEST_USER       TABLE       T1                                                   12,582,912
TEST_USER       TABLE       IOT1                 SYS_IOT_OVER_75572               1,048,576
TEST_USER       TABLE       DEMO_NEST                                             1,048,576

I’ve identified the table type using the iot_type column in dba_tables, hidden rows where the iot_type is ‘IOT’, and reported the iot_name (with table_name as an auxiliary name) when the iot_type is ‘IOT_OVERFLOW’. And we can now check that the result is consistent with the content of dba_segments that lists segment_type = ‘TABLE’.

So let’s move on to indexes.

select 
        ind.table_owner                 owner, 
        cast('INDEX' as varchar2(11))   object_type,
        ind.table_name,
        index_name                      auxiliary_name,
        nvl(seg_ind.bytes,0)            bytes  
from
        dba_indexes       ind  
left outer join 
        dba_segments      seg_ind  
on
        ind.owner      = seg_ind.owner
and     ind.index_name = seg_ind.segment_name  
where
        ind.table_owner = 'TEST_USER'


OWNER           OBJECT_TYPE TABLE_NAME           AUXILIARY_NAME                       BYTES
--------------- ----------- -------------------- ------------------------- ----------------
TEST_USER       INDEX       T1                   SYS_IL0000075565C00027$$         1,048,576
TEST_USER       INDEX       T1                   SYS_IL0000075565C00028$$         1,048,576
TEST_USER       INDEX       T1                   T1_I1                            5,242,880
TEST_USER       INDEX       T1                   T1_I2                            5,242,880
TEST_USER       INDEX       IOT1                 IOT1_PK                          1,048,576
TEST_USER       INDEX       IOT1                 SYS_IL0000075572C00007$$         1,048,576
TEST_USER       INDEX       IOT1                 SYS_IL0000075572C00006$$         1,048,576
TEST_USER       INDEX       T_NESTED1            SYS_FK0000075579N00002$          1,048,576
TEST_USER       INDEX       T_NESTED2            SYS_FK0000075579N00004$          1,048,576
TEST_USER       INDEX       DEMO_NEST            SYS_C008380                      1,048,576
TEST_USER       INDEX       DEMO_NEST            SYS_C008381                      1,048,576
TEST_USER       INDEX       T_NESTED1            TN1_PK                           1,048,576
TEST_USER       INDEX       T_NESTED2            TN2_PK                           1,048,576
TEST_USER       INDEX       T_NESTED1            TN1_V1                           1,048,576
TEST_USER       INDEX       T_NESTED2            TN2_V1                           1,048,576

A quick check shows that we’ve picked up the 15 indexes reported by dba_segments – although I’ve labelled them all just as “INDEX” while dba_segments does label LOB indexes differently. As you can see I’ve reported the table name in each case with the index name as the “auxiliary” name. This will be of use when I’m summing up the space associated with each table.
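The quick check is nothing more sophisticated than a couple of counts (a sketch):


select  segment_type, count(*)
from    dba_segments
where   owner = 'TEST_USER'
and     segment_type in ('INDEX', 'LOBINDEX')
group by
        segment_type
order by
        segment_type
;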

On to the LOBs – also easy:

select 
        lob.owner, 
        cast('LOB' as varchar2(11))   object_type,
        lob.table_name,
        lob.column_name               auxiliary_name,
        nvl(seg_lob.bytes,0)          bytes  
from
        dba_lobs    lob
left outer join 
        dba_segments      seg_lob
on
        seg_lob.owner        = lob.owner
and     seg_lob.segment_name = lob.segment_name
where 
        lob.owner            = 'TEST_USER'


OWNER           OBJECT_TYPE TABLE_NAME           AUXILIARY_NAME                       BYTES
--------------- ----------- -------------------- ------------------------- ----------------
TEST_USER       LOB         T1                   C1                               2,097,152
TEST_USER       LOB         T1                   C2                               2,097,152
TEST_USER       LOB         IOT1                 CT1                              1,048,576
TEST_USER       LOB         IOT1                 CT2                              1,048,576

And finally the nested tables:


select 
        nst.owner, 
        cast('NESTED' as varchar2(11))   object_type,
        nst.parent_table_name            object_name,
        table_name                       auxiliary_name,
        nvl(seg_nst.bytes,0)             bytes  
from
        dba_nested_tables    nst  
left outer join 
       dba_segments          seg_nst  
on
        seg_nst.owner        = nst.owner
and     seg_nst.segment_name = nst.table_name  
where
        nst.owner            = 'TEST_USER'

OWNER           OBJECT_TYPE OBJECT_NAME          AUXILIARY_NAME                       BYTES
--------------- ----------- -------------------- ------------------------- ----------------
TEST_USER       NESTED      DEMO_NEST            T_NESTED1                        1,048,576
TEST_USER       NESTED      DEMO_NEST            T_NESTED2                        1,048,576

A quick check tells us that the four pieces of code produce an exact match for all the segments reported in dba_segments – so all we have to do now is stitch the four queries together with UNION ALL, then sum(bytes) by owner and table_name.

There are 3 little problems to deal with though.

  • We have the predicate "where {owner} = 'TEST_USER'" appearing 4 times in total, which is a bit messy. If we put this outside the UNION ALL will Oracle be smart enough to push it back inside the UNION ALL as part of its query transformation? It seems to; the plan (which is several hundred lines long – so I'm not going to look too closely) does change a little, but on my little test system it didn't make much difference to the performance. Initially it looks as if it might be okay to create a view from this query with no restriction on table owner, and allow users to query the view with a schema name.
  • The code to report indexes reports the indexes on the nested tables under t_nested1 and t_nested2 – we want the indexes to be associated with table demo_nest, so we need to refine that bit of the code. It needs an outer join to dba_nested_tables to supply the parent_table_name if it exists.
  • On my little system the query takes several seconds to run. Should I worry about that? Not until I've tested it on a much bigger system, and not until I know how frequently it needs to run. It may be good enough as it stands, and the ease with which I modified the first version of my code to handle the nested tables indexing issue is an indicator of the benefits of keeping a complex job as the sum of its parts if it's reasonable to do so.

Let’s go straight to the final (so far) SQL:


select
        owner, object_name, sum(bytes) 
from    (
        select 
                tab.owner, 
                cast('TABLE' as varchar2(11))   object_type,
                case
                         when tab.iot_type = 'IOT_OVERFLOW' 
                                then tab.iot_name
                        else tab.table_name
                end                             object_name,
                case
                         when tab.iot_type = 'IOT_OVERFLOW' 
                                then tab.table_name
                end                             auxiliary_name,
                nvl(seg_tab.bytes,0)            bytes  
        from
                dba_tables        tab  
        left outer join
                dba_segments      seg_tab  
        on
                seg_tab.owner        = tab.owner
        and     seg_tab.segment_name = tab.table_name
        where
                tab.owner            = 'TEST_USER'
        and     nvl(tab.iot_type,'NORMAL') != 'IOT'
        union all
        select 
                ind.table_owner                  owner, 
                cast('INDEX' as varchar2(11))    object_type,
                nvl(
                        nst.parent_table_name,
                        ind.table_name
                )                                table_name,
                index_name                       auxiliary_name,
                nvl(seg_ind.bytes,0)             bytes  
        from
                dba_indexes       ind  
        left outer join
                dba_nested_tables nst
        on
                nst.owner       = ind.table_owner
        and     nst.table_name  = ind.table_name
        left outer join 
                dba_segments      seg_ind  
        on
                seg_ind.owner         = ind.owner
        and     seg_ind.segment_name  = ind.index_name 
        where
                ind.table_owner = 'TEST_USER'
        union all
        select 
                lob.owner, 
                cast('LOB' as varchar2(11))   object_type,
                lob.table_name,
                lob.column_name               auxiliary_name,
                nvl(seg_lob.bytes,0)          bytes  
        from
                dba_lobs    lob
        left outer join 
                dba_segments      seg_lob
        on
                seg_lob.owner        = lob.owner
        and     seg_lob.segment_name = lob.segment_name
        where 
                lob.owner            = 'TEST_USER'
        union all
        select 
                nst.owner, 
                cast('NESTED' as varchar2(11))   object_type,
                nst.parent_table_name            object_name,
                table_name                       auxiliary_name,
                nvl(seg_nst.bytes,0)             bytes  
        from
                dba_nested_tables    nst  
        left outer join 
               dba_segments          seg_nst  
        on
                seg_nst.owner        = nst.owner
        and     seg_nst.segment_name = nst.table_name
        where
                nst.owner         = 'TEST_USER'
        )
where
        owner = 'TEST_USER'
group by
        owner, object_name
order by
        sum(bytes)
/

OWNER           OBJECT_NAME                SUM(BYTES)
--------------- -------------------- ----------------
TEST_USER       IOT1                        6,291,456
TEST_USER       DEMO_NEST                  11,534,336
TEST_USER       T1                         29,360,128
                                     ----------------
sum                                        47,185,920


I wrote this note because my response to the original query was an informal suggestion to take this approach; then I thought it might be worth writing a general comment about solving big problems by starting with little problems and stitching the pieces together, using this case as a concrete example – then I just went ahead and finished the thing off within the boundaries of the original requirements.

There's a reason, though, why I tend to avoid publishing SQL that might be useful – it's always possible to overlook bits and end up with people trying to use code that's not totally appropriate to their system. In this case, of course, there's a total (but deliberate) failure to handle clusters, partitions and subpartitions, but I don't know how many other of the "less commonly used" bits of Oracle I haven't considered. I do know that I haven't touched on domain indexes (such as text indexes with all their convoluted collections of tables and indexes) but there may be some oddities of (e.g.) advanced queueing, replication, and audit vault that fall outside the range of the possibilities I've covered above.

Update 7th Nov 2019

The OP from OTN reported seeing some nested tables in the output – which I thought I'd catered for – so I modified the code I have on my sandbox to add a nested table to the IOT, added a couple of extra indexes on the parent of the nested table, changed a couple of columns and object names to quoted mixed case, and everything still seemed to work okay.

It turned out that the OP had nested tables that were nested 5 deep – which means I need a hierarchical query to connect the 2nd to 5th (or nth) layer of nesting back to the parent to report the total volume against the top level table.

Watch this space – but not for a day or two.

 

IOT Hash

Thu, 2019-10-31 09:59

It’s another of my double-entendre titles. The optimizer can turn a hash join involving an index-organized table into a real performance disaster (though you may have to help it along the way by using a silly definition for your primary key columns). This post was inspired by a question posted on the Oracle Developer Community forum recently so the table and column names I’ve used in my model reflect (almost, with a few corrections) the names used in the post.

We start with a simple requirement expressed through the following SQL:


rem
rem     Script:         iot_hash.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

insert
        /*+
                qb_name(insert)
        */
into t_iot(
        id, inst_id, nr_time,
        o_time, st, null_col, svname
)
select
        /*+
                qb_name(main)
                unnest(@subq)
                leading(@sel$a93afaed apar@main ob@subq)
                use_hash(@sel$a93afaed ob@subq)
                swap_join_inputs(@sel$a93afaed ob@subq)
                index_ss_asc(@sel$a93afaed ob@subq (t_iot.st t_iot.inst_id t_iot.svname))
        */
        apar.id,
        'UP',
        to_date('2019-10-24','yyyy-mm-dd'),
        to_date('1969-12-31','yyyy-mm-dd'),
        'IDLE',
        null,
        'tkt007.jj.bb.com'
from
        t_base apar
where
        apar.id not in (
                select
                        /*+
                                qb_name(subq)
                        */
                        id
                from
                        t_iot ob
                where
                        inst_id = 'UP'
        )
and     nvl(apar.gp_nm,'UA') = 'UA'
and     rownum <= 5000
/

The requirement is simple – insert into table t_iot a set of values dictated by a subset of the rows in table t_base if they do not already exist in t_iot. To model the issue that appeared I’ve had to hint the SQL to get the following plan (which I pulled from memory after enabling rowsource execution stats):


---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |             |      1 |        |   296 (100)|      0 |00:00:00.03 |     788 |    148 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL | T_IOT       |      1 |        |            |      0 |00:00:00.03 |     788 |    148 |       |       |          |
|*  2 |   COUNT STOPKEY          |             |      1 |        |            |    100 |00:00:00.03 |      99 |     91 |       |       |          |
|*  3 |    HASH JOIN RIGHT ANTI  |             |      1 |    100 |   296   (2)|    100 |00:00:00.03 |      99 |     91 |    14M|  1843K|   15M (0)|
|*  4 |     INDEX SKIP SCAN      | T_IOT_STATE |      1 |  12614 |   102   (0)|  10000 |00:00:00.01 |      92 |     91 |       |       |          |
|*  5 |     TABLE ACCESS FULL    | T_BASE      |      1 |    100 |     2   (0)|    100 |00:00:00.01 |       7 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(ROWNUM<=5000)
   3 - access("APAR"."ID"="ID")
   4 - access("INST_ID"='UP')
       filter("INST_ID"='UP')
   5 - filter(NVL("APAR"."GP_NM",'UA')='UA')

The optimizer has unnested (as hinted) the subquery and converted it to an anti-join using a right hash anti-join. Take a look at the Used-Mem for the hash join – would it surprise you to learn that the total size of the (not compressed in any way) IOT, all its indexes, and the t_base table together comes to less than 4 MB? Something dramatically awful has happened in the hash join to generate a requirement of 14MB. (In the case of the OP this appeared as an unexpected 5GB written to the temporary tablespace.)

Before I address the source of the high memory usage, take a close look at the Predicate Information, particularly operation 3, and ask yourself what the definition of index t_iot_state might be. The predicate joins t_base.id to t_iot.id, and here’s the code to create both tables and all the indexes.

create table t_iot (
        nr_time         timestamp,
        id              varchar2(1024),
        inst_id         varchar2(200),
        o_time          timestamp,
        st              varchar2(200),
        null_col        varchar2(100),
        svname          varchar2(200),
        constraint t_iot_pk primary key(nr_time, id, inst_id)
)
organization index
/

insert into t_iot
select
        sysdate,
        dbms_random.string('l',10),
        'UP',
        sysdate,
        'IDLE',
        null,
        rpad('x',25,'x')
from
        all_objects
where
        rownum <= 1e4 -- > hint to avoid wordpress format issue
/

create index t_iot_state on t_iot(st, inst_id, svname); 
create index idx2        on t_iot(id, inst_id, svname);

create table t_base(
        id              varchar2(400) not null,
        gp_nm           varchar2(200)
)
/

insert into t_base
select
        dbms_random.string('l',10),
        'UA'
from
        all_objects
where
        rownum <= 100 -- > hint to avoid wordpress format issue
;


begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 't_iot',
                cascade     => true,
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 't_base',
                cascade     => true,
                method_opt  => 'for all columns size 1'
        );
end;
/


The index t_iot_state that Oracle has used in the hash join is defined on the columns (st, inst_id, svname) – so the predicate is doing a comparison with a column that's not in the index! At least, it's not visibly declared in the index; but this is a secondary index on an IOT, and IOTs don't have "normal" rowids: the rowid in a secondary index is the value of the primary key (plus a "block guess"). So the columns in the index (even though not declared in the index) are: (st, inst_id, svname, {nr_time, id, inst_id, blockguess}). So this index does supply the required id column.

Side note: you'll see in the list of columns above that inst_id appears twice. In fact (since Oracle 9, I think) the code to handle secondary indexes has been smart enough to avoid this duplication: if the secondary index contains columns from the primary key then the "rowid" doesn't store those columns, and the code knows how to construct the primary key value from the stored PK columns combined with the needed columns from the index entry. This can make IOTs a very nice choice of implementation for "intersection" tables that are used to represent many-to-many joins between two other tables.

Unfortunately this “rowid” is the explanation for the massive memory demand. Take a look at the “Column Projection Information” for my execution plan:


Column Projection Information (identified by operation id):
-----------------------------------------------------------
   2 - "APAR"."ID"[VARCHAR2,400], "APAR"."GP_NM"[VARCHAR2,200], ROWNUM[8]
   3 - (#keys=1) "APAR"."ID"[VARCHAR2,400], "APAR"."GP_NM"[VARCHAR2,200]
   4 - "OB".ROWID[ROWID,1249], "NR_TIME"[TIMESTAMP,11], "ID"[VARCHAR2,1024], "INST_ID"[VARCHAR2,200], "OB".ROWID[ROWID,1249]
   5 - "APAR"."ID"[VARCHAR2,400], "APAR"."GP_NM"[VARCHAR2,200]

The interesting line is operation 4. A hash join takes the rowsource from its first child (the build table) and creates an in-memory hash table (which may spill to disc, of course), so if I see an unreasonable memory allocation (or unexpected spill to disc) a good starting point is to look at what the first child is supplying. In this case the first child seems to be saying that it’s supplying (or allowing for) nearly 3,700 bytes to be passed up to the hash join.

On closer inspection we can see it’s reporting the “rowid” twice, and also reporting the three columns that make up that rowid. I think it’s reasonable to assume that it’s only supplying the rowid once, and maybe it’s not even supplying the other three columns because they are embedded in the rowid. Doing a quick arithmetic check, let’s multiply the size of the rowid by the value of A-rows: 1,249 * 10,000 = 12,490,000. That’s pretty close to the 14MB reported by the hash join in operation 3.

Hypothesis – to get at the id column, Oracle has used this index (actually a very bad choice of those available) to extract the rowid and then passed the rowid up to the parent in a (length padded) fixed format. Oracle has then created a hash table by extracting the id column from the rowid and building the hash table on it but also carrying the length-padded rowid into the hash table.  Possible variants on this theme are that some or all of the other columns in the Column Projection Information are also passed upwards so that the id doesn’t have to be extracted, but if they are they are not padded to their maximum length.

A simple test that this is broadly the right assumption is to re-run the model making the declared lengths of the primary key columns (and hence the logical rowid) much larger to see what happens to the memory allocation. Changing the inst_id declaration from 200 bytes to 1000 bytes (note that the stored value is only the 2 bytes needed for the value 'UP') the Used-Mem jumps to 23 MB (which is an increment very similar to 800 * 10,000). You'll note that I chose to experiment with a column that wasn't the column used in the join; it was a column in the secondary index definition, though, so another test would be to change the nr_time column from a timestamp (11 bytes) to a large varchar2. So I re-ran the test declaring nr_time as a varchar2(1000) – reverting inst_id to varchar2(200) – and the Used-Mem increased to 25MB.
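For reference, the first of those re-tests is just the original create table statement with one declared length changed (a sketch – you then need to repeat the inserts, index creation and stats collection):


drop table t_iot purge;

create table t_iot (
        nr_time         timestamp,
        id              varchar2(1024),
        inst_id         varchar2(1000),         -- was varchar2(200); the stored value is still just 'UP'
        o_time          timestamp,
        st              varchar2(200),
        null_col        varchar2(100),
        svname          varchar2(200),
        constraint t_iot_pk primary key(nr_time, id, inst_id)
)
organization index
/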

Preliminary Conclusion

If Oracle uses the contents of the rowid of a secondary index on an IOT in a join then it constructs a fixed format version for the rowid by padding every column in the primary key to its maximum length and concatenating the results. This can have catastrophic side effects on performance if you’ve declared some very long columns “just in case”. Any time you use index organized tables you should remember to check the Column Projection Information in any execution plans that use secondary indexes in case they are passing a large, padded, primary key through the plan to a point where a blocking operation (such as a hash join or merge join) has to accumulate a large number of rows.
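One way to see that projection information for the most recent statement in your session (a sketch – the statement needs to have been run with rowsource execution statistics enabled if you want the ALLSTATS figures as well):


select  *
from    table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST +PROJECTION'));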

Footnote

In my test case I had to hint the query heavily to force Oracle into the path I wanted to demonstrate.

It’s surprising that the optimizer should have chosen this path in the OP’s system, given that there’s another secondary index that contains the necessary columns in its definition. (So one thought is that there’s a statistics problem to address, or possibly the “good” index is subject to updates that make it become very inefficient (large) very quickly.)

Another oddity of the OP's system was that Oracle should have chosen to do a right hash anti-join when it looked as if joining the tables in the opposite order would produce a much smaller memory demand and lower cost – there was an explicit swap_join_inputs() hint in the Outline Information (so copying the outline into the query and changing that to no_swap_join_inputs() might have been another viable workaround). In the end the OP hinted the query to use a nested loop (anti-)join from t_base to t_iot – which is another way to avoid the hash table threat with padded rowids.

 

Strange Estimates.

Wed, 2019-10-30 08:10

A question came up on the Oracle-L list server a couple of days ago expressing some surprise at the following execution plan:


--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                 |       |       |   845 (100)|          |
|   1 |  SORT AGGREGATE                      |                 |     1 |     7 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| ANY_TABLE       | 84827 |   579K|   845   (1)| 00:00:01 |
|   3 |    SORT CLUSTER BY ROWID             |                 | 68418 |       |    76   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                 | ANY_INDEX       | 68418 |       |    76   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("X"."ANY_COLUMN1"='J')
   4 - access("X"."ANY_COLUMN2"=89155)

You'll notice that this is a very simple query accessing a table by index, yet the estimated number of table rows found exceeds the estimated number of index entries used to probe the table. How can this happen? The answer (most frequently) is that there's a mismatch between the table (or, more commonly, column) statistics and the index statistics. This seems to happen very frequently when you start mixing partitioned tables with global (or globally partitioned) indexes, but it can happen in very simple cases, especially since a call to gather_table_stats() with cascade set to true and using the auto_sample_size will take a small sample from the index while using a 100% "sample" from the table.
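As a sketch of that default behaviour (using the t1 name from the model below – adjust to taste), the pattern to look for is a big difference between the table and index sample sizes after a standard call:


begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                cascade          => true,
                estimate_percent => dbms_stats.auto_sample_size
        );
end;
/

select  table_name, sample_size, num_rows
from    user_tables
where   table_name = 'T1';

select  index_name, sample_size, num_rows
from    user_indexes
where   table_name = 'T1';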

Here’s an example I engineered very quickly to demonstrate the point. There’s no particular reason for the choice of DML I’ve used on the data beyond a rough idea of setting up a percentage of nulls and deleting a non-uniform pattern of rows.


rem
rem     Script:         table_index_mismatch.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem
create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,1000)                n1,
        mod(rownum,1000)                n2,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1, for columns (n1,n2) size 1'
        );
end;
/

create index t1_i1 on t1(n1);

delete from t1 where mod(trunc(sqrt(n1)),7) = 0;
update t1 set n1 = null where mod(n1,10) = 0;
delete from t1 where mod(n1,10) = trunc(dbms_random.value(0,10));

execute dbms_stats.gather_table_stats(user,'t1',estimate_percent=>1)
execute dbms_stats.gather_index_stats(null,'t1_i1',estimate_percent=> 0.01)

Of course you’re not supposed to collect stats with arbitrary samples in any recent version of Oracle, so going for a 1% and 0.01% sample seems a little daft but I’m just doing that to demonstrate the problem with a very small data set.

After generating the data and gathering the stats I ran a few queries to pick out some critical numbers.


select
        table_name, sample_size, num_rows
from
        user_tables
where
        table_name = 'T1'
/

select 
        index_name, sample_size, num_rows, distinct_keys
from
        user_indexes
where
        table_name = 'T1'
and     index_name = 'T1_I1'
/

select
        column_name, sample_size, num_nulls, num_distinct
from
        user_tab_cols
where
        table_name = 'T1'
and     (
            column_name = 'N1'
         or virtual_column = 'YES'
        )
order by
        column_name
/

You’ll notice that I’ve only picked one of my original columns and any virtual columns. My gather_table_stats() call had a method_opt that included the creation of extended stats for the column group (n1, n2) and I want to report the stats on the resulting virtual column.


TABLE_NAME           SAMPLE_SIZE   NUM_ROWS
-------------------- ----------- ----------
T1                          7865     786500


INDEX_NAME           SAMPLE_SIZE   NUM_ROWS DISTINCT_KEYS
-------------------- ----------- ---------- -------------
T1_I1                     385779     713292           714


COLUMN_NAME                      SAMPLE_SIZE  NUM_NULLS NUM_DISTINCT
-------------------------------- ----------- ---------- ------------
N1                                      7012      85300          771
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS          7865          0          855

A couple of observations on the stats

  • the table sample size is, as expected, 1% of the reported num_rows (the actual count is 778,154).
  • The index sample size is much bigger than expected – but that’s probably related to the normal “select 1,100 leaf blocks” strategy. Because of the skew in the pattern of deleted values it’s possible for the sample size in this model to vary between 694,154 and something in the region of 380,000.
  • The n1 sample size is about 10% smaller than the table sample size – but that’s because I set 10% of the column to null.
  • The column group sample size matches the table sample size because column group hash values are never null, even if an underlying column is null.

So let’s check the execution plan for a very simple query:


set autotrace on explain
select id from t1 where n1 = 140 and n2 = 140;
set autotrace off


---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |   920 | 11960 |   918   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   920 | 11960 |   918   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |   909 |       |     5   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N2"=140)
   2 - access("N1"=140)

The estimate for relevant index rowids is smaller than the estimate for the number of table rows! The difference is not as extreme as the case reported on Oracle-l, but I’m only trying to demonstrate a principle, not reproduce the exact results.

There are several ways in which contradictory results like this can appear – but in this case we can see the following:

  • For the table access: table.num_rows/column.num_distinct = 786,500 / 855 = 919.88 (using the column group num_distinct)
  • For the index range scan: (table.num_rows – column.num_nulls) / column.num_distinct = (786500 – 85300) / 771 = 909.47 (using the n1 statistics)

So the change in strategy as it becomes possible for the optimizer to take advantage of the column group means the index and table have been using incompatible sets of stats (in particular there’s that loss of information about NULLs) as their cardinalities are calculated. The question, then, is “how much is that likely to matter” and, if it can matter, “in what circumstances could the effect be large enough to cause problems”. But that’s a topic for another day.
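If you want to check that arithmetic for yourself, here’s a little sketch of a query that reproduces the two estimates straight from the dictionary views I reported earlier. Remember that the column group’s name is system-generated, so the literal below (copied from my output) will be different on your system:

select
        round(tab.num_rows / cg.num_distinct, 2)                        table_card_est,
        round((tab.num_rows - n1.num_nulls) / n1.num_distinct, 2)       index_card_est
from
        user_tables     tab,
        user_tab_cols   n1,
        user_tab_cols   cg
where
        tab.table_name  = 'T1'
and     n1.table_name   = 'T1'
and     n1.column_name  = 'N1'
and     cg.table_name   = 'T1'
and     cg.column_name  = 'SYS_STUBZH0IHA7K$KEBJVXO5LOHAS'
/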

Update / Footnote

In the case of the Oracle-l example there was no column group, and in some cases the optimizer would produce a plan where the table estimate was much smaller than the index estimate, while in other cases (like the opening plan above) the table estimate was significantly greater than the index estimate. This was a side effect of adaptive statistics: the low table estimate was due to the basic “multiply the separate selectivities” arithmetic; but with adaptive statistics enabled Oracle started sampling the table to check the correlation between the two columns, produced a SQL Plan Directive along the way, and got to the higher (and correct) result.

match_recognize()

Fri, 2019-10-25 12:47

A couple of days ago I posted a note with some code (from Stew Ashton) that derived the clustering_factor you would get for an index when you had set a value for the table_cached_blocks preference but had not yet created the index. In the note I said that I had produced a simple and elegant (though massively CPU-intensive) query using match_recognize() that seemed to produce the right answer, only to be informed by Stew that my interpretation of how Oracle calculated the clustering_factor was wrong and that the query was irrelevant.  (The fact that it happened to produce the right answer in all my tests was an accidental side effect of the way I had been generating test data. With Stew’s explanation of what Oracle was doing it was easy to construct a data set that proved my query was doing the wrong thing.)

The first comment I got on the posting was a request to publish my match_recognize() query – even though it was irrelevant to the problem in hand – simply because it might be a useful lesson in what could be done with the feature; so here’s some code that doesn’t do anything useful but does demonstrate a couple of points about match_recognize(). I have to stress, though, that I’m not an expert on match_recognize() so there may be a more efficient way of using it to achieve the same result. There is certainly a simple and more efficient way to get the same result using some fairly straightforward PL/SQL code.

Requirement

I had assumed that if you set the table_cached_blocks preference to N then Oracle would keep track of the previous N rowids as it walked an index to gather stats and only increment the “clustering factor counter” if it failed to find a match for the block address of the current rowid in the block addresses extracted from the previous N rowids. My target was to emulate a way of doing this counting.

Strategy

Rather than writing code that “looked backwards” as it walked the index, I decided to take advantage of the symmetry of the situation and write code that looked forwards (you could think of this as viewing the index in descending order and looking backwards along the descending index). I also decided that I could count the number of times I did find a match in the trail of rowids, and subtract that from the total number of index entries.

So my target was to look for patterns where I start with the block address from the current rowid, then find the same block address after zero to N-1 failures to find the block address. Since the index doesn’t exist I’ll need to emulate its existence by selecting the columns that I want in the index along with the rowid, ordering the data in index order. Then I can walk this “in memory” index looking for the desired pattern.

Here’s some code to create a table to test against:


rem
rem     Script:         clustering_factor_est_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem 

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        cast(rownum as varchar2(10))            v1,
        trunc(dbms_random.value(0,10000))       rand,
        rpad('x',100,'x')                       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
/

My table has 1M rows, and there’s a column called rand which has 10,000 distinct values. This is generated through Oracle’s dbms_random package and the procedure I’ve used will give me roughly 100 occurrences for each value scattered uniformly across the table. An index on this column might be quite useful but it’s probably going to have a very high clustering_factor because, on average, any two rows of a particular value are likely to be separated by 10,000 rows, and even if I look for rows with a pair of consecutive values any two rows are likely to be separated by a few hundred other rows.

Here’s the code that I was using to get an estimate of (my erroneous concept of) the clustering_factor with table_cached_blocks set to 32. For debugging purposes it reports the first row of the pattern for each time my pattern is matched, so in this version of the code I’d  have to check the number of rows returned and subtract that from the number of rows in the table (where rand is not null).


select
        first_rand, first_file_id, first_block_id, first_row_id, step_size
from    (
                select
                        rand, 
                        dbms_rowid.rowid_relative_fno(rowid) file_id,
                        dbms_rowid.rowid_block_number(rowid) block_id,
                        dbms_rowid.rowid_row_number(rowid)   row_id
                from
                        t1
                where
                        rand is not null
        )
match_recognize(
        order by rand, file_id, block_id, row_id
        measures
                count(*) as step_size,
                first(strt.rand)        as first_rand,
                first(strt.file_id)     as first_file_id,
                first(strt.block_id)    as first_block_id,
                first(strt.row_id)      as first_row_id
        one row per match
        after match skip to next row
        pattern (strt miss{0, 31} hit)
        define 
                miss as (block_id != strt.block_id or  file_id != strt.file_id),
                hit  as (block_id  = strt.block_id and file_id  = strt.file_id)
)
order by 
        first_rand, first_file_id, first_block_id, first_row_id
;

The first point to note is that the inline view is the thing I use to model an index on column rand. It simply selects the value and rowid from each row. However I’ve used the dbms_rowid package to break the rowid down into the file_id, block_id and row_id within block. Technically (to deal with partitioned tables and indexes) I also ought to be thinking about the object_id which is the first component of the full rowid. I haven’t actually ordered the rows in the in-line view as I’m going to let the match_recognize() operation handle that.

A warning as we start looking at the match_recognize() section of the query: I’ll be jumping around the clause a little bit, rather than working through it in order from start to finish.

First I need the raw data sorted in order so that any results I get will be deterministic. I have an order by clause that sorts my data by rand and the three components I’ve extracted from the rowid. (It is also possible to have a partition clause – similar to the partition clause of an analytic function – but I don’t need one for this model.)

Then I need to define the “columns” that I’m planning to output in the final query. This is the set of measures and in the list of measures you can see a count(*) (which doesn’t quite mean what it usually does) and a number of calls to a function first() which takes an aliased column name as its input and, although the column names look familiar, the alias (strt) seems to have sprung from nowhere.

At this point I want to jump down to the define clause because that’s where the meaning of strt could have been defined. The define clause is a list of “rules”, which are also treated as “variables”, that tell us how to classify a row. We are looking for patterns, and a pattern is a set of rows that follows a set of rules in a particular order, so one of the things I need to do is create a rule that tells Oracle how to identify a row that qualifies as the starting point for a pattern – so I’ve defined a rule called strt to do this – except I haven’t defined it explicitly, it’s not visible in the define list, so Oracle has assumed that the rule is “1 = 1”, in other words every row I’m going to look at could be classified as a strt row.

Now that I have a definition for what strt means I could go back to the measures – but I’ll postpone doing that for a moment and look at the other rules in my define list. I have a rule called miss which says “if either of these comparisons evaluates to true” then the row is a miss; but the predicate includes a reference to strt, which means we are doing comparisons with the most recent row that was classified as a strt row. So a miss means we’ve found a starting row and we’re now looking at other rows, comparing the block_id and file_id for each row to check that they don’t match the block_id and file_id of the starting row.

Similarly we have a hit rule which says a hit means we’ve previously found a starting row and we’re now looking at other rows checking for rows where the current block_id and file_id match the starting block_id and file_id.

Once we’ve got a set of rules explaining how to classify rows we can specify a pattern (which means going back up the match_recognize() clause one section). Our pattern reads: “find a strt row, followed by between zero and 31 miss rows, followed by a hit row”. And that’s just a description of my back-to-front way of saying “remember the last 32 rowids and check if the current block address matches the block address in one of those rowids”.

The last two clauses that need to be explained before I revisit the measures clause are the “one row per match” and “after match skip to next row”.

If I find a sequence of rows that matches my pattern there are several things I could do with that set of rows – for example I could report every row in that pattern along with the classification of strt/miss/hit (which would be useful if I’m looking for a few matches to a small pattern in a large data set), or (as I’ve done here) I could report just one row from the pattern and give Oracle some instruction about which one row I want reported.

Then, after I’ve found (and possibly reported) a pattern, what should I do next? Again there are several possibilities – the two most obvious ones, perhaps, are: “just keep going” i.e. look at the row after the end of the pattern to see if it’s another strt row, and “go to the row after the start of the pattern you’ve just reported”. These translate into: “after match skip past last row” (which is the default if you don’t specify an “after match” clause) and “after match skip to next row” (which is what I’ve specified).

Finally we get back to the measures clause – I’ve defined four “columns” with names like ‘first_xxx’ and a step_size. The step_size is defined as count(*) which – in this context – means “count the number of rows in the current matched pattern”. The other measures are defined using the first() function, referencing the strt alias, which tells Oracle I want to retain the value from the first row that met the strt rule in the current matched pattern.

In summary, then, my match_recognize() clause tells Oracle to:

  • Sort the data by rand, file_id, block_id, row_id
  • For each row in turn
    • extract the file_id and block_id
    • take up to 32 steps down the list looking for a matching file_id and block_id
    • If you find a match pass a row to the parent operation that consists of: the number of rows from strt to hit inclusive, and the values of rand, file_id, block_id, and row_id of the strt row.

Before you try testing the code, you might like to know about some results.

As it stands my laptop with a virtual machine running 12.2.0.1 took 1 minute and 5 seconds to complete with “miss{0, 31}” in the pattern. In fact the first version of the code had the file_id and block_id tests in the define clause in the opposite order viz:

        define 
                miss as (file_id != strt.file_id or  block_id != strt.block_id),
                hit  as (file_id  = strt.file_id and block_id  = strt.block_id)

Since the file_id for my test is the same for every row in my table (it’s a single file tablespace), this was wasting a surprising amount of CPU, leading to a run time of 1 minute 17 seconds! Neither time looks particularly good when compared to the time required to create the index, set the table_cached_blocks to 32, and gather stats on the index – in a total time of less than 5 seconds. The larger the value I chose for the pattern match, the worse the workload became, until at “miss{0, 199}” – emulating a table_cached_blocks of 200 – the time to run was about 434 seconds of CPU!

A major portion of the problem is the way that Oracle is over-optimistic (or “greedy”, to use the technical term) with its pattern matching, combined with the nature of the data which (in this example) isn’t going to offer much opportunity for matching, combined with the fact that Oracle cannot determine that a row that is not a “miss” has to be a “hit”.  In this context “greedy” means Oracle will try to find as many consecutive occurrences of the first rule in the pattern as it can before it tries to find an occurrence of the second rule – and when it fails to match a pattern it will “backtrack” one step and have another go, being slightly less greedy. So, for our example, the greedy algorithm will operate as follows:

  • find 31 rows that match miss, then discover the 32nd row does not match hit
  • go back to the strt and find 30 rows that match miss, then discover the 31st row does not match hit
  • go back to the strt and find 29 rows that match miss, then discover the 30th row does not match hit
  • … repeat until
  • go back to the strt and find 0 rows that match miss, then discover the 1st does not match hit
  • go to the next row, call it strt, and repeat the above … 1M times.

From a human perspective this is a pretty stupid strategy for this specific problem – but that’s because we happen to know that “hit” = “not(miss)” (ignoring nulls, of which there are none) while Oracle has to assume that there is no relationship between “hit” and “miss”.

There is hope, though, because you can tell Oracle that it should be “reluctant” rather than “greedy” which means Oracle will consume the smallest possible number of occurrences of the first rule before testing the second rule, and so on. All you have to do is append a “?” to the count qualifier:

        pattern (strt miss{0, 31 }? hit)

Unfortunately this seemed to have very little effect on execution time (and CPU usage) in our case. Again this may be due to the nature of the data etc., but it may also be a limitation in the way that the backtracking works. I had expected a response time that would more closely mirror the human expectation, but a few timed tests suggest the code uses exactly the same type of strategy for the reluctant option as it does for the greedy one, viz:

  • find 0 rows that match miss, then discover the next row does not match hit
  • go back to the strt and find 1 row that matches miss, then discover the 2nd row does not match hit
  • go back to the strt and find 2 rows that match miss, then discover the 3rd row does not match hit
  • … repeat until
  • go back to the strt and find 31 rows that match miss, then discover the 32nd does not match hit
  • go to the next row, call it strt, and repeat the above … 1M times.

Since there are relatively few cases in our data where a pattern match will occur both the reluctant and the greedy strategies will usually end up doing all 32 steps. I was hoping for a more human-like algorithm in which Oracle would recognise that if it has just missed on the first X rows then it need only check the (X+1)th and not go back to the beginning (strt) – but it’s only my specific requirement that makes it easy to see (from a human perspective) that that shortcut makes sense; in a generic case (with more complex patterns and without the benefit of having two mutually exclusive rules) the strategy of “start all over again” is probably a much safer option to code.

Plan B

Plan B was to send the code to Stew Ashton and ask if he had any thoughts on making it more efficient – I wasn’t expecting him to send a PL/SQL solution by return of post, but that’s what I got and published in the previous post.

Plan C

It occurred to me that I don’t really mind if the predicted clustering_factor is a little inaccurate, and that the “backtracking” introduced by the variable qualifier {0,31} was the source of a huge amount of the work done, so I took a different approach which simply said: “check that the next 32 (or preferred value of N) rows are all misses”.  This required two changes – eliminate one of the defines (the “hit”) and modify the pattern definition as follows:

         pattern (strt  miss{32} )
         define
                 miss as (block_id != strt.block_id or  file_id != strt.file_id)

The change in strategy means that the result is going to be (my version of) the clustering_factor rather than the number to subtract from num_rows to derive the clustering_factor. And I’ve introduced a small error which shows up towards the end of the data set – I’ve demanded that a pattern should include exactly 32 misses, but when you’re within 32 rows of the end of the data set there aren’t enough rows left to match the pattern. So the result produced by the modified code could be as much as 32 short of the expected result.  However, when I showed the code to Stew Ashton he pointed out that I could include “alternatives” in the pattern, so all I had to do was add to the pattern something which said “and if there aren’t 32 rows left, getting to the end of the data set is good enough” (technically that should be the end of the current partition, but we have only one partition).

         pattern (strt  ( miss{32} | ( miss* $) ) )

The “miss” part of the pattern now reads:  “32 misses” or “zero or more misses and then the end of file/partition/dataset”.

It’s still not great – but the time to process the 1M rows with a table_cached_blocks of 32 came down to 31 seconds.

Finally

I’ll close with one important thought. There’s a significant difference in the execution plans for the two strategies – which I’m showing as outputs from the SQL Monitor report using a version of the code that does a simple count(*) rather than listing any rows:


pattern (strt miss{0, 31} hit)
==========================================================================================================================================
| Id |         Operation          | Name  |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   |  Mem  | Activity | Activity Detail |
|    |                            |       | (Estim) |       | Active(s) | Active |       | (Actual) | (Max) |   (%)    |   (# samples)   |
==========================================================================================================================================
|  0 | SELECT STATEMENT           |       |         |       |        54 |    +14 |     1 |        1 |     . |          |                 |
|  1 |   SORT AGGREGATE           |       |       1 |       |        54 |    +14 |     1 |        1 |     . |          |                 |
|  2 |    VIEW                    |       |      1M | 12147 |        54 |    +14 |     1 |     2782 |     . |          |                 |
|  3 |     MATCH RECOGNIZE SORT   |       |      1M | 12147 |        60 |     +8 |     1 |     2782 |  34MB |          |                 |
|  4 |      VIEW                  |       |      1M |   325 |        14 |     +1 |     1 |       1M |     . |          |                 |
|  5 |       INDEX FAST FULL SCAN | T1_I1 |      1M |   325 |         7 |     +8 |     1 |       1M |     . |          |                 |
==========================================================================================================================================

pattern (strt  miss{32} )
==================================================================================================================================================================
| Id |                     Operation                      | Name  |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   |  Mem  | Activity | Activity Detail |
|    |                                                    |       | (Estim) |       | Active(s) | Active |       | (Actual) | (Max) |   (%)    |   (# samples)   |
==================================================================================================================================================================
|  0 | SELECT STATEMENT                                   |       |         |       |        15 |    +16 |     1 |        1 |     . |          |                 |
|  1 |   SORT AGGREGATE                                   |       |       1 |       |        15 |    +16 |     1 |        1 |     . |          |                 |
|  2 |    VIEW                                            |       |      1M | 12147 |        15 |    +16 |     1 |     997K |     . |          |                 |
|  3 |     MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO |       |      1M | 12147 |        29 |     +2 |     1 |     997K |  34MB |          |                 |
|  4 |      VIEW                                          |       |      1M |   325 |        16 |     +1 |     1 |       1M |     . |          |                 |
|  5 |       INDEX FAST FULL SCAN                         | T1_I1 |      1M |   325 |         9 |     +8 |     1 |       1M |     . |          |                 |
==================================================================================================================================================================

The significant line to note is operation 3 in both cases. The query with the pattern that’s going to induce back-tracking reports only “MATCH RECOGNIZE SORT”. The query with the “fixed” pattern reports “MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO”, which means Oracle can implement a finite state machine with a fixed worst-case run-time. When you write some code that uses match_recognize() the three magic words you want to see in the plan are “deterministic finite auto” – if you don’t then, in principle, your query might be one of those that could (theoretically) run forever.
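If you want to check which variant you’ve got for a query of your own, a call to explain plan followed by dbms_xplan.display is all it takes. Here’s a sketch against the t1 table above – with a fixed-repetition pattern like this one I’d expect the “deterministic finite auto” operation to show up, but that’s an observation, not a guarantee for every pattern you might write:

explain plan for
select
        count(*)
from
        t1
match_recognize(
        order by rand
        measures first(strt.rand) as first_rand
        one row per match
        after match skip to next row
        pattern (strt miss{32})
        define
                miss as (rand != strt.rand)
);

select * from table(dbms_xplan.display);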

Addendum

Congratulations if you’ve got this far – but please remember that I’ve had very little practice using match_recognize(); this was a fun little learning experience for me and there may be many more things you could usefully know about the technology before you use it in production; there may also be details in what I’ve written which are best forgotten about. That being the case I’d be more than happy for anyone who wants to enhance or correct my code, descriptions and observations to comment below.

Clustering_Factor

Wed, 2019-10-23 15:56

A few days ago I published a little note about a script I wrote some time ago to estimate the clustering_factor of an index before it had been built. At the time I pointed out that one of its limitations was that it would not handle cases where you were planning to set the table_cached_blocks preference, but a couple of days later I decided that I’d write another version of the code that would cater for the new feature – and that’s how I made an embarrassing discovery.

Having reviewed a number of notes I’ve published about the table_cached_blocks preference and its impact on the clustering_factor I’ve realised that what I’ve written has always been open to two interpretations – the one that I had in mind as I was writing, and the correct one.  I made this discovery because I had written a simple SQL statement – using the match_recognize() mechanism – to do what I considered to be the appropriate calculation. After testing the query with a few sets of sample data that produced the correct results I emailed Stew Ashton (my “go-to” person for match_recognize() questions) asking if he would do a sanity check on the code because it was rather slow and I wondered if there was a better way of writing it.

His reply was roughly:

“I’ve read the notes you and Richard Foote have written about the clustering_factor and table_cached_blocks, and this isn’t doing what your description says it should.”

Then he explained what he had inferred from what I had written … and it made more sense than what I had been thinking when I wrote it. He also supplied some code to implement his interpretation – so I designed a couple of data models that would produce the wrong prediction for whichever piece of code implemented the wrong interpretation. His code gave the right answers, mine didn’t.

So here’s the difference in interpretation – the wrong one first – using 16 as a discussion value for the table_cached_blocks:

  • WRONG interpretation: As you walk through index entries in order remember the last 16 rowids (that’s the rowid for the rows in the table that the index is pointing to) you’ve seen. If the current rowid has a block id component that doesn’t match the block id from one of the remembered 16 rowids then increment the counter for the clustering_factor.
    • The simplicity of this algorithm means you can fix a “circular” array of 16 entries and keep walking around the circle overwriting the oldest entry each time you read a new one. It’s a pity that it’s the wrong idea because there’s a simple (though massively CPU-intensive) match_recognize() strategy for implementing it – and if you were using an internal library mechanism during a proper gather_index_stats() it could be incredibly efficient. (There’s a little sketch of this circular-buffer approach just after this list.)
  • RIGHT interpretation: set up an array of 16 block ids, each with an associated “row-number”. Walk through the index in order – giving each entry a row-number as you go. Extract the block id from the current entry and search through the array for a matching block id.  If you find a match then update its entry with the current row-number (so you can remember how recently you saw the block id); if you don’t find a match then replace the entry that has the smallest (i.e. greatest distance into the past) row-number with the current block id and row-number and increment the counter for the clustering_factor.
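Just to make the contrast concrete, here’s a minimal PL/SQL sketch of the mechanics of the WRONG interpretation – the circular buffer of the last 16 block addresses seen – walking the data in the order of an intended index on t1(rand, id) (the table and index that appear later in this note). It’s purely illustrative; the function below from Stew Ashton is the one that implements the correct algorithm:

declare
        c_cache_size    constant pls_integer := 16;

        type t_block_tab is table of varchar2(64) index by pls_integer;
        l_recent        t_block_tab;

        l_slot          pls_integer := 0;
        l_found         boolean;
        l_cf            number := 0;
begin
        for r in (
                select
                        -- file number + block number as a simple block key
                        -- (ignoring the data object id, which would matter
                        --  for partitioned tables)
                        to_char(dbms_rowid.rowid_relative_fno(rowid)) || '.' ||
                        to_char(dbms_rowid.rowid_block_number(rowid))   block_key
                from
                        t1
                order by
                        rand, id, rowid
        ) loop

                -- check the most recent c_cache_size rows for a matching block address
                l_found := false;
                for i in 1..l_recent.count loop
                        if l_recent(i) = r.block_key then
                                l_found := true;
                                exit;
                        end if;
                end loop;

                if not l_found then
                        l_cf := l_cf + 1;
                end if;

                -- always remember the current block, overwriting the oldest slot
                l_slot := mod(l_slot, c_cache_size) + 1;
                l_recent(l_slot) := r.block_key;

        end loop;

        dbms_output.put_line('Wrong-interpretation clustering_factor estimate: ' || l_cf);
end;
/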

The first piece of code that Stew Ashton sent me was an anonymous PL/SQL block that included some hard-coded fragments and embedded SQL to use a test table and index that I had defined, but he then sent a second piece of code that creates a generic function that uses dynamic SQL to construct a query against a table and an index definition that you want to test. The latter is the code I’ve published (with permission) below:


create or replace function predict_clustering_factor(
/*
Function to predict the clustering factor of an index,
taking into account the intended value of
the TABLE_CACHED_BLOCKS parameter of DBMS_STATS.SET_TABLE_PREFS.

Input is the table name, the list of column names
and the intended value of TABLE_CACHED_BLOCKS.

The function collects the last N block ids (not the last N entries).
When there is no more room, it increments the clustering factor
and replaces the least recently used block id with the current one.

Note: here a "block id" is a rowid with the row_number portion set to 0.
It is effectively a "truncated" rowid.
*/
  p_table_name in varchar2,
  p_column_list in varchar2,
  p_table_cached_blocks in number
) return number authid current_user is

  rc sys_refcursor;
  type tt_rids is table of rowid;
  lt_rids tt_rids;
  
  type t_block_list is record(
    rid rowid,
    last_hit number
  );

  type tt_block_list is table of t_block_list;
  lt_block_list tt_block_list := new tt_block_list();

  l_rid rowid;
  l_clustering_factor number := 0;
  b_block_found boolean;
  l_rn number := 0;
  l_oldest_hit number;
  i_oldest_hit binary_integer := 0;
  
  function truncated_rid(p_rid in rowid) return rowid is
    rowid_type number;
    object_number NUMBER;
    relative_fno NUMBER;
    block_number NUMBER;
    row_number NUMBER;
    rid rowid;

  begin

    DBMS_ROWID.ROWID_INFO (
      p_rid,
      rowid_type,
      object_number,
      relative_fno,
      block_number,
      row_number
    );

    rid := DBMS_ROWID.ROWID_CREATE (
      rowid_type,
      object_number,
      relative_fno,
      block_number,
      0
    );

    return rid;

  end truncated_rid;
  
begin
  if p_table_cached_blocks != trunc(p_table_cached_blocks)
  or p_table_cached_blocks not between 1 and 255 then
    raise_application_error(
      -20001, 
      'input parameter p_table_cached_blocks must be an integer between 1 and 255'
    );
  end if;

  open rc for 'select rowid from '||p_table_name||' order by '||p_column_list||', rowid';
  loop
    fetch rc bulk collect into lt_rids limit 1000;

    for irid in 1..lt_rids.count loop
      l_rn := l_rn + 1;
      l_rid := truncated_rid(lt_rids(irid));
      b_block_found := false;
      l_oldest_hit := l_rn;

      if l_rn = 1 then
        l_clustering_factor := l_clustering_factor + 1;
        lt_block_list.extend;
        lt_block_list(1).rid := l_rid;
        lt_block_list(1).last_hit := l_rn;

      else

        for i in 1..lt_block_list.count loop
          if l_oldest_hit > lt_block_list(i).last_hit then
            l_oldest_hit := lt_block_list(i).last_hit;
            i_oldest_hit := i;
          end if;
          if lt_block_list(i).rid = l_rid then
            b_block_found := true;
            lt_block_list(i).last_hit := l_rn;
            exit;
          end if;
        end loop;

        if not b_block_found then
          l_clustering_factor := l_clustering_factor + 1;
          if lt_block_list.count < p_table_cached_blocks then
            lt_block_list.extend;
            lt_block_list(lt_block_list.count).rid := l_rid;
            lt_block_list(lt_block_list.count).last_hit := l_rn; 
          else         
            lt_block_list(i_oldest_hit).rid := l_rid;
            lt_block_list(i_oldest_hit).last_hit := l_rn;
          end if;
        end if;

      end if;

    end loop;
    exit when rc%notfound;
  end loop;

  close rc;
  return l_clustering_factor;

exception when others then
  if rc%isopen then
    close rc;
  end if;
  raise;

end predict_clustering_factor;
/

After executing the above to create the function, here’s an example of usage:

rem
rem     Script:         clustering_factor_est_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2019
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        cast(rownum as varchar2(10))            v1,
        trunc(dbms_random.value(0,10000))       rand,
        rpad('x',100,'x')                       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
/

-- -------------------------------------------------------------------

SQL> execute dbms_output.put_line('Predicted cf for t1(rand, id): ' || predict_clustering_factor('t1','rand, id',16))
Predicted cf for t1(rand, id): 997218
Elapsed: 00:00:07.54

SQL> execute dbms_output.put_line('Predicted cf for t1(rand, id): ' || predict_clustering_factor('t1','rand, id',255))
Predicted cf for t1(rand, id): 985607
Elapsed: 00:00:50.61

You’ll notice that the larger the setting for the “table_cached_blocks” parameter the more time it takes to predict the clustering_factor – and it was all CPU time in my example. This isn’t surprising given the need to search through an array holding the previous history. In this example the table t1 holds 1,000,000 rows, and the number and scatter of distinct values is so arranged that the code will hardly ever find a cached block id – essentially it’s the sort of index that isn’t going to cause much confusion to the optimizer and isn’t likely to need special attention to make the optimizer use it when it should and ignore it when it’s inappropriate.

Finally a cut-n-paste to show the accuracy of the two predictions:

SQL> create index t1_i on t1(rand, id);
Elapsed: 00:00:02.96

SQL> execute dbms_stats.set_table_prefs(null,'t1','table_cached_blocks',16)
Elapsed: 00:00:00.01

SQL> execute dbms_stats.gather_index_stats(null,'t1_i')
Elapsed: 00:00:09.55

SQL> select clustering_factor from user_indexes where index_name = 'T1_I';

CLUSTERING_FACTOR
-----------------
           997218

Elapsed: 00:00:00.11

SQL> execute dbms_stats.set_table_prefs(null,'t1','table_cached_blocks',255)
Elapsed: 00:00:00.01

SQL> execute dbms_stats.gather_index_stats(null,'t1_i')
Elapsed: 00:00:07.80

SQL> select clustering_factor from user_indexes where index_name = 'T1_I';

CLUSTERING_FACTOR
-----------------
           985607

Elapsed: 00:00:00.00

Both match perfectly – but you might notice that creating the index and gathering the stats was much faster than predicting the clustering factor for the case where we set table_cached_blocks = 255.

(If you’re wondering, my “simple but irrelevant” match_recognize() query took 370 CPU seconds to complete for table_cached_blocks = 200 – and a limit on match_recognize() meant that 200 was the maximum value I was allowed to use – so now you know why I emailed Stew Ashton (and, just for lagniappe, he also told me about a simple workaround for the 200 limit)).

ANSI Plans

Tue, 2019-10-22 12:59

Here’s a thought that falls somewhere between philosophical and pragmatic. It came up while I was playing around with a problem from the Oracle database forum that was asking about options for rewriting a query with a certain type of predicate. This note isn’t really about that question but the OP supplied a convenient script to demonstrate their requirement and I’ve hi-jacked most of the code for my own purposes so that I can ask the question:

Should the presence of an intermediate view name generated by the optimizer in the course of cost-based query transformation cause two plans, which are otherwise identical and do exactly the same thing, to have different plan hash values?

To demonstrate the issue let’s start with a simple script to create some data and generate an execution plan.


rem
rem     Script:         or_expand_plans.sql
rem     Author:         Jonathan Lewis
rem     Dated           Oct 2019
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem
rem     Notes:
rem     Data creation copied from ODC
rem     https://community.oracle.com/thread/4297365
rem

create table t as   
select 1 as id, to_date('2019-10-11', 'YYYY-MM-DD') as lastupdated, 'Toronto' as place from dual  
union all  
select 2 as id, to_date('2019-10-12', 'YYYY-MM-DD') as lastupdated, 'Toronto' as place from dual  
union all  
select 3 as id, to_date('2019-10-15', 'YYYY-MM-DD') as lastupdated, 'Toronto' as place from dual  
union all  
select 4 as id, to_date('2019-10-21', 'YYYY-MM-DD') as lastupdated, 'Toronto' as place from dual  
;  

create table t_others as  
select 1 as id, to_date('2019-10-12', 'YYYY-MM-DD') as lastupdated, 'Blue'  as color, 'Zone 7' as zoneid from dual  
union all  
select 2 as id, to_date('2019-10-21', 'YYYY-MM-DD') as lastupdated, 'Red'   as color, 'Zone 7' as zoneid from dual  
union all  
select 3 as id, to_date('2019-10-16', 'YYYY-MM-DD') as lastupdated, 'White' as color, 'Zone 7' as zoneid from dual  
union all  
select 4 as id, to_date('2019-10-17', 'YYYY-MM-DD') as lastupdated, 'Green' as color, 'Zone 7' as zoneid from dual  
;  

create table t_further_info as  
select 1 as id, to_date('2019-10-12', 'YYYY-MM-DD') as lastupdated, 'A' as typeinfo from dual  
union all   
select 2 as id, to_date('2019-10-14', 'YYYY-MM-DD') as lastupdated, 'C' as typeinfo from dual  
union all  
select 3 as id, to_date('2019-10-21', 'YYYY-MM-DD') as lastupdated, 'D' as typeinfo from dual  
union all  
select 4 as id, to_date('2019-10-21', 'YYYY-MM-DD') as lastupdated, 'E' as typeinfo from dual  
;  

prompt  ====================
prompt  "Traditional" syntax
prompt  ====================

explain plan for
select 
        /*+ or_expand */ 
        * 
from
        t,
        t_others        pt,
        t_further_info  fi  
/*
where   (
             t.lastupdated >= to_date('2019-10-21', 'YYYY-MM-DD')   
         or pt.lastupdated >= to_date('2019-10-21', 'YYYY-MM-DD')  
         or fi.lastupdated >= to_date('2019-10-21', 'YYYY-MM-DD')  
        )  
*/
where   to_date('2019-10-21', 'YYYY-MM-DD') <= any(t.lastupdated, pt.lastupdated, fi.lastupdated)   
and     pt.id = t.id  
and     fi.id = t.id  
;

select * from table(dbms_xplan.display(null,null,'outline'));


You’ll see that I have a simple three-table join with the nasty little detail that I have a “non-join” predicate that may require Oracle to check across all three tables before it can decide whether or not a row should be discarded. I’ve shown two variants on a theme – they both have exactly the same effect but the ANY() presentation is just a little bit neater and more compact.

Essentially Oracle can use one of two strategies for this type of query/predicate: the first is to join all three tables and wait until the final join rowsource appears before applying the check; the second is to split the query into a union all of three separate queries where each query drives off a different table, selecting only the rows from that table that match “its” part of the predicate.

In the latter case the second and third branches of the union all have to be modified to ensure that they discard any rows already returned by preceding parts of the union all; this can mean lots of new predicates appearing that use the lnnvl() function. (Of course there are variants between these two extremes, but the general principle is unchanged.)
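To make that description a little more concrete, here’s a sketch of what a hand-written expansion of my query might look like, using the three tables from the script above. The branch order and the exact placement of the lnnvl() calls are just one possible arrangement – the optimizer’s internal version won’t necessarily look like this:

select  t.*, pt.*, fi.*
from    t, t_others pt, t_further_info fi
where   pt.id = t.id
and     fi.id = t.id
and     t.lastupdated  >= to_date('2019-10-21','YYYY-MM-DD')
union all
select  t.*, pt.*, fi.*
from    t, t_others pt, t_further_info fi
where   pt.id = t.id
and     fi.id = t.id
and     pt.lastupdated >= to_date('2019-10-21','YYYY-MM-DD')
and     lnnvl(t.lastupdated  >= to_date('2019-10-21','YYYY-MM-DD'))
union all
select  t.*, pt.*, fi.*
from    t, t_others pt, t_further_info fi
where   pt.id = t.id
and     fi.id = t.id
and     fi.lastupdated >= to_date('2019-10-21','YYYY-MM-DD')
and     lnnvl(t.lastupdated  >= to_date('2019-10-21','YYYY-MM-DD'))
and     lnnvl(pt.lastupdated >= to_date('2019-10-21','YYYY-MM-DD'))
;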

The presence of the (incomplete) /*+ or_expand */ hint in my query is there to tell the optimizer that it should attempt to transform the query into the three-part union all. This, by the way, is a 12c feature, though older versions of Oracle could get similar effects in some cases from the /*+ use_concat */ hint. Here’s the plan, with outline, that I got from 12.2.0.1:


Plan hash value: 3181357500

----------------------------------------------------------------------------------------------
| Id  | Operation               | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                    |     3 |   270 |    17   (6)| 00:00:01 |
|   1 |  VIEW                   | VW_ORE_67EF6547    |     3 |   270 |    17   (6)| 00:00:01 |
|   2 |   UNION-ALL             |                    |       |       |            |          |
|*  3 |    HASH JOIN            |                    |     1 |    55 |     6   (0)| 00:00:01 |
|   4 |     MERGE JOIN CARTESIAN|                    |     4 |   144 |     4   (0)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL  | T_FURTHER_INFO     |     1 |    13 |     2   (0)| 00:00:01 |
|   6 |      BUFFER SORT        |                    |     3 |    69 |     2   (0)| 00:00:01 |
|*  7 |       TABLE ACCESS FULL | T_OTHERS           |     3 |    69 |     2   (0)| 00:00:01 |
|*  8 |     TABLE ACCESS FULL   | T                  |     3 |    57 |     2   (0)| 00:00:01 |
|*  9 |    HASH JOIN            |                    |     2 |   182 |    11  (10)| 00:00:01 |
|  10 |     VIEW                | VW_JF_SET$FB5125FC |     2 |   156 |     9  (12)| 00:00:01 |
|  11 |      UNION-ALL          |                    |       |       |            |          |
|* 12 |       HASH JOIN         |                    |     1 |    42 |     4   (0)| 00:00:01 |
|* 13 |        TABLE ACCESS FULL| T_OTHERS           |     1 |    23 |     2   (0)| 00:00:01 |
|* 14 |        TABLE ACCESS FULL| T                  |     3 |    57 |     2   (0)| 00:00:01 |
|* 15 |       HASH JOIN         |                    |     1 |    42 |     4   (0)| 00:00:01 |
|* 16 |        TABLE ACCESS FULL| T                  |     1 |    19 |     2   (0)| 00:00:01 |
|  17 |        TABLE ACCESS FULL| T_OTHERS           |     4 |    92 |     2   (0)| 00:00:01 |
|  18 |     TABLE ACCESS FULL   | T_FURTHER_INFO     |     4 |    52 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      USE_HASH(@"SEL$7C4216F7" "PT"@"SEL$1")
      LEADING(@"SEL$7C4216F7" "T"@"SEL$1" "PT"@"SEL$1")
      FULL(@"SEL$7C4216F7" "PT"@"SEL$1")
      FULL(@"SEL$7C4216F7" "T"@"SEL$1")
      USE_HASH(@"SEL$A4A33BE0" "T"@"SEL$1")
      LEADING(@"SEL$A4A33BE0" "PT"@"SEL$1" "T"@"SEL$1")
      FULL(@"SEL$A4A33BE0" "T"@"SEL$1")
      FULL(@"SEL$A4A33BE0" "PT"@"SEL$1")
      USE_HASH(@"SET$49E1C21B_3" "T"@"SEL$1")
      USE_MERGE_CARTESIAN(@"SET$49E1C21B_3" "PT"@"SEL$1")
      LEADING(@"SET$49E1C21B_3" "FI"@"SEL$1" "PT"@"SEL$1" "T"@"SEL$1")
      FULL(@"SET$49E1C21B_3" "T"@"SEL$1")
      FULL(@"SET$49E1C21B_3" "PT"@"SEL$1")
      FULL(@"SET$49E1C21B_3" "FI"@"SEL$1")
      USE_HASH(@"SEL$5FCD2D3C" "FI"@"SEL$1")
      LEADING(@"SEL$5FCD2D3C" "VW_JF_SET$FB5125FC"@"SEL$81DF0931" "FI"@"SEL$1")
      FULL(@"SEL$5FCD2D3C" "FI"@"SEL$1")
      NO_ACCESS(@"SEL$5FCD2D3C" "VW_JF_SET$FB5125FC"@"SEL$81DF0931")
      NO_ACCESS(@"SEL$67EF6547" "VW_ORE_67EF6547"@"SEL$67EF6547")
      OUTLINE(@"SET$49E1C21B_2")
      OUTLINE(@"SET$49E1C21B_1")
      OUTLINE(@"SEL$1")
      FACTORIZE_JOIN(@"SET$49E1C21B"("FI"@"SET$49E1C21B_2" "FI"@"SET$49E1C21B_1"))
      OUTLINE(@"SET$0E101D56")
      OUTLINE(@"SEL$81DF0931")
      OUTLINE(@"SEL$5AB42CD1")
      OR_EXPAND(@"SEL$1" (1) (2) (3))
      OUTLINE(@"SET$49E1C21B")
      OUTLINE_LEAF(@"SEL$67EF6547")
      FACTORIZE_JOIN(@"SET$49E1C21B"("FI"@"SET$49E1C21B_2" "FI"@"SET$49E1C21B_1"))
      OUTLINE_LEAF(@"SET$0E101D56")
      OUTLINE_LEAF(@"SEL$5FCD2D3C")
      OUTLINE_LEAF(@"SET$FB5125FC")
      OUTLINE_LEAF(@"SEL$A4A33BE0")
      OUTLINE_LEAF(@"SEL$7C4216F7")
      OUTLINE_LEAF(@"SET$49E1C21B_3")
      ALL_ROWS
      DB_VERSION('12.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PT"."ID"="T"."ID" AND "FI"."ID"="T"."ID")
   5 - filter("FI"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   7 - filter(LNNVL("PT"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
   8 - filter(LNNVL("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
   9 - access("FI"."ID"="ITEM_1")
  12 - access("PT"."ID"="T"."ID")
  13 - filter("PT"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
  14 - filter(LNNVL("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
  15 - access("PT"."ID"="T"."ID")
  16 - filter("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))


This is a wonderful demonstration of how brilliant the optimizer can be.  The query has gone through several transformations and two of them have very high visibility. First, you can see the name VW_ORE_67EF6547 at operation 1. This is a view name that Oracle generates to express (cost-based) “OR-expansion”, so the optimizer has clearly obeyed my hint. As a consequence of OR-expansion we can also see several examples of the lnnvl() function appearing in the Predicate Information section of the output; we can also see the hint re-appearing in the completed form of OR_EXPAND(@"SEL$1" (1) (2) (3)) in the Outline Data.

However, we don’t have the union all of three pieces that we might have expected; we have a union all of two pieces and the second piece is a hash join between the table t_further_info and a view called VW_JF_SET$FB5125FC. This view is the result of “join factorization”. The optimizer has taken the 2nd and 3rd sections of our union all view and decided that it would be cost-effective to “factor out” a common table, so this:

select from t_others, t, t_further_info ... where t_others.date_predicate ...
union all
select from t, t_others, t_further_info ... where t.date_predicate ...

changes to this:

select from 
        (
        select from t_others, t   ... where t_others.date_predicate ...
        union all
        select from t, t_others   ... where t.date_predicate ...
        ),
        t_further_info 
where   ...

Having said all that, I now have to change the code because the original query was written using “ANSI” style joins – like this:


explain plan for
select
        /*+ or_expand */
        * 
from
        t   
inner join 
        t_others       pt  
on      pt.id = t.id  
inner join 
        t_further_info fi  
on      fi.id = t.id  
where
        (
             t.lastupdated >= to_date('2019-10-21', 'YYYY-MM-DD')   
         or pt.lastupdated >= to_date('2019-10-21', 'YYYY-MM-DD')  
         or fi.lastupdated >= to_date('2019-10-21', 'YYYY-MM-DD')  
        )
;

select  * from table(dbms_xplan.display(null,null,'outline'));


In the ANSI example I happen to have used the explicit “OR” list for the date predicates but that’s not really significant. Here’s the plan produced by this query – and the first thing I’d like you to note is the Plan hash value:

Plan hash value: 3309788271

----------------------------------------------------------------------------------------------
| Id  | Operation               | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                    |     3 |   270 |    17   (6)| 00:00:01 |
|   1 |  VIEW                   | VW_ORE_31069B60    |     3 |   270 |    17   (6)| 00:00:01 |
|   2 |   UNION-ALL             |                    |       |       |            |          |
|*  3 |    HASH JOIN            |                    |     1 |    55 |     6   (0)| 00:00:01 |
|   4 |     MERGE JOIN CARTESIAN|                    |     4 |   144 |     4   (0)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL  | T_FURTHER_INFO     |     1 |    13 |     2   (0)| 00:00:01 |
|   6 |      BUFFER SORT        |                    |     3 |    69 |     2   (0)| 00:00:01 |
|*  7 |       TABLE ACCESS FULL | T_OTHERS           |     3 |    69 |     2   (0)| 00:00:01 |
|*  8 |     TABLE ACCESS FULL   | T                  |     3 |    57 |     2   (0)| 00:00:01 |
|*  9 |    HASH JOIN            |                    |     2 |   182 |    11  (10)| 00:00:01 |
|  10 |     VIEW                | VW_JF_SET$997549B1 |     2 |   156 |     9  (12)| 00:00:01 |
|  11 |      UNION-ALL          |                    |       |       |            |          |
|* 12 |       HASH JOIN         |                    |     1 |    42 |     4   (0)| 00:00:01 |
|* 13 |        TABLE ACCESS FULL| T_OTHERS           |     1 |    23 |     2   (0)| 00:00:01 |
|* 14 |        TABLE ACCESS FULL| T                  |     3 |    57 |     2   (0)| 00:00:01 |
|* 15 |       HASH JOIN         |                    |     1 |    42 |     4   (0)| 00:00:01 |
|* 16 |        TABLE ACCESS FULL| T                  |     1 |    19 |     2   (0)| 00:00:01 |
|  17 |        TABLE ACCESS FULL| T_OTHERS           |     4 |    92 |     2   (0)| 00:00:01 |
|  18 |     TABLE ACCESS FULL   | T_FURTHER_INFO     |     4 |    52 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      USE_HASH(@"SEL$D12FC97A" "PT"@"SEL$1")
      LEADING(@"SEL$D12FC97A" "T"@"SEL$1" "PT"@"SEL$1")
      FULL(@"SEL$D12FC97A" "PT"@"SEL$1")
      FULL(@"SEL$D12FC97A" "T"@"SEL$1")
      USE_HASH(@"SEL$09C9729D" "T"@"SEL$1")
      LEADING(@"SEL$09C9729D" "PT"@"SEL$1" "T"@"SEL$1")
      FULL(@"SEL$09C9729D" "T"@"SEL$1")
      FULL(@"SEL$09C9729D" "PT"@"SEL$1")
      USE_HASH(@"SET$E8D85587_3" "T"@"SEL$1")
      USE_MERGE_CARTESIAN(@"SET$E8D85587_3" "PT"@"SEL$1")
      LEADING(@"SET$E8D85587_3" "FI"@"SEL$2" "PT"@"SEL$1" "T"@"SEL$1")
      FULL(@"SET$E8D85587_3" "T"@"SEL$1")
      FULL(@"SET$E8D85587_3" "PT"@"SEL$1")
      FULL(@"SET$E8D85587_3" "FI"@"SEL$2")
      USE_HASH(@"SEL$95B99BAF" "FI"@"SEL$2")
      LEADING(@"SEL$95B99BAF" "VW_JF_SET$997549B1"@"SEL$BB7F1ECF" "FI"@"SEL$2")
      FULL(@"SEL$95B99BAF" "FI"@"SEL$2")
      NO_ACCESS(@"SEL$95B99BAF" "VW_JF_SET$997549B1"@"SEL$BB7F1ECF")
      NO_ACCESS(@"SEL$31069B60" "VW_ORE_31069B60"@"SEL$31069B60")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$2")
      MERGE(@"SEL$1" >"SEL$2")
      OUTLINE(@"SEL$58A6D7F6")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SET$E8D85587_2")
      OUTLINE(@"SET$E8D85587_1")
      MERGE(@"SEL$58A6D7F6" >"SEL$3")
      OUTLINE(@"SEL$9E43CB6E")
      FACTORIZE_JOIN(@"SET$E8D85587"("FI"@"SET$E8D85587_2" "FI"@"SET$E8D85587_1"))
      OUTLINE(@"SET$6117B24C")
      OUTLINE(@"SEL$BB7F1ECF")
      OUTLINE(@"SEL$344003E3")
      OR_EXPAND(@"SEL$9E43CB6E" (1) (2) (3))
      OUTLINE(@"SET$E8D85587")
      OUTLINE_LEAF(@"SEL$31069B60")
      FACTORIZE_JOIN(@"SET$E8D85587"("FI"@"SET$E8D85587_2" "FI"@"SET$E8D85587_1"))
      OUTLINE_LEAF(@"SET$6117B24C")
      OUTLINE_LEAF(@"SEL$95B99BAF")
      OUTLINE_LEAF(@"SET$997549B1")
      OUTLINE_LEAF(@"SEL$09C9729D")
      OUTLINE_LEAF(@"SEL$D12FC97A")
      OUTLINE_LEAF(@"SET$E8D85587_3")
      ALL_ROWS
      DB_VERSION('12.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("FI"."ID"="T"."ID" AND "PT"."ID"="T"."ID")
   5 - filter("FI"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   7 - filter(LNNVL("PT"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
   8 - filter(LNNVL("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
   9 - access("FI"."ID"="ITEM_1")
  12 - access("PT"."ID"="T"."ID")
  13 - filter("PT"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
  14 - filter(LNNVL("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
  15 - access("PT"."ID"="T"."ID")
  16 - filter("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))


Is the plan for the “ANSI” version of the query the same as the plan for the “traditional” version? How carefully have you checked, after simply noting that the two Plan hash values were different?

The plans are the same – in that the mechanics are exactly the same and that ought to be the criterion on which we should judge them. But the hash values are different because of the change from traditional to ANSI syntax. The traditional form of the query starts life with a single query block while the ANSI form is considered to be two query blocks, so the initial ANSI query goes through three stages:


1) select from t1 join t2 join t3

2) select /*+ qb_name(sel$2) */ from (select /* qb_name(sel$1) */ from t1, t2), t3

3) select /*+ qb_name(sel$9E43CB6E) */ from t1, t2, t3

So the query is rapidly transformed to the equivalent traditional syntax but we now have a query block name of SEL$9E43CB6E instead of the SEL$1 that the traditional query (in the absence of a /*+ qb_name() */ hint) would have had. This is why you see the difference in the two or_expand() hints in the Outline Data section. One reads: OR_EXPAND(@"SEL$1" (1) (2) (3)), the other reads OR_EXPAND(@"SEL$9E43CB6E" (1) (2) (3)), and all the subsequent query block name differences follow on from this initial transformation. (Note: the value “sel$9e43cb6e” is derived from the input query block names of sel$1 and sel$2 that the new query block is derived from.)

You may decide that this mismatch isn’t really terribly important. If you’ve modified the code to switch to ANSI style joins then you may be prepared to put in a little extra effort to check the plan in detail to see that it hasn’t changed; but it would be a lot nicer if the hash value wasn’t dependent on generated view names. You may recall that at one time Oracle had problems with plans that used materialized CTEs (“with” subqueries) because the plan hash value was affected by object names like sys_temp_0fd9d6791_dfc12da. The same principle ought, I think, to apply here.

If you don’t mind the ANSI/traditional switch, though, you might find that you’re less happy when you upgrade to 19c, because the same effect appears there too, only it’s worse. Not only do “identical” traditional and ANSI plans have different hash values, they don’t match the values from 12c because the generated name for the join factorization views (VW_JF) changes in the upgrade. So if you’re depending on SQL Plan Baselines to reproduce 12c plans on 19c when you upgrade you may find cases where you know the stored baseline is giving you the same plan but Oracle thinks it isn’t and refuses to use it.

tl;dr

Plans which are functionally identical can have different plan hash values because the plans were reached through a different series of transformations. In particular if you rewrite queries from “traditional” Oracle syntax to “ANSI” syntax you will find cases where the plan doesn’t change but the plan hash value does, thanks to a change in the names of views generated by some transformations.

More significantly, if you upgrade from 12c to 19c there are cases where the names of views generated by transformations may change, which could cause the optimizer to discard some of your carefully constructed SQL Plan Baselines as “not reproducible”.

Footnote

For reference, if I add the hint /*+ opt_param(‘_optimizer_join_factorization’ ‘false’) */ to the “traditional” query then I get the following plan, which shows more clearly the three branches that the original query has been split into – each section starting with a different choice for the driving table:


-------------------------------------------------------------------------------------------
| Id  | Operation               | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                 |     3 |   270 |    19   (6)| 00:00:01 |
|   1 |  VIEW                   | VW_ORE_67EF6547 |     3 |   270 |    19   (6)| 00:00:01 |
|   2 |   UNION-ALL             |                 |       |       |            |          |
|*  3 |    HASH JOIN            |                 |     1 |    55 |     7  (15)| 00:00:01 |
|*  4 |     HASH JOIN           |                 |     1 |    32 |     4   (0)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL  | T               |     1 |    19 |     2   (0)| 00:00:01 |
|   6 |      TABLE ACCESS FULL  | T_FURTHER_INFO  |     4 |    52 |     2   (0)| 00:00:01 |
|   7 |     TABLE ACCESS FULL   | T_OTHERS        |     4 |    92 |     2   (0)| 00:00:01 |
|*  8 |    HASH JOIN            |                 |     1 |    55 |     6   (0)| 00:00:01 |
|   9 |     MERGE JOIN CARTESIAN|                 |     4 |   144 |     4   (0)| 00:00:01 |
|* 10 |      TABLE ACCESS FULL  | T_OTHERS        |     1 |    23 |     2   (0)| 00:00:01 |
|  11 |      BUFFER SORT        |                 |     4 |    52 |     2   (0)| 00:00:01 |
|  12 |       TABLE ACCESS FULL | T_FURTHER_INFO  |     4 |    52 |     2   (0)| 00:00:01 |
|* 13 |     TABLE ACCESS FULL   | T               |     3 |    57 |     2   (0)| 00:00:01 |
|* 14 |    HASH JOIN            |                 |     1 |    55 |     6   (0)| 00:00:01 |
|  15 |     MERGE JOIN CARTESIAN|                 |     4 |   144 |     4   (0)| 00:00:01 |
|* 16 |      TABLE ACCESS FULL  | T_FURTHER_INFO  |     1 |    13 |     2   (0)| 00:00:01 |
|  17 |      BUFFER SORT        |                 |     3 |    69 |     2   (0)| 00:00:01 |
|* 18 |       TABLE ACCESS FULL | T_OTHERS        |     3 |    69 |     2   (0)| 00:00:01 |
|* 19 |     TABLE ACCESS FULL   | T               |     3 |    57 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      USE_HASH(@"SET$BB614FD2_3" "T"@"SET$BB614FD2_3")
      USE_MERGE_CARTESIAN(@"SET$BB614FD2_3" "PT"@"SET$BB614FD2_3")
      LEADING(@"SET$BB614FD2_3" "FI"@"SET$BB614FD2_3" "PT"@"SET$BB614FD2_3"
              "T"@"SET$BB614FD2_3")
      FULL(@"SET$BB614FD2_3" "T"@"SET$BB614FD2_3")
      FULL(@"SET$BB614FD2_3" "PT"@"SET$BB614FD2_3")
      FULL(@"SET$BB614FD2_3" "FI"@"SET$BB614FD2_3")
      USE_HASH(@"SET$BB614FD2_2" "T"@"SET$BB614FD2_2")
      USE_MERGE_CARTESIAN(@"SET$BB614FD2_2" "FI"@"SET$BB614FD2_2")
      LEADING(@"SET$BB614FD2_2" "PT"@"SET$BB614FD2_2" "FI"@"SET$BB614FD2_2"
              "T"@"SET$BB614FD2_2")
      FULL(@"SET$BB614FD2_2" "T"@"SET$BB614FD2_2")
      FULL(@"SET$BB614FD2_2" "FI"@"SET$BB614FD2_2")
      FULL(@"SET$BB614FD2_2" "PT"@"SET$BB614FD2_2")
      USE_HASH(@"SET$BB614FD2_1" "PT"@"SET$BB614FD2_1")
      USE_HASH(@"SET$BB614FD2_1" "FI"@"SET$BB614FD2_1")
      LEADING(@"SET$BB614FD2_1" "T"@"SET$BB614FD2_1" "FI"@"SET$BB614FD2_1"
              "PT"@"SET$BB614FD2_1")
      FULL(@"SET$BB614FD2_1" "PT"@"SET$BB614FD2_1")
      FULL(@"SET$BB614FD2_1" "FI"@"SET$BB614FD2_1")
      FULL(@"SET$BB614FD2_1" "T"@"SET$BB614FD2_1")
      NO_ACCESS(@"SEL$49E1C21B" "VW_ORE_67EF6547"@"SEL$67EF6547")
      OUTLINE(@"SEL$1")
      OR_EXPAND(@"SEL$1" (1) (2) (3))
      OUTLINE_LEAF(@"SEL$49E1C21B")
      OUTLINE_LEAF(@"SET$BB614FD2")
      OUTLINE_LEAF(@"SET$BB614FD2_1")
      OUTLINE_LEAF(@"SET$BB614FD2_2")
      OUTLINE_LEAF(@"SET$BB614FD2_3")
      ALL_ROWS
      OPT_PARAM('_optimizer_join_factorization' 'false')
      DB_VERSION('19.1.0')
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PT"."ID"="T"."ID")
   4 - access("FI"."ID"="T"."ID")
   5 - filter("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   8 - access("PT"."ID"="T"."ID" AND "FI"."ID"="T"."ID")
  10 - filter("PT"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
  13 - filter(LNNVL("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00',
              'syyyy-mm-dd hh24:mi:ss')))
  14 - access("PT"."ID"="T"."ID" AND "FI"."ID"="T"."ID")
  16 - filter("FI"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
  18 - filter(LNNVL("PT"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00',
              'syyyy-mm-dd hh24:mi:ss')))
  19 - filter(LNNVL("T"."LASTUPDATED">=TO_DATE(' 2019-10-21 00:00:00',
              'syyyy-mm-dd hh24:mi:ss')))

88 rows selected.

Although the “traditional” and “ANSI” plans still show a difference in their Plan hash values when join factorization is blocked, the absence of the join factorization view means that the plan hash values are now consistent between 12c and 19c (the output above came from 19.3.0.0 as you can see in the Outline information).

CBO Oddities – 1

Fri, 2019-10-18 12:10

I’ve decided to do a little rewriting and collating so that I can catalogue related ideas in an order that makes for a better narrative. So this is the first in a series of notes designed to help you understand why the optimizer has made a particular choice and why that choice is (from your perspective) a bad one, and what you can do either to help the optimizer find a better plan, or subvert the optimizer and force a better plan.

If you’re wondering why I choose to differentiate between “help the optimizer” and “subvert the optimizer” consider the following examples.

  • A query is joining two tables in the wrong order with a hash join when you know that a nested loop join in the opposite order would be far better because you know that the data you want is very nicely clustered and there’s a really good index that would make access to that data very efficient. You check the table preferences and discover that the table_cached_blocks preference (see end notes) is at its default value of 1, so you set it to 16 and gather fresh stats on the indexes on the table. Oracle now recognises the effectiveness of this index and changes plan accordingly.
  • The optimizer has done a surprising transformation of a query, aggregating a table before joining to a couple of other tables when you were expecting it to use the joins to eliminate a huge fraction of the data before aggregating it. After a little investigation you find that setting the hidden parameter _optimizer_distinct_placement to false stops this happening.

You may find the distinction unnecessarily fussy, but I’d call the first example “helping the optimizer” – it gives the optimizer some truthful information about your data that is potentially going to result in better decisions in many different statements – and the second example “subverting the optimizer” – you’ve brute-forced it into not taking a path you didn’t like but at the same time you may have stopped that feature from appearing in other ways or in other queries. Of course, you might have minimised the impact of setting the parameter by using the opt_param() hint to apply the restriction to just this one query; nevertheless it’s possible that there is a better plan for the query that would have used the feature at some other point in the query if you’d managed to do something to help the optimizer rather than constraining it.
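
To put some flesh on those two examples, the statements involved would look something like the following sketch – the table name, index name, column names and preference value here are purely illustrative, not taken from a real system:

rem
rem     Sketch only - object names and values are illustrative
rem

begin
        -- "helping": tell the optimizer the truth about clustering/caching
        dbms_stats.set_table_prefs(
                ownname => user,
                tabname => 'ORDERS',
                pname   => 'table_cached_blocks',
                pvalue  => '16'
        );

        dbms_stats.gather_index_stats(
                ownname => user,
                indname => 'ORD_DATE_IDX'
        );
end;
/

rem     "subverting": disable a feature for just this one statement

select
        /*+ opt_param('_optimizer_distinct_placement' 'false') */
        o.customer_id, count(distinct o.product_id)
from
        orders o
group by
        o.customer_id
;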

What’s up with the Optimizer

It’s likely that most of the articles will be based around interpreting execution plans since those are the things that tell us what the optimizer thinks will happen when it executes a statement, and within execution plans there are three critical aspects to consider –

  1. the numbers (most particularly Cost and Rows),
  2. the shape of the plan,
  3. the Predicate Information.

I want to use this note to make a couple of points about just the first of the three.

  • First – the estimates on any one line of an execution plan are “per start” of the line; some lines of an execution plan will be called many times in the course of a statement. In many cases the Rows estimate from one line of a plan will dictate the number of times that some other line of the plan will be executed – so a bad estimate of “how much data” can double up as a bad estimate of “how many times”, leading to a plan that looks efficient on paper but does far too much work at run-time. A line in a plan that looks a little inefficient may be fine if it executes only once, while a line that looks very efficient may be a disaster if it executes a million times. Being able to read a plan and spot the places where the optimizer has produced a poor estimate of Rows is a critical skill – and there are many reasons why the optimizer produces poor estimates. Being able to spot poor estimates depends fairly heavily on knowing the data, but if you know the generic reasons for the optimizer producing poor estimates you’ve got a head start for recognising and addressing the errors when they appear.
  • Second – Cost is synonymous with Time. For a given instance at a given moment there is a simple, linear, relationship between the figure that the optimizer reports for the Cost of a statement (or subsection of a statement) and the Time that the optimizer reports. For many systems (those that have not run the calibrate_io procedure) the Time is simply the Cost multiplied by the time the optimizer thinks it will take to satisfy a single block read request, and the Cost is the optimizer’s estimate of the I/O requirement to satisfy the statement – with a fudge factor introduced to recognise the fact that a “single block” read request ought to complete in less time than a “multiblock” read request. Generally speaking the optimizer will consider many possible plans for a statement and pick the plan with the lowest estimated cost – but there is at least one exception to this rule, and it is an unfortunate weakness in the optimizer that there are many valid reasons why its estimates of Cost/Time are poor. Of course, you will note that the values that Oracle reports for the Time column are only accurate to the second – which isn’t particularly helpful when a single block read typically operates in the range of a few milliseconds.
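
If you want to see the figures the optimizer is working with, the relevant system statistics are visible in the SYS-owned table sys.aux_stats$ – a quick sketch (you’ll need to be able to query SYS objects to run it):

select  pname, pval1
from    sys.aux_stats$
where   sname = 'SYSSTATS_MAIN'
and     pname in ('SREADTIM', 'MREADTIM', 'CPUSPEEDNW', 'IOSEEKTIM', 'IOTFRSPEED')
order by
        pname
;

With noworkload statistics SREADTIM will usually be null and the optimizer derives a single block read time from the I/O seek time and transfer speed figures; either way, as noted above, the reported Time is (roughly) the Cost multiplied by that single block read time, reported only to whole seconds.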

To a large degree the optimizer’s task boils down to:

  • What’s the volume and scatter of the data I need
  • What access paths, with what wastage, are available to get to that data
  • How much time will I spend on I/O reading (and possibly discarding) data to extract the bit I want

Of course there are other considerations like the amount of CPU needed for a sort, the potential for I/O as sorts or hash joins spill to disc, the time to handle a round-trip to a remote system, and RAC variations on the basic theme. But for many statements the driving issue is that any bad estimates of “how much data” and “how much (real) I/O” will lead to bad, potentially catastrophic, choices of execution plan. In the next article I’ll list all the different reasons (that I can think of at the time) why the optimizer can produce bad estimates of volume and time.

References for Cost vs. Time

References for table_cached_blocks:

 

Clustering_Factor

Wed, 2019-10-16 08:07

Originally drafted July 2018

“How do you find out what the clustering_factor of an index would be without first creating the index ?”

I’m not sure this is really a question worth asking or answering[1], but since someone asked it (and given the draft date I have no idea who, where, when or why), here’s an answer for simple heap tables in the good old days before Oracle made public the table_cached_blocks preference. It works by sorting the columns you want in the index together with the table rowid, and then comparing the file/block component of the rowid (cutting the relevant characters from the string representation of the rowid) with the previous one to see if the current row is in the same block as the previous row.  If the row is in a different block we count one, otherwise zero.  Finally we sum the ones.

In the demo I’ve copied a little data from view all_objects, with the intention of creating an index on object_name. After running my counting query I’ve then created the index and checked its clustering_factor to show that I’ve got a match.


rem
rem     Script: clustering_factor_est.sql
rem     Author: J.P.Lewis
rem     Dated:  July 2018
rem
create table t1
as
select  *
from    all_objects
where   rownum <= 10000
;

prompt  ====================
prompt  Tablescan with lag()
prompt  ====================

select
        sum(count_chg)
from    (
select
        case
                when substr(rowid,7,9) <> lag(substr(rowid,7,9),1,'000000000') over(order by object_name, rowid)
                        then 1
                        else 0
        end     count_chg
from t1
)
;

prompt  ======================
prompt  create index and check
prompt  ======================

create index t1_i1 on t1(object_name);

select  clustering_factor
from    user_indexes 
where   table_name = 'T1'
and     index_name = 'T1_I1'
;

Pasting the output from running the above:


Table created.

====================
Tablescan with lag()
====================

SUM(COUNT_CHG)
--------------
          3901

1 row selected.

======================
create index and check
======================

Index created.


CLUSTERING_FACTOR
-----------------
             3901

1 row selected.


This will work for a global index on a partitioned table, but will give meaningless answers for globally partitioned indexes and local indexes. Furthermore it’s not a mechanism that lends itself to calculating the clustering_factor if you’ve set the table_cached_blocks preference to a value other than 1.

[1] Given the amount of work it would take to run the query to check the clustering_factor you might as well create the index (invisible, perhaps, and nologging if that doesn’t cause problems) – which also gives you the option of setting the table_cached_blocks preference and gathering stats to see how the clustering_factor varies.

Update (shortly after posting)

It has occurred to me that if you have a “nosegment” index that you’ve been using to test whether or not the optimizer would use it IF you created it, then the one piece of information that is missing from the data dictionary for that index is its clustering_factor. (For a multicolumn index you can get a valid value for distinct_keys by creating a column group on the set of columns – which would otherwise be faked when you create the nosegment index.) This code might allow you to write a clustering_factor to the index definition that would give you a better idea of whether the optimizer would use it if you created it.  (To be tested.)
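
For what it’s worth, a first (and, as noted, untested) sketch of that idea might look like the following – the index name is mine, the clustering_factor is the value from the lag() query above, and the hidden parameter _use_nosegment_indexes has to be set before the optimizer will consider nosegment indexes:

alter session set "_use_nosegment_indexes" = true;

create index t1_i2 on t1(object_name) nosegment;

begin
        dbms_stats.set_index_stats(
                ownname => user,
                indname => 'T1_I2',
                clstfct => 3901         -- estimate from the lag() query
        );
end;
/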

 

 

v$session

Fri, 2019-10-11 06:29

Here’s an odd, and unpleasant, detail about querying v$session in the “most obvious” way. (And if you were wondering what made me resurrect and complete a draft on “my session id” a couple of days ago, this posting is the reason). Specifically if you want to select some information for your own session from v$session the query you’re likely to use in any recent version of Oracle will probably be of the form:


select {list for columns} from v$session where sid = to_number(sys_context('userenv','sid'));

Unfortunately that one little statement hides two anomalies – which you can see in the execution plan. Here’s a demonstration cut from an SQL*Plus session running under 19.3.0.0:


SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  gcfrzq9knynj3, child number 0
-------------------------------------
select program from V$session where sid = sys_context('userenv','sid')

Plan hash value: 2422122865

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  MERGE JOIN CARTESIAN     |                 |     1 |    33 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     1 |    12 |     0   (0)|
|*  3 |    FIXED TABLE FULL       | X$KSLWT         |     1 |     8 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSLED (ind:2) |     1 |     4 |     0   (0)|
|   5 |   BUFFER SORT             |                 |     1 |    21 |     0   (0)|
|*  6 |    FIXED TABLE FULL       | X$KSUSE         |     1 |    21 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("W"."KSLWTSID"=TO_NUMBER(SYS_CONTEXT('userenv','sid')))
   4 - filter("W"."KSLWTEVT"="E"."INDX")
   6 - filter((BITAND("S"."KSUSEFLG",1)<>0 AND BITAND("S"."KSSPAFLG",1)<>0 AND 
              "S"."INDX"=TO_NUMBER(SYS_CONTEXT('userenv','sid'))
              AND INTERNAL_FUNCTION("S"."CON_ID") AND "S"."INST_ID"=USERENV('INSTANCE')))

As you can see, v$session is a join of 3 separate structures – x$kslwt (v$session_wait), x$ksled (v$event_name), and x$ksuse (the original v$session as it was some time around 8i), and the plan shows two “full tablescans” and a Cartesian merge join. Tablescans and Cartesian merge joins are not necessarily bad – especially where small tables and tiny numbers of rows are concerned – but they do merit at least a brief glance.

x$ksuse is a C structure in the fixed SGA and that structure is a segmented array (which seems to be chunks of 126 entries in 19.3, and chunks of 209 entries in 12.2 – but that’s fairly irrelevant). The SID is simply the index into the array counting from 1, so if you have a query with a predicate like ‘SID = 99’ Oracle can work out the address of the 99th entry in the array and access it very quickly – which is why the SID column is reported as a “fixed index” column in the view v$indexed_fixed_column.
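
You can check which columns of the underlying x$ structures are covered by these “fixed indexes” with a simple query, e.g.:

select  table_name, index_number, column_name, column_position
from    v$indexed_fixed_column
where   table_name in ('X$KSUSE', 'X$KSLWT', 'X$KSLED')
order by
        table_name, index_number, column_position
;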

But we have two problems immediately visible:

  1. the optimizer is not using the “index” to access x$ksuse despite the fact that we’re giving it exactly the value we want to use (and we can see a suitable predicate at operation 6 in the plan)
  2. the optimizer has decided to start executing the query at the x$kslwt table

Before looking at why things have gone wrong, let’s check the execution plan to see what would have happened if we’d copied the value from the sys_context() call into a bind variable and queried using the bind variable – which we’ll keep as a character type to make it a fair comparison:

SQL_ID  cm3ub1tctpdyt, child number 0
-------------------------------------
select program from v$session where sid = to_number(:v1)

Plan hash value: 1627146547

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  MERGE JOIN CARTESIAN     |                 |     1 |    32 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     1 |    12 |     0   (0)|
|*  3 |    FIXED TABLE FIXED INDEX| X$KSLWT (ind:1) |     1 |     8 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSLED (ind:2) |     1 |     4 |     0   (0)|
|   5 |   BUFFER SORT             |                 |     1 |    20 |     0   (0)|
|*  6 |    FIXED TABLE FIXED INDEX| X$KSUSE (ind:1) |     1 |    20 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("W"."KSLWTSID"=TO_NUMBER(:V1))
   4 - filter("W"."KSLWTEVT"="E"."INDX")
   6 - filter(("S"."INDX"=TO_NUMBER(:V1) AND BITAND("S"."KSSPAFLG",1)<>0
              AND BITAND("S"."KSUSEFLG",1)<>0 AND INTERNAL_FUNCTION("S"."CON_ID") AND
              "S"."INST_ID"=USERENV('INSTANCE')))


When we have a (character) bind variable instead of a sys_context() value the optimizer manages to use the “fixed indexes”, but it still starts executing at x$kslwt, and still does a Cartesian merge join. The plan would be the same if the bind variable were a numeric type, and we’d still get the same plan if we replaced the bind variable with a literal number.

So problem number 1 is that Oracle only seems able to use the fixed index path for literal values and simple bind variables (plus a few “simple” functions). It doesn’t seem to use the fixed indexes for most functions (even deterministic ones) returning a value and the sys_context() function is a particular example of this.

Transitivity

Problem number 2 comes from a side-effect of something that I first described about 13 years ago – transitive closure. Take a look at the predicates in both the execution plans above. Where’s the join condition between x$ksuse and x$kslwt? There should be one, because the underlying SQL defining [g]v$session has the following joins:

from
      x$ksuse s,
      x$ksled e,
      x$kslwt w
where
      bitand(s.ksspaflg,1)!=0
and   bitand(s.ksuseflg,1)!=0
and   s.indx=w.kslwtsid       -- this is the SID column for v$session and v$session_wait
and   w.kslwtevt=e.indx
 

What’s happened here is that the optimizer has used transitive closure: “if a = b and b = c then a = c” to clone the predicate “s.indx = to_number(sys_context(…))” to “w.kslwtsid = to_number(sys_context(…))”. But at the same time the optimizer has eliminated the predicate “s.indx = w.kslwtsid”, which it shouldn’t do because this is 12.2.0.1 and ever since 10g we’ve had the parameter _optimizer_transitivity_retain = true — but SYS is ignoring the parameter setting.

So we no longer have a join condition between x$ksuse and x$kslwt – which means there has to be a cartesian merge join between them and the only question is whether this should take place before or after the join between x$kslwt and x$ksled. In fact, the order doesn’t really matter because there will be only one row identified in x$kslwt and one row in x$ksuse, and the join to x$ksled is simply a lookup (by undeclarable unique key) to translate an id into a name and it will take place only once whatever we do about the other two structures.

But there is a catch – especially if your sessions parameter is 25,000 (which it shouldn’t be) and the number of connected sessions is currently 20,000 (which it shouldn’t be) – the predicate against x$ksuse does a huge amount of work as it walks the entire array testing every row (and it doesn’t even do the indx test first – it does a couple of bitand() operations). Even then this wouldn’t be a disaster – we’re only talking a couple of hundredths of a second of CPU – until you find the applications that run this query a huge number of times.

We would prefer to avoid two full tablescans since the arrays could be quite large, and of the two it’s the tablescan of x$ksuse that is going to be the greater threat; so is there a way to bypass the threat?  Once we’ve identified the optimizer anomaly we’ve got a pointer to a solution. Transitivity is going wrong, so let’s attack the transitivity. Checking the hidden parameters we can find a parameter: _optimizer_generate_transitive_pred which defaults to true, so let’s set it to false for the query and check the plan:

select  /*+   opt_param('_optimizer_generate_transitive_pred','FALSE')
*/  program from  v$session where  sid = sys_context('userenv','sid')

Plan hash value: 3425234845

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  NESTED LOOPS             |                 |     1 |    32 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     1 |    28 |     0   (0)|
|   3 |    FIXED TABLE FULL       | X$KSLWT         |    47 |   376 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSUSE (ind:1) |     1 |    20 |     0   (0)|
|*  5 |   FIXED TABLE FIXED INDEX | X$KSLED (ind:2) |     1 |     4 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("S"."INDX"="W"."KSLWTSID" AND BITAND("S"."KSSPAFLG",1)<>0
              AND BITAND("S"."KSUSEFLG",1)<>0 AND "S"."INDX"=TO_NUMBER(SYS_CONTEXT('user
              env','sid')) AND INTERNAL_FUNCTION("S"."CON_ID") AND
              "S"."INST_ID"=USERENV('INSTANCE')))
   5 - filter("W"."KSLWTEVT"="E"."INDX")


Although it’s not nice to insert hidden parameters into the optimizer activity we do have a result. We don’t have any filtering on x$kslwt – fortunately this seems to be limited in size (but see footnote) to the number of current sessions (unlike x$ksuse which has an array size defined by the sessions parameter or derived from the processes parameter). For each row in x$kslwt we do an access into x$ksuse using the “index” (note that we don’t see access predicates for the fixed tables, we just have to note the operation says FIXED INDEX and spot the “index-related” predicate in the filter predicate list), so this strategy has reduced the number of times we check the complex predicate on x$ksuse rows.

It’s still far from ideal, though. What we’d really like to do is access x$kslwt by index using the known value from sys_context(‘userenv’,’sid’). As it stands the path we get from using a hidden parameter which isn’t listed as legal for the opt_param() hint is a plan that we would get if we used an unhinted query that searched for audsid = sys_context(‘userenv’,’sessionid’).

SQL_ID  7f3f9b9f32u7z, child number 0
-------------------------------------
select  program from  v$session where  audsid =
sys_context('userenv','sessionid')

Plan hash value: 3425234845

----------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |     1 (100)|
|   1 |  NESTED LOOPS             |                 |     2 |    70 |     0   (0)|
|   2 |   NESTED LOOPS            |                 |     2 |    62 |     0   (0)|
|   3 |    FIXED TABLE FULL       | X$KSLWT         |    47 |   376 |     0   (0)|
|*  4 |    FIXED TABLE FIXED INDEX| X$KSUSE (ind:1) |     1 |    23 |     0   (0)|
|*  5 |   FIXED TABLE FIXED INDEX | X$KSLED (ind:2) |     1 |     4 |     0   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(("S"."INDX"="W"."KSLWTSID" AND BITAND("S"."KSSPAFLG",1)<>0
              AND BITAND("S"."KSUSEFLG",1)<>0 AND "S"."KSUUDSES"=TO_NUMBER(SYS_CONTEXT('
              userenv','sessionid')) AND INTERNAL_FUNCTION("S"."CON_ID") AND
              "S"."INST_ID"=USERENV('INSTANCE')))
   5 - filter("W"."KSLWTEVT"="E"."INDX")


The bottom line, then, seems to be that if you need a query by SID against v$session to be as efficient as possible then your best bet is to load a numeric variable with the sys_context(‘userenv’,’sid’) and then select where “sid = :bindvariable”. Otherwise query by audsid, or use a hidden parameter to affect the optimizer.
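
In SQL*Plus terms that approach is simply something like:

variable m_sid number

execute :m_sid := to_number(sys_context('userenv','sid'))

select program from v$session where sid = :m_sid;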

Until the SYS schema follows the _optimizer_transitivity_retain parameter or treats sys_context() the same way it treats a bind variable there is always going to be some unnecessary work when querying v$session, and that excess will grow with either the number of connected sessions (if you optimize the query) or with the value of the sessions parameter.

Footnote

In (much) older versions of Oracle v$session_wait sat on top of x$ksusecst, which was part of the same C structure as x$ksuse. In newer versions of Oracle x$kslwt is a structure that is created on demand in the PGA/UGA – I hope that there’s a short cut that allows Oracle to find the waiting elements in x$ksuse[cst] efficiently, rather than requiring a walk through the whole thing, otherwise a tablescan of the (nominally smaller) x$kslwt structure will be at least as expensive as a tablescan of the x$ksuse structure.

Update (just a few minutes after posting)

Bob Bryla has pointed out in a tweet that there are many “bugs” not fixed until 19.1 for which the workaround is to set “_optimizer_transitivity_retain” to false. So maybe this isn’t an example of SYS doing something particularly strange – it may be part of a general reworking of the mechanism that still has a couple of undesirable side effects.

Bob’s comment prompted me to clone the x$ tables into real tables in a non-SYS schema and model the fixed indexes with primary keys, and I found that the resulting plan (though very efficient) still discarded the join predicate. So we may be seeing the side effects of a code enhancement relating to generating predicates that produce unique key access paths. (A “contrary” test, the one in the 2013 article I linked to, still retains the join predicate for the query that has non-unique indexes.)

 

Cursor_sharing

Wed, 2019-10-09 10:58

Here’s a funny little detail that I don’t think I’ve noticed before – needing only a simple demo script:


rem
rem     Script:         cursor_sharing_oddity.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2019
rem
rem     Last tested 
rem             12.2.0.1
rem

create table t1 as
select  * 
from    all_objects 
;

set serveroutput off
alter session set cursor_sharing = force;

select  *
from    t1
where
        created between date'2019-06-01' and date'2019-06-30'
;

select * from table(dbms_xplan.display_cursor);

Given that I’ve set cursor_sharing to FORCE (and flushed the shared pool just in case), what SQL do you expect to see if I pull the plan from memory, and what sort of thing do you expect to see in the Predicate Information? Probably some references to system-constructed bind variables like :”SYS_B_0″. This is what I got on 12.2.0.1:


SQL_ID  9qwa9gpg9rmjv, child number 0
-------------------------------------
select * from t1 where  created between date:"SYS_B_0" and
date:"SYS_B_1"

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   170 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |  1906 |   251K|   170   (8)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("CREATED">=TO_DATE(' 2019-06-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "CREATED"<=TO_DATE(' 2019-06-30 00:00:00',
              'syyyy-mm-dd hh24:mi:ss')))


Somehow I’ve got system-generated bind variables in the SQL (and v$sql – when I checked), but the original literal values are still present (in a different form) in the predicate information. Then, when I re-ran the query changing 1st June to 15th June I got the same SQL_ID (and generated bind variables) but with child number 1 and suitably modified filter predicates.

Of course, just for completion, if I write the query using the “old-fashioned” to_date() approach I end up with a single statement with lots of system-generated bind variables that are consistent between the SQL and the Predicate Information.
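
The “old-fashioned” version of the query would be of this general form (a sketch – not necessarily the exact literals or format mask that produced the output below):

select  *
from    t1
where
        created between to_date('01-Jun-2019','dd-mon-yyyy')
                and     to_date('30-Jun-2019','dd-mon-yyyy')
;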

SQL_ID  10sfymvwv00qx, child number 0
-------------------------------------
select * from t1 where  created between to_date(:"SYS_B_0",:"SYS_B_1")
                   and to_date(:"SYS_B_2",:"SYS_B_3")

Plan hash value: 3332582666

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   189 (100)|          |
|*  1 |  FILTER            |      |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  1029 |   135K|   189  (17)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(TO_DATE(:SYS_B_2,:SYS_B_3)>=TO_DATE(:SYS_B_0,:SYS_B_1))
   2 - filter(("CREATED">=TO_DATE(:SYS_B_0,:SYS_B_1) AND
              "CREATED"<=TO_DATE(:SYS_B_2,:SYS_B_3)))

If you are planning to do anything with cursor_sharing, watch out for the side effects of the “ANSI” date and time operators.

Update (3 hours later)

It turns out that I have come across this before – and written about it

The same behaviour is shown by timestamp literals and interval literals. For details on the two types of literal here’s a link to the 12.2 SQL Language Reference Manual section on literals.

My SID

Wed, 2019-10-09 06:03

Here’s a little note that’s been hanging around as a draft for more than eight years according to the OTN (as it was) posting that prompted me to start writing it. At the time there were still plenty of people using Oracle 10g, so the question didn’t seem entirely inappropriate:

On 10g R2 when I open a sqlplus session how can I know my session SID ? I’m not DBA then can not open as sysdba and query v$session.

In all fairly recent versions of Oracle, of course, we have the option to use the sys_context() function to get the SID, but this specific option didn’t appear until some time in the 10g timeline – so you might have spent years “knowing” that you could get the audsid through sys_context(‘userenv’,’sessionid’) while there was no equivalent way to get the sid. Now, of course, and even in the timeline of the original posting, the simplest solution to the requirement is to execute:


select sys_context('userenv','sid') from dual;

But there are a number of alternatives – which may occasionally do a better job (and sometimes are just plain silly). It’s also worth noting that even in 19c Oracle still doesn’t have access to v$session.serial# through sys_context() and, anyway, sys_context() behaves like an unpeekable bind variable – which can be a problem.

So here’s the first of several options:

select sid from V$mystat where rownum = 1;

You’ll need SYS to grant you select on v_$mystat to use this one, of course, but v$mystat is a very convenient view giving you the session activity stats since logon for your own session – so there ought to be some mechanism that allows you to see some form of it in place anyway (ideally including the join to v$statname).
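
A sketch of the kind of thing that could be put in place – a couple of grants from SYS, then a simple join run from the user’s own session (the schema name is just a placeholder):

grant select on v_$mystat   to test_user;
grant select on v_$statname to test_user;

select
        sn.name, ms.value
from
        v$statname      sn,
        v$mystat        ms
where
        ms.statistic# = sn.statistic#
and     ms.value != 0
order by
        sn.name
;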

One of the oldest ways of getting access to your session ID without having access to any of the dynamic performance views was through the dbms_support package:

variable v1 varchar2(32)
execute :v1 := dbms_support.mysid
execute dbms_output.put_line(:v1)

Again you’ll need SYS to grant you extra privileges, in this case execute on the dbms_support package – worse still, the package is not installed by default. In fact (after installing it) if you call dbms_support.package_version it returns the value: “DBMS_SUPPORT Version 1.0 (17-Aug-1998) – Requires Oracle 7.2 – 8.0.5” – which gives you some idea of how old it is. It used to be useful for the start_trace_in_session() procedure it contains but that procedure has been superseded by many newer mechanisms. If you enable SQL tracing to see what’s happening under the covers when you call dbms_support.mysid you’ll see that the function actually runs the query I showed above against v$mystat .

Unlike dbms_support the dbms_session package is installed automatically with the privilege to execute granted to public, and this gives you a function to generate a “unique session id”. The notes in the script $ORACLE_HOME/rdbms/admin/dbmssess.sql that creates the package say that the return value can be up to 24 bytes long, but so far the maximum I’ve seen is 12.


select dbms_session.unique_session_id from dual;
UNIQUE_SESSION_ID
--------------------------
00FF5F980001


select
        to_number(substr(dbms_session.unique_session_id,1,4),'XXXX') sid,
        to_number(substr(dbms_session.unique_session_id,5,4),'XXXX') serial#,
        to_number(substr(dbms_session.unique_session_id,9,4),'XXXX') instance
from
        dual
;

       SID    SERIAL# INSTANCE
---------- ---------- --------
       255      24472        1

As you can see, the session_unique_id can be decoded to produce three useful bits of information, and the nice thing about this call is that it gives you session serial# at the same time as the SID. It’s possible, of course, that this query is as efficient as it could be, but there’s some scope for writing a query that uses a non-mergeable in-line view to call the function once, then splits the result into three pieces.
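
Something like the following sketch would do – the no_merge hint stops the inline view from being merged, so the function should be called only once:

select
        to_number(substr(usid,1,4),'XXXX')      sid,
        to_number(substr(usid,5,4),'XXXX')      serial#,
        to_number(substr(usid,9,4),'XXXX')      instance
from    (
        select  /*+ no_merge */
                dbms_session.unique_session_id  usid
        from    dual
        )
;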

While we’re on the session_unique_id, the dbms_pipe package also has a “unique identifier” function unique_session_name():

SQL> select dbms_pipe.unique_session_name from dual;

UNIQUE_SESSION_NAME
------------------------
ORA$PIPE$00FF5F980001

It doesn’t take a lot of effort to spot that the “unique session name” is the “unique session id” of dbms_session prefixed with the text “ORA$PIPE$”. It’s convenient for the dbms_pipe package to be able to generate a unique name so that one session can create a safely named pipe and tell another session about it. Anyone using pipes should take advantage of this function for its original purpose. Unlike dbms_session you’ll need to be granted the privilege to execute this package, it’s not available to public. Interestingly the script that creates dbms_pipe says that this function could return 30 bytes – since it appears to be 9 bytes prepended to the (“could be 24 bytes”) dbms_session.unique_session_id you have to wonder whether there’s something more subtle that could happen.

There may be many more mechanisms available as built-ins, but the last one I know of is in the dbms_debug_jdwp package (another package with execute privilege already granted to public and the ability to supply both the sid and serial#):

SQL> select
  2          dbms_debug_jdwp.current_session_id     sid,
  3          dbms_debug_jdwp.current_session_serial serial#
  4  from dual
  5  /

       SID    SERIAL#
---------- ----------
       255      24472

There is a reason why I’ve decided to resurrect this list of ways of getting at a session’s SID, but that’s the topic of another blog note.

 

 

Resumable

Mon, 2019-10-07 04:31

There are two questions about temporary space that appear fairly regularly on the various Oracle forums. One is of the form:

From time to time my temporary tablespace grows enormously (and has to be shrunk), how do I find what’s making this happen?

The other follows the more basic pattern:

My process sometimes crashes with Oracle error: “ORA-01652: unable to extend temp segment by %n in tablespace %s” how do I stop this happening?

Before moving on to the topic of the blog, it’s worth pointing out two things about the second question:

  • First, it’s too easy to get stuck at the word temp and leap to the conclusion that the problem is about the temporary tablespace without noticing that the error message includes the specific tablespace that’s raised the problem. If, for example, you rebuild an index in a nominated tablespace Oracle first creates the index as a temporary segment (with a name like {starting_file_number}.{starting_block_number}) in that tablespace then renames it to match the original index name once the rebuild is complete and drops the old index.
  • Secondly a process that raises ORA-01652 isn’t necessarily the guilty party – it may be the victim of some other process hogging all the available space when it shouldn’t. Moreover that other process may have completed and released its space by the time you start looking for the problem – causing extra confusion because your process seems to have crashed without a cause. Taking my example of an index rebuild – your index rebuild may fail because someone else was rebuilding a different index at the same time in the same tablespace; but when you check the tablespace all the space from their original index is now free as their rebuild completed in the interim.

So, before you start chasing something that you think is a problem with your code, pause a moment to double-check the error message and think about whether you could have been the victim of some concurrent, but now complete, activity.

I’ve listed the two questions as variants on the same theme because the workaround to one of them introduces the risk of the other – if you want to avoid ORA-01652 you could make all your data files and temp files “autoextensible”, but then there may be occasions when they extend far too much and you need to shrink them down again (and that’s not necessarily easy if it’s not the temporary tablespace). Conversely, if you think your data or temp files randomly explode to ludicrous sizes you could decide on a maximum size for your files and disable autoextension – then handle the complaints when a user reports an ORA-01652.

There are various ways you could monitor your system in near real time to spot the threat as it builds, of course; and there are various ways to identify potentially guilty SQL after the event. You could keep an eye on various v$ dynamic performance views or dba_ administrative views to try and intercept a problem; you could set event 1652 to dump an errorstack (or even systemstate) for post-crash analysis to see what that reported. Neither is an ideal solution – one requires you to pay excessive attention to the system, the other is designed to let the problem happen then leave you to clean up afterwards.  There is, however, a strategy that may stop the problem from appearing without requiring constant monitoring. The strategy is to enable (selectively) resumable operations.

If a resumable operation needs to allocate space but is unable to do so – i.e. it would normally be about to raise ORA-01652 – it will suspend itself for a while, going into the wait state “statement suspended, wait error to be cleared” which will show up as the event in v$session_wait, timing out every 2 seconds. The session will also be reporting its current action in the view v$resumable or, for slightly more information, dba_resumable. As it suspends the session will also write a message to the alert log, but you can also create an “after suspend” database trigger to alert you that a problem has occurred.

If you set the resumable timeout to a suitable value then you may find:

  • the problem goes away of its own accord and the session resumes before the timeout is reached

or

  • you receive a warning and have some time to identify the source of the problem and take the minimum action needed to allow the session to resume

Implementation

The parameter resumable_timeout is a general control for resumable sessions if you don’t handle the feature at a more granular level than the system.

By default this parameter is set to zero which translates into a default value of 7,200 seconds but that default doesn’t come into effect unless a session declares itself resumable. If you set the parameter to a non-zero value all sessions will automatically be operating as resumable sessions – and you’ll soon hear why you don’t want to do that.

The second enabling feature for resumable sessions is the resumable privilege – a session can’t control its own resumability unless the schema has been granted the resumable privilege – which may be granted through a role. If a session has the privilege it may set its own resumable_timeout, even if the system value is zero.
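
In outline that means something like this sketch (the schema name is a placeholder) – a one-off grant from a suitably privileged account, after which the session can set its own timeout in seconds:

grant resumable to test_user;

alter session set resumable_timeout = 60;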

Assume we have set resumable_timeout to 10 (seconds) through the instance parameter file and restarted the instance. If we now issue (for example) the following ‘create table’ statement:


create table t1 (n1, v1 ) 
pctfree 90 pctused 10
tablespace tiny
as
select 
        rownum, cast(lpad('x',800) as varchar2(1000))
from    all_objects
where   rownum <= 20000
/

This will attempt to allocate 1 row per block for 20,000 blocks (plus about 1.5% for bitmap space management blocks) – and tablespace tiny lives up (or down) to its name, consisting of a single file of only 10,000 Oracle blocks. Shortly after starting, the session will hit Oracle error “ORA-01652: unable to extend temp segment by 128 in tablespace TINY”, but it won’t report it; instead it will suspend itself for 10 seconds before failing and reporting the error. This will happen whether or not the session has the resumable privilege – in this case the behaviour is dictated by our setting the system parameter. If you look in the alert log after the session finally errors out you will find text like the following:

2019-10-04T14:01:11.847943+01:00
ORCL(3):ORA-1652: unable to extend temp segment by 128 in tablespace TINY [ORCL] 
ORCL(3):statement in resumable session 'User TEST_USER(138), Session 373, Instance 1' was suspended due to
ORCL(3):    ORA-01652: unable to extend temp segment by 128 in tablespace TINY
2019-10-04T14:01:23.957586+01:00
ORCL(3):statement in resumable session 'User TEST_USER(138), Session 373, Instance 1' was timed out

Note that there’s a 10 (plus a couple) second gap between the point where the session reports that it is suspending itself and the point where it fails with a timeout. The two extra seconds appear because the session polls every 2 seconds to see whether the problem is still present or whether it has spontaneously disappeared so allowing the session to resume.

Let’s change the game slightly; let’s try to create the table again, but this time execute the following statement first:

alter session enable resumable timeout 60 name 'Help I''m stuck';

The initial response to this will be Oracle error “ORA-01031: insufficient privileges” because the session doesn’t have the resumable privilege, but after granting resumable to the user (or a relevant role) we try again and find we will be allowed a little extra time before the CTAS times out. Our session now overrides the system timeout and will wait 60 seconds (plus a bit) before failing. The “timeout” clause is optional and if we omit it the session will use the system value; similarly the “name” clause is optional though there’s no default for it, it’s just a message that will get into various views and reports.

There are several things you might check in this 60 second grace period. The session wait history will confirm that your session has been timing out every two seconds (as will the active session history if you’re licensed to use it):


select seq#, event, wait_time from v$session_wait_history where sid = 373

      SEQ# EVENT							     WAIT_TIME
---------- ---------------------------------------------------------------- ----------
	 1 statement suspended, wait error to be cleared			   204
	 2 statement suspended, wait error to be cleared			   201
	 3 statement suspended, wait error to be cleared			   201
	 4 statement suspended, wait error to be cleared			   201
	 5 statement suspended, wait error to be cleared			   200
	 6 statement suspended, wait error to be cleared			   200
	 7 statement suspended, wait error to be cleared			   202
	 8 statement suspended, wait error to be cleared			   200
	 9 statement suspended, wait error to be cleared			   200
	10 statement suspended, wait error to be cleared			   200

Then there’s a special dynamic performance view, v$resumable which I’ve reported below using a print_table() procedure that Tom Kyte wrote many, many years ago to report rows in a column format:

SQL> set serveroutput on
SQL> execute print_table('select * from v$resumable where sid = 373')

ADDR                          : 0000000074515B10
SID                           : 373
ENABLED                       : YES
STATUS                        : SUSPENDED
TIMEOUT                       : 60
SUSPEND_TIME                  : 10/04/19 14:26:20
RESUME_TIME                   :
NAME                          : Help I'm stuck
ERROR_NUMBER                  : 1652
ERROR_PARAMETER1              : 128
ERROR_PARAMETER2              : TINY
ERROR_PARAMETER3              :
ERROR_PARAMETER4              :
ERROR_PARAMETER5              :
ERROR_MSG                     : ORA-01652: unable to extend temp segment by 128 in tablespace TINY
CON_ID                        : 0
-----------------
1 rows selected

Notice how the name column reports the name I supplied when I enabled the resumable session. The view also tells us when the critical statement was suspended and how long it is prepared to wait (in total) – leaving us to work out from the current time how much time we have left to work around the problem.
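
As a sketch (suspend_time is a varchar2 and the format mask below is inferred from the output above), you could calculate the remaining time with something like:

select
        name,
        timeout - round((sysdate - to_date(suspend_time,'mm/dd/yy hh24:mi:ss')) * 86400)  seconds_left
from
        v$resumable
where
        status = 'SUSPENDED'
;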

There’s also a dba_resumable variant of the view which is slightly more informative (though the sample below is not consistent with the one above because I ran the CTAS several times, editing the blog as I did so):

SQL> execute print_table('select * from dba_resumable where session_id = 373')

USER_ID                       : 138
SESSION_ID                    : 373
INSTANCE_ID                   : 1
COORD_INSTANCE_ID             :
COORD_SESSION_ID              :
STATUS                        : SUSPENDED
TIMEOUT                       : 60
START_TIME                    : 10/04/19 14:21:14
SUSPEND_TIME                  : 10/04/19 14:21:16
RESUME_TIME                   :
NAME                          : Help I'm stuck
SQL_TEXT                      : create table t1 (n1, v1 ) pctfree 90 pctused 10 tablespace tiny as  select rownum, 
                                cast(lpad('x',800) as varchar2(1000)) from all_objects where rownum <= 20000
ERROR_NUMBER                  : 1652
ERROR_PARAMETER1              : 128
ERROR_PARAMETER2              : TINY
ERROR_PARAMETER3              :
ERROR_PARAMETER4              :
ERROR_PARAMETER5              :
ERROR_MSG                     : ORA-01652: unable to extend temp segment by 128 in tablespace TINY
-----------------
1 rows selected

This view includes the text of the statement that has been suspended and shows us when it started running (so that we can decide whether we really want to rescue it, or might be happy to kill it to allow some other suspended session to resume).

If you look at the alert log in this case you’ll see that the name has been reported there instead of the user, session and instance – which means you might want to think carefully about how you use the name option:


2019-10-04T14:21:16.151839+01:00
ORCL(3):statement in resumable session 'Help I'm stuck' was suspended due to
ORCL(3):    ORA-01652: unable to extend temp segment by 128 in tablespace TINY
2019-10-04T14:22:18.655808+01:00
ORCL(3):statement in resumable session 'Help I'm stuck' was timed out

Once your resumable task has completed (or timed out and failed) you can stop the session from being resumable with the command:

alter session disable resumable;

And it’s important that every time you enable resumability you disable it as soon as the capability is no longer needed. Also, be careful about when you enable it; don’t be tempted to make every session resumable. Use it only for really important cases. Once a session is resumable virtually everything that goes on in that session is deemed to be resumable, and this has side effects.

The first side effect that may spring to mind is the impact of the view v$resumable – it’s a memory structure in the SGA so that everyone can see it and all the resumable sessions can populate and update it. That means there’s got to be some latch (or mutex) protection going on – and if you look at v$latch you’ll discover that there’s just a single (child) latch doing the job, so resumability can introduce a point of contention. Here’s a simple script (using my “start_XXX” strategy) to “select 1 from dual;” one thousand times, with calls to check the latch activity:

set termout off
set serveroutput off
execute snap_latch.start_snap

@start_1000

set termout on
set serveroutput on
execute snap_latch.end_snap(750)

And here are the results of running the script – reporting only the latches with more than 750 gets in the interval – first without and then with a resumable session:

---------------------------------
Latch waits:-   04-Oct 15:04:31
Lower limit:-  750
---------------------------------
Latch                              Gets      Misses     Sp_Get     Sleeps     Im_Gets   Im_Miss Holding Woken Time ms
-----                              ----      ------     ------     ------     -------   ------- ------- ----- -------
session idle bit                  6,011           0          0          0           0         0       0     0      .0
enqueue hash chains               2,453           0          0          0           0         0       0     0      .0
enqueue freelist latch                1           0          0          0       2,420         0       0     0      .0
JS queue state obj latch          1,176           0          0          0           0         0       0     0      .0

SQL> alter session enable resumable;

SQL> @test
---------------------------------
Latch waits:-   04-Oct 15:04:46
Lower limit:-  750
---------------------------------
Latch                              Gets      Misses     Sp_Get     Sleeps     Im_Gets   Im_Miss Holding Woken Time ms
-----                              ----      ------     ------     ------     -------   ------- ------- ----- -------
session idle bit                  6,011           0          0          0           0         0       0     0      .0
enqueue hash chains               2,623           0          0          0           0         0       0     0      .0
enqueue freelist latch                1           0          0          0       2,588         0       0     0      .0
resumable state object            3,005           0          0          0           0         0       0     0      .0
JS queue state obj latch          1,260           0          0          0           0         0       0     0      .0

PL/SQL procedure successfully completed.

SQL> alter session disable resumable;

That’s 1,000 selects from dual – 3,000 latch gets on a single child latch. It looks like every call to the database results in a latch get and an update to the memory structure. (Note: You wouldn’t see the same effect if you ran a loop inside an anonymous PL/SQL block since the block would be the single database call).

For other side effects with resumability think about what else is going on around your session. If you allow a session to suspend for (say) 3600 seconds and it manages to resume just in time to avoid a timeout it now has 3,600 seconds of database changes to unwind if it’s trying to produce a read-consistent result; so not only do you have to allow for increasing the size of the undo tablespace and increasing the undo retention time, you have to allow for the fact that when the process resumes it may run much more slowly than usual because it spends more of its time trying to see the data as it was before it suspended, which may require far more single block reads of the undo tablespace – and the session may then crash anyway with an Oracle error ORA-01555 (which is so well-known that I won’t quote the text).

In the same vein – if a process acquires a huge amount of space in the temporary tablespace (in particular) and fails instantly because it can’t get any more space it normally crashes and releases the space. If you allow that process to suspend for an hour it’s going to hold onto that space – which means other processes that used to run safely may now crash because they find there’s no free space left for them in the temporary tablespace.

Be very cautious when you introduce resumable sessions – you need to understand the global impact, not just the potential benefit to your session.

Getting Alerts

Apart from the (passive) views telling you that a session has suspended it’s also possible to get some form of (active) alert when the event happens. There’s an “after suspend” event that you can use to create a database trigger to take some defensive action, e.g.:

create or replace trigger call_for_help
after suspend
on test_user.schema
begin
        if sysdate between trunc(sysdate) and trunc(sysdate) + 3/24 then
                null;
                -- use utl_mail, utl_smtp et. al. to page the DBA
        end if;
end;
/

This trigger is restricted to the test_user schema, and (code not included) sends a message to the DBA’s pager only between the hours of midnight and 3:00 a.m. Apart from the usual functions in dbms_standard that return error codes, names of objects and so on, you might want to take a look at the dbms_resumable package for the “helper” functions and procedures it supplies.
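
As a sketch of what such a trigger might do with those helpers (the calls to space_error_info() and set_timeout() are in the documented dbms_resumable package, but treat the details here as something to verify rather than production code) – find out what ran out of space and shorten the timeout for anything that isn’t a temporary tablespace problem:

create or replace trigger call_for_help
after suspend
on test_user.schema
declare
        v_error_type     varchar2(64);
        v_object_type    varchar2(64);
        v_object_owner   varchar2(128);
        v_tablespace     varchar2(128);
        v_object_name    varchar2(128);
        v_subobject_name varchar2(128);
begin
        if dbms_resumable.space_error_info(
                v_error_type, v_object_type, v_object_owner,
                v_tablespace, v_object_name, v_subobject_name
        ) then
                if v_tablespace = 'TEMP' then           -- assuming the temp tablespace is called TEMP
                        null;
                        -- use utl_mail, utl_smtp et al. to page the DBA
                else
                        dbms_resumable.set_timeout(600);  -- allow only another 10 minutes
                end if;
        end if;
end;
/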

For further information on resumable sessions here’s a link to the 12.2 manual to get you started.

opt_estimate catalogue

Fri, 2019-10-04 04:10

This is just a list of the notes I’ve written about the opt_estimate() hint.

  • opt_estimate – using the hint to affect index calculations: index_scan and index_filter
  • opt_estimate 2 – applying the hint to nested loop joins, options: nlj_index_scan and nlj_index_filter
  • opt_estimate 3 – a couple of little-known options for the hint, “group_by” and “having”.
  • opt_estimate 4 – applying the hint at the query block level: particularly useful for CTEs (“with subquery”) and non-mergeable views.
  • opt_estimate 5 – a story of failure: trying to use opt_estimate to push predicates into a union all view.

I have a couple more drafts on the topic awaiting completion, but if you know of any other articles that would be a good addition to the list feel free to reference them in the comments.

 

Trace Files

Thu, 2019-10-03 07:38

A recent blog note by Martin Berger about reading trace files in 12.2 popped up in my twitter timeline yesterday and reminded me of a script I wrote a while ago to create a simple view I could query to read the tracefile generated by the current session while the session was still connected. You either have to create the view and a public synonym through the SYS schema, or you have to use the SYS schema to grant select privileges on several dynamic performance views to the user so that the user can create the view in their own schema. For my scratch database I tend to create the view in the SYS schema.

Script to be run by SYS:

rem
rem     Script: read_trace_122.sql
rem     Author: Jonathan Lewis
rem     Dated:  Sept 2018
rem
rem     Last tested
rem             12.2.0.1

create or replace view my_trace_file as
select 
        *
from 
        v$diag_trace_file_contents
where
        (adr_home, trace_filename) = (
                select
                --      substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                        substr(
                                substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                                1,
                                instr(
                                        substr(tracefile, 1, instr(tracefile,'/',-1)),
                                        'trace'
                                ) - 2
                        ),
                        substr(tracefile, instr(tracefile,'/',-1)+1) trace_filename
                from 
                        v$process
                where   addr = (
                                select  paddr
                                from    v$session
                                where   sid = (
                                        sys_context('userenv','sid')
                                        -- select sid from v$mystat where rownum = 1
                                        -- select dbms_support.mysid from dual
                                )
                        )
        )
;


create public synonym my_trace_file for sys.my_trace_file;
grant select on my_trace_file to {some role};

Alternatively, here are the privileges you could grant to a user from SYS so that they could create their own view:


grant select on v_$process to some_user;
grant select on v_$session to some_user;
grant select on v_$diag_trace_file_contents to some_user;
and optionally one of:
        grant select on v_$mystat to some_user;
        grant execute on dbms_support to some_user;
                but dbms_support is no longer installed by default.

The references to package dbms_support and view v$mystat are historic ones I have lurking in various scripts from the days when the session id (SID) wasn’t available in any simpler way.

Once the view exists and is available, you can enable some sort of tracing from your session then query the view to read back the trace file. For example, here’s a simple “self-reporting” (it’s going to report the trace file that it causes) script that I’ve run from 12.2.0.1 as a demo:


alter system flush shared_pool;
alter session set sql_trace = true;

set linesize 180
set trimspool on
set pagesize 60

column line_number      format  999,999
column piece            format  a150    
column plan             noprint
column cursor#          noprint

break on plan skip 1 on cursor# skip 1

select
        line_number,
        line_number - row_number() over (order by line_number) plan,
        substr(payload,1,instr(payload,' id=')) cursor#,
        substr(payload, 1,150) piece
from
        my_trace_file
where
        file_name = 'xpl.c'
order by
        line_number
/

alter session set sql_trace = false;

The script flushes the shared pool to make sure that it’s going to trigger some recursive SQL, then enables a simple SQL trace. The query then picks out all the lines in the trace file generated by code in the Oracle source file xpl.c (“execution plans” seems like a likely guess at the name), which in practice means all the STAT lines in the trace (i.e. the ones showing the execution plans).

I’ve used the “tabibitosan” method to identify all the lines that belong to a single execution plan by assuming that they will be consecutive lines in the output starting from a line which includes the text ” id=1 “ (the surrounding spaces are important), but I’ve also extracted the bit of the line which includes the cursor number (STAT #nnnnnnnnnnnnnnn) because two plans may be dumped one after the other if multiple cursors close at the same time. There is still a little flaw in the script because sometimes Oracle will run a sys-recursive statement in the middle of dumping a plan to turn an object_id into an object_name, and this will cause a break in the output.

The result of the query is to extract all the execution plans in the trace file and print them in the order they appear – here’s a sample of the output:


LINE_NUMBER PIECE
----------- ------------------------------------------------------------------------------------------------------------------------------------------------------
         38 STAT #140392790549064 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=53 us cost=4 size=113 card
         39 STAT #140392790549064 id=2 cnt=0 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=0 card=1)'


         53 STAT #140392790535800 id=1 cnt=1 pid=0 pos=1 obj=0 op='MERGE JOIN OUTER (cr=5 pr=0 pw=0 str=1 time=95 us cost=2 size=178 card=1)'
         54 STAT #140392790535800 id=2 cnt=1 pid=1 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=0 pw=0 str=1 time=57 us cost=2 size=138 card=1)'
         55 STAT #140392790535800 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'
         56 STAT #140392790535800 id=4 cnt=0 pid=1 pos=2 obj=0 op='BUFFER SORT (cr=2 pr=0 pw=0 str=1 time=29 us cost=0 size=40 card=1)'
         57 STAT #140392790535800 id=5 cnt=0 pid=4 pos=1 obj=73 op='TABLE ACCESS BY INDEX ROWID TAB_STATS$ (cr=2 pr=0 pw=0 str=1 time=10 us cost=0 size=40 card=1)
         58 STAT #140392790535800 id=6 cnt=0 pid=5 pos=1 obj=74 op='INDEX UNIQUE SCAN I_TAB_STATS$_OBJ# (cr=2 pr=0 pw=0 str=1 time=8 us cost=0 size=0 card=1)'


         84 STAT #140392791412824 id=1 cnt=1 pid=0 pos=1 obj=20 op='TABLE ACCESS BY INDEX ROWID BATCHED ICOL$ (cr=4 pr=0 pw=0 str=1 time=25 us cost=2 size=54 card
         85 STAT #140392791412824 id=2 cnt=1 pid=1 pos=1 obj=42 op='INDEX RANGE SCAN I_ICOL1 (cr=3 pr=0 pw=0 str=1 time=23 us cost=1 size=0 card=2)'


         94 STAT #140392790504512 id=1 cnt=2 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=7 pr=0 pw=0 str=1 time=432 us cost=6 size=374 card=2)'
         95 STAT #140392790504512 id=2 cnt=2 pid=1 pos=1 obj=0 op='HASH JOIN OUTER (cr=7 pr=0 pw=0 str=1 time=375 us cost=5 size=374 card=2)'
         96 STAT #140392790504512 id=3 cnt=2 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 str=1 time=115 us cost=2 size=288 card=2)'
         97 STAT #140392790504512 id=4 cnt=2 pid=3 pos=1 obj=19 op='TABLE ACCESS CLUSTER IND$ (cr=3 pr=0 pw=0 str=1 time=100 us cost=2 size=184 card=2)'
         98 STAT #140392790504512 id=5 cnt=1 pid=4 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=85 us cost=1 size=0 card=1)'
         99 STAT #140392790504512 id=6 cnt=0 pid=3 pos=2 obj=75 op='TABLE ACCESS BY INDEX ROWID IND_STATS$ (cr=1 pr=0 pw=0 str=2 time=8 us cost=0 size=52 card=1)'
        100 STAT #140392790504512 id=7 cnt=0 pid=6 pos=1 obj=76 op='INDEX UNIQUE SCAN I_IND_STATS$_OBJ# (cr=1 pr=0 pw=0 str=2 time=7 us cost=0 size=0 card=1)'
        101 STAT #140392790504512 id=8 cnt=0 pid=2 pos=2 obj=0 op='VIEW  (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=43 card=1)'
        102 STAT #140392790504512 id=9 cnt=0 pid=8 pos=1 obj=0 op='SORT GROUP BY (cr=3 pr=0 pw=0 str=1 time=44 us cost=3 size=15 card=1)'
        103 STAT #140392790504512 id=10 cnt=0 pid=9 pos=1 obj=31 op='TABLE ACCESS CLUSTER CDEF$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=15 card=1)'
        104 STAT #140392790504512 id=11 cnt=1 pid=10 pos=1 obj=30 op='INDEX UNIQUE SCAN I_COBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'


        116 STAT #140392791480168 id=1 cnt=4 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 str=1 time=62 us cost=3 size=858 card=13)'
        117 STAT #140392791480168 id=2 cnt=4 pid=1 pos=1 obj=21 op='TABLE ACCESS CLUSTER COL$ (cr=3 pr=0 pw=0 str=1 time=24 us cost=2 size=858 card=13)'
        118 STAT #140392791480168 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'


        126 STAT #140392789565328 id=1 cnt=1 pid=0 pos=1 obj=14 op='TABLE ACCESS CLUSTER SEG$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=68 card=1)'
        127 STAT #140392789565328 id=2 cnt=1 pid=1 pos=1 obj=9 op='INDEX UNIQUE SCAN I_FILE#_BLOCK# (cr=2 pr=0 pw=0 str=1 time=12 us cost=1 size=0 card=1)'


        135 STAT #140392789722208 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=22 us cost=3 size=51 card=
        136 STAT #140392789722208 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=0 pw=0 str=1 time=16 us cost=2 size=0 card=1)'


        153 STAT #140392792055264 id=1 cnt=1 pid=0 pos=1 obj=68 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=0 pw=0 str=1 time=25 us)'
        154 STAT #140392792055264 id=2 cnt=1 pid=1 pos=1 obj=70 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=0 pw=0 str=1 time=19 us)'

If you want to investigate further, the “interesting” columns in the underlying view are probably: section_name, component_name, operation_name, file_name, and function_name. The possible names of functions, files, etc. vary with the trace event you’ve enabled.
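
As a starting point for that investigation, a quick summary (just a sketch) against the view will tell you which sections, components and functions account for most of the lines in the current trace file:

select
        section_name, component_name, function_name, count(*)
from
        my_trace_file
group by
        section_name, component_name, function_name
order by
        count(*) desc
/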

 

_cursor_obsolete_threshold

Wed, 2019-10-02 08:39

At the recent Trivadis Performance Days in Zurich, Chris Antognini answered a question that had been bugging me for some time. Why would Oracle want to set the default value of _cursor_obsolete_threshold to a value like 8192 in 12.2 ?

In 11.2.0.3 the parameter was introduced with the default value 100; then in 11.2.0.4, continuing into 12.1, the default value increased to 1,024 – what possible reason could anyone have for thinking that 8192 was a good idea ?

The answer is PDBs – specifically the much larger number of PDBs a single CDB can (theoretically) support in 12.2.

In fact a few comments, and the following specific explanation, are available on MoS in Doc ID 2431353.1 “High Version Counts For SQL Statements (>1024) Post Upgrade To 12.2 and Above Causing Database Slow Performance”:

The default value of _cursor_obsolete_threshold is increased heavily (8192 from 1024) from 12.2 onwards in order to support 4096 PDBs which was only 252 PDBs till 12.1. This parameter value is the maximum limit for obsoleting the parent cursors in an multitenant environment and cannot be increased beyond 8192.

Having said, this is NOT applicable for non-CDB environment and hence for those databases, this parameter should be set to 12.1 default value manually i.e. 1024. The default value of 1024 holds good for non-CDB environment and the same parameter can be adjusted case-to-case basis should there be a problem.

It’s all about PDBs – more precisely, it’s all about CDBs running a huge number of PDBs, which is not necessarily the way that many companies are likely to use PDBs. So if you’re a fairly typical company running a handful of PDBs in a single CDB then it’s probably a good idea to set the parameter down to the 12.1 value of 1024 (and for badly behaved applications I’d consider going even lower) – and this MOS note actually makes that an official recommendation.
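
If you want to check the current setting and adjust it, something like the following sketch will do it. Remember this is a hidden parameter – the x$ query has to be run from SYS, the change (as far as I recall) isn’t dynamic so it needs an instance restart, and you should really agree the change with Oracle Support first:

select
        pi.ksppinm, cv.ksppstvl
from
        x$ksppi pi, x$ksppcv cv
where
        cv.indx    = pi.indx
and     pi.ksppinm = '_cursor_obsolete_threshold'
/

alter system set "_cursor_obsolete_threshold" = 1024 scope = spfile;
-- then restart the instance for the new value to take effect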

Impact analysis

What’s the worst that could happen if you actually have many PDBs all executing the same application and that application has a few very popular and frequently executed statements? Chris Antognini described a model he’d constructed and some tests he’d done to show the effects. The following code is a variation on his work. It addresses the following question:

If you have an application that repeatedly issues parse calls (explicitly or implicitly) but doesn’t take advantage of the session cursor cache, it has to search the library cache by hash_value / sql_id for the parent cursor, then walk the chain of child cursors looking for the right child. What’s the difference in the work done if this “soft parse” has to walk the list all the way to child number 8,191 instead of finding the right cursor at child number 0?

Here’s the complete code for the test:


create table t1 as
select 1 id from dual
/

alter table t1 add constraint t1_pk primary key (id)
/

spool cursor_obsolete.lst

alter system flush shared_pool;
alter system flush shared_pool;

set serveroutput off
select /*+ index(t1) */ id from t1 where id > 0;
select * from table(dbms_xplan.display_cursor);

execute snap_my_stats.start_snap
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+8192 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

column sql_text format a60
select sql_id, child_number, loaded_versions, executions, sql_text from v$sql where sql_text like 'SELECT%T1%' order by child_number;

prompt  ===============
prompt  Low child reuse
prompt  ===============

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

prompt  ================
prompt  High child reuse
prompt  ================

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 7168+1..7168+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

spool off

I’ve created a table with just one row and given it a primary key. My testing query is going to be very short and simple: a query hinted to return that one row through a primary key index range scan.

I’ve flushed the shared pool (twice) to minimise fringe contention from pre-existing information, then executed the statement to populate the dictionary cache and some library cache information and to check the execution plan.

The call to the package snap_my_stats is my standard method for reporting changes in v$mystat across the test. I’ve called the start_snap procedure twice in a row to make sure that its first load doesn’t add some noise to the statistics that we’re trying to capture.

The test runs in three parts.

  • First I loop 8192 times executing the same statement, but with a different value for the optimizer_index_cost_adj for each execution – this gives me the limit of 8192 child cursors, each reporting “Optimizer Mismatch” as the reason for not sharing. I’ve run a query against v$sql after this to check that I have 8192 child cursors – you’ll need to make sure your shared pool is a few hundred megabytes if you want to be sure of keeping them all in memory.
  • The second part of the test simply repeats the loop, but only for the first 1,024 child cursors. At this point the child cursors exist, so the optimizer should be doing “soft” parses rather than hard parses.
  • The final part of the test repeats the loop again, but only for the last 1,024 child cursors. Again they should exist and be usable, so the optimizer should again be doing “soft” parses rather than hard parses.

What I’m looking for is the extra work it takes for Oracle to find the right child cursor when there’s a very long chain of child cursors. From my memory of dumping the library cache in older versions of Oracle, the parent will point to a “segmented array” of pointers to child cursors, and each segment of the array will consist of 16 pointers, plus a pointer to the next segment. So if you have to find child cursor 8191 you will have to follow 512 segment pointers, and check 16 pointers per segment (totalling 8708 pointers), before you find the child you want – and you’re probably holding a mutex (or latch) while doing so.

One peripheral question to ask, of course, is whether Oracle keeps appending to the segmented array, or whether it uses a “pushdown” approach when allocating a new segment so that newer child cursors are near the start of the array (i.e. will searching for child cursor 0 be the cheapest search or the most expensive one?).

And the results, limited to just the second and third parts, with just a couple of small edits are as follows:


host sdiff -w 120 -s temp1.txt temp2.txt >temp.txt

===============                                            |    ================
Low child reuse                                            |    High child reuse
===============                                            |    ================

Interval:-  0 seconds                                      |    Interval:-  6 seconds

opened cursors cumulative                      2,084       |    opened cursors cumulative                      2,054
recursive calls                                6,263       |    recursive calls                                6,151
recursive cpu usage                               33       |    recursive cpu usage                              570
session logical reads                          1,069       |    session logical reads                          1,027
CPU used when call started                        33       |    CPU used when call started                       579
CPU used by this session                          37       |    CPU used by this session                         579
DB time                                           34       |    DB time                                          580
non-idle wait count                               16       |    non-idle wait count                                5
process last non-idle time                         1       |    process last non-idle time                         6
session pga memory                           524,288       |    session pga memory                            65,536
enqueue requests                                  10       |    enqueue requests                                   3
enqueue releases                                  10       |    enqueue releases                                   3
consistent gets                                1,069       |    consistent gets                                1,027
consistent gets from cache                     1,069       |    consistent gets from cache                     1,027
consistent gets pin                            1,039       |    consistent gets pin                            1,024
consistent gets pin (fastpath)                 1,039       |    consistent gets pin (fastpath)                 1,024
consistent gets examination                       30       |    consistent gets examination                        3
consistent gets examination (fastpath)            30       |    consistent gets examination (fastpath)             3
logical read bytes from cache              8,757,248       |    logical read bytes from cache              8,413,184
calls to kcmgcs                                    5       |    calls to kcmgcs                                    3
calls to get snapshot scn: kcmgss              1,056       |    calls to get snapshot scn: kcmgss              1,026
table fetch by rowid                              13       |    table fetch by rowid                               1
rows fetched via callback                          6       |    rows fetched via callback                          1
index fetch by key                                 9       |    index fetch by key                                 1
index scans kdiixs1                            1,032       |    index scans kdiixs1                            1,024
session cursor cache hits                         14       |    session cursor cache hits                          0
cursor authentications                         1,030       |    cursor authentications                         1,025
buffer is not pinned count                     1,066       |    buffer is not pinned count                     1,026
parse time cpu                                    23       |    parse time cpu                                   558
parse time elapsed                                29       |    parse time elapsed                               556
parse count (total)                            2,076       |    parse count (total)                            2,052
parse count (hard)                                11       |    parse count (hard)                                 3
execute count                                  1,050       |    execute count                                  1,028
bytes received via SQL*Net from client         1,484       |    bytes received via SQL*Net from client         1,486

Two important points to note:

  • the CPU utilisation goes up from 0.33 seconds to 5.7 seconds.
  • the number of hard parses is effectively zero – this is all about the work needed to find the right child cursor.

You might query the 2,048-ish figures for parse count (total) – don’t forget that we do an “execute immediate” to change the optimizer_index_cost_adj on each pass through the loop. That’s probably why the parse count is roughly double the number of loop iterations, although the “alter session” doesn’t then show up in the “execute count”.

The third call to a statement is often an important one – it’s often the first one that doesn’t need “cursor authentication” – so I ran a similar test executing the last two loops a second time; there was no significant change in the CPU or parse activity between the 2nd and 3rd executions of each cursor. For completeness I also ran a test where the loop for the last 1,024 child cursors ran before the loop for the first 1,024. Again this made no significant difference to the results – the low-numbered child cursors take less CPU to find than the high-numbered child cursors.

Bottom line

The longer the chain of child cursors the more time (elapsed and CPU) you spend searching for the correct child; and when a parent is allowed 8,192 child cursors the extra time can become significant. I would claim that the roughly 5 seconds difference in CPU time appearing in this test corresponds purely to an extra 5 milliseconds per execution spent walking an extra 7,000 or so steps down the chain.

If you have a well-behaved application that uses the session cursor cache effectively, or uses “held cursors”, then you may not be worried by very long chains of child cursors. But I have seen many applications where cursor caching is not used and every statement execution from the client turns into a parse call (usually implicit) followed by a hunt through the library cache and walk along the child chain. These applications will not scale well if they are cloned to multiple PDBs sharing the same CDB.
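
If you’re not sure whether your application is making use of the session cursor cache, running a query like the following sketch before and after a representative piece of workload is a quick check – the statistic names are the ones that appeared in the figures above:

select
        sn.name, ms.value
from
        v$mystat ms, v$statname sn
where
        sn.statistic# = ms.statistic#
and     sn.name in (
                'session cursor cache hits',
                'parse count (total)',
                'parse count (hard)'
        )
order by
        sn.name
/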

Footnote 1

The odd thing about this “cursor obsolete” feature is that I have a distinct memory that when PDBs were introduced at an ACE Director’s meeting a few years ago the first thought that crossed my mind was about the potential for someone running multiple copies of the same application as separate PDBs to see a lot of library cache latch contention or cursor mutex contention, because any popular statement would now be hitting the same parent cursor from multiple PDBs. I think the casual (i.e. neither formal, nor official) response I got when I raised the point was that the calculation of the sql_id in future releases would take the con_id into consideration. It seems that that idea fell by the wayside.

Footnote 2

If you do see a large number of child cursors for a single parent then you will probably end up looking at v$sql_shared_cursor for the sql_id to see if that gives you some good ideas about why a particular statement has generated so many child cursors. For a list of explanations of the different reasons captured in this view, MOS Doc ID 296377.1 “Troubleshooting: High Version Count Issues” is a useful reference.
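
As a sketch of the sort of query I mean – pick out the Y/N columns that are relevant to your version (the exact set varies), plus the reason column if it’s available:

select
        child_number, optimizer_mismatch, bind_mismatch, reason
from
        v$sql_shared_cursor
where
        sql_id = '&m_sql_id'
order by
        child_number
/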
