Which has the higher priority in your organization: Deploying a new database or securing the ones you already have?
Looking for a website, and resources, dedicated solely to securing Oracle databases? Check out DBSecWorx.
Be sure to view the full listing of monographs in Morgan's Library
Histogram Types
Data skew in a column can make it difficult for the optimizer to accurately estimate the cardinality of an operation.
Without a histogram it will assume an even distribution amongst the distinct values for the column.
Take the case of a Yes/No flag for active records in a table. There may be 1 million rows in the table with only 100 being marked as active.
The optimizer would assume half are marked as "Yes" and half as "No", which would be a really bad assumption in this case.
Histograms describe the distribution of data in a column, which helps identify data skew and allows the optimizer to make a better decision.
Oracle uses four types of histograms to assist the optimizer in better understanding the nature of the data. All histograms data is stored in the HISTOGRAM column of the
[CDB_, DBA_, ALL_, and USER_] TAB_COL_STATISTICS views.
Creation
Histograms are most commonly created by the DBMS_STATS built-in package when it runs but examining predicates logged in the sys.col_usage$ table.
When stats collection next runs this table is checked to see if histogram creation will have a positive impact on future access.
Non default parameters supplied to the METHOD_OPT parameter can be used to manually set the column(s) for histogram creation and the desired bucket size.
Histogram Note
As far as histograms go, the best strategy is to have none by default, and only create specific histograms when you are sure you need them.
~ Jonathan Lewis, 31-Jan-2010 in the OTN Database-General forum.
In a frequency histogram, each value of the column corresponds to a single bucket of the histogram. Each bucket contains the number of occurrences of that single value.
Frequency histograms are automatically created instead of height-balanced histograms when the number of distinct values is less than or equal to the number of histogram buckets specified
(the number of buckets defaults to 254 if not otherwise specified).
Frequency histograms can be viewed using the *TAB_HISTOGRAMS views.
ALTER TABLE uwclass.freqbal
ADD CONSTRAINT pk_freqbal
PRIMARY KEY (rec_id);
CREATE INDEX ix_freqbal_rec_type
ON uwclass.freqbal(rec_type);
INSERT INTO uwclass.freqbal
SELECT level AS recid,
CASE WHEN MOD(level, 2) = 0 THEN 0
ELSE TRUNC(dbms_random.value(1,10))
END AS rec_type, dbms_random.string('x', 30)
FROM dual
CONNECT BY level <= 10000;
COMMIT;
SELECT rec_type, COUNT(*)
FROM uwclass.freqbal
GROUP BY rec_type
ORDER BY 1;
SELECT column_id, column_name, histogram
FROM dba_tab_cols
WHERE owner = 'UWCLASS'
AND table_name = 'FREQBAL'
ORDER BY 1;
COLUMN_ID COLUMN_NAME HISTOGRAM
---------- ------------ ---------------
1 ID NONE
2 RECORD_TYPE FREQUENCY 3 DESCRIPTION NONE
View Histogram Data
CREATE OR REPLACE FUNCTION raw2num(rawval IN RAW) RETURN NUMBER AUTHID DEFINER IS
n NUMBER;
BEGIN
dbms_stats.convert_raw_value(rawval, n);
RETURN n;
END raw2num;
/
Function created.
SELECT column_name, num_distinct, raw2num(low_value) LV, raw2num(high_value) HV,
density, num_nulls, avg_col_len, histogram
FROM dba_tab_cols
WHERE owner = 'UWCLASS'
AND table_name = 'FREQBAL';
SELECT endpoint_value, endpoint_number, (endpoint_number - LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_value)) AS frequency
FROM dba_tab_histograms
WHERE owner = 'UWCLASS'
AND table_name = 'FREQBAL'
AND column_name = 'REC_TYPE'
ORDER BY 1;
In Oracle database 12c and above, height-balanced histograms are created only if dynamic sampling is used during statistics collection.
When explicit sampling does not take place the database performs a full table scan and creates a top frequency or hybrid histogram (both covered below).
ALTER TABLE uwclass.htbal
ADD CONSTRAINT pk_htbal
PRIMARY KEY (rec_id);
CREATE INDEX ix_htbal_rec_type
ON uwclass.htbal(rec_type);
INSERT INTO uwclass.htbal
SELECT level AS recid,
CASE WHEN MOD(level, 2) = 0 THEN 0
ELSE level
END AS rec_type, dbms_random.string('x', 30)
FROM dual
CONNECT BY level <= 10000;
COMMIT;
SELECT COUNT(*)
FROM uwclass.htbal;
COUNT(*)
---------
10000
SELECT rec_type, COUNT(*)
FROM uwclass.htbal
GROUP BY rec_type
HAVING COUNT(*) > 1
ORDER BY 1;
REC_TYPE COUNT(*)
--------- --------
0 5000
EXPLAIN PLAN FOR
SELECT rec_desc
FROM htbal
WHERE rec_type = 0;
SELECT * FROM TABLE(dbms_xplan.display);
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4813 | 141K | 14 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| HTBAL | 4813 | 141K | 14 (0)|
|* 2 | INDEX RANGE SCAN | IX_HTBAL_REC_TYPE | 4813 | | 14 (0)|
--------------------------------------------------------------------------------------------
Manual Histogram Creation
col column_name format a20
col range format a20
SELECT column_id, column_name, histogram
FROM dba_tab_cols
WHERE owner = 'UWCLASS'
AND table_name = 'HTBAL'
ORDER BY 1;
CREATE OR REPLACE FUNCTION raw2num(rawval IN RAW) RETURN NUMBER AUTHID DEFINER IS
n NUMBER;
BEGIN
dbms_stats.convert_raw_value(rawval, n);
RETURN n;
END raw2num;
/
SELECT column_name, num_distinct, raw2num(low_value) LV, raw2num(high_value) HV,
density, num_nulls, avg_col_len, histogram
FROM dba_tab_cols
WHERE owner = 'UWCLASS'
AND table_name = 'HTBAL';
SELECT endpoint_value, endpoint_number, (endpoint_number - LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_value)) AS frequency
FROM dba_tab_histograms
WHERE owner = 'UWCLASS'
AND table_name = 'HTBAL'
AND column_name = 'REC_TYPE'
AND rownum < 11
ORDER BY 1;
Top-frequency histograms are a frequency histogram variant. Frequency histograms target common values, ignoring the less common values as statistically insignificant.
Top-frequency histograms have maximum value when the number of distinct most common values is less than or equal to the number of histogram buckets, while the number of less common values are significantly less common in comparison to the most common values.
ALTER TABLE topfreq
ADD CONSTRAINT pktopfreq
PRIMARY KEY (rec_id);
CREATE INDEX ix_topfreq_rec_type
ON topfreq(rec_type);
INSERT INTO topfreq
SELECT level AS recid,
CASE
WHEN level <= 9990 THEN TRUNC(dbms_random.value(1,10))
ELSE level
END AS rec_type, dbms_random.string('x', 30)
FROM dual
CONNECT BY level <= 10000;
COMMIT;
SELECT rec_type, COUNT(*)
FROM uwclass.topfreq
GROUP BY rec_type
ORDER BY 1;
SELECT (LAG(endpoint_value, 1, 0) OVER (ORDER BY endpoint_value)+1) || '-' || endpoint_value AS RANGE,
endpoint_value - (LAG(endpoint_value, 1, 0) OVER (ORDER BY endpoint_value)+1) + 1 AS VALS_IN_RNG,
endpoint_number - LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_value) AS FREQUENCY
FROM dba_tab_histograms
WHERE owner = 'UWCLASS'
AND table_name = 'TOPFREQ'
AND column_name = 'REC_TYPE'
ORDER BY endpoint_value;
CREATE OR REPLACE FUNCTION raw2num(rawval IN RAW) RETURN NUMBER AUTHID DEFINER IS
n NUMBER;
BEGIN
dbms_stats.convert_raw_value(rawval, n);
RETURN n;
END raw2num;
/
SELECT column_name, num_distinct, raw2num(low_value) LV, raw2num(high_value) HV,
density, num_nulls, avg_col_len, histogram
FROM dba_tab_cols
WHERE owner = 'UWCLASS'
AND table_name = 'TOPFREQ';
SELECT endpoint_value, endpoint_number, (endpoint_number - LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_value)) AS frequency
FROM dba_tab_histograms
WHERE owner = 'UWCLASS'
AND table_name = 'TOPFREQ'
AND column_name = 'REC_TYPE'
ORDER BY 1;
EXPLAIN PLAN FOR
SELECT rec_desc
FROM topfreq
WHERE rec_type = 2;
SELECT * FROM TABLE(dbms_xplan.display);
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1075 | 22575 | 25 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TOPFREQ | 1075 | 22575 | 25 (0)|
----------------------------------------------------------------------------------------------
-- compare the above plan with the one from before histogram creation
-- the index scan is gone
EXPLAIN PLAN FOR
SELECT rec_desc
FROM topfreq
WHERE rec_type = 10000;
Hybrid histograms combine the value of both frequency and height-balanced histograms. The database will, if possible, create a hybrid histograms rather than height-balanced histograms due to their increased value.
Unlike height-balanced histograms, a single endpoint value cannot span buckets.
In addition to the highest value in the bucket, the histogram stores the number of occurrences the highest value is represented in the bucket, providing an accurate representation of its popularity,
as well as giving an indication of the popularity of the other endpoints in the bucket.
ALTER TABLE uwclass.hybrid ADD CONSTRAINT pk_hybrid PRIMARY KEY (rec_id);
CREATE INDEX ix_hybrid_rec_type ON uwclass.hybrid(rec_type);
INSERT INTO uwclass.hybrid
SELECT level AS recid,
CASE WHEN MOD(level,2) = 0 THEN TRUNC(dbms_random.value(1,100))
ELSE level
END AS rec_type, dbms_random.string('x', 30)
FROM dual
CONNECT BY level <= 10000;
COMMIT;
SELECT rec_type, COUNT(*)
FROM hybrid
WHERE rec_type IN (2, 9999)
GROUP BY rec_type;
SELECT (LAG(endpoint_value, 1, 0) OVER (ORDER BY endpoint_value)+1) || '-' || endpoint_value AS range,
endpoint_value - (LAG(endpoint_value, 1, 0) OVER (ORDER BY endpoint_value)+1) + 1 AS vals_in_range,
endpoint_number - LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_value) AS frequency,
endpoint_repeat_count
FROM dba_tab_histograms
WHERE owner = 'UWCLASS'
AND table_name = 'HYBRID'
AND column_name = 'REC_TYPE'
ORDER BY endpoint_value;
CREATE OR REPLACE FUNCTION raw2num(rawval IN RAW) RETURN NUMBER AUTHID DEFINER IS
n NUMBER;
BEGIN
dbms_stats.convert_raw_value(rawval, n);
RETURN n;
END raw2num;
/
SELECT column_name, num_distinct, raw2num(low_value) LV, raw2num(high_value) HV,
density, num_nulls, avg_col_len, histogram
FROM dba_tab_cols
WHERE owner = 'UWCLASS'
AND table_name = 'HYBRID';
SELECT endpoint_value, endpoint_number, (endpoint_number - LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_value)) AS frequency
FROM dba_tab_histograms
WHERE owner = 'UWCLASS'
AND table_name = 'HYBRID'
AND column_name = 'REC_TYPE'
ORDER BY 1;
EXPLAIN PLAN FOR
SELECT rec_desc
FROM hybrid
WHERE rec_type = 2;
SELECT * FROM TABLE(dbms_xplan.display);
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 48 | 1680 | 18 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED | HYBRID | 48 | 1680 | 18 (0)|
|* 2 | INDEX RANGE SCAN | IX_HYBRID_REC_TYPE | 48 | | 1 (0)|
----------------------------------------------------------------------------------------------
-- compare the above plan with the one from before histogram creation,
-- the index scan is gone
EXPLAIN PLAN FOR
SELECT rec_desc
FROM hybrid
WHERE rec_type = 9999;
SELECT * FROM TABLE(dbms_xplan.display);
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 2 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| HYBRID | 1 | 35 | 2 (0)|
|* 2 | INDEX RANGE SCAN | IX_TOPFREQ_REC_TYPE | 1 | | 1 (0)|
----------------------------------------------------------------------------------------------
Histogram Demos
Histogram Demo
Create the servers and serv_inst tables: Click Here
SELECT column_id, column_name, histogram
FROM dba_tab_cols
WHERE owner = 'UWCLASS'
AND table_name IN ('SERVERS', 'SERV_INST')
ORDER BY 1;
set autotrace traceonly explain
SELECT s.srvr_id, i.type
FROM servers s, serv_inst i
WHERE s.srvr_id = i.srvr_id
AND type = 'WIN';
SELECT *
FROM (
SELECT *
FROM sys.col_usage$
WHERE obj# = (
SELECT object_id
FROM dba_objects
WHERE object_name = 'T'
AND owner = 'UWCLASS')
)
UNPIVOT (value FOR x IN (
EQUALITY_PREDS, EQUIJOIN_PREDS, NONEQUIJOIN_PREDS, RANGE_PREDS, LIKE_PREDS, NULL_PREDS));
SELECT *
FROM (
SELECT *
FROM sys.col_usage$
WHERE obj# = (
SELECT object_id
FROM dba_objects
WHERE object_name = 'T'
AND owner = 'UWCLASS')
)
UNPIVOT (value FOR x IN (
EQUALITY_PREDS, EQUIJOIN_PREDS, NONEQUIJOIN_PREDS, RANGE_PREDS, LIKE_PREDS, NULL_PREDS));
This demo is from Oracle 10.2.0.5 but serves as an important cautionary example for anyone using histograms.
The demo code is mine and the explanation from Jonathan Lewis.
Both of these explain plans were created using the same session logged into the same database within a matter of a few minutes.
Note in the first explain plan "BITMAP CONVERSION" ... sometimes I get the first result, other times the second.
All indexes are verified to be B*Tree (database version 10.2.0.5 on RedHat Linux V5U6). The SQL statement that generated the plans, below, was not altered in any way.
EXPLAIN PLAN FOR
2 SELECT member_id, SUM(cnt)
3 FROM (
4 SELECT member_id, 1 CNT
5 FROM webstore.orders
6 WHERE site_id IN (23,24,25,29,30,31,32,33)
7 AND status_id = 7
8 AND date_received BETWEEN add_months(SYSDATE, -12) AND SYSDATE
9 AND rownum = 1
10 UNION ALL
11 SELECT member_id, 1
12 FROM webstore.orders
13 WHERE site_id IN (23,24,25,29,30,31,32,33)
14 AND status_id = 7
15 AND date_received BETWEEN add_months(SYSDATE, -24) AND add_months(SYSDATE, -12)
16 AND rownum = 1
17 UNION ALL
18 SELECT member_id, 1
19 FROM webstore.orders
20 WHERE site_id IN (23,24,25,29,30,31,32,33)
21 AND status_id = 7
22 AND date_received BETWEEN add_months(SYSDATE, -36) AND add_months(SYSDATE, -24)
23 AND rownum = 1)
24 WHERE rownum < 4
25 GROUP BY member_id
26 HAVING SUM(cnt) > 1;
/*
The anomaly is, in principle, perfectly reasonable - and in your case could be explained simply by a histogram on the date column.
add_months(sysdate,N) is a known constant at optimisation time so, with a re-optimisation, a small change in sysdate, could produce different cardinalities and therefore different plans.
~Jonathan Lewis 19 November, 2011
*/