Oracle | Data Mining Functions

Oracle Data Mining Functions Version 21c

General Information

Library Note

Morgan's Library Page Header

The best Oracle News for FY2026

Oracle Database 26ai will be available on generic Linux platforms in January and soon on AIX and Windows

The data mining functions operate on models that have been built using the DBMS_DATA_MINING package or the Oracle Data Mining Java API.

For a close to complete list of Oracle built-in functions and demos in the library, both stand-alone and in built-in packages: [ Click Here ].

System Privileges

ALTER ANY MINING MODEL	CREATE ANY MINING MODEL	DROP ANY MINING MODEL
COMMENT ANY MINING MODEL	CREATE MINING MODEL	SELECT ANY MINING MODEL

CLUSTER_DETAILS

Predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. Returns an XML string that describes the predicted cluster or a specified cluster. CLUSTER_DETAILS ([schema.]<model> [,<cluster_id>[,<topN>]] [<ABS | ASC | DESC>] USING <mining_attribute_clause>)

SELECT s.cluster_id, probability prob, 

        cluster_details(em_sh_clus_sample, s.cluster_id, 
      5 using t.*) det

      FROM (

        SELECT v.*, CLUSTER_SET(em_sh_clus_sample, NULL, 0.2
      using *) pset

        FROM mining_data_apply_v v

        WHERE cust_id = 100955) t,

        TABLE(t.pset) s

      ORDER BY 2 DESC;

CLUSTER_DISTANCE

Predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. Returns the raw distance between each row and the centroid of either the predicted cluster or a specified. CLUSTER_DISTANCE([schema.]<model> [,<cluster_id>] USING <mining_attribute_clause>)

SELECT cust_id

      FROM (

      SELECT cust_id,

      rank() OVER

      (order by Ccluster_distance(km_sh_clus_sample
      using *) DESC) rnk

      FROM mining_data_apply_v)

      WHERE rnk <= 11

      ORDER BY rnk;

CLUSTER_ID

Returns the cluster identifier of the predicted cluster with the highest probability for the set of predictors specified in the mining_attribute_clause CLUSTER_ID(<schema.model> <mining_attribute_clause>)

SELECT cluster_id(km_sh_clus_sample
      using *) AS clus, COUNT(*) AS cnt

      FROM km_sh_sample_apply_prepared

      GROUP BY cluster_id(km_sh_clus_sample
      using *)

      ORDER BY cnt DESC;

CLUSTER_PROBABILITY

Returns a measure of the degree of confidence of membership of an input row in a cluster associated with the specified model CLUSTER_PROBABILITY(<schema.model>, <cluster_id> <mining_attribute_clause>)

SELECT *

      FROM (

        SELECT cust_id, cluster_probability(km_sh_clus_sample, 2 
      using *) prob

        FROM km_sh_sample_apply_prepared

        ORDER BY prob DESC)

      WHERE ROWNUM < 11;

CLUSTER_SET

Returns a varray of objects containing all possible clusters that a given row belongs to. Each object in the varray is a pair of scalar values containing the cluster ID and the cluster probability. The object fields are named CLUSTER_ID and PROBABILITY, and both are Oracle NUMBER CLUSTER_SET(<schema.model>, <top N>, <cutoff> <mining_attribute_clause>);

WITH clus_tab AS (

        SELECT id, a.attribute_name aname, a.conditional_operator op,

        NVL(A.attribute_str_value,

        ROUND(DECODE(a.attribute_name, n.col,

                     a.attribute_num_value * N.scale + N.shift,

                     a.attribute_num_value),4)) val,

                     a.attribute_support support,

                     a.attribute_confidence confidence

      FROM TABLE(dbms_data_mining.get_model_details_km('km_sh_clus_sample')) t,

           TABLE(t.rule.antecedent) a, km_sh_sample_norm n

      WHERE a.attribute_name = n.col(+)

      AND a.attribute_confidence > 0.55), clust AS (

        SELECT id, CAST(COLLECT(Cattr(aname, op, TO_CHAR(val), support, confidence)) AS cattrs) cl_attrs

        FROM clus_tab

        GROUP BY id), custclus AS (

          SELECT T.cust_id, S.cluster_id, S.probability

          FROM (

            SELECT cust_id, 
      cluster_set(km_sh_clus_sample, NULL, 0.2 
      using *) pset

            FROM km_sh_sample_apply_prepared

            WHERE cust_id = 101362) t,

            TABLE(t.pset) s)

          SELECT a.probability prob, a.cluster_id cl_id, b.attr, b.op, b.val, b.supp, b.conf

          FROM custclus a, (

            SELECT t.id, c.*

            FROM clust t, TABLE(t.cl_attrs) c) b

            WHERE a.cluster_id = B.id

            ORDER BY prob DESC, cl_id ASC, conf DESC, attr ASC, val ASC;

CUBE_TABLE

Extracts data from a cube or dimension and returns it in the two-dimensional format of a relational table, which can be used by SQL-based applications CUBE_TABLE('<schema_name>.<cube_name> [<HIERARCHY | HRR> <DIMENSION> <HIERARCHY>'); CUBE_TABLE('<schema_name>.<dimension_name> [<HIERARCHY | HRR> [<DIMENSION>] <HIERARCHY>');

conn global/global

      

      SELECT * FROM TABLE(cube_table('global.channel'));

      

      -- the docs claim this works ... but from my experience it does not

      

      SQL> SELECT * FROM TABLE(CUBE_TABLE('global.units_cube HIERARCHY customer market_rollup HIERARCHY time calendar'));

      SELECT * FROM TABLE(CUBE_TABLE('global.units_cube HIERARCHY customer market_rollup HIERARCHY time ca

      *

      ERROR at line 1:

      ORA-36842: Hierarchy GLOBAL.CUSTOMER.MARKET_ROLLUP was not found

      

      -- the error has been reported to the tech writer

FEATURE_COMPARE

Uses a Feature Extraction model to compare two different documents, including short ones such as keyword phrases or two attribute lists, for similarity or dissimilarity. Can be used with Feature Extraction algorithms such as Singular Value Decomposition (SVD), Principal Component Analysis PCA), Non-Negative Matrix Factorization (NMF), and Explicit Semantic Analysis (ESA). FEATURE_COMPARE([<schema_name>.]<model><mining_attribute_clause> AND <mining_attribute_clause> USING '<document>' AND USING '<document>';

SELECT 1-feature_compare(esa_wiki_mod

      using 'There are several PGA tour golfers from South Africa' text

      AND using 'Nick Price won the 2002 Mastercard Colonial Open' text) similarity

      FROM dual;

      

      SIMILARITY

      ----------

       
          .258

FEATURE_DETAILS

Predicts feature matches for each row. Can use a pre-defined feature extraction model or perform dynamic feature extraction. Returns an XML string that describes the predicted feature or a specified feature. FEATURE_DETAILS(<schema.model><model>[feature_id[,topN]][<ABS | ACS | DESC>]<mining_attribute_clause>

TBD

FEATURE_ID

Returns an Oracle NUMBER that is the identifier of the feature with the highest coefficient value FEATURE_ID(<schema.model> <mining_attribute_clause>)

SELECT feature_id(nmf_sh_sample
      using *) AS feat, COUNT(*) AS cnt

      FROM nmf_sh_sample_apply_prepared

      GROUP BY feature_id(nmf_sh_sample
      using *)

      ORDER BY cnt DESC;

FEATURE_SET

Returns a varray of objects containing all possible features. Each object in the varray is a pair of scalar values containing the feature ID and the feature value. The object fields are named FEATURE_ID and VALUE, and both are Oracle NUMBERs. FEATURE_SET(<schema.model>, <top N>, <cutoff> <mining_attribute_clause>)

WITH feat_tab AS (

        SELECT f.feature_id fid,

             a.attribute_name attr,

             TO_CHAR(a.attribute_value) val,

             a.coefficient coeff

        FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF('nmf_sh_sample')) f,

             TABLE(F.attribute_set) a

        WHERE A.coefficient > 0.25),

           feat AS (

        SELECT fid,

             CAST(COLLECT(Featattr(attr, val, coeff)) AS FeatAttrs) f_attrs

        FROM feat_tab

        GROUP BY fid),

          cust_10_features AS (

        SELECT t.cust_id, s.feature_id, s.value

        FROM (SELECT cust_id, feature_set(nmf_sh_sample, 10 
      using *) pset

              FROM nmf_sh_sample_apply_prepared

              WHERE cust_id = 100002) t, TABLE(T.pset) s)

      SELECT a.value, A.feature_id fid, b.attr, b.val, b.coeff

      FROM cust_10_features A,

        (SELECT t.fid, f.* FROM feat t, TABLE(t.f_attrs) f) b

      WHERE A.feature_id = B.fid

      ORDER BY a.value DESC, a.feature_id ASC, coeff DESC, attr ASC, val ASC;

FEATURE_VALUE

Returns the value of a given feature. If you omit the feature_id argument, then the function returns the highest feature value. You can use this form in conjunction with the FEATURE_ID function to obtain the largest feature/value combo. FEATURE_VALUE(<schema.model>,<feature_id><mining_attribute_clause>)

SELECT *

      FROM (

        SELECT cust_id, feature_value(nmf_sh_sample,3
      using *) match_qual

        FROM nmf_sh_sample_apply_prepared

        ORDER BY match_quality DESC)

      WHERE ROWNUM < 11;

ORA_DM_PARTITION_NAME

A single row function, returns the name of the partition associated with the input row. When used on a non-partitioned model, the result is NULL. ORA_DM_PARTITION_NAME(<schema.model> <mining_attribute_clause>) USING <using_clause>

SELECT prediction(mymodel
      using *) pred,

      ora_dm_partition_name(mymodel using *)
      pname

      FROM customers;

PREDICTION

Returns the best prediction for the model. The datatype returned depends on the target value type used during the build of the model. For regression models, this function returns the expected value. PREDICTION(<schema.model>, <cost_matrix_clause> <mining_attribute_clause>)

SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age

      FROM mining_data_apply_v

      WHERE prediction(DT_SH_Clas_sample COST MODEL

      using cust_marital_status, education, household_size) = 1

      GROUP BY cust_gender

      ORDER BY cust_gender;

PREDICTION_BOUNDS

For use only with generalized linear models. It returns an object with two NUMBER fields LOWER and UPPER PREDICTION_BOUNDS(<schema.model>, <confidence_level>, <class_value> USING <* | schema.table.* | expression AS alias>

SELECT cust_marital_status, COUNT(cust_id) AS CUST_COUNT

      FROM (

        SELECT cust_id, cust_marital_status

        FROM mining_data_apply_v

        WHERE prediction_bounds(glmr_sh_regr_sample,0.98
      using *).LOWER > 24

        AND prediction_boundsS(glmr_sh_regr_sample,0.98
      using *).UPPER < 46)

      GROUP BY cust_marital_status;

PREDICTION_COST

Returns a measure of cost for a given prediction as an Oracle NUMBER PREDICTION_COST(<schema.model>, <class> <cost_matrix_clause> <mining_attribute_clause>)

WITH cust_italy AS (

        SELECT cust_id

        FROM mining_data_apply_v

        WHERE country_name = 'Italy'

        ORDER BY prediction_cost(DT_SH_Clas_sample,1 
      COST MODEL using *) ASC,1)

      SELECT cust_id

      FROM cust_italy

      WHERE rownum < 11;

PREDICTION_DETAILS

Returns an XML string containing model-specific information related to the scoring of the input row PREDICTION_DETAILS(<schema.model> <mining_attribute_clause>)

SELECT cust_id, education,

      prediction_details(DT_SH_Clas_sample
      using *) treenode

      FROM mining_data_apply_v

      WHERE occupation = 'TechSup' AND age < 25

      ORDER BY cust_id;

PREDICTION_PROBABILITY

Returns the probability for a given prediction as an Oracle NUMBER PREDICTION_PROBABILITY(<schema.model> <class> <mining_attribute_clause>)

SELECT cust_id

      FROM (

        SELECT cust_id

        FROM mining_data_apply_v

        WHERE country_name = 'Italy'

        ORDER BY predicton_probability(DT_SH_Clas_sample, 
      1 using *)

        DESC, cust_id)

      WHERE rownum < 11;

PREDICTION_SET

Returns a varray of objects containing all classes in a multi-class classification scenario PREDICTION_SET(<schema.model>, <best N>, <cutoff> <cost_matrix_clause> <mining_attribute_clause>)

SELECT T.cust_id, S.prediction, S.probability, S.cost

      FROM (

        SELECT cust_id,

       prediction_set(dt_sh_clas_sample COST MODEL using *) pset

        FROM mining_data_apply_v

        WHERE cust_id < 100011) T,

      TABLE(T.pset) S

      ORDER BY cust_id, S.prediction;