ACE Director Alum Daniel Morgan, founder of Morgan's Library, is scheduling
complimentary technical Workshops on Database Security for the first 30
Oracle Database customers located anywhere in North America, EMEA, LATAM, or
APAC that send an email to
asra_us@oracle.com. Request a Workshop for
your organization today.
The data mining functions operate on models that have been built using the DBMS_DATA_MINING package or the Oracle Data Mining Java API.
For a close to complete list of Oracle built-in functions and demos in the library, both stand-alone and in built-in packages: [ Click Here ].
Predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. Returns an XML string that describes the predicted cluster or a specified cluster.
CLUSTER_DETAILS ([schema.]<model> [,<cluster_id>[,<topN>]] [<ABS | ASC | DESC>] USING <mining_attribute_clause>)
SELECT s.cluster_id, probability prob,
cluster_details(em_sh_clus_sample, s.cluster_id,
5 using t.*) det
FROM (
SELECT v.*, CLUSTER_SET(em_sh_clus_sample, NULL, 0.2
using *) pset
FROM mining_data_apply_v v
WHERE cust_id = 100955) t,
TABLE(t.pset) s
ORDER BY 2 DESC;
Predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. Returns the raw distance between each row and the centroid of either the predicted cluster or a specified.
CLUSTER_DISTANCE([schema.]<model> [,<cluster_id>] USING <mining_attribute_clause>)
SELECT cust_id
FROM (
SELECT cust_id,
rank() OVER
(order by Ccluster_distance(km_sh_clus_sample
using *) DESC) rnk
FROM mining_data_apply_v)
WHERE rnk <= 11
ORDER BY rnk;
Returns the cluster identifier of the predicted cluster with the highest probability for the set of predictors specified in the mining_attribute_clause
SELECT cluster_id(km_sh_clus_sample
using *) AS clus, COUNT(*) AS cnt
FROM km_sh_sample_apply_prepared
GROUP BY cluster_id(km_sh_clus_sample
using *)
ORDER BY cnt DESC;
SELECT *
FROM (
SELECT cust_id, cluster_probability(km_sh_clus_sample, 2
using *) prob
FROM km_sh_sample_apply_prepared
ORDER BY prob DESC)
WHERE ROWNUM < 11;
Returns a varray of objects containing all possible clusters that a given row belongs to.
Each object in the varray is a pair of scalar values containing the cluster ID and the cluster probability. The object fields are named CLUSTER_ID and PROBABILITY, and both are Oracle NUMBER
WITH clus_tab AS (
SELECT id, a.attribute_name aname, a.conditional_operator op,
NVL(A.attribute_str_value,
ROUND(DECODE(a.attribute_name, n.col,
a.attribute_num_value * N.scale + N.shift,
a.attribute_num_value),4)) val,
a.attribute_support support,
a.attribute_confidence confidence
FROM TABLE(dbms_data_mining.get_model_details_km('km_sh_clus_sample')) t,
TABLE(t.rule.antecedent) a, km_sh_sample_norm n
WHERE a.attribute_name = n.col(+)
AND a.attribute_confidence > 0.55), clust AS (
SELECT id, CAST(COLLECT(Cattr(aname, op, TO_CHAR(val), support, confidence)) AS cattrs) cl_attrs
FROM clus_tab
GROUP BY id), custclus AS (
SELECT T.cust_id, S.cluster_id, S.probability
FROM (
SELECT cust_id,
cluster_set(km_sh_clus_sample, NULL, 0.2
using *) pset
FROM km_sh_sample_apply_prepared
WHERE cust_id = 101362) t,
TABLE(t.pset) s)
SELECT a.probability prob, a.cluster_id cl_id, b.attr, b.op, b.val, b.supp, b.conf
FROM custclus a, (
SELECT t.id, c.*
FROM clust t, TABLE(t.cl_attrs) c) b
WHERE a.cluster_id = B.id
ORDER BY prob DESC, cl_id ASC, conf DESC, attr ASC, val ASC;
SELECT * FROM TABLE(cube_table('global.channel'));
-- the docs claim this works ... but from my experience it does not
SQL> SELECT * FROM TABLE(CUBE_TABLE('global.units_cube HIERARCHY customer market_rollup HIERARCHY time calendar'));
SELECT * FROM TABLE(CUBE_TABLE('global.units_cube HIERARCHY customer market_rollup HIERARCHY time ca
*
ERROR at line 1:
ORA-36842: Hierarchy GLOBAL.CUSTOMER.MARKET_ROLLUP was not found
Uses a Feature Extraction model to compare two different documents, including short ones such as keyword phrases or two attribute lists, for similarity or dissimilarity.
Can be used with Feature Extraction algorithms such as Singular Value Decomposition (SVD), Principal Component Analysis PCA), Non-Negative Matrix Factorization (NMF), and Explicit Semantic Analysis (ESA).
FEATURE_COMPARE([<schema_name>.]<model><mining_attribute_clause>
AND <mining_attribute_clause>
USING '<document>'
AND USING '<document>';
SELECT 1-feature_compare(esa_wiki_mod
using 'There are several PGA tour golfers from South Africa' text
AND using 'Nick Price won the 2002 Mastercard Colonial Open' text) similarity
FROM dual;
Predicts feature matches for each row. Can use a pre-defined feature extraction model or perform dynamic feature extraction. Returns an XML string that describes the predicted feature or a specified feature.
SELECT feature_id(nmf_sh_sample
using *) AS feat, COUNT(*) AS cnt
FROM nmf_sh_sample_apply_prepared
GROUP BY feature_id(nmf_sh_sample
using *)
ORDER BY cnt DESC;
Returns a varray of objects containing all possible features. Each object in the varray is a pair of scalar values containing the feature ID and the feature value.
The object fields are named FEATURE_ID and VALUE, and both are Oracle NUMBERs.
WITH feat_tab AS (
SELECT f.feature_id fid,
a.attribute_name attr,
TO_CHAR(a.attribute_value) val,
a.coefficient coeff
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF('nmf_sh_sample')) f,
TABLE(F.attribute_set) a
WHERE A.coefficient > 0.25),
feat AS (
SELECT fid,
CAST(COLLECT(Featattr(attr, val, coeff)) AS FeatAttrs) f_attrs
FROM feat_tab
GROUP BY fid),
cust_10_features AS (
SELECT t.cust_id, s.feature_id, s.value
FROM (SELECT cust_id, feature_set(nmf_sh_sample, 10
using *) pset
FROM nmf_sh_sample_apply_prepared
WHERE cust_id = 100002) t, TABLE(T.pset) s)
SELECT a.value, A.feature_id fid, b.attr, b.val, b.coeff
FROM cust_10_features A,
(SELECT t.fid, f.* FROM feat t, TABLE(t.f_attrs) f) b
WHERE A.feature_id = B.fid
ORDER BY a.value DESC, a.feature_id ASC, coeff DESC, attr ASC, val ASC;
Returns the value of a given feature. If you omit the feature_id argument, then the function returns the highest feature value.
You can use this form in conjunction with the FEATURE_ID function to obtain the largest feature/value combo.
SELECT *
FROM (
SELECT cust_id, feature_value(nmf_sh_sample,3
using *) match_qual
FROM nmf_sh_sample_apply_prepared
ORDER BY match_quality DESC)
WHERE ROWNUM < 11;
Returns the best prediction for the model. The datatype returned depends on the target value type used during the build of the model. For regression models, this function returns the expected value.
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
FROM mining_data_apply_v
WHERE prediction(DT_SH_Clas_sample COST MODEL
using cust_marital_status, education, household_size) = 1
GROUP BY cust_gender
ORDER BY cust_gender;
For use only with generalized linear models. It returns an object with two NUMBER fields LOWER and UPPER
PREDICTION_BOUNDS(<schema.model>, <confidence_level>, <class_value>
USING <* | schema.table.* | expression AS alias>
SELECT cust_marital_status, COUNT(cust_id) AS CUST_COUNT
FROM (
SELECT cust_id, cust_marital_status
FROM mining_data_apply_v
WHERE prediction_bounds(glmr_sh_regr_sample,0.98
using *).LOWER > 24
AND prediction_boundsS(glmr_sh_regr_sample,0.98
using *).UPPER < 46)
GROUP BY cust_marital_status;
WITH cust_italy AS (
SELECT cust_id
FROM mining_data_apply_v
WHERE country_name = 'Italy'
ORDER BY prediction_cost(DT_SH_Clas_sample,1
COST MODEL using *) ASC,1)
SELECT cust_id
FROM cust_italy
WHERE rownum < 11;
SELECT cust_id, education,
prediction_details(DT_SH_Clas_sample
using *) treenode
FROM mining_data_apply_v
WHERE occupation = 'TechSup' AND age < 25
ORDER BY cust_id;
SELECT cust_id
FROM (
SELECT cust_id
FROM mining_data_apply_v
WHERE country_name = 'Italy'
ORDER BY predicton_probability(DT_SH_Clas_sample,
1 using *)
DESC, cust_id)
WHERE rownum < 11;
SELECT T.cust_id, S.prediction, S.probability, S.cost
FROM (
SELECT cust_id,
prediction_set(dt_sh_clas_sample COST MODEL using *) pset
FROM mining_data_apply_v
WHERE cust_id < 100011) T,
TABLE(T.pset) S
ORDER BY cust_id, S.prediction;