日韩黑丝制服一区视频播放|日韩欧美人妻丝袜视频在线观看|九九影院一级蜜桃|亚洲中文在线导航|青草草视频在线观看|婷婷五月色伊人网站|日本一区二区在线|国产AV一二三四区毛片|正在播放久草视频|亚洲色图精品一区

分享

SAP HANA中PAL算法使用入門 | SAP Blogs

 飛鳥的天空zwp 2019-04-15

1 應用場合

         SAP HANA作為一款內存數(shù)據(jù)庫產品, 使得數(shù)據(jù)常駐內存, 物理磁盤的存儲作為數(shù)據(jù)備份與日志記錄, 以防斷電內存中數(shù)據(jù)丟失. 這種構架大大的縮短了數(shù)據(jù)存取的時間, 使得SAP HANA高速”.

         在傳統(tǒng)數(shù)據(jù)模型中,數(shù)據(jù)庫只是作為存取數(shù)據(jù)一個工具,對于類似下圖所示的應用, 客戶端從Database獲取數(shù)據(jù),然后計算,最后再把結果寫回Database, 如果數(shù)據(jù)量過大, 數(shù)據(jù)傳輸?shù)拈_銷過大,并且如果客戶端的內存不夠, 計算分析的過程也將非常緩慢./wp-content/uploads/2014/03/flow_406433.png

         借助于大內存的優(yōu)勢, SAP HANA的解決方案是把數(shù)據(jù)敏感的相關計算邏輯都移動到SAP HANA, 從而省去了數(shù)據(jù)傳輸?shù)拈_銷. 典型的框架如下:

/wp-content/uploads/2014/03/hanasys_406434.png

對于一些簡單的計算分析, 可以利用SQLScript腳本完成, SQLScript提供了基本的變量定義語句,流程控制語句. 但是對于復雜的分析與計算, 單純使用SQLScript可能不是特別方便, 比如對1T的數(shù)據(jù)表作聚類分析. 為此 ,SAP HANA提供  AFL (Application Function Library) ,  把一些常見的分析任務用C++實現(xiàn),作為庫函數(shù)的形式, 提供給SQLScript調用,極大地豐富了SQLScript的功能.

2 PAL簡介

         PAL (Predictive Analysis Library)SAP HANAAFL (Application Function Library)框架下的一個函數(shù)庫, 主要用于數(shù)據(jù)預測與分析, 提供了很多數(shù)據(jù)挖掘算法的實現(xiàn). 按應用的場景進行分類,PAL函數(shù)主要包括以下類別:

聚類

分類

關聯(lián)分析

時間序列分析

數(shù)據(jù)預處理

統(tǒng)計分析

社會網(wǎng)絡分析

具體到每個類別下面, 有常見算法的實現(xiàn), 比如聚類下面的K-means算法.

值得一提的是, AFL是一個單獨的包, 需要另外進行安裝. 另外AFL 的版本號需要與SAP HANA的版本號匹配.

3 基本使用步驟

     PAL函數(shù)的使用包括三個步驟:

(1)    生成AFL_WRAPPER_GENERATOR AFL_WRAPPER_ERASER存儲過程.

       對于具體的某個算法, 在使用之前,利用AFL_WRAPPER_GENERATOR生成該算法的一個包裝器,然后才能進行調用, 可以理解為生成該算法的一個實例, AFL_WRAPPER_ERASER作用是刪除這個算法的實例.

      這兩個存儲過程的生成很簡單. AFL插件的目錄下有afl_wrapper_generator.sql, afl_wrapper_eraser.sql兩個腳文文件, 把它們的內容拷貝到SAP HANA StudioSQL Console,然后執(zhí)行以及即可.然后為用戶分配執(zhí)行權限:

   GRANT EXECUTE ON system.afl_wrapper_generator to USER1;

   GRANT EXECUTE ON system.afl_wrapper_eraser to USER1;

這個步驟只需要首次利用使用AFL時執(zhí)行一次,對于后續(xù)其他的算法使用,就不需要執(zhí)行了.

(2)    生成算法的實例

          CALL SYSTEM.AFL_WRAPPER_GENERATOR(

              ‘<procedure_name>’,        

              ‘<area_name>’,

              ‘<function_name>’, <signature_table>);

Procedure_name:自定義的名稱;

Area_name:通常為AFLPAL;

Function_name:算法名稱;

Signature_table:指定一個用戶表,作為方法簽名的信息;

(3)  調用算法實例

CALL <procedure_name>(

<data_input_table> {,…},   

<parameter_table>,

<output_table> {,…}) with overview;

Procedure_name:算法實例名;

Data_input_table:輸入數(shù)據(jù)表;

Parameter_table:參數(shù)表;

Output_table:輸出表;

4 示例Demo

  下面以DBSCAN聚類算法來說明說PAL算法的調用過程.(因測試機AFL_WRAPPER_GENERATOR存儲過程已經存在,故第一步不再執(zhí)行,另假定SchemaTEST)

  DBSCAN聚類算法是一個基于密度的聚類算法,該算法有很好的降噪能力,有關該算法的更多介紹,請參考http://en./wiki/DBSCAN



/*創(chuàng)建數(shù)據(jù)表類型 ,id,屬性1,屬性2 */
CREATE TYPE PAL_DBSCAN_DATA_T AS TABLE ( ID integer, ATTRIB1 double, ATTRIB2
double);
/*創(chuàng)建算法控制參數(shù)類型*/
CREATE TYPE PAL_CONTROL_T AS TABLE( NAME varchar(50), INTARGS integer, DOUBLEARGS
double, STRINGARGS varchar(100));
/*結構表類型,ID,類簇編號*/
CREATE TYPE PAL_DBSCAN_RESULTS_T AS TABLE( ID integer, RESULT integer);
/*創(chuàng)建方法參數(shù)表*/
CREATE COLUMN TABLE PAL_DBSCAN_PDATA_TBL( "ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100) );
/*向參數(shù)表中插入相關參數(shù)數(shù)據(jù)*/
INSERT INTO PAL_DBSCAN_PDATA_TBL VALUES (1, 'TEST.PAL_DBSCAN_DATA_T', 'in');
INSERT INTO PAL_DBSCAN_PDATA_TBL VALUES (2, 'TEST.PAL_CONTROL_T', 'in');
INSERT INTO PAL_DBSCAN_PDATA_TBL VALUES (3, 'TEST.PAL_DBSCAN_RESULTS_T', 'out');
/*分配權限*/
GRANT SELECT ON DM_PAL.PAL_DBSCAN_PDATA_TBL to SYSTEM;
/*生成PAL_DBSCAN9算法實例*/
call SYSTEM.afl_wrapper_eraser('PAL_DBSCAN9');
call SYSTEM.afl_wrapper_generator('PAL_DBSCAN9', 'AFLPAL', 'DBSCAN', PAL_DBSCAN_PDATA_TBL);
/* 創(chuàng)建數(shù)據(jù)表*/
CREATE COLUMN TABLE PAL_DBSCAN_DATA_TBL ( ID integer, ATTRIB1 double, ATTRIB2 double);
/*插入測試數(shù)據(jù)*/
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(1,0.10,0.10);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(2,0.11,0.10);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(3,0.10,0.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(4,0.11,0.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(5,0.12,0.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(6,0.11,0.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(7,0.12,0.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(8,0.12,0.13);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(9,0.13,0.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(10,0.13,0.13);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(11,0.13,0.14);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(12,0.14,0.13);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(13,10.10,10.10);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(14,10.11,10.10);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(15,10.10,10.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(16,10.11,10.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(17,10.11,10.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(18,10.12,10.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(19,10.12,10.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(20,10.12,10.13);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(21,10.13,10.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(22,10.13,10.13);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(23,10.13,10.14);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(24,10.14,10.13);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(25,4.10,4.10);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(26,7.11,7.10);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(27,-3.10,-3.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(28,16.11,16.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(29,20.11,20.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(30,15.12,15.11);
/*用臨時表來存儲算法的輸入?yún)?shù)*/
CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_CONTROL_TBL( NAME varchar(50), INTARGS
integer, DOUBLEARGS double, STRINGARGS varchar(100));
/*指定DBSCAN算法的輸入?yún)?shù)*/
/*線程數(shù)18*/
INSERT INTO #PAL_CONTROL_TBL VALUES('THREAD_NUMBER',18,null,null);
/*自動確定MINPTS與RADIUS參數(shù)*/
INSERT INTO #PAL_CONTROL_TBL VALUES('AUTO_PARAM',null,null,'true');
/*點與點之間的距離采用Manhattan距離*/
INSERT INTO #PAL_CONTROL_TBL VALUES('DISTANCE_METHOD',1,null,null);
/*結果表*/
CREATE COLUMN TABLE PAL_DBSCAN_RESULTS_TBL( ID integer, RESULT integer);
/*調用DBSCAN算法*/
CALL _SYS_AFL.PAL_DBSCAN9(PAL_DBSCAN_DATA_TBL, "#PAL_CONTROL_TBL",
PAL_DBSCAN_RESULTS_TBL) with overview;
/*查看結果*/
SELECT * FROM PAL_DBSCAN_RESULTS_TBL; 
/*創(chuàng)建數(shù)據(jù)表類型 ,id,屬性1,屬性2 */
CREATE TYPE PAL_DBSCAN_DATA_T AS TABLE ( ID integer, ATTRIB1 double, ATTRIB2
double);
/*創(chuàng)建算法控制參數(shù)類型*/
CREATE TYPE PAL_CONTROL_T AS TABLE( NAME varchar(50), INTARGS integer, DOUBLEARGS
double, STRINGARGS varchar(100));
/*結構表類型,ID,類簇編號*/
CREATE TYPE PAL_DBSCAN_RESULTS_T AS TABLE( ID integer, RESULT integer);
/*創(chuàng)建方法參數(shù)表*/
CREATE COLUMN TABLE PAL_DBSCAN_PDATA_TBL( "ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100) );
/*向參數(shù)表中插入相關參數(shù)數(shù)據(jù)*/
INSERT INTO PAL_DBSCAN_PDATA_TBL VALUES (1, 'TEST.PAL_DBSCAN_DATA_T', 'in');
INSERT INTO PAL_DBSCAN_PDATA_TBL VALUES (2, 'TEST.PAL_CONTROL_T', 'in');
INSERT INTO PAL_DBSCAN_PDATA_TBL VALUES (3, 'TEST.PAL_DBSCAN_RESULTS_T', 'out');
/*分配權限*/
GRANT SELECT ON DM_PAL.PAL_DBSCAN_PDATA_TBL to SYSTEM;
/*生成PAL_DBSCAN9算法實例*/
call SYSTEM.afl_wrapper_eraser('PAL_DBSCAN9');
call SYSTEM.afl_wrapper_generator('PAL_DBSCAN9', 'AFLPAL', 'DBSCAN', PAL_DBSCAN_PDATA_TBL);
/* 創(chuàng)建數(shù)據(jù)表*/
CREATE COLUMN TABLE PAL_DBSCAN_DATA_TBL ( ID integer, ATTRIB1 double, ATTRIB2 double);
/*插入測試數(shù)據(jù)*/
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(1,0.10,0.10);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(2,0.11,0.10);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(3,0.10,0.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(4,0.11,0.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(5,0.12,0.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(6,0.11,0.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(7,0.12,0.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(8,0.12,0.13);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(9,0.13,0.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(10,0.13,0.13);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(11,0.13,0.14);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(12,0.14,0.13);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(13,10.10,10.10);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(14,10.11,10.10);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(15,10.10,10.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(16,10.11,10.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(17,10.11,10.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(18,10.12,10.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(19,10.12,10.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(20,10.12,10.13);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(21,10.13,10.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(22,10.13,10.13);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(23,10.13,10.14);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(24,10.14,10.13);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(25,4.10,4.10);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(26,7.11,7.10);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(27,-3.10,-3.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(28,16.11,16.11);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(29,20.11,20.12);
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(30,15.12,15.11);
/*用臨時表來存儲算法的輸入?yún)?shù)*/
CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_CONTROL_TBL( NAME varchar(50), INTARGS
integer, DOUBLEARGS double, STRINGARGS varchar(100));
/*指定DBSCAN算法的輸入?yún)?shù)*/
/*線程數(shù)18*/
INSERT INTO #PAL_CONTROL_TBL VALUES('THREAD_NUMBER',18,null,null);
/*自動確定MINPTS與RADIUS參數(shù)*/
INSERT INTO #PAL_CONTROL_TBL VALUES('AUTO_PARAM',null,null,'true');
/*點與點之間的距離采用Manhattan距離*/
INSERT INTO #PAL_CONTROL_TBL VALUES('DISTANCE_METHOD',1,null,null);
/*結果表*/
CREATE COLUMN TABLE PAL_DBSCAN_RESULTS_TBL( ID integer, RESULT integer);
/*調用DBSCAN算法*/
CALL _SYS_AFL.PAL_DBSCAN9(PAL_DBSCAN_DATA_TBL, "#PAL_CONTROL_TBL",
PAL_DBSCAN_RESULTS_TBL) with overview;
/*查看結果*/
SELECT * FROM PAL_DBSCAN_RESULTS_TBL; 








DBSCANresult.png

如果執(zhí)行無誤,將看到如上圖所示的結果,記錄被聚成三類,0,1,-1各代表一個類簇.

5結束語

本文介紹了SAP HANAPAL算法的使用,以DBSCAN聚類算法作為具體的實現(xiàn)例子.其他的相關算法使用流程上與上述流程都相似,主要的工作在于準備數(shù)據(jù)表,根據(jù)算法的接口文檔定義相關的參數(shù),并將參數(shù)存入?yún)?shù)表.最后調用算法實例即可.

從效率上講,在SAP HANA中使用PAL,一方面利用了大內存的優(yōu)勢,另一方面利用了C++作為編譯型語言本身的高效性,如果使用得當,對于大數(shù)據(jù)的相關分析任務,在速度上將會有一個很大的飛躍!

[: 本文的測試案例所使用的SAP HANA版本為SAP HANA SPS06]

想獲取更多SAP HANA學習資料或有任何疑問,請關注新浪微博@HANAGeek!我們歡迎你的加入!

    本站是提供個人知識管理的網(wǎng)絡存儲空間,所有內容均由用戶發(fā)布,不代表本站觀點。請注意甄別內容中的聯(lián)系方式、誘導購買等信息,謹防詐騙。如發(fā)現(xiàn)有害或侵權內容,請點擊一鍵舉報。
    轉藏 分享 獻花(0

    0條評論

    發(fā)表

    請遵守用戶 評論公約

    類似文章