Rittman Mead Consulting

Subscribe to Rittman Mead Consulting feed Rittman Mead Consulting
Rittman Mead consults, trains, and innovates within the world of Oracle Business Intelligence, data integration, and analytics.
Updated: 2 weeks 1 day ago

Real World OBIEE: Demystification of Variables Pt. 2

Tue, 2017-03-21 09:00

In part one of this blog series, I went over using bins and presentation variables to dynamically create groups and switch between them in a report and on a dashboard. In part two, I am going to talk about making reports dynamic for periods of time using repository, system and presentation variables. Before I dive into an example, there are a couple of things I would like to cover first.


The sysdate function returns the current datetime set by the system where the database resides. Sysdate is a really useful function for creating repository variables for use with date dimensions. If I go into SQL Developer, I can write a query to return the current sysdate:

select sysdate from dual;


The current_date functions returns the current datetime set by the system where the bi server resides. This datetime may differ from sysdate depending on the geographical location of the database vs. the system that OBIEE resides on. I can write a query using sql developer to return the datetime using the current_date function:

select current_date from dual;

Since my database and OBIEE instance are on the same system, sysdate and current_date are the same.


When using sysdate or current_date to create repository variables for dates (which I am going to show in an upcoming example), you have to keep something in mind. While the date may match, the time may not. To show an example of this, I am going to join one of my date columns with sysdate.

select sysdate, dim_date_key from dual, 
where sysdate = dim_date_key;

If I run this query, I don't get an error but I get no results.

Why? To answer this, I need to write a query to inspect my date column.

select dim_date_key from gcbc_pef.dim_date;

As you can see by the results of my query, the DIM_DATE_KEY column does have the same format as sysdate but all the times are set to 00:00:00 (or midnight). To further demonstrate the difference between my date column and sysdate, I am going to write a new query and use the TRUNC (or TRUNCATE) function.

select sysdate, dim_date_key from dual, 
where trunc(sysdate) = dim_date_key;

As you can see, the query runs successfully but notice how sysdate and DIM_DATE_KEY still have different times. How is the join possible? Because I used the truncate function in the where clause in my query for sysdate. Without going into too much detail, using truncate on a date function without any formatting (which I will cover later) will set (or truncate) the datetime to the start (or midnight) of the current day. For example, if I run another query that just selects the truncated sysdate from dual, I get this result.

select trunc(sysdate) from dual;

Now, lets dive into an example.

Note: For all of the examples in this blog series I am using OBIEE

The Scenario

In this example, I have been asked to create a report that is going to reside on a products dashboard. It needs to have the same product grouping as the report I used part one of this series, needs to contain Gross Rev $, Net Rev $ and # of Orders and have a prompt that can select between the first and current day of the month and every day in-between. The person who requested the report wants the prompt to change dynamically with each month and does not want users to be able to select future dates.

There are two foreseeable challenges with this report. The first, and probably the most obvious, is how to make the date prompt for the current month and have it change dynamically with each month. The second is how to pass the dates into the report.

There is one more challenge that I will have to tackle. There is a gap in the data loads for # of Orders. Data does not update until the 2nd or 3rd of each new month. This wouldn't be a big deal except the person who requested the report wants a summary of the previous months # of Orders to be shown until the data is updated for the current month.

Fortunately, by using Repository, System and Presentation Variables, I can accomplish all of the requirements of this report.

The Example

For this example, I am going to start by creating Repository Variables to use with my date column in order to make the dates dynamic. There are other ways to make dates dynamic using functions within Answers but they are a little bit trickier to use and are less common. I am going to go over some of those functions in part three of this blog series.

Repository Variables are created using the Admin Tool. By launching the Admin Tool and opening my RPD in online mode (can also be created offline), I can go to Manage > Variables to start creating my first Repository Variable.

From the Variable Manager window, I can create a Repository Variable by selecting Action > New > Repository > Variable.

I am going to start by creating the Repository Variable for the current date. Since this variable will be dynamic, I need to make sure I select the option 'Dynamic' and I am going to give it the name USCurDate.

Now I need to create a new init block. I can do this by clicking New...

Once in the Repository Variable Initialization Block screen, I need to give the init block a name, set the schedule for when variable or variables will be refreshed then click Edit Data Source to define the connection pool the init block will use as well as the initialization string (query) the init block will use to populate the Repository Variable.

In the data source window, I am going to set my connection pool to one I have created just for my init blocks and then type in the following into the initialization string window:

select TRUNC(sysdate) from dual;

If I click Test, the query will execute and will return a result.

Notice how the result is the same as the query I ran using SQL Developer earlier.

Now I need to create a Repository Variable for the first day of every month. I am going to use the same method as before and name it USMoBeginDate. The query I am going to use is slightly different from the previous query. I still need to use the TRUNC function but I also need to apply formatting so that it truncates to the start of the month. I am going to enter the following into the initialization string window:

select TRUNC(sysdate, 'MM') from dual;

Some other useful queries I can use are:

First Day of the Current Year

select TRUNC(sysdate, 'YY') from dual;

Last Day of the Previous Year

select TRUNC(sysdate, 'YY') -1 from dual;

Previous Year Date

select TRUNC(ADD_MONTHS(sysdate, -12)) from dual;

Now I need to create a Repository Variable for the previous month to use with my # of Orders measure column. Upon inspection, I discover that the column I need to use is called Calendar Year Month and is a VARCHAR or character type. If I go into Answers and pull in the Calendar Year Month column, I can see the format is 'YYYYMM'

To create the Repository Variable, I am going to use the same method as with the current date and first day of the current month Repository Variables and issue a new query. Because the Calendar Year Month column is a VARCHAR, I need to use the to_char function to change sysdate from a date type to a character type, use some formatting syntax and use some basic arithmetic. The query is as follows:

select to_char(to_number(to_char(sysdate, 'YYYY')) * 100 + to_number(to_char(sysdate, 'MM') -1)) from dual;

To break down each part of this query, lets start with the year. In order to use the 'YYYY' format I must first cast sysdate to a character (to_char(sysdate, 'YYYY')). Then I need to cast that result back to and int so that I can multiply by 100. This will give me the result 201500.00. The reason for this is when I add the month number to my yearx100, there will always be a leading 0 for month numbers 1-9. To get the previous month number, I have to first cast sysdate to a character and use the formatting 'MM'. I then have to cast it back to an int and subtract 1 to get the previous month number (to_number(to_char(sysdate, 'MM') -1) then cast the entire statment back to a character type so that it matches the type for the Calendar Year Month column. When I run the query, I get this result.

Now that I have my three repository variables (USCurDate, USMoBeginDate and Prev_Month) I can start to create the report.

Im going to fast forward a little bit to the part of the report creation process where I will use my Repository Variables I created using the Admin Tool. Since I am using virtually the same report as part one of this blog series, please refer back for how to create custom groups using bins and presentation variables and custom value prompts.

Because of the delay in the data load for the # of Orders at the beginning of the month, I can not use a global report filter. Instead, I am going to have to use something called a Filter Expression within each measure column formula.

About Filter Expressions

Unlike global report filters, column formula level filter expressions are used when you need to specify a particular constraint within the column formula itself. Because the filter is at the column formula level, it is independent of any subsequent column filters.

Note: When using a column formula filter for a measure, you can not add a global filter of the same data subject on top of it. For example, if using a column level filter for a particular Year and Month, I can not add a global filter for a particular year. The two filters contradict each other and the result will be null.

To add a filter in the column formula, go to Edit formula, make sure the column syntax is highlighted and click Filter.

From here the Insert Filter window will pop up and I can select the attribute column to filter the measure by. Here, I want to use the column Day Date to filter Gross Rev $ by the day.

I can add a column by double clicking it in the the Subject Areas pane. When a column is added, I will be prompted with a New Filter window and from here, everything is exactly the same process as adding a global report filter.

Here I need to define the operator as is between since we are dealing with date ranges. I could call my Repository Variables for current_date and first day of the month here but, because the request is for a prompt to select between date ranges, I am going to have to call Presentation Variables and use the prompt to populate the actual values.

Note: If you are unsure about the functionality of Presentation Variables, see part one of this blog series

To add Presentation Variables to the filter expression, click Add More Options and select Presentation Variable from the dropdown.

When a Presentation Variable is added to the filter, two new text boxes appear. The Variable Expr box is where you define the variable to be used and the (default) box is used to add a default value. The default value is optional but, when defining a Presentation Variable within a filter, you have to specify a default value in order to get any results. The reason for this is because, when the report is run, the query issued will use the Presentation Variable placeholder that is defined unless a default value is specified. In other words, the default value will always be used unless the Presentation Variable is populated with a value or a list of values.

Because I want the users to be able to specify a date range, I need to define two Presentation Variables: one for the start date and one for the end date. I can add another place for a Presentation Variable by simply clicking Add More Options again and selecting Presentation Variable.

Now I need to add both my start and end date Presentation Variables in the Variable Expr boxes. I’m going to call my start date presentation variable pv_start_dt and my end date presentation variable pv_end_dt. I am also going to specify a default date range from the beginning of the current month (10/01/2015) to yesterday's date (10/15/2015).

If I click OK, I will be taken back to the Insert Filter screen where I can see the filter expression previously defined.

Clicking OK again will return me to Edit Column Formula which shows the column formula with the filter expression defined in the previous steps.

Now I have to do the exact same thing for the Net Rev $ column. Since the filter expression is identical, I can simply copy and paste the column formula for Gross Rev $ and replace the column name in the expression.

Now I need to take care of the # of Orders column. This column is tricky because of the gap between the 1st and the 2nd or 3rd of every month. I could use a filter expression that defaults to the previous month by using the previous month repository variable I created in a previous step, but this alone wouldn’t switch over when the data became available.

So how can we fulfill the requirement of the report if we don’t know the exact date in which the data will be available? This can be accomplished by using a CASE statement as shown previously in part one of this series. We can break the Case statement down into two parts or two conditions:

1. When the day for the current month is less than or equal to 2 OR if # of Orders is null, then filter # of Orders by Calendar Year Month using the value of the Prev_Month Repository Variable.

2. When condition one is not true, then filter # of Orders by Day Date between the values of the pv_start_date and the pv_end_date Presentation Variables

Putting both conditions together and using the correct syntax for Column Formula results in the following formula:

Note that I am using CURRENT_DATE in my column formula. In this case, I am extracting the day number from the current date by using the extract day function (DAY(CURRENT_DATE)). I am going to talk about this in further detail when I talk about using built in functions in Answers to make reports dynamic in part 3 of this series.

Now I need to create my dashboard prompt. I am going to start by clicking on New > Dashboard Prompt.

I need to create two prompts: One for the start date and one for the end date. Because I am using presentation variables as placeholders for the date between values, I have to use a Variable Prompt instead of a Column Prompt. Variable Prompts allow us to define a presentation variable and then define a list of values for the users to select from.

To create a Variable Prompt for Start Date, I can click on the new prompt icon and select Variable Prompt.

There a few things I need to do in order to make this prompt function for the report. First, I have to define the same presentation variable name (pv_start_dt) that I used in the filter expressions for the Gross Rev $, Net Rev $ and # of Orders columns.

Because this is not a column prompt, I have to manually specify the values I want the user to be able to select from. Rather than typing in each value, I can use the SQL Results option from the Choice List Values dropdown and use a SQL statement to select the exact values that I want.

This may seem daunting at first but there is a very straightforward way to accomplish this. Rather than manually writing out a SQL query, we can make use of the Advanced Tab within a new report.

I’m going to start by clicking New > Analysis and selecting the column that I want values for: Day Date.

I need to add a filter to Day Date so that it returns only the values I want to user to select from.

Now I need to select the operator to be is between and add two Repository Variables that I have set up: one for the first date of the current month and one for the current date of the current month.

If I go to results, I can see the data returned with the filter I have specified.

As you can see, the Day Date column only contains the values from the first of the month to the current date (October, 16th 2015 in this example)

Now for the good stuff. I can navigate to the Advanced Tab and copy the SQL statement used to generate these values and paste them into the SQL Results text box in my prompt.

You will notice that within the SQL Statement generated by OBI,
there are numbers and s_# between the SELECT and Day Date column, after the Day Date column and there is also an order by clause that uses a number “2”. Without going into too much detail, this what OBI uses to make the query more efficient when retrieving results from the database. In order to allow the values to populate the prompt, these have to be removed in OBIEE 12c and the “ORDER BY” clause has to be rewritten in order to make it work.


   0 s_0,
   "Sales - Fact Sales"."Periods"."Day Date" s_1
FROM "Sales - Fact Sales"
("Periods"."Day Date" BETWEEN VALUEOF("USMoBeginDate") AND  VALUEOF("USCurDate"))

Changed to this

   "Sales - Fact Sales"."Periods"."Day Date"
FROM "Sales - Fact Sales"
("Periods"."Day Date" BETWEEN  VALUEOF("USMoBeginDate") AND  VALUEOF("USCurDate"))
ORDER BY "Periods"."Day Date" ASC

This can be a bit confusing if you are not very familiar with SQL but just remember:

When populating a prompt using an SQL statement in OBIEE 12c, take out any number and anything that begins with “s” between the SELECT and first column and anything that begins with “s” after any subsequent columns and make sure the “ORDER BY” clause contains the actual column name of the column you want to order by.

Note: If you do not require any values to be in order, you can omit the “ORDER BY” clause all together.

If I expand Options in the Edit Prompt window, I can add a default selection or a default value that the prompt will start with. I can use the USMoBeginDate here as well so that the prompt always starts with the first date of every month as the start date.

Note: You will notice that under Options in the Edit Prompt window there is a Variable Data Type option with a dropdown selector. This can be used if the data type needs to be specified to something other than the default which is ‘text’ or character type. If you are getting an error when running the report that says “Selected value does not match datatype. Expected [this value] but got [this value]” you need to change the Variable Data Type to the datatype of the column you are prompting on. In this example, we are prompting a date datatype so therefore it needs to be set to date.

If I click OK, I can check the values in the display window by clicking the dropdown for the Start Date prompt I just created.

The blue checkmark indicates the value that is selected which, because the first date of every month was set by using the USMoBeginDate Repository Variable as the default value, defaults to the first date of the current month (October, 1st 2015) in this example.

Now I need to create another Variable Prompt for the End Date. The SQL statement used for Start Date can be reused for the values as we want the exact same values to be available for selection. I am going to specify the presentation variable to be named pvenddt, and the default value to be the USCurDate Repository Variable so that the End Date prompt always defaults to the current date.

Now all that’s left to do is put the prompt and report on the Dashboard. Here is the result.

So that concludes part 2 of Demystification of Variables. Please feel free to ask questions or leave me a comment! In part 3, I am going to talk about using built in front end functions and presentation variables to make reports dynamic for any series of time. Until next time.

Categories: BI & Warehousing

A Performance Analytics Application Case Study: Challenges and Successes

Mon, 2017-03-20 10:00

The Performance Analytics application is a collection of open source technologies that aids users in: identifying performance bottlenecks, identifying causes for slow report execution, discovering areas for performance optimization, and gathering meaningful insights into the health of an OBIEE environment.

This post focuses on lessons learned after a successful Performance Analytics application installation, where within one day of being operational it enabled us to identify and isolate a long-standing memory issue. Here's how.

OBIEE Performance Analytics Dashboards


Rittman Mead recently undertook an engagement with the remit to:

  • Carry out a health check on the current state of an OBIEE platform, architecture, and development process
  • Install the Rittman Mead Performance Analytics application, enabling rapid and accurate diagnostics of OBIEE issues

The client was on OBIEE 11g, having previously upgraded from 10g. OBIEE Production environment was a three-node cluster running the latest version of the 11g release. It served around 150 users daily, of a registered user base of around 1000.

The client had a long-standing issue with memory alerts on the master node of OBIEE cluster, but no evident architectural, hardware capacity, or software configuration issues were found after completing the health check.

Challenges and successes Gather all relevant data

Performance Analytics gathers data from a variety of sources in order to provide a full stack view of the OBIEE environment.

  • Active Session History (ASH) - The Active Session History data is read from the v$ACTIVE_SESSION_HISTORY system database view. Access to this data allows Performance Analytics users to have an understanding of the performance and state of the database at a given point it time as it provides information such as the SQL operation being performed, which application is executing the query, whether the query is actively being performed or is waiting for service, what state of execution the query is in, and many other useful statistics.

  • Operating System Metrics - Unix-based Operating Systems contain several commands used to gather information about the performance or status of the server such as vmstat, iostat, netstat, top and df. Performance Analytics utilizes the output of these commands to display the current status of the OS at a given point in time.

  • Usage Tracking - The Oracle BI Server supports the collection of usage tracking data. When usage tracking is enabled, the Oracle BI Server collects usage tracking data for each query, and it writes statistics to a usage tracking log file or inserts them directly into a database table. Access to this data allows Performance Analytics users to have an understanding of the performance of the BI Server and specific reports in the OBIEE environment at any given point in time.

  • OBIEE metrics - OBIEE has the capability to expose internal performance data through the Dynamic Monitoring Service (DMS). The data exposed contains information such as Connection Pool statistics, JVM statistics, the number of active threads, and much more. Access to this data allows Performance Analytics to record the current status of many of the OBIEE components and intricacies found within the tool.

Performance Analytics was deployed using Docker in a couple of days, requiring absolutely no software installation on any machine other than the monitoring server. All configuration settings are held in one file, and it was sufficient to add connection details of each server to it in order to gather all aforementioned data.

Accurately diagnose issues

By combining operating system metrics (CPU, memory, etc.) with internal OBIEE metrics and those from the database, Performance Analytics gives a "single pane of glass" view on the behaviour of the whole stack. This enables correlations in behaviour to be easily identified, and issues drilled into using the analysis capabilities of the tool.

Within a day of being installed, Performance Analytics enabled the client to accurately diagnose a long-standing issue with memory alerts on OBIEE master node. The cause was traced to the export to Excel of a large dataset by a single user.

Workload Planning

Performance Analytics allows to capture system status and workload over time, so you can see how the system is responding to peak loads in real-time.

With Performance Analytics the client is now able to estimate maximum workload the current architecture can support before starting to see issues and whether it is going to cope with the next years workload.


Performance Analytics just paid for itself.

Performance Analytics collects all relevant data and makes it accessible from one central location, allowing users to investigate performance inquiries promptly and simply. Instead of forcing users to dig through database records or a linux server manually, they can access all of the same data through a set of dashboards engineered to facilitate discovery from the collected data.

If you’d like to find out more about the Performance Analytics service offered by Rittman Mead, please get in touch.

Categories: BI & Warehousing

Property Graph in Oracle 12.2

Fri, 2017-03-10 11:00

The latest release of Oracle (12.2) includes support for Property Graph, previously available only as part of the Big Data Spatial and Graph tool. Unlike the latter, in which data is held in a NoSQL store (Oracle NoSQL, or Apache HBase), it is now possible to use the Oracle Database itself for holding graph definitions and analysing them.

Here we'll see this in action, using the same dataset as I've previously used - the "Panama Papers".

My starting point is the Oracle Developer Day VM, which at under 8GB is a tenth of the size of the beast that is the BigDataLite VM. BDL is great for exploring the vast Big Data ecosystem, both within and external to the Oracle world. However the Developer Day VM serves our needs perfectly here, having been recently updated for the 12.2 release of Oracle. You can also use DB 12.2 in Oracle Cloud, as well as the Docker image.

Prepare Database for Property Graph

The steps below are based on Zhe Wu's blog "Graph Database Says Hello from the Cloud (Part III)", modified slightly for the differing SIDs etc on Developer Day VM.

First, set the Oracle environment by running from a bash prompt

. oraenv

When prompted for SID enter orcl12c:

[oracle@vbgeneric ~]$ . oraenv
ORACLE_SID = [oracle] ? orcl12c  
ORACLE_BASE environment variable is not being set since this  
information is not available for the current user ID oracle.  
You can set ORACLE_BASE manually if it is required.  
Resetting ORACLE_BASE to its previous value or ORACLE_HOME  
The Oracle base has been set to /u01/app/oracle/product/12.2/db_1  
[oracle@vbgeneric ~]$

Now launch SQL*Plus:

sqlplus sys/oracle@localhost:1521/orcl12c as sysdba  

and from the SQL*Plus prompt create a tablespace in which the Property Graph data will be stored:

alter session set container=orcl;

create bigfile tablespace pgts  
datafile '?/dbs/pgts.dat' size 512M reuse autoextend on next 512M maxsize 10G  
segment space management auto;  

Now you need to do a bit of work to update the database to hold larger string sizes, following the following steps.

In SQL*Plus:

ALTER SYSTEM SET max_string_size=extended SCOPE=SPFILE;  
shutdown immediate;  
startup upgrade;  

Then from the bash shell:

cd $ORACLE_HOME/rdbms/admin  
mkdir /u01/utl32k_cdb_pdbs_output  
mkdir /u01/utlrp_cdb_pdbs_output  
$ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl -u SYS -d $ORACLE_HOME/rdbms/admin -l '/u01/utl32k_cdb_pdbs_output' -b utl32k_cdb_pdbs_output utl32k.sql

When prompted, enter SYS password (oracle)

After a short time you should get output:

catcon.pl: completed successfully  

Now back into SQL*Plus:

sqlplus sys/oracle@localhost:1521/orcl12c as sysdba  

and restart the database instances:

shutdown immediate;  

Run a second script from the bash shell:

$ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl -u SYS -d $ORACLE_HOME/rdbms/admin -l '/u01/utlrp_cdb_pdbs_output' -b utlrp_cdb_pdbs_output utlrp.sql

Again, enter SYS password (oracle) when prompted. This step then takes a while (c.15 minutes) to run, so be patient. Eventually it should finish and you'll see:

catcon.pl: completed successfully  

Now to validate that the change has worked. Fire up SQL*Plus:

sqlplus sys/oracle@localhost:1521/orcl12c as sysdba  

And check the value for max_string, which should be EXTENDED:

alter session set container=orcl;  
SQL> show parameters max_string;

NAME                                 TYPE        VALUE  
------------------------------------ ----------- ------------------------------
max_string_size                      string      EXTENDED  
Load Property Graph data from Oracle Flat File format

Now we can get going with our Property Graph. We're going to use Gremlin, a groovy-based interpretter, for interacting with PG. As of Oracle 12.2, it ships with the product itself. Launch it from bash:

cd $ORACLE_HOME/md/property_graph/dal/groovy  
sh gremlin-opg-rdbms.sh

Mar 08, 2017 8:52:22 AM java.util.prefs.FileSystemPreferences$1 run  
INFO: Created user preferences directory.  

First off, let's create the Property Graph object in Oracle itself. Under the covers, this will set up the necessary database objects that will store the data.

cfg = GraphConfigBuilder.\  
opg = OraclePropertyGraph.getInstance(cfg);  

You can also do this with the PL/SQL command exec opg_apis.create_pg('panama', 4, 8, 'PGTS');. Either way, the effect is the same; a set of tables created in the owner's schema:

SQL> select table_name from user_tables;  

Now let's load the data. I'm using the Oracle Flat File format here, having converted it from the original CSV format using R. For more details of why and how, see my article here.

From the Gremlin prompt, run:

// opg.clearRepository();     // start from scratch
opgdl.loadData(opg, vfile, efile, 1, 10000, true, null);  

This will take a few minutes. Once it's completed you'll get null response, but can verify the data has successfully loaded using the opg.Count* functions:

opg-oracledb> opgdl.loadData(opg, vfile, efile, 1, 10000, true, null);  
opg-oracledb> opg.countEdges()  
opg-oracledb> opg.countVertices()  

We can inspect the data in Oracle itself too. Here I'm using SQLcl, which is available by default on the Developer Day VM. Using the ...VT$ table we can query the number of distinct properties the nodes (verticies) in the graph:

SQL> select distinct k from panamaVT$;  
Entity incorporation.date  
Entity company.type  
Entity note  
Officer icij.id  
Entity status  
Source ID  
Country Codes  
Entity struck.off.date  
Entity address  
Entity jurisdiction  
Entity jurisdiction.description  
Entity dorm.date

17 rows selected.  

Inspect the edges:

[oracle@vbgeneric ~]$ sql scott/oracle@localhost:1521/orcl

SQL> select p.* from PANAMAGE$ p where rownum<5;

       EID       SVID       DVID EL               K       T V      VN VT     SL VTS  VTE  FE
---------- ---------- ---------- ---------------- ---- ---- ---- ---- ---- ---- ---- ---- ----
         6          6     205862 officer_of
        11         11     228601 officer_of
        30         36     216748 officer_of
        34         39     216487 officer_of


You can also natively execute some of the Property Graph algorithms from PL/SQL itself. Here is how to run the PageRank algorithm, which can be used to identify the most significant nodes in a graph, assigning them each a score (the "page rank" value):

set serveroutput on  
    wt_pr  varchar2(2000); -- name of the table to hold PR value of the current iteration
    wt_npr varchar2(2000); -- name of the table to hold PR value for the next iteration
    wt3    varchar2(2000);
    wt4    varchar2(2000);
    wt5    varchar2(2000);
    n_vertices number;
    wt_pr := 'panamaPR';
    opg_apis.pr_prep('panamaGE$', wt_pr, wt_npr, wt3, wt4, null);
    dbms_output.put_line('Working table names  ' || wt_pr
       || ', wt_npr ' || wt_npr || ', wt3 ' || wt3 || ', wt4 '|| wt4);
    opg_apis.pr('panamaGE$', 0.85, 10, 0.01, 4, wt_pr, wt_npr, wt3, wt4, 'SYSAUX', null, n_vertices)

When run this creates a new table with the PageRank score for each vertex in the graph, which can then be queried as any other table:

SQL> select * from panamaPR  
  2  order by PR desc
  3* fetch first 5 rows only;
      NODE         PR          C
---------- ---------- ----------
    236724 8851.73652          0
    288469 904.227685          0
    264051 667.422717          0
    285729 562.561604          0
    237076 499.739316          0

On its own, this is not so much use; but joined to the vertices table, we can now find out, within our graph, the top ranked vertices:

SQL> select pr.pr, v.k,v.V from panamaPR pr inner join PANAMAVT$ V on pr.NODE = v.vid where v.K = 'Name' order by PR desc fetch first 5 rows only;  
        PR K          V
---------- ---------- ---------------
8851.73652 Name       Portcullis TrustNet Chambers P.O. Box 3444 Road Town- Tortola British Virgin Isl  
904.227685 Name       Unitrust Corporate Services Ltd. John Humphries House- Room 304 4-10 Stockwell Stre  
667.422717 Name       Company Kit Limited Unit A- 6/F Shun On Comm Bldg. 112-114 Des Voeux Road C.- Hong  
562.561604 Name       Sealight Incorporations Limited Room 1201- Connaught Commercial Building 185 Wanc  
499.739316 Name       David Chong & Co. Office B1- 7/F. Loyong Court 212-220 Lockhart Road Wanchai Hong K


Since our vertices in this graph have properties, including "Type", we can also analyse it by that - the following shows the top ranked vertices that are Officers:

SQL> select V.vid, pr.pr from panamaPR pr inner join PANAMAVT$ V on pr.NODE = v.vid where v.K = 'Type' and v.V = 'Officer' order by PR desc fetch first 5 rows only;  
       VID         PR
---------- ----------
  12171184 1.99938104
  12030645 1.56722346
  12169701 1.55754873
  12143648 1.46977361
  12220783 1.39846834

which we can then put in a subquery to show the details for these nodes:

with OfficerPR as  
        (select V.vid, pr.pr
          from panamaPR pr
               inner join PANAMAVT$ V
               on pr.NODE = v.vid
         where v.K = 'Type' and v.V = 'Officer'
      order by PR desc
      fetch first 5 rows only)
select pr2.pr,v2.k,v2.v  
from OfficerPR pr2  
     inner join panamaVT$ v2
     on pr2.vid = v2.vid
where v2.k in ('Name','Countries');  
        PR K          V
---------- ---------- -----------------------
1.99938104 Countries  Guernsey  
1.99938104 Name       Cannon Asset Management Limited re G006  
1.56722346 Countries  Gibraltar  
1.55754873 Countries  Guernsey  
1.55754873 Name       Cannon Asset Management Limited re J006  
1.46977361 Countries  Portugal  
1.39846834 Countries  Cyprus  

10 rows selected.  

But here we get into the limitations of SQL - already this is starting to look like a bit of a complex query to maintain. This is where PGQL comes in, as it enables to express the above request much more eloquently. The key thing with PGQL is that it understands the concept of a 'node', which removes the need for the convoluted sub-select that I had to do above to first identify the top-ranked nodes that had a given property (Type = Officer), and then for those identified nodes show information about them (Name and Countries). The above SQL could be expressed in PGQL simply as:

SELECT n.pr, n.name, n.countries  
WHERE (n WITH Type =~ 'Officer')  
ORDER BY n.pr limit 5  

At the moment Property Graph in the Oracle DB doesn't support PGQL - but I'd expect to see it in the future.

Jupyter Notebooks

As well as working with the Property Graph in SQL and Gremlin, we can use the Python API. This is shipped with Oracle 12.2. I'd strongly recommend using it through a Notebook, and this provides an excellent environment in which to prototype code and explore the results. Here I'll use Jupyter, but Apache Zeppelin is also very good.

First let's install Anaconda Python, which includes Jupyter Notebooks:

wget https://repo.continuum.io/archive/Anaconda2-4.3.0-Linux-x86_64.sh  
bash Anaconda2-4.3.0-Linux-x86_64.sh  

In the install options I use the default path (/home/oracle) as the location, and keep the default (no)

Launch Jupyter, telling it to listen on any NIC (not just localhost). If you installed anaconda in a different path from the default you'll need to amend the /home/oracle/ bit of the path.

/home/oracle/anaconda2/bin/jupyter notebook --ip

If you ran the above command from the terminal window within the VM, you'll get Firefox pop up with the following:

If you're using the VM headless you'll now want to fire up your own web browser and go to http://<ip>:8888 use the token given in the startup log of Jupyter to login.

Either way, you should now have a functioning Jupyter notebook environment.

Now let's install the Property Graph support into the Python & Jupyter environment. First, make sure you've got the right Python set, by confirming with which it's the anaconda version you installed, and when you run python you see Anaconda in the version details:

[oracle@vbgeneric ~]$ export PATH=/home/oracle/anaconda2/bin:$PATH
[oracle@vbgeneric ~]$ which python
[oracle@vbgeneric ~]$ python -V
Python 2.7.13 :: Anaconda 4.3.0 (64-bit)  
[oracle@vbgeneric ~]$

Then run the following

cd $ORACLE_HOME/md/property_graph/pyopg  
touch README  
python ./setup.py install  

without the README being created, the install fails with IOError: [Errno 2] No such file or directory: './README'

You need to be connected to the internet for this as it downloads dependencies as needed. After a few screenfuls of warnings that appear OK to ignore, the installation should be succesful:

creating /u01/userhome/oracle/anaconda2/lib/python2.7/site-packages/JPype1-0.6.2-py2.7-linux-x86_64.egg  
Extracting JPype1-0.6.2-py2.7-linux-x86_64.egg to /u01/userhome/oracle/anaconda2/lib/python2.7/site-packages  
Adding JPype1 0.6.2 to easy-install.pth file

Installed /u01/userhome/oracle/anaconda2/lib/python2.7/site-packages/JPype1-0.6.2-py2.7-linux-x86_64.egg  
Finished processing dependencies for pyopg==1.0  

Now you can use the Python interface to property graph (pyopg) from within Jupyter, as seen below. I've put the notebook on gist.github.com meaning that you can download it from there and run it yourself in Jupyter.

Categories: BI & Warehousing

Accelerating Your ODI Implementation, Rittman Mead Style

Thu, 2017-03-09 10:01

Over the years, at Rittman Mead, we've built up quite a collection of tooling for ODI. We have accelerators, scripts, reports, templates and even entire frameworks at our disposal for when the right use case arises. Many of these tools exploit the power of the excellent ODI SDK to automate tasks that would otherwise be a chore to perform manually. Tasks like, topology creation, model automation, code migration and variable creation.

In this blog post, I'm going to give you a demo of our latest addition, a tool that allows you to programmatically create ODI mappings. ( and a few other tricks )

So you may be thinking isn't that already possible using the ODI SDK ? and you'd be right, it most definitely is. There are many examples out there that show you how it's done, but they all have one thing in common, they create a fairly simple mapping, with, relatively speaking, quite a lot of code and are only useful for creating the said mapping.

And herein lies the obvious question, Why would you create a mapping using the ODI SDK, when it's quicker to use ODI Studio ?

And the obvious answer is...you wouldn't, unless, you were trying to automate the creation of multiple mappings using metadata.

This is a reasonable approach using the raw ODI SDK, the typical use case being the automation of your source to stage mappings. These mappings tend to be simple 1 to 1 mappings, the low hanging fruit of automation if you like. The problem arises though, when you want to automate the creation of a variety of more complex mappings, you run the risk of spending more time writing the automation code, than you would actually save due to the automation itself. The point of diminishing return can creep up pretty quickly.

The principle, however, is sound. Automate as much as possible by leveraging metadata and free up your ODI Developers to tackle the more complex stuff.

All Aboard the Rittman Mead Metadata Train !

What would be really nice is something more succinct, more elegant, something that allows us to create any mapping, with minimal code and fuss.

Something that will allow us to further accelerate...

  • Migrating to ODI from other ETL products
  • Greenfield ODI Projects
  • Day to Day ODI Development work

..all powered by juicy metadata.

These were the design goals for our latest tool. To meet these goals, we created a mini-mapping-language on top of the ODI SDK. This mini-mapping-language abstracts away the SDK's complexities, while, at the same time, retaining its inherent power. We call this mini mapping language OdiDsl ( Oracle Data Integrator Domain Specific Language ) catchy heh?!


OdiDsl is written in Groovy and looks something like this...

 * OdiDsl to create a SCD2 dimension load mapping.


                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ])
            .join('DEPT_JOBS', ['JOBS'], [join_condition: "EMPLOYEES.JOB_ID = JOBS.JOB_ID" ])
            .connect("D_EMPLOYEE", [
                                        [ attr: "employee_id", key_indicator: true ],
                                        [ attr: "eff_from_date", expression: "sysdate", execute_on_hint: "TARGET"],
                                        [ attr: "eff_to_date", expression: "sysdate", execute_on_hint: "TARGET"],
                                        [ attr: "current_flag", expression: 1, execute_on_hint: "TARGET"],
                                        [ attr: "surr_key", expression: ":RM_PROJECT.D_EMPLOYEE_SEQ_NEXTVAL", execute_on_hint: "TARGET"],

The above code will create the following, fully functional, mapping in ODI 12c (sorry 11g).

It should be fairly easy to eyeball the code and reconcile it with the above mapping image. We can see that we are specifying our datastores, selecting the EMPLOYEES datastore, adding a filter, a couple of joins and then connecting to our target. OdiDsl has been designed in such a way that it mimics the flow based style of ODI 12c's mappings by chaining components onto one another.

Creating a Mapping Using OdiDsl

Let's walk through the above code, starting with just the datastores, adding the rest as we go along...


We start by creating the mapping with mapping.create( <project>, <folder>, <mapping name>). We then chain the .datastores(), .commit() and .validate() methods onto it using the "dot" notation. The .datastores() method is the only method you can chain directly onto mapping.create() as it's a requirement to add some datastores before you start building up the mapping. The .commit() method persists the mapping in the repository and the .validate() method calls ODI's validation routine on the mapping to check if all is ok.

 * OdiDsl to create a mapping with 4 datastores.


                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],

When we execute this code it returns the following to the console. You can see that the mapping has been dropped/created and that ODI has some validation warnings for us.

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created

  Validation Results
  WARNING: Mapping component EMPLOYEES has no input or output connections.
  WARNING: Mapping component DEPARTMENTS has no input or output connections.
  WARNING: Mapping component JOBS has no input or output connections.
  WARNING: Mapping component D_EMPLOYEE has no input or output connections.

And here is the mapping in ODI - well, it's a start at least...

Starting the Flow with a Filter

Before we can start building up the rest of the mapping we need to select a starting datastore to chain off, you've got to start somewhere right? For that, we call .select("EMPLOYEES"), which is a bit like clicking and selecting the component in ODI Studio. The .filter() method is then chained onto it, passing in the filter name and some configuration, in this case, the actual filter condition.

 * OdiDsl to create a mapping with 4 datastores and a filter.


                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ]) 

We now have an error in the validation results. This is expected as our filter doesn't connect to anything downstream yet.

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created

  Validation Results
  WARNING: Mapping component DEPARTMENTS has no input or output connections.
  WARNING: Mapping component JOBS has no input or output connections.
  WARNING: Mapping component D_EMPLOYEE has no input or output connections.
  ERROR: Mapping component NAME_FILTER must have a connection for output connector point OUTPUT1.

And here's the mapping, as you can see the filter is connected to the EMPLOYEES datastore output connector.

Adding a Join

Next we'll create the join between the filter and the DEPARTMENTS table. To do this we can just chain a .join() onto the .filter() method and pass in some arguments to specify the join name, what it joins to and the join condition itself.

 * OdiDsl to create a mapping with 4 datastores a filter and a join.


                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ])

Only 2 validation warnings and no errors this time...

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created

  Validation Results
  WARNING: Mapping component JOBS has no input or output connections.
  WARNING: Mapping component D_EMPLOYEE has no input or output connections.

We now have a join named EMP_DEPT joining DEPARTMENTS and the filter, NAME_FILTER, together.

Adding another Join

We'll now do the same for the final join.

 * OdiDsl to create a mapping with 4 datastores, a filter and 2 joins.


                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ])
            .join('DEPT_JOBS', ['JOBS'], [join_condition: "EMPLOYEES.JOB_ID = JOBS.JOB_ID" ])      

looking better all the time...

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created

  Validation Results
  WARNING: Mapping component D_EMPLOYEE has no input or output connections.

And we now have a join named DEPT_JOBS joining JOBS and the join, EMP_DEPT, to each other.

Connecting to the target

The final step is to connect the DEPT_JOBS join to our target datastore, D_EMPLOYEE. For this we can use the .connect() method. This method is used to map upstream attributes to a datastore. When you perform this action in ODI Studio, you'll be prompted with the attribute matching dialog, with options to auto-map the attributes.

OdiDsl will, by default, auto-map all attributes that are not explicitly mapped in the .connect() method. In our completed code example below we are explicitly mapping several attributes to support SCD2 functionality, auto-map takes care of the rest.

 * OdiDsl to create a SCD2 dimension load mapping


                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ])
            .join('DEPT_JOBS', ['JOBS'], [join_condition: "EMPLOYEES.JOB_ID = JOBS.JOB_ID" ])
            .connect("D_EMPLOYEE", [
                                        [ attr: "employee_id", key_indicator: true ],
                                        [ attr: "eff_from_date", expression: "sysdate", execute_on_hint: "TARGET"],
                                        [ attr: "eff_to_date", expression: "sysdate", execute_on_hint: "TARGET"],
                                        [ attr: "current_flag", expression: 1, execute_on_hint: "TARGET"],
                                        [ attr: "surr_key", expression: ":RM_PROJECT.D_EMPLOYEE_SEQ_NEXTVAL", execute_on_hint: "TARGET"],

Nice, all validated this time.

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created  
Validation Successful  

What about Updates ?

Yes. We can also update an existing mapping using mapping.update(<project>, <folder>, <mapping name>). This is useful when you need to make changes to multiple mappings or when you can't drop and recreate a mapping due to a dependency. The approach is the same, we start by selecting a component with .select() and then chain a method onto it, in this case, .config().

mapping.update('MYPROJECT', 'DEMO', "EMPLOYEE_DIM_LOAD")  
            .config([join_type: "LEFT_OUTER"])

Which Properties Can I Change for each Component ?

Probably more than you'll ever need to, OdiDsl mirrors the properties that are available in ODI Studio via the SDK.

Can We Generate OdiDsl Code From an Existing Mapping ?

Yes, we can do that too, with .reverse(). This will allow you to mirror the process.

Let's take this, hand built, fictional and completely CRAZY_MAPPING as an example. (fictional and crazy in the sense that it does nothing useful, however, the flow and configuration are completely valid).

If we execute .reverse() on this mapping by calling...

mapping.reverse('MY_PROJECT', 'DEMO_FOLDER', 'CRAZY_MAPPING')  

...OdiDsl will return the following output to the console. What you are seeing here is the OdiDsl required to recreate the crazy mapping above.

Connecting to the repository...

mapping.create('MY_PROJECT', 'DEMO_FOLDER', 'CRAZY_MAPPING')  
         ['name':'STAGING.TABLE1', 'alias':'TABLE1'],
         ['name':'STAGING.TABLE9', 'alias':'TABLE9'],
         ['name':'STAGING.TABLE3', 'alias':'TABLE3'],
         ['name':'STAGING.TABLE4', 'alias':'TABLE4'],
         ['name':'STAGING.TABLE6', 'alias':'TABLE6'],
         ['name':'STAGING.TABLE5', 'alias':'TABLE5'],
         ['name':'STAGING.TABLE7', 'alias':'TABLE7'],
         ['name':'STAGING.TABLE2', 'alias':'TABLE2'],
         ['name':'STAGING.TABLE8', 'alias':'TABLE8'],
         ['name':'STAGING.TABLE11', 'alias':'TABLE11'],
         ['name':'STAGING.TABLE12', 'alias':'TABLE12'],
         ['name':'STAGING.TABLE13', 'alias':'TABLE13'],
         ['name':'STAGING.TABLE15', 'alias':'TABLE15'],
         ['name':'STAGING.TABLE14', 'alias':'TABLE14'],
         ['name':'STAGING.TABLE16', 'alias':'TABLE16'],
         ['name':'STAGING.TABLE17', 'alias':'TABLE17'],
         ['name':'STAGING.TABLE42', 'alias':'TABLE42'],
        .join('JOIN2', ['TABLE7'], [join_condition: "TABLE5.ID = TABLE7.ID" ])
        .join('JOIN3', ['TABLE6'], [join_condition: "TABLE6.ID = TABLE7.ID" ])
        .connect('TABLE14', [
                [ attr: "ID", expression: "TABLE5.ID" ],
                [ attr: "COL1", expression: "TABLE7.COL1" ],
                [ attr: "COL2", expression: "TABLE6.COL2" ],
                [ attr: "COL3", expression: "TABLE7.COL3" ],
                [ attr: "COL4", expression: "TABLE7.COL4" ],
        .expr('EXPRESSION1', [attrs: [
                [ attr: "ID", expression: "TABLE6.ID * 42", datatype: "NUMERIC", size: "38", scale: "0"]]])
        .connect('TABLE15', [
                [ attr: "ID", expression: "EXPRESSION1.ID" ],
                [ attr: "COL1", expression: "", active_indicator: false ],
                [ attr: "COL2", expression: "TABLE6.COL2" ],
                [ attr: "COL3", expression: "TABLE7.COL3" ],
                [ attr: "COL4", expression: "", active_indicator: false ],
        .join('JOIN', ['TABLE14'], [join_condition: "TABLE14.ID = TABLE15.ID" ])
        .filter('FILTER2', [filter_condition: "TABLE15.COL3 != 'FOOBAR'" ])
        .connect('TABLE16', [
                [ attr: "ID", expression: "TABLE15.ID" ],
                [ attr: "COL1", expression: "TABLE15.COL1" ],
                [ attr: "COL2", expression: "TABLE14.COL2" ],
                [ attr: "COL3", expression: "TABLE14.COL3" ],
                [ attr: "COL4", expression: "TABLE14.COL4" ],
        .connect('TABLE17', [
                [ attr: "ID", expression: "TABLE15.ID" ],
                [ attr: "COL1", expression: "TABLE15.COL1" ],
                [ attr: "COL2", expression: "TABLE14.COL2" ],
                [ attr: "COL3", expression: "TABLE14.COL3" ],
                [ attr: "COL4", expression: "TABLE14.COL4" ],
        .sort('SORT1', [sorter_condition: "TABLE5.ID, TABLE5.COL2, TABLE5.COL4" ])
        .connect('TABLE13', [
                [ attr: "ID", expression: "TABLE5.ID" ],
                [ attr: "COL1", expression: "TABLE5.COL1" ],
                [ attr: "COL2", expression: "TABLE5.COL2" ],
                [ attr: "COL3", expression: "TABLE5.COL3" ],
                [ attr: "COL4", expression: "TABLE5.COL4" ],
        .filter('FILTER1', [filter_condition: "TABLE3.ID != 42" ])
        .filter('FILTER', [filter_condition: "TABLE4.COL1 = 42" ])
        .lookup('LOOKUP1', 'FILTER1', [join_condition: "TABLE4.ID = TABLE3.ID AND TABLE3.COL1 = TABLE4.COL1"])
        .join('JOIN5', ['TABLE13'], [join_condition: "TABLE13.ID = TABLE3.ID" ])
        .distinct('DISTINCT_', [attrs: [
                [ attr: "COL3_1", expression: "TABLE4.COL3", datatype: "VARCHAR", size: "30"],
                [ attr: "COL4_1", expression: "TABLE4.COL4", datatype: "VARCHAR", size: "30"]]])
        .join('JOIN4', ['EXPRESSION1'], [join_condition: "TABLE5.ID = TABLE6.COL1" ])
        .sort('SORT', [sorter_condition: "EXPRESSION1.ID" ])
        .connect('TABLE8', [
                [ attr: "ID", expression: "EXPRESSION1.ID" ],
                [ attr: "COL1", expression: "", active_indicator: false ],
                [ attr: "COL2", expression: "", active_indicator: false ],
                [ attr: "COL3", expression: "TABLE7.COL3" ],
                [ attr: "COL4", expression: "", active_indicator: false ],
        .connect('TABLE12', [
                [ attr: "ID", expression: "TABLE8.ID" ],
                [ attr: "COL1", expression: "TABLE8.COL1" ],
                [ attr: "COL2", expression: "TABLE8.COL2" ],
                [ attr: "COL3", expression: "TABLE8.COL3" ],
                [ attr: "COL4", expression: "TABLE8.COL4" ],
        .expr('EXPRESSION', [attrs: [
                [ attr: "ID", expression: "TABLE9.ID *42", datatype: "NUMERIC", size: "38", scale: "0"],
                [ attr: "COL4", expression: "TABLE9.COL4 || 'FOOBAR'", datatype: "VARCHAR", size: "30"]]])
        .connect('TABLE1', [
                [ attr: "ID", expression: "EXPRESSION.ID" ],
                [ attr: "COL1", expression: "", active_indicator: false ],
                [ attr: "COL2", expression: "", active_indicator: false ],
                [ attr: "COL3", expression: "", active_indicator: false ],
                [ attr: "COL4", expression: "TABLE9.COL4" ],
        .join('JOIN1', ['TABLE2'], [join_condition: "TABLE1.ID = TABLE2.ID" ])
        .aggregate('AGGREGATE', [attrs: [
                [ attr: "ID", expression: "TABLE1.ID", datatype: "NUMERIC", size: "38", scale: "0", group_by: "YES"],
                [ attr: "COL4_1", expression: "MAX(TABLE2.COL4)", datatype: "VARCHAR", size: "30", group_by: "AUTO"]]])
        .lookup('LOOKUP', 'DISTINCT_', [join_condition: "AGGREGATE.ID = DISTINCT_.COL3_1"])
        .aggregate('AGGREGATE1', [attrs: [
                [ attr: "ID", expression: "AGGREGATE.ID", datatype: "NUMERIC", size: "38", scale: "0", group_by: "YES"],
                [ attr: "COL4_1_1", expression: "SUM(AGGREGATE.COL4_1)", datatype: "VARCHAR", size: "30", group_by: "AUTO"]]])
        .filter('FILTER3', [filter_condition: "AGGREGATE1.COL4_1_1 > 42" ])
        .connect('TABLE42', [
                [ attr: "ID", expression: "AGGREGATE1.ID" ],
        .join('JOIN6', ['TABLE8'], [join_condition: "AGGREGATE1.ID = TABLE8.ID" ])
        .connect('TABLE11', [
                [ attr: "ID", expression: "TABLE8.ID" ],
                [ attr: "COL1", expression: "" ],
                [ attr: "COL2", expression: "" ],
                [ attr: "COL3", expression: "TABLE8.COL3" ],
                [ attr: "COL4", expression: "TABLE8.COL4" ],

When we execute this OdiDsl code we get, you guessed it, exactly the same crazy mapping with the flow and component properties all intact.

Being able to flip between ODI studio and OdiDsl has some really nice benefits for those who like the workflow. You can start prototyping a mapping in ODI Studio, convert it to code, hack around for a bit and then reflect it all back into ODI. It's also very useful for generating a "code template" from an existing mapping. The generated code template can be modified to accept variables instead of hard coded properties, all you need then is some metadata.

Did Somebody Say Metadata ?

Metadata is the key to bulk automation. You can find metadata in all kinds of places. If you are migrating to ODI from another product then there will be a whole mass of metadata living in your current product's repository or via some kind of export routine which typically produces XML files. If you are starting a fresh ODI implementation, then there will be metadata living in your source and target systems, in data dictionaries, in excel sheets, in mapping specifications documents, all over the place really. This is the kind of metadata that can be used to feed OdiDsl.

A Quick Example of One possible Approach to Using OdiDsl With Metadata

First we build a mapping in Odi Studio, this will act as our template mapping.

We then generate the equivalent OdiDsl code using mapping.reverse('MY_PROJECT', 'DEMO_FOLDER', 'FEED_ME_METADATA'). Which gives us this code.

         ['name':'STAGING.TABLE1', 'alias':'LOOKUP_TABLE'],
         ['name':'STAGING.TABLE2', 'alias':'TABLE2'],
         ['name':'STAGING.TABLE3', 'alias':'TABLE3'],
         ['name':'STAGING.TABLE4', 'alias':'TARGET_TABLE'],
        .lookup('LOOKUP', 'LOOKUP_TABLE', [join_condition: "TABLE2.ID = LOOKUP_TABLE.ID"])
        .join('JOIN', ['TABLE3'], [join_condition: "TABLE2.ID = TABLE3.ID" ])
        .filter('FILTER', [filter_condition: "TABLE3.COL1 = 'FILTER'" ])
        .expr('EXPRESSION', [attrs: [
                [ attr: "CONSTANT", expression: "42", datatype: "VARCHAR", size: "30"]]])
        .connect('TARGET_TABLE', [
                [ attr: "ID", expression: "LOOKUP_TABLE.ID" ],
                [ attr: "COL1", expression: "LOOKUP_TABLE.COL1 || EXPRESSION.CONSTANT" ],
                [ attr: "COL2", expression: "TABLE2.COL2" ],
                [ attr: "COL3", expression: "TABLE3.COL3" ],
                [ attr: "COL4", expression: "LOOKUP_TABLE.COL4" ],

We now need to decorate this code with some variables, these variables will act as place holders for our metadata. The metadata we are going to use is from a database table, I'm keeping it simple for the purpose of this demonstration but the approach is the same. Our metadata table has 10 rows and from these 10 rows we are going to create 10 mappings, replacing certain properties with the values from the columns.

Remember that OdiDsl is expressed in Groovy. That means, as well as OdiDsl code, we also have access to the entire Groovy language. In the following code we are using a mixture of Groovy and OdiDsl. We are connecting to a database, grabbing our metadata and then looping over mapping.create(), once for each row in our metadata table. The columns in the metadata table are represented as the variables row.datastore, row.constant_expr and row.filter_cond. The code comments indicate where we are substituting these variables in place of our previously hard coded property values.

import groovy.sql.Sql

// Connect to the database and retrieve rows from the METADATA table.
def sqlConn = Sql.newInstance("jdbc:oracle:thin:@hostname:1521/pdborcl", "username", "password", "oracle.jdbc.pool.OracleDataSource")  
def rows = sqlConn.rows("SELECT * FROM METADATA")  

// For each row in our METADATA table
rows.eachWithIndex() {  row, index ->

    mapping.create('MY_PROJECT', 'DEMO_FOLDER', "FEED_ME_METADATA_${index+1}") // Interpolate row number to make the mapping name unique
                    ['name': 'STAGING.TABLE1', 'alias': 'LOOKUP_TABLE'],
                    ['name': "STAGING.${row.datastore}" ], // substitute in a different datastore
                    ['name': 'STAGING.TABLE3', 'alias': 'TABLE3'],
                    ['name': 'STAGING.TABLE4', 'alias': 'TARGET_TABLE'],
                .lookup('LOOKUP', 'LOOKUP_TABLE', [join_condition: "${row.datastore}.ID = LOOKUP_TABLE.ID"]) // substitute in a different datastore
                .join('JOIN', ['TABLE3'], [join_condition: "${row.datastore}.ID = TABLE3.ID"]) // substitute in a different datastore
                .filter('FILTER', [filter_condition: "TABLE3.COL1 = '${row.filter_cond}'"]) // substitute in a different filter condition
                .expr('EXPRESSION', [attrs: [
                    [attr: "CONSTANT", expression: row.constant_expr, datatype: "VARCHAR", size: "30"]]]) // substitute in a different constant for the expression
                .connect('TARGET_TABLE', [
                    [attr: "ID", expression: "LOOKUP_TABLE.ID"],
                    [attr: "COL1", expression: "LOOKUP_TABLE.COL1 || EXPRESSION.CONSTANT"],
                    [attr: "COL2", expression: "${row.datastore}.COL2"], // substitute in a different datastore
                    [attr: "COL3", expression: "TABLE3.COL3"],
                    [attr: "COL4", expression: "LOOKUP_TABLE.COL4"],


Here is the output...

Connecting to the repository...

mapping FEED_ME_METADATA_1 created  
Validation Successful  
mapping FEED_ME_METADATA_2 created  
Validation Successful  
mapping FEED_ME_METADATA_3 created  
Validation Successful  
mapping FEED_ME_METADATA_4 created  
Validation Successful  
mapping FEED_ME_METADATA_5 created  
Validation Successful  
mapping FEED_ME_METADATA_6 created  
Validation Successful  
mapping FEED_ME_METADATA_7 created  
Validation Successful  
mapping FEED_ME_METADATA_8 created  
Validation Successful  
mapping FEED_ME_METADATA_9 created  
Validation Successful  
mapping FEED_ME_METADATA_10 created  
Validation Successful  

And here are our 10 mappings, each with it's own configuration.

If we take a look at the FEED_ME_METADATA_5 mapping, we can see the metadata has been reflected into the mapping.

And that's about it. We've basically just built a mini accelerator using OdiDsl and we hardly had to write any code. The OdiDsl code was generated for us using .reverse(). All we really had to code, was the connection to the database, a loop and bit of variable substitution!


With the Rittman Mead ODI Tool kit, accelerating your ODI implementation has never be easier. If you are thinking about migrating to ODI from another product or embarking on a new ODI Project, Rittman Mead can help. For more information please get in touch.

Categories: BI & Warehousing

Create Your Own DVD Plugin in 22 minutes

Wed, 2017-03-08 06:45

Oracle DVD played well for the task of elections data analysis. But I faced some limitations and had to curtail my use of one of my favourite charts - Scatter chart. While DVD’s implementation of it looks good, has smooth animations and really useful, it is not suitable for visualisations of huge datasets. Election dataset has about 100K election precinct data rows and if I try to plot all of them at one chart I will get something close to this. It doesn't really important what is the data, just compare how charts look like.

Picture with Turnout%, United Russia% by PEC

This picture can give us about zero insights. Actually, I'm not even sure if shows my 100K points. What I want to get is something like this:
plugin with Turnout%, United Russia% by PEC (source: https://habrahabr.ru/post/313372/)

What can I see at this chart and can't see at the vanilla one? First of all, it gives a much better understanding of points distribution other the plot. I can see a relatively dense core around coordinates [30;40] and then not so dense tail growing by both dimensions and again a small and very dense core in the top right corner.
Secondly, it not only shows dense and sparse areas but shows that there are exist a few points which have an unexplainably high concentration. Around [64;62], for example.

Luckily DVD allows me (and anyone else too) create custom plugins. That's what I'm going to show here. The plugin I'm going to do is something I'd call Minimum Viable Product. It's very, very simple and needs a lot of work to make it good and really useful.

The code I'm going to show is the simplest I can invent. It doesn't handle errors, exceptions and so on. It should be more modular. It possibly could use some optimisation. But that was done for a purpose. And the purpose is called 'simplicity'. If you want a more comprehensive sample, there is a link to the Oracle's guide in the Summary. I want to show you that writing a DVD plugin is not a rocket science. It takes less than half an hour to build your own DVD plugin.

And the clock starts now!

[0:00-5:40] Setup Environment and DVD SDK Setup

Download and install DVD. I presume you already have it, because from my experience people who doesn't have it very rarely feel a need for its plugins. Hence I didn't include download and installation times into 22 minutes.

Create a folder to store your plugins and some Gradle-related configs and scripts. It may be any directory you like, just select your favourite place.

Next step is to define some environment variables.

  • The first one is DVDDESKTOP_SDK_HOME and it should point to thу DVD folder. In most of the cases, it is C:\Program files\Oracle Data Visualisation Desktop.
  • The second variable is PLUGIN_DEV_DIR it points to the folder you created 40 seconds ago. These two variables seem not to be absolutely necessary, but they make your life a bit easier.
  • And one more change to the environment before going further. Add %DVDESKTOP_SDK_HOME%\tools\bin to the PATH variable (do you feel how your life became easier with DVDDESKTOP_SDK_HOME already defined so you can start using it right now?).

And the last step is to initialise our development environment. Go to the plugins directory and execute the initialization command. Open cmd window and execute:


This will not only initialise everything your need but create a sample plugin to explore and research.

Let's take a look at what we got. In the folder we have created before (%PLUGIN_DEV_DIR%) this command generated the following structure.

a screenshot of %PLUGIN_DEV_DIR%

All names here are more than self-explanatory. You won't need to do anything with .gradle and gradle, and src stores all the code we will develop. Please notice that src already has sample plugin sampleviz. You may use it as a referrence for your own work.

Create First Plugin

Every plugin consists of a few parts. We don't need to create them from scratch every time. Oracle gives us a simple way of creating a new plugin. If you didn't close cmd window from previous steps execute the following:

bicreateplugin viz -id com.rittmanmead.demo -subType dataviz  

Obviously, you should use your own company name for the id. The last word (demo) plus word Plugin is what user will see as a visualisation name. Obviously we can change it but not today.

This will create a fully operational plugin. Where can we find it? You shouldn't be too surprised if I say "In src folder".

a screenshot of %PLUGIN_DEV_DIR%

And we may test it right now. Yes, its functionality is very limited, but you didn't expect it to do something useful, right?

Start SDK

New we should start Oracle DVD in SDK mode. The main difference from the normal mode is that DVD will start in the default browser. That's not a typo. DVD will start as a usual site in a browser. It's a good idea to select as the default a browser with good developer console and tools. I prefer Chrome for this task but that's not the only choice. Choose the browser you'd normally use for site development. When you finish with setting up of the default browser do:

.\gradlew run

If you don't have an ultrafast computer I'd recommend you to make some tea at this point. Because the very first start will take a long time. Actually, it will consume most of the first stage time.

[5:40-6:15] Test the Plugin

We didn't start changing the first plugin yet, so I expect everything to be OK at this point. But anyway it's a good idea to test everything. I think we can afford 30-40 seconds for this.

And again I assume you know how to use DVD and don't need a step-by-step guide on creating a simple visualisation. Just create a new project, select any data source, go to Visualisations, find your plugin and add it. You may even add some data to the plugin and it will show how many rows your data has. I think that's a good place to start from.

a picture of plugin

[6:15-22:00] Add Functions Change parameters

If you didn't skip testing phase and played with this new toy for a while you may have noticed two things we need to change right now. First, for a chosen type of visualisation, I need that my plugin can accept two measures. You possibly noticed that I was unable to add Number of cancelled ballots as a second measure. By default a plugin accepts not more than one (zero or one). Luckily it can be changed very easily.

We can find all parameter about measures, rows, columns etc inside of extensions folder. In this case it is %PLUGIN_DEV_DIR%\customviz\com-rittmanemad-demo\extensions. Here we can find two folders. The first one is oracle.bi.tech.pluginVisualizationDataHandlerand it has only one file com.rittmanemad.demo.visualizationDataHandler.JSON. This file allows us to define which types of input data our plugin has (measures, columns, rows, colour etc.), what are their types (measures/dimensions), which is the default one and so on.

a picture of json and plugin side-by side Here we should change maxCount to 2. This will allow us to add more than one measure to our visualisation.

The second thing I want to change is the number of data points. Elections dataset has data about 100K precinct commissions. And DVD's default is 10K. We can change this value in the second configuration file from extensions folder. Its name is oracle.bi.tech.Visualization. There is again only one JSON file com.rittmanmead.demo.json. We need to change rows to 200000.

Why 200000 if I need only 100000 points? Well, it's about how data will be passed to our code. Every measure is a separate row, so I need 100K points with 2 measures each and that gives me 200000 rows. It looks like right now that's the only way to have more than one measure in a plugin (at least it's my best knowledge for DVD

a picture of json and plugin side-by side

Note. I could change here some things like plugin name, input data types shown to user and so on but my aim is simplicity.

Now we should restart DVD SDK in order to use new parameters.

two measures and ~200K rows

Write code

For my custom scatter chart I'm going to use d3js JavaScript library. It will allow me to concentrate more on logic and less on HTML manipulation. To add it to my plugin I should add a couple of strings in the beginning of my code.


                 messages) {

After (I added two lines starting with d3):

                 messages) {

That's all. Now I can use d3js magic. And that's cool.

OK, it's time to make our plugin do something more useful than a simple counting of rows. All plugin code is in demo.js file and the main procedure is render. It is called every time DVD needs to update the visualisation. The initial version of this function is really small. Without comments there are only four rows. It retrieves data, counts rows, then finds a container to write and writes a message.

Demo.prototype.render = function(oTransientRenderingContext) {  
      // Note: all events will be received after initialize and start complete.  We may get other events
      // such as 'resize' before the render, i.e. this might not be the first event.

      // Retrieve the data object for this visualization
  var oDataLayout = oTransientRenderingContext.get(dataviz.DataContextProperty.DATA_LAYOUT);

      // Determine the number of records available for rendering on ROW
      // Because we specified that Category should be placed on ROW in the data model handler,
      // this returns the number of rows for the data in Category.
  var nRows = oDataLayout.getEdgeExtent(datamodelshapes.Physical.ROW);

      // Retrieve the root container for our visualization.  This is provided by the framework.  It may not be deleted
      // but may be used to render.
  var elContainer = this.getContainerElem();
  $(elContainer).html(messages.TEXT_MESSAGE.format("Demo Plugin", "" + nRows));

First of all, I need to know the actual size of the plotting area. I simply added that after var elContainer = this.getContainerElem();:

  var elContainer = this.getContainerElem();

  //container dimensions
  var nWidth = $(elContainer).width();
  var nHeight = $(elContainer).height();

The template plugin has code for pulling the data into the script (oDataLayout variable). But for my scatter plot, I need to put this data into an array and find the minimum and the maximum value for both arrays. This part of the code is my least favourite one. It looks like currently we can't make tho separate measures in a custom viz (or at least I can't find the solution), therefore instead of two nice separate measures, I have them as different rows. Like:

......... PEC #1Turnout,%34,8% PEC #1United Russia,%44,3% PEC #2Turnout,%62,1% PEC #2United Russia,%54,2% PEC #3Turnout,%25,9% PEC #3United Russia,%33,2% .........

I really hope that any solution for this will be found. So far I just put even and odd rows into X ad Y coordinates. At the same time, I'm determine minimum and maximum values for both axes.

  //temporary measure
  var tmp_measure;

  //current measure
  var cur_measure;

  var vMinimax=[Number.MAX_VALUE, Number.MIN_VALUE];
  var hMinimax=[Number.MAX_VALUE, Number.MIN_VALUE];

  for(var i=0;i<nRows;i++){

      cur_measure=Number(oDataLayout.getValue(datamodelshapes.Physical.DATA, i, 0, false));
      vMinimax[0]=Math.min(vMinimax[0], cur_measure);
      vMinimax[1]=Math.max(vMinimax[1], cur_measure);
      tmp_measure=Number(oDataLayout.getValue(datamodelshapes.Physical.DATA, i, 0, false));
      hMinimax[0]=Math.min(hMinimax[0], tmp_measure);
      hMinimax[1]=Math.max(hMinimax[1], tmp_measure);

The next part of the code is about handling multiple calls. I should handle things like changing measures or dimensions. I simply delete the old chart and create a new empty one.

  var oSVG;

  //Delete old chart if exists

  //Create new SVG with id=DemoViz

  //Set plot area size
       .attr("width", nWidth)
       .attr("height", nHeight)

Now I have an array with data. I have a prepared SVG plot for drawing. And I have no reason not to combine all this into a chart.

  //Compute scaling factors
  var hStep=(nWidth-40)/(hMinimax[1]-hMinimax[0]);
  var vStep=(nHeight-40)/(vMinimax[1]-vMinimax[0]);

      .attr("x", function(d) {return 20+d[1]*hStep;})
      .attr("y", function(d) {return nHeight-(20+d[0]*vStep);})
      .attr('fill-opacity', 0.3);

What I should mention here is opacity. If its value is too high (almost no opacity) the chart looks like this:


If I make dots more opaque, the chart looks (and works) better. This is how it looks like with fill-opacity=0.3.
0.99 Look at the black point in the middle. It's there because a lot of commissions has exactly the same values for both measures. That's how we could find Saratov region anomaly.

[22:00-] Summary and TODOs

This shows how we may enhance Oracle DVD with our own visualisation. Yes, it's really, really, really simple. And it needs a lot of work in order to make it more useful. First of all, we should make it more reliable. This realisation doesn't test input parameters for correctness. And we also need to add interactivity. The chart should be zoomable and selectable. It should work as a filter. It should react to resize. It needs visible axes. And so on. But we can create it in less than a half of hour and I think as a first step it works well. I hope you don't afraid of making your own plugins now.

If you need more complex sample, take a look at the Oracle's tutorial. It covers some of these TODOs.

Categories: BI & Warehousing

A Case for Essbase and Oracle Data Visualization

Mon, 2017-03-06 12:03

So it’s the end of the month, or maybe even the end of the quarter. And you’ve found yourself faced, yet again, with the task of pulling whatever data will be needed to produce the usual standardized budgetary and / or finance reports. You’re dreading this as it will probably eat up most of your day just getting the data you need, combing through it to find the metrics you need, and plugging them into your monumentally complicated custom spreadsheet, only to find the numbers are, well, off.

Enter Data Visualization (DV), and its lightweight brother application, Data Visualization Desktop (DVD). My bud at RM, Matt Walding, already did a pretty great post on some of the cursory features of DVD, covering a lot of the important how-tos and what’s whats. So check that out if you need a bit of a walkthrough. Both of these great tools tout that you can go from zero to analysis pretty darn quick, and from the extensive testing and prodding we’ve done with both DV and DVD, this claim is accurate. Now how does this help us, however, in the previous scenario? Well, IT processes being what they are in a lot of mid to large size companies, getting the data we need, to do the crunching we need to do, can be quite the monumental task, let alone the correct data. So when we get it, we are going to want a solution that can take us from zero to report, pretty darn quick.

If you’re the one stuck with doing the crunching, and then providing the subsequent results, your solution or workflow probably resembles one of the following:

Scenario 1

Emailed a spreadsheet with a ton of rows. Download the csv/xlsx and then crunch the rows into something that you can force into a super spreadsheet that has a ton of moving pieces just waiting to throw an error.

Scenario 2

You have access to Essbase, which stays pretty fresh, especially as reporting time draws nigh. You connect to Smartview and extract what you need for your report. See scenario 1.

Scenario 3

You have OBIEE that you depend on for data dumps, and then just export whatever you need. See scenario 1.

While there are no doubt variations on these themes, the bigger picture here is that between the time you receive your dataset and the final report, there are likely a few iterations of said final report. Maybe you’re having to make corrections to your Excel templates, perhaps the numbers on your sheet just aren’t jiving. Whatever the case may be, this part of the process is often the one that can be the most demanding of your time, not to mention the most headache inducing. So what’s the point of my schpeal? Well, wouldn’t it be nice to expedite this part of the whole thing? Let’s take a look at how we can do just that with both Data Visualization in OBIEE and Data Visualization Desktop.

Data Visualization

With DV, we can simply access any of our existing OBIEE subject areas to quickly create a basic pivot table. Right away you can see the profound time savings garnered by using DV. What's more, you don't need to feel forced into managing OBIEE on premises, as DV is also part of BICS (Business Intelligence Cloud Service) and DVCS (Data Visualization Cloud Service).

Even if you have Smartview, and can do more or less the same thing, what if you wanted to delegate some of the tedium involved in manually crunching all those rows? You could simply hand off an export to another analyst, and have them plug it right into their own instance of Data Visualization Desktop, which, might I add, comes with your purchase of the DV license. This also, however, leads down the slippery slope into siloing off your department. This approach is essentially doing that, however kept under the quarantine that is DVD, as this blog is touting, and keeping the data with which you are working consistent, you shouldn't be able to do too much damage. The point I'm trying to make is that everything about using DV and DVD as your sort of report crafting and proofing mediums, is super-fluid and smooth. The process from source system to report and over and over again, is super-seamless. Even if I didn't have direct access to the data source I needed, and had to rely on emailed data dumps or other, I can simply upload that sheet right into DV, assign some data types, and get to work. I can even add dynamic filters to the analysis by simply dragging and dropping a column to the filters area. If you're feeling adventurous, you could also display these tables on a dashboard, that perhaps your department looks at to proof them and share in the pleasurable experience that is concocting period-end reporting.

Data Visualization Desktop

Right now, DVD is only out for Windows (with a version for the Mac on the roadmap), which is mostly ok, as most every medium to large size company I have worked with employs Windows as their go-to OS. An analyst can install the program on their desktop machine and be ready to plug away in under 10 minutes. We can take the example spreadsheet above that we dumped out of VA and create our own version of the report right in DVD. One better, we can also blend it with any other source DVD can connect to. This feature, especially, can save lots of time when trying to get your numbers just right for sign off.

Looks Just Like DV

Acts Just Like DV

And hold the phone! There's even a native Essbase connector!

Actually One Better Than DV!

Speaking of connectors, check out the rest of the list, as well as the custom data flow functionality, which allows you to construct and save in-app data transformations to be invoked again, and again, and again...


Flexibility is the name of the game with DV and DVD. So sure, while it isn't the ideal tool for creating precisely formatted financial statements for SEC submission, it sure beats massaging and munging all that data in Excel. And once we're happy with our numbers, either by sharing our reports on a dashboard, or export, we can go ahead and plug our numbers right into whatever tool we are going to use to produce the final product. We've covered a couple of really good concepts, so let's just do a bit of a wrap up to make everything a bit more cohesive.

  • We can pull data into Data Visualization from OBIEE, from a spreadsheet, or even mash up the two to handle any inconsistencies that may exist at month's/quarter's end. Note that we can also use Smartview to pull directly to Excel. Note that you will be unable to use the OLAP capabilities of Essbase / Essbase in OBIEE, as the data will presented in DV and DVD as flattened hierarchies.

  • We can share reports in Data Visualization with other analysts or approvers via a dashboard (note that this is not a supported feature, but something that requires only a 'bit' of a hack) or PDF, or hand off our exported data set for further work. This can be to another person using DV or DVD.

  • Data Visualization Desktop can connect directly to our Essbase source or utilize a data dump from DV / OBIEE to do further work and analysis on the data set. We can also connect DVD to OBIEE, as posted here, in order to extract an analysis from the web catalog. This will at least save us the step of dumping / emailing the report.

  • All of this app-to-app compatability encourages a sustainable, functional, and fluid reporting environment (warning: data silos!), especially for those who are unfamiliar with the ins and outs of Essbase and / or working with OBIEE.

Categories: BI & Warehousing

Real World OBIEE: Demystification of Variables Pt. 1

Mon, 2017-03-06 10:36

OBIEE has some very powerful features and capabilities when it comes to how we visualize and present our data. There are some features which almost everyone uses (filters, prompts, rename columns ect.) and then some that are less common. In part, I think there is a lot of "noise" that surrounds the explanation of some of these features. Another reason, and probably the most likely, is that a lot of report developers may just don't know that they exist. I can't tell you how many times in one of our training courses here at Rittman Mead that someone says "wow, I didn't know I could do that!" or that I get asked, "could you explain how Presentation Variables work?". Now, I am not saying that there is anything wrong with not knowing or being confused when it comes to some of the features in OBIEE. In fact, it's my main purpose in writing this blog series! I want to demystify and show you some of the features and techniques I have used with almost every client I have worked with. Even if you are a seasoned veteran, my goal is to present something in a way that you haven't heard before and spark some ideas in your day-to-day development. So with all of that said, let's dive into part one of the Demystification of Variables.

For the first part of this series, I want to talk about using Bins and Presentation Variables together at a report level and also in dashboard prompts. Before we dive into an example, let me talk a little bit about bins and presentation variables.


Sometimes we need to group values differently than what we have available from our column selection. Common examples could be when we need to group products differently than product category, when we need to create a group for items on sale or, maybe the most common I've seen, when we need to create custom groups for certain values and then have a group for the "others". Creating custom groups can be accomplished in a few different ways (selection steps, add new group from the Results tab), but the way I'm going to show you involves using the Bins tab under Edit formula within a column. I'll go over this process shortly when we dive into the example.

Presentation Variables

These are one of the most powerful features in Answers. They are also one of the things that trip people up the most. Think of presentation variables like placeholders and these placeholders are waiting for a value or a group of values to be passed into them. There are really only three parts to the presentation variables we will look at in this example: First you assign the presentation variable (give it a name) in a prompt to pass the value or group of values selected into the second part. The second part is where the placeholder (presentation variable with the same name) is put in a column formula, column heading, text object etc. The third part is called formatting. Formatting is used when you need to apply a certain format to the values coming into the presentation variable placeholder. There are a massive amount of use cases for presentation variables but I am going to cover only a couple of the ones that I've used the most.

Make sense? If you're still a little confused or hearing about this for the first time, don't worry. We are going to go through an example that I hope will explain the uses of Bins and Presentation Variables in a way that is concise and clear.

Now, lets dive into the example.

The Scenario

In this example, I am going to work through a hypothetical situation where I am the front-end developer for a coffee shop and I've been asked to create a report for an executive dashboard. The report needs to include Gross Revenue, Net Revenue and # of Orders for the current date for six groups of products (Breakfast Foods, Coffee, Coffee Beans, Tea, Tea Bags and Other) and five location groups which contain all the stores located in four states (California, Massachusetts, District of Columbia, New York and Others). The report needs to be in a table view and the person that requested this report also wants the ability to switch between the two columns for product groups and location groups. They also want a dashboard prompt for selecting from one or more groups for each column.

The Example

Let's start by tackling the custom groups. Since both groups are going to be created from the lowest level of detail for both products and locations, we need to bring in both Product Name and Shop Name columns and also our measures.

Note: For all of the examples in this blog series I am using OBIEE

I can click on column properties > Edit Formula then click on the Bins tab to start creating my groups.

In the Bins tab, I can click on Add Bin to start creating my first group.

Once I click on Add Bin, I am presented with a New Filter window. The process of creating a new group (or bin) is exactly the same as creating a column filter. I can type the values individually, separated by ; or I can click on the search icon to add values manually. In this case, I am going to add them manually.

Once in the Select Values window, I am going to select all of the products that belong in the first group (Breakfast Foods).

Note: In this scenario, the person has given me a list of all the products and stores that they want in each group

Once I'm done adding all the items, clicking OK will take me to the Edit Bin Name window where I can give my new group (or Bin) a custom name.

Since the person who requested the report probably wants the Others group to be last, I am going to add numerical values to each group. This will give me the flexibility to order columns any way that I choose. After I’m done creating all my Bins, there is a checkbox at the bottom of the Bins window where I can create a custom bin for all other values.

Clicking OK will return me to the Criteria where I can view the formula that OBIEE generated for all of my selections.

When using Bins, OBIEE generates a custom Case statement for us. At a high level, a Case statement takes a set of conditions and executes a corresponding statement. In other words, when the Product Name column has these values, put them all together an call them THIS.

If I go to Results, I can see the Bins I just created.

Now I can return to the Criteria to create the groups for the stores. Upon completion, here is the formula for the Locations:

And the result:

Now I need to come up with a way to be able to switch between the Product and Location columns I just created. There are a couple of options. Since i'm using OBIEE 12c, I can save both the Products and Locations columns to the web catalog and simply create a column selector just like I would do for any of the columns within a subject area.

There are two problems with this method. One is that the Save Column As feature is only available from OBIEE forward so, if you are still on or earlier, this method won't work. The second problem is that the request is not just for swapping the Product and Location columns with a prompt but also being able to select individual values for each column with a prompt. To avoid confusion, we probably want the prompt values for each column to change dynamically depending on which column is selected. To accomplish this I am going to use a second method which requires me to do the following:

  1. Combine the case statements for both Products and Locations column formulas

  2. Insert two additional case statements along with presentation variables to allow me to switch between the two.

  3. Create a filter for the combined column with a presentation variable to allow me to use prompted values.

  4. Create a variable type dashboard prompt to select between the Product and Locations independently.

  5. Create a dashboard prompt that dynamically changes between the value selections for the Products and Locations.

I'm going to start with the first two steps since they require editing the same column formula.

I am going to create a new column and copy both my Products and Locations column formulas into the new column's edit formula window and, to make it easier to edit and insert the two additional case statements, I am going to put a blank line between them.

Now I need to put in my two additional case statements with presentation variables so that I can switch between the two. The syntax and naming convention I am going to use is as follows:

CASE WHEN @{pv_select}['@']{'Products'} = 'Products' THEN [Products Case Statment]

CASE WHEN @{pv_select}['@']{'Products'} = 'Location' THEN [Locations Case Statment]

Let me take a moment to examine and explain the presentation variable I'm using.

The syntax for putting a presentation variable in a column formula, column heading or text object is as follows:


Note: I am going to use the presentation variable name (pv_select) in my variable prompt later in this example

The default value I have defined (Products) is an optional step that is primarily used for when I want to view my report without any values being passed into the presentation variable. The default value is used when no value has been passed into the presentation variable.

Notice that I have also included formatting between the presentation variable name and the default value (['@']). The reason for this is because the prompt values (which will be defined as Products and Locations in the variable prompt I will create) are a string (text) datatype and I have to explicitly define the format I need for the presentation variable value or I will get an error. You have to define formatting using single brackets [] and then encapsulate the @ character with the formatting you need applied to the presentation variable value. Think of the @ character as a sort of shorthand or an abbreviation for the presentation variable itself.

Note: I could enclose my presentation variable in single quotes to achieve the same results in this senario ('@{pv_select}{Products}').

So, to explain the entire statement another way:

CASE WHEN the value of the presentation variable coming from the prompt is equal to 'Products', THEN then use the CASE statment for the product groups

CASE WHEN the value of the presentation variable coming from the prompt is equal to 'Location', THEN then use the CASE statment for the location groups

My new column formula is as follows:

CASE WHEN @{pv_select}['@']{Products} = 'Products' THEN

CASE WHEN "Sales - Fact Sales"."Products"."Product Name" IN ('All Butter Croissant','Almond Croissant','Banana and Walnut Muffin','Blueberry Muffin','Caramel Shortbread','Chocolate Chip Muffin','Lemon Drizzle Muffin','Maple Pecan Danish') THEN '1. Breakfast Foods' WHEN "Sales - Fact Sales"."Products"."Product Name" IN ('Americano','Caramel Hot Cocoa','Classic Hot Cocoa','Dark Hot Cocoa','Expresso','Filter Coffee','Iced Caramel Latte','Iced Cocoa','Iced Latte','Iced Mocha','Latte','Macchiato','Mocha','Pain Au Chocolat','Vanilla Hot Cocoa','White Hot Cocoa') THEN '2. Coffee' WHEN "Sales - Fact Sales"."Products"."Product Name" IN ('Arabica Beans','Arabica Ground','Robustica Beans','Robustica Ground','GCBC Special Blend Instant') THEN '3. Coffee Beans' WHEN "Sales - Fact Sales"."Products"."Product Name" IN ('Chai Tea','Chamomile Tea','Earl Grey Tea','Fresh Mint Tea','Jasmine Tea','Spearmint Tea') THEN '4. Tea' WHEN "Sales - Fact Sales"."Products"."Product Name" IN ('Chamomile Tea Bags','Chai Tea Bags','Earl Grey Tea Bags','English Breakfast Tea Bags','Fresh Mint Tea Bags','Jasmine Tea Bags','Spearmint Tea Bags') THEN '5. Tea Bags' ELSE '6. Other' END

WHEN @{pv_select}['@']{'Products'} = 'Location' THEN

CASE WHEN "Shops"."Shop Name" IN ('SF1 - Fisherman''s Wharf', 'SF2 - South Beach', 'SJ1 - First Street Corridor', 'SV1 - Mountain View') THEN '1. California' WHEN "Shops"."Shop Name" IN ('BO1 - Financial District', 'BO2 - Atlantic Wharf', 'BO3 - Back Bay', 'CA2 - Harvard Square') THEN '2. Massachusetts' WHEN "Shops"."Shop Name" IN ('DC1 - Central Business District', 'DC2 - Dupont Circle') THEN '3. District of Columbia' WHEN "Shops"."Shop Name" IN ('NY1 - Times Square', 'NY2 - Fifth Avenue') THEN '4. New York' ELSE '5. Others' END
ELSE 'Other' END*

I am also going to use the presentation variable in the column heading so that when I make my selection in the prompt, the column heading will reflect the selection as well.

If I go to my results, notice how only the Products column is showing due to the default value I set when defining my presentation variable.

If I go back to my column formula and change the default value to be Location instead of Products, the column values will change to the ones defined in my Locations case statment.

In order for the report to be filtered for individual values, I need to create a filter using a presentation variable. To do this I am going to create a filter for the column I used to define all of my groups and, instead of selecting a value, I am going to put in a new presentation variable which I will define later when I create the dashboard prompt for the values.

I am going to name my new presentation variable pv_select_value.

When a Presentation Variable is added to the filter, two new text boxes appear. The Variable Expr box is where you define the variable to be used and the (default) box is used to add a default value. Notice how I am not using the syntax @{presentation_variable_name}. When defining a presentation variable using the presentation variable option in a filter or in a prompt, you only have to define the name. Once I click ok, OBIEE will insert the proper syntax for me. The default value is optional but, when defining a Presentation Variable within a filter, you have to specify a default value in order to get any results within the analysis. If I run the report with just a presentation variable defined with no default value, I will get the error shown below:

This is because, when the report is run, the query issued is trying to use the value of the presentation variable. Since there is no default value specified, it is trying to use the presentation variable name itself as a value. If I specify a default value or a list of default values, the report will use those values in the absence of a value or values in the presentation variable. Because I want all of the values to be shown initially on the dashboard, I am going to use the Convert this filter to SQL option and input all of the values for the product group manually for the list of default values.

If I run the report with the default values specified, I get the results back for the product group.

In order for the filter to work with the column prompt I will create later in this example, it requires the [formatting] syntax to be used (@{pv_select_value}['@']{'1. Breakfast Foods', '2. Coffee', '3. Coffee Beans', '4. Tea', '5. Tea Bags', '6. Other'}) for example). This will encapsulate each value passed into the pv_select_value presentation variable with single quotes.

Now I need to add an additional filter on the same column using the same pv_select_value presentation variable. The reason for this is because I need to define each default value for the location group as well so that all of the values for location will be shown on the dashboard. To do this, I am going use the same process that I used for the product group filter.

There is one additional step that I need to do in order to make the filter work for both products and locations. When you initially create report filters, OBIEE assumes that you want to use all filters together at the same time therefore it uses 'AND' for each filter created after the initial filter.

If I run the report, I will get an error because the filter is being applied for both the products and location values. Because the case statment is defined for products and locations independently, I can not use all values in a filter at the same time.

To fix this, I need to change the operator for the location filter from 'AND' to 'OR'. I can do this by simply clicking on the 'AND' icon.

Now the report will run because only the first set of conditions for the product group filter is applied.

Now I need to create my two dashboard prompts: One for the initial selection between my two case statements for Products and Locations and one for the values.

I start by creating a new dashboard prompt and create a new Variable Prompt.

Because this is a variable prompt, I have to manually specify the values I want the user to be able to select from as well as define the presentation variable. Since I am creating the prompt for switching between the two case statements for Products and Locations, I am going to make the following selections and input the following values.

Prompt for: Presentation Variable > pv_select

User Input: Choice List

Choice List Values: Custom Values

Values: Products and Location

Default Selection: Specific Custom Value > Products

As you can see, I have defined the variable that we used in the column formula for the case statements. By defining the two custom values (Products and Location), I can select between the two case statements at will and, as far as the user is concerned, it will be just like using a column prompt within a report. Also, the default value needs to be specified for reasons I will explain when creating the next prompt for the values.

Now I am going to create a column prompt for the values.

First, I need to create a new column prompt and put in the same column formula that I used earlier in my report. I can do that by clicking on the fx icon and pasting in the column formula.

Essentially what I am doing is making the column values populate for the prompt by using the same case statement definitions and they will also change depending on the value selected in the prompt we created previously.

For the choice list of values, I can simply leave the default of All Column Values. Since the values will dynamically change depending on the initial prompt value in the variable prompt, it will only show values corresponding to each selection.

I do not need to define a default selection because I set the default values in the two filters I created in my report.

Finally, all that is left to do is define the pv_select_value presentation variable I defined for the filter of the case statement column in the report. This is how I will get all of the values to be shown for each initial prompt selection and also how I will select individual values. The default values will be passed into the filter via the pv_select_value presentation variable initially which will prompt all the values to be shown and then individual selections will be passed in the same way.

Here are the results when placed on the dashboard.

So that concludes part 1 of Demystification of Variables. Please feel free to ask questions or leave me a comment! In part 2, I am going to talk about using Repository, System and Presentation Variables to make reports dynamic for any series of time. Until next time.

Categories: BI & Warehousing

New Performance Analytics Release

Wed, 2017-03-01 09:00

Poor performance is one of the most common complaints we hear upon arrival to customer sites, and often for good reason. When users have time to “make the rounds” before the system returns a result (if it returns a result at all), they have the right to be frustrated.

Unfortunately, tracking down the origins of performance issues can be extremely difficult, as the root cause could be one or more problems across several distinct systems. Without a comprehensive view of the complete technology stack, an administrator will effectively be guessing.

But we are all in the business of analytics. We should be taking an analytical approach to drive our decisions, and provide evidence to support them. Performance data should be collected, investigated, and then acted upon. But an analytical approach raises two distinct questions, “What data can we collect that will help us identify performance issues, and once found, what actions can we take”?

Enter Performance Analytics

Performance Analytics is an application that collects data from all layers of the stack in your environment, and makes it readily available for study and investigation. By collecting data from OBIEE Usage Tracking, operating system metrics, database sessions, and a myriad of other sources, our application provides administrators with an all-encompassing view of their system. It has never been easier to identify the causes of poor performance.

However, simply identifying a cause of poor performance isn’t all that’s required. In fact, it’s almost useless without knowing how to resolve the issue. Rittman Mead has expertise ranging from databases and data warehouses to front end browser applications, and everything in between. Our team will help you understand the problems discovered by Performance Analytics and show you how to fix them.

With the newest release of Performance Analytics, improving the performance of your BI ecosystem is easier than ever. No longer will your users sit around waiting for reports, no longer will you need to guess at configuration settings, and no longer will you be left with no leads to investigate when users desire a better experience.

New Features Simple and Lightweight Installation

Performance Analytics is now deployed using Docker, requiring absolutely no software installation on any machine other than the monitoring server.

Upgraded and enhanced software.

The open source technologies included in Performance Analytics have come a long way since its inception, and this release provides their latest and greatest versions, as well as our own improvements that increase performance and reliability.

Alerts and Notifications.

Receive notifications based on failed OBIEE system components, a violated performance threshold, or any other identifiable condition found in your data. Be made aware of adverse performance conditions the instant they occur.
And many more...

Learn More

We will be releasing more exciting news about Performance Analytics over the following weeks, so be sure to subscribe to our blog and newsletter to stay up to date!

Visit our Performance Analytics page here.
To request a demo or ask any questions about Performance Analytics, call or email us:

P (US): (888) 631-1410
E (US): info@rittmanmead.com
P (UK): +44 (0) 1273 911 268
E (UK): info@rittmanmead.com
Categories: BI & Warehousing

OBIEE Is It Still Supported?

Tue, 2017-02-28 07:11
 Is It Still Supported?

Last week, during a quiet working morning, my colleague Robin Moffatt sent me a link that changed my working focus for few hours.

The link was to "OBIEE 11g: Required and Recommended Bundle Patches and Patch Sets (Doc ID 1488475.1)", a document containing the latest bundle patch (patch set) available for every major OBIEE release.

He spotted that for the first time, the new bundle patch released on 17 Jan 2017 was for OBIEE only, with the latest patch for being the one previously released on 18 Oct 2016.
Moreover there is a clear statement in the document saying that is necessary to be at least on the patch-set or upgrade to (or 12c) in order to receive continued Error Correction Support.

Questions became then naturally:

  • Is OBIEE still fully supported?
  • What is Error Correction Support?
  • How does Oracle Support Work?
  • How can I verify if my software is supported and when the support ends?

 Is It Still Supported?

With those questions in mind, I started my wonderful journey in understanding the Oracle support, if you want be my guests then sit down, take a coffee (espresso please) and enjoy the trip.

Understanding OBIEE Oracle Support

Disclaimer: All the information and date mentioned in the post reflect the situation as of the post publication date, Oracle may change them in future. Oracle support documents mentioned in this post should be directly taken as reference. The documents linked below are mostly regarding OBIEE, for other products please visit Oracle's support website.

Understanding Oracle's support requires patience and the right set of documents, with the main two being:

  • Lifetime Support Policy: provides the announcements, dates and exceptions for the end of Premier, Extended and Sustaining Support. The detail of the Lifetime Support Policy is by product e.g OBIEE 11.1.1.*.
  • Error Correction Support Policy: provides details regarding the end of bug correction support per single release (e.g. For OBIEE 12c the Error Correction Support document is merged in the Fusion Middleware document.

The combined information contained in these two sources should be used to determine the status of a specific Oracle's software version.

Oracle's Support Phases

The two documents listed above provide a good starting point, but what do "Premier", "Extended", "Sustaining" and "Error Correction" Support mean?
Doc ID 1664916.1 and Lifetime Support Policy page provide a detailed explanation.

The "Lifetime Support Policy" defines the three main phases for each product (e.g. OBI 11g):

  • Premier Support: is in the first 5 years (however this timeframe could be changed) from General Availability of a product, during this period bundle patches are released providing error correction and new features with certification for most Oracle or third party support.
  • Extended Support: almost all the benefits of Premier Support can be extended by another three years after its end with an additional fee.
  • Sustaining Support: after the end of Premier Support, this option (requiring an additional fee) provides a limited support on existing product bugs but no new developments (e.g. a certified mechanism of connecting OBIEE 10g to Impala); can be extended for an unlimited amount of time.

The first two phases are also considered Error Correction Support since are the only two where new bug/security related patches are provided. After the end of Error Correction Phase no more patches will be released for a certain product.

Applying the Lifetime Support Policy to OBIEE 11.1.1.* based on Oracle's documentation the following is obtained.

 Is It Still Supported?

As mentioned before, this is a guideline for the whole OBIEE 11.1.1.* set of products, so how can I determine the end of support for a certain release?

Usually the answer to this question is defined by

  • The end of the Premier/Extended Support if the release you're checking is the latest available of a product
  • The end of the release's Grace period: if a newer release is available. The grace period, mentioned in Doc ID 944866.1 is the period of time (usually one year) following the release of a patch set in which Oracle provides fixes for both the new and previous patch set. The grace period is calculated based on the availability of the following release. E.g. since OBIEE was released in May 2015, the grace period for should be ended in May 2016. There are however exceptions where the grace period and related error correction support is extented, like for the which now is extended until December 2018 as per OBIEE Error Correction Support document.

The following graph shows the error correction support phases for OBIEE

 Is It Still Supported?

What Does This Mean For OBIEE

As stated above OBIEE is still in its grace period meaning that bundle patches and security fixes will still be released when needed until Dec-2018.


Is OBIEE Still The Right Choice?

If all you need is support for your existing and working environment having the latest bundle patch, then yes, you're fine and you'll be until December 2018.
But we all know how long an upgrade takes from planning to the actual execution and testing. Probably you don't want to wait until the end of next year having to rush for the upgrade because your system is getting out of support.

 Is It Still Supported?

We at Rittman Mead are in favour of a smart upgrade planning and with our fixed cost upgrade process can take care of it while you keep focusing on what matters in your company.


As suggested by Oracle's document there are two options: migrate to the latest 11g release ( or 12c. Both are valid options - has another few years of premier support, with end of error correction support targeted for Dec-2021 as stated in Oracle's Doc 1664916.1.

If you are opting for the migration, choosing the target release should be based on the features you aim to get and the version stability plan: some options like Visual Analyzer, Advanced Analytics, and Data Mashups are only available on 12c which is the version currently developed, while 11g is on its way out. These points are discussed in more detail below.

Upgrading to OBIEE

The migration to might seem easier and it could be done in-place. However it still requires database schema changes, moreover an in-place upgrade has associated risk of having unavailable live environments until all issues are fixed; risk that can be mitigated with an out-of-place upgrade.
You can find a list of new features for end users and system admins in our blog.

Finally, if you migrate to, the end of its error correction support in Dec-2021 forces the next milestone in your upgrade path.

Upgrading to OBIEE 12c

On the other hand the migration to OBIEE 12c (with the being the latest available as of now) is completely an out-of-place process. There is a requirement on the source 11g version meaning that if you are currently on a pre- version, you'll first need to perform a migration to or and then to 12c.

The upgrade process is smooth, with a dedicated upgrade tool taking care of moving the catalog, RPD and part of the security and a Baseline Validation Tool helping in automating the testing.
OBIEE 12c includes several new features with new connectors to big data world that will increase over time and the new options like Visual Analyzer and Data Mashup providing great functionalities directly in the hands of end users.

Check out our 12c migration case study presented at UKOUG!

Rittman Mead Can Help!

Unsure about the support status of your current system and the risks related? With our experience we can help you plan a migration roadmap!
A migration is not a quick process, it needs to be understood and planned correctly, the following are just some examples of our pre-migration activities with clients.

  • Worried about migrating?

    We can advice about best installation/migration processes and the benefits of such upgrades. We can also help you implementing the processes during the migration phase.

  • Unsure about server capacity?

    We can help estimating server sizing based on the number of active users and the enabled features. Our Performance Analytics Service can help understanding and solving bottlenecks.

  • Unsure about impact on BAU activities?

    Our migration approach reduces the Code Freeze time to a minimum; the time to swap between versions in the production environment can also be minimised.

  • Uncertain about timings?

    We can help you estimating effort and plans for a successful upgrade based on our experience.

Once the migration in planned, we can implement it for you or assist and support your team! Let us help you!

Categories: BI & Warehousing

Getting Apex 4.2 Up and Running on Oracle 12c Using PL/SQL Gateway

Mon, 2017-02-27 10:30

Recently I was asked to create a simple Master-Detail Apex form on top of some metadata tables stored within Oracle 12c. Apex version 4.2 is already pre-installed in 12c but it needs configuring to be able to use. This is where I hit some trouble which stems from a conflict of information across a number of different sources.

I was following the Oracle certified documentation on 4.2 and how to get it running on 12c specifically but it wouldn't work. Time after time I kept getting 'Error 404' messages whenever I tried accessing Apex through the browser. I gave up on the Oracle documentation and decided to scout out some blogs but they were virtually copy and paste and resulted in the same error messages. I asked colleagues at work and also joined the Apex slack domain but nobody quite understood why it wasn't working.

It seemed like a lost cause, until I decided to check the later versions of the Oracle documentation and came across an extra piece of information from an Apex 5.0 document:

Procedures CREATE_ACL, ASSIGN_ACL, ADD_PRIVILEGE and CHECK_PRIVILEGE in DBMS_NETWORK_ACL_ADMIN are deprecated in Oracle Database 12c. Oracle recommends using APPEND_HOST_ACE instead.

Given that 12c comes with Apex 4.2, I was surprised to find it not mentioned anywhere else but in the Apex 5.0 documentation.

So the intention for this blog entry, is to bring together all the information that is needed to get Apex 4.2 using PL/SQL Gateway on Oracle 12c so you're able to take a quick and easy look at what it can offer.

Steps to configure the PL/SQL Gateway on Oracle 12c

  1. Change your working directory where Apex is installed:

    cd $ORACLE_HOME/apex  
  2. Connect to SQL*Plus with a user with SYS privileges:

    sqlplus / as sysdba  
  3. Switch containers to the pluggable database container:

  4. Run the apex_epg_config.sql script along with the $ORACLE_HOME:

    @apex_epg_config.sql /u01/app/oracle/product/12.1.0/dbhome_1/

    Log back into SQL*Plus from the Apex directory if you get logged out

  5. Then unlock the ANONYMOUS user within the Root Container:

  6. Switch to the pluggable database container and run apxconf.sql:


    When prompted change the ADMIN password and assign a port for APEX to connect to - Default is 8080.

  7. Verify the Oracle XDB Protocol Server Port that was set after running the apxconf.sql script:


    If it returns with [0] then enable the XDB Server

  8. Enable the Network Services in Oracle Database 12c - This will be disabled by default. This following will give access to all hosts:

    (host => '*', ace => xs$ace_type(privilege_list => xs$name_list('connect'), principal_name => 'apex_040200', 
    principal_type => xs_acl.ptype_db)); 

    NOTE: If you require stricter security add the host name after the host parameter e.g. host => 'localhost'

  9. Open your browser and connect to Apex:


This blog has given you the guidelines for getting Apex 4.2, that comes pre-installed on Oracle 12c, up and running.

Note that 4.2 is now quite dated and it is recommended to upgrade to Apex 5. This ensures continued support from Oracle, and access to newer and more efficient features.

Categories: BI & Warehousing

Data Preparation for "Analyse Elections with Oracle Data Visualisation Desktop"

Fri, 2017-02-24 09:13

In the previous post I wrote about some interesting insights we can get from the elections data. And this post concentrates on the data preparation for it. As the first one, it doesn't work as a guide and aimed to show some data manipulation things end-user can do without IT department and coding. As data preparation is not absolutely needed for the understanding of the visual part but still a piece of the story it was extracted as a separate post.

The Data

From the logical point of view I have two datasets.

  1. Voting results by election commissions: number of votes for every political party and a lot of technical measures like number of registered voters, number of good and damaged voting ballots and so on.
  2. Turnout figures at given times throughout the day.

From the physical point of view, both datasets are big files (about 500 Mb combined) each line of which is a single record. These records represent information at various levels of aggregation. I will use this fact to show some of Data flows features (great introduction from Francesco Tisiot).

Generally, each record is a JSON which looks like this:

  "1 | Number of voters included in the voters list at the time of the end of voting":"102",
  "2 | The number of ballots received by the precinct election commission":"102",
  "3 | The number of ballots issued to voters who voted early":"0",
  "19 | 1 Russian political party 'Rodina'":"0",
  "20 | 2 Political party 'COMMUNISTS COMMUNIST PARTY OF RUSSIA'":"3",
  "21 | 3 Political party 'Russian Party of Pensioners for Justice'":"0",
   "Nizhny Novgorod Region",
   "129 Nizhny Novgorod Region - Nizhny Novgorod",
   "28 Krasnooktyabrskaya",
   "PEC #1148"

You can find a full sample of the JSON here.

Data part is a set of 32 measures and Loc is an array of one to four elements. The more detailed the record, the more elements in Loc array.

The whole votes dataset is slightly smaller than 100'000 of records. As I said, records have various levels of aggregation, and this sample shows the most detailed data (its Loc has four elements). Nizhny Novgorod Region is obviously the region, 129 Nizhny Novgorod Oblast - Nizhny Novgorod is a concatenation of district electoral commission number (129), region name (Nizhny Novgorod Region) and district electoral commission name (Nizhny Novgorod). 28 Krasnooktyabrskaya is number and name of a territorial election commission, and PEC #1148 is precinct electoral commission. For the most aggregated records Loc looks like:

      "DEC #129"

This is an aggregated data for a district election commission #129.

Unfortunately, DVD doesn't work with JSON format directly. As one of the possibilities, we can use Apache Drill (for more information see previous posts by Francesco Tisiot and Robin Moffatt). Apache Drill is a supported option for Oracle DV, but it isn't an end-user tool and the story is not about building a complete and reusable system. It is about the storytelling. So I did a one-time manual conversion using Excel magic and got a plain table.

The second part of the data is turnout. In JSON form it's basically the same: one record per line, Data part with four values and Loc part based on the same principles. Data is significantly smaller and has only four values - turnout at four fixed moments. But here I need it in a more dimesional form. I want time as a dimension rather than pivoted.

In my opinion, pivot/unpivot is one more highly desired feature DVD needs. We can live without it in tools like OBIEE because its models aren't built on rapidly changing user-generated data structures, but for desktop data exploration tools like DVD powerful data transformation capabilities are crucial.

But while it isn't implemented, I made unpivot in Excel and got this.

It wasn't obvious that I need in in this particular form in the beginning of data preparation stage, but the good news is both visualisation and data manipulation live in one product, and in reality I was able to build charts, understand that I need more data/attributes/etc, add them and go back to analysis. Very smooth and clean.

I Thought We Weren’t Supposed to Use Excel?

While Data flows DVD feature is powerful and very useful there are some data manipulation tasks that Excel can do just easier or better than DVD. I hope that Oracle will improve Data flows over time but right now it’s not perfect.

I used Excel for the following:

  1. Splitting of long strings into fields. I could do it using locate+substring+left+right functions in DVD. But that would produce unreadable code. Excel Text to columns tool is a weapon of choice so far.
  2. UNION. When I need to combine two datasets into one, Excel’s copy and paste are beyond competition. Adding set operations (not only union) will greatly improve Oracle’s tool.
  3. A few technical tasks like translating the data. Initially, the data was not in English and I had to translate it. I could use both Excel and DVD for this task but chose Excel in order to simplify the post. But if I did it in DVD, it would be just a few more joins. Can't say what is simpler, so no clear winner here.

Note: I'm talking about static data sets. In case of a changing data that may be not true.

Data Transformations in DVD

While I was able to (had to) make some of the data preparation in Excel, it's better and easier to do some tasks in DVD using relatively new Data flows feature. I will start with building a clean dictionary of electoral commissions.

When I finished my Excel part, I got this:

It isn't so bad, but I can make it better in a few minutes. I want to rename Location1 into Subject, split Location2 into DEC #129 and Nizhny Novgorod, remove leading digits from Location3 and rename it to TEC name, and also rename Location4 into PEC #.

Here is the data flow which does it. It may not look like the simplest thing in the world, but I spent about the same time building it as you spent reading what it does.

The second thing I want to do is to split my wide file with 32 measures into two. One for party results and another one for other let's say "technical" measures like number of voters, ballots and so on. It's not absolutely necessary but will please me and anyone else who will see it. Both data flows for this task use Commissions dictionary I made earlier and the same source file. I filter the source to get only the most detailed data, join with commissions, select only interesting columns and save the result. Simple and straightforward. The only difference is sets of selected columns.

I could start building visualisations with this data, but I'd prefer to calculate some relative measures. Absolute numbers won't help me because every commission has its own number of voters. We can't compare them directly but relative measures will do the trick.
Data flow for these calculations is the simplest of all.
The only not immediately obvious part is why I placed two consequential Add columns steps. The first one calculates Total votes column as a sum of votes for all parties and the second one uses this column for calculation of a percentage.

And it is really cool how we can use results of different flows. I took Protocols, made Commisions, joined Protocols and Commisions and got Votes data and later made calculations based on Votes data. That was very cool.

And here I want to show how the autocomplete feature in formulas editor works (because I really like it). I hope to see it on OBIEE one day.

I have one more data flow for the turnout data, but it simply joins source file with the dictionary of commissions.


I believe Oracle DVD is a very powerful tool for a data analyst. While many users prefer to stick with Excel exclusively, DVD can significantly increase their data preparation and transformation capabilities. Even now at the very beginning of its journey DVD allows making many data manipulation operations fast and easy. It obviously worth give it a try and include into a data analysis process. Look how the data from this post may be used for analysis and making interesting conclusions: Analyse Elections with Oracle Data Visualisation Desktop.

Categories: BI & Warehousing

Analysing Elections Data with Oracle Data Visualisation Desktop

Fri, 2017-02-24 04:38

Disclaimer #1 This post is not about politics. Its dataset is about politics, but that's a coincidence. It could be about immunisation or transport or anything else. If you are strictly against any politics, here is a link to watch cute kittens.


Let's pretend that I'm an analyst and got a supposedly interesting data set. Now I want to understand if the data is actually interesting or it's a total rubbish. Having been in IT for some time I can use tools and technologies which typical end-user can’t access. But this time I pretend I’m a usual analyst which has data and desktop tools. And my task is to do a research and tell if there are anomalies in the data or everything looks like it supposed to look like.
The main tool for this work is obviously Oracle Data Visualisation Desktop (DVD). And, as a supplementary tool, I use Excel. This post is not a guide for any particular DVD feature. It won’t give a step by step instruction or anything like that. The main idea is to show how we can use Oracle Data Visualisation for an analysis of a real dataset. Not simply show that we can build bar charts, and pie charts and other fancy whatever charts, but show how we can get insights from the data.

The Data

I should say a few words about the data. It is an official result of the Russian State Duma (parliament) elections in 2016. Half of the Duma was elected by party lists and for this post I took that data. I should confess that I cheated a little and decided not spend my time downloading and parsing the data piece by piece from the official site, and took a prepared set.

From a bird's-eye view I have the following data:

  1. Voting results by election commissions: number of votes for every political party and a lot of technical measures like number of registered voters, number of good and damaged voting ballots and so on.
  2. Turnout figures at given times throughout the day.

From a more technical point of view, the data was stored in two big files with multiple JSON in each. As the data preparation part is big enough, it was extracted to another post. This one concentrates purely on visualisation and the second one is about data flows and comparison to Excel.

Analysing the Data

I did some cleaning, refining and enriching of the data and it's time to use it. I started with a standard Stacked bar chart. It shows percentages of parties by regions and in addition width of bars shows Total votes. The chart is sorted by ascending Total votes.

What can we say here?

Before I start talking I need a lawyer and a disclaimer #2:

Disclaimer #2 Some of the results may be interpreted in different ways. Some of them may be not so pleasant for some people. But I'm not a court and this post is only a data visualisation exercise. Therefore I'm not accusing anyone of committing any crimes. I will make some conclusions because of rules of the genre, but they should be treated as hypotheses only.

That’s not a proven charge (see disclaimer #2) but for me these regions look a bit suspicious. Their results are very uncommon. United Russia ruling party (orange bars) got an extremely high result in these few regions. This may be a sign of some kind of interfere with an election process there. But of course, other explanations (including a measure incorrectly selected for sorting) exist.

Just for reference so we don’t forget the names: Tatarstan, Kemerovo, Dagestan, Chechnya and Mordovia. There are a few more regions with similar results but their number of voters is lower so I don’t show them here.

At this point I'm starting to suspect something. But I need more visuals to support my position, and my next hypothesis is that in these regions ballots were somehow added to voting boxes (or protocols were changed which is basically the same). From a data visualisation point of view that will mean that these regions will have higher turnout (because of added ballots) along with higher United Russia result.

To check this I need one more measure - Turnout, %. It shows how many of registered voters actually voted. I can create this field without leaving visualisation mode. Cool.

Note. This formula may be not absolutely correct but it works well for demonstration purposes.

In order to visualise this hypothesis, I built a Scatter chart. Its horizontal axis is Turnout,% and vertical one United Russia, %. And I added a trend line to make things more obvious. Colour density shows Total votes.

I think my hypothesis just got a strong support. As usual it is not an absolutely impossible situation. But it's hard to explain why the more people come to voting stations the higher one party result is. I'd expect either high result not depending on the turnout (more or less like Uniform distribution) or at least a few exceptions with high turnout and low result.

I consider this result strange because in real life I'd expect that higher turnout should mean more opposition voters (a passive group indeed) coming to voting stations. But that's only my opinion. And highly arguable I should agree. What I really want to show here is that these charts highlight an oddity that should be investigated and may or may not have a rational explanation.

And who are our heroes? Let’s zoom in on the chart …and we see the same regions.

But maybe other parties can show the same result? We can build the same Scatter charts for every party or we can visualise all at once with a usual Line chart. Here I’ve plotted the percent of vote won by each party (Y-axis) against the overall turnout % (X-axis).

United Russia is the only party that increases with turnout.

So far all my charts were about relative measures, it's time to check some absolute values. Here is a Bar chart which shows a number of precinct commissions by results. I'd expect to see something close to normal distribution - a bell-shaped chart with the maximum around 54% (average turnout). Now, look at the real chart (bin size=1.0%). Its maximum is at 36-37%.

Normal distribution

In probability theory, the normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known.

Strictly speaking all numbers I'm showing here are discrete and I should say Binomial distribution rather than Normal but right now for my purposes the diffence is not that big.

I'm definitely not Carl Gauss (and even not one of his best students) and you may ignore my opinion, but I expected something more like this:

And I don't have the slightest idea how it is possible that the most "popular" turnout is 100%.

What if we look at the same chart with more details? The previous one was grouped by 1% bins, and this one has 0.1% bins. And I had to add turnout not equal to 100% filter. Even with smaller bin size, the last one is almost the same ~3K commissions. This bar is much bigger than the others and the chart doesn't show anything in that case.
What can we see here? Well, people aren’t really good in generating random numbers. It's perfectly OK to have some runout on the chart. But hey, it's not normal to have them mostly at round values. That looks like someone was trying to fit the result to some plan.

Here is my favourite part of the chart. I marked 1% intervals, and you can see that round turnout value is always more probable than its closest non-round neighbours. No exceptions. A possible explanation is that the number of commissions with results that high is relatively low and even the slightest manipulation is clearly visible.

But wait. What about that 64.3 percent? It’s not round, but it is a big runaway. Let’s take a closer look at this value and check if there is anything interesting or that is a normal situation. Here is a few interesting visualisation for it.

The first one is Tree Diagram. It shows all existing combinations of district and precinct commissions by regions for the filtered data (turnout=64.3). And in order to demonstrate how it works for this case I made an animation. Most of the regions have a few commissions with 64.3% turnout. But Saratov region beats them all.

This visualisation has a serious flaw. End-user has to scroll it (I mean for this set of data) and can miss the point. Another visualisation can improve the situation. Network diagram doesn't need scrolling.

Looks good and shows exactly the same. But for this chart we must ensure that every data point is unique what is not true in my case. Different precinct commissions have the same numbers and I had to create a unique field first (DEC #||PEC #). It's easy to forget and get unpredictable or even misleading results.

Or if you prefer more traditional charts, here is Sunburst for you. Its sectors size shows Total votes and the colour is PEC count. It gives a good representation of how uncommon Saratov's result is.

And the last picture for the same topic boring never-old classic Bar chart.

Considering all these charts I'd say that almost exclusive concentration of commissions with 63.4% turnout in Saratov doesn't look normal for me. It's pretty weird that sibling commissions show exactly the same figures.

A few more diagrams which could work well are Sankey and Parallel coordinates, unfortunately, they are less informative because of the high number of precinct commissions. But if the number was lower I'd consider them too.

All previous charts are based on voting data. But I have one more dataset - official turnout. Let's check if we can find anything interesting there. And unfortunately significant part of commissions doesn't have official data, and sometimes I may use formulas that are not exactly the same as official ones, so numbers may differ slightly from what I got from the protocols data.

The first chart shows the number of commissions (vertical axis) by the official turnout (horizontal axis). Colour shows the moment of time. Strictly saying I shouldn't have used continuous linear charts for discrete values, but coloured overlapped bars don't give that clear picture.

Except for the 100% tail, everything is more or less natural. Graph shape looks more like Gamma distribution rather than Normal but I didn't test it.

What picture do I have for various regions?
Moscow city is known for a relatively high number of poll watchers and we may expect more clean data there. Ignoring the long tail, these look normal (or binomial if you want to be precise).

Saratov region. The one with 64.3% turnout. Look at these peaks. Do they look natural to you?
Do you remember Tatarstan (was the hero in the beginning of this story)? Here it is. I don't know how can anyone explain how it is possible (without manual results adjusting I mean).


This post shows how we can use Oracle DVD for visualisation of a real data set. And I hope I was able to convince you that this tool can be useful and can give you really interesting insights. Of course, visualisation alone doesn't answer all questions. And this time actually it was less about answers but more about questions. It helps to ask right questions.

More reading on the topic: 1, 2 (Russian language). If you can read Russian, here you will find more visualisations, discussions and interesting information. And this article is about elections in 2011. Its undisputable advantage is that it is in English.

Categories: BI & Warehousing

OBIEE 12c Time Dimension: Logical Sequence Number

Thu, 2017-02-23 02:59
 Logical Sequence Number

The key component of any successful OBIEE implementation is the metadata model known as the repository (RPD). Doing it right is sometimes considered "black magic", and small mistakes in the RPD can impact all the exposed Subject Areas, resulting in poor performances or, even worse, wrong results.

Working an RPD requires dedicated knowledge of the tool and we are sharing it in our RPD modelling training both for OBIEE 11g and OBIEE 12c.

If you ever worked on RPD modelling, one of the settings you surely encountered is the Time dimension. This blog post written back in 2007 explains the process of setting up a time dimension for OBIEE 10g. The process didn't have any major modifications until recently when, in 12.2.1, Logical Sequence Numbers were introduced. As per Oracle's documentation this new feature "optimizes time series functions and in some cases improves query time", and in this post we'll see how to configure it and its impact on the time-series calculations. The examples shown below are based on Oracle Sampleapp v607, a really good source of modelling and front-end examples.

Usual Time-series Query Behaviour

Time-series functions like Ago,ToDate, and more recently PeriodRolling, allow end users to compare results coming from different moments just by specifying the level in the time dimension hierarchy and the number of periods to look backwards or forwards. As example if you needed to compare current month sales revenue with the previous month figure you'll end up writing a formula like

 AGO("F0 Sales Base Measures"."1- Revenue","H0 Time"."Month", 1)


  • AGO: is the Time-series function being called
  • "F0 Sales Base Measures"."1- Revenue": is the metric
  • "H0 Time"."Month": is the time hierarchy level
  • 1: is the amount of periods (months in our case) to look back in history

Once the time-series metric has been created, it can be used in an analysis like the following to compare Revenue of 2015-04 with the one of the previous month.

 Logical Sequence Number

The analysis generates the following Logical SQL which basically lists the columns retrieved and the filters applied.

   0 s_0,
   "A - Sample Sales"."Time"."T02 Per Name Month" s_1,
   "A - Sample Sales"."Base Facts"."1- Revenue" s_2,
   "A - Sample Sales"."Time Series"."127  Mth Ago Rev  (Fix Time Lvl)" s_3
FROM "A - Sample Sales"  
("Time"."T02 Per Name Month" = '2015 / 04')

The translation to source SQL hugely depends on the data model created in the RPD and on the data source type. In our example an Oracle SQL gets generated containing the following steps:

  • Sequence Generation: a RANK function is used to created a dense sequence based on Per_Name_Month, the chronological key defined in the time-hierarchy for the month level. Mth_Key is also part of the query since it's used in the join between dimension and fact table.
OBICOMMON0 AS (select DENSE_RANK() OVER ( ORDER BY T653.Per_Name_Month) as c1,  
     T653.Mth_Key as c2,
     T653.Per_Name_Month as c3
     BISAMPLE.SAMP_TIME_MTH_D T653 /* D02 Time Month Grain */ ),
  • Period Shifting: The sequence generated above is now shifted by the number of periods defined in the column formula (D1.c1 + 1), in our example 1.
SAWITH0 AS (select D1.c1 + 1 as c1,  
     D1.c2 as c2,
     D1.c3 as c3
     OBICOMMON0 D1),
  • Ago Period Query: Using the period shifted query the historical record (or set of records) is retrieved.
SAWITH1 AS (select distinct D1.c1 as c1,  
     D1.c3 as c2
     OBICOMMON0 D1),
SAWITH2 AS (select sum(T418.Revenue) as c1,  
     D3.c2 as c2
     BISAMPLE.SAMP_REVENUE_F T418 /* F10 Billed Rev */ ,
     SAWITH0 D4,
     SAWITH1 D3
where  ( T418.Bill_Mth_Key = D4.c2 and D3.c1 = D4.c1 and D3.c2 = '2015 / 04' )  
group by D3.c2, D4.c3),  

The period shifted query usage is explained visually by the image below

 Logical Sequence Number

  • Selected Period Query: the query for the selected period, in our case 2015-04, is executed using standard time dimension
SAWITH3 AS (select sum(T418.Revenue) as c1,  
     T653.Per_Name_Month as c2
     BISAMPLE.SAMP_TIME_MTH_D T653 /* D02 Time Month Grain */ ,
     BISAMPLE.SAMP_REVENUE_F T418 /* F10 Billed Rev */ 
where  ( T418.Bill_Mth_Key = T653.Mth_Key and T653.Per_Name_Month = '2015 / 04' )  
group by T653.Per_Name_Month)
  • Resultsets joining: Results coming from Ago Period and Selected Period queries are then joined with an outer join.
select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from ( select D1.c1 as c1,  
     D1.c2 as c2,
     D1.c3 as c3,
     D1.c4 as c4
     (select 0 as c1,
               coalesce( D1.c2, D2.c2) as c2,
               D2.c1 as c3,
               D1.c1 as c4,
               ROW_NUMBER() OVER (PARTITION BY coalesce( D1.c2, D2.c2) ORDER BY coalesce( D1.c2, D2.c2) ASC) as c5
               SAWITH2 D1 full outer join SAWITH3 D2 On D1.c2 = D2.c2
     ) D1
where  ( D1.c5 = 1 )  
order by c2 ) D1 where rownum <= 5000001  

As you can see, it's a rather complex set of instructions that involves several steps including analytical functions like the DENSE_RANK() in the sequence generation. In our case we have been lucky that the source system provided the DENSE_RANK() function; with other sources the sequence generation needs to be calculated directly by the BI Server (OBIS) with a possible degradation of performances.

What is Logical Sequence Number?

The Sequence Generation step mentioned above can sometimes be the bottleneck of the whole query especially when the time dimension is complex or huge in volume since DENSE_RANK() is a costly analytical function.

All that OBIEE's Time-series need to work is a pure sequence, or in Oracle's words:

Sequence numbers are enumerations of time dimensional members at a certain level. The enumeration must be dense (no gaps) and must correspond to a real time order. For example, months in a year can be enumerated from 1 to 12.

Then what if we can find a way of pre-calculating them and storing in the table or calculate them on the fly but using functions less expensive than a DENSE_RANK()?

This is the idea behind the Logical Sequence Number (LSN): a way of avoiding the BI Server (OBIS) needing to execute the DENSE_RANK(), by passing either a pre-calculated sequence column in the source table or a cheaper-cost function to calculate it on the fly based on existing columns.

The formula behind a Logical Sequence Number must resolve in a number (no varchar, or dates are allowed) and can either be:

  • Absolute: when the sequence is consistent and doesn't need any external reference, e.g. calendar years are a self-defined sequence
  • Relative: when the sequence is relative to a parent level, e.g. creating a sequence for months between 1 and 12 would need the calendar year as parent level to provide correct cross-years references

As the word already says "Sequence Numbers" must be sequential, no gaps can be included. Year and month number are good examples of this.

But what if we wanted to provide an absolute sequence number at month level?
One could think about using a month key in the YYYYMM format but this is not a sequence: 201701 <> 201612+1. The best way of generating the sequence number would be to add it as a column in the database table.

Will take as example the SAMP_TIME_MTH_D table that can be found in Sampleapp v607 containing MONTH_KEY column in the YYYYMM format.
 Logical Sequence Number

I'll add to SAMP_TIME_MTH_D a column MONTH_ABS_SEQ containing the absolute sequence number. The formula behind the column is exactly what Oracle was doing under the covers using a DENSE_RANK.


And the end result as expected being

 Logical Sequence Number

How are Logical Sequence Number Used?

Previously I described the two types of Logical Sequence Numbers: Absolute and Relative, each one has it use case:

  • PERIODROLLING: This function uses absolute LSN to calculate the starting Sequence Number based on the selected one: e.g. looking at the previous image a 6 month rolling starting from 2009-02 will include data from sequences in the range [9-14].
  • TO_DATE: uses relative LSN, e.g A YearToDate Measure shown by month will use the month relative LSN to calculate the previous months in the same year.
  • AGO: Ago function uses both absolute and relative: Absolute numbers are used if the grain of the query and the one of the AGO are at the same level e.g. Yearly analysis of Actual vs Previous Year. However when the grain of the shift in the ago is higher than the grain of the query Relative LSN are used, e.g. Monthly analysis of Actual vs Previous Year.
How to Configure Logical Sequence Numbers

Logical Sequence Number is a new feature and as such it requires additional settings in the time dimension hierarchy to be working. In our case we'll add two sequences, an absolute at calendar year level and a relative at calendar month level.
We'll add the sequences directly as formulas in the repository however those formulas should be pushed down as columns in the database table if optimal performances are sought.

In order to set the Logical Sequence Numbers we need to open the RPD (the SampleApp one in this test), and select the Time dimension we want to change.
 Logical Sequence Number

After selecting any level apart from the Grand Total (top of the hierarchy) a new tab called "Sequence Numbers" should be visible. However if, like in our case, none of the columns at that level are integer or doubles, the sequence number selector is disabled.

 Logical Sequence Number

In order to enable the selector we first need to create a sequence column in our dimension and bring it at the correct level in the hierarchy. For the Year Level there is already an integer column named "T35 Cal Year" which can be used as sequence. We need simply to drag the column at Year level in the Time hierarchy and set it as Absolute Sequence.

 Logical Sequence Number

I can do the same with the Month level in the hierarchy and the "T32 Cal Month" column. Note that the column contains only the months enumeration from 1 till 12 so we need to set the sequence number as Relative to the level Year.

 Logical Sequence Number

Please note that both absolute and relative LSN can be (and should be) entered since as discussed above each have a different use cases. In addition relative LSN should be set for all logical parents level in the hierarchy since they will be used only if the grain of the time shift matches the one of the parent level. For example a Monthly LSN based on Year logical level will only be used in AGO functions having a year shift and not in case of Quarterly shift.

For an optimal usage every level of the time hierarchy should have one absolute and a relative LSN for each of the parents level in the hierarchy.

Impact on SQL

It's time now to review the SQL generated by our analysis and check the differences with the old-school time-series query.

When creating an analysis at year level like the following
 Logical Sequence Number

As Expected the BI Server (OBIS) uses the CAL_YEAR column as sequence instead of the DENSE_RANK() function over the PER_NAME_YEAR column.

OBICOMMON0 AS (select T795.CAL_YEAR as c1,  
     T795.QTR_KEY as c2,
     T795.PER_NAME_YEAR as c3
     BISAMPLE.SAMP_TIME_QTR_D T795 /* D03 Time Quarter Grain */ )

While when using the TO_DATE both the relative sequence is used, like in the following example where the measure "166 Revenue Year To Date" is defined by the formula:

TODATE("01 - Sample App"."F0 Sales Base Measures"."1- Revenue",  "01 - Sample App"."H0 Time"."Year" )  

 Logical Sequence Number

The following query gets generated, note the usage of Cal_Year and Cal_Month in the sequence generation query instead of the DENSE_RANK() function as per RPD settings mentioned above.

OBICOMMON0 AS (select T653.Cal_Year as c1,  
     T653.Cal_Month as c2,
     T653.Per_Name_Month as c3,
     T653.Per_Name_Year as c4
     BISAMPLE.SAMP_TIME_MTH_D T653 /* D02 Time Month Grain */ )
Are Logical Sequence Numbers Useful?

Most of the times the bottleneck when using Time Series is not in the sequence generation, since the time dimension cardinality is rarely big enough to produce a noticeable delay in the query time, but rather in the poorly managed calculations made on query time on top of massive fact tables.

Don't expect LSN to solve all your performance problems with Time Series. However, the usage of Logical Sequence Numbers provides to OBI a way of pre-cooking part of the calculation and so in theory should help performance. The small effort required to set them up centrally in the time hierarchy is covered by the benefits during query time, without having to touch any pre-defined time-series calculation.

If you do have performance problems with your OBIEE system, or would like to ensure that a system you’re building will be performant from the outset, please get in touch to find out more about our Performance Analytics service!
We also provide expert OBIEE training, implementations, QA and health checks - to find out more about how we can help you, please contact us!

Categories: BI & Warehousing

Introducing Advanced Analytics Training from Rittman Mead!

Tue, 2017-02-21 09:00


Rittman Mead is proud to release our new training course: Advanced Analytics with Oracle's R Technologies.

Oracle has made significant investments in the R language with Oracle R, ROracle and Oracle R Enterprise. Using these tools, data scientists and business intelligence practitioners can work together more efficiently and can transition between their roles more easily.

Rittman Mead has developed a three-day course that tackles R's notoriously steep learning curve. It builds on Oracle professionals' existing skills to accelerate growth into R programming and data science.

What does the course include?

Day one is all about the R programming language, starting with a history and explanation of Oracle's R technologies. Hands-on coding begins right away, with practical labs comparing R's data types and data structures with those found in the Oracle Database. The day wraps up with R programming concepts like conditions and functions, providing a strong grasp of the fundamentals on the very first day.

Day two focuses on the analysis pipeline, from data acquisition to data visualization. You will use SQL and R to tidy and transform raw data into a structured format and then use visualization and basic statistics to gain insights.

Day three looks at statistical modeling—discussing linear models and the predictive modeling pipeline. We present pros and cons of different types of models and get hands-on with preprocessing, model tuning, cross-validation and interpreting model results.

Our course is a mixture of theory and practical exercises—ensuring that you'll understand the tools and know when to apply them.

Who should attend?

The course is suitable for Oracle practitioners having some experience with SQL and business intelligence. No previous knowledge of R is assumed or necessary.

Sounds great, where do I sign up?

Please view our UK & Europe or US training schedule for public courses. For any questions or queries, including on-site training requests, please contact Daniel Delgado (US) or Sam Jeremiah (UK & Europe) for more details.

Categories: BI & Warehousing

OBIEE 12c - Regression Tester for Application Roles and Mappings

Mon, 2017-02-20 10:08

Allow me to introduce you to the Application Role Mapping validation script for OBIEE 12c. With this bit of code, we can utilize OBIEE's runcat command script to export our application role and permissions mapping information from multiple environments, and ensure their consistency. The picture below is an example of what you'll see as the final product. While it does not show you a side by side comparison of both environments, what it does do is display those objects in your lower environment that contain inconsistent application role or permissions mappings vs the target environment, and their respective configuration. The items in the path column then point you to that object in your lower environment catalog for further examination.


Our script only requires a few easy steps to generate a web-based view of any inconsistencies in application role and permissions mappings between your tested environments. Similar to the Baseline Validation Tool (BVT), this script goes one step further and executes a fine-grain examination and resulting view of application role and permissions mappings. The BVT only catches that something is different about the object, as indicated by the Name column value, and tells you where to look.


While I'll be sure to go into more detail later, the first picture above shows us that we have a number of application role and/or permissions mappings that exist in the lower environment, however, do not in the target OBIEE environment. Curious? Let's jump right into it.


The security audit is essentially a 3-step process, and was designed to be really accessible and simple to use. It breaks out like this:

  • Run the security_audit.py script in both OBIEE12c environments (being the lower environment that possesses the proper app role/permission mappings and the target environment).
  • security_audit.py will generate a .csv file in each environment.
  • Move the CSV from the target environment into the directory where you've got the CSV in the lower environment. After you've got the files moved, you'll run security_compare.py and simply pass in the locations of your lower environment CSV, and then that of the target environment. Lastly, a browser will pop up, giving you an immediate view of any inconsistencies that exist between your two OBIEE 12c instances.

Let's take a look at the process in a bit more detail!

Step by Step

Run security_audit.py in Lower Environment

First, let's make sure we've got a few libraries installed that we'll need to run our code. I recommend using pip for this. You'll need to install pandas and flask if you have not done so already. Simply navigate to the security_audit directory you got from GitHub and then from the command line run:

python3 security_audit.py

First, the script is smart enough to figure out which kind of OS it's in. Second, if your DOMAIN_HOME variable is set correctly (probably something like ORACLE_HOME/user_projects/domains/bi), the runcat command will run, exporting a CSV that contains the information we need to run the next script, which does the actual crunching between environments. In Windows, the default output location is C:, in Linux, the /tmp directory.

script run

The tool will prompt you to enter your DOMAIN_HOME, should you not have one set in your environment.

enter domain home

Run security_audit.py in Target Environment

Next, in our target environment, being the OBIEE 12c instance we want to make sure contains the same app role mappings as in our lower environment, run the security_audit.py script once again, following the same steps as outlined above. Rename the CSV to something different than the file that was written in your lower environment, as we're going to need to put both of these guys in the same directory.

rename mapping files

So on that note, after you've renamed your security_mappings CSVs, move them to the same directory on your lower environment. We're simply renaming them so we don't clobber one version or the other, and for easy reference as to which file belongs to its corresponding environment.

Run security_compare.py in Lower Environment

At this point you should have two security_mappings.csv files (although, the names should be a bit changed at this point) in your lower environment. We're going to need them as inputs for the next part of the regression testing process. This next bit of code simply ensures consistency between the two environments. Once run, it will instantly display any catalog objects in your lower environment that contain any disparities in the way their application roles or permissions are mapped when compared against those in your target environment. As of this blog, there really is no good way to do this with any native OBIEE tool, that is aside from running the runcat reports yourself and doing the crunching. So let's do it!

Open a command prompt in your environment, taking note of where your two CSV files are located. In this example, we'll be using a Windows command prompt, with our files located directly off of the C:\ drive.

In your command prompt, navigate to the location of your .py files. On that note make sure you do not separate these from the other files in the security_audit directory. Flask will need the other files to render the resulting webpage. Back to it. In my example below, I've navigated to the security_audit directory, and then run the following:


And then watch the magic happen! Make sure you have pop-ups enabled if you're having trouble rendering the page. The script will auto-magically figure out the host name for your environment and run it there.


  1. Owner - this is the owner of the catalog object that is showing the variance in permissions assignment under:

  2. Name - this is the name of the object, as it is displayed in the catalog.

  3. Path - this is the path to the object in the web catalog

  4. ACL - these are the detailed permissions mappings based on each entity assigned to the particular object. There is an accompanying key to the left hand side to help you out.

  5. Permissions - detail level permissions mapped to each object by owner entity.


Having any trouble running the script? Please get in touch! I would also love to hear some feedback on how it might have helped you perform one task or the other. Feel free to use the comments section below for this or to report any issues.

Categories: BI & Warehousing

Financial Reports - which tool to use? Part 2

Thu, 2017-02-16 12:00
Financials in BI Publisher Financial Reports - which tool to use? Part 2

I find it interesting that BI Publisher is mostly known for the creation of pixel perfect repeating forms (invoices, labels, checks, etc) and its ability to bursting them. To me, BI Publisher is the best kept secret for the most challenging reports known to mankind.

In my last blog - https://www.rittmanmead.com/blog/2017/02/financial-reports-which-tool-to-use-part-1/, I discussed some of the challenges of getting precisely formatted financial reports in OBIEE, as well as some pros and cons of using Essbase/HFR. Although we can work through difficult solutions and sometimes get the job done, BI Publisher is the tool that easily allows you to handle the strangest requirements out there!

If you have OBIEE, then you already have BI Publisher, so there is no need to purchase another tool. BI Publisher comes integrated with OBIEE, and they can both be used from the same interface. The transition between BI Publisher and OBIEE is often seamless to the user, so you don’t need to have concerns over training report consumers in another tool, or even transitioning to another url.

The BIP version that comes embedded with OBIEE 12c comes loaded with many more useful features like encryption and delivering documents to Oracle Document Cloud Service. Check out the detailed new features here: http://www.oracle.com/technetwork/middleware/bi-publisher/new-features-guide-for-12-2-1-1-3074557.pdf

In BI Publisher, you can leverage data from flat files, from different databases, from an Essbase cube, from the OBIEE RPD, from one (or multiple) OBIEE analyses, from web services and more:

Financial Reports - which tool to use? Part 2

So, if you already have very complex OBIEE analyses that you could not format properly, you can use these analyses, and all the logic in them, as sources for your perfectly formatted BI Publisher reports.

Every BI Publisher report consists of three main components:

  1. Data Model - data source that you will use across one or more reports

  2. Layout(s) - which will define how your data is presented

  3. Properties - which are related to how it generates, displays and more

You start a BI Publisher project by creating a data model that contains the different data sets that you would like to use on your report (or across multiple reports). These data sets, which reside inside of your data model, can be of the same source or can come from multiple sources and formats. If you regularly use OBIEE, you can think of a data model as the metadata for one or more reports. It is like a very small, but extremely flexible and powerful RPD.

Financial Reports - which tool to use? Part 2

Inside the data model you can connect your data sets using bind variables (which creates a hierarchical relationship between data sets), or you can leave them completely disconnected. You can also connect some of your data sets while leaving others disconnected.

The most impressive component of this tool is that it will allow you to do math from the results of disconnected data sets, without requiring ETL behind the scenes. This may be one of the requirements of a very complex financial report, and one that is very difficult to accomplish with most tools. The data model can extract and transform data within a data set, or extract only, so that it can later be transformed during your report template design!

For example, within a data set, you can create new columns to suit most requirements - they can be filtered, concatenated, or have mathematical functions applied to them, if they come from the same data source.

Financial Reports - which tool to use? Part 2

If they do not come from the same source, you can transform your data using middle tier systems, such as Microsoft Word during your template creation. You can perform math and other functions to any result that comes from any of your data sets using an RTF template, for example.

Financial Reports - which tool to use? Part 2

The example above was mentioned in Part 1 of this blog. It was created using BI Publisher and represents what I would call a "challenging report" to get done in OBIEE. The data model in this example consisted of several OBIEE analyses and their results were added/subtracted/multiplied as needed in each cell.

Financial Reports - which tool to use? Part 2

This second example was another easy transition into BI Publisher: the entire report contained 10 pages that were formatted entirely differently, one from the other. Totals from all pages needed to be added in some specific cells. Better yet, the user entered some measures at the prompt, and these measures needed to be accounted for in every sub-total and grand total. You may be asking: why prompt for a measure? Very good question indeed. In this case, there were very few measures coming from a disconnected system. They changed daily, and the preferred way for my client to deal with them was to enter them at the prompt.

So, do you always have to add apples to apples? Not necessarily! Adding apples and bananas may be meaningful to you.

Financial Reports - which tool to use? Part 2

And you can add what is meaningful with BI Publisher!

For example, here is a sample data model using sources from Excel, OBIEE and a database. As you see, two of these data sets have been joined, while the other two are disconnected:

Financial Reports - which tool to use? Part 2

A data model such as this one would allow you to issue simultaneous queries across these heterogeneous sources and combine their results in the report template. Meaning, you can add anything that you would like in a single cell. Even if it involves that measure coming from the prompt! Goes without saying, you should have the exact purpose and logic behind this machination.

Once your data model is complete: your data sets are in place, you have created the relationships within them (where applicable), you created custom columns, created your parameters and filters, then you generate some sample data (XML) and choose how you will create your actual report.

As I mentioned, there are additional functionalities that may be added when creating the report, depending on the format that you choose for your template:

Financial Reports - which tool to use? Part 2

One very simple option is to choose the online editor, which has a bit more limited formatting capability, but will allow you to interact with your results online.

In my experience, if I had to cross the bridge away from OBIEE and into BI Publisher, it is because I needed to do a lot of customization within my templates. For those customizations, I found that working with RTF templates gave me all the additional power that I could possibly be missing everywhere else. Even when my financial report had to be read by a machine, BI Publisher/RTF was able to handle it.

The power of the BI Publisher data model combined with the unlimited flexibility of the RTF templates was finally the answer to eliminate the worst excel monsters. With these two, you can recreate the most complex reports, and do it just ONCE - not every month. You can use your existing format - that you either love, or are forced to use for some reason - and reuse it within the RTF. Inside of each RTF cell, you define (once!) what that cell is supposed to be. That specific cell, and all others, will be tested and validated to produce accurate results every month.

Once this work is done, you are done forever. Or well, at least until the requirements change… So, if you are battling with any one of these monsters on a monthly basis, I highly encourage you to take a step forward and give BI Publisher a try. Once you are done with the development of your new report, you may find that you have hours per month back in your hands. Over time, many more hours than what you spent to create the report. Time worth spending.

Financial Reports - which tool to use? Part 2

Categories: BI & Warehousing

Financial Reports - which tool to use? Part 1

Wed, 2017-02-15 01:00
Financial Reports - which tool to use? Part 1

One of the treats of working in the Business Intelligence world is that we are asked to analyze different aspects of a business. In fact, we are asked to analyze many different types of businesses, too. Most of us using BI tools have come from some previous background. Be it Marketing, Finance, Supply Chain or any other, we most likely had work experience before we got here. Maybe one of our jobs even led to Business Intelligence. The fact is, we are not experts in all areas. It would take several lives to make such a claim, because each area can be very complex and take years to master. The truth, for most of us, is that we have our favorite areas. They are often related to what we are most familiar with.

Financial Reports - which tool to use? Part 1

Over time, I came to really appreciate how simple numbers can be, and developed this - hard to understand - favoritism towards financial reports. While some business areas can be artistic and even vague, numbers are never vague. I have a great appreciation for that. Working with numbers is always precise. In the end, they have to match. No matter how great your report looks, if the numbers don’t add up the report is always wrong. Plus, financial layouts are generally very defined going in, so there is little room for error.

Financials in OBIEE

So, the endeavor begins when you are a BI consultant and everything is supposed to add up properly and look very nice. OBIEE is an extremely powerful tool, and this gives users the impression that it can solve all problems. While it can solve most problems, it falls short on some key features needed for easy financial reporting. That is not to say that Financials can’t be handled in OBIEE - but it is definitely to say that it is not easy.

So, if financial reports are not easy to create in OBIEE, than we are left with two very simple options:

  1. Struggle through it and make it happen

  2. Choose another tool

I have made the mistake of choosing option 1 some times, but quickly realized that option 2 couldn’t be as bad. Countless times, I have been asked to create financial reports in OBIEE. Of course, they needed to tie up and match a specific format: they needed to have blank lines inserted between one section and another, and the alignment of the categories was very important. They often required very detailed variance calculations, so that a company could see where they stood as far as change overtime. Variance percentages are key on these types of reports, and if you have dealt with them in OBIEE, you know that different types of variances and their grand totals can often pose challenges for report writers.

So, in order to accomplish the formatting needed, you end up adding extra code here and there, in essence trying to make OBIEE do something that it’s not supposed to do. Soon, you are experiencing performance issues and a new array of considerations are in place. You start removing your “special code”, then you loose your formatting. The numbers on your financial statement are still correct, but your report looks something like this:

Financial Reports - which tool to use? Part 1

While, in reality, you were trying to get here:

Financial Reports - which tool to use? Part 1

** The Balance Sheet above was created using HFR for illustration of formatting only.

Looking at a different OBIEE financial report (below), you will see that a lot of formatting can be done in these reports, but they will always look like OBIEE reports, if you know what I mean.

Financial Reports - which tool to use? Part 1

In this example, the first column is out of order - as far as Income Statements go. This was left alone on purpose to display one of the issues with creating these statements in OBIEE. The tool does not easily allow you to choose which items will go in each row. So, in the criteria tab, in Answers, you choose the order of the columns, but if you need the rows in order, you will need to either:

  1. Use a hidden column created just for sorting purposes

  2. Leverage selection steps, or

  3. Create a measure column for each row that you will need, use a pivot table, and add the Measure Labels as rows on your pivot

I will illustrate the third option, as it is my preferred way of ordering rows. Suppose that you have a very simple criteria tab such as this:

Financial Reports - which tool to use? Part 1

Naturally, your results would default like this on a table:

Financial Reports - which tool to use? Part 1

If you use a Pivot table instead, you can drag your Measure Labels onto the Rows:

Financial Reports - which tool to use? Part 1

And now, you will be able to see your measures as rows. You can easily reorder them as needed by just moving the order of the columns in the Measures section of your Layout editor.

Financial Reports - which tool to use? Part 1

This seems like a simple solution if you know precisely what all your rows should be, and even better, if you don’t have a huge amount of measures on the report. In real life, this type of row ordering is high maintenance:

  1. You must label each measure to match the account category name for each row

  2. You must filter each measure by its account category (or account number)

  3. If the account category name changes in your DB, you must manually rename your columns to match the new naming convention

  4. If you add or delete account categories, you must manually add and delete columns from your report

OBIEE 12c offers a great improvement in this area: the ability to “save columns” is described very well by Jason Baer on this blog: https://www.rittmanmead.com/blog/2016/01/my-favorite-obiee-12c-feature-that-almost-no-one-is-talking-about/

With the new release of the product you can save as many financial columns as you would like in the web catalog, which allows you to reuse them. As a consequence, you will streamline report maintenance by updating the columns’ format and formula directly from the catalog (instead of inside every report). In fact, if you are spending too much time maintaining your existing reports out of OBIEE 11g, you will automatically benefit from an upgrade to 12c just based on this single feature. Check here for more info: https://www.rittmanmead.com/obiee-12c-upgrade/

This is a great improvement, but you will still need to deal with an overall lack of flexibility for dynamically adding and deleting columns, setting orders, adding blank space, indenting and calculating variances along with proper grand totals.

After spending more time than you should in order to create a simple report, you really start considering other tools. If you are already working in the Oracle stack, the obvious choices will be BI Publisher and Hyperion Financial Reporting (HFR).

Financials in Essbase/HFR

Hyperion Financial Reporting (HFR) brings a powerful solution to financial statements, because it allows you to create pixel perfect reports that are pre-aggregated in an Essbase cube. Just with that, two big problems were just solved: formatting and performance.

In the example below, you see that HFR allows you to place metrics on both sides of the Account Category (butterfly layout - difficult to accomplish in OBIEE):

Financial Reports - which tool to use? Part 1

In addition to formatting and performance, there are some definite pros to consider when choosing HFR:

  1. The calculations in HFR dynamically reference cells, as in excel. So, if a cell changes, the cells that are referencing the original cell will automatically be updated

  2. HFR has the ability to create financial books and batches, and also has a powerful bursting feature

HFR is a great solution for Income Statements, Balance Sheets and other reports that come from Essbase cubes. In a simplistic way, an Essbase cube is a combination of tables that have been joined and pre-aggregated. Since most tables coming out of a financial module in a system can often be joined, you should be able to create Essbase cubes to use as a source for your HFR reports. You will rarely have a requirement that cannot be handled by HFR and Essbase, but some situations may be problematic, for example, if your report requires a measure to be entered at run-time, if results from multiple cubes need to be added, or if your layout is very complex. This is why :

In an HFR report, you start by inserting a grid onto your report and then you associate that grid with a specific Essbase cube. If you need data from two cubes on the report, you can insert another grid and associate that with the second cube. You can also create a report that leverages calculations between existing grids (for the purpose of doing math with two or more separate cubes):

Financial Reports - which tool to use? Part 1

Many thanks to my collegue, Mark Cann https://www.rittmanmead.com/blog/author/mark-cann/, for working through this solution with me

The challenge here is that you may end up with multiple layout grids on your HFR report, which will complicate the report creation and maintenance going forward. It is important to know that if your requirements call for strange off-setting of cells and multiple different looking blocks, then HFR may not be the best tool for the job. If you choose HFR for this purpose, you will spend too much time trying to make things right.

Financial Reports - which tool to use? Part 1

*This is a simple Essbase implementation with 2 cubes (or databases): a Balance Sheet and an Income Statement cube.

The fact is, some financial reports are very tricky and do not come solely from a Financial module. For example, if your company is evaluated monthly for a line of credit, your bank may require to look at several components of your business in order to determine the amount that you can borrow. They will base their decision not only on your monthly revenue, but also your liabilities, such as accounts payables, and some of your assets, such as inventory. What they ask for really depends on their internal lending requirements, and also on the type of business that you have. These are, therefore, highly customized reports that never come out-of-the box anywhere. For this reason, most companies spend a lot of man hours creating these reports as a huge excel report, after the employees have managed to pull information from many different modules together.

These excel “monsters” do the job. They are accepted by the banks, and will get you that loan. On the downside, they need to be redone every month and will drag resource hours out of profitable projects. The flat excel files are also prone to mistakes, as the values are manually keyed in each time. If you make a mistake favorable to the company, your bank will look at it as a very negative issue. If you make an unfavorable mistake, you will not be able to borrow as much as you qualify for. This is a no win situation, so the reports must be accurate every time.

Financial Reports - which tool to use? Part 1 To check for accuracy, there is nothing like testing overtime. But, since you must rework the report each month, you don’t have that opportunity.

The solution is to create a template that will pull from all of these different modules, calculate the numbers, add the results automatically to a pixel perfect formatted report. Over the development cycle, these mappings and calculations will be thoroughly tested, and then they will only be reused going forward.

While you may spend some time pulling this logic together, you will only have to click a few buttons after you are done, for months or years to come. In fact, I have clients that have been running reports such as this one for years. They have been saving a couple of weeks in report creation every month.

Let’s look at an example of what I am talking about:

Financial Reports - which tool to use? Part 1

On this report, each number (disguised as $1234) has been mapped to a calculation that will be pulled dynamically, according to the date entered on the prompt. The inventory amounts are adjusted according to banking requirements, and a rate is allocated depending on the row. This amount is later added/subtracted from receivables and existing contracts. Most of these numbers were created as separate OBIEE analyses. Some amounts could even be tied into web services to get the daily futures prices to estimate the value of contracts when the report runs. All lines are considered in the final equation before the total borrowing amount can be calculated. Per this bank’s requirement, this form needed to be printed and signed, then submitted monthly.

Lending/financing reports may be the most tricky, and the most time consuming for companies to generate every month. The reports may be required by the bank, or by a company that is leasing or financing valuable equipment to your company. These reports need to show your prospective lender everything about your business. They will often need to be done in a format that is specified by your lender. These formats are not negotiable, in fact, some lenders still use old forms that used to be read by a machine.

Here is another small snippet of a financing report that I had to create recently. Now, which tool would you use for 10 different pages of something like this, which required some of the amounts to be entered in the prompt? *Note: the report had to look “exactly” like this:

Financial Reports - which tool to use? Part 1

Well, as I mentioned in the beginning of this article, OBIEE would not be your partner in this type of endeavor. I can guarantee that this relationship would fail: strange formatting with black boxes, line numbers, need for Headers (footers too, not shown here), indenting, etc.

You may consider Essbase/HFR combo, for formatting and performance, but you will soon realize that:

  1. Performance does not tend to be an issue with these reports, as they are generally submitted to lenders on a monthly basis, and therefore can be scheduled to run automatically in the middle of the night.

  2. As mentioned earlier, HFR requires a layout grid to be inserted before the report can be designed. Here, you would end up with multiple grids to handle the calculation of different cells from multiple cubes - which can be cumbersome to create and maintain.

  3. The measures in an HFR report should come from the pre-aggregated cube. In this example, some of the measures were entered as part of the prompt and are calculated at run time. At this point, you must scratch the Essbase/HFR option for this one!

So, now you are still stuck with your monster excel spreadsheet, then retyping the numbers onto the required form.
Financial Reports - which tool to use? Part 1

Before you marry this solution, let me present you with the tool that can do everything: BI Publisher.

Stay tuned for the second part of this blog, when I will share why I believe that BIP can solve the most challenging reporting requirements out there!

Categories: BI & Warehousing

Enabling Concurrent OBIEE RPD Development - for free

Fri, 2017-02-10 04:30
Enabling Concurrent OBIEE RPD Development - for free

One of the most common and long standing problems with developing in OBIEE is the problem of multiple developers working on the RPD at the same time. This blog explains the problem and explores the solution that we’ve developed and have been successfully using at clients over the last couple of years. We’re pleased to announce the immediate availability of the supporting tools, as part of the Rittman Mead Open Source Project.
Before we get into the detail, I'll first explain a bit about the background to the requirement and the options that ship with OBIEE.

Why Concurrent Development

The benefits of concurrent development are obvious: scalability and flexibility of development. It enables you to scale your development team to meet the delivery demands of the business. The challenge is to manage all of the concurrent work and enable releases in a flexible manner - which is where source control comes in.

We couldn't possibly attempt to manage concurrent development on something as complex as the RPD without good version control in place. Source control (A.K.A. version control/revision control) systems like Git and Apache Subversion (SVN) are designed to track and retain all of the changes made to a code base so that you can easily backtrack to points in time where the code was behaving as expected. It tracks what people changed, who changed it, when they changed it and even why they made that change (via commit messages). They also possess merge algorithms that can automatically combine changes made to text files, as long as there are no direct conflicts on the same lines. Then there's added benefits with code branching and tagging for releases. All of this leads to quicker and more reliable development cycles, no matter what the project, so good in fact that I rely on it even when working as one developer. To (mis)quote StackOverflow, "A civilised tool for a civilised age".

All of these techniques are about reducing the risk during the development process, and saving time. Time spent developing, time spent fixing bugs, spent communicating, testing, migrating, deploying and just about every IT activity under the sun. Time that could be better spent elsewhere.

Out of the Box

Oracle provide two ways to tackle this problem in the software:

However I believe that neither of these are sufficient for high standards of reliable development and releases - the reasons for which I explore below (and have been described previously). Additionally it is not possible to natively and fully integrate with version control for the RPD, which again presents a significant problem for reliable development cycles.

Firstly the online check-in and check-out system does, by design, force all development to be conducted online. This in itself is not an issue for a single developer, and is in fact a practice that we advocate for ‘sandbox’ development in isolation. However, as soon as there is more than one developer on the same server it reduces development flexibility. Two developers cannot develop their solutions in isolation and can be made to wait for assets they want to modify to be unlocked by other developers. This may be workable for a small amount of developers but does not scale well. Furthermore, the risk of losing work is much higher when working online; we've all seen the infamous "Transaction Update Failed" message when saving online. This is usually because of an inconsistency in the RPD but can be caused by less obvious reasons and usually leads to repeating some redundant work. Lastly, very large RPDs like those from BI Apps or very mature enterprise deployments pose a problem when working online. They cause the Admin Tool to work very slowly because of the calls it has to make to the server, which can be frustrating for developers. To be clear, I am certainly not advocating developing without testing your code, but given the speed of uploading an RPD to the server and the fact that it can be automated, in my experience it is far more efficient to develop offline and upload frequently for testing.

The MUD system is much better and is quite close in methodology to what we recommend in this guide. The premise works on having a production quality master RPD and then having many other individual developers with their own RPDs. The check-in and check-out system will automatically handle three-way merges to and from the master RPD when development changes are made. This is good in theory but has been maligned for years when used in practice. The version control system in MUDE is not as robust as Git or SVN for example and the conflict resolution relies on developers managing their own issues, without the ability for a source master to intervene. Ultimately there is little flexibility in this method, which makes it difficult to use in the real world.

Source Controlling the RPD

Source control is another problem as the RPD is a binary file which cannot be merged or analysed by text-based comparison tools like SVN or Git. A potential solution at one point seemed to be MDS XML, a structured, textual representation of the RPD. However, this also seemed to have some drawbacks when put into practice. Whilst some use MDS XML with great success and there are tools on the market that rely on this system, here at Rittman Mead we’ve found that there are significant risks and issues with it. We’ve come up with what we believe is a robust, scalable, and flexible approach, based around the binary RPD.

The Rittman Mead Solution to Concurrent OBIEE RPD Development

Successful development lifecycles comes down to implementation of the correct process and ensuring it is as quick and reliable as possible. Tools, like the ones described in this blog, can be used to help in both of those areas but are not a substitute for detailed knowledge of the processes and the product. A key feature of this approach is the Source Master who owns and is responsible for the overall development process. They will have a detailed understanding of the method and tools, as well as the specifics of the current and future work being done on the RPD. Things will go wrong, it is as inevitable as death and taxes - the key is to minimise the impact and frequency of these events.

The solution is based on the Gitflow method, which is one of the most established development models. The model is based on a few major concepts:

  • Features - Specific items of development, these are begun from the development branch and then merged back into development when complete.
  • Develop/Master Branches - Two branches of code, one representing the development stream, the other the production code.
  • Releases - A branch taken from development that is then eventually merged into production. This is the mechanism for getting the development stream into production.

I highly recommend reading that blog and this cheatsheet as they explains the method excellently and what we've done here is support that model using binary RPDs and the 3-way merge facility in OBIEE. Also of relevance is this Rittman Mead blog which describes some of the techniques we're explaining here. We've open sourced some command line tools (written in Python) to ease and automate the process. You can download the code from the GitHub repository and need only an install of Python 2.7 and the OBIEE client to get started. The tooling works with both git and Subversion (SVN). We recommend the use of git, but realise that SVN is often embedded at organisations and so support that too.


This section shows a simple example of how you might use this methodology for multiple developers to work on the RPD in a reliable way. Many of the screenshots below show SourceTree, a GUI for Git which I'm a fan for both its UI and GitFlow support.

We have two developers in our team, Basil and Manuel, who both want to work on the RPD and add in their own changes. They already have an RPD they've made and are using with OBIEE, named base.rpd. First they initialise a Git repository, committing a copy of their RPD (base.rpd).

Enabling Concurrent OBIEE RPD Development - for free

The production branch is called master and the development branch develop, following the standard naming convention for GitFlow.

Before we get started, let's a take a look at the RPD we're working with:

Enabling Concurrent OBIEE RPD Development - for free

Simple Concurrent Development

Now Basil and Manuel both independently start features F01 and F02 respectively:

python obi-merge-git.py startFeature F01  

Each developer is going to add a measure column for Gross Domestic Product (GDP) to the logical fact table, but in different currencies. Basil adds "GDP (GBP)" as a logical column and commits it to his development branch, F01. Manuel does the same on his, adding "GDP (USD)" as a logical column and committing it to F02.

Enabling Concurrent OBIEE RPD Development - for free

Now Basil finishes his feature, which merges his work back into the develop branch.

Enabling Concurrent OBIEE RPD Development - for free

This doesn't require any work, as it's the first change to development to occur.

python obi-merge-git.py finishFeature F01

Checking out develop...  
Already on 'develop'

Merging feature/F01 into develop...  
Successfully merged feature/F01 to the develop branch.  

When Manuel does the same, there is some extra work to do. To explain what's happening we need to look at the 3 merge candidates in play, using the terminology of OBIEE’s 3-way merge functionality:

  • Original: This is the state of the development repository from when the feature was created.
  • Modified: This is your repository at the time of finishing the feature.
  • Current: This is the state of the development repository at the time of finishing the feature.

When Basil completed F01, the original and current RPDs were the same, so it could just be overridden with the new RPD. However now, the Original and Current RPDs are different, so we need to resolve the changes. Our RPDs are binary files and so we need to use the 3-way merge from the Admin Tool. The python script wrapped around this process uses Git’s metadata to determine the appropriate merge candidates for invoking the OBIEE 3-way merge.

Enabling Concurrent OBIEE RPD Development - for free

Since our changes do not conflict so this can happen automatically without user intervention. This is one of the critical differences from doing the same process in MDS XML, which would have thrown a git merge conflict (two changes to the same Logical Table, and thus same MDS XML file) requiring user intervention.

python obi-merge-git.py finishFeature F02

Checking out develop...  
Already on 'develop'

Merging feature/F02 into develop...  
warning: Cannot merge binary files: base.rpd (HEAD vs. feature/F02)

Creating patch...

        Patch created successfully.

Patching RPD...

        RPD patched successfully.

RPD Merge complete.

Successfully merged feature/F02 to the develop branch.  

In the background the script uses the comparerpd and patchrpd OBIEE commands.


Now our development branch has both features in, which we can see using the Admin Tool:

Enabling Concurrent OBIEE RPD Development - for free

To get this into production we can start a release process:

python obi-merge-git.py startRelease v1.00  

This creates a new branch from develop that we can use to apply bug fixes if we need to. Any changes made to the release now will be applied back into development when the release is complete as well as being merged into the production branch. The developers realise they have forgotten to put the new columns in the presentation layer, so they do it now in the release branch as a bugfix. In GitFlow, bugfixes are last minute changes that need to be made for a release but do not interfere with the next development cycle, which may have already begun (in the develop branch) by the time the bug was spotted. The changes are merged back to develop as well as master so the fix isn't lost in the next cycle.

Enabling Concurrent OBIEE RPD Development - for free

This is committed to the repo and then the release is finished:

python obi-merge-git.py finishRelease v1.00  

Enabling Concurrent OBIEE RPD Development - for free

After the release we can see that the master and develop branches are at the same commit point, with a tag of the release name added in too. Additionally we can switch to the develop and master branches and see all of the changes including the columns in the presentation layer. The full commit history of course remains if we want to roll back to other RPDs.

Enabling Concurrent OBIEE RPD Development - for free

Conflicted Development

Basil and Manuel start their new features, F03 and F04 respectively. This time they’re working on the same existing column - something that a “Source Master” should have helped avoid, but missed this time. Basil edits the column formula of the "Area" column and renames it to "Area (sqm)"" and Manuel does the same, naming his column "Area (sqFt)".

Enabling Concurrent OBIEE RPD Development - for free

They both commit the changes to their own feature branches and Manuel merges his back to development with no problem.

python obi-merge-git.py finishFeature F04  

However when Basil tries to finish his feature the obvious conflict occurs, as the automatic merge cannot resolve without some human intervention since it is the same object in the RPD affected by both changes. At this point, the script will open up the current RPD in the Admin Tool and tell Basil to merge his changes manually in the tool, going through the usual conflict resolution process. The script provides 3 RPDs to make the RPD choosing step unambiguous:

  • original.rpd
  • modified.rpd
  • current.rpd (Opened)
python obi-merge-git.py finishFeature F03

Checking out develop...  
Already on 'develop'

Merging feature/F03 into develop...  
warning: Cannot merge binary files: base.rpd (HEAD vs. feature/F03)

Creating patch...

        Patch created successfully.

Patching RPD...

        Failed to patch RPD. See C:\Users\Administrator\Documents\obi-concurrent-develop\patch_rpd.log for details.

        Conflicts detected. Can resolve manually using the Admin Tool.

        Original RPD:   C:\\Users\\Administrator\\Documents\\rpd-test\a.rpd (original.rpd)
        Current RPD:    C:\\Users\\Administrator\\Documents\\rpd-test\c.rpd (Opened)
        Modified RPD:   C:\\Users\\Administrator\\Documents\\rpd-test\b.rpd (modified.rpd)

Perform a full repository merge using the Admin Tool and keep the output name as the default or C:\\Users\\Administrator\\Documents\\rpd-test\base.rpd

Will open RPD using the Admin Tool.

Press Enter key to continue.

You must close the AdminTool after completing the merge manually in order for this script to continue.  

When Basil hits a key, the Admin Tool opens up, and from here he needs to manually initiate the merge and specify the merge candidates. This is made easy by the script which automatically names them appropriately:

Enabling Concurrent OBIEE RPD Development - for free

Note that the a, b and c RPDs are part of the merge logic with Git and can be ignored here.

Basil assigns the original and modified RPDs to the correct parts of the wizard and then resolves the conflict (choosing his change) in the next step of the wizard.

Enabling Concurrent OBIEE RPD Development - for free

Upon closing the Admin Tool, the Git merge to the develop branch is automatically completed.

Now when they look in the development RPD they can see the column named as "Area (sqm)", having accepted Basil's change. Of course this is a trivial example, but because the method relies on using the Admin Tool, it will be just as reliable as a manual 3-way merge you would perform in OBIEE.

In my experience, most of the problems with 3-way merging is that developers get confused as to which candidates to choose or they lose track of a true original point from when both developers started working. Using this method eliminates both of the those problems, with the added benefit of tight integration into source control. Even with an easier interface to the 3-way merge process, developers and/or the Source Master should be aware of some of the ‘features’ of OBIEE’s 3-way merge. For example, if a change has occurred on the physical layer which does not have any representations at all in the business or presentation layers, it may be lost during a 3-way merge. Another is that the merge rules are not guaranteed to stay the same between OBIEE versions, which means that we cannot be certain our development lifecycle is stable after patching or upgrading OBIEE.

So given this, and as a general core tenet of good software development practice, you should be automatically testing your RPDs after the merge and before release.

Testing the RPD

There are still issues with OBIEE RPD merging that aren't rectified by the 3-way merge and so must be handled manually if and when they occur. One such example is that if a change has occurred on the physical layer which does not have any representations at all in the business or presentation layers, it may be lost during a 3-way merge. Another problem is that the merge rules are not guaranteed to stay the same between OBIEE versions, which means that we cannot be certain our development lifecycle is stable after patching or upgrading OBIEE. Another thing I don't really like is the inherent bias the merge process has toward the modified RPD, instead of treating the modified and current RPDs equally. The merge candidates in the tool have been selected in such a way as to mitigate this problem but I am wary it may have unforeseen consequences for some as yet untested scenarios. There are may be other inconsistencies, but it is difficult to pin down all of the scenarios precisely and that's one of the main stumbling blocks when managing a file as complex as the RPD. Even if we didn't receive any conflicts, it is vital that RPDs are checked and tested (preferably automatically) before release.

The first step to testing is to create a representative test suite, which will encompass as much of the functionality of your system in as few reports as possible. The reason for this is that it is often impractical and sometimes invalid to check the entire catalogue at once. Furthermore, the faster the testing phase occurs, the quicker the overall release process will be. The purpose of a test suite is so that we can take a baseline of the data of each report from which we can validate consistency after making changes. This means your test suite should contain reports that are expected not to change after making changes to RPD. Furthermore you need to be careful that the underlying data of the report does not change between the baseline capture and the regression validation phases, otherwise you will invalidate your test.

In terms of tooling, Oracle provide BVT which can be used outside of upgrades to perform automated regression tests. This is good as it provides both data checks as well as visual validation. Furthermore, it can be run on a specific Web/Presentation Catalog folder directly, as opposed to the whole system.

As well as Oracle’s BVT, we also have an in-house Regression Testing tool that was written prior to BVT’s availability, and is still used to satisfy specific test scenarios. Built in Python, it is part of a larger toolset that we use with clients for automating the full development lifecycle for OBIEE, including migrating RPDs and catalogue artefacts between environments.

This brings us onto the last piece in the DevOps puzzle is continuous integration (CI). This is the concept of automatically testing and deploying code to a higher environment as soon as the work is complete. This is something not explicitly covered by the tools in this blog, however would work nicely used with the testing and migration scripts described above. This could all be made seamless by invoking the processes via script calls or better using Git hooks.


The success of an OBIEE concurrent development approach comes down to two things: the tooling, and the rigorous implementation of the process - and it is the latter that is key. In this article I’ve demonstrated the tooling that we’ve developed, along with the process required for a successful development method. Here at Rittman Mead we have detailed understanding and experience in the process and framework necessary to implement it at any client, adapting and advising to ensure the integration into existing in-house development and release requirements. The real world is messy and developers don't all work in the same way. A single tool in isolation is not going to succeed in making OBIEE - designed from the outset as a single-developer tool - scale to multiple developers. Instead of insisting that you change to accommodate our tool, we instead bring our tool and process and adapt to suit you.

You can find the code used in this blog up on GitHub and if you would like to discuss how Rittman Mead can help implement concurrent OBIEE RPD development successfully at your organisation, please get in touch.

Categories: BI & Warehousing

Working with OBIEE Data in Excel using ODBC

Thu, 2017-02-09 04:09

Look at this picture. I'm sure you've recognised the most favourite data analysis tool of all times - Excel.


But what you can't see in this picture is the data source for the table and charts. And this source is OBIEE's BI Server. Direct. Without exports or plugins!

Querying OBIEE Directly from Excel? With No Plugins? What Is Going On!

The OBIEE BI Server (nqsserver / OBIS) exposes an ODBC interface (look here if you live in a world full of Java and JDBC) which is used by Presentation Services and Administration tool. But a lesser-known benefit of this is that we can utilise this ODBC interface for own needs. But there is a little problem with the OBIEE 12c client installation - its size. Full (and the only possible actually) client OBIEE installation is more than 2 gigabytes and consists of more than 31 thousand files. Not a huge problem considering HDD sizes and prices but something not so good if you have an average-sized SSD.

And the second point to consider. We don’t want to give a full set of developer tools to an end-user. Even if our security won’t let them break anything, why would we stuff his head with unnecessary things? Let's keep things simple.

So what I had in mind with this research was to make a set of OBIEE ODBC libraries as small as possible. And the second aim was avoiding a full installation with cutting out redundant pieces. I need a small "thing" I can easily copy to any computer and then use it.

Disclaimer. Everything below is a result of our investigation. It’s not a supported functionality or Oracle’s recommendation.

I will not describe in full details the process of the investigation as it is not too challenging. It's less a detective thriller and more a tedious story. But anyways the main points will be highlighted.

Examine Working Copy

The first thing I needed to know what changes Oracle's installer does during an installation. Does it copy something to the Windows folder or everything stays in its installation folder? Does it make any registry changes (apparently it does but what exactly)?

For this task, I took a fresh Windows, created a dump of the registry and folders structure of the Windows folder, then installed OBIEE client using normal installation process, made the same dumps and compared them once again.

There were no surprises. OBIEE installer doesn't copy a single byte to the Windows folder (and it's a good news I think) but it creates a few registry keys (what was expected). Anyone who has ever tried to play around Windows ODBC won't be surprised with it at all.

I deleted some keys in order to make this screenshots more clear and readable.

So now I know names of the DLLs and their places. A good point to start. A small free utility Dependency walker helped me to find out a set of DLLs I need. This tool is very easy to use and very useful for finding a missing DLL. Just give it a DLL to explore and it will show all DLLs used by it and mark all missing.

Dependency walker

And a bit of educated guess helped to find one more folder called locale which stores all language files.

So, as a result, we got a tiny ODBC-related OBIEE client. It's very small. With only English locale it has a size about 20 megabytes and consists of 75 files. Compare it to 31 thousand files of the full client.

So that was a short story of looking and finding things. Now goes some practical result.

Folders Structure.

It seems that some paths are hard-coded. So we can't put DLLs to any folder we like. It should be something\bi\bifoundation\server. C:\BI-client\bi\bifoundation\server for example.

The List of DLLs

I tried to find the minimum viable set of the libraries. The list has only 25 libraries but it takes too much place on the screen so I put them into a collapsible list in order to keep this post not too long. These libraries should go under bin folder. C:\BI-client\bi\bifoundation\server\bin for example.

The list of ODBC DLLs

  • BiEndPointManagerCIntf64.dll
  • mfc100u.dll
  • msvcp100.dll
  • msvcr100.dll
  • nqcryptography64.dll
  • nqerrormsgcompiler64.dll
  • nqmasutility64.dll
  • nqperf64.dll
  • nqportable64.dll
  • nqsclusterapi64.dll
  • nqsclusterclient64.dll
  • nqsclusterutility64.dll
  • nqsodbc64.dll
  • nqsodbcdriverconndlg64.dll
  • nqssetup.dll
  • NqsSetupENU.dll
  • nqstcpclusterclient64.dll
  • NQSTLU64.4.5.dll
  • nqutilityclient64.dll
  • nqutilitycomm64.dll
  • nqutilitygeneric64.dll
  • nqutilitysslall64.dll
  • perfapi64.dll
  • samemoryallocator864.dll
  • xerces-c_2_8.dll

Or you may take the full bin folder. Its size is about 240 megabytes. You won't win the smallest ODBC client contest but will save a few minutes of your time.


The second folder you need is locale, it is located near bin. C:\BI-client\bi\bifoundation\server\locale, for example. Again if you agree with not the smallest client in the world, you may take the whole locale. But there are 29 locales and I think most of the time you will need only one or two of them. Every locale is about 1.5 megabytes and has 48 files. A good place for some optimisation in my opinion.

Registry Key

And the last part is registry keys. I need to tell my Windows what is my driver name and what is its path and so on. If it was a usual part of the registry I'd created a file anything.reg, put a code like this into it and imported it into the registry.

Windows Registry Editor Version 5.00

"Oracle BI Server"="Installed"


But luckily there is a small console utility which makes the task easier and more elegant - scripted. Microsoft provides us a tool called odbcconf.exe located in C:\Windows\System32 folder. And its syntax is not very obvious but not too hard also. Generally the syntax is the following: odbcconf.exe /a {action "parameters"}. In this case the call is odbcconf.exe {installdriver "Oracle BI Server|Driver=C:\BI-client\bi\bifoundation\server\bin\nqsodbc64.dll|Setup=C:\BI-client\bi\bifoundation\server\bin\nqssetup.dll|APILevel=2|SQLLevel=2|ConnectionFunctions=YYN|DriverODBCVer=03.52|Regional=Yes"}. Here installdriver is the action and the long string is the set of parameters divided by |. It may look a bit complicated but in my opinion it leaves less space for manual work and therefore less space for error. Just one note: don't forget to start a cmd windows as administrator.

Visual C++ Redistributable

If your computer is fresh and clean, you need to install a Visual C++ 2010 redistributable package. It's included in Oracle's client and placed in 'Oracle_Home\bi' folder. The file name is vcredist_x64.exe.


And as a result I got an ODBC driver I can use as I want. And not obvious but pleasant bonus is that I can give it any name I like. OBIEE version, path, whatever I want.

And I can create an ODBC DSN in a normal way using ODBC Data source Administrator. Just like always. No matter this is a hand-made driver. It was properly registered and it is absolutely legitimate.

So just a brief intermediate summary. We can take a full 2+ gigabytes OBIEE client. Or we can spend some time to:
1. Create a folder and put into it some files from the Oracle OBIEE client;
2. Create a few registry keys;
3. Install a Visual C++ 2010 redistributable
And we will get a working OBIEE ODBC driver which size is slightly above 20 megabytes.


So now we have a working ODBC connection, what can it give us?

Meet the most beloved by end users all around the world tool - Excel.

At this point of the story, some may tell me "Hey, stop right there! Where have you got that SQL? And why is it so strange? That's not an ANSI SQL". The evil part of me wants to simply give you a link to the official documentation: Logical SQL Reference and run away. But the kind one insists that even while documentation has done no harm to anyone, that's not enough.

In a nutshell, this is an SQL that Presentation services send to BI Server. When anyone builds an analysis or runs a dashboard, Presentation services create and send logical queries to BI Server. And we can use it for our own needs. Create an analysis as usual (or open an existing one), navigate to the Advanced tab, and then copy and paste analysis' Logical SQL. You may want to refine it, maybe remove some columns, or change aliases, or add a clause or two from the evil part's documentation, but for the first step just take it and use it. That simple.

And of course, we can query our BI server using any ODBC query tool.

And all these queries go directly to the BI Server. This method doesn't use Presentation Services, OBIEE won't build a complex HTML which we have to parse later. We use a fast and efficient way instead.

Categories: BI & Warehousing

Join Rittman Mead at the 2017 BIWA Summit!

Mon, 2017-01-30 15:46
Join Rittman Mead at the 2017 BIWA Summit!

We invite you to come join us at the annual 2017 BIWA Summit.

Join Rittman Mead at the 2017 BIWA Summit!

This year we are proud to announce that Robin Moffatt, Head of Research and Development, will be presenting on:

Analysing the Panama Papers with Oracle Big Data Spatial and Graph

January 31, 2017 | 3:45 pm – 4:15 pm | Room 103

Oracle Big Data Spatial and Graph enables the analysis of datasets beyond that of standard relational analytics commonly used. Through Graph technology relationships can be identified that may not otherwise have been. This has practical uses including in product recommendations, social network analysis, and fraud detection. In this presentation we will see a practical demonstration of Oracle Big Data Spatial and Graph to load and analyse the “Panama Papers” dataset. Graph algorithms will be utilised to identify key actors and organisations within the data, and patterns of relationships shown. This practical example of using the tool will give attendees a clear idea of the functionality of the tool and how it could be used within their own organisation. If Oracle Database 12cR2 on-premise is available by the time of this presentation, then its new property graph capabilities will also be covered here. The presentation will be based on a paper published on OTN: https://community.oracle.com/docs/DOC-1006400

Kafka’s Role in Implementing Oracle’s Big Data Reference Architecture on the Big Data Appliance

February 1, 2017 | 2:20 pm – 3:10 pm | Room 102

Big Data … Big Mess? Everyone wants Big Data, but without a good platform design up front there is the risk of a mess of point-to-point feeds. The solution to this is Apache Kafka, which enables stream or batch consumption of the data by multiple consumers. Implemented as part of Oracle’s Big Data Architecture on the Big Data Appliance, it acts as a data bus for the enterprise to both the data reservoir and discovery lab. This presentation will introduce the basics of Kafka, and explain how it fits within the Big Data Architecture. We’ll then see it used with Oracle GoldenGate to stream data into the data reservoir, as well as ad hoc population of discovery lab environments and microservices such as Flume, HBase, and Elasticsearch.

(Still) No Silver Bullets: OBIEE 12c Performance in the Real World

February 2, 2017 | 1:30 pm – 2:20 pm | Room 203

Are you involved in the design and development of OBIEE systems and want to know the best way to go about ensuring good performance? Maybe you’ve an existing OBIEE system with performance “challenges” that you need to diagnose? This presentation looks at the practical elements of diagnosing the causes of performance issues in OBIEE, and discusses good practices to observe when developing new systems. It includes discussion of OBIEE 12c and with additional emphasis on analysis of Usage Tracking data for the accurate profiling and diagnosis of issues. Why this would appeal to the audience: – Method-R time profiling technique applied to the OBIEE nqquery.log – Large number of the community use OBIEE, many will have their own performance horror stories; fewer will have done a deep dive into analysing the time profile of long-running requests – Performance “right practices” will help those less familiar with performant OBIEE designs, and may prompt debate from those more experienced. As presented previously at OOW, OUGF, UKOUG, OUG Scotland, and POUG. Newly updated for OBIEE 12c. * Video: http://ritt.md/silver-bullets-video* Slides: http://ritt.md/silver-bullets-slides

Categories: BI & Warehousing