Rittman Mead Consulting

Rittman Mead consults, trains, and innovates within the world of Oracle Business Intelligence, data integration, and analytics.

DevOps in OAC: Scripting Oracle Cloud Instance Management with PSM Cli

Mon, 2018-06-25 10:18

This summer we unselfish Italians decided not to participate in the World Cup to give another country the opportunity to win (good luck with that England!). This decision, which I strongly support, gives me a lot of time for blogging!

As already written, two weeks ago while in Orlando for Kscope18 I presented a session about DevOps and OBIEE, focusing on how to properly source control, promote and regression-test any component of the infrastructure.

Development Isolation

One key aspect of DevOps is providing Development Isolation: a way of allowing multiple development streams to work independently and of merging the outcome of each stream into the main environment only after it has been tested and validated. This is needed to avoid the standard situation where code promotions are blocked because different working streams are not in sync: forcing a team to postpone a code release just because another team doesn't have the UAT sign-off yet is just one example of a non-isolated development platform.

We have discussed the development isolation topic in the past, focusing mainly on concurrent repository development and how to integrate it with versioning tools like Git and SVN. The concurrent online editing option is not viable since multiple developers modify the same artifact (the RPD) without any way of regression-testing the changes or verifying that what has been done is correct before merging them into the RPD.
Alternative solutions, like MUDE (the default multi-user development method provided by the Admin Tool) or pure offline RPD work, run into the same problems described above: no feature or regression testing is available before merging the RPD into the main development environment.

Different RPD development techniques solve the problem only partially: since almost any OAC/OBIEE development consists of both RPD and catalog work (creation of analyses/dashboards/VA projects), we need an approach which provides Development Isolation at both levels. The solution, in order to properly build a DevOps framework around OAC/OBIEE, is to provide isolated, feature-related, full OBIEE instances where the RPD can be edited in online mode, the catalog work can be done independently, and the overall result can be tested and validated before being merged into the common development environment.

Feature-Related Instances

The feature instances, as described above, need to be full OAC/OBIEE development instances where only one feature (or a small set of them) is worked on at a time, giving developers the agility to release code as soon as it's ready and tested. In the on-premises world this can "easily" be achieved by providing a number of dedicated Virtual Machines or, more in line with recent trends, by automating instance provisioning with Docker using a template image like the one built by our former colleague Gianni Ceresa.

However, when we think about Oracle Analytics Cloud (OAC), we seem to have two problems:

  • There is a cost associated with every instance, thus minimizing the number of instances and their uptime is necessary
  • The OAC provisioning interface is point-and-click, thus automating instance management seems impossible

The overall OAC instance cost can be mitigated by the Bring Your Own License (BYOL) licensing method, which allows customers to migrate on-premises licenses to the cloud and get discounted prices on the hourly/monthly instance cost (more details here). However, since the target is to minimize the cost, and thus the number of instances and their uptime, we need a way of doing so that doesn't rely on a human and a point-and-click interface. Luckily the PaaS Service Manager Command Line Interface (PSM Cli) solves this problem by providing a scriptable way of creating, starting and stopping instances.

PaaS Service Manager Command Line Interface

PSMCLI is a command line interface acting as a wrapper over the PaaS REST APIs. Its usage is not limited to OAC: the same interface can be used to create and manage instances of Oracle's Database Cloud Service or Java Cloud Service, amongst others.
When talking about OAC, please keep in mind that, as of now, PSM Cli works only with the non-autonomous version, but I believe Autonomous support will be added soon.

Installing and Configuring PSM Cli

PSMCLI has two prerequisites before it can be installed:

  • cURL - a command line utility to transfer data with URLs
  • Python 3.3 or later

Once both prerequisites are installed, PSM can easily be downloaded with the following cURL call:

curl -X GET -u <USER>:<PWD> -H X-ID-TENANT-NAME:<IDENTITY_DOMAIN> https://<REST_SERVER>/paas/core/api/v1.1/cli/<IDENTITY_DOMAIN>/client -o psmcli.zip

Where

  • <USER> and <PWD> are the credentials
  • <IDENTITY_DOMAIN> is the Identity Domain ID specified during the account creation
  • <REST_SERVER> is the REST API server name, which is:
    • psm.us.oraclecloud.com if you are using a US datacenter
    • psm.aucom.oraclecloud.com if you are in the AuCom region
    • psm.europe.oraclecloud.com otherwise

The next step is to install PSM as a Python package with:

pip3 install -U psmcli.zip

After the installation, it's time for configuration:

psm setup

The configuration command will request the following information:

  • Oracle Cloud Username and Password
  • Identity Domain
  • Region: this needs to be set to
    • emea if the REST_SERVER mentioned above contains emea
    • aucom if the REST_SERVER mentioned above contains aucom
    • us otherwise
  • Output format: the choice is between short, json and html
  • OAuth: the communication between the CLI and the REST API can use basic authentication (flag n) or OAuth (flag y). If OAuth is chosen then ClientID, Secret and Access Token need to be specified

A JSON profile file can also be used to provide the same information mentioned above. The structure of the file is the following:

{ 
    "username":"<USER>",
    "password":"<PASSWORD>",
    "identityDomain":"<IDENTITY_DOMAIN>",
    "region":"<REGION>",
    "outputFormat":"<OUTPUT_FORMAT>",
    "oAuth":{ 
        "clientId":"",
        "clientSecret":"",
        "accessTokenServer":""
    }
}

If the profile is stored in a file named profile.json, the PSM configuration can be achieved by just executing:

psm setup -c profile.json

One quick note: the identity domain Id shown in the Oracle Cloud header won't work if it's not the original name (the name at the time of creation).

In order to get the correct identity domain Id, open an Oracle Cloud instance that has already been created (e.g. a database one) and check its Details: you'll see the original identity domain listed there (credits to Pieter Van Puymbroeck).

Working With PSM Cli

Once PSM has been correctly configured it's time to start checking what options are available; for a detailed list of the options check the PSM documentation.

The PSM commands are product related, so each command is in the form:

psm <product> <command> <parameters>

Where

  • product is the Oracle Cloud product, e.g. dbcs, analytics, BigDataAppliance; for a complete list use psm help
  • command is the action to be executed against the product e.g. services, stop, start, create-service
  • parameters is the list of parameters to pass depending on the command executed

The first step is to check what instances I have already created; I can do so for the database by executing:

psm dbcs services

which, as expected, will list all my active instances

I can then start, stop or restart an instance using:

psm dbcs start/stop/restart -s <INSTANCE_NAME>

In my example this returns the Id of the Job assigned to the stop operation.

When I check the status via the services command I get Maintenance, just like in the web UI.

The same applies to the start and restart operations. Please keep in mind that all the calls are asynchronous: the command calls the related REST API and then returns the associated Job ID without waiting for the operation to finish. The status of a job can be checked with:

psm dbcs operation-status -j <JOB_ID>

The same operations described above are available on OAC with the same commands by simply changing the product from dbcs to analytics like:

psm analytics start/stop/restart -s <INSTANCE_NAME>
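
Because all these calls are asynchronous (as noted above), any automation built on top of them needs to poll the returned Job Id until the operation completes. Below is a minimal Python sketch of such a wrapper, shelling out to the same psm commands; the "SUCCEED" and "FAILED" strings it looks for in the operation-status output are assumptions to verify against your environment, as is the way the Job Id is obtained.

import subprocess
import sys
import time

def psm(*args):
    # Run a psm command and return its textual output
    return subprocess.run(["psm", *args], capture_output=True, text=True, check=True).stdout

def wait_for_job(job_id, poll_seconds=60):
    # Poll the asynchronous job until a terminal status appears in the output.
    # The exact status wording is an assumption -- check a real operation-status output.
    while True:
        status = psm("analytics", "operation-status", "-j", job_id)
        if "SUCCEED" in status:
            return status
        if "FAILED" in status:
            sys.exit("Job {} failed:\n{}".format(job_id, status))
        time.sleep(poll_seconds)

# Example: stop an instance; the Job Id must then be extracted from this output
# (its exact format depends on the configured output format) and passed to wait_for_job()
print(psm("analytics", "stop", "-s", "MYOACINSTANCE"))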

On top of the basic operations, PSM Cli also allows the following:

  • Service Instance: start/stop/restart, instance creation-deletion
  • Access Control: lists, creates, deletes, enables and disables access rules for a service.
  • Scaling: changes the compute shape of an instance and allows scaling up/down.
  • Storage: extends the storage associated to OAC
  • Backup Configuration: updates/shows the backup configurations
  • Backups: lists, creates, deletes backups of the instance
  • Restore: restores a backup giving detailed information about it and the history of Restores
  • Patches: allows patching, rolling back, running pre-checks, and retrieving the patching history

Creating an OAC Instance

So far we have discussed maintenance of already created instances with the start/stop/restart commands, but PSM Cli also allows the creation of an instance via the command line. The call is pretty simple:

psm analytics create-service -c <CONFIG_FILE> -of <OUTPUT_FORMAT>

Where

  • CONFIG_FILE: the file defining all the OAC instance configurations
  • OUTPUT_FORMAT: the desired output format, one of short, json or html

The question now is:

How do I create a Config File for OAC?

The documentation doesn't provide any help on this, but we can use the same approach as with on-premises OBIEE and its response files: create the first instance with the Web UI, save the payload for future use and change the parameters when necessary.

On the Confirm screen there is the option to Download the REST payload in JSON format.

The resulting JSON config file looks like the following:

{
  "edition": "<EDITION>",
  "vmPublicKeyText": "<SSH_TOKEN>",
  "enableNotification": "true",
  "notificationEmail": "<EMAIL>",
  "serviceVersion": "<VERSION>",
  "isBYOL": "false",
  "components": {
    "BI": {
      "adminUserPassword": "<ADMINPWD>",
      "adminUserName": "<ADMINUSER>",
      "analyticsStoragePassword": "<PWD>",
      "shape": "oc3",
      "createAnalyticsStorageContainer": "true",
      "profile_essbase": "false",
      "dbcsPassword": "<DBCSPWD>",
      "totalAnalyticsStorage": "280.0",
      "profile_bi": "true",
      "profile_dv_forced": "true",
      "analyticsStorageUser": "<EMAIL>",
      "dbcsUserName": "<DBUSER>",
      "dbcsPDBName": "<PDBNAME>",
      "dbcsName": "<DBCSNAME>",
      "idcs_enabled": "false",
      "analyticsStorageContainerURL": "<STORAGEURL>",
      "publicStorageEnabled": "false",
      "usableAnalyticsStorage": "180"
    }
  },
  "serviceLevel": "PAAS",
  "meteringFrequency": "HOURLY",
  "subscriptionId": "<SUBSCRIPTIONID>",
  "serviceName": "<SERVICENAME>"
}

This file can be stored and the parameters changed as necessary to create new OAC instances with the command:

psm analytics create-service -c <JSON_PAYLOAD_FILE> -of short/json/html
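
To show how this enables the on-demand feature instances discussed at the beginning of the post, here is a minimal Python sketch that loads the saved payload, renames the service per feature and submits it with the same create-service call. The template file name and the naming convention are hypothetical, and any instance-name restrictions of your account still apply.

import json
import subprocess
import tempfile

def create_feature_instance(feature, template="oac_payload_template.json"):
    # Load the payload saved from the web UI and give the instance a feature-specific name
    with open(template) as f:
        payload = json.load(f)
    payload["serviceName"] = "OAC" + feature.upper()  # hypothetical naming convention

    # Write the adjusted payload to a temporary config file and submit the create-service call
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
        json.dump(payload, tmp)
    result = subprocess.run(["psm", "analytics", "create-service", "-c", tmp.name, "-of", "json"],
                            capture_output=True, text=True, check=True)
    print(result.stdout)  # contains the Job Id to monitor with operation-status

create_feature_instance("feature123")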

As shown previously, the result of the call is a Job Id that can be monitored with

psm analytics operation-status -j <JOB_ID>

Once the Job has finished successfully, the OAC instance is ready to be used. If at a certain point the OAC instance is not needed anymore, it can be deleted via:

psm analytics delete-service -s <SERVICE_NAME> -n <DBA_NAME> -p <DBA_PWD>

Where

  • SERVICE_NAME is the OAC instance name
  • DBA_NAME and DBA_PWD are the credentials of the DBA for the database hosting the OAC schemas

Summary

Worried about providing development isolation in OAC while keeping costs down? Not anymore! With PSM Cli you now have a way of creating, starting/stopping and scaling instances on demand with a command line tool that is easy to integrate with automation tools like Jenkins.

Create OAC instances automatically only when features need to be developed or tested, stop and start the instances based on your workforce timetables, and take advantage of the cloud while minimizing the associated cost by using PSM Cli!

One last note: PSM Cli alone is not sufficient for a full DevOps OAC implementation. Tasks like automated regression testing, code versioning and promotion can't be managed directly with PSM Cli and require external toolsets like the Rittman Mead BI Developer Toolkit. If you are interested in a full DevOps implementation on OAC and in the details of how PSM Cli can be used in conjunction with the Rittman Mead BI Developer Toolkit, don't hesitate to contact us!

Categories: BI & Warehousing

Kscope18: It's a Wrap!

Thu, 2018-06-21 08:23

As announced a few weeks back, I represented Rittman Mead at ODTUG's Kscope18, hosted in the magnificent Walt Disney World Dolphin Resort. It's always hard to be credible when telling people you are going to Disney World for work, but Kscope is a must-go event if you are in the Oracle landscape.

In the Sunday symposium Oracle PMs share hints about the products' latest capabilities and roadmaps; then there are three full days of presentations spanning from the traditional Database, EPM and BI tracks to new entries like Blockchain. On top of this there is the opportunity to be introduced to a network of Oracle experts including Oracle ACEs and ACE Directors, PMs and people willing to share their experience with Oracle (and other) tools.

Sunday Symposium and Presentations

I attended the Oracle Analytics (BI and Essbase) Sunday Symposium run by Gabby Rubin and Matt Milella from Oracle. It was interesting to see the OAC product enhancements and roadmap as well as the feature catch-up in the latest release of OBIEE on-premises (version 12.2.1.4.0).

As expected, most of the push is towards OAC (Oracle Analytics Cloud): all new features will be developed there and eventually (but with no assurance on this) ported to the on-premises version. This makes a lot of sense from Oracle's point of view since it gives them the ability to produce new features quickly: they need to be tested against only a single set of HW/SW rather than the multitude they support on-premises.

Most of the enhancements are expected in the Mode 2/Self Service BI area covered by Oracle Analytics Cloud Standard since a) this is the overall trend of the BI industry and b) the features requested by traditional dashboard-style reporting are already well covered by OBIEE.
The following are just a few of the items you could expect in future versions:

  • Recommendations during the data preparation phase like GeoLocation and Date enrichments
  • Data Flow enhancements like incremental updates or parametrized data-flows
  • New Visualizations and, in general, more control over the settings of individual charts.

In general, Oracle's idea is to provide a single tool that meets the needs of both Mode 1 and Mode 2 Analytics (Centralized vs Self Service) rather than focusing on solving one need at a time like other vendors do.

A special mention goes to Oracle Autonomous Analytics Cloud, released a few weeks ago, which differs from traditional OAC in that backups, patching and service monitoring are now managed automatically by Oracle, freeing the customer from those tasks.

During the main conference days (Monday to Wednesday) I attended a lot of very insightful presentations as well as the Oracle ACE Briefing, which gave me ideas for future blog posts, so stay tuned! As written previously, I had two sessions accepted for Kscope18: "Visualizing Streams" and "DevOps and OBIEE: Do it Before it's Too Late"; in the following paragraphs I'll share details (and links to the slides) of both.

Visualizing Streams

One of the latest trends in the data and analytics space is the transition from old-style batch-based reporting systems, which by design added a delay between event creation and its appearance in the reporting, to the concept of streaming: ingesting and delivering event information and analytics as soon as the event is created.

The session explains how the analytics space has changed in recent times, providing details on how to set up a modern analytical platform which includes streaming technologies like Apache Kafka, SQL-based enrichment tools like Confluent's KSQL, and connections to Self Service BI tools like Oracle's Data Visualization via SQL-on-Hadoop technologies like Apache Drill. The slides of the session are available here.

DevOps and OBIEE: Do it Before it's Too Late

In the second session (slides here) I initially went through the motivations for applying DevOps principles to OBIEE: the self-service BI wave started as a response to the long time to delivery associated with old-school centralized reporting projects. Huge monolithic sets of requirements to deliver, no easy way to provide development isolation, manual testing and manual code promotion were only a few of the blockers to fast delivery.

After an initial analysis of the default OBIEE development methods, the presentation explains how to apply DevOps principles to an OBIEE (or OAC) environment, specifically:

  • Code versioning techniques
  • Feature-driven environment creation
  • Automated promotion
  • Automated regression testing

It also provides details on how the Rittman Mead BI Developer Toolkit, partially described here, can act as an accelerator for the adoption of these practices in any custom OBIEE implementation and delivery process.

As mentioned before, the overall Kscope experience is great: plenty of technical presentations, roadmap information, networking opportunities and also a lot of fun! Looking forward to Kscope19 in Seattle!

Categories: BI & Warehousing

ChitChat for OBIEE - Now Available as Open Source!

Fri, 2018-06-15 03:20

ChitChat is the Rittman Mead commentary tool for OBIEE. ChitChat enhances the BI experience by bridging conversational capabilities into the BI dashboard, increasing ease-of-use and seamlessly joining current workflows. From tracking the history behind analytical results to commenting on specific reports, ChitChat provides a multi-tiered platform built into the BI dashboard that creates a more collaborative and dynamic environment for discussion.

Today we're pleased to announce the release into open-source of ChitChat! You can find the github repository here: https://github.com/RittmanMead/ChitChat

Highlights of the features that ChitChat provides include:

  • Annotate - ChitChat's multi-tiered annotation capabilities allow BI users to leave comments where they belong, at the source of the conversation inside the BI ecosystem.

  • Document - ChitChat introduces the ability to include documentation inside your BI environment for when you need more than a comment. Keeping key materials contained inside the dashboard gives the right people access to key information without searching.

  • Share - ChitChat allows you to bring attention to important information on the dashboard using the channel or workflow manager you prefer.

  • Verified Compatibility - ChitChat has been tested against popular browsers, operating systems, and database platforms for maximum compatibility.

Getting Started

In order to use ChitChat you will need OBIEE 11.1.1.7.x, 11.1.1.9.x or 12.2.1.x.

First, download the application and unzip it to a convenient access location in the OBIEE server, such as a home directory or the desktop.

See the Installation Guide for full detail on how to install ChitChat.

Database Setup

Build the required database tables using the installer:

cd /home/federico/ChitChatInstaller
java -jar SocializeInstaller.jar -Method:BuildDatabase -DatabasePath:/app/oracle/oradata/ORCLDB/ORCLPDB1/ -JDBC:"jdbc:oracle:thin:@192.168.0.2:1521/ORCLPDB1" -DatabaseUser:"sys as sysdba" -DatabasePassword:password -NewDBUserPassword:password1

The installer will create a new user (RMREP) and the tables required for the application to operate correctly. The -DatabasePath flag tells the installer where to place the datafiles for ChitChat on your database server. -JDBC indicates which JDBC driver to use, followed by a colon and the JDBC string to connect to your database. -DatabaseUser specifies the user to access the database with. -DatabasePassword specifies the password for the user previously given. -NewDBUserPassword indicates the password for the new user (RMREP) being created.

WebLogic Data Source Setup

Add a Data Source object to WebLogic using WLST:

cd /home/federico/ChitChatInstaller/jndiInstaller
$ORACLE_HOME/oracle_common/common/bin/wlst.sh ./create-ds.py

To use this script, modify the ds.properties file using the method of your choice. The following parameters must be updated to reflect your installation: domain.name, admin.url, admin.userName, admin.password, datasource.target, datasource.url and datasource.password.

Deploying the Application on WebLogic

Deploy the application to WebLogic using WLST:

cd /home/federico/ChitChatInstaller
$ORACLE_HOME/oracle_common/common/bin/wlst.sh ./deploySocialize.py

To use this script, modify the deploySocialize.py file using the method of your choice. The first line must be updated with the username, password and URL to connect to your WebLogic Server instance. The second parameter of the deploy command must be updated to reflect your ChitChat access location.

Configuring the Application

ChitChat requires several configuration parameters to allow the application to operate successfully. To change the configuration, you must log in to the database schema as the RMREP user and update the values manually in the APPLICATION_CONSTANT table.

See the Installation Guide for full detail on the available configuration and integration options.

Enabling the Application

To use ChitChat, you must add a small block of code on any given dashboard (in a new column on the right-side of the dashboard) where you want to have the application enabled:

<rm id="socializePageParams"
user="@{biServer.variables['NQ_SESSION.USER']}"
tab="@{dashboard.currentPage.name}"
page="@{dashboard.name}">
</rm>
<script src="/Socialize/js/dashboard.js"></script>

Congratulations! You have successfully installed the Rittman Mead commentary tool. To use the application to its fullest capabilities, please refer to the User Guide.

Problems?

Please raise any issues on the github issue tracker. This is open source, so bear in mind that it's no-one's "job" to maintain the code - it's open to the community to use, benefit from, and maintain.

If you'd like specific help with an implementation, Rittman Mead would be delighted to assist - please do get in touch with Jon Mead or DM us on Twitter @rittmanmead to get access to our Slack channel for support about ChitChat.

Please contact us on the same channels to request a demo.

Categories: BI & Warehousing

Real-time Sailing Yacht Performance - Kafka (Part 2)

Thu, 2018-06-14 08:18

In the last two blogs Getting Started (Part 1) and Stepping back a bit (Part 1.1) I looked at what data I could source from the boat's instrumentation and introduced some new hardware to the boat to support the analysis.

Just to recap, I am looking to create the yacht's Polars with a view to improving our knowledge of her abilities (whether we can use this to improve our race performance is another matter).

Polars give us a plot of the boat's speed given a true wind speed and angle. This, in turn, informs us of the optimal speed the boat could achieve at any particular angle to wind and wind speed.

In the first blog I wrote a reader in Python that takes messages from a TCP/IP feed and writes the data to a file. The reader is able, using a hash key, to validate each message (see Getting Started (Part 1)). I'm also converting valid messages into a JSON format so that I can push meaningful structured data downstream. In this blog I'll cover the architecture and considerations around the setup of Kafka for this use case. I will not cover the installation of each component, as a lot has already been written in this area (we have some internal IP to help with configuration). I discuss the process I went through to get the data displayed in real time in a Grafana dashboard.

Introducing Kafka

I have introduced Kafka into the architecture as a next step.

Why Kafka?

I would like to be able to stream this data in real time and don't want to build my own batch mechanism or create a publish/subscribe model. With Kafka I don't need to check that messages have been successfully received, and if there is a failure while consuming messages the consumers will keep track of what has been consumed. If a consumer fails it can be restarted and it will pick up where it left off (the consumer offset is stored in Kafka as a topic). In the future I could scale out the platform and introduce some resilience through clustering and replication (this shouldn't be required for a while). Kafka therefore saves me a lot of manual engineering and will support future growth (should I come into money and be able to afford more sensors for the boat).

High level architecture

Let's look at the high-level components and how they fit together. Firstly, I have the instruments transmitting on wireless TCP/IP, and these messages are read using the Python reader I wrote earlier in the year.

I have enhanced the Python code to read and translate the messages and, instead of writing to a file, stream the JSON messages to a topic in Kafka.

Once the messages are in Kafka I use Kafka Connect to stream the data into InfluxDB. The messages are written to topic-specific measurements (tables in InfluxDB).

Grafana is used to display incoming messages in real time.

Kafka components

I am running the application on a MacBook Pro: basically a single-node instance with ZooKeeper, a Kafka broker and a Kafka Connect worker. This is the minimum setup, with very little resilience.

In summary

ZooKeeper is an open-source server that enables distributed coordination of configuration information. In the Kafka architecture ZooKeeper stores metadata about brokers, topics, partitions and their locations.
ZooKeeper is configured in zookeeper.properties.

Kafka broker is a single Kafka server.

"The broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk. It also services consumers, responding to fetch requests for partitions and responding with the messages that have been committed to disk." [1]

The broker is configured in server.properties. In this setup I have set auto.create.topics.enable=false. Setting this to false gives me control over the environment: as the name suggests, it disables the auto-creation of topics, which in turn could lead to confusion.

The Kafka Connect worker allows us to take advantage of predefined connectors that enable the writing of messages from Kafka to known external datastores. The worker is a wrapper around a Kafka consumer. A consumer is able to read messages from a topic partition using offsets. Offsets keep track of what has been read by a particular consumer or consumer group. (Kafka Connect workers can also write to Kafka from datastores, but I am not using this functionality in this instance.) The connect worker is configured in connect-distributed.properties. I have defined the location of the plugins in this configuration file. Connector definitions are used to determine how to write to an external data source.

Producer to InfluxDB

I use kafka-python to stream the messages into kafka. Within kafka-python there is a KafkaProducer that is intended to work in a similar way to the official java client.

I have created a producer for each message type (parameterised code). Although each producer reads the entire stream from the TCP/IP port, it only processes its assigned message type (wind or speed), thus increasing parallelism and therefore throughput.

import json
from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(bootstrap_servers='localhost:9092', value_serializer=lambda v: json.dumps(v).encode('utf-8'))
producer.send(topic, json_str)
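
The parameterisation described above could look something like the sketch below: each producer instance is started with the message type it owns and skips everything else. parse_message() is a hypothetical stand-in for the hash-key validation and translation covered in Part 1, and the feed host/port are placeholders.

from kafka import KafkaProducer  # kafka-python
import json
import socket

def parse_message(line):
    # Placeholder for the hash-key validation and translation described in Part 1;
    # expected to return a dict such as {"type": "wind", ...} or None for invalid messages
    return None

def run_producer(message_type, topic, feed_host="localhost", feed_port=10110):
    # One producer per message type: read the shared TCP/IP feed, keep only our messages
    producer = KafkaProducer(bootstrap_servers="localhost:9092",
                             value_serializer=lambda v: json.dumps(v).encode("utf-8"))
    with socket.create_connection((feed_host, feed_port)) as sock:
        for line in sock.makefile():
            message = parse_message(line)
            if message and message.get("type") == message_type:
                producer.send(topic, message)  # the serializer turns the dict into JSON bytes

run_producer("wind", "wind-json")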

I have created a topic per message type with a single partition. Using a single partition per topic guarantees I will consume messages in the order they arrive. There are other ways to increase the number of partitions and still maintain the read order but for this use case a topic per message type seemed to make sense. I basically have optimised throughput (well enough for the number of messages I am trying to process).

kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wind-json

kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic speed-json

kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic gps-json 

When defining a topic you specify the replication-factor and the number of partitions.

The topic-level configuration is replication.factor. At the broker level, you control the default.replication.factor for automatically created topics. [1:1] (I have turned off the automatic creation of topics).

The messages are consumed using Stream Reactor, which has an InfluxDB sink mechanism and writes directly to the measurements within a performance database I have created. The following parameters, showing the topics and insert mechanism, are configured in performance.influxdb-sink.properties.

topics=wind-json,speed-json,gps-json

connect.influx.kcql=INSERT INTO wind SELECT * FROM wind-json WITHTIMESTAMP sys_time();INSERT INTO speed SELECT * FROM speed-json WITHTIMESTAMP sys_time();INSERT INTO gps SELECT * FROM gps-json WITHTIMESTAMP sys_time()

The following diagram shows the detail from producer to InfluxDB.

If we now run the producers we get data streaming through the platform.

Producer Python log showing JSON formatted messages:

The status of the consumers shows a minor lag reading from two topics; the describe also shows the current offsets for each consumer task and the partitions being consumed (if we had a cluster it would show multiple hosts):

Inspecting the InfluxDB measurements:
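
The same inspection can also be scripted. Here is a minimal sketch with the influxdb Python client, assuming InfluxDB 1.x on its default port and that the database is literally named performance:

from influxdb import InfluxDBClient  # pip install influxdb

client = InfluxDBClient(host="localhost", port=8086, database="performance")
print(client.get_list_measurements())  # expect the wind, speed and gps measurements
for point in client.query("SELECT * FROM wind ORDER BY time DESC LIMIT 5").get_points():
    print(point)  # the five most recent wind readings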

When inserting into a measurement in InfluxDB, if the measurement does not exist it gets created automatically. The datatypes of the fields are determined from the JSON object being inserted. I needed to adjust the creation of the JSON message to cast the values to floats, otherwise I ended up with the wrong types, which caused reporting issues in Grafana. This would be a good case for using Avro and Schema Registry to handle these definitions.
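
The cast itself is trivial; something along these lines when the JSON message is assembled (the field names are purely illustrative) keeps the inferred field types consistent:

# Values arrive from the instruments as strings; cast them explicitly so InfluxDB
# infers float fields rather than strings or integers
raw = {"speed_over_ground": "6.4", "heading": "182"}
message = {key: float(value) for key, value in raw.items()}
print(message)  # {'speed_over_ground': 6.4, 'heading': 182.0}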

The following gif shows Grafana displaying some of the wind and speed measurements using a D3 Gauge plugin with the producers running to the right of the dials.

Next Steps

I'm now ready to do some real-life testing on our next sailing passage.

In the next blog I will look at making the setup more resilient to failure and at how to monitor and automatically recover from some of these failures. I will also introduce the WorldMap panel to Grafana so I can plot the locations where the readings were taken and overlay tidal data.

References
  1. Kafka: The Definitive Guide

Categories: BI & Warehousing

OAC - Thoughts on Moving to the Cloud

Tue, 2018-06-12 12:43

Last week, I spent a couple of days with Oracle at Thames Valley Park and this presented me with a perfect opportunity to sit down and get to grips with the full extent of the Oracle Analytics Cloud (OAC) suite...without having to worry about client requirements or project deadlines!

As a company, Rittman Mead already has solid experience of OAC, but my personal exposure has been limited to presentations, product demonstrations, reading the various postings in the blog community and my existing experiences of Data Visualisation and BI cloud services (DVCS and BICS respectively). You’ll find Francesco’s post a good starting place if you need an overview of OAC and how it differs (or aligns) to Data Visualisation and BI Cloud Services.

So, having spent some time looking at the overall suite and, more importantly, trying to interpret what it could mean for organisations thinking about making a move to the cloud, here are my top three takeaways:

Clouds Come In Different Shapes and Flavours

Two of the main benefits that a move to the cloud offers are simplification in platform provisioning and an increase in flexibility, being able to ramp up or scale down resources at will. Both come with a potential cost benefit, depending on your given scenario and requirement. The first step is understanding the different options in the OAC licensing and feature matrix.

First, we need to draw a distinction between Analytics Cloud and the Autonomous Analytics Cloud (interestingly, both options point to the same page on cloud.oracle.com, which makes things immediately confusing!). In a nutshell though, the distinction comes down to who takes responsibility for the service management: Autonomous Analytics Cloud is managed by Oracle, whilst Analytics Cloud is managed by yourself. It’s interesting to note that the Autonomous offering is marginally cheaper.

Next, Oracle have chosen to extend their BYOL (Bring Your Own License) option from their IaaS services to now incorporate PaaS services. This means that if you have existing licenses for the on-premise software, then you are able to take advantage of what appears to be a significantly discounted cost. Clearly, this is targeted to incentivise existing Oracle customers to make the leap into the Cloud, and should be considered against your ongoing annual support fees.

Since the start of the year, Analytics Cloud now comes in three different versions, with the Standard and Enterprise editions now being separated by the new Data Lake edition. The important things to note are that (possibly confusingly) Essbase is now incorporated into the Data Lake edition of the Autonomous Analytics Cloud and that for the full enterprise capability you have with OBIEE, you will need the Enterprise edition. Each version inherits the functionality of its preceding version: Enterprise edition gives you everything in the Data Lake edition; Data Lake edition incorporates everything in the Standard edition.

Finally, it’s worth noting that OAC aligns to the Universal Credit consumption model, whereby the cost is determined based on the size and shape of the cloud that you need. Services can be purchased as Pay as You Go or Monthly Flex options (with differential costing to match). The PAYG model is based on hourly consumption and is paid for in arrears, making it the obvious choice for short term prototyping or POC activities. Conversely, the Monthly Flex model is paid in advance and requires a minimum 12 month investment and therefore makes sense for full scale implementations. Then, the final piece of the jigsaw comes with the shape of the service you consume. This is measured in OCPUs (Oracle Compute Units) and the larger your memory requirements, the more OCPUs you consume.
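
To make the PAYG versus Monthly Flex choice a little more concrete, the back-of-an-envelope arithmetic looks something like the sketch below. The rates are deliberately made-up placeholders rather than Oracle list prices; the shape of the comparison is the point.

# Placeholder figures only - not Oracle list prices. The comparison is
# hours actually consumed * hourly rate (PAYG, paid in arrears) versus a
# committed monthly amount paid in advance (Monthly Flex).
OCPUS = 4                       # size/shape of the service you provision
PAYG_RATE_PER_OCPU_HOUR = 1.0   # hypothetical hourly rate per OCPU
FLEX_MONTHLY_PER_OCPU = 500.0   # hypothetical committed monthly cost per OCPU

hours_used_in_month = 8 * 22    # e.g. a POC only run during working hours

payg_cost = OCPUS * PAYG_RATE_PER_OCPU_HOUR * hours_used_in_month
flex_cost = OCPUS * FLEX_MONTHLY_PER_OCPU

print("PAYG        : %.2f" % payg_cost)   # 704.00 - wins for bursty usage
print("Monthly Flex: %.2f" % flex_cost)   # 2000.00 - wins when always-on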

Where You Put Your Data Will Always Matter

Moving your analytics platform into the cloud may make a lot of sense and could therefore be a relatively simple decision to make. However, the question of where your data resides is a more challenging subject, given the sensitivities and increasing legislative constraints that exist around where your data can or should be stored. The answer to that question will influence the performance and data latency you can expect from your analytics platform.

OAC is architected to be flexible when it comes to its data sources and consequently the options available for data access are pretty broad. At a high level, your choices are similar to those you would have when implementing on-premise, namely:

  • perform ELT processing to transform and move the data (into the cloud);
  • replicate data from source to target (in the cloud); or
  • query data sources via direct access.

These are supplemented by a fourth option to use the inbuilt Data Connectors available in OAC to connect to cloud or on-premise databases, other proprietary platforms or any other source accessible via JDBC. This is probably a decent path for exploratory data usage within DV, but I’m not sure it would always be the best long-term option.

Unsurprisingly, with the breadth of options comes a spectrum of tooling that can be used for shifting your data around and it is important to note that depending on your approach, additional cloud services may or may not be required.

For accessing data directly at its source, the preferred route seems to be to use RDC (Remote Data Connector), although it is worth noting that support is limited to Oracle (including OLAP), SQL Server, Teradata or DB2 databases. Also, be aware that RDC operates within WebLogic Server and so this will be needed within the on-premise network.

Data replication is typically achieved using Data Sync (the reincarnation of the DAC, which OBIA implementers will already be familiar with), although it is worth mentioning that there are other routes that could be taken, such as APEX or SQL Developer, depending on the data volumes and latency you have to play with.

Classic ELT processing can be achieved via Oracle Data Integrator (either the Cloud Service, a traditional on-premise implementation or a hybrid-model).

Ultimately, due care and attention needs to be taken when deciding on your data architecture as this will have a fundamental effect on the simplicity with which data can be accessed and interpreted, the query performance achieved and the data latency built into your analytics.

Data Flows Make For Modern Analytics Simplification

A while back, I wrote a post titled Enabling a Modern Analytics Platform in which I attempted to describe ways that Mode 1 (departmental) and Mode 2 (enterprise) analytics could be built out to support each other, as opposed to undermining one another. One of the key messages I made was the importance of having an effective mechanism for transitioning your Mode 1 outputs back into Mode 2 as seamlessly as possible. (The same is true in reverse for making enterprise data available as a Mode 1 input.)

One of the great things about OAC is how it serves to simplify this transition. Users are able to create analytic content based on data sourced from a broad range of locations: at the simplest level, Data Sets can be built from flat files or via one of the available Data Connectors to relational, NoSQL, proprietary database or Essbase sources. Moreover, enterprise curated metadata (via RPD lift-and-shift from an on-premise implementation) or analyst developed Subject Areas can be exposed. These sources can be ‘mashed’ together directly in a DV project or, for more complex or repeatable actions, Data Flows can be created to build Data Sets. Data Flows are pretty powerful, not only allowing users to join disparate data but also perform some useful data preparation activities, ranging from basic filtering, aggregation and data manipulation actions to more complex sentiment analysis, forecasting and even some machine learning modelling features. Importantly, Data Flows can be set to output their results to disk, either written to a Data Set or even to a database table and they can be scheduled for repetitive refresh.

For me, one of the most important things about the Data Flows feature is that it provides a clear and understandable interface which shows the sequencing of each of the data preparation stages, providing valuable information for any subsequent reverse engineering of the processing back into the enterprise data architecture.

In summary, there are plenty of exciting and innovative things happening with Oracle Analytics in the cloud and as time marches on, the case for moving to the cloud in one shape or form will probably get more and more compelling. However, beyond a strategic decision to ‘Go Cloud’, there are many options and complexities that need to be addressed in order to make a successful start to your journey - some technical, some procedural and some organisational. Whilst a level of planning and research will undoubtedly smooth the path, the great thing about the cloud services is that they are comparatively cheap and easy to initiate, so getting on and building a prototype is always going to be a good, exploratory starting point.

Categories: BI & Warehousing

Why DevOps Matters for Enterprise BI

Tue, 2018-06-12 09:44
Why DevOps Matters for Enterprise BI

Why are people frustrated with their existing enterprise BI tools such as OBIEE? My view is because it costs too much to produce relevant content. I think some of this is down to the tools themselves, and some of it is down to process.

Starting with the tools, they are not “bad” tools; the traditional licensing model can be expensive in today’s market, and traditional development methods are time-consuming and hence expensive. The vendor’s response is to move to the cloud and to highlight cost savings that can be made by having a managed platform. Oracle Analytics Cloud (OAC) is essentially OBIEE installed on Oracle’s servers in Oracle’s data centres with Oracle providing your system administration, coupled with the ability to flex your licensing on a monthly or annual basis.

Cloud does give organisations the potential for more agility. Provisioning servers can no longer hold up the start of a project, and if a system needs to increase capacity, then more CPUs or nodes can be added. This latter case is a bit murky due to the cost implications and the option to try and resolve performance issues through query efficiency on the database.

I don’t think this solves the problem. Tools that provide reports and dashboards are becoming more commoditised, and up-and-coming vendors and platform providers are offering the service for a fraction of the cost of the traditional vendors. They may lack some of the enterprise features like open security models; however, this is an area that platform providers are continually improving. Over the last 10 years, Oracle's focus for OBIEE has been more on integration than innovation. Oracle DV was a significant change; however, there is a danger that Oracle lost the first-mover advantage to tools such as Tableau and QlikView. Additionally, some critical features like lineage, software lifecycle development, versioning and process automation are not built into OBIEE and, worse still, the legacy design and architecture of the product often hinders them.

So this brings me back round to process. Defining “good” processes and having tools to support them is one of the best ways you can keep your BI tools relevant to the business by reducing the friction in generating content.

What is a “good” process? Put simply, a process that reduces the time between identifying a business need and realising it, with zero impact on existing components of the system. Also, a “good” process should provide visibility of any design, development and testing, plus documentation of changes, typically including lineage in a modern BI system. Continuous integration is the Holy Grail.

This is why DevOps matters. Using automated migration across environments, regression tests, automatically generated documentation in the form of lineage, native support for version control systems, supported merge processes and, ideally, a scripting interface or API to automate repetitive tasks (such as changing the data type of a group of fields system-wide) can dramatically reduce the gap from idea to realisation.
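
As a purely conceptual illustration of the regression-testing piece (this is not our toolkit, just the idea): capture a baseline of a report's results before a change, re-run the same query afterwards, and fail the build if anything differs. Assuming both runs have been exported to CSV, a minimal sketch might look like this.

# Conceptual sketch only: compare a baseline export of a report's results with
# a fresh export taken after an RPD or catalog change, and flag any difference.
# The file names and the CSV export step are assumptions, not a real toolkit API.
import csv
import sys

def load_rows(path):
    with open(path, newline='') as f:
        return sorted(tuple(row) for row in csv.reader(f))

baseline = load_rows('baseline/sales_by_region.csv')
candidate = load_rows('candidate/sales_by_region.csv')

if baseline == candidate:
    print('PASS: report output unchanged')
else:
    print('FAIL: %d baseline rows vs %d candidate rows, or values differ'
          % (len(baseline), len(candidate)))
    sys.exit(1)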

So, I would recommend that when looking at your enterprise BI system, you not only consider the vendor, location and features but also focus on the potential for process optimisation and automation. Automation could be something that the vendor builds into the tool, or you may need to use accelerators or software provided by a third party. Over the next few weeks, we will be publishing some examples and case studies of how our BI and DI Developer Toolkits have helped clients and enabled them to automate some or all of the BI software development cycle, reducing the time to release new features and increasing the confidence and robustness of the system.

Categories: BI & Warehousing

Real-time Sailing Yacht Performance - stepping back a bit (Part 1.1)

Mon, 2018-06-11 07:20

Slight change to the planned article. At the end of my analysis in Part 1 I discovered I was missing a number of key messages. It turns out that not all the SeaTalk messages from the integrated instruments were being translated to an NMEA format and therefore not being sent wirelessly from the AIS hub. I didn't really want to introduce another source of data directly from the instruments, as it would involve hard wiring the instruments to the laptop and then translating a different message format (SeaTalk). I decided to spend some money on new hardware (any excuse for new toys). I purchased a SeaTalk to NMEA converter from DigitalYachts (discounted at the London boat show, I'm glad to say).

This article is about the installation of that hardware and the result (hence Part 1.1), not our usual type of blog. You never know, it may be of interest to somebody out there, and this is a real-life data issue! Don't worry, it will be short and more of an insight into yacht wiring than anything.

The next blog will be very much back on track, looking at Kafka in the architecture.

The existing wiring

The following image shows the existing setup, what's behind the panels and how it links to the instrument architecture documented in Part 1. No laughing at the wiring spaghetti - I stripped out half a tonne of cable last year so this is an improvement. Most of the technology lives near the chart table and we have access to the navigation lights, cabin lighting, battery sensors and DSC VHF. The top left image also shows a spare GPS (Garmin) and far left an EPIRB.

Approach

I wanted to make sure I wasn't breaking anything by adding the new hardware, so I followed the approach we use as software engineers: check before, during and after any changes, enabling us to narrow down the point at which errors are introduced. To help with this I created a little bit of Python that reads the messages and lets me know the unique message types, the total number of messages and the number of messages in error.

 
import json
import sys

# Function to validate a message using its hash key (see Part 1)
def is_message_valid (orig_line):

........ [Function same code described in Part 1]

# main body
f = open("/Development/step_1.log", "r")

valid_messages = 0
invalid_messages = 0
total_messages = 0
my_list = []   # message type (first six characters) of every valid message

# process the log file line by line
for line in f:

  orig_line = line

  if is_message_valid(orig_line):
    valid_messages = valid_messages + 1

    # NMEA sentences start with "$"; the first six characters identify the message type
    if orig_line[0:1] == "$":
      my_list.append(orig_line[0:6])

  else:
    invalid_messages = invalid_messages + 1

  total_messages = total_messages + 1

# de-duplicate to get the unique message types seen in the log
new_list = list(set(my_list))

i = 0
while i < len(new_list):
    print(new_list[i])
    i += 1

# high-tech report
print "Summary"
print "#######"
print "valid messages -> ", valid_messages
print "invalid messages -> ", invalid_messages
print "total messages -> ", total_messages

f.close()

For each of the steps, I used nc to write the output to a log file and then used the Python to analyse the log. I logged about ten minutes of messages at each step, although I have to confess to shortening the last test as I was getting very cold.

nc -l 192.168.1.1 2000 > step_x.log
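
For completeness, a rough Python equivalent of that nc capture is sketched below. It mirrors the listening behaviour of the command above; the address, port and log file name are simply the ones from my setup, so adjust them to yours.

# Rough equivalent of the nc capture: listen on the same address and port,
# accept a single connection and append whatever arrives to the step log.
# Stop it (Ctrl-C, or when the sender disconnects) once you have enough data.
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('192.168.1.1', 2000))
server.listen(1)

conn, addr = server.accept()
with open('step_x.log', 'ab') as log:
    while True:
        data = conn.recv(4096)
        if not data:
            break
        log.write(data)

conn.close()
server.close()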

While spooling the messages I artificially generate some speed data by spinning the wheel of the speedo. The image below shows the speed sensor and where it normally lives (far right image). The water comes in when you take out the sensor as it temporarily leaves a rather large hole in the bottom of the boat, so don't be alarmed by the little puddle you can see.

Step 1:

I spool and analyse about ten minutes of data without making any changes to the existing setup.

The existing setup takes data directly from the back of a Raymarine instrument (seen below), which is linked into the AIS hub.

Results:
 
$AITXT -> AIS (from AIS hub)

$GPRMC -> GPS (from AIS hub)
$GPGGA
$GPGLL
$GPGBS

$IIDBT -> Depth sensor
$IIMTW -> Sea temperature sensor
$IIMWV -> Wind speed 

Summary
#######
valid messages ->  2129
invalid messages ->  298
total messages ->  2427
12% error

Step 2:

I disconnect the NMEA interface between the AIS hub and the integrated instruments. So in the diagram above I disconnect all four NMEA wires from the back of the instrument.

I observe the Navigation display of the integrated instruments no longer displays any GPS information (this is expected as the only GPS messages I have are coming from the AIS hub).

Results:

$AITXT -> AIS (from AIS hub)

$GPRMC -> GPS (from AIS hub)
$GPGGA
$GPGLL
$GPGBS

No $II messages as expected 

Summary
#######
valid messages ->  3639
invalid messages ->  232
total messages ->  3871
6% error

Step 3:

I wire in the new hardware, both NMEA in and out, directly into the course computer.

Results:

$AITXT -> AIS (from AIS hub)

$GPGBS -> GPS messages
$GPGGA
$GPGLL
$GPRMC

$IIMTW -> Sea temperature sensor
$IIMWV -> Wind speed 
$IIVHW -> Heading & Speed
$IIRSA -> Rudder Angle
$IIHDG -> Heading
$IIVLW -> Distance travelled

Summary
#######
valid messages ->  1661
invalid messages ->  121
total messages ->  1782
6.7% error

Conclusion:

I get all the messages I am after (for now); the hardware seems to be working.

Now to put all the panels back in place!

In the next article, I will get back to technology and the use of Kafka in the architecture.

Real-time Sailing Yacht Performance - Getting Started (Part 1)

Real-time Sailing Yacht Performance - Kafka (Part 2)

Categories: BI & Warehousing

Rittman Mead at Kscope 2018

Thu, 2018-05-31 02:20
Rittman Mead at Kscope 2018

Kscope 2018 is just a week away! Magnificent location (Walt Disney World Swan and Dolphin Resort) for one of the best tech conferences of the year! The agenda is impressive (look here) spanning over ten different tracks from the traditional EPM, BI Analytics and Data Visualization, to the newly added Blockchain! Plenty of great content and networking opportunities!

I'll be representing Rittman Mead with two talks: one about Visualizing Streams (Wednesday at 10:15 Northern Hemisphere A2, Fifth Level) on how to build a modern analytical platform including Apache Kafka, Confluent's KSQL, Apache Drill and Oracle's Data Visualization (Cloud or Desktop).

Rittman Mead at Kscope 2018

During the second talk, titled DevOps and OBIEE: Do it Before it's Too Late! (Monday at 10:45 Northern Hemisphere A1, Fifth Level), I'll be sharing details, based on our experience, on how OBIEE can be fully included in a DevOps framework, what the cost of "avoiding" DevOps and automation in general is, and how Rittman Mead's toolkits, partially described here, can be used to accelerate the adoption of DevOps practices in any situation.

Rittman Mead at Kscope 2018

If you’re at the event and you see me in sessions, around the conference or during my talks, I’d be pleased to speak with you about your projects and answer any questions you might have.

Categories: BI & Warehousing

Rittman Mead at OUG Norway 2018

Mon, 2018-03-05 04:45
Rittman Mead at OUG Norway 2018

This week I am very pleased to represent Rittman Mead by presenting at the Oracle User Group Norway Spring Seminar 2018, delivering two sessions about Oracle Analytics, Kafka, Apache Drill and Data Visualization, both on-premises and in the cloud. The OUGN conference is unique due to both the really high level of presentations (see the related agenda) and the fascinating location: the Color Fantasy Cruiseferry going from Oslo to Kiel and back.

Rittman Mead at OUG Norway 2018

I'll be speaking on Friday 9th at 9:30AM in Auditorium 2 about Visualizing Streams on how the world of Business Analytics has changed in recent years and how to successfully build a Modern Analytical Platform including Apache Kafka, Confluent's recently announced KSQL and Oracle's Data Visualization.

Rittman Mead at OUG Norway 2018

On the same day at 5PM, again in Auditorium 2, I'll be delivering the session OBIEE: Going Down the Rabbit Hole, providing details, built on experience, on how diagnostic tools, non-standard configuration and well-defined processes can enhance, secure and accelerate any analytical project.

If you’re at the event and you see me in sessions, around the conference or during my talks, I’d be pleased to speak with you about your projects and answer any questions you might have.

Categories: BI & Warehousing

Spring into action with our new OBIEE 12c Systems Management & Security On Demand Training course

Mon, 2018-02-19 05:49

Rittman Mead are happy to release a new course to the On Demand Training platform.

The OBIEE 12c Systems Management & Security course is the essential learning tool for any developers or administrators who will be working on the maintenance & optimisation of their OBIEE platform.

Baseline Validation Tool

View lessons and live demos from our experts on the following subjects:

  • What's new in OBIEE 12c
  • Starting & Stopping Services
  • Managing Metadata
  • System Preferences
  • Troubleshooting Issues
  • Caching
  • Usage Tracking
  • Baseline Validation Tool
  • Direct Database Request
  • Write Back
  • LDAP Users & Groups
  • Application Roles
  • Permissions

Get hands on with the practical version of the course which comes with an OBIEE 12c training environment and 9 lab exercises.
System Preferences

Rittman Mead will also be releasing a theory version of the course. This will not include the lab exercises but gives each of the lessons and demos that you'd get as part of the practical course.

Course prices are as follows:

OBIEE 12c Systems Management & Security - PRACTICAL - $499

  • 30 days access to lessons & demos
  • 30 days access to OBIEE 12c training environment for lab exercises
  • 30 days access to Rittman Mead knowledge base for Q&A and lab support

OBIEE 12c Systems Management & Security - THEORY - $299

  • 30 days access to lessons & demos
  • 30 days access to Rittman Mead knowledge base for Q&A

To celebrate the changing of seasons we suggest you Spring into action with OBIEE 12c by receiving a 25% discount on both courses until 31st March 2018 using voucher code:

ODTSPRING18

Access both courses and the rest of our catalog at learn.rittmanmead.com

Categories: BI & Warehousing
