Yann Neuhaus

dbi services technical blog

Oracle disables your multitenant option when you run on EC2

Thu, 2020-03-26 17:56

I have installed Oracle 19.6 on an EC2 instance for our Multitenant Workshop training. And of course, during the workshop we create a lot of PDBs. If you haven't paid for the Enterprise Edition plus the Multitenant Option you can create at most 3 pluggable databases. But with this option you can create up to 252 pluggable databases. Is it worth the price, which according to the public price list is USD 47,500 plus USD 17,500 per processor, where "per processor" means per core because Oracle doesn't count the core factor when your Intel processors are in the AWS Cloud (according to the Authorized Cloud Environments paper)? Probably not, because Oracle detects where you run and bridles some features depending on whether you are on the Dark or the Light Side of the public cloud (according to their criteria, of course).

At one point I have 3 pluggable databases in my CDB:


SQL> show pdbs
   CON_ID     CON_NAME    OPEN MODE    RESTRICTED
_________ ____________ ____________ _____________
        2 PDB$SEED     READ ONLY    NO
        3 CDB1PDB01    MOUNTED
        4 CDB1PDB03    MOUNTED
        5 CDB1PDB02    MOUNTED

I want to create a 4th one:


SQL> create pluggable database CDB1PDB04 from CDB1PDB03;

create pluggable database CDB1PDB04 from CDB1PDB03
                          *
ERROR at line 1:
ORA-65010: maximum number of pluggable databases created

It fails. The maximum number of pluggable databases is defined by MAX_PDBS, but I defined nothing in my SPFILE:


SQL> show spparameter max_pdbs
SID NAME     TYPE    VALUE
--- -------- ------- -----
*   max_pdbs integer

I thought that the default was 4098 (which is incorrect anyway as you cannot create more than 4096) but it is actually 5 here:


SQL> show parameter max_pdbs
NAME     TYPE    VALUE
-------- ------- -----
max_pdbs integer 5

Ok… this parameter is supposed to count the number of user pluggable databases (the ones with CON_ID>2) and I have 3 of them here. The limit is 5 and I get an error saying that I've reached the limit. That's not the first time I've seen wrong maths with this parameter. But there's worse: I cannot change it:


SQL> alter system set max_pdbs=6;

alter system set max_pdbs=6
 *
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-65334: invalid number of PDBs specified

I can change it in the SPFILE but it doesn’t help me to create more pluggable databases:


SQL> alter system set max_pdbs=200 scope=spfile;

System altered.

SQL> startup force;

Total System Global Area   2147482744 bytes
Fixed Size                    9137272 bytes
Variable Size               587202560 bytes
Database Buffers           1543503872 bytes
Redo Buffers                  7639040 bytes
Database mounted.
Database opened.

SQL> show parameter max_pdbs
NAME     TYPE    VALUE
-------- ------- -----
max_pdbs integer 200

SQL> create pluggable database CDB1PDB04 from CDB1PDB03;

create pluggable database CDB1PDB04 from CDB1PDB03
                          *
ERROR at line 1:
ORA-65010: maximum number of pluggable databases created

Something bridles me. There's a MOS Note, ORA-65010 When Oracle Database Hosted on AWS Cloud (Doc ID 2328600.1), about the same problem, but that's in 12.1.0.2 (before MAX_PDBS was introduced) and it is supposed to be fixed in the AUG 2017 PSU. But here I am, 3 years later, in 19.6 (the January 2020 Release Update for the latest version available on-premises).

So, Oracle limits the number of pluggable databases when you run on a public cloud provider which is not the Oracle Public Cloud. This limitation is not documented in the licensing documentation, which mentions 252 as the Enterprise Edition limit, and I see nothing about "Authorized Cloud Environments" limitations for this item. This, and the fact that it can come and go with Release Updates, puts customers at risk when running on AWS EC2: financial risk and availability risk. I think there are only two choices in the long term when you want to run your database in a cloud: go to Oracle Cloud or leave for another database.

How does the Oracle instance know on which public cloud you run? All cloud platforms provide some metadata through an HTTP API. I have straced all sendto() and recvfrom() system calls when starting the instance:


strace -k -e trace=recvfrom,sendto -yy -s 1000 -f -o trace.trc sqlplus / as sysdba <<<'startup force'

I then searched the trace output for Amazon and AWS:

This is clear: the instance has a function to detect the cloud provider (kgcs_clouddb_provider_detect) when initializing the SGA in a multitenant architecture (kpdbInitSga) with the purpose of detecting non-oracle clouds (kscs_is_non_oracle_cloud). This queries the AWS metadata (documented on Retrieving Instance Metadata):


[oracle@ora-cdb-1 ~]$ curl http://169.254.169.254/latest/meta-data/services/domain
amazonaws.com/

When the Oracle software sees the name of the enemy in the domain name amazonaws.com, it sets an internal limit on the number of pluggable databases that overrides the MAX_PDBS setting. Ok, I don't need this metadata and I'm root on EC2, so my simple workaround is to block this metadata API:


[root@ora-cdb-1 ~]# iptables -A OUTPUT -d 169.254.169.254  -j REJECT
[root@ora-cdb-1 ~]# iptables -L
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
REJECT     udp  --  anywhere             10.0.0.2             udp dpt:domain reject-with icmp-port-unreachable
REJECT     all  --  anywhere             10.0.0.2             reject-with icmp-port-unreachable

Then restart the instance and it works: I can set or reset MAX_PDBS and create more pluggable databases.
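
If, for example, you had set MAX_PDBS in the SPFILE as above and want to go back to the default once the limit is lifted, resetting it could look like this (a sketch, not part of the original test):

SQL> alter system reset max_pdbs scope=spfile;

System altered.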

I can remove the rule


[root@ora-cdb-1 ~]# iptables -D OUTPUT -d 169.254.169.254  -j REJECT

if, for whatever reason, I want to revert back.

Finally, because they had many bugs with the MAX_PDBS soft limit, there's a parameter to disable it, and this also disables the hard limit:


SQL> alter system set "_cdb_disable_pdb_limit"=true scope=spfile;
System altered.

Thanks to Mauricio Melnik for the heads-up on that:

MOS 2538172.1 _cdb_disable_pdb_limit=true …. maybe?

— Mauricio Melnik (@maurimelnik) March 26, 2020

However, with this parameter you can no longer control the maximum number of PDBs, so don't forget to monitor your AUX_COUNT in DBA_FEATURE_USAGE_STATISTICS.
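
A quick way to keep an eye on it could look like this (just a sketch; the feature name and the AUX_COUNT column come from the standard DBA_FEATURE_USAGE_STATISTICS view, not from the original post):

SQL> select name, detected_usages, aux_count, last_usage_date from dba_feature_usage_statistics where name = 'Oracle Multitenant';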

This was my discovery while preparing the multitenant workshop lab environment. Note that given the current situation where everybody works from home when possible, we are ready to give this training, full of hands-on exercises, through Microsoft Teams and AWS EC2 virtual machines. Two days to get comfortable with moving to the CDB architecture, which is what should be done this year if you plan to stay with Oracle Database for future versions.

Update 27-MAR-2020

In order not to sound too negative here: this limit on AWS platforms has been removed in the past, and this may be a bug re-introduced with the change from 1 to 3 PDBs in Standard Edition.


Sending PostgreSQL data to AWS S3 by using AWS Kinesis data streams

Thu, 2020-03-26 09:22

Before we really start with this post: This is just an experiment and you should not implement it like this in real life. The goal of this post is just to show what is possible and I am not saying that you should do it (the way it is implemented here will be catastrophic for your database performance and it is not really secure). As I am currently exploring a lot of AWS services I wanted to check if there is an easy way to send data from PostgreSQL into an AWS Kinesis data stream for testing purposes and it turned out that this is actually quite easy if you have the AWS Command Line Interface installed and configured on the database server.

Creating a new Kinesis stream in AWS is actually a matter of a few clicks (of course you can do that with the command line utilities as well):

What I want is a simple data stream where I can put data into:

Obviously the new stream needs a name and as I will not do any performance or stress testing one shard is absolutely fine:

That's all that needs to be done and the new stream is ready:
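
For reference, the same can be done with the AWS CLI, roughly like this (a sketch, assuming the CLI is installed and configured; the stream name matches the one used later in the trigger function):

aws kinesis create-stream --stream-name postgres-to-kinesis --shard-count 1
aws kinesis describe-stream --stream-name postgres-to-kinesis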

An AWS Kinesis stream is not persistent by default. That means, if you want to permanently store the output of a stream, you need to connect the stream to a consumer that processes, optionally transforms, and finally stores the data somewhere. For this you can use AWS Kinesis Firehose and this is what I'll be doing here:

As I want to use AWS S3 as the target for my data I need to use a delivery stream:

The delivery stream needs a name as well and we will use the stream just created above as the source:


We could go ahead and transform the data with an AWS Lambda function but we’re going to keep it simple for now and skip this option:


The next screen is about the target for the data. This could be AWS Redshift, AWS Elasticsearch, Splunk or AWS S3, which is what we'll be using here:

Finally specifying the target S3 bucket:

The settings for buffering are not really important for this test but will matter for real systems, as these settings determine how fast your data is delivered to S3 (we also do not care about encryption and compression for now):

Error logging should of course be enabled and we need an IAM role with appropriate permissions:

Final review:


… and the stream and delivery stream are ready to use:

That's it for the setup on the AWS side and we can continue with configuring PostgreSQL to call the AWS command line utility to write data to the stream. Calling system commands from inside PostgreSQL can be done in various ways; we'll be using PL/Perl for that, and even the untrusted version, so only superusers will be able to do that:

postgres=# create extension plperlu;
CREATE EXTENSION
postgres=# \dx
                      List of installed extensions
  Name   | Version |   Schema   |              Description               
---------+---------+------------+----------------------------------------
 plperlu | 1.0     | pg_catalog | PL/PerlU untrusted procedural language
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

Next we need a table that will contain the data we want to send to the stream:

postgres=# create table stream_data ( id serial primary key
                                    , stream text );

A trigger will fire each time a new row is inserted and the trigger function will call the AWS command line interface:

create or replace function f_send_to_kinesis()
returns trigger
language plperlu
AS $$
     system('aws kinesis put-record --stream-name postgres-to-kinesis --partition-key 1 --data '.$_TD->{new}{stream});
     return;
$$;

create trigger tr_test
after insert or update
on stream_data
for each row
execute procedure f_send_to_kinesis();

This is all we need. Let’s insert a row into the table and check if it arrives in AWS S3 (remember that it will take up to 300 seconds or 5MB of data):

postgres=# insert into stream_data (stream) values ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
INSERT 0 1
postgres=# insert into stream_data (stream) values ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
INSERT 0 1
postgres=# insert into stream_data (stream) values ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
INSERT 0 1
postgres=# insert into stream_data (stream) values ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
INSERT 0 1
postgres=# insert into stream_data (stream) values ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
INSERT 0 1
postgres=# insert into stream_data (stream) values ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
INSERT 0 1
postgres=# insert into stream_data (stream) values ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
INSERT 0 1
postgres=# insert into stream_data (stream) values ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
INSERT 0 1
postgres=# insert into stream_data (stream) values ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
INSERT 0 1
postgres=# select * from stream_data;
 id |              stream              
----+----------------------------------
  1 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  2 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  3 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  4 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  5 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  6 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  7 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  9 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
(9 rows)

You will also notice that the insert takes quite some time because calling the AWS command line utility and waiting for the result takes ages compared to a normal insert.

While waiting for the data to arrive you can check the monitoring section of both the stream and the delivery stream:

After a while the data appears in S3 and it is organized in [YEAR]/[MONTH]/[DAY]/[HOUR]:




Looking at the file itself all our data is there:

So, actually it is quite easy to send data to an AWS Kinesis stream. If you really need to send data out of PostgreSQL I would probably go for listen/notify, so that the calls to the AWS command line utility do not block the inserts or updates to the table that holds the data for the stream. Anyway, currently I am not aware of a good use case for sending streams of data out of PostgreSQL directly to AWS Kinesis. Maybe you do something like that? If so, how?
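
Just to illustrate the listen/notify idea (a minimal sketch, not what was tested in this post): the trigger would only publish a notification, and a separate worker process LISTENing on the channel would do the actual aws kinesis put-record call outside of the transaction:

create or replace function f_notify_kinesis()
returns trigger
language plpgsql
as $$
begin
    -- only publish the payload here, no external program is called inside the transaction
    perform pg_notify('kinesis_stream', NEW.stream);
    return NEW;
end;
$$;

create trigger tr_notify
after insert or update
on stream_data
for each row
execute procedure f_notify_kinesis();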


PostgreSQL message levels

Wed, 2020-03-25 02:02

When you start to write business logic in the database by using triggers or functions/procedures, you usually want to report messages to the user that runs your code, or you want to include some debugging output for yourself. In Oracle a lot of people use the dbms_output package to return simple messages to the user's screen. In PostgreSQL you can do the same but of course the way it is implemented is not the same. There are basically two parameters that control how much output is returned to the client or written to the server's log file: client_min_messages and log_min_messages.
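
As a quick illustration of the PostgreSQL counterpart, an anonymous block with RAISE is enough (just a sketch to set the scene):

postgres=# do $$ begin raise notice 'Hello from PL/pgSQL'; end $$;
NOTICE:  Hello from PL/pgSQL
DO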

Let's start with client_min_messages: the default value of this parameter is 'NOTICE' and these are the valid values you can set:

  • DEBUG5
  • DEBUG4
  • DEBUG3
  • DEBUG2
  • DEBUG1
  • LOG
  • NOTICE
  • WARNING
  • ERROR

A given level suppresses all the more verbose levels, e.g. "LOG" would exclude all the "DEBUG*" levels and the default "NOTICE" excludes "LOG" as well. The DEBUG* levels are usually not required, and if you check the PostgreSQL source code you'll notice that the most detailed level (DEBUG5) is not used that often while the less detailed DEBUG levels are used more often:

postgres@centos8pg:/home/postgres/postgresql/ [pg13] grep -r "DEBUG5" * | grep -v sgml | wc -l
20
postgres@centos8pg:/home/postgres/postgresql/ [pg13] grep -r "DEBUG4" * | grep -v sgml | wc -l
78
postgres@centos8pg:/home/postgres/postgresql/ [pg13] grep -r "DEBUG3" * | grep -v sgml | wc -l
64
postgres@centos8pg:/home/postgres/postgresql/ [pg13] grep -r "DEBUG2" * | grep -v sgml | wc -l
236
postgres@centos8pg:/home/postgres/postgresql/ [pg13] grep -r "DEBUG1" * | grep -v sgml | wc -l
221

Setting the highest level would give a lot of information even for basic tasks:

postgres=# set client_min_messages = 'DEBUG5';
DEBUG:  CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
SET
postgres=# create table tt1 ( a int );
DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
DEBUG:  CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 559/1/1
CREATE TABLE
postgres=# create index ii1 on tt1 (a);
DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
DEBUG:  building index "ii1" on table "tt1" serially
DEBUG:  index "ii1" can safely use deduplication
DEBUG:  CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 560/1/2
CREATE INDEX

Lowering that to “DEBUG1” already reduces the amount of messages:

postgres=# set client_min_messages = 'DEBUG1';
SET
postgres=# create table tt2 ( a int );
CREATE TABLE
postgres=# create index ii2 on tt2 (a);
DEBUG:  building index "ii2" on table "tt2" serially
DEBUG:  index "ii2" can safely use deduplication
CREATE INDEX
postgres=# 

You can use most of the levels to control how much information is given back by your code as well. Let’s assume we have a function like this:

postgres=# create or replace function f_test_msg () returns void 
postgres-# as
postgres-# $$
postgres$# declare
postgres$# begin
postgres$#     raise debug 'This is a DEBUG message';
postgres$# end;
postgres$# $$ language plpgsql;
CREATE FUNCTION

With the default setting of ‘NOTICE’ for client_min_messages you would not see any output when you execute the function:

postgres=# show client_min_messages;
 client_min_messages 
---------------------
 notice
(1 row)

postgres=# select f_test_msg();
 f_test_msg 
------------
 
(1 row)

Setting client_min_messages to the appropriate level will give you the output:

postgres=# set client_min_messages = 'DEBUG';
SET
postgres=# select f_test_msg();
DEBUG:  This is a DEBUG message
 f_test_msg 
------------
 
(1 row)

postgres=# 

It does not really matter which "DEBUG" level you set the parameter to; you'll get the "raise debug" output for all of those levels:

postgres=# set client_min_messages = 'DEBUG1';
SET
postgres=# select f_test_msg();
DEBUG:  This is a DEBUG message
 f_test_msg 
------------
 
(1 row)

postgres=# set client_min_messages = 'DEBUG2';
SET
postgres=# select f_test_msg();
DEBUG:  This is a DEBUG message
 f_test_msg 
------------
 
(1 row)

postgres=# 

In the documentation of RAISE the only DEBUG level is “DEBUG” anyway. Using this method you can easily control the amount of messages your code will return, e.g.:

postgres=# create or replace function f_test () returns void 
postgres-# as
postgres-# $$
postgres$# declare
postgres$# begin
postgres$#     raise debug 'This is a DEBUG message';
postgres$#     raise log 'This is a LOG message';
postgres$#     raise notice 'This is an NOTICE message';
postgres$#     raise warning 'This is a WARNING message';
postgres$# end;
postgres$# $$ language plpgsql;
CREATE FUNCTION

Depending on your current setting of client_min_messages you'll get more or fewer messages:

postgres=# set client_min_messages = 'WARNING';
SET
postgres=# select f_test();
WARNING:  This is a WARNING message
 f_test 
--------
 
(1 row)

postgres=# set client_min_messages = 'NOTICE';
SET
postgres=# select f_test();
NOTICE:  This is an NOTICE message
WARNING:  This is a WARNING message
 f_test 
--------
 
(1 row)

postgres=# set client_min_messages = 'LOG';
SET
postgres=# select f_test();
LOG:  This is a LOG message
NOTICE:  This is an NOTICE message
WARNING:  This is a WARNING message
 f_test 
--------
 
(1 row)

postgres=# set client_min_messages = 'DEBUG';
SET
postgres=# select f_test();
DEBUG:  This is a DEBUG message
LOG:  This is a LOG message
NOTICE:  This is an NOTICE message
WARNING:  This is a WARNING message
 f_test 
--------
 
(1 row)

Use that wisely and you will easily be able to debug your code by just setting the correct level for the information you want to get.

log_min_messages on the other hand controls how much information is written to PostgreSQL's log file. There are a few more levels you can set than for client_min_messages (INFO, FATAL and PANIC) but the logic is the same:

  • DEBUG5
  • DEBUG4
  • DEBUG3
  • DEBUG2
  • DEBUG1
  • INFO
  • NOTICE
  • WARNING
  • ERROR
  • LOG
  • FATAL
  • PANIC

The default is "WARNING", so you might want to lower that (e.g. to INFO or one of the DEBUG levels) if you need more information in the log file or need to collect debugging information.
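
Changing it could look like this (a sketch; it requires superuser privileges and the new value is picked up after a configuration reload):

postgres=# alter system set log_min_messages = 'debug1';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)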


Documentum – Rename of Lockbox on RCS/CFS

Tue, 2020-03-24 14:00

As you probably know, Documentum introduced the optional use of a Lockbox with version 7. It was done initially to contribute to the security of the AEK key, which is one of the central components of a Content Server. Time showed that, well, it's not so easy, and since then nothing much has changed on the Lockbox layer. With version 16.4, OpenText introduced some new things regarding the Lockbox, like the upgrade from version 3 to 4. In this blog, I will talk about one of the changes: the renaming of the file on an RCS/CFS in case there is already one present with the same name. This blog will not discuss a technical issue because it's not an issue, but it might become one, so I just wanted to share some thoughts about that.

 

As you know, when installing a repository in Documentum, you can either choose to use the basic AEK key or use a Lockbox that will contain it. Usually, people tend to use the basic AEK key, most probably because of the Documentum history: most companies have been using Documentum for a very long time and therefore there was no other choice at the beginning, and since upgrading the AEK key to the Lockbox is only an optional step, it's often left on the side. The second trend is to use a single Lockbox for all repositories of a specific server (Global Registry Repository + normal Repository(ies)). If you are in this case, then installing a Primary/First CS is the same thing as installing a Remote/Second CS (aka CFS). Or rather it was, up to and including Documentum 7.3.

 

Before talking about what changed, it’s important to know how Documentum works exactly in regard to the Lockbox. While installing an RCS/CFS, the installer will connect to the Primary CS to read data from the Repository. In the installation process, it will also execute the script “$DM_HOME/install/admin/dm_rcs_copyfiles.ebs”. What this script does is:

  1. Connect to the Primary CS
  2. Create a dm_method named dm_rcs_copyfiles
  3. Execute the dm_method dm_rcs_copyfiles on the Primary CS
  4. The above dm_method will create a dm_document named dm_rcs_copydoc that will contain all the global files of the Repository:
    • AEK key or Lockbox file (page 0)
    • dbpasswd.txt (page 1)
    • server.ini (page 2)
    • webcache.ini (page 3)
    • rkmrootcert.pem – if any
    • rkm_config.ini – if any
    • ldap_*.cnt – if any (7.3+)
  5. Download the files from the dm_document on the RCS/CFS to a temporary folder
  6. Destroy the dm_document and dm_method

 

Therefore, with regard to the Lockbox, it will basically take the file from the Repository on the Primary CS and put it on the RCS/CFS for usage. This is needed because of the encryption mechanisms that use the Lockbox. So, what changed exactly? Well, up to Documentum 7.3, the RCS/CFS did copy the Lockbox from the Primary CS for each and every Repository, but it was just overwriting any file with the same name at the same location. Therefore, if you are using the default $DOCUMENTUM/dba/secure/lockbox.lb name on the Primary CS and using it for all Repositories, then on the RCS/CFS it would just create one Lockbox file with the same name at the same location and overwrite it each and every time a new RCS/CFS is created.

 

You can potentially see what could go wrong with that. Let’s say that you have two different environments:

  • a first one with GR_A and RepoA, both using the same lockbox.lb
  • a second one with GR_B and RepoB, both using the same lockbox.lb

 

With the above, if you were trying to install an RCS/CFS for RepoA and then for RepoB using the same binaries on the same server, then the Lockbox of RepoA would be overwritten by the one of RepoB. The change that has been introduced in 16.4 (it might have been backported to some of the latest patches of 7.3 but at least it wasn't there in the early patches) is that when you are installing an RCS/CFS, if a Lockbox already exists with the same name (lockbox.lb), then the installer will rename it to <RepoName>_lockbox.lb on the RCS/CFS and will then update the server.ini to match the new name, obviously. This means that the first RCS/CFS installed on the remote server (usually the Global Registry Repository) will continue to use lockbox.lb because it doesn't exist at that time BUT the second RCS/CFS (usually the normal Repository) will use the <RepoName>_lockbox.lb file because lockbox.lb is already taken… I assume that the above problem happened for some customers of OpenText and therefore this solution/workaround was probably implemented. Another possibility is that they needed that for the cloud setup, to make sure no repositories overlap each other.
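
To quickly see which Lockbox each Repository on an RCS/CFS is configured to use, a small loop over the server.ini files is enough (a sketch, using the same paths as in the commands further below):

[dmadmin@cs-1 ~]$ for ini in $DOCUMENTUM/dba/config/*/server.ini; do echo "${ini}: $(grep '^crypto_lockbox' ${ini})"; done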

 

So, in the end, this is an as-good-as-it-can-be solution for this specific installation problem. However, if you are part of the second group of people that usually use a single Lockbox for all Repositories of a single host and you don't mix up environments, then you might want to keep the old name. There are already a lot of differences between a Primary CS and a CFS/RCS (start script name, server.ini name, log file name) so unless there is a good reason (i.e. if it's needed), I would personally continue to use lockbox.lb for all Repositories of the RCS/CFS. This is in the perspective of keeping the servers aligned as much as possible to simplify the maintenance work. Obviously, the final decision is up to you.

 

If you want to keep using the same Lockbox on the RCS/CFS for all Repositories, then after the installation you can just update the server.ini. Please note that we are always using server.ini on all Content Servers, no matter if it's a Primary or a Remote. On an RCS/CFS, this is actually a symlink to the real file name to simplify our work and therefore the command below uses the "--follow-symlinks" option:

[dmadmin@cs-1 ~]$ repo=Repo1
[dmadmin@cs-1 ~]$ lockbox_name=lockbox.lb
[dmadmin@cs-1 ~]$
[dmadmin@cs-1 ~]$ ls -l $DOCUMENTUM/dba/config/${repo}/server.ini
lrwxrwxrwx 1 dmadmin dmadmin 67 Oct 14 12:03 $DOCUMENTUM/dba/config/${repo}/server.ini -> $DOCUMENTUM/dba/config/${repo}/server_cs-1_${repo}.ini
[dmadmin@cs-1 ~]$
[dmadmin@cs-1 ~]$ grep "crypto_" $DOCUMENTUM/dba/config/${repo}/server.ini
crypto_mode = AES256_RSA1024_SHA256
crypto_keystore = Local
crypto_lockbox = Repo1_lockbox.lb
crypto_keyname = CSaek
[dmadmin@cs-1 ~]$
[dmadmin@cs-1 ~]$ sed -i --follow-symlinks "s/^\(crypto_lockbox[[:space:]]*=[[:space:]]*\).*/\1${lockbox_name}/" $DOCUMENTUM/dba/config/${repo}/server.ini
[dmadmin@cs-1 ~]$
[dmadmin@cs-1 ~]$ grep crypto_ $DOCUMENTUM/dba/config/${repo}/server.ini
crypto_mode = AES256_RSA1024_SHA256
crypto_keystore = Local
crypto_lockbox = lockbox.lb
crypto_keyname = CSaek
[dmadmin@cs-1 ~]$

 

Then simply restart the Repository. In case you made a mistake and the lockbox.lb file wasn't the same for all Repositories, then it just won't start. That was just my 2 cents on this change; if you are of a different opinion, feel free to share! I didn't test it since I usually use the Lockbox, but I assume OpenText implemented the same thing for the AEK key?

 


Documentum – dm_DMFilescan fails with invalid method_verb?

Mon, 2020-03-23 14:00

Recently, in a project, one of the Documentum environments we were working on started showing failures on the execution of the dm_DMFilescan job, and only this one. After a couple of minutes of investigation, I quickly found the root cause of the issue, which wasn't very important, but I thought I would share it because of the implications it has. In addition to that, it's also not the first time I'm seeing this kind of issue, so let's dig into it.

 

The error shown in the job log was the following one:

[dmadmin@cs-0 ~]$ cd $DOCUMENTUM/dba/log/000f1234/sysadmin
[dmadmin@cs-0 sysadmin]$
[dmadmin@cs-0 sysadmin]$ cat ../agentexec/job_080f12348000035c
Sun Mar 15 08:50:32 2020 [INFORMATION] [LAUNCHER 25934] Detected while preparing job dm_DMFilescan for execution: Agent Exec connected to server Repo01:  [DM_SESSION_I_SESSION_START]info:  "Session 010f123480032063 started for user dmadmin."

[dmadmin@cs-0 sysadmin]$
[dmadmin@cs-0 sysadmin]$ cat DMFilescanDoc.txt
DMFilescan Report For DocBase Repo01 As Of 3/15/2020 09:50:37

Remove orphan content older than 168 hours.
  Generated DMFilescan script will be executed...
  The trace level is set to 0...

DMFilescan utility syntax: apply,c,NULL,DO_METHOD,METHOD,S,dmfilescan,ARGUMENTS,S,'-grace_period 168 '
Executing DMFilescan...
Unable to execute DMFilescan method...
[DM_METHOD_E_INVALID_MTHD_VERB]error:  "The dm_method named (dmfilescan) of type dmbasic has invalid method_verb (./dmfilescan)."


Exiting...
Report End  3/15/2020 09:50:37
[dmadmin@cs-0 sysadmin]$

 

Before explaining what happened, I think it's necessary to take some time to explain how the execution of dm_DMFilescan is done. The job "dm_DMFilescan" is a content type job which is used to scan storage areas in order to find files that do not have any relation to database objects. You can run it in report-only mode (nothing is removed), for example. There are more details in the OpenText documentation, but the thing to note here is that this job will execute the method of the same name: "dm_DMFilescan". This method is a dmbasic one and it's actually just an encapsulation. What I mean by that is that it will actually execute a dmbasic script common to several jobs; the only things that change are the parameters of the method. There are four jobs that use the exact same dmbasic script and you can find the list easily:

[dmadmin@cs-0 sysadmin]$ iapi ${repo} -Udmadmin -Pxxx << EOC
> ?,c,select object_name, method_verb from dm_method where method_verb like '%mthd1.ebs%'
> EOC

        OpenText Documentum iapi - Interactive API interface
        Copyright (c) 2018. OpenText Corporation
        All rights reserved.
        Client Library Release 16.4.0170.0080

Connecting to Server using docbase Repo01
[DM_SESSION_I_SESSION_START]info:  "Session 010f123480032086 started for user dmadmin."

Connected to OpenText Documentum Server running Release 16.4.0170.0234  Linux64.Oracle
Session id is s0
API> object_name      method_verb
--------------------  ---------------------------------------------------------
dm_ContentWarning     ./dmbasic -f../install/admin/mthd1.ebs -eContentWarning
dm_DMClean            ./dmbasic -f../install/admin/mthd1.ebs -eDMClean
dm_DMFilescan         ./dmbasic -f../install/admin/mthd1.ebs -eDMFilescan
dm_DMArchive          ./dmbasic -f../install/admin/mthd1.ebs -eDMArchive
(4 rows affected)

API> Bye
[dmadmin@cs-0 sysadmin]$

 

Take a look at these "mthdx.ebs" files if you aren't familiar with them; it's always good to know how it actually works. This dmbasic method will "only" prepare and execute another method which is then specific to each of the jobs. For the "dm_DMFilescan" dmbasic method, for example, it will execute the "dmfilescan" method. This second method isn't a dmbasic one, it's a binary. So why am I saying all that? Well, it's simply to explain where the issue is coming from… When reading the error message from the log file, one might think at first sight that the issue is with the "dm_DMFilescan" method, but actually it's not. The issue was with the second method and if you read the message carefully, you actually have all the key items to solve it. As said, the "dmfilescan" method executes a binary, but the error message above says that this method is currently of type dmbasic. Comparing the final method for the "dm_DMFilescan" and "dm_DMClean" jobs, you can see the error:

[dmadmin@cs-0 sysadmin]$ iapi ${repo} -Udmadmin -Pxxx << EOC
> ?,c,select object_name, method_type, method_verb from dm_method where object_name in ('dmfilescan','dmclean')
> EOC

        OpenText Documentum iapi - Interactive API interface
        Copyright (c) 2018. OpenText Corporation
        All rights reserved.
        Client Library Release 16.4.0170.0080

Connecting to Server using docbase Repo01
[DM_SESSION_I_SESSION_START]info:  "Session 010f12348003209b started for user dmadmin."

Connected to OpenText Documentum Server running Release 16.4.0170.0234  Linux64.Oracle
Session id is s0
API> object_name      method_type    method_verb
--------------------  -------------  --------------
dmclean                              ./dmclean
dmfilescan            dmbasic        ./dmfilescan
(2 rows affected)

API> Bye
[dmadmin@cs-0 sysadmin]$

 

What happened, in my opinion, is that someone opened the properties of the "dmfilescan" method in Documentum Administrator but instead of using "Cancel" to close it, clicked on "OK", which then saved the current configuration. Unfortunately, with DA, there are often cases where opening a property page can load values that aren't currently configured and therefore, when you click on "OK", it will save back a value that wasn't the initial one… I have seen that quite often with a job's target_server, for example, when working on an HA environment: you have a job configured to run on ANY (' ') and while opening it in DA, it can show the Primary CS target ('Repo.Repo@cs-0' for example). Therefore, clicking on "OK" will change the target_server of this job. I believe this is what happened here for this specific method because by default it's a binary and therefore the method_type should be ' ', but in this case it was changed recently to dmbasic. After changing the method_type back to the correct value, the job is working again:

[dmadmin@cs-0 sysadmin]$ iapi ${repo} -Udmadmin -Pxxx << EOC
> ?,c,update dm_method object set method_type=' ' where object_name='dmfilescan'
> EOC

        OpenText Documentum iapi - Interactive API interface
        Copyright (c) 2018. OpenText Corporation
        All rights reserved.
        Client Library Release 16.4.0170.0080

Connecting to Server using docbase Repo01
[DM_SESSION_I_SESSION_START]info:  "Session 010f1234800320a7 started for user dmadmin."

Connected to OpenText Documentum Server running Release 16.4.0170.0234  Linux64.Oracle
Session id is s0
API> objects_updated
---------------
              1
(1 row affected)
[DM_QUERY_I_NUM_UPDATE]info:  "1 objects were affected by your UPDATE statement."

API> Bye
[dmadmin@cs-0 sysadmin]$
[dmadmin@cs-0 sysadmin]$ cat DMFilescanDoc.txt | grep -v ^#
DMFilescan Report For DocBase Repo01 As Of 3/15/2020 10:03:21

Remove orphan content older than 168 hours.
  Generated DMFilescan script will be executed...
  The trace level is set to 0...

DMFilescan utility syntax: apply,c,NULL,DO_METHOD,METHOD,S,dmfilescan,ARGUMENTS,S,'-grace_period 168 '
Executing DMFilescan...
Shell was successful!

Generated script from the DMFilescan method:
----- Start $DOCUMENTUM/dba/log/000f1234/sysadmin/080f12348000035c.bat output ------------------------

------- End $DOCUMENTUM/dba/log/000f1234/sysadmin/080f12348000035c.bat output ------------------------
Destroying DMFilescan script with ID 090f12348003d932...
Report End  3/15/2020 10:05:27
[dmadmin@cs-0 sysadmin]$

 

So, take care while using Documentum Administrator because it might surprise you.

 


Migrating an Oracle instance to AWS Aurora – 4 – AWS DMS Change Data Capture (CDC)

Mon, 2020-03-23 05:41

This is the last post in this little series about migrating an Oracle instance to AWS Aurora with PostgreSQL compatibility. For the previous ones you can check here, here and here. In this post we'll look at how you can configure change data capture so that changes on the source Oracle instance are automatically replicated to the target Aurora instance. In a real-life migration this is what you probably need, as long downtimes are usually not acceptable. We'll be starting from exactly the same state we left off in the previous post.

We've initially loaded the source "SH" schema from Oracle to AWS Aurora and now we want to replicate all the changes that are happening on the source to the target as well. Remember that AWS DMS sits between the source and the target, and therefore we need a source end point and a target end point:

We can use exactly the same end points as in the previous post (nothing to add or modify here):

What we need to configure is a new database migration task. We still have the task we used to do the initial loading, which could be modified, but I prefer to keep it separate and will go ahead with a new task:

We’ll be using the same replication instance and the same end points but “Migration type” will be “Replicate data changes only”. We already enabled “supplemental logging” in the last post so the warning can be ignored:

We’ll go with the default task settings:

As I know that nothing happened on the source I do not need to specify a specific start point for the replication. The same is true for the tables on the target: I already know that the tables are there and that the content of the tables on the target is the same as on the source. For the LOB stuff I’ll go with the defaults as well.

The table mappings section is configured the same way as for the initial load previously. We want to replicate all tables in the "SH" schema:

Finally all the defaults for the remaining parameters of the task:

While the task is creating:

… you’ll see that LogMiner is starting on the source Oracle instance:

2020-03-23T08:44:08.261039+01:00
Thread 1 advanced to log sequence 39 (LGWR switch)
  Current log# 3 seq# 39 mem# 0: /u03/oradata/DB19/redog3m1DB19.dbf
  Current log# 3 seq# 39 mem# 1: /u04/oradata/DB19/redog3m2DB19.dbf
2020-03-23T08:44:08.398220+01:00
ARC2 (PID:4670): Archived Log entry 15 added for T-1.S-38 ID 0x2c7a7c94 LAD:1
2020-03-23T08:54:41.598409+01:00
LOGMINER: summary for session# = 2147484673
LOGMINER: StartScn: 1084978 (0x0000000000108e32)
LOGMINER: EndScn: 1085221 (0x0000000000108f25)
LOGMINER: HighConsumedScn: 0
LOGMINER: PSR flags: 0x0
LOGMINER: Session Flags: 0x4000441
LOGMINER: Session Flags2: 0x0
LOGMINER: Read buffers: 4
LOGMINER: Region Queue size: 256
LOGMINER: Redo Queue size: 4096
LOGMINER: Memory LWM: limit 10M, LWM 12M, 80%
LOGMINER: Memory Release Limit: 0M
LOGMINER: Max Decomp Region Memory: 1M
LOGMINER: Transaction Queue Size: 1024
2020-03-23T08:54:41.627157+01:00
LOGMINER: Begin mining logfile for session -2147482623 thread 1 sequence 39, /u03/oradata/DB19/redog3m1DB19.dbf
2020-03-23T08:54:46.780204+01:00
LOGMINER: summary for session# = 2147485697
LOGMINER: StartScn: 1085020 (0x0000000000108e5c)
LOGMINER: EndScn: 1085288 (0x0000000000108f68)
LOGMINER: HighConsumedScn: 0
LOGMINER: PSR flags: 0x0
LOGMINER: Session Flags: 0x4000441
LOGMINER: Session Flags2: 0x0
LOGMINER: Read buffers: 4
LOGMINER: Region Queue size: 256
LOGMINER: Redo Queue size: 4096
LOGMINER: Memory LWM: limit 10M, LWM 12M, 80%
LOGMINER: Memory Release Limit: 0M
LOGMINER: Max Decomp Region Memory: 1M
LOGMINER: Transaction Queue Size: 1024
2020-03-23T08:54:46.788212+01:00
LOGMINER: Begin mining logfile for session -2147481599 thread 1 sequence 39, /u03/oradata/DB19/redog3m1DB19.dbf
2020-03-23T08:54:51.861617+01:00
LOGMINER: summary for session# = 2147486721
LOGMINER: StartScn: 1085087 (0x0000000000108e9f)
LOGMINER: EndScn: 1085362 (0x0000000000108fb2)
LOGMINER: HighConsumedScn: 0
LOGMINER: PSR flags: 0x0
LOGMINER: Session Flags: 0x4000441
LOGMINER: Session Flags2: 0x0
LOGMINER: Read buffers: 4
LOGMINER: Region Queue size: 256
LOGMINER: Redo Queue size: 4096
LOGMINER: Memory LWM: limit 10M, LWM 12M, 80%
LOGMINER: Memory Release Limit: 0M
LOGMINER: Max Decomp Region Memory: 1M
LOGMINER: Transaction Queue Size: 1024
2020-03-23T08:54:51.868450+01:00
LOGMINER: Begin mining logfile for session -2147480575 thread 1 sequence 39, /u03/oradata/DB19/redog3m1DB19.dbf

The task itself will report its status as "Replication ongoing" a few moments later:

Having a look at the table statistics section of the task, of course everything is reported as zero at the moment:

Time to do some changes in Oracle and check if these changes will be properly replicated to the Aurora target instance. Let’s create a new table on the Oracle side:

SQL> create table sh.test ( a number );

Table created.

This table will show up in the AWS DMS console quite fast:

… and is also available on the target Aurora side:

postgres=> \d "SH"."TEST"
           Table "SH.TEST"
 Column |      Type      | Modifiers 
--------+----------------+-----------
 A      | numeric(38,10) | 

postgres=> select * from "SH"."TEST";
 A 
---
(0 rows)

Inserting data on the source:

SQL> insert into sh.test values(1);

1 row created.

SQL> commit;

Commit complete.

… and nothing happens on the target side. Why does DDL succeed but DML not? Well, first of all it takes some time for the changes to show up in the console. Doing some more inserts on the source and waiting some time:

SQL> insert into sh.test values(2);

1 row created.

SQL> commit;    

Commit complete.

SQL> insert into sh.test values(3);

1 row created.

SQL> commit;

Commit complete.

… the changes are actually recorded:

… but nothing arrives on the target:

postgres=> select * from "SH"."TEST";
 A 
---
(0 rows)

Why that? Let’s do some more inserts:

SQL> insert into sh.test select -1 from dba_objects;

23522 rows created.

SQL> commit;

Commit complete.

This is reported as “1” insert in the console as the number of inserts switched from 6 to 7:

Same picture, nothing there on the target. The issue was that supplemental logging needs to be enabled for each table or at the database level, and this is what I did:

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
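
If you do not want to enable it for the whole database, the per-table alternative would look roughly like this (a sketch; it has to be repeated for every table that is part of the replication):

ALTER TABLE sh.test ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;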

Re-creating the table on Oracle side:

SQL> drop table sh.test;

Table dropped.

SQL> create table sh.test ( a number primary key );

Table created.

SQL> insert into sh.test values(1);

1 row created.

SQL> insert into sh.test values(2);

1 row created.

SQL> commit;

Commit complete.

SQL> 

From now on new data is showing up in the target Aurora instance:

postgres=> select * from "SH"."TEST";
      A       
--------------
 1.0000000000
 2.0000000000
(2 rows)

postgres=> 

So, again, quite easy to set up. Of course the usual caveats apply, as for every logical replication: you should have primary keys, or at least unique keys, on the source tables for the replication to run smoothly and perform well. Another point to add here: when you check the settings of the replication task there is one important parameter that should be enabled (you need to stop the task, otherwise it cannot be modified):

Enabling this will create additional tables in the target Aurora instance and these give you more information on what is going on with the replication:

postgres=> select * from pg_tables where tablename like 'awsdms%';
 schemaname  |        tablename        | tableowner | tablespace | hasindexes | hasrules | hastriggers | rowsecurity 
-------------+-------------------------+------------+------------+------------+----------+-------------+-------------
 public      | awsdms_apply_exceptions | postgres   |            | f          | f        | f           | f
 aws_postgis | awsdms_apply_exceptions | postgres   |            | f          | f        | f           | f
 public      | awsdms_history          | postgres   |            | t          | f        | f           | f
 public      | awsdms_status           | postgres   |            | t          | f        | f           | f
 public      | awsdms_suspended_tables | postgres   |            | t          | f        | f           | f
(5 rows)

Here is an example of the replication history:

postgres=> select * from awsdms_history;
      server_name      |         task_name          |   timeslot_type   |      timeslot       | timeslot_duration | timeslot_latency | timeslot_records | timeslot_volume 
-----------------------+----------------------------+-------------------+---------------------+-------------------+------------------+------------------+-----------------
 localhost.localdomain | XEBQFBY6EH6SQNN6CBBMO25364 | CHANGE PROCESSING | 2020-03-23 09:42:29 |                 0 |                0 |                0 |               0
 localhost.localdomain | XEBQFBY6EH6SQNN6CBBMO25364 | CHANGE PROCESSING | 2020-03-23 09:47:29 |                 5 |                0 |            70592 |              28
 localhost.localdomain | XEBQFBY6EH6SQNN6CBBMO25364 | CHANGE PROCESSING | 2020-03-23 09:52:29 |                 5 |                0 |                6 |               0
 localhost.localdomain | XEBQFBY6EH6SQNN6CBBMO25364 | CHANGE PROCESSING | 2020-03-23 09:57:30 |                 5 |                0 |                0 |               0
 localhost.localdomain | XEBQFBY6EH6SQNN6CBBMO25364 | CHANGE PROCESSING | 2020-03-23 10:02:31 |                 5 |                0 |                0 |               0
 localhost.localdomain | XEBQFBY6EH6SQNN6CBBMO25364 | CHANGE PROCESSING | 2020-03-23 10:07:31 |                 5 |                0 |                0 |               0
 localhost.localdomain | XEBQFBY6EH6SQNN6CBBMO25364 | CHANGE PROCESSING | 2020-03-23 10:12:32 |                 5 |                0 |                0 |               0
 localhost.localdomain | XEBQFBY6EH6SQNN6CBBMO25364 | CHANGE PROCESSING | 2020-03-23 10:17:32 |                 5 |                0 |                0 |               0
 localhost.localdomain | XEBQFBY6EH6SQNN6CBBMO25364 | CHANGE PROCESSING | 2020-03-23 10:22:39 |                 0 |                0 |                0 |               0
(9 rows)

These tables can be used to monitor the replication, especially the apply exceptions.
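
For example, a quick look at the apply exceptions can be done like this (just a sketch; the table stays empty as long as all changes apply cleanly):

postgres=> select * from public.awsdms_apply_exceptions;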

To conclude this final post: the AWS Schema Conversion Tool is a great help for converting the schema and even comes with some Oracle compatibility. Use it, it saves a lot of manual work. AWS DMS, on the other hand, is really easy to implement: the initial load is really easy to set up and change data capture works as expected. Of course this was only a playground and real issues will pop up when you do a real migration, especially when you have to migrate business logic inside the database.


Documentum – LSS registerOracleVersionView script with wrong content

Sun, 2020-03-22 02:00

As discussed in a previous blog, working with LSS might prove a little bit challenging from time to time. In this blog, I wanted to share an error I saw while installing LSS 16.6.1 on an Oracle database. Initially, I developed my silent installation for LSS (which encapsulates the LSS silent scripts provided by OpenText) using a PostgreSQL database because it's usually easier to set up an environment on Kubernetes with PG because of licenses.

 

So, the silent installation scripts were created several months ago and had been working since then, apparently. Recently, I had to execute my silent install script for LSS manually on an environment which was using an Oracle database. The script completed properly and my automatic log file checking didn't show any sign of errors or anything, so for me it was fully installed. However, I still did a review of the logs printed on the screen to be sure, and I did see a new "error" I wasn't familiar with. I'm not sure you can call that an error because it's just one line drowned in the flood of logs, printed without any "ERROR" or "_E_" messages, but it is clearly an error from a Linux point of view:

...
./registerOracleVersionView.sh: line 1: oracle: command not found
...

 

This message never appeared in the generated log file of the LSS installation, it's only displayed on the screen, which makes it… quite difficult to catch in automation. So, anyway, what's the issue this time? Well, looking at the message, it's clear that the shell script has wrong content because it is trying to execute a command "oracle" which doesn't exist. Where is this file? What's its content?

[dmadmin@cs-0 ~]$ workspace="/tmp/lss/"
[dmadmin@cs-0 ~]$
[dmadmin@cs-0 ~]$ cd ${workspace}/*/
[dmadmin@cs-0 LSSuite]$
[dmadmin@cs-0 LSSuite]$ ls -l *.sh
-rwxr-x--- 1 dmadmin dmadmin 13479 Oct  4 09:15 LSConfigImport.sh
-rwxr-x--- 1 dmadmin dmadmin  4231 Oct  4 09:15 iHubConfigImport.sh
-rwxr-x--- 1 dmadmin dmadmin  8384 Oct  4 09:15 install.sh
-rwxr-x--- 1 dmadmin dmadmin  3096 Oct  4 09:15 myInsightPostInstall.sh
[dmadmin@cs-0 LSSuite]$
[dmadmin@cs-0 LSSuite]$ find . -name registerOracleVersionView.sh
./scripts/registerOracleVersionView.sh
[dmadmin@cs-0 LSSuite]$
[dmadmin@cs-0 LSSuite]$ cd ./scripts/
[dmadmin@cs-0 scripts]$
[dmadmin@cs-0 scripts]$ cat registerOracleVersionView.sh
if  "$4" == "oracle"
then
        idql "$1" -U"$2" -P"$3" -R"./scripts/$4/oracleVersion.dql"
fi
[dmadmin@cs-0 scripts]$

 

If you are familiar with bash/shell scripting, you probably already saw what's wrong with the script: this simply isn't the correct way to write IF statements. I won't go into the details of the correct formatting (one bracket, two brackets, with the test command, aso…) because there is already plenty of documentation about that online, but that's definitely not a correct way to write IF statements. So, to correct this script, I opened the OpenText SR#4450083 and provided them the commands to fix it in a future patch/release. I haven't received a confirmation yet but it should be in the next LSS release. In the meantime, I put the workaround in my silent install script (if the correct format is already there it won't change anything but if it's not, then it will correct the file):

[dmadmin@cs-0 scripts]$ cat registerOracleVersionView.sh
if  "$4" == "oracle"
then
        idql "$1" -U"$2" -P"$3" -R"./scripts/$4/oracleVersion.dql"
fi
[dmadmin@cs-0 scripts]$
[dmadmin@cs-0 scripts]$ ./registerOracleVersionView.sh Repo01 dmadmin xxx oracle
./registerOracleVersionView.sh: line 1: oracle: command not found
[dmadmin@cs-0 scripts]$
[dmadmin@cs-0 scripts]$ sed -i -e 's,^if[[:space:]]*",if \[\[ ",' -e 's,^if \[\[ .*"$,& \]\],' registerOracleVersionView.sh
[dmadmin@cs-0 scripts]$
[dmadmin@cs-0 scripts]$ cat registerOracleVersionView.sh
if [[ "$4" == "oracle" ]]
then
        idql "$1" -U"$2" -P"$3" -R"./scripts/$4/oracleVersion.dql"
fi
[dmadmin@cs-0 scripts]$
[dmadmin@cs-0 scripts]$ ./registerOracleVersionView.sh Repo01 dmadmin xxx oracle

        OpenText Documentum idql - Interactive document query interface
        Copyright (c) 2018. OpenText Corporation
        All rights reserved.
        Client Library Release 16.4.0170.0080

Connecting to Server using docbase Repo01
[DM_SESSION_I_SESSION_START]info:  "Session 010f1234800113af started for user dmadmin."

Connected to OpenText Documentum Server running Release 16.4.0170.0234  Linux64.Oracle
1> 2> result
------------
T
(1 row affected)
1> 2> new_object_ID
----------------
190f1234800edc59
(1 row affected)
1>
[dmadmin@cs-0 scripts]$
[dmadmin@cs-0 scripts]$ cat ./scripts/oracle/oracleVersion.dql
execute exec_sql with query = 'create view oracle_version as select * from v$version'
go
REGISTER TABLE dm_dbo.oracle_version (banner String(80))
go[dmadmin@cs-0 scripts]$

 

As you can see above, the shell script executes a DQL script, "oracleVersion.dql". This simply creates a new view, "oracle_version", and registers it. I have no clue where this might be used in LSS, but what I can tell you is that this script was already wrong in LSS 16.6.0 (released in Jul 2019 I believe) and nobody complained about it so far apparently, so you can either wait for the official fix from OpenText or fix it yourself like I did, up to you!
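
Once the view is registered, it can be queried through DQL like any other registered table, for example (a sketch based on the column registered above):

select banner from dm_dbo.oracle_version
go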

 


Documentum – D2-Smartview class cast exception

Sat, 2020-03-21 11:22

D2-Smartview is a new UI that OpenText now provides starting with version 16 of D2. It's a lightweight UI that can perform some of the actions that D2 does. The list of features will probably increase with time but at the moment, I guess it's more for simple users that have very basic needs, mainly consumer-like roles. An interesting thing is that, with a small configuration, users can switch between D2 and D2-Smartview on the fly.

 

The issue I wanted to talk about in this blog is a class cast exception that might occur while trying to download a document using D2-Smartview when it has been deployed on a WebLogic Server:

2020-02-21 10:42:11,168 UTC [INFO ] ([ACTIVE] ExecuteThread: '30' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.d.a.c.documentset.D2DocumentSetSwitch     : D2DocumentSetSwitch.getDocumentSetList end: 0.000s
2020-02-21 10:42:11,184 UTC [INFO ] ([ACTIVE] ExecuteThread: '30' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.utils.FormatUtils           : Compiled format pattern [a-zA-Z_\-0-9]+{1,32}
2020-02-21 10:42:11,405 UTC [INFO ] ([ACTIVE] ExecuteThread: '30' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.common.dctm.utils.DfServerUtil        : loadBalancedContentServer set to false
2020-02-21 10:42:11,681 UTC [INFO ] ([ACTIVE] ExecuteThread: '95' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.d.d.w.services.config.D2X3ConfigService   : D2X3ConfigService.getAvailableWidgets end : 0.119s
2020-02-21 10:42:11,903 UTC [ERROR] ([ACTIVE] ExecuteThread: '34' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.servlets.D2HttpServlet      : Download could not parse request body
2020-02-21 10:42:11,908 UTC [ERROR] ([ACTIVE] ExecuteThread: '34' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.servlets.D2HttpServlet      : Request parsing failed
java.lang.ClassCastException: weblogic.security.principal.WLSUserImpl cannot be cast to com.emc.d2fs.authc.HttpAuthPrincipal
        at com.emc.d2fs.dctm.servlets.D2HttpContext.<init>(D2HttpContext.java:259)
        at com.emc.d2fs.dctm.servlets.D2HttpServlet.doGetAndPost(D2HttpServlet.java:360)
        at com.emc.d2fs.dctm.servlets.D2HttpServlet.doGet(D2HttpServlet.java:113)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:286)
        at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:260)
        at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:137)
        at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:350)
        at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:247)
        at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3705)
        at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3672)
        at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:328)
        at weblogic.security.service.SecurityManager.runAsForUserCode(SecurityManager.java:197)
        at weblogic.servlet.provider.WlsSecurityProvider.runAsForUserCode(WlsSecurityProvider.java:203)
        at weblogic.servlet.provider.WlsSubjectHandle.run(WlsSubjectHandle.java:71)
        at weblogic.servlet.internal.WebAppServletContext.doSecuredExecute(WebAppServletContext.java:2443)
        at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2291)
        at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2269)
        at weblogic.servlet.internal.ServletRequestImpl.runInternal(ServletRequestImpl.java:1705)
        at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1665)
        at weblogic.servlet.provider.ContainerSupportProviderImpl$WlsRequestExecutor.run(ContainerSupportProviderImpl.java:272)
        at weblogic.invocation.ComponentInvocationContextManager._runAs(ComponentInvocationContextManager.java:352)
        at weblogic.invocation.ComponentInvocationContextManager.runAs(ComponentInvocationContextManager.java:337)
        at weblogic.work.LivePartitionUtility.doRunWorkUnderContext(LivePartitionUtility.java:57)
        at weblogic.work.PartitionUtility.runWorkUnderContext(PartitionUtility.java:41)
        at weblogic.work.SelfTuningWorkManagerImpl.runWorkUnderContext(SelfTuningWorkManagerImpl.java:652)
        at weblogic.work.ExecuteThread.execute(ExecuteThread.java:420)
        at weblogic.work.ExecuteThread.run(ExecuteThread.java:360)
2020-02-21 10:42:11,931 UTC [ERROR] ([ACTIVE] ExecuteThread: '34' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.documentum.rest.util.LogHelper        : LogId: 2516f8e0-f920-437c-86f3-502b3ec8ad14, Status: 500, code: E_INTERNAL_SERVER_ERROR, message: An internal server error occurs.
java.lang.ClassCastException: weblogic.security.principal.WLSUserImpl cannot be cast to com.emc.d2fs.authc.HttpAuthPrincipal
        at com.emc.d2fs.dctm.servlets.D2HttpContext.<init>(D2HttpContext.java:259)
        at com.emc.d2fs.dctm.servlets.D2HttpServlet.doGetAndPost(D2HttpServlet.java:360)
        at com.emc.d2fs.dctm.servlets.D2HttpServlet.doGet(D2HttpServlet.java:113)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:286)
        at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:260)
        at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:137)
        at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:350)
        at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:247)
        at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3705)
        at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3672)
        at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:328)
        at weblogic.security.service.SecurityManager.runAsForUserCode(SecurityManager.java:197)
        at weblogic.servlet.provider.WlsSecurityProvider.runAsForUserCode(WlsSecurityProvider.java:203)
        at weblogic.servlet.provider.WlsSubjectHandle.run(WlsSubjectHandle.java:71)
        at weblogic.servlet.internal.WebAppServletContext.doSecuredExecute(WebAppServletContext.java:2443)
        at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2291)
        at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2269)
        at weblogic.servlet.internal.ServletRequestImpl.runInternal(ServletRequestImpl.java:1705)
        at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1665)
        at weblogic.servlet.provider.ContainerSupportProviderImpl$WlsRequestExecutor.run(ContainerSupportProviderImpl.java:272)
        at weblogic.invocation.ComponentInvocationContextManager._runAs(ComponentInvocationContextManager.java:352)
        at weblogic.invocation.ComponentInvocationContextManager.runAs(ComponentInvocationContextManager.java:337)
        at weblogic.work.LivePartitionUtility.doRunWorkUnderContext(LivePartitionUtility.java:57)
        at weblogic.work.PartitionUtility.runWorkUnderContext(PartitionUtility.java:41)
        at weblogic.work.SelfTuningWorkManagerImpl.runWorkUnderContext(SelfTuningWorkManagerImpl.java:652)
        at weblogic.work.ExecuteThread.execute(ExecuteThread.java:420)
        at weblogic.work.ExecuteThread.run(ExecuteThread.java:360)
2020-02-21 10:42:13,972 UTC [ERROR] ([ACTIVE] ExecuteThread: '25' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.d.d.w.s.b.D2BravaCSRAnnotationService     : Invalid object id:
2020-02-21 10:42:13,978 UTC [ERROR] ([ACTIVE] ExecuteThread: '25' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.d.d.d.i.D2BravaCSRAnnotationsManagerImpl  : Error getting changemark for document 090f123480003979

 

D2-Smartview uses, in part, REST services, so you will find several similarities between D2-REST and D2-Smartview. If you are familiar with the D2-REST application on WebLogic, you might know that OpenText usually asks you to remove a certain set of JARs, disable some security, and so on. It is not surprising that the documentation for D2-Smartview asks for the same kind of things. For D2-REST, I personally never had to remove JARs (except this one, which isn't in the list from OpenText anyway) and I never had to disable the basic authentication. This is because I could always rely on LDAP authentication. Most companies rely on LDAP nowadays, and therefore you usually don't need to disable WebLogic basic authentication for D2-REST: you can just configure WebLogic to use the LDAP server for authentication and D2-REST will use the same account from the Repository (assuming it is also integrated with the LDAP).

 

Therefore, when I started doing some engineering on D2-Smartview, to deploy and test it, I obviously didn't follow all the documentation provided by OpenText, only the parts that I usually apply for D2-REST because I know they are sufficient. These are the things that OpenText asks you to do for D2-Smartview 16.5.1 that I simply ignored:

  • Disable web service annotation scan
  • Add “-Dweblogic.servlet.DIDisabled=true” to JVM parameters of the Managed Server
  • Remove JAR files: jsr*.jar, stax-api-*.jar, stax2-api-*.jar, xml-apis-*.jar, xmlParserAPIs-*.jar
  • Disable WebLogic basic authentication: <enforce-valid-basic-auth-credentials>false</enforce-valid-basic-auth-credentials>

 

With the above ignored, I could successfully deploy D2-Smartview 16.5.1. The login was working properly with my LDAP account, I could browse the repository, perform searches, see properties of documents, and so on. However, there was one thing not working: the download of documents, which failed with the above exception. As you can probably tell, the exception is a class cast from the WebLogic principal (weblogic.security.principal.WLSUserImpl) to the D2-Smartview principal (com.emc.d2fs.authc.HttpAuthPrincipal). Therefore, this was most probably linked to the basic authentication that I kept enabled, just like for D2-REST. It looked like for D2-Smartview it really is needed to disable it.

 

Usually, I don't like to disable security because you never know… This is especially true for the WebLogic basic authentication because this configuration applies to the whole WebLogic domain! OpenText claims that disabling the basic authentication on WebLogic isn't a security issue for their applications (DA, D2, D2-Config, D2-REST, D2-Smartview, d2ls, XMLViewer, DFS, …) because each of them handles the authentication directly. However, at some point in the future a custom application that expects WebLogic to perform the authentication might be deployed on that domain, and you could end up with security holes. Unfortunately for D2-Smartview, it looks like there is no way around it at the moment (contrary to D2-REST), so disabling the basic authentication is needed if you expect it to work fully:

[weblogic@ws-0 ~]$ stopDOMAIN

The server 'msDA-01' has been stopped successfully.
The server 'msD2Conf-01' has been stopped successfully.
The server 'msD2SV-01' has been stopped successfully.
The server 'msD2-01' has been stopped successfully.
The server 'AdminServer' has been stopped successfully.
The NodeManager has been stopped successfully.

[weblogic@ws-0 ~]$
[weblogic@ws-0 ~]$ grep -A1 "valid-basic" $DOMAIN_HOME/config/config.xml
[weblogic@ws-0 ~]$
[weblogic@ws-0 ~]$ sed -i 's,.*</security-configuration>,    <enforce-valid-basic-auth-credentials>false</enforce-valid-basic-auth-credentials>\n&,' $DOMAIN_HOME/config/config.xml
[weblogic@ws-0 ~]$
[weblogic@ws-0 ~]$ grep -A1 "valid-basic" $DOMAIN_HOME/config/config.xml
    <enforce-valid-basic-auth-credentials>false</enforce-valid-basic-auth-credentials>
  </security-configuration>
[weblogic@ws-0 ~]$
[weblogic@ws-0 ~]$ startDOMAIN

The NodeManager has been started successfully.
The server 'AdminServer' has been started successfully.
The server 'msDA-01' has been started successfully.
The server 'msD2Conf-01' has been started successfully.
The server 'msD2SV-01' has been started successfully.
The server 'msD2-01' has been started successfully.

[weblogic@ws-0 ~]$

 

As you can see above, I updated the config.xml directly, but the preferred way to do it is using WLST (a minimal sketch is shown below); there are several explanations of both approaches on the internet. Once that's done, the download of files works as expected. Maybe OpenText will change the source code at some point so that this isn't needed anymore, but until then there is no real alternative. The other three requirements from the documentation are still not in place on my side and I didn't find anything not working so far, so I guess I will just continue to ignore them (especially the "DIDisabled" one).
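
For reference, a minimal WLST sketch of the same change (admin URL, credentials and the wlst.sh path are placeholders; as far as I know the servers still need a restart afterwards for the flag to be taken into account):

[weblogic@ws-0 ~]$ $MW_HOME/oracle_common/common/bin/wlst.sh <<'EOF'
connect('<admin_user>','<admin_password>','t3://localhost:7001')
edit()
startEdit()
# the SecurityConfiguration MBean is named after the domain
cd('/SecurityConfiguration/' + cmo.getName())
cmo.setEnforceValidBasicAuthCredentials(false)
save()
activate()
disconnect()
exit()
EOF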

 

Cet article Documentum – D2-Smartview class cast exception est apparu en premier sur Blog dbi services.

Migrating an Oracle instance to AWS Aurora – 3 – Data replication with AWS DMS

Fri, 2020-03-20 09:31

If you followed the last two posts in this little series (here and here), this is the current status of this demo migration project: we have converted the Oracle SH schema and applied it to the target AWS Aurora with PostgreSQL compatibility instance. This worked quite well, but a few objects (mostly PL/SQL procedures) would need manual action. The next and final step is to replicate the data from Oracle to the Aurora instance using AWS DMS (Data Migration Service). As with all logical replication this is a two-step process: first a consistent state of the current data needs to be loaded to the target; once that is completed, changes that happened on the source since the load need to be captured and replayed on the target.

Coming back to our current setup:

What we do not need anymore is the Windows host; it was only required for the AWS Schema Conversion Tool (although you can install that on Linux and macOS as well). The target setup will look like this:

The key point here is the DMS replication instance. It connects to a source endpoint, mines the transaction logs and replays all the changes to the target endpoint. The replication instance also takes care of the initial load of the data. You have two choices for log mining: either Oracle LogMiner or the AWS DMS Binary Reader, which is an AWS implementation for mining the log files. AWS recommends using Oracle LogMiner in most situations; the exceptions are listed in the DMS documentation.

Before we go on and create the replication instance we need to prepare the Oracle instance. The first requirement is that the Oracle instance needs to be in archive log mode:

[oracle@ip-10-0-1-146 ~]$ sqlplus sys/manager@localhost/XE as sysdba

SQL*Plus: Release 18.0.0.0.0 - Production on Fri Mar 20 10:46:49 2020
Version 18.4.0.0.0

Copyright (c) 1982, 2018, Oracle.  All rights reserved.


Connected to:
Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production
Version 18.4.0.0.0

SQL> shutdown immediate;

...
[oracle@ip-10-0-1-146 ~]$ sqlplus / as sysdba

SQL*Plus: Release 18.0.0.0.0 - Production on Fri Mar 20 10:48:14 2020
Version 18.4.0.0.0

Copyright (c) 1982, 2018, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area 1191181104 bytes
Fixed Size                  8895280 bytes
Variable Size             436207616 bytes
Database Buffers          738197504 bytes
Redo Buffers                7880704 bytes
Database mounted.
SQL> alter database archivelog;

Database altered.

SQL> alter database open;

Database altered.

SQL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /opt/oracle/product/18c/dbhomeXE/dbs/arch
Oldest online log sequence     2
Next log sequence to archive   4
Current log sequence           4

The second requirement is that supplemental logging needs to be enabled:

SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA
*
ERROR at line 1:
ORA-44608: Supplemental logging settings cannot be modified in Oracle Database
Express Edition.


SQL> SELECT supplemental_log_data_min FROM v$database;

SUPPLEME
--------
NO

SQL> 

So, with our setup this is not possible, as Oracle XE 18 does not support supplemental logging. What we can do is load the complete data, but we cannot capture changes. For some migrations that might actually be fine; for others, the downtime for fully loading the data is not acceptable. So let's see how far we get.

Step one is creating the endpoints. One for Oracle as the source:
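
The console wizard is self-explanatory; as an alternative, a roughly equivalent source endpoint can be created from the CLI. This is only a sketch: identifiers, hostname and credentials are made up, and the extra connection attributes in the comment (names as I remember them, to be checked against the DMS documentation) are only needed if you prefer the Binary Reader over LogMiner:

aws dms create-endpoint \
 --endpoint-identifier oracle-source \
 --endpoint-type source \
 --engine-name oracle \
 --server-name <oracle-host> \
 --port 1521 \
 --database-name XE \
 --username system \
 --password <password>
 # to use the Binary Reader instead of LogMiner, add something like:
 # --extra-connection-attributes "useLogMinerReader=N;useBfile=Y"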



… and the other one for the target:


Both endpoints are now ready:

Next we need the replication instance:


The creation of the replication instance will take some time:

Now we need something that connects the endpoints with the replication instance, and that is the goal of the "Database migration tasks":

Without log mining we can only go for “Migrate existing data”:




Nothing really happens, and the reason is this:

So, with our current setup we cannot proceed. What I did instead: I took one of our training EC2 instances which runs Oracle 19c, enabled archive log mode and supplemental logging, and installed the same sample schemas. Now the connection is successful:
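
For reference, the preparation on that 19c instance boils down to the supplemental logging part that failed on XE above (archive log mode was enabled exactly as shown earlier). This is only a sketch; the per-table statement is what the AWS DMS documentation asks for on the replicated tables, as far as I remember:

$ sqlplus / as sysdba <<'SQL'
-- database-level minimal supplemental logging (not possible on XE, as seen above)
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
-- key-level supplemental logging, either database-wide ...
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
-- ... or per replicated table:
-- ALTER TABLE SH.PRODUCTS ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
exit
SQL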

Restart the task:


After a few seconds the load is complete:

That was quite easy; let's see if there really is something in the Aurora instance:

postgres=> select * from "SH"."PRODUCTS";
 PROD_ID |                   PROD_NAME                    |                      PROD_DESC                      |   PROD_SUBCATEGORY   | PROD_SUBCATEGORY_ID | PROD_SUBCATEGORY_DESC |        PROD_CATEGORY        | PROD_CATEGORY_ID |     PROD_CATEGORY_DESC      | PROD_WEIG
HT_CLASS | PROD_UNIT_OF_MEASURE | PROD_PACK_SIZE | SUPPLIER_ID | PROD_STATUS | PROD_LIST_PRICE | PROD_MIN_PRICE | PROD_TOTAL | PROD_TOTAL_ID | PROD_SRC_ID |    PROD_EFF_FROM    | PROD_EFF_TO | PROD_VALID 
---------+------------------------------------------------+-----------------------------------------------------+----------------------+---------------------+-----------------------+-----------------------------+------------------+-----------------------------+----------
---------+----------------------+----------------+-------------+-------------+-----------------+----------------+------------+---------------+-------------+---------------------+-------------+------------
      13 | 5MP Telephoto Digital Camera                   | 5MP Telephoto Digital Camera                        | Cameras              |     2044.0000000000 | Cameras               | Photo                       |   204.0000000000 | Photo                       |          
       1 | U                    | P              |           1 | STATUS      |          899.99 |         899.99 | TOTAL      |  1.0000000000 |             | 1998-01-01 00:00:00 |             | A
      14 | 17" LCD w/built-in HDTV Tuner                  | 17" LCD w/built-in HDTV Tuner                       | Monitors             |     2035.0000000000 | Monitors              | Peripherals and Accessories |   203.0000000000 | Peripherals and Accessories |          
       1 | U                    | P              |           1 | STATUS      |          999.99 |         999.99 | TOTAL      |  1.0000000000 |             | 1998-01-01 00:00:00 |             | A
      15 | Envoy 256MB - 40GB                             | Envoy 256MB - 40Gb                                  | Desktop PCs          |     2021.0000000000 | Desktop PCs           | Hardware                    |   202.0000000000 | Hardware                    |          
       1 | U                    | P              |           1 | STATUS      |          999.99 |         999.99 | TOTAL      |  1.0000000000 |             | 1998-01-01 00:00:00 |             | A
      16 | Y Box                                          | Y Box                                               | Game Consoles        |     2011.0000000000 | Game Consoles         | Electronics                 |   201.0000000000 | Electronics                 |          
       1 | U                    | P              |           1 | STATUS      |          299.99 |         299.99 | TOTAL      |  1.0000000000 |             | 1998-01-01 00:00:00 |             | A
      17 | Mini DV Camcorder with 3.5" Swivel LCD         | Mini DV Camcorder with 3.5" Swivel LCD              | Camcorders           |     2041.0000000000 | Camcorders            | Photo                       |   204.0000000000 | Photo                       |          
       1 | U                    | P              |           1 | STATUS      |         1099.99 |        1099.99 | TOTAL      |  1.0000000000 |             | 1998-01-01 00:00:00 |             | A
      18 | Envoy Ambassador                               | Envoy Ambassador                                    | Portable PCs         |     2022.0000000000 | Portable PCs          | Hardware                    |   202.0000000000 | Hardware                    |          
       1 | U                    | P              |           1 | STATUS      |         1299.99 |        1299.99 | TOTAL      |  1.0000000000 |             | 1998-01-01 00:00:00 |             | A

Looks fine. Once you get used to the DMS interface and know what the different pieces are there for, it is quite easy to get a simple migration going. In the next post we'll see how change capture can be configured so that changes on the source are automatically replicated to the target.

Cet article Migrating an Oracle instance to AWS Aurora – 3 – Data replication with AWS DMS est apparu en premier sur Blog dbi services.

Migrating an Oracle instance to AWS Aurora – 2– The AWS Schema Conversion Tool (SCT)

Wed, 2020-03-18 10:16

Having the required infrastructure ready, it is now time to have a look at the AWS Schema Conversion Tool (SCT). The goal of this tool is to convert a source schema (Oracle in our case) to a target schema (AWS Aurora in our case). From the description this should also migrate stored functions and procedures, and we will see later on how well that works. Automatically converting Oracle's PL/SQL to PostgreSQL's PL/pgSQL quickly becomes tricky, and often there is no way around re-implementing a major part of the business logic in the database. There is EDB Postgres Advanced Server, which comes with Oracle compatibility and greatly reduces migration time, but that is not the topic of this post.

Quickly coming back to the basic infrastructure, this is what we have now:

We’ll use the Windows host to install and configure AWS SCT and then convert the Oracle source schema to the Aurora PostgreSQL target schema. To connect to the Windows host you can right-click the Windows instance to get the required information:


Now that we are connected to the Windows host we can continue with installing AWS SCT. The latest version for Windows can be downloaded here. Having downloaded and extracted it, simply execute it; you will not be asked any questions, and once completed you will find AWS SCT in the start menu:


Simply start it and accept the license:

You will be directly asked what type of migration you want to do, so a new migration project can be created:

The next screen will directly ask you for the connection details to the Oracle instance. Before you do anything here you need to download the Oracle JDBC driver from here:

I’ve put the driver here:

… and now the driver can be selected in the SCT screen:

Before your connection test will succeed you will need to adjust the inbound rules of the security group which is attached to the Oracle instance:

Once the connection test succeeds, SCT will load all the metadata of the source (we are only interested in the "HR" schema for now):

The next step is to connect to the target Aurora instance:

Same procedure here: before you can connect you'll need the PostgreSQL JDBC driver (download from here):

Once ready (and again, you need to adjust the security group to allow inbound connections on port 5432):
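
If you want to double-check the connectivity outside of SCT, a plain psql connection against the cluster endpoint does the job (the endpoint name is a placeholder; the master user comes from the Terraform script of the first post):

psql -h <aurora-cluster-endpoint> -p 5432 -U postgres postgres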

Now we have both sides connected: Oracle on the left and AWS Aurora on the right:

From now on the schema can be migrated from the source to the target:




Completed with a few issues:

Aurora for PostgreSQL comes with little helpers as you can see in the screenshot below:

You can see that on the right side, there is a new schema “aws_oracle_ext” that holds all these objects:

To actually apply the schema to the target database you have to tell SCT to do so:

Again, you will need to check the issues on the target side:

By connecting to the Aurora instance with psql you can actually confirm that the objects have been created:

postgres# \dn
        List of schemas
        Name        |  Owner   
--------------------+----------
 aws_oracle_context | postgres
 aws_oracle_data    | postgres
 aws_oracle_ext     | postgres
 aws_postgis        | postgres
 hr                 | postgres
 public             | postgres
(6 rows)

postgres=> select * from hr.employees;
 employee_id | first_name | last_name | email | phone_number | hire_date | job_id | salary | commission_pct | manager_id | department_id 
-------------+------------+-----------+-------+--------------+-----------+--------+--------+----------------+------------+---------------
(0 rows)
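
The aws_oracle_ext schema is the extension pack where SCT puts its emulations of Oracle built-in packages and functions; a quick way to see what landed in there is a psql meta-command like this (the list is long and depends on the SCT version):

postgres=> \df aws_oracle_ext.*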

Quite helpful is the report you can generate to get an idea of how many objects can be converted without any issue and how many of them need manual action:

Before proceeding further with loading the data using AWS DMS all the issues reported by SCT should be fixed, like this one:

That's it for this post. The AWS Schema Conversion Tool in general does a good job. Even functions and procedures are migrated, but of course not everything can be done automatically, and some of the features/implementations that Oracle provides need to be re-implemented or adjusted on the AWS Aurora side.

In the next post we’ll look at the actual data migration using AWS DMS.

Cet article Migrating an Oracle instance to AWS Aurora – 2– The AWS Schema Conversion Tool (SCT) est apparu en premier sur Blog dbi services.

Can I put my temporary tablespaces on a RAM disk with PostgreSQL?

Wed, 2020-03-18 02:20

The short answer to the title of this blog post is: yes, of course you can. The more important question is: should you? The PostgreSQL documentation about tablespaces contains an explicit warning: "Placing a tablespace on a temporary file system like a RAM disk risks the reliability of the entire cluster". This is certainly true for tablespaces containing persistent data, but what about objects/files that are created in a temporary tablespace, like temporary tables or sorts that spill to disk? Does the warning from the documentation apply to these as well? You can check the last blog post about temporary tablespaces to get an idea of what actually goes into temporary tablespaces in PostgreSQL.
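
If you want to know whether your workload writes enough to the temporary tablespace to make this worthwhile, the cumulative counters in pg_stat_database are a good starting point (and setting log_temp_files to 0 would additionally log every temporary file that gets created):

postgres=# select datname, temp_files, temp_bytes from pg_stat_database order by temp_bytes desc;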

On Linux a RAM disk can be created quite easily:

postgres@centos8pg:/home/postgres/ [pgdev] sudo mkfs  /dev/my_ram_disk 81920
mke2fs 1.44.6 (5-Mar-2019)
Creating regular file /dev/my_ram_disk
Creating filesystem with 81920 1k blocks and 20480 inodes
Filesystem UUID: ecc6d90e-2d59-47f8-a598-7726c309c389
Superblock backups stored on blocks: 
        8193, 24577, 40961, 57345, 73729

Allocating group tables: done                            
Writing inode tables: done                            
Writing superblocks and filesystem accounting information: done 

Of course that needs to be mounted somewhere so that we can put a temporary tablespace on it:

postgres@centos8pg:/home/postgres/ [pgdev] sudo mkdir /my_ram_disk
postgres@centos8pg:/home/postgres/ [pgdev] sudo mount /dev/my_ram_disk /my_ram_disk/
postgres@centos8pg:/home/postgres/ [pgdev] df -h | grep my_ram
/dev/loop0            78M  1.6M   72M   3% /my_ram_disk
postgres@centos8pg:/home/postgres/ [pgdev] sudo chown postgres:postgres /my_ram_disk

I am going to create a new PostgreSQL cluster from scratch:

postgres@centos8pg:/home/postgres/ [pgdev] initdb -D /var/tmp/pg
postgres@centos8pg:/home/postgres/ [pgdev] export PGPORT=8888
postgres@centos8pg:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/pg/ start -l /dev/null

Preparing the temporary tablespace and setting it as the default:

postgres@centos8pg:/home/postgres/ [pgdev] psql -X -p 8888 postgres
psql (13devel)
Type "help" for help.

postgres=# \! mkdir /my_ram_disk/tbs/
postgres=# create tablespace my_temp_tbs location '/my_ram_disk/tbs/';
CREATE TABLESPACE
postgres=# \db+
                                       List of tablespaces
    Name     |  Owner   |     Location     | Access privileges | Options |  Size   | Description 
-------------+----------+------------------+-------------------+---------+---------+-------------
 my_temp_tbs | postgres | /my_ram_disk/tbs |                   |         | 0 bytes | 
 pg_default  | postgres |                  |                   |         | 22 MB   | 
 pg_global   | postgres |                  |                   |         | 559 kB  | 
(3 rows)

postgres=# alter system set temp_tablespaces = 'my_temp_tbs';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

postgres=# show temp_tablespaces ;
 temp_tablespaces 
------------------
 my_temp_tbs
(1 row)

Creating a new temporary table should now create the underlying file on the RAM disk:

postgres=# create temporary table tmp1 ( a int, b date );
CREATE TABLE
postgres=# select pg_relation_filepath('tmp1');
              pg_relation_filepath              
------------------------------------------------
 pg_tblspc/16384/PG_13_202003051/12732/t3_16387
(1 row)
postgres-# \! ls -la /var/tmp/pg/pg_tblspc/
total 4
drwx------.  2 postgres postgres   19 Mar 16 21:32 .
drwx------. 19 postgres postgres 4096 Mar 16 21:32 ..
lrwxrwxrwx.  1 postgres postgres   16 Mar 16 21:32 16384 -> /my_ram_disk/tbs

Everything as expected. What could possibly go wrong with that? Will PostgreSQL start just fine if we put data in that table and then crash the postmaster?

postgres=# insert into tmp1 select * from generate_series(1,10000);
INSERT 0 10000
postgres=# \! ps -ef | grep postgres | grep "var/tmp"
postgres 25900     1  0 21:31 ?        00:00:00 /u01/app/postgres/product/DEV/db_1/bin/postgres -D /var/tmp/pg
postgres 25992 25911  0 21:47 pts/0    00:00:00 sh -c ps -ef | grep postgres | grep "var/tmp"
postgres 25995 25992  0 21:47 pts/0    00:00:00 grep var/tmp
postgres=# \! kill -9 25900
postgres=# select 1;
FATAL:  terminating connection due to unexpected postmaster exit
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?> select 1;
You are currently not connected to a database.
!?> 

Can we start normally from here on?

postgres@centos8pg:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/pg/ start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2020-03-16 21:49:33.034 CET [26010] LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), 64-bit
2020-03-16 21:49:33.035 CET [26010] LOG:  listening on IPv6 address "::1", port 8888
2020-03-16 21:49:33.035 CET [26010] LOG:  listening on IPv4 address "127.0.0.1", port 8888
2020-03-16 21:49:33.037 CET [26010] LOG:  listening on Unix socket "/tmp/.s.PGSQL.8888"
2020-03-16 21:49:33.041 CET [26011] LOG:  database system was interrupted; last known up at 2020-03-16 21:36:18 CET
2020-03-16 21:49:33.263 CET [26011] LOG:  database system was not properly shut down; automatic recovery in progress
2020-03-16 21:49:33.264 CET [26011] LOG:  redo starts at 0/156D588
2020-03-16 21:49:33.264 CET [26011] LOG:  invalid record length at 0/1574240: wanted 24, got 0
2020-03-16 21:49:33.264 CET [26011] LOG:  redo done at 0/1574068
2020-03-16 21:49:33.277 CET [26010] LOG:  database system is ready to accept connections
 done
server started

All is fine. Of course the temporary table is gone but that would also have been the case if we just ended our session instead of killing the postmaster:

postgres@centos8pg:/home/postgres/ [pgdev] psql -X -p 8888 postgres
psql (13devel)
Type "help" for help.

postgres=# \d tmp1
Did not find any relation named "tmp1".
postgres=# \db+
                                        List of tablespaces
    Name     |  Owner   |     Location     | Access privileges | Options |    Size    | Description 
-------------+----------+------------------+-------------------+---------+------------+-------------
 my_temp_tbs | postgres | /my_ram_disk/tbs |                   |         | 1024 bytes | 
 pg_default  | postgres |                  |                   |         | 22 MB      | 
 pg_global   | postgres |                  |                   |         | 559 kB     | 
(3 rows)

postgres=# 

Sort operations will just work fine on that tablespace as well:

postgres=# select * from generate_series(1,1000000) order by random();
 generate_series 
-----------------
          943370
          301661
...

… but you need to be careful with the size of the RAM disk:

postgres=# select * from generate_series(1,5000000) order by random();
2020-03-16 22:04:47.738 CET [26026] ERROR:  could not write block 629 of temporary file: No space left on device
2020-03-16 22:04:47.738 CET [26026] STATEMENT:  select * from generate_series(1,5000000) order by random();
ERROR:  could not write block 629 of temporary file: No space left on device
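
One way to keep a single run-away query from filling the RAM disk is the temp_file_limit parameter, which caps the amount of temporary file space a single process can use (a sketch; size it below the RAM disk size):

postgres=# alter system set temp_file_limit='64MB';
postgres=# select pg_reload_conf();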

So you can indeed put a temporary tablespace on a RAM disk and I am not aware of serious issues; even if you lose the RAM disk, that can easily be fixed:

postgres@centos8pg:/home/postgres/ [pgdev] sudo umount /my_ram_disk 
postgres@centos8pg:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/pg/ restart
waiting for server to shut down....2020-03-16 22:06:23.118 CET [26010] LOG:  received fast shutdown request
2020-03-16 22:06:23.124 CET [26010] LOG:  aborting any active transactions
2020-03-16 22:06:23.133 CET [26010] LOG:  background worker "logical replication launcher" (PID 26017) exited with exit code 1
2020-03-16 22:06:23.133 CET [26012] LOG:  shutting down
2020-03-16 22:06:23.148 CET [26010] LOG:  database system is shut down
 done
server stopped
waiting for server to start....2020-03-16 22:06:23.281 CET [26236] LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), 64-bit
2020-03-16 22:06:23.282 CET [26236] LOG:  listening on IPv6 address "::1", port 8888
2020-03-16 22:06:23.282 CET [26236] LOG:  listening on IPv4 address "127.0.0.1", port 8888
2020-03-16 22:06:23.286 CET [26236] LOG:  listening on Unix socket "/tmp/.s.PGSQL.8888"
2020-03-16 22:06:23.289 CET [26236] LOG:  could not open directory "pg_tblspc/16384/PG_13_202003051": No such file or directory
2020-03-16 22:06:23.293 CET [26237] LOG:  database system was shut down at 2020-03-16 22:06:23 CET
2020-03-16 22:06:23.293 CET [26237] LOG:  could not open directory "pg_tblspc/16384/PG_13_202003051": No such file or directory
2020-03-16 22:06:23.299 CET [26236] LOG:  database system is ready to accept connections
 done
server started

PostgreSQL will complain but it will start, and once the RAM disk is available again it is business as usual:

postgres@centos8pg:/home/postgres/ [pgdev] sudo mount /dev/my_ram_disk /my_ram_disk/
22:07:33 postgres@centos8pg:/home/postgres/ [pgdev] psql -X -p 8888 postgres
psql (13devel)
Type "help" for help.

postgres=# create temporary table tmp1 ( a int );
CREATE TABLE
postgres=# \! ls -la /my_ram_disk/
total 19
drwxr-xr-x.  4 postgres postgres  1024 Mar 16 21:31 .
dr-xr-xr-x. 23 root     root      4096 Mar 16 21:28 ..
drwx------.  2 root     root     12288 Mar 16 21:27 lost+found
drwx------.  3 postgres postgres  1024 Mar 16 21:32 tbs
postgres=# \! ls -la /my_ram_disk/tbs/
total 5
drwx------. 3 postgres postgres 1024 Mar 16 21:32 .
drwxr-xr-x. 4 postgres postgres 1024 Mar 16 21:31 ..
drwx------. 4 postgres postgres 1024 Mar 16 22:02 PG_13_202003051
postgres=# 

If you know of any issues with this or have any comments, I would be happy if you could share your thoughts.

Another option would be to use tmpfs:

postgres@centos8pg:/home/postgres/ [pgdev] sudo mkdir /my_ram_disk_2/
postgres@centos8pg:/home/postgres/ [pgdev] sudo mount -t tmpfs -o size=2G tmpfs /my_ram_disk_2/
postgres@centos8pg:/home/postgres/ [pgdev] df -h | grep disk_2
tmpfs                2.0G     0  2.0G   0% /my_ram_disk_2

This can be used for temporary tablespaces as well:

postgres@centos8pg:/home/postgres/ [pgdev] psql -X -p 8888 postgres
psql (13devel)
Type "help" for help.

postgres=# \! sudo chown postgres:postgres /my_ram_disk_2/
postgres=# \! mkdir /my_ram_disk_2/tbs2
postgres=# create tablespace my_temp_tbs2 location '/my_ram_disk_2/tbs2';
CREATE TABLESPACE
postgres=# \db+
                                          List of tablespaces
     Name     |  Owner   |      Location       | Access privileges | Options |    Size    | Description 
--------------+----------+---------------------+-------------------+---------+------------+-------------
 my_temp_tbs  | postgres | /my_ram_disk/tbs    |                   |         | 2048 bytes | 
 my_temp_tbs2 | postgres | /my_ram_disk_2/tbs2 |                   |         | 0 bytes    | 
 pg_default   | postgres |                     |                   |         | 22 MB      | 
 pg_global    | postgres |                     |                   |         | 559 kB     | 
(4 rows)

postgres=# alter system set temp_tablespaces = 'my_temp_tbs2';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

postgres=# show temp_tablespaces ;
 temp_tablespaces 
------------------
 my_temp_tbs2
(1 row)

Same test as above:

postgres=# create temporary table tmp3 ( a int );
CREATE TABLE
postgres=# insert into tmp3 select * from generate_series(1,10000);
INSERT 0 10000
postgres=# select pg_relation_filepath('tmp3');
              pg_relation_filepath              
------------------------------------------------
 pg_tblspc/24583/PG_13_202003051/12732/t3_24587
(1 row)

postgres=# \! ls -la /var/tmp/pg/pg_tblspc/
total 4
drwx------.  2 postgres postgres   32 Mar 16 22:31 .
drwx------. 19 postgres postgres 4096 Mar 16 22:32 ..
lrwxrwxrwx.  1 postgres postgres   16 Mar 16 21:32 16384 -> /my_ram_disk/tbs
lrwxrwxrwx.  1 postgres postgres   19 Mar 16 22:31 24583 -> /my_ram_disk_2/tbs2
postgres=# \! ps -ef | grep postgres | grep "var/tmp"
postgres 26236     1  0 22:06 ?        00:00:00 /u01/app/postgres/product/DEV/db_1/bin/postgres -D /var/tmp/pg
postgres 26412 26379  0 22:35 pts/0    00:00:00 sh -c ps -ef | grep postgres | grep "var/tmp"
postgres 26415 26412  0 22:35 pts/0    00:00:00 grep var/tmp
postgres=# \! kill -9 26236
postgres=# select 1;
FATAL:  terminating connection due to unexpected postmaster exit
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?> select 1;
You are currently not connected to a database.
!?> 

PostgreSQL will start just fine:

postgres@centos8pg:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/pg/ start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2020-03-16 22:37:44.903 CET [26431] LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), 64-bit
2020-03-16 22:37:44.903 CET [26431] LOG:  listening on IPv6 address "::1", port 8888
2020-03-16 22:37:44.903 CET [26431] LOG:  listening on IPv4 address "127.0.0.1", port 8888
2020-03-16 22:37:44.906 CET [26431] LOG:  listening on Unix socket "/tmp/.s.PGSQL.8888"
2020-03-16 22:37:44.910 CET [26432] LOG:  database system was interrupted; last known up at 2020-03-16 22:31:24 CET
2020-03-16 22:37:45.172 CET [26432] LOG:  database system was not properly shut down; automatic recovery in progress
2020-03-16 22:37:45.173 CET [26432] LOG:  redo starts at 0/15A1EC8
2020-03-16 22:37:45.174 CET [26432] LOG:  invalid record length at 0/15B9E88: wanted 24, got 0
2020-03-16 22:37:45.174 CET [26432] LOG:  redo done at 0/15B9CD0
2020-03-16 22:37:45.195 CET [26431] LOG:  database system is ready to accept connections
 done
server started

To conclude: I am not sure if I would do this in real life, but it seems to work quite well. As long as no persistent objects go into these tablespaces all should be fine. I've asked the mailing list for any experience with this; you can follow that here.

Cet article Can I put my temporary tablespaces on a RAM disk with PostgreSQL? est apparu en premier sur Blog dbi services.

Migrating an Oracle instance to AWS Aurora – 1 – Setting up the base infrastructure

Tue, 2020-03-17 10:24

Migrating databases to the cloud has been a hot topic for a few years. As more and more of our customers are in the cloud, and some of them in AWS, migrating some of their Oracle databases to AWS Aurora becomes a topic from time to time. In this little blog series we'll have a look at how you can use the AWS Database Migration Service (DMS) and the AWS Schema Conversion Tool (SCT) to simplify such a task. In this very first post we'll set up the basic infrastructure we need for the demo.

I'll be using Terraform once again to bring up all the basic stuff. For the details of VPCs, subnets, route tables and security groups please check the AWS documentation; they are not in the scope of this post.

The first thing to do in the Terraform script is to specify the AWS profile and region we want to use:

// set the provider to AWS and the AWS region to eu-central-1
provider "aws" {
  profile    = "test"
  region     = "eu-central-1"
}

After that there are two variables: the first one defines my local IP address, which is used in the security group definitions below so that connections via SSH and RDP are possible from my current location. The second one defines the user data that will be passed to the EC2 instance that will host the Oracle source database. Basically it installs Oracle Database Express Edition (XE) Release 18.4.0.0.0 (18c) and the Oracle sample schemas:

locals {
  my_ip        = ["XXX.XXX.XXX.XXX/32"]
  instance-userdata = <<EOF
#!/bin/bash
sudo yum update -y
sudo yum install -y wget perl
wget https://download.oracle.com/otn-pub/otn_software/db-express/oracle-database-xe-18c-1.0-1.x86_64.rpm -O /home/ec2-user/oracle-database-xe-18c-1.0-1.x86_64.rpm
wget https://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64/getPackage/oracle-database-preinstall-18c-1.0-1.el7.x86_64.rpm -O /home/ec2-user/oracle-database-preinstall-18c-1.0-1.el7.x86_64.rpm
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/compat-libstdc++-33-3.2.3-72.el7.i686.rpm -O /home/ec2-user/compat-libstdc++-33-3.2.3-72.el7.i686.rpm
sudo yum localinstall -y /home/ec2-user/compat-libstdc++-33-3.2.3-72.el7.i686.rpm
sudo yum localinstall -y /home/ec2-user/oracle-database-preinstall-18c-1.0-1.el7.x86_64.rpm
sudo yum localinstall -y /home/ec2-user/oracle-database-xe-18c-1.0-1.x86_64.rpm
(echo "manager"; echo "manager";) | /etc/init.d/oracle-xe-18c configure
sudo echo ORACLE_HOME=/opt/oracle/product/18c/dbhomeXE/ >> /home/oracle/.bash_profile
sudo echo PATH=\$PATH:\$ORACLE_HOME/bin >> /home/oracle/.bash_profile
sudo echo ORACLE_SID=xe >> /home/oracle/.bash_profile
sudo echo export ORACLE_HOME PATH ORACLE_SID >> /home/oracle/.bash_profile
wget https://github.com/oracle/db-sample-schemas/archive/v19.2.tar.gz -O /home/oracle/v19.2.tar.gz
sudo su - oracle -c "tar -axf v19.2.tar.gz"
sudo su - oracle -c "cd db-sample-schemas-19.2; perl -p -i.bak -e 's#__SUB__CWD__#/home/oracle/db-sample-schemas-19.2#g' *.sql */*.sql */*.dat"
sudo su - oracle -c "cd db-sample-schemas-19.2; sqlplus system/manager@localhost/XEPDB1 @mksample manager manager manager manager manager manager manager manager users temp /tmp/ localhost/XEPDB1"
chkconfig --add oracle-xe-18c
EOF
}

The next lines of the Terraform script set up all the network-related stuff, which I am not going to explain here:

// create the virtual private network
resource "aws_vpc" "dwe-vpc" {
  cidr_block = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support = true
  
  tags = {
    Name = "dwe-vpc"
  }
}

// create the internet gateway
resource "aws_internet_gateway" "dwe-igw" {
  vpc_id = "${aws_vpc.dwe-vpc.id}"

  tags = {
    Name = "dwe-igw"
  }
}

// create a dedicated subnet
resource "aws_subnet" "dwe-subnet" {
  vpc_id            = "${aws_vpc.dwe-vpc.id}"
  cidr_block        = "10.0.1.0/24"
  availability_zone = "eu-central-1a"

  tags = {
    Name = "dwe-subnet"
  }
}

// create a second dedicated subnet, this is required for RDS
resource "aws_subnet" "dwe-subnet-2" {
  vpc_id            = "${aws_vpc.dwe-vpc.id}"
  cidr_block        = "10.0.2.0/24"
  availability_zone = "eu-central-1b"

  tags = {
    Name = "dwe-subnet-2"
  }
}


// create routing table which points to the internet gateway
resource "aws_route_table" "dwe-route" {
  vpc_id = "${aws_vpc.dwe-vpc.id}"

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.dwe-igw.id}"
  }

  tags = {
    Name = "dwe-igw"
  }
}

// associate the routing table with the subnet
resource "aws_route_table_association" "subnet-association" {
  subnet_id      = "${aws_subnet.dwe-subnet.id}"
  route_table_id = "${aws_route_table.dwe-route.id}"
}

// create a security group for ssh access to the linux systems
resource "aws_security_group" "dwe-sg-ssh" {
  name        = "dwe-sg-ssh"
  description = "Allow SSH inbound traffic"
  vpc_id      = "${aws_vpc.dwe-vpc.id}"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = local.my_ip
  }

  // allow access to the internet
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "dwe-sg-ssh"
  }
}

// create a security group for rdp access to the windows systems
resource "aws_security_group" "dwe-sg-rdp" {
  name        = "dwe-sg-rdp"
  description = "Allow RDP inbound traffic"
  vpc_id      = "${aws_vpc.dwe-vpc.id}"

  ingress {
    from_port   = 3389
    to_port     = 3389
    protocol    = "tcp"
    cidr_blocks = local.my_ip
  }

  // allow access to the internet
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "dwe-sg-rdp"
  }
}

Once the network is ready we’ll deploy the EC2 instance that will run the Oracle database (Red Hat 7.7 in this case):

// setup a red hat 7 system for the oracle source
resource "aws_instance" "dwe-oracle-source" {
  ami                         = "ami-05798e9b15f285b27"
  instance_type               = "t2.medium"
  key_name                    = "dwe-key-pair"
  vpc_security_group_ids      = ["${aws_security_group.dwe-sg-ssh.id}"]
  subnet_id                   = "${aws_subnet.dwe-subnet.id}"
  associate_public_ip_address = "true"
  user_data                   = "${base64encode(local.instance-userdata)}"

  root_block_device {
    volume_size           = "30"
    volume_type           = "standard"
    delete_on_termination = "true"
  }

  tags = {
    Name = "dwe-oracle-source"
  }
}

As the target for the migration will be an Aurora cluster we will need that as well. These are basically three steps:

// create the subnet group for RDS instance
resource "aws_db_subnet_group" "dwe-rds-subnet-group" {
    name = "dwe-rds-subnet-group"
    subnet_ids = [
        "${aws_subnet.dwe-subnet.id}",
        "${aws_subnet.dwe-subnet-2.id}"
    ]
}

// create the RDS cluster
resource "aws_rds_cluster" "aws_rds_cluster_dwe" {
    backup_retention_period = "7"
    cluster_identifier = "aurora-dwe"
    db_cluster_parameter_group_name = "default.aurora-postgresql10"
    db_subnet_group_name = "${aws_db_subnet_group.dwe-rds-subnet-group.id}"
    deletion_protection = "false"
    engine = "aurora-postgresql"
    engine_mode = "provisioned"
    engine_version = "10.11"
    master_password = "manager123"
    master_username = "postgres"
    port = "5432"
    skip_final_snapshot = true
}

// create the RDS instance
resource "aws_rds_cluster_instance" "aws_db_instance_dwe" {
    auto_minor_version_upgrade = "true"
    publicly_accessible = "false"
    monitoring_interval = "0"
    instance_class = "db.r5.large"
    cluster_identifier = "${aws_rds_cluster.aws_rds_cluster_dwe.id}"    
    identifier = "aurora-1-instance-1"
    db_subnet_group_name = "${aws_db_subnet_group.dwe-rds-subnet-group.id}"
    engine = "aurora-postgresql"
    engine_version = "10.11"
}

For running the AWS Schema Conversion Tool we'll finally set up a Windows instance so that we are able to connect via RDP and install the AWS Schema Conversion Tool in the next post:

// create a windows instance for the AWS SCT
resource "aws_instance" "dwe-oracle-sct" {
  ami           = "ami-0cc2a6842e0da929f"
  instance_type = "t2.micro"
  key_name                    = "dwe-key-pair"
  vpc_security_group_ids      = ["${aws_security_group.dwe-sg-rdp.id}"]
  subnet_id                   = "${aws_subnet.dwe-subnet.id}"
  associate_public_ip_address = "true"

  root_block_device { 
    volume_size           = "30"
    volume_type           = "standard"
    delete_on_termination = "true"
  }

  tags = {
    Name = "dwe-oracle-sct"
  }
}

Put all these steps together in a file and let Terraform do the work for you:

$ terraform init
$ terraform plan
$ terraform apply

The whole Oracle part will take some time to complete, as downloading the Oracle XE rpm and running the installation is nothing you can do in seconds.
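
If you want to follow the progress of the user data script, the cloud-init output on the instance tells you how far it got (the key file name and public IP are placeholders; ec2-user is the default user of this Red Hat AMI):

$ ssh -i dwe-key-pair.pem ec2-user@<public-ip-of-dwe-oracle-source>
$ sudo tail -f /var/log/cloud-init-output.log
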
In the next post we'll look at the AWS Schema Conversion Tool and how it can be used to convert an Oracle schema to an AWS Aurora with PostgreSQL compatibility schema.

Cet article Migrating an Oracle instance to AWS Aurora – 1 – Setting up the base infrastructure est apparu en premier sur Blog dbi services.

AWS Automation – Stop a RDS instance

Mon, 2020-03-16 12:17
Introduction

I created an RDS SQL Server instance on AWS. I use it for test purposes and, to minimize the costs, I needed to ensure that my instance is stopped automatically at the end of the working day, also because stopped RDS instances are automatically started again when their scheduled maintenance is triggered.

In this article I will share the solution I implemented to achieve my goal.
I assume you have already created an RDS instance.

Also note that even though some parameters, like name and description, are optional during the walkthrough, I recommend setting them for better clarity of your environment.

So first log in to your AWS console and let me walk you through it.

1- Preparation of your RDS Instance

Open the RDS service console and select your RDS instance
Adding a tag on the RDS instance is the first step of the process; it will be used later
Click Add and key in a Tag Key and a Tag Value

2- Create a policy

Now open the IAM service, select the Policies section and create a new policy

Select the RDS service by typing RDS in the search textbox and selecting RDS in the result list

Then choose the action by typing StopDBInstance in the search textbox and selecting it in the result list.

Now type DescribeDBInstance in the search textbox and select it in the result list

Check that you have both actions, DescribeDBInstance and StopDBInstance, listed and select All resources in the Resources part. You can also select specific DB instances if you wish.
Then click the Review Policy button

Key in a name for your policy and a description (optional) and click Create Policy
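
For reference, roughly the same policy can be created from the CLI; a sketch (note that the underlying IAM actions are rds:StopDBInstance and rds:DescribeDBInstances):

aws iam create-policy \
  --policy-name secpol_RDS_StopDBInstance \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["rds:StopDBInstance", "rds:DescribeDBInstances"],
      "Resource": "*"
    }]
  }'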

So we are done with the security policy creation and we can proceed with the next step

3- Create a role

Now we have to create a role that will be granted the policy we just created. The role will be used by System Manager to execute the needed task.

Again in the IAM service console, choose the Roles option and create a new role.

Select AWS service in the type of trusted entity and select EC2 in the Use Case list

In the Select your use case section, choose EC2 again and click on the Next: Permissions button

In the search text box type AmazonSSMMaintenanceWindowRole and select it in the list

Repeat the permission selection, this time selecting the security policy you created (in my case secpol_RDS_StopDBInstance), and click Next: Tags and Next: Review on the following screens

Enter a role name and a description (optional), check the policies and click Create Role

On the Roles screen, notice the message that your role has been created and make sure it appears in the list.
You can also notice that the trusted entity for your role is EC2

As we want to use the role in SSM we have to change the Trusted entity accordingly.
To achieve that, click on the role you just created, select the Trust relationships tab and click on Edit trust relationship.

You can see that the service is ec2.amazonaws.com.

Change the service by replacing ec2 with ssm and click on Update Trust Policy
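
The same change can be scripted; a sketch of the resulting trust policy pushed from the CLI (the role name is a placeholder):

aws iam update-assume-role-policy \
  --role-name <your-role-name> \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "Service": "ssm.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }]
  }'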

We are done concerning the IAM service

4- Create a resource group

Now switch to the System Manager service
Click on the menu Services, type System Manager in the search textbox and select System Manager

Select Resource Groups in the left pane and click on the Create resource group button.

Choose the Tag based option, search for DBInstance and select it as resource type, then search for the tag set on your RDS instance (in my case tag_RDSInstance) and its value (in my case RDS_awsvdata). Then enter a name and a description and click on Create group.

5- Create maintenance windows

Switch back to the System Manager service home screen, select Maintenance Windows in the left pane and click on the Create maintenance window button.

Enter a name and a description

Define the schedule on which your maintenance window must run. Do not forget to specify your time zone, to avoid your maintenance window executing at an unexpected time. Click on the Create maintenance window button.

Check that your maintenance window has been created and appears in the list. Check as well that the next execution time matches the expected schedule.

Select the maintenance window you just created, select the Targets tab and click on the Register target button

In the Register target screen, define a name and a description, and define the target by selecting the Choose a resource group option and selecting the resource group created in point 4. Also select your resource types if you have resources other than DBInstance in your resource group. Click the Register target button.

The next step is to register an Automation task. Select your maintenance window again, choose the Tasks tab and click on Register task, selecting Register Automation task.

Set a name and a description.

Select the Automation document, scrolling through the pages to find and select AWS-StopRDSInstance

Define the target by selecting the registered target group option and choosing the one you created previously. Set the rate control according to your needs and constraints.

It is important to define the IAM service role by selecting the role you created in point 3.

In the input parameters, enter the ID of your RDS database instance.
Finally click the Register Automation task button.
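
If you want to verify the automation outside of the maintenance window, the same document can be triggered manually from the CLI; this is only a sketch, and the exact document name and the InstanceId parameter name should be checked in the console:

aws ssm start-automation-execution \
  --document-name "AWS-StopRdsInstance" \
  --parameters "InstanceId=<your-db-instance-identifier>"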

6- Test your Maintenance Window

If you can afford to stop your instance at any time, adjust your cron schedule to a time in the near future and check the history of your maintenance window.
After the scheduled time, check the maintenance window history

Conclusion

There are indeed some tricks to go through and to know in order to set this up, but it is worth doing to avoid cost surprises.
I hope you enjoyed reading.

Cet article AWS Automation – Stop a RDS instance est apparu en premier sur Blog dbi services.

DynamoDB: adding a Local covering index to reduce the cost

Mon, 2020-03-16 10:28
By Franck Pachot

.
This is a continuation of the previous post on DynamoDB: adding a Global Covering Index to reduce the cost. I have a DynamoDB table partitioned on "MyKeyPart","MyKeySort" and I have many queries that retrieve a small "MyInfo001" attribute, and less frequent ones needing the large "MyData001" attribute. I have created a Global Secondary Index (GSI) that covers the same key and this small attribute. Now, because the index is prefixed by the partition key, I can create a Local Secondary Index (LSI) to do the same. But there are many limitations. The first one is that I cannot add a local index afterwards: I need to define it at table creation.

Drop table

Here I am in a lab, so I can simply drop and re-create the table. In real life you may have to create a new one, copy the items, synchronize it (DynamoDB Streams),…


aws dynamodb delete-table --table-name Demo
while aws --output json dynamodb describe-table --table-name Demo | grep '"TableStatus": "DELETING"' ; do sleep 1 ; done 2>/dev/null | awk '{printf "."}END{print}'

-------------------------------------------------------------------------------
|                                 DeleteTable                                 |
+-----------------------------------------------------------------------------+
||                             TableDescription                              ||
|+----------------+----------------------------------------------------------+|
||  ItemCount     |  0                                                       ||
||  TableArn      |  arn:aws:dynamodb:eu-central-1:802756008554:table/Demo   ||
||  TableId       |  d8238050-556c-4158-85e8-bf90d483d9e2                    ||
||  TableName     |  Demo                                                    ||
||  TableSizeBytes|  0                                                       ||
||  TableStatus   |  DELETING                                                ||
|+----------------+----------------------------------------------------------+|
|||                          ProvisionedThroughput                          |||
||+-----------------------------------------------------------+-------------+||
|||  NumberOfDecreasesToday                                   |  0          |||
|||  ReadCapacityUnits                                        |  25         |||
|||  WriteCapacityUnits                                       |  25         |||
||+-----------------------------------------------------------+-------------+||
..        "TableStatus": "DELETING",
Re-create table

I use the same create-table command as before, adding the definition of a "DemoLSI" local index that includes "MyInfo001":


aws dynamodb create-table \
 --attribute-definitions \
  AttributeName=MyKeyPart,AttributeType=N \
  AttributeName=MyKeySort,AttributeType=N \
 --key-schema \
  AttributeName=MyKeyPart,KeyType=HASH \
  AttributeName=MyKeySort,KeyType=RANGE \
 --billing-mode PROVISIONED \
 --provisioned-throughput ReadCapacityUnits=25,WriteCapacityUnits=25 \
 --local-secondary-indexes ' [{
  "IndexName": "DemoLSI",
  "KeySchema":[
   {"AttributeName":"MyKeyPart","KeyType":"HASH"},
   {"AttributeName":"MyKeySort","KeyType":"RANGE"}
  ],
  "Projection":{"ProjectionType":"INCLUDE","NonKeyAttributes":["MyInfo001"]}
  }] ' \
 --table-name Demo

while aws --output json dynamodb describe-table --table-name Demo | grep '"TableStatus": "CREATING"' ; do sleep 1 ; done 2>/dev/null | awk '{printf "."}END{print}'

Here is the output:


-----------------------------------------------------------------------------------------------
|                                         CreateTable                                         |
+---------------------------------------------------------------------------------------------+
||                                     TableDescription                                      ||
|+----------------------+--------------------------------------------------------------------+|
||  CreationDateTime    |  1584347785.29                                                     ||
||  ItemCount           |  0                                                                 ||
||  TableArn            |  arn:aws:dynamodb:eu-central-1:802756008554:table/Demo             ||
||  TableId             |  ebc9488a-fc5f-4022-b947-72b733784a6a                              ||
||  TableName           |  Demo                                                              ||
||  TableSizeBytes      |  0                                                                 ||
||  TableStatus         |  CREATING                                                          ||
|+----------------------+--------------------------------------------------------------------+|
|||                                  AttributeDefinitions                                   |||
||+-------------------------------------------+---------------------------------------------+||
|||               AttributeName               |                AttributeType                |||
||+-------------------------------------------+---------------------------------------------+||
|||  MyKeyPart                                |  N                                          |||
|||  MyKeySort                                |  N                                          |||
||+-------------------------------------------+---------------------------------------------+||
|||                                        KeySchema                                        |||
||+-----------------------------------------------------+-----------------------------------+||
|||                    AttributeName                    |              KeyType              |||
||+-----------------------------------------------------+-----------------------------------+||
|||  MyKeyPart                                          |  HASH                             |||
|||  MyKeySort                                          |  RANGE                            |||
||+-----------------------------------------------------+-----------------------------------+||
|||                                  LocalSecondaryIndexes                                  |||
||+----------------+------------------------------------------------------------------------+||
|||  IndexArn      |  arn:aws:dynamodb:eu-central-1:802756008554:table/Demo/index/DemoLSI   |||
|||  IndexName     |  DemoLSI                                                               |||
|||  IndexSizeBytes|  0                                                                     |||
|||  ItemCount     |  0                                                                     |||
||+----------------+------------------------------------------------------------------------+||
||||                                       KeySchema                                       ||||
|||+----------------------------------------------------+----------------------------------+|||
||||                    AttributeName                   |             KeyType              ||||
|||+----------------------------------------------------+----------------------------------+|||
||||  MyKeyPart                                         |  HASH                            ||||
||||  MyKeySort                                         |  RANGE                           ||||
|||+----------------------------------------------------+----------------------------------+|||
||||                                      Projection                                       ||||
|||+-----------------------------------------------------+---------------------------------+|||
||||  ProjectionType                                     |  INCLUDE                        ||||
|||+-----------------------------------------------------+---------------------------------+|||
|||||                                  NonKeyAttributes                                   |||||
||||+-------------------------------------------------------------------------------------+||||
|||||  MyInfo001                                                                          |||||
||||+-------------------------------------------------------------------------------------+||||
|||                                  ProvisionedThroughput                                  |||
||+------------------------------------------------------------------------+----------------+||
|||  NumberOfDecreasesToday                                                |  0             |||
|||  ReadCapacityUnits                                                     |  25            |||
|||  WriteCapacityUnits                                                    |  25            |||
||+------------------------------------------------------------------------+----------------+||
.....        "TableStatus": "CREATING",

What is very different from the global index is that here I didn’t specify any read and write capacity for it: the RCU and WCU are counted within the ones provisioned for the table. That gives more agility in my case as I don’t have to think about the ratio between my two use cases (read from the index only, or read the whole item from the table).

Put items

I’m using the same script as in the previous post to put 8 items here. What is different is the output, where I can see one additional WCU for the index entry:


----------------------------------
|             PutItem            |
+--------------------------------+
||       ConsumedCapacity       ||
|+----------------+-------------+|
||  CapacityUnits |  TableName  ||
|+----------------+-------------+|
||  246.0         |  Demo       ||
|+----------------+-------------+|
|||    LocalSecondaryIndexes   |||
||+----------------------------+||
||||          DemoLSI         ||||
|||+------------------+-------+|||
||||  CapacityUnits   |  1.0  ||||
|||+------------------+-------+|||
|||            Table           |||
||+------------------+---------+||
|||  CapacityUnits   |  245.0  |||
||+------------------+---------+||
||     ItemCollectionMetrics    ||
|+------------------------------+|
|||      ItemCollectionKey     |||
||+----------------------------+||
||||         MyKeyPart        ||||
|||+------------+-------------+|||
||||  N         |  2          ||||
|||+------------+-------------+|||
|||     SizeEstimateRangeGB    |||
||+----------------------------+||
|||  0.0                       |||
|||  1.0                       |||
||+----------------------------+||
Query the local index

Again, as with the global one, the local index will be used only when explicitly mentioned. Reading only the small attributes, while mentioning only the table, still reads the whole item. Again, I read 4 items (all from the same partition key):


aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyInfo001" \
 --no-consistent-read \
 --table-name Demo
...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 122.5,
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 122.5
        }
    }
}

Reading 4 items of 245KB each costs 122.5 RCU.
Here is the same query mentioning the local index:
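
It is simply the previous query with --index-name DemoLSI added (sketched here for completeness):


aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyInfo001" \
 --no-consistent-read \
 --index-name DemoLSI \
 --table-name Demo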

There, no RCU from the table and 0.5 from the index. This is exactly the same as what I had with the global index, and this is a similar way to optimize the queries that read only a small part of the item.

Consistent reads

One advantage of local indexes is that they support consistent reads (--consistent-read):


aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeyPart","MyKeySort","MyInfo001" \
 --consistent-read \
 --index-name DemoLSI \
 --table-name Demo
...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 1.0,
        "TableName": "Demo",
        "LocalSecondaryIndexes": {
            "DemoLSI": {
                "CapacityUnits": 1.0
            }
        },
        "Table": {
            "CapacityUnits": 0.0
        }
    }
}

The cost doubles for consistent reads: 1 RCU for 4 items (smaller than 4KB) instead of 0.5 with eventual consistency.
Another advantage of local indexes is that they transparently get the full item when the projection asks for an additional attribute that is not in the index:


aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeyPart","MyKeySort","MyInfo001","MyData001" \
 --no-consistent-read \
 --index-name DemoLSI \
 --table-name Demo
...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 124.5,
        "TableName": "Demo",
        "LocalSecondaryIndexes": {
            "DemoLSI": {
                "CapacityUnits": 0.5
            }
        },
        "Table": {
            "CapacityUnits": 124.0
        }
    }
}


There, I still have 0.5 RCU from the index and now 124 RCU from the table. So it is transparent, but a bit more expensive than querying the table directly in this case (which was 122.5 RCU).
Again, I am in a case where I read all items for one partition key, so the index is not useful for filtering the items. This is a special case for this demo on covering indexes.

“Item Collection Size Limit”

In the previous blog post, as in the current one, I always mentioned --return-item-collection-metrics with the put-item. This returned nothing when the table had no local index, but you can see above that, with local indexes, each put-item returned:


||+------------------+---------+||
||     ItemCollectionMetrics    ||
|+------------------------------+|
|||      ItemCollectionKey     |||
||+----------------------------+||
||||         MyKeyPart        ||||
|||+------------+-------------+|||
||||  N         |  2          ||||
|||+------------+-------------+|||
|||     SizeEstimateRangeGB    |||
||+----------------------------+||
|||  0.0                       |||
|||  1.0                       |||
||+----------------------------+||

This is the size of the collection of items for MyKeyPart=2, and there is a hard limit of 10GB per item collection. This metric is returned only when there are local indexes on the table. So be careful to keep the size of each item collection under control.
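
A minimal sketch of how that could be watched, by checking the upper size estimate returned by each put-item (this assumes jq is available; the 8GB alert threshold is an arbitrary example):

aws --output json dynamodb put-item \
 --return-item-collection-metrics SIZE \
 --item file://item.json \
 --table-name Demo |
 jq -r '.ItemCollectionMetrics.SizeEstimateRangeGB[1]' |
 awk '$1 > 8 { print "WARNING: item collection approaching the 10GB limit (upper estimate: " $1 " GB)" }'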

Because we chose DynamoDB for its scalability (the storage can grow) and availability (the structure can evolve without downtime), the limitations of Local Secondary Indexes (the collection size limit and the need to re-create the table to add them) make them not very appealing, and Global Secondary Indexes may be preferred even when both are possible.

This article DynamoDB: adding a Local covering index to reduce the cost appeared first on Blog dbi services.

DynamoDB: adding a Global covering index to reduce the cost

Mon, 2020-03-16 09:57
By Franck Pachot

People often think of indexes as a way to optimize row filtering (“get item” faster and cheaper). But indexes are also about columns (“attribute projection”) like some kind of vertical partitioning. In relational (“SQL”) databases we often add more columns to the indexed key. This is called “covering” or “including” indexes, to avoid reading the whole row. The same is true in NoSQL. I’ll show in this post how, even when an index is not required to filter the items, because the primary key partitioning is sufficient, we may have to create a secondary index to reduce the cost of partial access to the item. Here is an example with AWS DynamoDB where the cost depends on I/O throughput.

Create table

I create a Demo table with a generic name for the key columns: MyKeyPart is the partition key and MyKeySort is the sort key. This is a (HASH,RANGE) partitioning where rows (“items”) with same MyKeyPart and close MyKeySort are clustered together. Different MyKeyPart are scattered across the storage.
Those keys are numbers (AttributeType=N) and I have defined the provisioned read and write IOPS throughput (in RCU/WCU units) at their maximum for the free tier (25).


aws dynamodb create-table \
 --attribute-definitions \
  AttributeName=MyKeyPart,AttributeType=N \
  AttributeName=MyKeySort,AttributeType=N \
 --key-schema \
  AttributeName=MyKeyPart,KeyType=HASH \
  AttributeName=MyKeySort,KeyType=RANGE \
 --billing-mode PROVISIONED \
 --provisioned-throughput ReadCapacityUnits=25,WriteCapacityUnits=25 \
 --table-name Demo

---------------------------------------------------------------------------------
|                                  CreateTable                                  |
+-------------------------------------------------------------------------------+
||                              TableDescription                               ||
|+-------------------+---------------------------------------------------------+|
||  CreationDateTime |  1584271061.57                                          ||
||  ItemCount        |  0                                                      ||
||  TableArn         |  arn:aws:dynamodb:eu-central-1:802756008554:table/Demo  ||
||  TableId          |  557cd9b8-a739-4c4a-9aea-cdae4d8cc6c2                   ||
||  TableName        |  Demo                                                   ||
||  TableSizeBytes   |  0                                                      ||
||  TableStatus      |  CREATING                                               ||
|+-------------------+---------------------------------------------------------+|
|||                           AttributeDefinitions                            |||
||+------------------------------------+--------------------------------------+||
|||            AttributeName           |            AttributeType             |||
||+------------------------------------+--------------------------------------+||
|||  MyKeyPart                         |  N                                   |||
|||  MyKeySort                         |  N                                   |||
||+------------------------------------+--------------------------------------+||
|||                                 KeySchema                                 |||
||+---------------------------------------------+-----------------------------+||
|||                AttributeName                |           KeyType           |||
||+---------------------------------------------+-----------------------------+||
|||  MyKeyPart                                  |  HASH                       |||
|||  MyKeySort                                  |  RANGE                      |||
||+---------------------------------------------+-----------------------------+||
|||                           ProvisionedThroughput                           |||
||+-------------------------------------------------------------+-------------+||
|||  NumberOfDecreasesToday                                     |  0          |||
|||  ReadCapacityUnits                                          |  25         |||
|||  WriteCapacityUnits                                         |  25         |||
||+-------------------------------------------------------------+-------------+||

Here is how I wait for the creation to be completed because I like asynchronous operations but my mind is synchronous:


while aws --output json dynamodb describe-table --table-name Demo | grep '"TableStatus": "CREATING"' ; do sleep 1 ; done 2>/dev/null | awk '{printf "."}END{print}'

...        "TableStatus": "CREATING",
Put items

I’ll insert (“put”) 8 rows (“items”) which contain one small column (“attribute”): MyInfo001. And a large one (250000 random characters of text, which is about 245KB): MyData001. I build an “item.json” file to load them through “aws dynamodb put-item --item file://item.json”.

I have 2 different MyKeyPart with 4 different MyKeySort each. As many examples for DynamoDB are taken from retail (because it was built by Amazon for their e-commerce business) you can think of it as customer transactions. You want to distribute customers to many nodes for scalability and you want to get the history of each customer grouped and ordered.

Here is my script:


for CusID in {1..2} ; do
for Xdate in {1..4} ; do
cat > item.json <<CAT
{
 "MyKeyPart":{"N":"$CusID"},
 "MyKeySort":{"N":"$Xdate"},
 "MyInfo001":{"S":"$SECONDS"},
 "MyData001":{"S":"$(tr -cd "[:alnum:]"</dev/urandom|head -c250000)"}
}
CAT

du -h item.json
aws dynamodb put-item \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --return-item-collection-metrics SIZE \
 --item file://item.json\
 --table-name Demo
done
done

Each put-item execution returns:


----------------------------------
|             PutItem            |
+--------------------------------+
||       ConsumedCapacity       ||
|+----------------+-------------+|
||  CapacityUnits |  TableName  ||
|+----------------+-------------+|
||  245.0         |  Demo       ||
|+----------------+-------------+|
|||            Table           |||
||+------------------+---------+||
|||  CapacityUnits   |  245.0  |||
||+------------------+---------+||

Look at the consumed capacity for the table: 245 WCU. I “inserted” eight 245KB items. I’m in the default “standard” (aka non-transactional, aka non-ACID) write mode where 1KB is 1 WCU, and that is the “cost” of each of those writes: 245 WCU. This cost becomes money because you pay for the maximum throughput capacity. Here I didn’t notice it because I inserted only 8 items over a few seconds. But my reserved capacity is 25 per second, which means that if I keep inserting at this rate I’ll wait on retries and get ProvisionedThroughputExceededException once the retries are exhausted. So either I pay for more provisioned WCU (by reserving them or through auto-scaling), or I choose on-demand, or I accept some write throttling affecting the response time, which also has a cost for the business.
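
Here is a minimal sketch of how such throttling could be absorbed on the client side, on top of the retries the CLI already does internally (a simple exponential backoff around put-item; not the script used above):

for attempt in 1 2 3 4 5 ; do
 aws dynamodb put-item --item file://item.json --table-name Demo && break
 echo "throttled or failed, retrying in $((2**attempt)) seconds..."
 sleep $((2**attempt))
done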

So, basically, if you read and write fewer and smaller items, you save money. Like in any IT system, but here the performance metric is easily converted into dollars. There’s not a lot I can do here, as I have to insert those items, but let’s look at the read scenario now.

Query without index

I have two read use-cases in my scenario. One is reading the customer transaction history with all the information, like the MyInfo001 attribute here. The second one is reading the detail of one transaction, like the large MyData001 attribute here. For both of them, I don’t need any index to access the specific items because I am partitioned on the filtering attributes. But, in NoSQL as in relational databases, indexes are required not only to filter rows (row selection) but also to filter attributes (column projection). And this is what I’ll show here.

I’m reading all attributes here (--select ALL_ATTRIBUTES) for one customer (--key-condition-expression “MyKeyPart = :k”) and evaluating the cost (--return-consumed-capacity):

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select ALL_ATTRIBUTES \
 --table-name Demo | cut -c 1-80
...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 122.5,
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 122.5
        }
    }

The read cost in capacity units is 122.5 RCU. Why a fraction? Because reads are cheaper than writes: reading 8KB costs only 1 RCU where writing costs 1 WCU per kilobyte. These are all “eventually consistent” reads here; it would be double (1 RCU per 4KB) with “strongly consistent” reads (because it needs to read 2 copies to get a quorum on the 3 copies), and it can even double again (1 RCU per 2KB) for “transactional reads”. So, in this case, I’ve consumed 122.5 RCU to read the 4 items.
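
To double-check that arithmetic, here is a small sketch of the calculation, assuming the usual rounding (a Query sums the size of all returned items and rounds up to the next 4KB before applying the consistency factor):

awk 'BEGIN {
 items=4 ; item_kb=245 ; factor=0.5   # factor: 0.5 eventual, 1 strong, 2 transactional
 total_kb=items*item_kb               # a Query sums the size of all returned items
 units=int((total_kb+3)/4)            # round up to the next 4KB boundary
 printf "%.1f RCU\n", units*factor    # prints 122.5 RCU for this example
}'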

Now if I do not query for the large MyData001 attribute, including only the MyInfo001 in the projection:

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeySort","MyInfo001" \
 --table-name Demo

...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 122.5,
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 122.5
        }
    }
}

The cost is still the same, because even though the large MyData001 was not in my query projection, the full item had to be read before discarding the attribute.

I mentioned earlier that if you want strong consistency (--consistent-read) it is more expensive:

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeySort","MyInfo001" \
 --consistent-read \
 --table-name Demo

...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 245.0,
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 245.0
        }
    }
}

The read capacity has doubled to 245 RCU for strong consistency: 1 RCU per 4KB, and I had to read 4 items of 245KB each.

Global Secondary Index

I’ll show that creating an index can help with this. Technically, as I access by the partition key (MyKeyPart), I could create a Local Secondary Index. But this comes with some restrictions, like the need to re-create the table, and I’ll show it later. Creating a global index is much easier and can be a solution even when it is prefixed by the partition key.


aws dynamodb update-table \
 --attribute-definitions \
    AttributeName=MyKeyPart,AttributeType=N \
    AttributeName=MyKeySort,AttributeType=N \
 --global-secondary-index-updates ' [{"Create":{
  "IndexName": "DemoGSI",
  "KeySchema":[
   {"AttributeName":"MyKeyPart","KeyType":"HASH"},
   {"AttributeName":"MyKeySort","KeyType":"RANGE"}
  ],
  "ProvisionedThroughput":{"ReadCapacityUnits":10,"WriteCapacityUnits":5},
  "Projection":{"ProjectionType":"INCLUDE","NonKeyAttributes":["MyInfo001"]}
  }}] ' \
 --table-name Demo

-----------------------------------------------------------------------------------------------
|                                         UpdateTable                                         |
+---------------------------------------------------------------------------------------------+
||                                     TableDescription                                      ||
|+----------------------+--------------------------------------------------------------------+|
||  CreationDateTime    |  1584277665.92                                                     ||
||  ItemCount           |  8                                                                 ||
||  TableArn            |  arn:aws:dynamodb:eu-central-1:802756008554:table/Demo             ||
||  TableId             |  d8238050-556c-4158-85e8-bf90d483d9e2                              ||
||  TableName           |  Demo                                                              ||
||  TableSizeBytes      |  2000331                                                           ||
||  TableStatus         |  UPDATING                                                          ||
|+----------------------+--------------------------------------------------------------------+|
|||                                  AttributeDefinitions                                   |||
||+-------------------------------------------+---------------------------------------------+||
|||               AttributeName               |                AttributeType                |||
||+-------------------------------------------+---------------------------------------------+||
|||  MyKeyPart                                |  N                                          |||
|||  MyKeySort                                |  N                                          |||
||+-------------------------------------------+---------------------------------------------+||
|||                                 GlobalSecondaryIndexes                                  |||
||+----------------+------------------------------------------------------------------------+||
|||  Backfilling   |  False                                                                 |||
|||  IndexArn      |  arn:aws:dynamodb:eu-central-1:802756008554:table/Demo/index/DemoGSI   |||
|||  IndexName     |  DemoGSI                                                               |||
|||  IndexSizeBytes|  0                                                                     |||
|||  IndexStatus   |  CREATING                                                              |||
|||  ItemCount     |  0                                                                     |||
||+----------------+------------------------------------------------------------------------+||
||||                                       KeySchema                                       ||||
|||+----------------------------------------------------+----------------------------------+|||
||||                    AttributeName                   |             KeyType              ||||
|||+----------------------------------------------------+----------------------------------+|||
||||  MyKeyPart                                         |  HASH                            ||||
||||  MyKeySort                                         |  RANGE                           ||||
|||+----------------------------------------------------+----------------------------------+|||
||||                                      Projection                                       ||||
|||+-----------------------------------------------------+---------------------------------+|||
||||  ProjectionType                                     |  INCLUDE                        ||||
|||+-----------------------------------------------------+---------------------------------+|||
|||||                                  NonKeyAttributes                                   |||||
||||+-------------------------------------------------------------------------------------+||||
|||||  MyInfo001                                                                          |||||
||||+-------------------------------------------------------------------------------------+||||
||||                                 ProvisionedThroughput                                 ||||
|||+----------------------------------------------------------------------+----------------+|||
||||  NumberOfDecreasesToday                                              |  0             ||||
||||  ReadCapacityUnits                                                   |  10            ||||
||||  WriteCapacityUnits                                                  |  5             ||||
|||+----------------------------------------------------------------------+----------------+|||
|||                                        KeySchema                                        |||
||+-----------------------------------------------------+-----------------------------------+||
|||                    AttributeName                    |              KeyType              |||
||+-----------------------------------------------------+-----------------------------------+||
|||  MyKeyPart                                          |  HASH                             |||
|||  MyKeySort                                          |  RANGE                            |||
||+-----------------------------------------------------+-----------------------------------+||
|||                                  ProvisionedThroughput                                  |||
||+------------------------------------------------------------------------+----------------+||
|||  NumberOfDecreasesToday                                                |  0             |||
|||  ReadCapacityUnits                                                     |  25            |||
|||  WriteCapacityUnits                                                    |  25            |||
||+------------------------------------------------------------------------+----------------+||

Building the index takes some time with my free tier capacity units limitation:


while aws --output json dynamodb describe-table --table-name Demo | grep '"TableStatus": "UPDATING"' ; do sleep 1 ; done 2>/dev/null | awk '{printf "."}END{print}'
while aws --output json dynamodb describe-table --table-name Demo | grep '"IndexStatus": "CREATING"' ; do sleep 1 ; done 2>/dev/null | awk '{printf "."}END{print}'

....        "TableStatus": "UPDATING",
.................................................................................................................................                "IndexStatus": "CREATING",

This index has the same HASH/RANGE keys as the table, but it will be smaller because the only additional attribute is MyInfo001 ( "Projection":{"ProjectionType":"INCLUDE","NonKeyAttributes":["MyInfo001"]} ).

Note that I mentioned a ProvisionedThroughput in the index creation: this is mandatory. Global indexes have their own provisioning, and that means you need to think about the frequency of the queries which will use them, as the goal is to reduce the table’s provisioning in order to reduce the cost.
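
If the read pattern changes later, the provisioning of the index can be adjusted independently of the table, with something like this (the values here are just an example):

aws dynamodb update-table \
 --global-secondary-index-updates ' [{"Update":{
  "IndexName": "DemoGSI",
  "ProvisionedThroughput":{"ReadCapacityUnits":20,"WriteCapacityUnits":5}
  }}] ' \
 --table-name Demo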

We are not in the SQL / relational nirvana where you create an index and all queries that can benefit from it will transparently use it:

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyInfo001" \
 --no-consistent-read \
 --table-name Demo
...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 122.5,
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 122.5
        }
    }
}

Same cost as without the index: 4 items read from the table, that’s 4x245KB, and each 8KB takes 1 RCU with eventually consistent reads (--no-consistent-read), so that’s 122.5 RCU.

There’s no optimizer or query planner here: we need to explicitly mention the index (--index-name DemoGSI):

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyInfo001" \
 --no-consistent-read \
 --index-name DemoGSI \
 --table-name Demo
...
   ],
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 0.5,
        "GlobalSecondaryIndexes": {
            "DemoGSI": {
                "CapacityUnits": 0.5
            }
        },
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 0.0
        }
    }
}

No RCU consumed from the table and only 0.5 RCU consumed from the index: 4 entries that together are smaller than 4KB, which is only 0.5 RCU. That is the huge benefit of this index, which does not contain the large MyData001 attribute.

The key columns were not displayed, so I can add them to the projection (--projection-expression="MyKeyPart","MyKeySort","MyInfo001"):

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeyPart","MyKeySort","MyInfo001" \
 --no-consistent-read \
 --index-name DemoGSI \
 --table-name Demo
...
   "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 0.5,
        "GlobalSecondaryIndexes": {
            "DemoGSI": {
                "CapacityUnits": 0.5
            }
        },
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 0.0
        }
    }
}

This is the same cost because everything is in the index.

One drawback of Global Secondary Indexes is that they support eventual consistency only. Trying a --consistent-read:

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeyPart","MyKeySort","MyInfo001" \
 --consistent-read \
 --index-name DemoGSI \
 --table-name Demo

An error occurred (ValidationException) when calling the Query operation: Consistent reads are not supported on global secondary indexes

This raises an error. Global Secondary Indexes are like another table and DynamoDB does not ensure multi-table consistency.

I mentioned that the index is like another table, and that I have to name it explicitly to read from it (--index-name). But there is worse (when compared with relational SQL databases). An index is supposed to be a pointer to a table row, but that is not the case here. If I want to get more attributes from the table, I have to do it in another query. I cannot just add those attributes to the projection (--projection-expression) if they are not present in the index:


aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeyPart","MyKeySort","MyInfo001","MyData001" \
 --no-consistent-read \
 --index-name DemoGSI \
 --table-name Demo

An error occurred (ValidationException) when calling the Query operation: One or more parameter values were invalid: Global secondary index DemoGSI does not project [MyData001]
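
The workaround is a second call, this time to the table itself, with the full primary key coming from the index entry. Here is a sketch for one item of this demo:

aws --output json dynamodb get-item \
 --key '{"MyKeyPart":{"N":"1"},"MyKeySort":{"N":"1"}}' \
 --projection-expression "MyData001" \
 --return-consumed-capacity TABLE \
 --table-name Demo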

 
Here, because the index is prefixed by the partition key, I can also choose a local index. This may have some advantages (like allowing strong consistency) but also many drawbacks which make them not so friendly. I’ll continue on that in the next post on DynamoDB: adding a Local covering index to reduce the cost.

This article DynamoDB: adding a Global covering index to reduce the cost appeared first on Blog dbi services.

About temp_tablespaces in PostgreSQL

Sat, 2020-03-14 04:29

There often is a discussion whether tablespaces should be used in PostgreSQL. People used to working with Oracle are often a bit surprised by this discussion, as there is no way around using tablespaces in that technology. PostgreSQL does not force you to use tablespaces at all and I am pretty sure that there are plenty of PostgreSQL installations out there that do not use them, never even considered using them, and that is absolutely fine. In our PostgreSQL basics workshop we cover tablespaces but recommend not to use them if there are no good reasons. Implementing tablespaces in PostgreSQL just because you are forced to use them in other databases is not the way to go. In this post I’d like to talk about one case where tablespaces can make sense, more precisely: temp_tablespaces.

As the name implies, temporary tablespaces are there for temporary objects. The question then is: what exactly is considered a temporary object? Well, the obvious case is temporary tables. Without any temporary tablespaces, when you create a temporary table, the temporary files get created in the default tablespace of the current database. In a standard PostgreSQL setup it looks like this:

postgres=# \l+
                                                                    List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description                 
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 872 MB  | pg_default | default administrative connection database
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 7393 kB | pg_default | unmodifiable empty database
           |          |          |             |             | postgres=CTc/postgres |         |            | 
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 7393 kB | pg_default | default template for new databases
           |          |          |             |             | postgres=CTc/postgres |         |            | 

All the databases have a default tablespace called “pg_default” which is a kind of pseudo tablespace as it does not really exist. Asking the catalog about the location of that tablespace will show an empty location:

postgres=# select spcname AS "Name"
postgres-#      , pg_catalog.pg_get_userbyid(spcowner) AS "Owner"
postgres-#      , pg_catalog.pg_tablespace_location(oid) AS "Location"
postgres-#   from pg_catalog.pg_tablespace
postgres-#  where pg_catalog.pg_tablespace.spcname = 'pg_default'
postgres-#  order by 1;
    Name    |  Owner   | Location 
------------+----------+----------
 pg_default | postgres | 
(1 row)

If we create temporary objects, where will the files be created then?

postgres=# create temporary table tmp1 ( a int, b text, c date );
CREATE TABLE
postgres=# select pg_relation_filepath('tmp1');
 pg_relation_filepath 
----------------------
 base/12732/t3_16436
(1 row)

This is the standard directory of my “postgres” database:

postgres@centos8pg:/home/postgres/ [pgdev] cd $PGDATA
postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] oid2name 
All databases:
    Oid  Database Name  Tablespace
----------------------------------
  12732       postgres  pg_default
  12731      template0  pg_default
      1      template1  pg_default
postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] ls -l base/12732/t3_16436
-rw-------. 1 postgres postgres 0 Mar 12 18:17 base/12732/t3_16436

So, by default, files required for temporary tables go to the same location as all the other files that make up the specific database. If we populate the temporary table the files will grow, of course:

postgres=# insert into tmp1 (a,b,c) select i,i::text,now() from generate_series(1,100) i;
INSERT 0 100
postgres=# \! ls -la $PGDATA/base/12732/t3_16436
-rw-------. 1 postgres postgres 8192 Mar 12 18:41 /u02/pgdata/DEV/base/12732/t3_16436
postgres=# insert into tmp1 (a,b,c) select i,i::text,now() from generate_series(1,1000) i;
INSERT 0 1000
postgres=# \! ls -la $PGDATA/base/12732/t3_16436
-rw-------. 1 postgres postgres 49152 Mar 12 18:42 /u02/pgdata/DEV/base/12732/t3_16436
postgres=# 

The consequence of that is that I/O for temporary tables will compete with I/O for all the other objects in this PostgreSQL cluster. Another consequence is that temporary tables can potentially fill your file system, and from then on no one will be able to work anymore. This is the first reason for creating one or more dedicated temporary tablespaces: by doing this you can prevent runaway temporary tables from impacting your whole cluster, as long as the temporary tablespace is on its own file system.
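
For the demo below I simply use a directory, but on a real system the point is to give the temporary tablespace its own file system. A rough sketch of such a preparation (assuming /dev/sdb1 is a spare device dedicated to temporary files and /u03/pgtemp is the chosen mount point):

# as root: create a file system on the spare device and mount it for the postgres user
mkfs.xfs /dev/sdb1
mkdir -p /u03/pgtemp
mount /dev/sdb1 /u03/pgtemp
chown postgres:postgres /u03/pgtemp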

Creating a temporary tablespace is no different from creating a normal tablespace, as it is actually exactly the same:

postgres=# \! mkdir /var/tmp/tbstmp
postgres=# create tablespace tbstmp location '/var/tmp/tbstmp';
CREATE TABLESPACE
postgres=# \db+
                                      List of tablespaces
    Name    |  Owner   |    Location     | Access privileges | Options |  Size   | Description 
------------+----------+-----------------+-------------------+---------+---------+-------------
 pg_default | postgres |                 |                   |         | 886 MB  | 
 pg_global  | postgres |                 |                   |         | 575 kB  | 
 tbstmp     | postgres | /var/tmp/tbstmp |                   |         | 0 bytes | 
(3 rows)

Once we have the new tablespace we can tell PostgreSQL to use it as the default for temporary objects:

postgres=# alter system set temp_tablespaces = 'tbstmp';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

postgres=# show temp_tablespaces ;
 temp_tablespaces 
------------------
 tbstmp
(1 row)

Creating another temporary table will result in the files being created in the new location:

postgres=# create temporary table tmp2 ( a int, b text, c date );
CREATE TABLE
postgres=# select pg_relation_filepath('tmp2');
              pg_relation_filepath              
------------------------------------------------
 pg_tblspc/16442/PG_13_202003051/12732/t3_16443
(1 row)

postgres=# \! ls -la $PGDATA/pg_tblspc/
total 4
drwx------.  2 postgres postgres   19 Mar 12 18:50 .
drwx------. 20 postgres postgres 4096 Mar 12 18:54 ..
lrwxrwxrwx.  1 postgres postgres   15 Mar 12 18:50 16442 -> /var/tmp/tbstmp
postgres=# \! ls -la $PGDATA/pg_tblspc/16442/
total 0
drwx------. 3 postgres postgres  29 Mar 12 18:50 .
drwxrwxrwt. 7 root     root     163 Mar 12 18:49 ..
drwx------. 3 postgres postgres  19 Mar 12 18:53 PG_13_202003051
postgres=# \! ls -la $PGDATA/pg_tblspc/16442/PG_13_202003051/
total 0
drwx------. 3 postgres postgres 19 Mar 12 18:53 .
drwx------. 3 postgres postgres 29 Mar 12 18:50 ..
drwx------. 2 postgres postgres 54 Mar 12 18:53 12732
postgres=# \! ls -la $PGDATA/pg_tblspc/16442/PG_13_202003051/12732/
total 8
drwx------. 2 postgres postgres   54 Mar 12 18:53 .
drwx------. 3 postgres postgres   19 Mar 12 18:53 ..
-rw-------. 1 postgres postgres    0 Mar 12 18:53 t3_16443
-rw-------. 1 postgres postgres    0 Mar 12 18:53 t3_16446
-rw-------. 1 postgres postgres 8192 Mar 12 18:53 t3_16448

If you are wondering why there are three files, here is the answer:

postgres=# select relname from pg_class where oid in (16443,16446,16448);
       relname        
----------------------
 pg_toast_16443
 pg_toast_16443_index
 tmp2
(3 rows)

The toast objects get created as well because I have a “text” column in my temporary table. Creating a temporary table with data types that do not require toast objects will result in one file only:

postgres=# create temporary table tmp3 ( a int, b date );
CREATE TABLE
postgres=# select pg_relation_filepath('tmp3');
              pg_relation_filepath              
------------------------------------------------
 pg_tblspc/16442/PG_13_202003051/12732/t3_16449
(1 row)
postgres=# \! ls -la $PGDATA/pg_tblspc/16442/PG_13_202003051/12732/
total 8
drwx------. 2 postgres postgres   70 Mar 12 19:07 .
drwx------. 3 postgres postgres   19 Mar 12 18:53 ..
-rw-------. 1 postgres postgres    0 Mar 12 18:53 t3_16443
-rw-------. 1 postgres postgres    0 Mar 12 18:53 t3_16446
-rw-------. 1 postgres postgres 8192 Mar 12 18:53 t3_16448
-rw-------. 1 postgres postgres    0 Mar 12 19:07 t3_16449

So for now we know that all temporary tables will go to the new temporary tablespace. What else will go there from now on? There is a parameter log_temp_files which can be used to report temp file usage in the PostgreSQL log file, and this comes in quite handy if you want to know what goes there. The default setting is “-1”, which means off; “0” means log every temporary file, and any higher value specifies the minimum file size (in kilobytes) for a temporary file to be logged. Setting it to “0” will, as said, log all temp files being created in the background, so let’s do that:

postgres=# alter system set log_temp_files = 0;
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

Creating another temporary table and then checking the log file will confirm that this is working and we’ll get the information we want:

postgres=# create temporary table tmp4 as select * from generate_series(1,1000000);
SELECT 1000000

The entry in the log file will look like this, and it confirms that the temporary files have been written to the temporary tablespace we created above:

2020-03-13 02:33:35.658 CET - 1 - 10535 - [local] - postgres@postgres LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10535.0", size 14000000
2020-03-13 02:33:35.658 CET - 2 - 10535 - [local] - postgres@postgres STATEMENT:  create temporary table tmp4 as select * from generate_series(1,1000000);

But: if you create this temporary table with the default configuration of PostgreSQL:

postgres=# create temporary table tmp4 as select * from generate_series(1,100000);
SELECT 100000

… you will not see any lines in the log file for this. Why? Because there is temp_buffers, and temporary files will only be reported in the log file if they exceed the value of this parameter. In the default configuration this is ‘8MB’ and that is not enough for the smaller temporary table to be logged. Decreasing the parameter will log the temporary files for the smaller table as well:

postgres=# set temp_buffers = '1024kB';
SET
postgres=# create temporary table tmp5 as select * from generate_series(1,100000);
SELECT 100000

So now we know how to log the creation of temporary files to the PostgreSQL log file. What other operations will cause temporary files to be created? Sorts?

postgres=# set work_mem = '64kB';
SET
postgres=# select * from generate_series(1,1000000) order by random();

Yes, definitely:

2020-03-13 02:47:14.297 CET - 19 - 10609 - [local] - postgres@postgres LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.2", size 26083328
2020-03-13 02:47:14.297 CET - 20 - 10609 - [local] - postgres@postgres STATEMENT:  select * from generate_series(1,1000000) order by random();
2020-03-13 02:47:14.298 CET - 21 - 10609 - [local] - postgres@postgres LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.1", size 14000000
2020-03-13 02:47:14.298 CET - 22 - 10609 - [local] - postgres@postgres STATEMENT:  select * from generate_series(1,1000000) order by random();
2020-03-13 02:47:14.298 CET - 23 - 10609 - [local] - postgres@postgres LOG:  duration: 2994.386 ms  statement: select * from generate_series(1,1000000) order by random();

What about creating indexes?

postgres=# create table tt1 as select * from generate_series(1,1000000);
SELECT 1000000
postgres=# create index ii1 on tt1(generate_series );
CREATE INDEX

Yes, that creates temporary files as well:

2020-03-13 02:54:00.933 CET - 33 - 10609 - [local] - postgres@postgres LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.0.sharedfileset/0.0", size 12394496
2020-03-13 02:54:00.933 CET - 34 - 10609 - [local] - postgres@postgres STATEMENT:  create index ii1 on tt1(generate_series );
2020-03-13 02:54:00.934 CET - 35 - 10609 - [local] - postgres@postgres LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.0.sharedfileset/1.0", size 7659520
2020-03-13 02:54:00.934 CET - 36 - 10609 - [local] - postgres@postgres STATEMENT:  create index ii1 on tt1(generate_series );
2020-03-13 02:54:00.948 CET - 37 - 10609 - [local] - postgres@postgres LOG:  duration: 1149.625 ms  statement: create index ii1 on tt1(generate_series );

Foreign keys?

postgres=# create table ttt1 as select * from generate_series(1,1000000) a;
SELECT 1000000
postgres=# create unique index iii1 on ttt1(a);
CREATE INDEX
postgres=# create table ttt2 ( a int, b int );
CREATE TABLE
postgres=# insert into ttt2 select a,a from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# alter table ttt2 add constraint fk_t foreign key (b) references ttt1(a);
ALTER TABLE

Yes, that as well:

2020-03-13 03:01:07.127 CET - 65 - 10609 - [local] - postgres@postgres LOG:  duration: 1127.768 ms  statement: alter table ttt2 add constraint fk_t foreign key (b) references ttt1(a);
2020-03-13 03:01:15.375 CET - 66 - 10609 - [local] - postgres@postgres LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.6", size 67374

So, quite a few operations generate temporary files in the background. Separating them onto a separate mount point can actually make a lot of sense: from a performance perspective (if the I/O is spread on the storage layer as well), but also from a safety perspective, as huge operations that require temporary files will not affect “normal” operations on the instance.
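
If you want to keep an eye on how much temporary file activity is generated per database over time, the cumulative counters in pg_stat_database can be queried, for example like this (a quick sketch):

psql -c "select datname, temp_files, pg_size_pretty(temp_bytes) as temp_size
           from pg_stat_database
          where temp_files > 0
          order by temp_bytes desc;"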

There is another case generating temporary files which is maybe not clear to everybody. Consider this:

postgres=# create materialized view mv1 as select ttt1.a, ttt2.b from ttt1, ttt2 where ttt1.a = ttt2.b;
SELECT 1000000

This will create many temporary files in the background as well:

...
2020-03-13 03:11:03.721 CET - 4146 - 10609 - [local] - postgres@postgres LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.1289", size 22704
2020-03-13 03:11:03.721 CET - 4147 - 10609 - [local] - postgres@postgres STATEMENT:  create materialized view mv1 as select ttt1.a, ttt2.b from ttt1, ttt2 where ttt1.a = ttt2.b;
2020-03-13 03:11:03.722 CET - 4148 - 10609 - [local] - postgres@postgres LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.143", size 23136
2020-03-13 03:11:03.722 CET - 4149 - 10609 - [local] - postgres@postgres STATEMENT:  create materialized view mv1 as select ttt1.a, ttt2.b from ttt1, ttt2 where ttt1.a = ttt2.b;
2020-03-13 03:11:03.723 CET - 4150 - 10609 - [local] - postgres@postgres LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.1166", size 23136
2020-03-13 03:11:03.723 CET - 4151 - 10609 - [local] - postgres@postgres STATEMENT:  create materialized view mv1 as select ttt1.a, ttt2.b from ttt1, ttt2 where ttt1.a = ttt2.b;
2020-03-13 03:11:03.723 CET - 4152 - 10609 - [local] - postgres@postgres LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.792", size 23640
...

And even refreshes consume temporary files:

postgres=# create unique index mv_i1 on mv1(a);
CREATE INDEX
postgres=# refresh materialized view concurrently mv1;
REFRESH MATERIALIZED VIEW
postgres=# 

From the log file:

...
2020-03-13 03:14:05.866 CET - 20543 - 10609 - [local] - postgres@postgres STATEMENT:  refresh materialized view concurrently mv1;
2020-03-13 03:14:05.866 CET - 20544 - 10609 - [local] - postgres@postgres LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.8193", size 26768
2020-03-13 03:14:05.866 CET - 20545 - 10609 - [local] - postgres@postgres CONTEXT:  SQL statement "CREATE TEMP TABLE pg_temp_3.pg_temp_16513_2 AS SELECT mv.ctid AS tid, newdata FROM public.mv1 mv FULL JOIN pg_temp_3.pg_temp_16513 newdata ON (newdata.a OPERATOR(pg_catalog.=) mv.a AND newdata OPERATOR(pg_catalog.*=) mv) WHERE newdata IS NULL OR mv IS NULL ORDER BY tid"
2020-03-13 03:14:05.866 CET - 20546 - 10609 - [local] - postgres@postgres STATEMENT:  refresh materialized view concurrently mv1;
2020-03-13 03:14:05.866 CET - 20547 - 10609 - [local] - postgres@postgres LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.6147", size 28487
2020-03-13 03:14:05.866 CET - 20548 - 10609 - [local] - postgres@postgres CONTEXT:  SQL statement "CREATE TEMP TABLE pg_temp_3.pg_temp_16513_2 AS SELECT mv.ctid AS tid, newdata FROM public.mv1 mv FULL JOIN pg_temp_3.pg_temp_16513 newdata ON (newdata.a OPERATOR(pg_catalog.=) mv.a AND newdata OPERATOR(pg_catalog.*=) mv) WHERE newdata IS NULL OR mv IS NULL ORDER BY tid"
...

There are more operations that require temporary files in the background, but for the scope of this post we stop here. One last thing: the amount of temporary file space that can be used can also be limited with temp_file_limit:

postgres=# set temp_file_limit='1MB';
SET
postgres=# refresh materialized view concurrently mv1;
ERROR:  temporary file size exceeds temp_file_limit (1024kB)
postgres=# 

This article About temp_tablespaces in PostgreSQL appeared first on Blog dbi services.

Upgrading SUSE Linux Enterprise online

Tue, 2020-03-10 10:48

One really good feature of SUSE Linux Enterprise (SLE) is that it allows you to upgrade a running system to a new service pack online.
Let’s say from SLE 12 SP3 to SLE 12 SP4. No new installation is needed and all existing data is kept intact.
In the following blog post, I will show you how easy it is.

Preparation

Of course, before upgrading we need to do some checks to be sure the system is properly prepared.
We first start with the version.

SUSE12SP3:~ # cat /etc/os-release
NAME="SLES"
VERSION="12-SP3"
VERSION_ID="12.3"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp3"

Then the current system must absolutely be updated to the latest patch level, otherwise it won’t be supported.

SUSE12SP3:~ # zypper patch
Refreshing service 'SUSE_Linux_Enterprise_Server_12_SP3_x86_64'.
Refreshing service 'SUSE_Linux_Enterprise_Software_Development_Kit_12_SP3_x86_64'.
Loading repository data...
Reading installed packages...
Resolving package dependencies...
Nothing to do.
Backups

For safety reasons, and in case we need to roll back, we back up all used repositories and also all installed packages as follows:

SUSE12SP3:~ # zypper lr -e repositories.bak
Repositories have been successfully exported to ./repositories.bak.repo
SUSE12SP3:~ # rpm -qa --queryformat '%{NAME}\n' > installed-software.bak
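
Should something go wrong later, those two files give us a starting point for restoring the previous state: the repository definitions can be re-imported and missing packages re-installed. A rough sketch, to be adapted to the actual situation:

# re-import the saved repository definitions, refresh them, then re-install packages from the saved list
zypper addrepo ./repositories.bak.repo
zypper refresh
zypper install $(cat installed-software.bak)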
Migration plugin

As a prerequisite, and as we will do the migration from the command line, we need to check whether the Zypper subcommand for online migration to a new service pack is installed:

SUSE12SP3:~ # zypper se zypper-migration-plugin
Refreshing service 'SUSE_Linux_Enterprise_Server_12_SP3_x86_64'.
Refreshing service 'SUSE_Linux_Enterprise_Software_Development_Kit_12_SP3_x86_64'.
Loading repository data...
Reading installed packages...

S | Name | Summary | Type
--+-------------------------+----------------------------------------+--------
i | zypper-migration-plugin | Zypper subcommand for online migration | package

As it is already installed we can carry on.
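
If it were not installed, it could presumably be added from the standard repositories first:

zypper install zypper-migration-plugin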

Migration

Everything is in place, so we can start the upgrade process:

SUSE12SP3:~ # zypper migration
Executing 'zypper refresh'
Repository 'SLES12-SP3-Pool' is up to date.
Repository 'SLES12-SP3-Updates' is up to date.
Repository 'SLE-SDK12-SP3-Pool' is up to date.
Repository 'SLE-SDK12-SP3-Updates' is up to date.
All repositories have been refreshed.

Executing 'zypper --no-refresh patch-check --updatestack-only'

Loading repository data...
Reading installed packages...

0 patches needed (0 security patches)

Available migrations:

1 | SUSE Linux Enterprise Server 12 SP5 x86_64
SUSE Linux Enterprise Software Development Kit 12 SP5 x86_64

2 | SUSE Linux Enterprise Server 12 SP4 x86_64
SUSE Linux Enterprise Software Development Kit 12 SP4 x86_64

[num/q]: 2

Executing 'snapper create --type pre --cleanup-algorithm=number --print-number --userdata important=yes --description 'before online migration''

Upgrading product SUSE Linux Enterprise Server 12 SP4 x86_64.
Upgrading product SUSE Linux Enterprise Software Development Kit 12 SP4 x86_64.

Executing 'zypper --releasever 12.4 ref -f'

Warning: Enforced setting: $releasever=12.4
Forcing raw metadata refresh
Retrieving repository 'SLES12-SP4-Pool' metadata .......................................................................[done]
Forcing building of repository cache
Building repository 'SLES12-SP4-Pool' cache ............................................................................[done]
Forcing raw metadata refresh
Retrieving repository 'SLES12-SP4-Updates' metadata ....................................................................[done]
Forcing building of repository cache
Building repository 'SLES12-SP4-Updates' cache .........................................................................[done]
Forcing raw metadata refresh
Retrieving repository 'SLE-SDK12-SP4-Pool' metadata ....................................................................[done]
Forcing building of repository cache
Building repository 'SLE-SDK12-SP4-Pool' cache .........................................................................[done]
Forcing raw metadata refresh
Retrieving repository 'SLE-SDK12-SP4-Updates' metadata .................................................................[done]
Forcing building of repository cache
Building repository 'SLE-SDK12-SP4-Updates' cache ......................................................................[done]
All repositories have been refreshed.

Executing 'zypper --releasever 12.4 --no-refresh dist-upgrade --no-allow-vendor-change '

Warning: Enforced setting: $releasever=12.4
Warning: You are about to do a distribution upgrade with all enabled repositories. Make sure these repositories are compatible before you continue. See 'man zypper' for more information about this command.
Loading repository data...
Reading installed packages...
Computing distribution upgrade...

The following 31 NEW packages are going to be installed:
Mesa-dri Mesa-dri-32bit crash-kmp-default-7.2.1_k4.12.14_95.13-4.6.2 kernel-default-4.12.14-95.48.1 libGeoIP1 libLLVM6
libLLVM6-32bit libXfont2-2 libbind9-160 libdns169 libebl1 libfreebl3-hmac libirs160 libisc166 libisc166-32bit libisccc160
libisccfg160 libjson-c2-32bit liblwres160 libmlx4-1 libnghttp2-14 libnghttp2-14-32bit libsoftokn3-hmac libssh4 libssh4-32bit
libwebkit2gtk3-lang openssl-1_0_0 python-bind python-ply ruby2.1-rubygem-simpleidn sles-release-POOL

The following application is going to be REMOVED:
"Firefox Web Browser"

The following 3 packages are going to be REMOVED:
bind-libs crash-kmp-default-7.1.8_k4.4.175_94.79-4.11.1 libwicked-0-6

The following 464 packages are going to be upgraded:
Mesa Mesa-32bit Mesa-libEGL1 Mesa-libEGL1-32bit Mesa-libGL1 Mesa-libGL1-32bit Mesa-libGLESv2-2 Mesa-libglapi0
Mesa-libglapi0-32bit MozillaFirefox MozillaFirefox-branding-SLE MozillaFirefox-translations-common SUSEConnect aaa_base
aaa_base-extras accountsservice accountsservice-lang apparmor-docs apparmor-parser apparmor-profiles apparmor-utils audit
autofs autoyast2 autoyast2-installation bind-utils binutils bluez bzip2 ca-certificates-mozilla cifs-utils cpio cpio-lang
cpp48 crash crda cron cronie cups cups-client cups-filters cups-filters-cups-browsed cups-filters-foomatic-rip
cups-filters-ghostscript cups-libs cups-libs-32bit curl cyrus-sasl cyrus-sasl-32bit cyrus-sasl-digestmd5 cyrus-sasl-gssapi
cyrus-sasl-gssapi-32bit cyrus-sasl-plain cyrus-sasl-plain-32bit dbus-1 dbus-1-x11 desktop-file-utils device-mapper
device-mapper-32bit dhcp dhcp-client e2fsprogs elfutils ethtool evince evince-browser-plugin evince-lang
evince-plugin-pdfdocument evince-plugin-psdocument expat gdb gdk-pixbuf-lang gdk-pixbuf-loader-rsvg gdk-pixbuf-query-loaders
ghostscript ghostscript-x11 glib2-lang glib2-tools glibc glibc-32bit glibc-i18ndata glibc-locale glibc-locale-32bit
google-noto-fonts-doc gpg2 gpg2-lang grub2 grub2-i386-pc grub2-snapper-plugin grub2-systemd-sleep-plugin iproute2 irqbalance
kdump kernel-firmware kexec-tools kmod kmod-compat kpartx krb5 krb5-32bit libIlmImf-Imf_2_1-21 libQt5Core5 libQt5DBus5
libQt5Gui5 libQt5Network5 libQt5Widgets5 libXi6 libXi6-32bit libXvnc1 libaccountsservice0 libapparmor1 libarchive13 libasm1
............................................................................................................................
xdm xf86-input-evdev xf86-input-synaptics xf86-input-void xf86-input-wacom xf86-video-fbdev xf86-video-vesa
xf86-video-vmware xfsprogs xorg-x11-Xvnc yast2 yast2-add-on yast2-apparmor yast2-auth-client yast2-bootloader
yast2-ca-management yast2-control-center yast2-core yast2-country yast2-country-data yast2-firewall yast2-ftp-server
yast2-installation yast2-iscsi-client yast2-kdump yast2-migration yast2-network yast2-nfs-client yast2-ntp-client
yast2-packager yast2-pkg-bindings yast2-proxy yast2-registration yast2-ruby-bindings yast2-samba-client yast2-storage
yast2-sudo yast2-theme-SLE yast2-users yast2-vpn zypp-plugin-python zypper zypper-log

The following 2 products are going to be upgraded:
SUSE Linux Enterprise Server 12 SP3 12.3-0 -> 12.4-0
SUSE Linux Enterprise Software Development Kit 12 SP3 12.3-0 -> 12.4-0

The following 29 packages are going to be downgraded:
dracut iscsiuio libLLVM libLLVM-32bit libopeniscsiusr0_2_0 libxatracker2 libyui-ncurses-pkg7 libyui-qt-pkg7 open-iscsi
patterns-sles-32bit patterns-sles-Minimal patterns-sles-Minimal-32bit patterns-sles-apparmor patterns-sles-apparmor-32bit
patterns-sles-base patterns-sles-base-32bit patterns-sles-documentation patterns-sles-documentation-32bit
patterns-sles-laptop patterns-sles-laptop-32bit patterns-sles-x11 patterns-sles-x11-32bit patterns-sles-yast2
patterns-sles-yast2-32bit perl-Bootloader ruby2.1-rubygem-cfa_grub2 xorg-x11-server xorg-x11-server-extra yast2-snapper

The following 15 patterns are going to be downgraded:
32bit Minimal apparmor base documentation laptop sles-Minimal-32bit sles-apparmor-32bit sles-base-32bit
sles-documentation-32bit sles-laptop-32bit sles-x11-32bit sles-yast2 x11 yast2

The following package is going to change architecture:
openssl x86_64 -> noarch

The following 2 packages have no support information from their vendor:
sle-sdk-release sle-sdk-release-POOL

464 packages to upgrade, 29 to downgrade, 31 new, 3 to remove, 1 to change arch.
Overall download size: 544.7 MiB. Already cached: 0 B. After the operation, additional 692.3 MiB will be used.
Continue? [y/n/...? shows all options] (y):
Continue? [y/n/...? shows all options] (y): ?

y - Yes, accept the summary and proceed with installation/removal of packages.
n - No, cancel the operation.
v - Toggle display of package versions.
a - Toggle display of package architectures.
r - Toggle display of repositories from which the packages will be installed.
m - Toggle display of package vendor names.
d - Toggle between showing all details and as few details as possible.
g - View the summary in pager.

[y/n/...? shows all options] (y): y

(Use the Enter or Space key to scroll the text by lines or pages.)

In order to install 'SUSE Linux Enterprise Software Development Kit 12 SP4' (product), you must agree 
to terms of the following license agreement:

SUSE Linux Enterprise End User License Agreement for the following
SUSE products:

SUSE Linux Enterprise Server 12 Service Pack 4
SUSE Linux Enterprise Server for SAP Applications 12 Service Pack 4
SUSE Linux High Availability Extension 12 Service Pack 4
SUSE Linux Enterprise HPC 12 Service Pack 4
SUSE Linux Enterprise Desktop 12 Service Pack 4
SUSE Linux Enterprise Workstation Extension 12 Service Pack 4
SUSE Linux Enterprise Live Patching 12 Service Pack 4
SUSE Linux Enterprise Software Development Kit 12 Service Pack 4

U.S. Government Restricted Rights. Use, duplication, or disclosure of any
Deliverables by the U.S. Government is subject to the restrictions in FAR
52.227-14 (Dec 2007) Alternate III (Dec 2007), FAR 52.227-19 (Dec 2007),
or DFARS 252.227-7013(b)(3) (Nov 1995), or applicable successor clauses.

:version:2018-10-01:001:

(Press 'q' to exit the pager.)

Do you agree with the terms of the license? [yes/no] (no): yes
Retrieving package google-noto-fonts-doc-20170919-1.12.noarch (1/524), 8.5 KiB ( 4.2 KiB unpacked)
Retrieving: google-noto-fonts-doc-20170919-1.12.noarch.rpm .............................................................[done]
Retrieving package sles-admin_en-pdf-12.4-1.3.noarch (2/524), 5.9 MiB ( 7.4 MiB unpacked)
Retrieving: sles-admin_en-pdf-12.4-1.3.noarch.rpm ........................................................[done (864.0 KiB/s)]
Retrieving package sles-deployment_en-pdf-12.4-1.3.noarch (3/524), 5.7 MiB ( 6.5 MiB unpacked)
Retrieving: sles-deployment_en-pdf-12.4-1.3.noarch.rpm .....................................................[done (2.6 MiB/s)]
Retrieving package sles-installquick_en-pdf-12.4-1.3.noarch (4/524), 1.4 MiB ( 1.5 MiB unpacked)....................................................................

Checking for file conflicts: ...........................................................................................[done]
( 1/524) Installing: google-noto-fonts-doc-20170919-1.12.noarch .......................................................[done]
( 2/524) Installing: sles-admin_en-pdf-12.4-1.3.noarch ................................................................[done]
( 3/524) Installing: sles-deployment_en-pdf-12.4-1.3.noarch ...........................................................[done]

dracut: *** Generating early-microcode cpio image ***
dracut: *** Constructing GenuineIntel.bin ****
dracut: *** Store current command line parameters ***
dracut: Stored kernel commandline:
dracut: resume=UUID=2966dd8c-52f1-489e-8818-1d9ac18943e7
dracut: root=UUID=36c541f9-d384-48d5-b291-e2bfcb4d7a02 rootfstype=btrfs rootflags=rw,relatime,space_cache,subvolid=259,subvol=/@/.snapshots/1/snapshot,subvol=@/.snapshots/1/snapshot
dracut: *** Creating image file '/boot/initrd-4.4.180-94.100-default' ***
dracut: *** Creating initramfs image file '/boot/initrd-4.4.180-94.100-default' done ***

There are some running programs that might use files deleted by recent upgrade. You may wish to check and restart some of them. Run 'zypper ps -s' to list these programs.

The upgrade process is now over; let's check whether it was successful:

SUSE12SP3:~ # cat /etc/os-release
NAME="SLES"
VERSION="12-SP4"
VERSION_ID="12.4"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP4"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp4"
Conclusion

The upgrade was successful and took less than half an hour. We can now restart the system in order to use the new kernel; a quick check after the reboot is sketched below.
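
A quick sanity check after the reboot could look like this (commands only, the output will depend on your system; the 4.12.14 kernel is the one installed by the upgrade above):

systemctl reboot
# once the system is back:
uname -r              # should now show the 4.12.14 SP4 kernel
cat /etc/os-release   # should show VERSION="12-SP4"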
It could not be easier.

The article Upgrading SUSE Linux Enterprise online first appeared on Blog dbi services.

ysql_bench: the YugaByteDB version of pgbench

Mon, 2020-03-09 03:00
By Franck Pachot

This follows the previous post on testing YugaByteDB 2.1 performance with pgbench: https://blog.dbi-services.com/yugabytedb-2-1/
A distributed database needs to reduce inter-node synchronization latency, and therefore replaces two-phase pessimistic locking with optimistic concurrency control in many places. This means more serialization errors, where a transaction may have to be retried. But the PostgreSQL pgbench does not have this possibility, and that makes benchmarking a distributed database very hard. For example, when CERN tested CockroachDB the conclusion was: “comparative benchmarking of CockroachDB was not possible with the current tools used”.

pgBench retry: the PL/pgSQL solution

In a previous blog post my workaround for this was to implement the retry in a PL/pgSQL procedure.

Here is the PL/pgSQL procedure:


ysqlsh franck <<'PSQL'
 create or replace procedure SIMPLE_UPDATE_RETRY(p_aid int, p_bid int, p_tid int, p_delta int) as $$
 declare
  retries integer := 0;
 begin
  loop
   begin
    -- the "simple update" transaction: update the balance and record it in the history
    update pgbench_accounts set abalance = abalance + p_delta where aid = p_aid;
    insert into pgbench_history (tid, bid, aid, delta, mtime)
     values (p_tid, p_bid, p_aid, p_delta, current_timestamp);
    exit; -- no serialization error: leave the retry loop
   exception when serialization_failure then
    -- the statements above are rolled back to the block's implicit savepoint; retry, giving up after 10 attempts
       if retries > 10 then
        raise notice 'Give Up after % retries. tid=%',retries,p_tid;
        raise;
       end if;
       -- continue the retry loop
   end;
   retries=retries+1;
  end loop;
  commit;
  if retries > 2 then
   raise notice 'Required % retries (tid=%)',retries,p_tid;
  end if;
 END;
 $$ language plpgsql;
PSQL

Here is the code to call it (same functionality as the “simple update” builtin):


cat > /tmp/simple-update.sql <<'CAT'
 -- simple-update 
 \set aid random(1, 100000 * :scale)
 \set bid random(1, 1 * :scale)
 \set tid random(1, 10 * :scale)
 \set delta random(-5000, 5000)
  call SIMPLE_UPDATE_RETRY(:aid, :bid, :tid, :delta);
  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
CAT

And how I run it for 30 seconds:


pgbench --no-vacuum --protocol=prepared --file=/tmp/simple-update.sql --time 30 --jobs=10 --client=10 -h localhost -p 5433 -U postgres franck

As I mentioned in my blog post, some serialization errors can still happen because of the limitations of PostgreSQL transaction control in procedures: I cannot retry the errors encountered at commit. This has been raised on the PostgreSQL hackers mailing list: https://www.postgresql.org/message-id/flat/54C2FEA5-86FF-4A3C-A84B-4470A6998ACA%40thebuild.com#5a5a575ed5451603c906ba3167d043a1
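
Here is a minimal sketch of that limitation (a hypothetical procedure, re-using the pgbench_accounts table from the scripts above): PL/pgSQL rejects a COMMIT inside a block that has an EXCEPTION clause, because the handler wraps the block in a subtransaction, so an error raised at commit time can never be caught and retried within the procedure.

create or replace procedure cannot_retry_commit_errors() language plpgsql as $$
begin
 begin
  update pgbench_accounts set abalance = abalance where aid = 1;
  commit; -- fails at runtime: the EXCEPTION clause below turns this block into
          -- a subtransaction, and a subtransaction cannot be committed here
 exception when serialization_failure then
  raise notice 'commit-time serialization errors never reach this handler';
 end;
end;
$$;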

pgBench retry: the “max-tries” patch

Around the same time as I came up with the PL/pgSQL workaround, YugabyteDB implemented the mentioned patch in their fork of the postgres code: https://github.com/yugabyte/yugabyte-db/commit/af25ba570e11c59e41a1243126ff9f9edae9d422, which is a much better solution. Adding a patch to the community PostgreSQL is hard because this database engine is widely used and its developers must be conservative to ensure stability. That's different for a startup company building a new database engine. And what's awesome with YugaByteDB is that their fork is open source, so their work can easily be given back to the community. What YugaByteDB is improving in PostgreSQL is public, documented and open-source. And, good news, this postgres fork is shipped, with all its tools, in the YugaByteDB distribution. Did you see in the previous post that I've set my PATH with ~/yugabyte-2.1.0.0/postgresql/bin in addition to ~/yugabyte-2.1.0.0/bin? This is where you find them, with the command line tools renamed. ysql_bench is the YugaByteDB version of pgBench. Here is a comparison of the community PostgreSQL binaries and the ones compiled with YugaByteDB:
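
One quick way to see it (a sketch; it assumes the YugaByteDB fork keeps pgbench's --version flag, and relies on the PATH set in the previous post):

type pgbench          # the community PostgreSQL client installed on the server
pgbench --version
type ysql_bench       # the renamed pgbench shipped in ~/yugabyte-2.1.0.0/postgresql/bin
ysql_bench --version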

The YugaByteDB version of pgbench has the following differences in version 2.1 (a combined example is sketched after the list):

  • The default init steps do not include vacuum (as it makes no sense here): -I, --init-steps=[dtgvpf]+ (default "dtgvp")
  • The primary keys are declared in the CREATE TABLE rather than an ALTER TABLE
  • The initialization can do intermediate commits, controlled with --batch-size
  • The number of retries (before ending the client thread) is controlled by --max-tries which defaults to zero (no retries)
  • The -r option reports not only the transactions that exceed --latency-limit but also the errors trapped for retry, and is renamed from --report-latencies to --report-per-command
  • Finally, they also changed the table names: ysql_bench_% instead of pgbench_%
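
To see how these options fit together, here is a hypothetical initialization run that combines some of them (a sketch, not taken from the original benchmark; the batch size value is arbitrary and --init-steps=dtgp explicitly drops the vacuum step):

ysql_bench --initialize --init-steps=dtgp --batch-size=1000 -h localhost -p 5433 -U postgres franck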

To work on the same table names, I continue with a script file:


cat > /tmp/simple-update.sql <<'CAT'
 -- simple-update 
 \set aid random(1, 100000 * :scale)
 \set bid random(1, 1 * :scale)
 \set tid random(1, 10 * :scale)
 \set delta random(-5000, 5000)
 BEGIN;
   UPDATE pgbench_accounts SET abalance = abalance + :delta
    WHERE aid = :aid;
   SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
     VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
 END;
CAT

and I run it with the "max-tries" setting and "-r" to report the number of retries:


ysql_bench -r --max-tries=10 --no-vacuum --protocol=prepared --file=/tmp/simple-update.sql --time 30 --jobs=10 --client=10 -h localhost -p 5433 -U postgres franck

That is 493 transactions per second. Do not compare with the PL/pgSQL version because here we have more client-server roundtrips.

In order to validate that I get the same result with the builtin script, I run it after renaming the tables, because the YugaByte builtin scripts expect ysql_bench_% tables. The rename is sketched below, followed by the builtin run:
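
One way to do the rename (a sketch; it assumes ALTER TABLE ... RENAME is available in this YSQL version, otherwise the four tables can simply be re-created with the ysql_bench_ prefix):

ysqlsh franck <<'SQL'
 alter table pgbench_accounts rename to ysql_bench_accounts;
 alter table pgbench_branches rename to ysql_bench_branches;
 alter table pgbench_tellers  rename to ysql_bench_tellers;
 alter table pgbench_history  rename to ysql_bench_history;
SQL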


ysql_bench -r --max-tries=10 --no-vacuum --protocol=prepared --builtin=simple-update --time 30 --jobs=10 --client=10 -h localhost -p 5433 -U postgres franck

We are still at about 493 transactions per second here.

Compared with “monolith” PostgreSQL

Of course, those numbers are far from what is possible with a monolithic database. The distributed architecture, the cross-shard foreign key validation and the replication to other nodes have a huge overhead when implemented as remote procedure calls. When you scale up within one server, without the need to scale out, the throughput is higher with the community PostgreSQL:


sudo su - postgres
pg_ctl start
time pgbench --initialize --init-steps=dtgvpf --host localhost -p 5432 -U postgres
pgbench -r --no-vacuum --protocol=prepared --builtin=simple-update --time 30 --jobs=10 --client=10 -h localhost -p 5432 -U postgres

The advantage of a distributed database comes with the possibility to scale out to multiple servers, data centers and even regions. Note that I had no replication in the PostgreSQL test above, and running YugaByteDB with no replication doubles the throughput that I had with replication factor 3 (the single-replica cluster setup is sketched below):
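
For reference, the single-replica cluster used for that comparison can be created the same way as in the previous post, just with a replication factor of 1 (a sketch based on the yb-ctl commands shown there; 'destroy' removes the RF-3 test cluster first):

yb-ctl destroy
yb-ctl --rf 1 create
yb-ctl status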

So, there’s no fair comparison possible. Just use what you need: monolith to speed-up, or distributed to scale-out.

The big advantage of YugaByteDB is that the YSQL API is more than just compatibility with PostgreSQL, which is what CockroachDB provides: YugaByte re-uses the PostgreSQL upper layer. An application built for a PostgreSQL database, with the best one-node performance, can then scale out without any change when moving to YugaByteDB. And vice versa. This looks similar to what Amazon is doing with AWS Aurora, except that Aurora runs only on AWS while YugaByteDB is open source and can run anywhere.

There are many comparative benchmarks published, but I think that being able to use pgbench is very important to compare a specific workload between PostgreSQL and YugabyteDB in order to make the right deployment decisions. My goal was also to emphasize the need for a good exception handling strategy in your applications, with the possibility to retry.

The article ysql_bench: the YugaByteDB version of pgbench first appeared on Blog dbi services.

YugaByteDB 2.1: the Open Source multi-region distributed database with PostgreSQL API is in GA with huge performance improvement

Sun, 2020-03-08 09:48
By Franck Pachot

9 months ago I was looking at YugaByteDB, which was still in beta version for its 'YSQL' API. I published my first test on Medium: https://medium.com/@FranckPachot/running-pgbench-on-yugabytedb-1-3-3a15450dfa42. I have been very enthusiastic about the idea, the architecture, the way they open-sourced it and how everything was documented in their blog. I've even met them in Sunnyvale when I traveled to California for Oracle Open World. Great people with a great vision of the future of databases. From this first test, I was not impressed by the performance, but it was an early beta, and a multi-master/multi-index database has many challenges to solve before tuning the details of the implementation. This tuning has been done for the General Availability version 2.1 released in February. I was eager to test it, but this first month back to dbi-services consulting was very busy. So finally here it is.

This post takes the same test I did last July and the result is impressive: the pgbench initialization time is fully fixed and the pgbench run shows 9x higher throughput.

I’m on the same VM which, as funny as it might sound, is an Oracle Linux 7 running on the Oracle Cloud.

Install and start:

I install this version 2.1 in the same way I installed the 1.3 in the past. All is documented: https://docs.yugabyte.com/latest/quick-start/install/linux/


wget -O yugabyte-2.1.0.0-linux.tar.gz  https://downloads.yugabyte.com/yugabyte-2.1.0.0-linux.tar.gz
tar -xzvf yugabyte-2.1.0.0-linux.tar.gz
yugabyte-2.1.0.0/bin/post_install.sh
export PATH="~/yugabyte-2.1.0.0/bin:~/yugabyte-2.1.0.0/postgresql/bin:$PATH"

I create a 3-node cluster with replication factor 3, with all nodes on the same host for this test:


yb-ctl --rf 3 create
yb-ctl status

I create a database for this:


ysqlsh
 \timing on
 drop database if exists franck;
 create database franck;
 \q

pgbench initialization

I use the pgbench that comes from a "normal" PostgreSQL installation on the same server.


[opc@db192 ~]$ type pgbench
pgbench is hashed (/usr/pgsql-11/bin/pgbench)
time pgbench --initialize --host localhost -p 5433 -U postgres franck

If you compare with the previous post on version 1.3 you will see many differences.

  • No “ERROR: DROP multiple objects not supported yet” here. This issue has been fixed.
  • No “ERROR: VACUUM not supported yet” but just a warning because there’s no vaccum here in the YugaByteDB storage layer
  • And the best: 8 seconds instead of the 2 minutes we had before

However the “ALTER TABLE” to add the constraints afterward is still not supported so I run the same manually with the FOREIGN KEY declaration in the CREATE TABLE:


ysqlsh franck
 drop table if exists pgbench_history;
 drop table if exists pgbench_tellers;
 drop table if exists pgbench_accounts;
 drop table if exists pgbench_branches;
CREATE TABLE pgbench_branches (
    bid integer NOT NULL
   ,bbalance integer
   ,filler character(88)
   ,CONSTRAINT pgbench_branches_pkey PRIMARY KEY (bid)
 );
CREATE TABLE pgbench_accounts (
    aid integer NOT NULL
   ,bid integer references pgbench_branches
   ,abalance integer
   ,filler character(84)
   ,CONSTRAINT pgbench_accounts_pkey PRIMARY KEY (aid)
 );
CREATE TABLE pgbench_tellers (
    tid integer NOT NULL
   ,bid integer references pgbench_branches
   ,tbalance integer
   ,filler character(84)
   ,CONSTRAINT pgbench_tellers_pkey PRIMARY KEY (tid)
 );
CREATE TABLE pgbench_history (
    tid integer references pgbench_tellers
   ,bid integer references pgbench_branches
   ,aid integer references pgbench_accounts
   ,delta integer
   ,mtime timestamp without time zone
   ,filler character(22)
 );
\q

Now it remains to insert the rows. This was very long (about an hour) in 1.3:


time pgbench --initialize --init-steps=g -h localhost -p 5433 -U postgres franck

No foreign key errors, and only 6 seconds!

Do you remember that I had to switch to SERIALIZABLE isolation level? I don’t have to here:

This has been fixed with the support of SELECT locks, so there is no need to go to optimistic locking with SERIALIZABLE (which requires that the application implements retry logic). A minimal illustration of such an explicit lock is sketched below.
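
Here is the kind of explicit, pessimistic lock that is now available (a sketch; aid=1 is just an arbitrary row of the pgbench_accounts table created above):

ysqlsh franck
 begin;
 select abalance from pgbench_accounts where aid = 1 for update;
 update pgbench_accounts set abalance = abalance + 10 where aid = 1;
 commit;
 \q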

Simple Update in 1 thread

Then, as I did before, I run a Simple Update workload from one session for 30 seconds:


pgbench --no-vacuum --builtin=simple-update --protocol=prepared --time 30 -h localhost -p 5433 -U postgres franck

When compared with the previous run on 1.3, I've processed 9x more transactions. Yes, that's exactly what has been announced for this version: huge performance improvements:

What we've been up to over the last few month – YugabyteDB 2.1 has a ton of perf improvements and a lot of other goodies! https://t.co/wwwhVG7bPd

PS: Please do give it a spin, would love your feedback.

— Karthik Ranganathan (@karthikr) February 25, 2020

Simple Update in 10 threads

Then, as I did before, I run with 10 concurrent threads:


pgbench --no-vacuum --protocol=prepared --builtin=simple-update --time 30 --jobs=10 --client=10 -h localhost -p 5433 -U postgres franck

Again, that's 7x better than my test on 1.3, and still in read committed isolation level.

Note that I was lucky here: it can happen that we get a serialization error even in read committed, like in this second run of the same workload:

The community implementation of pgbench has no retry logic. Once an error is encountered the thread finishes, and that's unusable for a benchmark. The patch proposed many times was, unfortunately, always rejected: https://commitfest.postgresql.org/18/1645/.

But YugaByteDB has a solution for that, which I’ll show in the next post: https://blog.dbi-services.com/ysql_bench/

The article YugaByteDB 2.1: the Open Source multi-region distributed database with PostgreSQL API is in GA with huge performance improvement first appeared on Blog dbi services.

Collations in PostgreSQL – basics

Thu, 2020-03-05 10:44

When you work with databases, sooner or later you will need to get in touch with collations. It might be that you never needed to think about it because initdb picked a value that just worked for you, and nobody in your database ever tried to use a collation that is not available in your installation. Once someone tries that, or there is the requirement to use a specific collation from the beginning, you need to know at least the basics, and this is the topic of this post. In contrast to Oracle, PostgreSQL relies on the operating system for collation support. As always, a demo says more than a thousand words, so let's go.

I am running CentOS 8 (8.1.1911 (Core)), a minimal installation plus the packages required for installing PostgreSQL from source code. To get a default environment let's create a brand new user:

postgres@centos8pg:/home/postgres/ [pgdev] sudo useradd pg
postgres@centos8pg:/home/postgres/ [pgdev] sudo su - pg

Once we switch to that user and check the LANG and LC_* variables, this is what we get:

[pg@centos8pg ~]$ env | grep LC
[pg@centos8pg ~]$ env | grep LANG
LANG=en_US.UTF-8

When we run initdb to create a new cluster this is what gets picked up and will become the default for the databases:

[pg@centos8pg ~]$ /u01/app/postgres/product/12/db_0/bin/initdb -D /var/tmp/pg
[pg@centos8pg ~]$ export PGPORT=9999
[pg@centos8pg ~]$ /u01/app/postgres/product/12/db_0/bin/pg_ctl -D /var/tmp/pg start
[pg@centos8pg ~]$ /u01/app/postgres/product/12/db_0/bin/psql -p 9999 postgres
psql (12.0)
Type "help" for help.

postgres=# \l
                              List of databases
   Name    | Owner | Encoding |   Collate   |    Ctype    | Access privileges 
-----------+-------+----------+-------------+-------------+-------------------
 postgres  | pg    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | pg    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/pg            +
           |       |          |             |             | pg=CTc/pg
 template1 | pg    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/pg            +
           |       |          |             |             | pg=CTc/pg
(3 rows)

With PostgreSQL you can define the default collation for a database at the time you create it, so creating a new database with the same collation as the existing ones is not an issue at all:

[pg@centos8pg ~]$ /u01/app/postgres/product/12/db_0/bin/psql -p 9999 postgres
postgres=# create database tmp with LC_COLLATE = "en_US.UTF-8";
CREATE DATABASE

The trouble starts when you want to use a collation that is not currently available from the operating system:

postgres=# create database tmp2 with LC_COLLATE = "de_DE.UTF-8";
ERROR:  invalid locale name: "de_DE.UTF-8"

At this point PostgreSQL cannot create the database for you, as it does not know anything about the German locale specified in the create database command. Apparently my CentOS installation does not have support for this collation. How, then, can you check what the operating system currently provides? The command to use on Linux flavors that use systemd is localectl:

[pg@centos8pg ~]$ localectl list-locales
C.utf8
en_AG
en_AU
en_AU.utf8
en_BW
en_BW.utf8
en_CA
en_CA.utf8
en_DK
en_DK.utf8
en_GB
en_GB.iso885915
en_GB.utf8
en_HK
en_HK.utf8
en_IE
en_IE.utf8
en_IE@euro
en_IL
en_IN
en_NG
en_NZ
en_NZ.utf8
en_PH
en_PH.utf8
en_SC.utf8
en_SG
en_SG.utf8
en_US
en_US.iso885915
en_US.utf8
en_ZA
en_ZA.utf8
en_ZM
en_ZW
en_ZW.utf8

Given that output I can only use the various “en_*” and the “C” collations but nothing else. Let’s try to use a collation from the list above:

postgres=# create database tmpX with lc_collate='en_NZ.utf8';
ERROR:  new collation (en_NZ.utf8) is incompatible with the collation of the template database (en_US.UTF-8)
HINT:  Use the same collation as in the template database, or use template0 as template.

That fails because there could already be some data in template1 which is not compatible with the locale specified in the create database command. As template0 is read only (more on the template databases here), this one should be used when you want to create a database using a locale which is different from that of template1:

postgres=# create database tmpX with lc_collate='en_NZ.utf8' template=template0;
CREATE DATABASE
postgres=# create database tmpXX with lc_collate='C' template=template0;
CREATE DATABASE
postgres=# \l
                              List of databases
   Name    | Owner | Encoding |   Collate   |    Ctype    | Access privileges 
-----------+-------+----------+-------------+-------------+-------------------
 postgres  | pg    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | pg    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/pg            +
           |       |          |             |             | pg=CTc/pg
 template1 | pg    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/pg            +
           |       |          |             |             | pg=CTc/pg
 tmp       | pg    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 tmpx      | pg    | UTF8     | en_NZ.utf8  | en_US.UTF-8 | 
 tmpxx     | pg    | UTF8     | C           | en_US.UTF-8 | 
(7 rows)

Basically we can use every locale that the operating system provides, but as there is nothing for German we cannot use it. The question now is how we can install support for other locales on the operating system. For Red Hat 8 or CentOS 8 based systems this is quite easy, as dnf can be used to search for, and then install, the language packs you need:

[pg@centos8pg ~]$ dnf search locale | grep de
Last metadata expiration check: 1:23:33 ago on Wed 04 Mar 2020 01:53:43 AM CET.
perl-Encode-Locale.noarch : Determine the locale encoding
perl-Locale-Codes.noarch : Distribution of modules to handle locale codes
perl-Locale-US.noarch : Two letter codes for state identification in the United States and vice versa
rubygem-locale.noarch : Pure ruby library which provides basic APIs for localization
glibc-langpack-de.x86_64 : Locale data for de
glibc-langpack-de.i686 : Locale data for de
langtable.noarch : Guessing reasonable defaults for locale, keyboard layout, territory, and language.

Once you’ve identified the correct package just install it:

[pg@centos8pg ~]$ sudo dnf install -y glibc-langpack-de

If we ask the operating system for its supported locales once more, the German ones now show up:

[pg@centos8pg ~]$ localectl list-locales | grep de
de_AT
de_AT.utf8
de_AT@euro
de_BE
de_BE.utf8
de_BE@euro
de_CH
de_CH.utf8
de_DE
de_DE.utf8
de_DE@euro
de_IT
de_IT.utf8
de_LI.utf8
de_LU
de_LU.utf8
de_LU@euro

This should give PostgreSQL all it needs to create the database with the German locale. Before we proceed with creating that database, have a look at the pg_collation catalog table:

postgres=# select * from pg_collation ;
  oid  |    collname     | collnamespace | collowner | collprovider | collisdeterministic | collencoding |   collcollate   |    collctype    | collversion 
-------+-----------------+---------------+-----------+--------------+---------------------+--------------+-----------------+-----------------+-------------
   100 | default         |            11 |        10 | d            | t                   |           -1 |                 |                 | 
   950 | C               |            11 |        10 | c            | t                   |           -1 | C               | C               | 
   951 | POSIX           |            11 |        10 | c            | t                   |           -1 | POSIX           | POSIX           | 
 12326 | ucs_basic       |            11 |        10 | c            | t                   |            6 | C               | C               | 
 12327 | C.utf8          |            11 |        10 | c            | t                   |            6 | C.utf8          | C.utf8          | 
 12328 | en_AG           |            11 |        10 | c            | t                   |            6 | en_AG           | en_AG           | 
 12329 | en_AU           |            11 |        10 | c            | t                   |            8 | en_AU           | en_AU           | 
 12330 | en_AU.utf8      |            11 |        10 | c            | t                   |            6 | en_AU.utf8      | en_AU.utf8      | 
 12331 | en_BW           |            11 |        10 | c            | t                   |            8 | en_BW           | en_BW           | 
 12332 | en_BW.utf8      |            11 |        10 | c            | t                   |            6 | en_BW.utf8      | en_BW.utf8      | 
 12333 | en_CA           |            11 |        10 | c            | t                   |            8 | en_CA           | en_CA           | 
 12334 | en_CA.utf8      |            11 |        10 | c            | t                   |            6 | en_CA.utf8      | en_CA.utf8      | 
 12335 | en_DK           |            11 |        10 | c            | t                   |            8 | en_DK           | en_DK           | 
 12336 | en_DK.utf8      |            11 |        10 | c            | t                   |            6 | en_DK.utf8      | en_DK.utf8      | 
 12337 | en_GB           |            11 |        10 | c            | t                   |            8 | en_GB           | en_GB           | 
 12338 | en_GB.iso885915 |            11 |        10 | c            | t                   |           16 | en_GB.iso885915 | en_GB.iso885915 | 
 12339 | en_GB.utf8      |            11 |        10 | c            | t                   |            6 | en_GB.utf8      | en_GB.utf8      | 
 12340 | en_HK           |            11 |        10 | c            | t                   |            8 | en_HK           | en_HK           | 
 12341 | en_HK.utf8      |            11 |        10 | c            | t                   |            6 | en_HK.utf8      | en_HK.utf8      | 
 12342 | en_IE           |            11 |        10 | c            | t                   |            8 | en_IE           | en_IE           | 
 12343 | en_IE@euro      |            11 |        10 | c            | t                   |           16 | en_IE@euro      | en_IE@euro      | 
 12344 | en_IE.utf8      |            11 |        10 | c            | t                   |            6 | en_IE.utf8      | en_IE.utf8      | 
 12345 | en_IL           |            11 |        10 | c            | t                   |            6 | en_IL           | en_IL           | 
 12346 | en_IN           |            11 |        10 | c            | t                   |            6 | en_IN           | en_IN           | 
 12347 | en_NG           |            11 |        10 | c            | t                   |            6 | en_NG           | en_NG           | 
 12348 | en_NZ           |            11 |        10 | c            | t                   |            8 | en_NZ           | en_NZ           | 
 12349 | en_NZ.utf8      |            11 |        10 | c            | t                   |            6 | en_NZ.utf8      | en_NZ.utf8      | 
 12350 | en_PH           |            11 |        10 | c            | t                   |            8 | en_PH           | en_PH           | 
 12351 | en_PH.utf8      |            11 |        10 | c            | t                   |            6 | en_PH.utf8      | en_PH.utf8      | 
 12352 | en_SC.utf8      |            11 |        10 | c            | t                   |            6 | en_SC.utf8      | en_SC.utf8      | 
 12353 | en_SG           |            11 |        10 | c            | t                   |            8 | en_SG           | en_SG           | 
 12354 | en_SG.utf8      |            11 |        10 | c            | t                   |            6 | en_SG.utf8      | en_SG.utf8      | 
 12355 | en_US           |            11 |        10 | c            | t                   |            8 | en_US           | en_US           | 
 12356 | en_US.iso885915 |            11 |        10 | c            | t                   |           16 | en_US.iso885915 | en_US.iso885915 | 
 12357 | en_US.utf8      |            11 |        10 | c            | t                   |            6 | en_US.utf8      | en_US.utf8      | 
 12358 | en_ZA           |            11 |        10 | c            | t                   |            8 | en_ZA           | en_ZA           | 
 12359 | en_ZA.utf8      |            11 |        10 | c            | t                   |            6 | en_ZA.utf8      | en_ZA.utf8      | 
 12360 | en_ZM           |            11 |        10 | c            | t                   |            6 | en_ZM           | en_ZM           | 
 12361 | en_ZW           |            11 |        10 | c            | t                   |            8 | en_ZW           | en_ZW           | 
 12362 | en_ZW.utf8      |            11 |        10 | c            | t                   |            6 | en_ZW.utf8      | en_ZW.utf8      | 
 12363 | en_AU           |            11 |        10 | c            | t                   |            6 | en_AU.utf8      | en_AU.utf8      | 
 12364 | en_BW           |            11 |        10 | c            | t                   |            6 | en_BW.utf8      | en_BW.utf8      | 
 12365 | en_CA           |            11 |        10 | c            | t                   |            6 | en_CA.utf8      | en_CA.utf8      | 
 12366 | en_DK           |            11 |        10 | c            | t                   |            6 | en_DK.utf8      | en_DK.utf8      | 
 12367 | en_GB           |            11 |        10 | c            | t                   |           16 | en_GB.iso885915 | en_GB.iso885915 | 
 12368 | en_GB           |            11 |        10 | c            | t                   |            6 | en_GB.utf8      | en_GB.utf8      | 
 12369 | en_HK           |            11 |        10 | c            | t                   |            6 | en_HK.utf8      | en_HK.utf8      | 
 12370 | en_IE           |            11 |        10 | c            | t                   |            6 | en_IE.utf8      | en_IE.utf8      | 
 12371 | en_NZ           |            11 |        10 | c            | t                   |            6 | en_NZ.utf8      | en_NZ.utf8      | 
 12372 | en_PH           |            11 |        10 | c            | t                   |            6 | en_PH.utf8      | en_PH.utf8      | 
 12373 | en_SC           |            11 |        10 | c            | t                   |            6 | en_SC.utf8      | en_SC.utf8      | 
 12374 | en_SG           |            11 |        10 | c            | t                   |            6 | en_SG.utf8      | en_SG.utf8      | 
 12375 | en_US           |            11 |        10 | c            | t                   |           16 | en_US.iso885915 | en_US.iso885915 | 
 12376 | en_US           |            11 |        10 | c            | t                   |            6 | en_US.utf8      | en_US.utf8      | 
 12377 | en_ZA           |            11 |        10 | c            | t                   |            6 | en_ZA.utf8      | en_ZA.utf8      | 
 12378 | en_ZW           |            11 |        10 | c            | t                   |            6 | en_ZW.utf8      | en_ZW.utf8      | 
(56 rows)

You will not see any of the collations you installed after the cluster was created. To actually make them visible in the catalog you need to manually import the collations from the operating system:

postgres=# select pg_import_system_collations('pg_catalog');
 pg_import_system_collations 
-----------------------------
                          24
(1 row)

You actually can create databases with new collations without importing them into the catalog table, but we recommend keeping the catalog and the collations provided by the operating system in sync. Asking the catalog table once more, the new "de_*" collations now show up:

postgres=# select * from pg_collation where collname ~ '^de_';
  oid  |  collname  | collnamespace | collowner | collprovider | collisdeterministic | collencoding | collcollate | collctype  | collversion 
-------+------------+---------------+-----------+--------------+---------------------+--------------+-------------+------------+-------------
 16386 | de_AT      |            11 |        10 | c            | t                   |            8 | de_AT       | de_AT      | 
 16387 | de_AT@euro |            11 |        10 | c            | t                   |           16 | de_AT@euro  | de_AT@euro | 
 16388 | de_AT.utf8 |            11 |        10 | c            | t                   |            6 | de_AT.utf8  | de_AT.utf8 | 
 16389 | de_BE      |            11 |        10 | c            | t                   |            8 | de_BE       | de_BE      | 
 16390 | de_BE@euro |            11 |        10 | c            | t                   |           16 | de_BE@euro  | de_BE@euro | 
 16391 | de_BE.utf8 |            11 |        10 | c            | t                   |            6 | de_BE.utf8  | de_BE.utf8 | 
 16392 | de_CH      |            11 |        10 | c            | t                   |            8 | de_CH       | de_CH      | 
 16393 | de_CH.utf8 |            11 |        10 | c            | t                   |            6 | de_CH.utf8  | de_CH.utf8 | 
 16394 | de_DE      |            11 |        10 | c            | t                   |            8 | de_DE       | de_DE      | 
 16395 | de_DE@euro |            11 |        10 | c            | t                   |           16 | de_DE@euro  | de_DE@euro | 
 16396 | de_DE.utf8 |            11 |        10 | c            | t                   |            6 | de_DE.utf8  | de_DE.utf8 | 
 16397 | de_IT      |            11 |        10 | c            | t                   |            8 | de_IT       | de_IT      | 
 16398 | de_IT.utf8 |            11 |        10 | c            | t                   |            6 | de_IT.utf8  | de_IT.utf8 | 
 16399 | de_LI.utf8 |            11 |        10 | c            | t                   |            6 | de_LI.utf8  | de_LI.utf8 | 
 16400 | de_LU      |            11 |        10 | c            | t                   |            8 | de_LU       | de_LU      | 
 16401 | de_LU@euro |            11 |        10 | c            | t                   |           16 | de_LU@euro  | de_LU@euro | 
 16402 | de_LU.utf8 |            11 |        10 | c            | t                   |            6 | de_LU.utf8  | de_LU.utf8 | 
 16403 | de_AT      |            11 |        10 | c            | t                   |            6 | de_AT.utf8  | de_AT.utf8 | 
 16404 | de_BE      |            11 |        10 | c            | t                   |            6 | de_BE.utf8  | de_BE.utf8 | 
 16405 | de_CH      |            11 |        10 | c            | t                   |            6 | de_CH.utf8  | de_CH.utf8 | 
 16406 | de_DE      |            11 |        10 | c            | t                   |            6 | de_DE.utf8  | de_DE.utf8 | 
 16407 | de_IT      |            11 |        10 | c            | t                   |            6 | de_IT.utf8  | de_IT.utf8 | 
 16408 | de_LI      |            11 |        10 | c            | t                   |            6 | de_LI.utf8  | de_LI.utf8 | 
 16409 | de_LU      |            11 |        10 | c            | t                   |            6 | de_LU.utf8  | de_LU.utf8 | 
(24 rows)

Creating the new database using one of the "de_*" collations does work now:

postgres=# create database tmp2 with LC_COLLATE = "de_DE.UTF-8" template=template0;
CREATE DATABASE
postgres=# \l
                              List of databases
   Name    | Owner | Encoding |   Collate   |    Ctype    | Access privileges 
-----------+-------+----------+-------------+-------------+-------------------
 postgres  | pg    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | pg    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/pg            +
           |       |          |             |             | pg=CTc/pg
 template1 | pg    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/pg            +
           |       |          |             |             | pg=CTc/pg
 tmp       | pg    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 tmp2      | pg    | UTF8     | de_DE.UTF-8 | en_US.UTF-8 | 
 tmpx      | pg    | UTF8     | en_NZ.utf8  | en_US.UTF-8 | 
 tmpxx     | pg    | UTF8     | C           | en_US.UTF-8 | 

With PostgreSQL you can also specify the collation at the column level, so we can now have a column using an "en" collation and another one using a "de" collation:

postgres=# create table t1 ( a text COLLATE "de_DE.utf8"
postgres(#                 , b text COLLATE "en_US.utf8"
postgres(#                 );
CREATE TABLE
postgres=# \d t1
                Table "public.t1"
 Column | Type | Collation  | Nullable | Default 
--------+------+------------+----------+---------
 a      | text | de_DE.utf8 |          | 
 b      | text | en_US.utf8 |          | 
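
Since the two columns now carry different collations, sorting can behave differently per column, and it can also be overridden per query (a sketch; the sample rows are hypothetical and the resulting order depends on the collation rules):

insert into t1 values ('Müller','Müller'), ('Mueller','Mueller');
select a from t1 order by a;                      -- sorted with the column's de_DE.utf8 collation
select a from t1 order by a collate "en_US.utf8"; -- the collation can be overridden for this query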

If you want to use another collation which is currently not provided by the operating system, the same rules apply as above:

postgres=# create table t1 ( a text COLLATE "de_DE.utf8"
postgres(#                 , b text COLLATE "en_US.utf8"
postgres(#                 , c text COLLATE "fr_FR.utf8"
postgres(#                 );
ERROR:  collation "fr_FR.utf8" for encoding "UTF8" does not exist
LINE 3:                 , c text COLLATE "fr_FR.utf8"

You need to install operating system support for that locale before you can use it. The same applies to queries, of course:

postgres=# SELECT a < b COLLATE "de_DE" FROM t1;
 ?column? 
----------
(0 rows)

postgres=# SELECT a < b COLLATE "fr_FR" FROM t1;
ERROR:  collation "fr_FR" for encoding "UTF8" does not exist
LINE 1: SELECT a < b COLLATE "fr_FR" FROM t1;
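
Following the same steps as for the German locale, support for French could be added like this (a sketch; the package name is assumed by analogy with glibc-langpack-de, and the new collations still need to be imported into the catalog afterwards):

[pg@centos8pg ~]$ sudo dnf install -y glibc-langpack-fr
[pg@centos8pg ~]$ localectl list-locales | grep ^fr
[pg@centos8pg ~]$ /u01/app/postgres/product/12/db_0/bin/psql -p 9999 postgres -c "select pg_import_system_collations('pg_catalog')"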

That’s it for the basics. More to come in another post.

The article Collations in PostgreSQL – basics first appeared on Blog dbi services.
