Hidden Essbase parameters in 11.1.2.4

While upgrading Essbase to 11.1.2.4 (from 11.1.2.3.500) we ran into several problems. I'd like to share some of them.

First, there is (quite predictably) the new QUERYRESULTLIMIT parameter. The default value is 1,000,000, and that turned out to be too low. The parameter limits the number of rows a query can return. It might seem that 1,000,000 is already a lot, more data than any form could display. The catch is that the limit is checked against a forecast of the result size made before the query is executed. In our case that forecast gave wrong estimates (in reality only a few numbers came back), so we had to raise the parameter to 3,000,000.
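For reference, this is roughly how the setting looks in essbase.cfg (app_name and db_name are placeholders; as with most per-database essbase.cfg settings, they can reportedly be omitted to apply the limit server-wide):

; essbase.cfg - raise the pre-execution estimate limit on query results
QUERYRESULTLIMIT app_name db_name 3000000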

Second, the hybrid calculation mode. You can tell that Essbase performs calculations in hybrid mode by entries in its log:

Hybrid Aggregation Mode enabled.
MaxL DML Execution Elapsed Time : [0.23] seconds

However, in release 11.1.2.3.500 hybrid mode does not support all functions. If such a function is encountered after entering hybrid mode, Essbase may drop out of it and perform the rest of the calculation classically, writing nothing about this to the log.
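For context: whether a BSO database enters hybrid mode at all is controlled by the ASODYNAMICAGGINBSO setting in essbase.cfg. A minimal sketch, with placeholder app/db names:

; essbase.cfg - hybrid aggregation for a BSO database
; PARTIAL covers only simple aggregations (no formulas), FULL also runs
; formulas in hybrid mode, NONE disables it; app_name/db_name are placeholders
ASODYNAMICAGGINBSO app_name db_name FULL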
We had one such form, which used a shared hierarchy with functions because it calculated headcount:


SELECT {{CROSSJOIN({[Period].[DEC],[Period].[YearTotal]},CROSSJOIN({[Level].[LV_Total]},{[Currency].[RC_RUB]}))}} ON COLUMNS,
NONEMPTYBLOCK {CROSSJOIN({Descendants([BudOwnerCONS].[BO_PnL],[BudOwnerCONS].Levels(0))},CROSSJOIN({Descendants([LegalEntity].[LE_PnL],[LegalEntity].Levels(0))},CROSSJOIN({Descendants([CostCenterCONS].[CC_PnL],[CostCenterCONS].Levels(0))},CROSSJOIN({Descendants([Add].[Add_PnL],[Add].Levels(0))},CROSSJOIN({Descendants([Program].[PRG_PnL],[Program].Levels(0))},CROSSJOIN({Descendants([Service].[SE_PnL],[Service].Levels(0))},{Descendants([Reserved].[LO_PnL],[Reserved].Levels(0))}))))))} ON ROWS
FROM H_ALLOC.H_ALLOC
WHERE ([Years].[FY16],[Scenario].[BUD],[Version].[VR_Work],[AccountCONS].[XSTATHEADT],[Channel].[CH_C0],[LegalEntityICP].[ICO_0000],[ReservedALLOC].[R2_ALLOC_NA],[Contractor].[CO_NONE],[Detail].[DET_NA])

We learned about this peculiarity of hybrid mode only after moving to 11.1.2.4 🙂 In that release the list of functions supported in hybrid mode was extended considerably, so the system switched into hybrid mode and stayed there! The query ran endlessly and returned no data (we deliberately built such a test case to describe the situation properly in the Service Request).
After a thorough analysis we received a rather interesting recommendation from support: set the parameter

__QPDF_SIMU app_name db_name 128

This is a hidden Essbase parameter! We are all used to hidden parameters in Oracle Database, but in Essbase this was the first one I had ever seen 🙂
Technically, the problem with the query was a combination of factors: this MDX could potentially touch up to 7.7*10^13 cells, there was a simple formula on [Period].[YearTotal], plus cross-dimensional dependencies on the Account dimension. The data blocks actually contained no data, so the formula evaluated to an empty set. The parameter value is specific to our outline (i.e. only support can suggest the right value), and I found no description of it, but as far as I can tell it sets cell-count thresholds for switching between the different processing modes (hybrid and classic).
A quick look through the *.so files showed that this is not the only such parameter in Essbase:

# strings /opt/hyperion/EPMSystem11R1/products/Essbase/EssbaseServer/bin/libesscfgu.so | grep __
__ASOFRMLQRYMEM
__ASOFRMLSUBQRYMEM
__ASOFRMLINDXTHRESHOLD1
__ASOFRMLINDXTHRESHOLD2
__ASOFRMLINDXMEMPCT
__ALLOWSMARTCOMPRESSIONCHOICE
__ASO__DISABLE_CORRECTION
__ASOANONYMOUSEXPORT
__ASOCACHEOVERFLOWPERCENT
__COMPRESSIONSTATSFILE
__DLASOFLUSHASYNC
__HEALTHMONITORCRASHAGENT
__MAINTENANCEWARNING
__NO_CALCPARALLEL__
__NOFULLSECFILEWRITE
__NOLOGOUTATRESTRUCT
__NOWARNINGNEWERAPIVERSION
__QPDF_ARR_LIMIT
__QPDF_REVERT_NAMES_FILE
__QPDF_TRACE
__QPDF_SIMU
__RECORDKERNELQUERIES
__SECFILEFAULTINJECTION
__SERVERTHREADS
__SM__ADAPTIVE_PAGE_SPLIT
__SM__DUAL_FILE_OPEN
__SM__HIGH_DIRTY_BLOCK_THRESH
__SM__LOW_DIRTY_BLOCK_THRESH
__SM__MIN_DAT_FRAG_SIZE
__SM__NO_TRANS_MODE
__SM__PAGE_PRESPLIT
__SM__PRE_INIT_BLOCK_RATIO
__SM__SERIAL_MODE
__SM__SWAP_DAT_PH_BUFFERS
__SM__UKRAINE
__USRLICCHECKOUTATSTART
__BSOKRNL1019019__
__UDAOPT__
__SSL_NETDELAY
__NUMBLOCKSTORESERVE__
__ENABLEXREFOPTIMIZATION__
__DLABORTHIGHEXP
__LARGENUMAPAGE
__DISABLELOGMSGINDEXWRITE__
__NODEADLOCKDETECTTIMEOUT__
__HEALTHMONITORCRASHAGENTTHRESHOLDTIME
__NO_SECFILEWRITE_TILLSHUTDOWN__
__NO_SECFILEWRITEFULL_ON_SHUTDOWN__
__SM__HIGH_DIRTY_PHBLOCK_THRESH
__SM__LOW_DIRTY_PHBLOCK_THRESH

I am not sure all of them can be changed via essbase.cfg. But __QPDF_SIMU even has a dedicated companion parameter, __QPDF_TRACE, probably for debugging; however, we never managed to use it properly.

Third, during testing (practically right before installing into production) we discovered that during ad-hoc analysis some data was displayed incorrectly: totals were computed wrongly, or data showed up in the wrong rows. It manifested itself rather strangely: a query did not always misbehave, only after a certain sequence of actions. It turned out that this hidden parameter was to blame, and we had to drop it. A bug was filed, "DYNAMICALLY CALCULATED MEMBERS ARE RETRIEVING DIFFERENT VALUES BETWEEN REFRESH", and the hierarchy had to be reworked to remove the functions from the problematic places.

Such is the story, so far without a happy end.

P.S. Many thanks go to Igor Khatko and Georgiy Likholetov, who dug all of this up.


DBMS Cost-Based Optimizers: Back Story and State of the Art

Query optimizers for relational databases have come a long evolutionary way, growing into complicated algorithms for assessing the costs of various execution options. However, new-generation databases pay almost no attention to query cost. What is it: a step back, or two steps ahead? Do we even need new cost optimizers in the big data world at all?

My article "Стоимостные оптимизаторы для СУБД: вчера и сегодня" ("DBMS cost-based optimizers: yesterday and today") was published in the journal Открытые системы, No. 1, 2016.

In it I try to reason about cost-based optimization in Oracle, Vertica and SparkSQL. It seems the field of cost-based optimization still holds plenty of interesting problems to solve.


iSCSI target for Oracle RAC cluster with SSD and 10 Gb Ethernet

It's an old story already 🙂 More than a year ago we were ripe for migrating from SATA to new SSD disk storage. The problem was that our infrastructure wasn't ready: a local performance test showed 180K IOPS (in 8192-byte blocks), while the remote test (over iSCSI) showed only 18K IOPS. A 10x difference is altogether bad.

After some investigation we found that the bottleneck was the iSCSI target. And since IET is no longer being developed, we decided to move to SCST. Below is our experience of getting the maximum performance out of SCST as an iSCSI target.

1. About the measurements

We used ORION for testing IO throughput. How to do this has been skilfully explained by Alex Gorbachev.

I ran a bunch of experiments, but a single test turned out to be enough:


orion -run advanced -duration 60 -matrix row -num_large 0 -num_disks 300 -simulate raid0 -type rand

It runs for about 96 minutes and gradually (in fact logarithmically, from 1 to 1500) increases the number of parallel workers, so we could see:
1. The throughput of a single session
2. The maximum throughput
3. The system's behaviour: how many parallel sessions it takes to reach the peak, what happens after the peak, how quickly things change, and so on

Other tests might of course be very useful too, but in my case they added nothing new.

2. System-wide settings

Migrating from IET to SCST increased throughput from 18K to 30K IOPS. Good, but not enough. The next step was tuning the OS and the hardware devices. Something similar is described in Tuning 10Gb network cards on Linux.
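The queue and scheduler changes referenced in the graph below were made through sysfs; a minimal sketch (sdb is a placeholder device name, and the values need benchmarking on your own hardware):

# on the target node, for each SSD block device
echo noop > /sys/block/sdb/queue/scheduler     # SSDs don't benefit from request reordering
echo 1024 > /sys/block/sdb/queue/nr_requests   # deepen the request queue
echo 0 > /sys/block/sdb/queue/rotational       # mark the device as non-rotational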

Below are the results of my experiments:

img1

In the picture:

  • Y-axis is the overall throughput in 8K blocks;
  • X-axis is the number of parallel read processes;
  • Green line – disk performance from one host without any changes, roughly 30K IOPS;
  • Red and blue lines – disk performance after changing the queue and scheduler settings (see, for example, tuning the noop scheduler) for the SSD devices on the target nodes, about 45K IOPS;
  • Yellow line – the same as red plus changed ixgbe driver settings: an increased number of parallel processing queues, RSS and VMDQ (see the README in the latest Intel ixgbe driver for details). As you can see, the peak performance is the same, but with a small number of parallel processes (fewer than 20) performance is 5-10K IOPS higher, so we can say the system is more responsive;
  • Brown line – the same as yellow plus NUMA affinity settings (see, for example, IRQ affinity and CPU affinity); as you can see, performance is only slightly better, practically the same 45K IOPS;
  • Sky-blue line – the total performance from 2 initiator hosts and 1 target host. Peak performance is 80K IOPS, but it periodically drops to 60K IOPS;
  • Black line – "raw" performance from the target host, without iSCSI, just for comparison.

3. Tuning the iSCSI initiator

The difference between the raw and the iSCSI devices is almost 2x, which is really huge. After a series of experiments I ran the test from the target itself, so that one host was both initiator and target, and got the same results. That was strange (the network wasn't involved, yet there was such degradation); I was confused and began checking the settings of every component. And at that moment I found Bart Van Assche's post about parallel sessions, and everything fell into place. My problem was not the target but the initiator. According to the open-iscsi site, the maximum performance of one session is 50K IOPS. I had read that but hadn't related it to my configuration – as you can see in the previous graph I even reached 60K IOPS, but that was the theoretical limit for a single open-iSCSI session. The cure is simple: I just changed my SCST config to use multiple targets (and thus multiple sessions, 3 disks per session):


TARGET iqn.2001-04.com.yandex:targettest01e_1.yandex.ru {
        LUN 0 T01L200
        LUN 1 T01L201
        LUN 2 T01L202
}

TARGET iqn.2001-04.com.yandex:targettest01e_2.yandex.ru {
        LUN 0 T01L203
        LUN 1 T01L204
        LUN 2 T01L205
}
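On the initiator side each target then gets its own session; a sketch with open-iscsi (the portal address is a placeholder, the IQNs are the ones from the config above):

# discover both targets on the portal and log in to each of them
iscsiadm -m discovery -t sendtargets -p 192.168.1.10:3260
iscsiadm -m node -T iqn.2001-04.com.yandex:targettest01e_1.yandex.ru -p 192.168.1.10:3260 --login
iscsiadm -m node -T iqn.2001-04.com.yandex:targettest01e_2.yandex.ru -p 192.168.1.10:3260 --login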

And it doubled IOPS, to 100K from one host. Below is the new series of tests (sorry for the extra lines – I made this graph for internal use):

img2

On the graph:

  • Y-axis is the overall throughput in 8K blocks;
  • X-axis is the number of parallel read processes;
  • Blue and red lines – "raw" performance from the target host, without iSCSI, just for comparison;
  • Green and yellow lines – multiple sessions from one host, about 80K IOPS.

4. Changing the Linux kernel

One last strange thing needs mentioning: the overall instability. During the measurements I saw network performance vary from 4 to 8 Gb/s from time to time, and the same can be seen on the graph above (green and yellow lines). Surprisingly, it was solved as easily as the iSCSI initiator case. I just changed the kernel from UEK R2 (2.6.39.400) to UEK R3 (3.8.13), and you can see the result:

  • Sky-blue line – performance with UEK R3, about 100K IOPS.

To sum up, we got 100K IOPS from one host over iSCSI – that is 8 Gb/s, and I think for 10 Gb Ethernet it is very close to the best possible result.


Patches for 12c Oracle Database in EBS environment

After migrating to Oracle Database 12c and OeBS 12.2 we faced a strange problem with finding database patches.

I was used to a simple paradigm: we have our own set of patches. It could be a long list, but problems cropped up rarely enough that I always had plenty of time to investigate each one.

But in 12.2 everything changed. Now we have two sets of patches. The first is the old set that stems from our peculiarities: customizations, hardware and workload. The second is the new set recommended by the OeBS development team (see Oracle E-Business Suite Release 12.2: Consolidated List of Patches and Technology Bug Fixes (Doc ID 1594274.1)).

The problem is the overall number of patches:

1. Our own list of bugs consists of about 10 items, mainly patches for wrong query results (like 18650065: WRONG RESULTS ON QUERY WITH SUBQUERY USING OR EXISTS) and RAC-specific issues (like 19124336: ORA-600 [kjblcwscn:!wr] can occur when handling a stale but valid write completion).
2. The list of OeBS patches (which should be installed too) from Doc ID 1594274.1 consists of 34 bugs and 22 patches (for 12.1.0.2).
3. On top of that, a Patch Set Update is released four times a year. It is highly advisable to install it (and in our case I can't imagine how we would maintain our system without PSUs).

That would all be fine, but some of the patches conflict with each other or with the PSU (and some with both). So to meet all the requirements I have to install the PSU, all the patches that haven't been included in the PSU, overlay patches for those that conflict with the PSU, merge patches for those that conflict with each other, and overlay merge patches for those that conflict with all of the above.
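A candidate patch can also be checked against the current Oracle Home locally with OPatch; a quick sketch (the staging path and patch number are made up):

# run once per unzipped candidate patch; reports which installed patches it clashes with
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /stage/patches/12345678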

I did it once, for the April 2015 PSU: I checked each bug, found all the patches and opened a couple of SRs for the missing ones.

Half a year later I returned to the task and was horrified – the number of bugs had doubled! I started working through the list, but this time I was fortunate: in one SR I met an engineer who gave me the proper list of links. Here it is:

1. There is no need to search for the recommended OeBS patches for each PSU yourself. Just read Database Patch Set Update Overlay Patches Required for Use with PSUs and Oracle E-Business Suite (Doc ID 1147107.1) – in my case all the patches were in 'Table 1 Release 12.1.0.2 PSUs and Patch Lists for EBS r12.2 customers not using the In-Memory option'.

2. There is no need to check the list of patches for conflicts by hand. My Oracle Support has a Conflict Checker Tool that can be used not only for analysis but also for requesting patches. See How to Use the My Oracle Support Conflict Checker Tool for Patches Installed with OPatch [Video] (Doc ID 1091294.1).

So I installed the October 2015 PSU, took my Oracle inventory and the list of patches from item 1 and Doc ID 1147107.1, performed the "Analyze for Conflicts" steps, clicked "Request Patch" and filed a special request for the missing merge overlay patch. I didn't expect it, but my merge patch was ready the next day! And that's all – I just downloaded the patches from the Download table and installed them on my system.


The necessity of the right tool for checking Coherence cluster configuration

I faced a quite interesting problem after upgrading Coherence in my Oracle Access Manager cluster.

The issue appeared right after the upgrade: all the log files of the OAM cluster were full of messages like:


<2015-08-28 12:37:09.878/1513.280 Oracle Coherence GE 3.7.1.13 (thread=Cluster, member=n/a): Delaying formation of a new cluster; IpMonitor failed to verify the reachability of senior Member(Id=1, Timestamp=2015-08-26 13:38:36.265, Address=xxx.xxx.xxx.xxx:9095, MachineId=3353, Location=site:,machine:oam1,process:12886, Role=WeblogicServer); if this persists it is likely the result of a local or remote firewall rule blocking either ICMP pings, or connections to TCP port 7>

At first glance it is a well-known issue rooted in some kind of network problem, as described in Doc ID 1530288.1 IpMontor Failed To Verify The Reachability Of Senior Member. The only problem is that I was absolutely sure there was no firewall between the Coherence instances and that all hosts in the cluster were reachable. Of course, I ran all the recommended tests: the Java ping test from Doc ID 1936105.1, the multicast test from Doc ID 1936452.1 and the datagram test from Doc ID 1936575.1.

I even wrote a small program to check whether the InetAddress.isReachable() method works or not (as described in What Is The Purpose Of IpMonitor In A Coherence Cluster? (Doc ID 1526745.1)).

It’s quite simple:

import java.net.*;

// Checks whether host "oam2" answers an ICMP ping (or a TCP echo probe on port 7)
public class alive {
    public static void main(String[] args) {
        try {
            InetAddress ia = InetAddress.getByName("oam2");
            boolean b = ia.isReachable(10000);   // 10-second timeout
            if (b) {
                System.out.println("Reachable");
            } else {
                System.out.println("Unreachable");
            }
        } catch (Exception e) {
            System.out.println("Exception: " + e.getMessage());
        }
    }
}

What I tried was to check the availability of host oam2 from host oam1. And it worked:


[oracle@oam2 ~]$ /opt/oracle/jrockit/bin/java alive
Reachable

I could also see the ICMP packets in tcpdump:


12:32:40.656815 IP (tos 0x0, ttl 64, id 16265, offset 0, flags [DF], proto ICMP (1), length 72)
oam1 > oam2: ICMP echo request, id 25757, seq 1, length 52
12:32:40.656942 IP (tos 0x60, ttl 64, id 49076, offset 0, flags [none], proto ICMP (1), length 72)
oam2 > oam1: ICMP echo reply, id 25757, seq 1, length 52

Particularly strange was that I couldn't see any network packets (neither ICMP nor TCP port 7) from the working oam1 server, whereas its out file was full of "IpMonitor failed to verify the reachability of senior Member" error messages.

So OAM_Server seemed not to be sending the IpMonitor network packets at all (there was plenty of traffic between the servers on other ports).

After some investigation I found a couple of interesting lines in the strace output:


27348 socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = -1 EPERM (Operation not permitted)
27348 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 665
27348 bind(665, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.128.20.100")}, 16) = 0
27348 connect(665, {sa_family=AF_INET, sin_port=htons(7), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, 16

Here you can see that the program tried to send network packets to a public address from a private address. At that moment it became clear that the matter was our specific network configuration.

But first, let's reproduce the issue. My code was wrong:


InetAddress ia = InetAddress.getByName("oam2");
boolean b = ia.isReachable(10000);

And so were all the Coherence tests (like the Java ping test) – that is not how IpMonitor calls the isReachable method. In Coherence 3.7.1.19 it selects a network interface for sending the ping packet:


InetAddress ia = InetAddress.getByName("oam2");
NetworkInterface netIntfc = NetworkInterface.getByInetAddress(InetAddress.getByName("oam1"));
boolean b = ia.isReachable(netIntfc, 0, 10000);

Let's look into the JDK source code – specifically the InetAddress sources and the isReachable method implementation. If isReachable is called with a NetworkInterface parameter (as Coherence does), it takes the first IPv4 address found on the interface and uses it as the source address:


cat ./java/net/Inet4AddressImpl.java

class Inet4AddressImpl implements InetAddressImpl {
...
    private native boolean isReachable0(byte[] addr, int timeout, byte[] ifaddr, int ttl) throws IOException;
...
    public boolean isReachable(InetAddress addr, int timeout, NetworkInterface netif, int ttl) throws IOException {
        byte[] ifaddr = null;
        if (netif != null) {
            /*
             * Let's make sure we use an address of the proper family
             */
            java.util.Enumeration it = netif.getInetAddresses();
            InetAddress inetaddr = null;
            while (!(inetaddr instanceof Inet4Address) && it.hasMoreElements())
                inetaddr = (InetAddress) it.nextElement();
            if (inetaddr instanceof Inet4Address)
                ifaddr = inetaddr.getAddress();
        }
        return isReachable0(addr.getAddress(), timeout, ifaddr, ttl);
    }

It works on Windows (where an interface can only have addresses from one network), but the Linux kernel has source routing. We have two addresses on one interface: a public one and a private one. The JDK took the first address it found (I checked – the behavior is the same in the latest JDK 8u60), which in our case was the private address, and tried to send packets into the public network. Such packets are normally dropped by the kernel (see Reverse Path Filtering, http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.rpf.html) because they are meaningless. Programs should either not bind a source address at all or use the right public address for sending packets.
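A quick way to check whether strict reverse path filtering is active (eth0 is a placeholder interface name):

# 1 = strict mode: packets whose source address doesn't match the route on that interface are dropped
sysctl net.ipv4.conf.all.rp_filter
sysctl net.ipv4.conf.eth0.rp_filter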

As a workaround I added a rule rewriting the source addresses of those packets.
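A minimal sketch of such a rule with iptables SNAT; 10.128.20.100 is the private address from the strace above, while the public addresses are made-up placeholders:

# rewrite the private source address on packets leaving for the public network
iptables -t nat -A POSTROUTING -s 10.128.20.100 -d 203.0.113.0/24 -j SNAT --to-source 203.0.113.10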

I'm not sure whether it's a bug or a feature, and I also think our case is quite unique. But it reminded me once again of a simple truth: if you test something, you should do it the way you do it in production.

P.S. During the investigation I realized that OAM Coherence is not synonymous with Oracle Coherence (see Doc ID 1579874.1 Questions about OAM 11g Coherence Configuration) 🙂 It has its own settings in oam-config.xml, and they differ from the Oracle Coherence settings.

P.P.S. Java can send ICMP requests instead of TCP packets. You can read about it on stackoverflow or stackexchange. We don't use it.


ACFS and OeBS 12.2

I know it's an unsupported configuration (then again, what configuration is supported?), but if you use ACFS as a shared filesystem for OeBS 12.2 you might face a strange problem when using the new adop functionality.

I ran adop phase=fs_clone and got this error on my system:

ERROR while running FSCloneApply...
ERRORMSG: /opt/acfs/R12.2/fs1/EBSapps/comn/adopclone_appltest01e/bin/adclone.pl did not go through successfully.

[APPLY PHASE]
AutoConfig could not successfully execute the following scripts:
ouicli.pl INSTE8_APPLY 1

ERROR: RC-50013: Fatal: Instantiate driver did not complete successfully.
/opt/acfs/R12.2/fs2/EBSapps/10.1.2/appsutil/driver/regclone.drv
ERROR: RC-50004: Fatal: Error occurred in ApplyAppsTechStack:
RC-50013: Fatal: Failed to instantiate driver /opt/acfs/R12.2/fs2/EBSapps/10.1.2/appsutil/driver/regclone.drv

The true error message can be found in ohclone.log:

Executing command sh -c "/opt/acfs/R12.2/fs2/EBSapps/10.1.2/oui/bin/runInstaller -printdiskusage -ignoreDiskWarning -debug -clone -silent -force -nolink -waitForCompletion -invPtrLoc /etc/oraInst.loc session:ORACLE_HOME=/opt/acfs/R12.2/fs2/EBSapps/10.1.2
...
Error returned: Value too large for defined data type
There is not enough space on the volume you have specified. Oracle Universal Installer has detected that you currently have 0 MB available on the chosen volume. 750 MB of space is required for the software.

It helps, but not much. There is a document on MOS: rapidwiz File System Upgrade Fails With "There is not enough space on the volume you have specified." Even Though There Is Sufficient Space (Doc ID 1942808.1).
It recommends decreasing the size of the NFS-mounted disk from 36 Tb to 2 Tb. The problem is that I have a 7 Tb ext4 volume where everything works great. It looks like an ACFS-specific issue.

After a couple of days of investigation I noticed some strange lines in the strace output:

23174 statfs("/opt/acfs/R12.2/fs2/EBSapps/10.1.2", 0x5621f54c) = -1 EOVERFLOW (Value too large for defined data type)
23174 write(2, "Error returned: Value too large for defined data type\n", 54) = 54

So the error message appears to come from the statfs system call. I reproduced the issue with a simple C program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/vfs.h>    /* statfs() */

int main(int argc, char **argv) {
    char *init_d;
    struct statfs info;
    int rc;

    if (argc < 2) return 1;
    init_d = *++argv;               /* filesystem path from the command line */
    printf("Filesystem to check: %s\n", init_d);
    rc = statfs(init_d, &info);     /* returns 0 on success, -1 on error */
    int errsv = errno;
    printf(" Result: %d error string: %s\n", rc, strerror(errsv));
    printf(" Type of filesystem: %08x\n", (int) info.f_type);
    printf(" Optimal transfer block size: %lld\n", (long long) info.f_bsize);
    printf("Total data blocks in filesystem: %lld\n", (long long) info.f_blocks);
    printf(" Free blocks in filesystem: %lld\n", (long long) info.f_bfree);
    printf(" Free blocks available: %lld\n", (long long) info.f_bavail);
    printf(" Total file nodes in filesystem: %d\n", (int) info.f_files);
    printf(" Free file nodes in filesystem: %d\n", (int) info.f_ffree);
    printf(" Maximum length of filenames: %d\n", (int) info.f_namelen);
    printf(" Fragment size: %d\n", (int) info.f_frsize);
    printf(" Mount flags of filesystem: %d\n", (int) info.f_flags);
    return 0;
}

This code should be compiled as a 32-bit application (built in 64-bit mode it works fine).
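For example, assuming gcc and the 32-bit glibc development packages are installed:

# -m32 builds a 32-bit binary, which reproduces the 32-bit statfs() overflow
gcc -m32 -o test_statfs test_statfs.c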

The error output:

[oracle@appltest01e.oebs.yandex.net ~]$ ./test_statfs /opt/acfs
Filesystem to check: /opt/acfs
Result: -1 error string: Value too large for defined data type
Type of filesystem: 00000000
Optimal transfer block size: 1224734968
Total data blocks in filesystem: 1224734968
Free blocks in filesystem: 1
Free blocks available: 13238272
Total file nodes in filesystem: 0
Free file nodes in filesystem: 15774463
Maximum length of filenames: 20468
Fragment size: 1226387928
Mount flags of filesystem: 0

The real problem here is the number of inodes. I compared ext4 and ACFS: the number of inodes on our ext4 filesystems is usually small and fits into an unsigned long int:

$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/md1 242872320 5583659 237288661 3% /opt

whereas for ACFS:

$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/asm/acfs-116 4294967296 836130432 3458836864 20% /opt/acfs

The differences are very lucidly explained in Doc ID 2026700.1 Does ACFS Use inode architecture?:

ACFS filesystems does not use the inode architecture.
ACFS filesystems does not have a pre-allocated inode table.
Therefore, “df -i” command returns the number of inodes that are theoretically possible given the space remaining.
On ACFS filesystems, the inode table grows dynamically. Any free storage is eligible for inode creation.

The number of inodes in my case (4294967296, i.e. 2^32) does not fit into a 32-bit unsigned long int, and it is twice the size of the disk volume.

So we just resized the ACFS volume:

acfsutil size -100G /opt/acfs

Now everything works fine. There is only one unpleasant limitation – the 2 Tb disk volume limit. I hope the OeBS team changes their mind about ACFS support someday and we will be able to store all the data in one volume.


Upgrade Identity Management

It was spring when we upgraded our IDM system for EBS authentication. A few months have passed now and everything seems to be working fine, so I'd like to share brief notes about the issues we met and the final configuration.

We have a classical 3-node identity management configuration and are now running on:
OEL6 with UEK R3 kernel
Oracle Database 12.1.0.2 + PSU3
Java 1.7_75
Weblogic 10.3.6.11
OAM 11.1.2.2 + PSU5
Webgate 11.1.2.2
OID/OVD 11.1.1.7 + PSU3
AccessGate 1.2.3.4

The most difficult question (for me) was where to find the documentation for the upgrade. Here is what I collected:
Database upgrade guide
Oracle® Fusion Middleware Upgrade Guide for Oracle Identity and Access Management 11g Release 2 (11.1.2.2.0)
Patch Management of an Oracle Identity Management Deployment
OAM Bundle Patch Release History [ID 736372.1]
How to Upgrade OID/ OVD 11.1.1.5 To 11.1.1.7 (IDM PatchSet 6) (Doc ID 1962045.1)
Considerations When Applying Patch Sets to FMW 11g Release 1 Identity Management (Doc ID 1298815.1)
Master Note on Fusion Middleware Proactive Patching – Patch Set Updates (PSUs) and Bundle Patches (BPs) (Doc ID 1494151.1)
Oracle Internet Directory (OID) Version 11.1.1.7 Bundle Patches For Non-Fusion Applications Customers (Doc ID 1614114.1)
Integrating Oracle E-Business Suite Release 12 with Oracle Access Manager 11gR2 (11.1.2) using Oracle E-Business Suite AccessGate [ID 1484024.1]

I cannot say there weren't any problems at all, but the vast majority of the SRs I opened were about OAM. They were mostly related to misconfigurations in the oam-config.xml file, either introduced by the upgrade or left over from old errors. All of that is our local peculiarity and not very interesting.

Everything worked great in the test environment, but the first day after upgrading production was quite strained:

1. It turned out that the IDX_JPS_PARENTDN index in the OID_OPSS schema had been replaced by a composite one: the old non-unique index was extended with the unique key entity_id. As a result the index grew from 300 to 600 MB and the explain plans changed as well. Everything still worked fine for a single client session, but when clients started connecting at the same time, the execution time of search queries increased significantly (because of heavy disk IO) and the "ldapbind" method started timing out. I recreated the old index and the majority of the problems went away.
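For the record, a sketch of what recreating the old index might look like; the underlying table and column names (JPS_DN, PARENTDN) are my assumptions about a typical OPSS schema, so verify the original definition first:

-- hypothetical DDL: restore the old single-column, non-unique index
DROP INDEX OID_OPSS.IDX_JPS_PARENTDN;
CREATE INDEX OID_OPSS.IDX_JPS_PARENTDN ON OID_OPSS.JPS_DN (PARENTDN);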

2. Some queries still used the new (composite) index. After some investigation we noticed the suspiciously high cost in their explain plans. The culprit was the new 12c query re-optimization functionality called "SQL Plan Directives" – Igor Usoltsev has written about it. We simply disabled those SQL plan directives.
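A sketch of how such directives can be located and disabled via the documented DBMS_SPD package (the directive ID below is a placeholder):

-- list directives attached to the schema's objects
SELECT d.directive_id, o.object_name, d.state, d.enabled
  FROM dba_sql_plan_directives d
  JOIN dba_sql_plan_dir_objects o ON o.directive_id = d.directive_id
 WHERE o.owner = 'OID_OPSS';

-- disable one of them
EXEC DBMS_SPD.ALTER_SQL_PLAN_DIRECTIVE(1234567890123456789, 'ENABLED', 'NO');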

3. The HTTP GET responses grew significantly; for some browsers (IE11, for example) they exceeded 8K. We terminate SSL/TLS on the proxy side, and the proxy (we use nginx) drops large packets by default. Large packets can come from both the client and the server side. For nginx we increased the buffer sizes (there are several settings: one for the client side and others for the server side).
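As a sketch, the relevant nginx directives might look like this (the sizes are illustrative, not our production values):

# client side: allow request headers bigger than the default 8k
large_client_header_buffers 4 16k;

# upstream side: allow large response headers from the OAM servers
proxy_buffer_size 16k;
proxy_buffers 8 16k;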

P.S. It would also be useful to read this community thread. We haven't faced the Coherence issues yet – our 3.7.1.1 version works fine – but folks report that 3.7.1.19 is more stable.
