Feed aggregator

performance impact of CLOB's in table

Tom Kyte - Sat, 2017-08-19 17:46
Hi, If I create a table of 20 columns and 2 of them are clob datatype. It seems that selection from that table is very slow even I am not selecting the clob data. Is it possible. Any example to check this.
Categories: DBA Blogs

performance tuning

Tom Kyte - Fri, 2017-08-18 23:26
Hi , My question is about performance tuning , i faced issue where most of session was waiting on enq:tm contention around 80%+ timing was utilize for this wait event and rest was enq:tx row lock contention blocking session in my ash report ...
Categories: DBA Blogs

How to eliminate TABLE ACCESS FULL

Tom Kyte - Fri, 2017-08-18 23:26
> I am running following query : select hh.cash_receipt_id, hh.gl_date, hh.amount, hh.status, hh.account_code_combination_id, h...
Categories: DBA Blogs

Sql Plan Managment (SPM) 12c - Evolve Procedure stuck on some SQL's

Tom Kyte - Fri, 2017-08-18 23:26
Hi Tom, we introduced SPM since 11g and now migrated to 12c (where SPM get envolved during the main. window) Our evolve procedure stuck however on some certain statements so that after 4 hrs (during of the main window) the SPM EVOLVE get interrupted...
Categories: DBA Blogs

Character conversion from one character set to another

Tom Kyte - Fri, 2017-08-18 23:26
How to know whether utf16 to utf8 character conversion allowed using convert function, similarly F7DEC to US7ASCII etc. ?
Categories: DBA Blogs

Unsupported type PL/SQL datatype in dot net program

Tom Kyte - Fri, 2017-08-18 23:26
Hi there, I would like to ask a question regarding upgrading Oracle version in the future. Last time we do an upgrade from 11g to 12c and I found there is an error of ?Unsupported PL/SQL datatype? in our dot net program. But somehow, we managed to...
Categories: DBA Blogs

Fundamentals: Create Trigger and Create Sequence

Tom Kyte - Fri, 2017-08-18 23:26
I do not have DBA status so, I have to ask for privileges when needed in developing an Oracle Back End with MS ACCESS 2010 front end. I now have CREATE TRIGGER privilege. I realize I have to ask for the Create Sequence privilege as well. To all...
Categories: DBA Blogs

Linux Containers - Container LInux: Core OS - a minimal linux? A quick overview: diskspace, mounts, commands, kernel, ...

Dietrich Schroff - Fri, 2017-08-18 13:43
Virtualization techniques can be grouped into two categories
  1. os virtualization like VMware, Virtualbox, etc.
  2. container virtualization like Docker, etc.

Everybody is familiar with the first category. Creation of virtual machines is well known and most of us have created already tens or hundreds of such machines. (and deleted and lost them after some weeks). One of the benefits is, that it is easy to run additional machines on our laptops / workstations.
The second category is rarely used on laptops / workstations. This is something for computer rooms or data centers only.

Both categories reduce the amount of hardware needed - but management of all theses virtual entities has still to be done.

But for some workloads solutions are proposed like Kubernetes or Apache Mesos. Never heard about? ;-)

Kubernetes orchestrates containers and Core OS is providing a minimal Container Linux operating system, which can be used by Kubernetes.
Here my first expiriences with this minimal Linux system (for installation please refer to this posting):
$ uname -a
Linux localhost 4.11.9-coreos #1 SMP ......
Diskspace used is only 244 megabytes:

Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        361M     0  361M   0% /dev
tmpfs           499M     0  499M   0% /dev/shm
tmpfs           499M   13M  486M   3% /run
tmpfs           499M     0  499M   0% /sys/fs/cgroup
tmpfs           499M  8.0M  491M   2% /
/dev/loop0      244M  244M     0 100% /usr
tmpfs           499M     0  499M   0% /media
tmpfs           499M     0  499M   0% /tmp
tmpfs           100M     0  100M   0% /run/user/500Filesystems:
tmpfs on / type tmpfs (rw,relatime,seclabel,mode=755)
/usr.squashfs (deleted) on /usr type squashfs (ro,relatime,seclabel)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
tmpfs on /media type tmpfs (rw,nosuid,nodev,noexec,relatime,seclabel)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel)
Available commands /bin (/bin is a softlink to /usr/bin):
core@localhost ~ $ ls /bin
'['                                 ip                 rview
actool                              ipcmk              rvim
arch                                ipcrm              scanelf
arping                              ipcs               scanmacho
attr                                iptables-xml       sclient
aulast                              isosize            scmp_sys_resolver
aulastlog                           join               scp
ausyscall                           journalctl         script
autopoint                           jq                 scriptreplay
auvirt                              js-config          sdiff
awk                                 k5srvutil          sechecker
base32                              kadmin             secon
base64                              kbxutil            sed
basename                            kdestroy           sediff
bash                                kernel-install     seinfo
bashbug                             keyctl             semodule_deps
block-until-url                     kill               semodule_expand
bootctl                             kinit              semodule_link
bsdcat                              klist              semodule_package
bsdcpio                             kmod               semodule_unpackage
bsdiff                              kpasswd            seq
bsdtar                              krb5-config        sesearch
bspatch                             ksba-config        setarch
bunzip2                             ksu                setfacl
busctl                              kswitch            setfattr
bzcat                               ktutil             setsid
bzcmp                               kvno               setterm
bzdiff                              last               sftp
bzegrep                             lastb              sg
bzfgrep                             lastlog            sh
bzgrep                              ldapadd            sha1sum
bzip2                               ldapcompare        sha224sum
bzip2recover                        ldapdelete         sha256sum
bzless                              ldapexop           sha384sum
bzmore                              ldapmodify         sha512sum
c_rehash                            ldapmodrdn         shlibsign
cal                                 ldappasswd         shred
captest                             ldapsearch         shuf
captoinfo                           ldapurl            sim_client
cat                                 ldapwhoami         slabtop
catchsegv                           ldbadd             sleep
cgpt                                ldbdel             sntp
chacl                               ldbedit            sort
chage                               ldbmodify          sotruss
chattr                              ldbrename          split
chcon                               ldbsearch          sprof
chfn                                ldd                sqlite3
chgrp                               lddtree            ssh
chkcon                              less               ssh-add
chmod                               lessecho           ssh-agent
chown                               lesskey            ssh-copy-id
chroot                              lesspipe           ssh-keygen
chrt                                libassuan-config   ssh-keyscan
chsh                                libgcrypt-config   sss_ssh_authorizedkeys
cksum                               liblinear-predict  sss_ssh_knownhostsproxy
clear                               liblinear-train    stat
cmp                                 link               stdbuf
col                                 linux32            strace
colcrt                              linux64            strace-log-merge
colrm                               ln                 stty
column                              locale             su
comm                                localectl          sudo
containerd                          localedef          sudoedit
containerd-shim                     locksmithctl       sudoreplay
coredumpctl                         logger             sum
coreos-cloudinit                    login              symcryptrun
coreos-install                      loginctl           symtree
coreos-metadata                     logname            sync
cp                                  look               systemctl
csplit                              ls                 systemd-analyze
ctr                                 lsattr             systemd-ask-password
curl                                lsblk              systemd-cat
curl-config                         lscpu              systemd-cgls
cut                                 lsipc              systemd-cgtop
date                                lslocks            systemd-delta
dbus-binding-tool                   lslogins           systemd-detect-virt
dbus-cleanup-sockets                lsmod              systemd-escape
dbus-daemon                         lsns               systemd-firstboot
dbus-launch                         lsof               systemd-hwdb
dbus-monitor                        lsscsi             systemd-inhibit
dbus-run-session                    lsusb              systemd-machine-id-setup
dbus-send                           lzcat              systemd-mount
dbus-test-tool                      lzcmp              systemd-notify
dbus-update-activation-environment  lzdiff             systemd-nspawn
dbus-uuidgen                        lzegrep            systemd-path
dd                                  lzfgrep            systemd-resolve
delv                                lzgrep             systemd-run
df                                  lzless             systemd-socket-activate
dgawk                               lzma               systemd-stdio-bridge
diff                                lzmadec            systemd-sysusers
diff3                               lzmainfo           systemd-tmpfiles
dig                                 lzmore             systemd-tty-ask-password-agent
dir                                 machinectl         systemd-umount
dircolors                           makedb             tabs
dirmngr                             mayday             tac
dirmngr-client                      mcookie            tail
dirname                             md5sum             tailf
dmesg                               mkdir              tar
dnsdomainname                       mkfifo             taskset
dnssec-dsfromkey                    mknod              tdbbackup
dnssec-importkey                    mkpasswd           tdbdump
dnssec-keyfromlabel                 mktemp             tdbrestore
dnssec-keygen                       modinfo            tdbtool
dnssec-revoke                       more               tee
dnssec-settime                      mount              test
dnssec-signzone                     mountpoint         tic
dnssec-verify                       mpicalc            tickadj
docker                              msgattrib          timedatectl
docker-containerd                   msgcat             timeout
docker-containerd-shim              msgcmp             tload
docker-proxy                        msgcomm            toe
docker-runc                         msgconv            toolbox
dockerd                             msgen              top
domainname                          msgexec            touch
du                                  msgfilter          tpmd
dumpelf                             msgfmt             tpmown
dumpsexp                            msggrep            tput
e2size                              msginit            tr
echo                                msgmerge           true
egrep                               msgunfmt           truncate
eject                               msguniq            tset
env                                 mv                 tsort
envsubst                            namei              tty
etcd2                               ncat               tzselect
etcdctl                             ncurses5-config    udevadm
ex                                  ncursesw5-config   ul
expand                              net                umount
expiry                              netcap             uname
expr                                netstat            uname26
factor                              networkctl         uncompress
faillog                             newgidmap          unexpand
fallocate                           newgrp             uniq
false                               newrole            unlink
fgrep                               newuidmap          unlzma
filecap                             ngettext           unshare
find                                nice               unxz
findcon                             nisdomainname      unzip
findmnt                             nl                 unzipsfx
fleet                               nohup              update-ssh-keys
fleetctl                            nproc              update_engine_client
fleetd                              npth-config        uptime
flock                               nsenter            usb-devices
fmt                                 nslookup           usbhid-dump
fold                                nspr-config        users
free                                nss-config         utmpdump
funzip                              nsupdate           uuclient
gapplication                        ntp-keygen         uuidgen
gawk                                ntpdc              vdir
gdbus                               ntpq               vi
gencat                              ntptime            view
getconf                             numfmt             vim
getent                              od                 vimdiff
getfacl                             oldfind            vmstat
getfattr                            openssl            w
getopt                              passwd             watch
gettext                             paste              watchgnupg
gettext.sh                          pathchk            wc
gettextize                          pcap-config        wdctl
gflags_completions.sh               pcprofiledump      wget
gio-querymodules                    pcre-config        whereis
git                                 pcregrep           which
git-cvsserver                       pcretest           who
git-receive-pack                    pg                 whoami
git-shell                           pgawk              whois
git-upload-archive                  pgrep              x86_64
git-upload-pack                     pidof              x86_64-cros-linux-gnu-curl-config
glib-compile-resources              pinentry           x86_64-cros-linux-gnu-gpg-error-config
glib-compile-schemas                pinentry-curses    x86_64-cros-linux-gnu-krb5-config
glib-genmarshal                     pinentry-tty       x86_64-cros-linux-gnu-libgcrypt-config
glib-gettextize                     ping               x86_64-cros-linux-gnu-nspr-config
gmsgfmt                             ping6              x86_64-cros-linux-gnu-nss-config
gobject-query                       pinky              x86_64-cros-linux-gnu-pcre-config
gpasswd                             pkaction           x86_64-cros-linux-gnu-xml2-config
gpg                                 pkcheck            xargs
gpg-agent                           pkexec             xenstore
gpg-connect-agent                   pkill              xgettext
gpg-error                           pkttyagent         xml2-config
gpg-error-config                    pldd               xmlcatalog
gpg2                                pmap               xmllint
gpgconf                             pr                 xmlwf
gpgparsemail                        printenv           xtrace
gpgscm                              printf             xz
gpgsm                               prlimit            xzcat
gpgtar                              protoc             xzcmp
gpgv                                ps                 xzdec
gpgv2                               pscap              xzdiff
grep                                pspax              xzegrep
gresource                           ptx                xzfgrep
groups                              pwd                xzgrep
gsettings                           pwdx               xzless
gss-client                          quota              xzmore
gtester                             quotasync          yes
gunzip                              randstat           ypdomainname
gzexe                               rbash              zcat
gzip                                readlink           zcmp
head                                realpath           zdiff
hexdump                             recode-sr-latin    zdump
hmac256                             rename             zegrep
host                                renice             zfgrep
hostid                              replcon            zforce
hostname                            reset              zgrep
hostnamectl                         rev                zic
i386                                rkt                zip
iconv                               rm                 zipcloak
id                                  rmdir              zipgrep
ifconfig                            rngtest            zipinfo
igawk                               rootdev            zipnote
indexcon                            route              zipsplit
infocmp                             rpcgen             zless
infotocap                           rsync              zmore
install                             runc               znew
ionice                              runcon
 Available commands /sbin (/sbin is a softlink to /usr/sbin):
core@localhost ~ $ ls /sbin/
adcli                getfilecon         modinfo                          sm-notify
addgnupghome         getpcaps           modprobe                         ss
addpart              getpidcon          mount.nfs                        sserver
agetty               getsebool          mount.nfs4                       sshd
applygnupgdefaults   getseuser          mpathpersist                     sss_cache
arp                  groupadd           multipath                        sss_debuglevel
auditctl             groupdel           multipathd                       sss_groupadd
augenrules           groupmems          nameif                           sss_groupdel
autrace              groupmod           newusers                         sss_groupmod
avcstat              grpck              nfsidmap                         sss_groupshow
badblocks            grpconv            nfsstat                          sss_override
blkdeactivate        grpunconv          nologin                          sss_seed
blkdiscard           gss-server         nscd                             sss_useradd
blkid                halt               nstat                            sss_userdel
blockdev             hwclock            ntpd                             sss_usermod
brctl                iconvconfig        ntpdate                          sssd
bridge               ifcfg              nvme                             start-statd
btrfs                ifstat             open_init_pty                    sulogin
btrfs-convert        init               osd_login                        swaplabel
btrfs-debug-tree     insmod             pam_tally                        swapoff
btrfs-find-root      ip6tables          pam_tally2                       swapon
btrfs-image          ip6tables-apply    pam_timestamp_check              switch_root
btrfs-map-logical    ip6tables-restore  parted                           sysctl
btrfs-select-super   ip6tables-save     partprobe                        tc
btrfs-show-super     ipmaddr            partx                            tcsd
btrfs-zero-log       ipset              pdata_tools                      telinit
btrfsck              iptables           pivot_root                       testsaslauthd
btrfstune            iptables-apply     plipconfig                       thin_check
cache_check          iptables-restore   pluginviewer                     thin_dump
cache_dump           iptables-save      policyvers                       thin_metadata_size
cache_metadata_size  iptunnel           poweroff                         thin_repair
cache_repair         iscsi-iname        pvchange                         thin_restore
cache_restore        iscsi_discovery    pvck                             thin_rmap
capsh                iscsiadm           pvcreate                         togglesebool
cfdisk               iscsid             pvdisplay                        tpm_hostpolicy
cgdisk               iscsistart         pvmove                           traceroute6
chcpu                kadmin.local       pvremove                         tune2fs
checkarray           kadmind            pvresize                         umount.nfs
chgpasswd            kdb5_util          pvs                              umount.nfs4
chpasswd             kdump              pvscan                           unix_chkpwd
compute_av           kexec              pwcheck                          unix_update
compute_create       key.dns_resolver   pwck                             update-ca-certificates
compute_member       kpartx             pwconv                           update_engine
compute_relabel      kprop              pwunconv                         update_engine_stub
compute_user         kpropd             quot                             useradd
consoletype          kproplog           quotacheck                       userdel
convertquota         krb5-send-pr       quotaoff                         usermod
coreos-postinst      krb5kdc            quotaon                          uuidd
coreos-setgoodroot   ldattach           quotastats                       uuserver
coreos-tmpfiles      ldconfig           rarp                             veritysetup
cryptsetup           lnstat             raw                              vgcfgbackup
ctrlaltdel           load_policy        readprofile                      vgcfgrestore
ctstat               logoutd            realm                            vgchange
debugfs              logrotate          reboot                           vgck
delpart              logsave            repquota                         vgconvert
depmod               losetup            request-key                      vgcreate
dhcpcd               lshw               resize2fs                        vgdisplay
dmeventd             lsmod              resizepart                       vgexport
dmsetup              lspci              restorecon                       vgextend
dmstats              lvchange           rmmod                            vgimport
dumpe2fs             lvconvert          rngd                             vgimportclone
e2freefrag           lvcreate           routef                           vgmerge
e2fsck               lvdisplay          routel                           vgmknodes
e2image              lvextend           rpc.idmapd                       vgreduce
e2label              lvm                rpc.mountd                       vgremove
e2undo               lvmchange          rpc.nfsd                         vgrename
e4defrag             lvmconf            rpc.rquotad                      vgs
ebtables             lvmconfig          rpc.statd                        vgscan
ebtables-restore     lvmdiskscan        rpcbind                          vgsplit
edquota              lvmdump            rpcdebug                         vigr
era_check            lvmetad            rpcinfo                          vipw
era_dump             lvmsadc            rtacct                           visudo
era_invalidate       lvmsar             rtcwake                          vmcore-dmesg
era_restore          lvreduce           rtmon                            warnquota
ethtool              lvremove           rtpr                             wipefs
exportfs             lvrename           rtstat                           xfs_admin
fdisk                lvresize           run_init                         xfs_bmap
filefrag             lvs                runlevel                         xfs_copy
findfs               lvscan             runuser                          xfs_db
fixparts             matchpathcon       saslauthd                        xfs_estimate
fsadm                mcstransd          sefcontext_compile               xfs_freeze
fsck                 mdadm              selinux_check_securetty_context  xfs_fsr
fsck.btrfs           mdassemble         selinuxexeccon                   xfs_growfs
fsck.cramfs          mdmon              semodule                         xfs_info
fsck.ext2            mii-tool           sestatus                         xfs_io
fsck.ext3            mke2fs             setcap                           xfs_logprint
fsck.ext4            mkfs               setenforce                       xfs_mdrestore
fsck.ext4dev         mkfs.bfs           setfilecon                       xfs_metadump
fsck.minix           mkfs.btrfs         setfiles                         xfs_mkfile
fsck.xfs             mkfs.cramfs        setpci                           xfs_ncheck
fsfreeze             mkfs.ext2          setquota                         xfs_quota
fstrim               mkfs.ext3          setsebool                        xfs_repair
gdisk                mkfs.ext4          sfdisk                           xfs_rtcp
genhomedircon        mkfs.ext4dev       sgdisk                           xqmstats
genl                 mkfs.minix         showmount                        xtables-multi
getcap               mkfs.xfs           shutdown                         zramctl
getconlist           mkhomedir_helper   sim_server
getdefaultcon        mklost+found       slattach
getenforce           mkswap             sln

APEX Interactive Grids - what I've learned so far

Tony Andrews - Fri, 2017-08-18 06:09
My latest project has been building new pages on APEX 5.1, and quite a few of them have involved the new Interactive Grids (IGs).  I've learned I few things about them in the process, which I thought I'd record here both for my own benefit and in case it helps others. They are addictive I've seen a video of a David Peake presentation about IGs where he warns that developers will be tempted to Tony Andrewshttp://www.blogger.com/profile/16750945985361011515noreply@blogger.com1http://tonyandrews.blogspot.com/2017/08/apex-interactive-grids-what-ive-learned.html

Internal workings of APEX

Tom Kyte - Fri, 2017-08-18 05:06
Hi, I have made an APEX application; the objects are in a separate schema. No grants have been issued to any user. Nevertheless, APEX (running as user APEX_PUBLIC_USER) can access the objects. My question is: how? I do not see any system privil...
Categories: DBA Blogs

Advantages of Exadata for a PL/SQL developer.

Tom Kyte - Fri, 2017-08-18 05:06
I'm a PlSql developer currently working on Oracle 12c with Exadata. I came across some of the advantages of exadata as like flash cache and Storage Indexes. However it seems all these concepts are DBA related. Do we have any changes or advantages for...
Categories: DBA Blogs

RR Date format for two digit year

Tom Kyte - Fri, 2017-08-18 05:06
I have a case where RR date format is not returning expected result... to_date('960504','rrmmdd') returns 2096, not 1996. I seem to recall back in Y2k an inti parm that influenced the action of the RR format.... but I can't find it. Am I imagin...
Categories: DBA Blogs

reading single file in a directory -- For UTL_FILE in Oracle EBS application

Tom Kyte - Fri, 2017-08-18 05:06
I am registering this package as concurrent program in Oracle. Here in the directory 'DATA_NOT_RETAIL_INCOME' i will have only one file but I don't know the full name of the file. While registering as concurrent program in oracle how can I pas...
Categories: DBA Blogs

LGWR waiting on "control file sequential read"

Tom Kyte - Fri, 2017-08-18 05:06
Hi TOM, Recently, we faced an issue in one of our production databases where we identified 100s of sessions waiting on library cache locks and the holder was LGWR process which itself was waiting on "control file sequential read". Trying to unders...
Categories: DBA Blogs

DBMS_PARALLEL_EXECUTE.RUN_TASK is running sequentialy

Tom Kyte - Fri, 2017-08-18 05:06
Hello, I'm doing some migration between two databases with user provided subsets. The problem is that the parallel_level, doesn't seems to work. It always run one chunk at time. First the procedure thrown an error saying that I need the Create_...
Categories: DBA Blogs

pivot records based on duplicate values in one column

Tom Kyte - Fri, 2017-08-18 05:06
Hi Tom, Is it possible to pivot only records with duplicate values in a given column. Here is the scenario create table test(id number, cname varchar2(10), cvalue varchar2(10)) insert into test values(1, 'a', '10'); insert into test valu...
Categories: DBA Blogs

How to convert a number column to time

Tom Kyte - Fri, 2017-08-18 05:06
Hello, I have a number column in a table named TXNTIME which the data_type is a number: I want to select the output of this column from a number to a time(HH24:MI:SS). TXNTIME ------- 164741 164742 93935 172350 73016 174742 174743 ....
Categories: DBA Blogs

Dropping of Column Group Statistics

Tom Kyte - Fri, 2017-08-18 05:06
Hello, I have couple of things to be clarified related to extended statistics type Column Group Statistics. When we alter some table column, I think it's always a good idea to drop the extended statistics before we proceed with the change. We...
Categories: DBA Blogs

Team based To Do Management

Anthony Shorten - Thu, 2017-08-17 19:39

One of the interesting discussions I have with customers and partners about the To Do functionality in the Oracle Utilities Application Framework based products is team management. Most partners and customers think that the To Do functionality is limited to one role per To Do type. This is due to the fact that most examples they see in training or in demonstrations shows one role per To Do type. There is "more than meets the eye" to the functionality.

The To Do functionality can be configured in different ways to implement different allocation mechanisms. Let me discuss and alternative configuration that may appeal to some implementations.

  • Create a To Role for each organizational team in your organization. These do not have to be whole parts of your organization, they can simply be groups of people with similar skills or work responsibilities. You decide the numbers of groups and their composition. I will use the word "team" rather than To Do Role in the rest of this article to emphasize the alternative view.
  • By using teams you actually might reduce your maintenance costs as you will probably have less numbers of teams than the number of To Do types to manage. At the moment remember people think that you can only have one team per To Do Type.
  • Allocate people to those teams. Now you have full flexibility here. A person can be a member of any team you wish and of course they can be members of multiple teams (even overlapping ones - more about his later). 
  • Allocate the teams to the To Do Types they will be working on. Now that you have teams you can allocate multiple teams per To Do type. Remember one of the teams should be allocated as the Default so that your algorithms, batch jobs etc have a default to allocate.

Now your implementation will be using teams of people rather than using one role per To Do Type. This means you can allocate to teams (or individuals) and supervisors can manage teams.

Remember the use of a capability in the product is not restricted to what is shown in demonstrations. Think outside the box.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

Rittman Mead Consulting - Thu, 2017-08-17 10:54
How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

I don't trust statistics and personally believe that at least 74% of them are wrong.... but I bet nearly 100% of people with any interest in fantasy (or just any) TV shows are watching the 7th series of Game of Thrones (GoT) by HBO.
If you are one of those, join me in the analysis of the latest tweets regarding the subject. Please be also aware that, if you're not on the latest episode, some spoilers may be revealed by this article. My suggestion is then to go back and watch the episodes first and then come back here for the analysis!

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

If you aren't part of the above group then ¯\_(ツ)_/¯. Still this post contains a lot of details on how to perform analysis on any tweet with Tableau and BigQuery together with Kafka sources and sink configurations. I'll leave to you to find another topic to put all this in practice.

Overall Setup

As described in my previous post on analysing Wimbledon tweets I've used Kafka for the tweet extraction phase. In this case however, instead of querying the data directly in Kafka with Presto, I'm landing the data into a Google BigQuery Table. The last step is optional, since as in last blog I was directly querying Kafka, but in my opinion represents the perfect use case of all technologies: Kafka for streaming and BigQuery for storing and querying data.
The endpoint is represented by Tableau, which has a native connector to BigQuery. The following image represents the complete flow

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

One thing to notice: at this point in time I'm using a on-premises installation of Kafka which I kept from my previous blog. However since source and target are natively cloud application I could easily move also Kafka in the cloud using for example the recently announced Confluent Kafka as a Service.

Now let's add some details about the overall setup.


For the purpose of this blog post I've switched from the original Apache Kafka distribution to the Confluent open source one. I've chosen the Confluent distribution since it includes the Kafka Connect which is

A framework for scalably and reliably streaming data between Apache Kafka and other data systems

Using this framework anybody can write a connector to push data from any system (Source Connector) to Kafka or pull data from it (Sink Connector). This is a list of available connectors developed and maintained either from Confluent or from the community. Moreover Kafka Connect provides the benefit of parsing the message body and storing it in Avro format which makes it easier to access and faster to retrieve.

Kafka Source for Twitter

In order to source from Twitter I've been using this connector. The setup is pretty easy: copy the source folder named kafka-connect-twitter-master under $CONFLUENT_HOME/share/java and modify the file TwitterSourceConnector.properties located under the config subfolder in order to include the connection details and the topics.

The configuration file in my case looked like the following:


# Set these required values

Few things to notice:

  • process.deletes=false: I'll not delete any message from the stream
  • kafka.status.topic=rm.got: I'll write against a topic named rm.got
  • filter.keywords=#got,gameofthrones,stark,lannister,targaryen: I'll take all the tweets with one of the following keywords included. The list could be expanded, this was just a test case.

All the work is done! the next step is to start the Kafka Connect execution via the following call from $CONFLUENT_HOME/share/java/kafka-connect-twitter

$CONFLUENT_HOME/bin/connect-standalone config/connect-avro-docker.properties config/TwitterSourceConnector.properties

I can see the flow of messages in Kafka using the avro-console-consumer command

./bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --property print.key=true --topic twitter --from-beginning

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

You can see (or maybe it's a little bit difficult from the GIF) that the message body was transformed from JSON to AVRO format, the following is an example

"Text":{"string":"RT @haranatom: Daenerys Targaryen\uD83D\uDE0D https://t.co/EGQRvLEPIM"},
Kafka Sink to BigQuery

Once the data is in Kafka, the next step is push it to the selected datastore: BigQuery. I can rely on Kafka Connect also for this task, with the related code written and supported by the community and available in github.

All I had to do is to download the code and change the file kcbq-connector/quickstart/properties/connector.properties

# The name of the BigQuery project to write to
# The name of the BigQuery dataset to write to (leave the '.*=' at the beginning, enter your
# dataset after it)
# The location of a BigQuery service account JSON key file

The changes included:

  • the topic name to source from Kafka
  • the project, dataset and Keyfile which are the connection parameters to BigQuery. Note that the Keyfile is automatically generated when creating a BigQuery service.

After verifying the settings, as per Kafka connect instructions, I had to create the tarball of the connector and extract it's contents

cd /path/to/kafka-connect-bigquery/  
./gradlew clean confluentTarBall
mkdir bin/jar/ && tar -C bin/jar/ -xf bin/tar/kcbq-connector-*-confluent-dist.tar  

The last step is to launch the connector by moving into the kcbq-connector/quickstart/ subfolder and executing


Note that you may need to specify the CONFLUENT_DIR if the Confluent installation home is not in a sibling directory

export CONFLUENT_DIR=/path/to/confluent  

When everything start up without any error a table named rm_got (the name is automatically generated) appears in the BigQuery dataset I defined previously and starts populating.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

A side note: I encountered a Java Heap Space error during the run of the BigQuery sink. This was resolved by increasing the heap space setting of the connector via the following call

export KAFKA_HEAP_OPTS="-Xms512m -Xmx1g"  

BigQuery, based on Dremel's paper, is Google's proposition for an enterprise cloud datawarehouse which combines speed and scalability with separate pricing for storage and compute. If the cost of storage is common knowledge in the IT world, the compute cost is a fairly new concept. What this means is that the cost of the same query can vary depending on how the data is organized. In Oracle terms, we are used to associating the query cost to the one defined in the explain plan. In BigQuery that concept is translated from "performance cost" to also "financial cost" of a query: the more data a single query has to scan, the higher is the cost for it. This makes the work of optimizing data structures not only visible performance wise but also on the financial side.

For the purpose of the blog post, I had almost 0 settings to configure other than creating a Google Cloud Platform, creating a BigQuery project and a dataset.

During the Project creation phase, a Keyfile is generated and stored locally on the computer. This file contains all the credentials needed to connect to BigQuery from any external application, my suggestion is to store it in a secure place.

  "type": "service_account",
  "project_id": "<PROJECT_ID>",
  "private_key_id": "<PROJECT_KEY_ID>",
  "private_key": "<PRIVATE_KEY>",
  "client_email": "<E-MAIL>",
  "client_id": "<ID>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "<URL>"

This file is used in the Kafka sink as we saw above.


Once the data is landed in BigQuery, It's time to analyse it with Tableau!
The Connection is really simple: from Tableau home I just need to select Connect-> To a Server -> Google BigQuery, fill in the connection details and select the project and datasource.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

An important feature to set is the Use Legacy SQL checkbox in the datasource definition. Without this setting checked I wasn't able to properly query the BigQuery datasource. This is due to the fact that "Standard SQL" doesn't support nested columns while Legacy SQL (also known as BigQuery SQL) does, for more info check the related tableau website.

Analysing the data

Now it starts the fun part: analysing the data! The integration between Tableau and BigQuery automatically exposes all the columns of the selected tables together with the correctly mapped datatypes, so I can immediately start playing with the dataset without having to worry about datatype conversions or date formats. I can simply include in the analysis the CreatedAt date and the Number of Records measure (named # of Tweets) and display the number of tweets over time.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

Now I want to analyse where the tweets are coming from. I can use using the the Place.Country or the Geolocation.Latitude and Geolocation.Longitude fields in the tweet detail. Latitute and Longitude are more detailed while the Country is rolled up at state level, but both solutions have the same problem: they are available only for tweets with geolocation activated.

After adding Place.Country and # of Tweets in the canvas, I can then select the map as visualization. Two columns Latitude (generated) and Longitude (generated) are created on the fly mapping the country locations and the selected visualization is shown.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

However as mentioned before, this map shows only a subset of the tweets since the majority of tweets (almost 99%) has no location.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

The fields User.Location and User.TimeZone suffer from a different problem: either are null or the possible values are not coming from a predefined list but are left to the creativity of the account owner which can type whatever string. As you can see, it seems we have some tweets coming from directly from Winterfell, Westeros, and interesting enough... Hogwarts!

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

Checking the most engaged accounts based on User.Name field clearly shows that Daenerys and Jon Snow take the time to tweet between fighting Cercei and the Whitewalkers.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

The field User.Lang can be used to identify the language of the User. However, when analysing the raw data, it can be noticed that there are language splits for regional language settings (note en vs en-gb). We can solve the problem by creating a new field User.Lang.Clean taking only the first part of the string with a formula like

IF  FIND([User.Lang],'-') =0  
    THEN [User.Lang] 

With the interesting result of Italian being the 4th most used language, overtaking portuguese, and showing the high interest in the show in my home country.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

Character and House Analysis

Still with me? So far we've done some pretty basic analysis on top of pre-built fields or with little transformations... now it's time to go deep into the tweet's Text field and check what the people are talking about!

The first thing I wanted to do is check mentions about the characters and related houses. The more a house is mentioned, the more should be relevant correct?
The first text analysis I want to perform was Stark vs Targaryen mention war: showing how many tweets were mentioning both, only one or none of two of the main houses. I achieved it with the below IF statement

IF contains(upper([Text]), 'STARK') AND contains(upper([Text]),'TARGARYEN')  
 THEN 'Both' 
 ELSEIF contains(upper([Text]), 'STARK') 
  THEN 'Stark' 
 ELSEIF contains(upper([Text]), 'TARGARYEN') 
  THEN 'Targaryen' 
 ELSE 'None' 

With the results supporting the house Stark

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

I can do the same at single character level counting the mentions on separate columns like for Jon Snow

IIF(contains(upper([Text]), 'JON')  
OR contains(upper([Text]),'SNOW'), 1,0)  

Note the OR condition since I want to count as mentions both the words JON and SNOW since those can uniquely be referred at the same character. Similarly I can create a column counting the mentions to Arya Stark with the following formula

IIF(contains(upper([Text]), 'ARYA'), 1,0)  

Note in this case I'm filtering only the name (ARYA) since Stark can be a reference to multiple characters (Sansa, Bran ...). I created several columns like the two above for some characters and displayed them in a histogram ordered by # of Mentions in descending order.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

As expected, after looking at the Houses results above, Jon Snow is leading the user mentions with a big margin over the others with Daenerys in second place.

The methods mentioned above however have some big limitations:

  • I need to create a different column for every character/house I want to analyse
  • The formula complexity increases if I want to analyse more houses/characters at the same time

My goal would be to have an Excel file, where I set the research Key (like JON and SNOW) together with the related character and house and mash this data with the BigQuery table.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

The joining key would be like

CONTAINS([BigQuery].[Text], [Excel].[Key]) >0  

Unfortunately Tableau allows only = operators in text joining conditions during data blending making the above syntax impossible to implement. I have now three options:

  • Give Up: Never if there is still hope!
  • Move the Excel into a BigQuery table and resolve the problem there by writing a view on top of the data: works but increases the complexity on BigQuery side, plus most Tableau users will not have write access to related datasources.
  • Find an alternative way of joining the data: If the CONTAINS join is not possible during data-blending phase, I may use it a little bit later...

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

Warning: the method mentioned below is not the optimal performance wise and should be used carefully since it causes data duplication if not handled properly.

Without the option of using the CONTAINS I had to create a cartesian join during data-blending phase. By using a cartesian join every row in the BigQuery table is repeated for every row in the Excel table. I managed to create a cartesian join by simply put a 1-1 condition in the data-blending section.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

I can then apply a filter on the resulting dataset to keep only the BigQuery rows mentioning one (or more) Key from the Excel file with the following formula.


This formula filters the tweet Id where the Excel's [Key] field is contained in the UPPER([Text]) coming from Twitter. Since there are multiple Keys assigned to the same character/house (see Jon Snow with both keywords JON and SNOW) the aggregation for this column is count distinct which in Tableau is achieved with COUNTD formula.
I can now simply drag the Name from the Excel file and the # of Mentions column with the above formula and aggregation method as count distinct.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

The beauty of this solution is that now if I need to do the same graph by house, I don't need to create columns with new formulas, but simply remove the Name field and replace it with Family coming from the Excel file.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

Also if I forgot a character or family I simply need to add the relevant rows in the Excel lookup file and reload it, nothing to change in the formulas.

Sentiment Analysis

Another goal I had in mind when analysing GoT data was the sentiment analysis of tweets and the average sentiment associated to a character or house. Doing sentiment analysis in Tableau is not too hard, since we can reuse already existing packages coming from R.

For the Tableau-R integration to work I had to install and execute the RServe package from a workstation where R was already installed and set the connection in Tableau. More details on this configuration can be found in Tableau documentation

Once configured Tableau to call R functions it's time to analyse the sentiment. I used Syuzhet package (previously downloaded) for this purpose. The Sentiment calculation is done by the following formula:

r<-(get_sentiment(.arg1,method = 'nrc'))",  


  • SCRIPT_INT: The method will return an integer score for each Tweet with positives sentiments having positives scores and negative sentiments negative scores
  • get_sentiment(.arg1,method = 'nrc'): is the function used
  • ATTR([Text]): the input parameter of the function which is the tweet text

At this point I can see the score associated to every tweet, and since that R package uses dictionaries, I limited my research to tweets in english language (filtering on the column User.Lang.Clean mentioned above by en).

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

The next step is to average the sentiment by character, seems an easy step but devil is in the details! Tableau takes the output of the SCRIPT_INT call to R as aggregated metric, thus not giving any visual options to re-aggregate! Plus the tweet Text field must be present in the layout for the sentiment to be calculated otherwise the metric results NULL.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

Fortunately there are functions, and specifically window functions like WINDOW_AVG allowing a post aggregation based of a formula defining the start and end. The other cool fact is that window function work per partition of the data and the start and end of the window can be defined using the FIRST() and LAST() functions.

We can now create an aggregated version of our Sentiment column with the following formula

WINDOW_AVG(FLOAT([Sentiment]), FIRST(), LAST())  

This column will be repeated with the same value for all rows within the same "partition", in this case the character Name.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

Be aware that this solution doesn't re-aggregate the data, we'll still see the data by single tweet Text and character Name. However the metric is calculated at total per character so graphs can be displayed.

I wanted to show a Scatter Plot based on the # of Mentions and Sentiment of each character. With the window functions and the defined above it's as easy as dragging the fields in the proper place and select the scatter plot viz.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

The default view is not very informative since I can't really associate a character to its position in the chart until I go over the related image. Fortunately Tableau allows the definition of custom shapes and I could easily assign character photos to related names.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

If negative mentions for Littlefinger and Cercei was somehow expected, the characters with most negative sentiment are Sansa Stark, probably due to the mysterious letter found by Arya in Baelish room, and Ellaria Sand. On the opposite side we strangely see the Night King and more in general the WhiteWalkers with a very positive sentiment associated to them. Strange, this needs further investigation.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

Deep Dive on Whitewalkers and Sansa

I can create a view per Character with associate tweets and sentiment score and filter it for the WhiteWalkers. Looks like there are great expectations for this character in the next episodes (the battle is coming) which are associated with positive sentiments.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

When analysing the detail of the number of tweets falling in each sentiment score category it's clear why Sansa and Whitewalkers have such a different sentiment average. Both appear as normal distributions, but the center of the Whitewalkers curve is around 1 (positive sentiment) while for Sansa is between -1 and 0 (negative sentiment).

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

This explanation however doesn't give me enough information, and want to understand more about what are the most used words included in tweets mentioning WhiteWalkers or Night King.

Warning: the method mentioned above is not the optimal performance wise and should be used carefully since it causes data duplication if not handled properly.

There is no easy way to do so directly in Tableau, even using R since all the functions expect the output size to be 1-1 with the input, like sentiment score and text.
For this purpose I created a view on top of the BigQuery table directly in Tableau using the New Custom SQL option. The SQL used is the following

SELECT  ID, REPLACE(REPLACE(SPLIT(UPPER(TEXT),' '),'#',''),'@','')  word FROM [Dataset.rm_got]  

The SPLIT function divides the Text field in multiple rows one for every word separated by space. This is a very basic split and can of course be enhanced if needed. On top of it the SQL removes references to # and @. Since the view contains the tweet's Id field, this can be used to join this dataset with the main table.

The graph showing the overall words belonging to characters is not really helpful since the amount of words (even if I included only the ones with more than e chars) is too huge to be analysed properly.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

When analysing the single words in particular tweets I can clearly see that the Whitewalkers sentiment is driven by words like King, Iron, Throne having a positive sentiment. On the other hand Sansa stark is penalized by words like Kill and Fight probably due to the possible troubles with Arya.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

One thing to mention is that the word Stark is classified with a negative sentiment due to the general english dictionary used for the scoring. This affects all the tweets and in particular the average scores of all the characters belonging to the House Stark. A new "GoT" dictionary should be created and used in order to avoid those kind of misinterpretations.

Also when talking about "Game of Thrones", words like Kill or Death can have positive or negative meaning depending on the sentence, a imaginary tweet like

Finally Arya kills Cercei

Should have a positive sentiment for Arya and a negative for Cercei, but this is where automatic techniques of sentiment classification show their limits. Not even a new dictionary could help in this case.

The chart below shows the percentage of words classified with positive (score 1 or 2) or negative (score -1 or -2) for the two selected characters. We can clearly see that Sansa has more negative words than positive as expected while Whitewalkers is on the opposite side.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

Furthermore the overall sentiment for the two characters may be explained by the following graph. This shows for every sentence sentiment category (divided in bins Positive, Neutral, Negative), an histogram based on the count of words by single word sentiment. We can clearly see how words with positive sentiment are driving the Positive sentence category (and the opposite).

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

Finally the last graph shows the words that have mostly impacted the overall positive and negative sentiment for both characters.

How was Game Of Thrones S07 E05? Tweet Analysis with Kafka, BigQuery and Tableau

We can clearly see that Sansa negative sentiment is due to Stark, Hate and Victim. On the other side Whitewalkers positive sentiment is due to words like King (Night King is the character) and Finally probably due to the battle coming in the next episode. As you can see there are also multiple instances of the King word due to different punctualization preceeding or following the world. I stated above that the BigQuery SQL extracting the words via the SPLIT function was very basic, we can now see why. Little enhancements in the function would aggregate properly the words.

Are you still there? Do you wonder what's left? Well there is a whole set of analysis that can be done on top of this dataset, including checking the sentiment behaviour by time during the live event or comparing this week's dataset with the next episode's one. The latter may happen next week so... Keep in touch!

Hope you enjoyed the analysis... otherwise... Dracarys!


Categories: BI & Warehousing


Subscribe to Oracle FAQ aggregator