Kubilay Çilkara

Database Systems is a blog about Databases, Oracle, Salesforce and Data of any size and shape.

Apache Phoenix, SQL is getting closer to Big Data

Tue, 2015-06-30 16:50

Here is a post about another project in the Big Data world which, like Apache Hive from my previous post, enables you to do SQL on Big Data. It is called Apache Phoenix.

Phoenix is a bit different, and a bit closer to my heart too. As I read the Apache Phoenix documentation, the words 'algebra' and 'relational algebra' came up a few times, and that can mean only one thing: SQL! The use of the word algebra in the docs gave me a lot of confidence. SQL has closure and is based on a database model with its roots in logic and maths, especially set theory.

Apache Phoenix was developed at Salesforce and is now one of the popular Apache projects. It is a SQL skin on top of HBase, the columnar (NoSQL) database of the Hadoop ecosystem, which is capable of storing very large tables and querying them via 'scans'. HBase is part of the Hadoop ecosystem and usually uses HDFS as its file system. On the client side, Apache Phoenix is accessed through a JDBC driver.

In the race to bring the easiest-to-use tools to Big Data, I think Apache Phoenix is very close. It is the SQL we have known and used since the 1970s. The Apache Phoenix team seems committed to introducing all of the missing parts of SQL, including transaction processing with different isolation levels, making Phoenix a fully operational relational database layer on HBase. Have a look at their roadmap. The amount of current and planned SQL compatibility is remarkable, and this makes me take them seriously:
  • Transactions
  • Cost-based Query Optimization! (Wow)
  • Joins
  • OLAP
  • Subqueries
  • Striving for full SQL-92 compliance
In addition to all this, it is also possible to turn an existing HBase table into an Apache Phoenix table using CREATE TABLE, or even CREATE VIEW, the DDL statements we already know. How handy is that? Suddenly you can SQL-enable your existing HBase database!
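For instance, here is a hedged sketch of what that DDL looks like. The table name "t1", the column family "cf" and the qualifier "name" are made-up examples, not from an actual setup:

```sql
-- Map an existing HBase table into Phoenix as a view.
-- The view name must match the HBase table name, and the
-- primary key maps onto the HBase row key.
CREATE VIEW "t1" (
    pk VARCHAR PRIMARY KEY,      -- the HBase row key
    "cf"."name" VARCHAR          -- an existing column family/qualifier
);

-- The HBase data is now queryable with plain SQL:
SELECT * FROM "t1" WHERE "cf"."name" = 'some value';
```

CREATE VIEW leaves the underlying HBase table untouched; CREATE TABLE would let Phoenix manage it.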

How to install and use Phoenix

The SQL skin can be installed into an existing Hadoop HBase installation very quickly. All you need to do is download and extract the tarball. You can set up a standalone Hadoop environment (see my previous blog post for that), then install HBase and Apache Phoenix.

Once the Apache Phoenix software is installed, you can start it and query it with SQL like this.

From within the bin/ directory of the Phoenix install directory, run:

$ ./sqlline.py  localhost

That will bring you to the phoenix prompt

0: jdbc:phoenix:localhost> select * from mytable;

Categories: DBA Blogs

Hive (HiveQL) SQL for Hadoop Big Data

Thu, 2015-06-25 14:30

In this post I will share my experience with an Apache Hadoop component called Hive, which enables you to do SQL on an Apache Hadoop Big Data cluster.

Being a great fan of SQL and relational databases, this was my opportunity to set up a mechanism to transfer some (well, a lot of) data from a relational database into Hadoop and query it with SQL. Not a very difficult thing to do these days; actually, it is very easy with Apache Hive!

All you need is access to a Hadoop cluster with the Hive module installed. You can provision a Hadoop cluster yourself by downloading it and installing it in pseudo-distributed mode on your own PC, or you can run one in the cloud with Amazon AWS EMR in a pay-as-you-go fashion.

There are many ways of doing this; just Google it and you will be surprised how easy it is. It is easier than it sounds. Below are the links for installing it on your own PC (Linux): just download and install Apache Hadoop and Hive from Apache Hadoop Downloads.

You will need to download and install 3 things from the above link.

  • Hadoop (HDFS and Big Data Framework, the cluster)
  • Hive (data warehouse module)
  • Sqoop (data importer)
You will also need to put the connector JAR of the database (Oracle, MySQL...) you want to extract data from into the */lib folder of your Sqoop installation. For example, the MySQL JDBC connector can be downloaded from here.

Don't expect loads of tinkering when installing Apache Hadoop, Hive or Sqoop: it is just unzipping binary tarballs and making a few line changes in some config files, that's all. It is not a big deal, and it is free. There are tons of tutorials on the internet for this; here is one I used from another blogger, bogotobogo.

What is Hive?

Hive is Big Data SQL, the data warehouse of Hadoop. You can create tables, indexes, partitioned tables, external tables and views, just like in a relational database data warehouse. You can run SQL to do joins and query the Hive tables in parallel using the MapReduce framework. It is actually quite fun to see your SQL queries translated into MapReduce jobs and run in parallel, like the parallel SQL queries we run on Oracle EE data warehouses and other databases. :0) The syntax looks very much like MySQL's SQL syntax.
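As a sketch of what that looks like in HiveQL (all table, column and path names below are made up for illustration), a partitioned external table over files already sitting in HDFS, plus a join that Hive compiles into MapReduce jobs:

```sql
-- Hypothetical example: an external table over delimited files in HDFS,
-- partitioned by day. Dropping the table does not delete the files.
CREATE EXTERNAL TABLE web_logs (
  user_id BIGINT,
  url     STRING,
  hits    INT
)
PARTITIONED BY (log_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/web_logs';

-- A join and an aggregate, exactly as you would write them in MySQL;
-- Hive turns this into parallel MapReduce jobs.
SELECT u.name, SUM(l.hits) AS total_hits
FROM   web_logs l
JOIN   users u ON u.id = l.user_id
WHERE  l.log_date = '2015-06-01'
GROUP  BY u.name;
```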

Hive is NOT an OLTP transactional database: it does not support OLTP-style INSERT, UPDATE and DELETE transactions, and it does not conform to ANSI SQL or the ACID properties of transactions.

Direct insert into Hive with Apache Sqoop:

After you have installed Hadoop, set up Hive and are able to log in to it, you can use Sqoop, the data importer of Hadoop, as in the following command to directly import a table from MySQL into Hive via JDBC using MapReduce.

$ sqoop import --connect jdbc:mysql://mydatabasename --username kubilay -P --table mytablename --hive-import --hive-drop-import-delims --hive-database dbadb --num-mappers 16 --split-by id

Sqoop import options explained:
  • -P will prompt for the password.
  • --hive-import makes Sqoop import the data straight into a Hive table, which it creates for you.
  • --hive-drop-import-delims drops \n, \r and \01 from string fields when importing into Hive.
  • --hive-database tells it which Hive database to import into; otherwise the table goes to the default database.
  • --num-mappers is the number of parallel map tasks to run, like parallel processes/threads in SQL.
  • --split-by is the column of the table used to split the work units, like a partitioning key in database partitioning.
The above command will import, via MapReduce, whichever MySQL table you give in place of mytablename, from the MySQL database you specify, into Hive.

Once you have imported the table, you can log in to Hive and run SQL against it like in any relational database. On a properly configured system you can log in to Hive just by calling hive from the command line:

$ hive
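Once you are at the hive> prompt, SQL works much as you would expect. A few example statements, assuming the table imported above (mytablename in the dbadb database); each aggregate query is compiled into a MapReduce job behind the scenes:

```sql
-- Sanity checks on the freshly imported table
USE dbadb;
SHOW TABLES;
DESCRIBE mytablename;

-- Row count; watch this become a MapReduce job
SELECT COUNT(*) FROM mytablename;

-- An aggregate over the split-by column used during the import
SELECT id, COUNT(*) AS cnt
FROM   mytablename
GROUP  BY id
ORDER  BY cnt DESC
LIMIT  10;
```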

More Commands to list jobs:

A couple of other commands I found useful while experimenting with this:

List running Hadoop jobs

hadoop job -list

Kill running Hadoop jobs

hadoop job -kill job_1234567891011_1234

List a particular table's directory in HDFS (under Hive's default warehouse location, assuming a default configuration)

hadoop fs -ls /user/hive/warehouse/mytablename


Categories: DBA Blogs


Oracle Apex and the Mendeley REST API

Wed, 2015-03-04 12:15
In this post I will try to show you how I used Oracle Apex and the APEX_WEB_SERVICE PL/SQL package to quickly send a request to a public Internet API, and how I handled the response. The code below was written during a 'Hackday' and hasn't been extensively tested.

My use case is integrating Oracle Apex with the public Mendeley REST API for Mendeley Catalog Search.

The idea was to build an application in Oracle Apex to query the Mendeley REST API Catalog with a keyword. The Mendeley REST API returns a JSON response, so I used PL/JSON to parse it. I hear JSON is going to be a native data type in Oracle 12c; my Oracle Apex host is running Oracle 11g, so I used PL/JSON for ease.

To cut it short, here is how the Mendeley Catalog Search application on Oracle Apex looks. (Click the image to go to the app, or visit http://apex2.enciva.co.uk/apex/f?p=864:2)

To integrate with Mendeley REST API from Oracle Apex, I used one PL/SQL function and one procedure.

I used the function to obtain the Mendeley REST API Client Credentials Authorisation flow token, and the procedure to make the API request to Mendeley Catalog Search and handle the response.

Here is the MENDELEY_CALL PL/SQL function I created:

This function returns the Client Credentials Authorisation Flow token from the Mendeley REST API:

create or replace function mendeley_call (p_id in varchar2)
return varchar2
is
  v_token varchar2(1000);
  token   varchar2(1000);
  jtoken  json;
  v_grant_type    varchar2(400) := 'client_credentials';
  v_client_id     varchar2(500) := p_id;
  v_client_secret varchar2(500) := '<put_your_mendeley_client_secret_here>';
  v_scope         varchar2(300) := 'all';
begin
  /*----------Setting Headers----------------------------------------*/
  apex_web_service.g_request_headers(1).name  := 'Content-Type';
  apex_web_service.g_request_headers(1).value := 'application/x-www-form-urlencoded; charset=utf-8';

  token := apex_web_service.make_rest_request
    ( p_url         => 'https://api.mendeley.com/oauth/token'
    , p_http_method => 'POST'
    , p_parm_name   => apex_util.string_to_table('grant_type:client_id:client_secret:scope')
    , p_parm_value  => apex_util.string_to_table(v_grant_type||':'||v_client_id||':'||v_client_secret||':'||v_scope)
    , p_wallet_path => 'file:/home/app/oracle/product/11.2.0/dbhome_1/owm/wallets/oracle'
    , p_wallet_pwd  => '<put_your_oracle_wallet_password_here>'
    );
  -- debug
  -- dbms_output.put_line(token);
  jtoken  := json(token);
  v_token := json_ext.get_string(jtoken, 'access_token');
  -- debug
  -- dbms_output.put_line(v_token);
  return v_token;
exception
  when others then
    raise_application_error(-20001, 'An error was encountered - '||SQLCODE||' -ERROR- '||SQLERRM);
end;
/

Here is the anonymous PL/SQL block which I put into a PL/SQL region on the Oracle Apex page:

This block incorporates the function above, makes the request and handles the response from the Mendeley REST API.

Note how the procedure calls the function MENDELEY_CALL (above) to load the variable v_token. 

declare
  v_token  VARCHAR2(599) := mendeley_call('<put_your_mendeley_client_id_here>');
  v_search VARCHAR2(500);
  mendeley_document NCLOB;
  v_status VARCHAR2(100);
  obj json_list;
  v_id VARCHAR2(100);
  v_title NVARCHAR2(1000);
  v_abstract NCLOB; -- varchar2(32000);
  v_link     VARCHAR2(1000);
  v_source   VARCHAR2(500);
  v_type     VARCHAR2(100);
  v_pct_hit  VARCHAR2(10);
  v_rows     NUMBER(10);
  v_batch_id NUMBER(10);
begin
  -- Set Authorisation headers and utf8
  -- the following headers are necessary if you need to use languages other than latin and
  -- you will use the APEX_WEB_SERVICE package
  -- build the Authorisation header
  apex_web_service.g_request_headers(1).name  := 'Content-Type';
  apex_web_service.g_request_headers(1).value := 'application/jsonrequest';
  apex_web_service.g_request_headers(2).name  := 'Authorization';
  apex_web_service.g_request_headers(2).value := 'Bearer '||v_token;
  -- Make the request and load the response into a CLOB
  mendeley_document := apex_web_service.make_rest_request
    ( p_url         => 'https://api.mendeley.com:443/search/catalog'
    , p_http_method => 'GET'
    , p_parm_name   => apex_util.string_to_table('title:limit')
    , p_parm_value  => apex_util.string_to_table('Mendeley:10')
    );
  -- Load the response into a JSON_LIST PL/JSON object
  obj := json_list(mendeley_document);
  -- Start extracting values from the JSON and write some HTML
  -- Traverse the JSON_LIST and extract the elements you like
  FOR i IN 1..obj.count LOOP
    v_id       := json_ext.get_string(json(obj.get(i)), 'id');
    v_title    := json_ext.get_string(json(obj.get(i)), 'title');
    v_abstract := json_ext.get_string(json(obj.get(i)), 'abstract');
    v_link     := json_ext.get_string(json(obj.get(i)), 'link');
    v_source   := json_ext.get_string(json(obj.get(i)), 'source');
    v_type     := json_ext.get_string(json(obj.get(i)), 'type');
    -- write the extracted data
    dbms_output.put_line(v_title||' ==> '||v_abstract);
  END LOOP;
end;
/


This shows how easy it is, in this case using one function and one procedure, to make a REST API request to an external web service from Oracle Apex.
Categories: DBA Blogs

Test MySQL on AWS quickly

Thu, 2015-02-19 02:20
Using sysbench to performance test AWS RDS MySQL hardware is an easy three step operation. Sysbench creates synthetic tests which run against a 1 million row 'sbtest' table that sysbench creates in the MySQL database you indicate. The test doesn't intrude on your database schema and it doesn't use your data, so it is quite safe. It is an OLTP test which tries to simulate event operations in the database by running various SELECT, INSERT, UPDATE and DELETE requests on its own 'sbtest' table. The results of the tests are metrics like transactions per second, number of events, elapsed time etc. See the man pages for descriptions; it is a popular testing tool. You can also set it up to control how many requests (events) you want it to execute in a given time, or tell it to keep executing requests until you stop it, for destruction testing. It is a very flexible testing tool with many options, including throttling concurrency.

You can be up and running with 3 commands on a Unix system as follows.

Download the sysbench tool (doing this on Ubuntu):

sudo apt-get install sysbench

Create a table with 1 million rows:

sysbench --test=oltp --oltp-table-size=1000000 --mysql-host={your rds host url} --db-driver=mysql --mysql-user={your rds root user} --mysql-password={password} --mysql-db={your mysql database name} prepare

Test with different parameters:

sysbench --test=oltp --oltp-table-size=1000000 --mysql-host={your rds host url} --db-driver=mysql --mysql-user={your rds root user} --mysql-password={password} --mysql-db={your mysql database name} --max-time=60 --num-threads=550 run

Warning: synthetic tests will only show you the ability of the hardware under a given standard set of requests and DML operations. They are in no way an indication of what will happen to your database when the real workload increases because of the applications. Application load testing is something else; applications are complex!

Database workload depends on the application-generated workload from real users using the system, and that is very hard to simulate in a test. It is not impossible if you use a database such as Oracle, which can record and replay its production workload (the Database Replay feature, alongside the Automatic Workload Repository, AWR). In MySQL I couldn't find a way to do this so far. But sysbench's synthetic tests gave me the ability to quickly benchmark and baseline MySQL database capabilities on different AWS Amazon hardware; something is better than nothing, I suppose.
Categories: DBA Blogs

MySQL Locking

Tue, 2015-01-06 03:53
MySQL and Row Level Locking? Or why are you getting the error:

ERROR 1205 (HY000) : Lock wait timeout exceeded; try restarting transaction

You get this error because the time your transaction spends waiting to acquire a row lock exceeds the set limit. The limit is set by the innodb_lock_wait_timeout database parameter and defaults to 50 seconds. If the transaction holding the lock doesn't commit/rollback within 50 seconds, the waiting transaction gets this error. We don't want to hold locks for longer than 50 seconds anyway; throughput would be affected.
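To check or change that limit, something like the following works (the 120-second value is just an example):

```sql
-- See the current lock wait timeout, in seconds
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

-- Raise it for the current session only, e.g. to 120 seconds
SET SESSION innodb_lock_wait_timeout = 120;
```

Raising the timeout only papers over the problem, of course; the real fix is to find and shorten the long-running transaction that is holding the lock.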

And yes, MySQL with InnoDB uses row-level locking. Since MySQL 5.1+ (around the time Oracle took over) it does row-level locking in its InnoDB tables. That means only the rows you SELECT ... FOR UPDATE are locked, not the whole table. To see which threads (sessions) are blocking others and which queries are doing the locking, use the following INFORMATION_SCHEMA dictionary SQL query as DBA. You will be able to see the blockers and the waiters of transactions waiting on locks. Run it as is against INFORMATION_SCHEMA, no modifications needed.

Use this SQL query to monitor locks and transactions, and note that it returns rows only when there are locks being waited on!

SELECT
    r.trx_id AS waiting_trx_id,
    r.trx_mysql_thread_id AS waiting_thread,
    TIMESTAMPDIFF(SECOND,
        r.trx_wait_started,
        CURRENT_TIMESTAMP) AS wait_time,
    r.trx_query AS waiting_query,
    l.lock_table AS waiting_table_lock,
    b.trx_id AS blocking_trx_id,
    b.trx_mysql_thread_id AS blocking_thread,
    LEFT(p.host,
        INSTR(p.host, ':') - 1) AS blocking_host,
    SUBSTRING(p.host,
        INSTR(p.host, ':') + 1) AS blocking_port,
    IF(p.command = 'Sleep', p.time, 0) AS idle_in_trx,
    b.trx_query AS blocking_query
FROM
    information_schema.innodb_lock_waits AS w
        INNER JOIN
    information_schema.innodb_trx AS b ON b.trx_id = w.blocking_trx_id
        INNER JOIN
    information_schema.innodb_trx AS r ON r.trx_id = w.requesting_trx_id
        INNER JOIN
    information_schema.innodb_locks AS l ON w.requested_lock_id = l.lock_id
        LEFT JOIN
    information_schema.PROCESSLIST AS p ON p.id = b.trx_mysql_thread_id
ORDER BY wait_time DESC;
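If you want to see the query return something, you can provoke a lock wait yourself in two sessions. A sketch, assuming some InnoDB table t with an id column:

```sql
-- Session 1: take a row lock and hold it by not committing
START TRANSACTION;
SELECT * FROM t WHERE id = 1 FOR UPDATE;

-- Session 2: this blocks on the same row; after
-- innodb_lock_wait_timeout seconds it fails with ERROR 1205
UPDATE t SET id = id WHERE id = 1;
```

While session 2 is waiting, run the monitoring query in a third session and you will see session 1 as the blocker and session 2 as the waiter.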

Categories: DBA Blogs

API Integration with Zapier (Gmail to Salesforce)

Sun, 2014-09-07 12:42
Recently I attended a training session with +General Assembly in London titled 'What and Why of APIs'. It was a training session focusing on the usage of APIs and it was not technical at all. I find these types of training sessions very useful, as they describe the concepts and controlling ideas behind technologies rather than the hands-on, involved implementation details.

What grabbed my attention among the many different and very useful public and private API tools introduced in this training session was Zapier (www.zapier.com).

Zapier looked to me like a platform for integrating APIs with clicks rather than code, i.e. with declarative programming. It is a way of automating the internet. What you get when you sign up with them is the ability to use 'Zaps', or create your own. Zaps are integrations of endpoints, like connecting Foursquare to Facebook or Gmail to Salesforce and syncing them. One of the available Zaps does exactly that: it connects your Gmail emails to Salesforce using the Gmail and Salesforce APIs and lets you sync between them. Not only that, Zaps also put triggers on the endpoints which allow you to sync only when certain conditions are true. For example, the Gmail to Salesforce Zap can push your email into a Salesforce Lead only when an email with a certain subject arrives in your Gmail inbox. This is what the Zapier platform looks like:

An individual Zap looks like this and is nothing more than a mapping of the Endpoints with some trigger actions and filters.

The environment is self-documenting and very easy to use. All you do is drag and drop Gmail fields and match them with the Lead, or other custom object, Salesforce fields. Then you configure the sync to happen only under certain conditions/filters. Really easy to set up. The free version runs the sync every 5 hours, which is good enough for me. The paid version runs it every 5 minutes.

There is even the capability to track historical runs and to trigger a manual run via the Zap menu. See below the 'Run' command, which runs a Zap whenever you like.

In my case I used the tool to create a Zap to do exactly what I just described. My Zap creates a Salesforce Lead automatically in my Salesforce org whenever a 'special' email is sent to me. Great automation!

This is a taste of the 'platform cloud' tools out there to do API to API and App to App integrations with clicks and not code. With tools like Zapier all you really need is, imagination!

More links:

Categories: DBA Blogs

MySQL on-premise to Amazon RDS migration tips

Mon, 2014-07-14 14:27
Things to watch and do when migrating MySQL databases from ‘on-premise’ to Amazon AWS RDS

  • Not all versions of databases can be migrated to RDS, especially if you want a zero downtime migration. Make sure you know which versions are possible; at the time of writing Amazon announced that it will support any version of MySQL 5.1 and above.
  • In a zero downtime migration to Amazon RDS, you use mysqldump or mydumper to import the baseline data, and then use MySQL replication and the binary log position to apply the additional records created during the import, the delta. That is, it is possible to create a MySQL slave in the Amazon AWS cloud!
  • So, once you have confirmed that your on-premise MySQL is compatible, you can use mysqldump with the --master-data parameter to export your data, including the binlog position coordinates at the time of the export. If your database is big, you can use mydumper to do this with parallel streams. You will use the coordinates and MySQL replication to catch up with the on-premise master database when creating the MySQL slave in RDS.
  • Use different database parameter groups for different databases.
  • As you load the RDS database using myloader or mysql, the operation might take a long time depending on the size of your database. If this is the case, disable backups (it stops logging) and try using one of the better-spec RDS instance classes with provisioned IOPS for the duration of the operation. You can always downsize the RDS instance after you have completed the initial load.
  • After you have completed the initial load, use Multi-AZ, which is a synchronous standby (in Oracle parlance), and schedule the backups immediately before you open your applications to the database, as the initial backup requires a reboot.
  • Beware that there is no SSH access to RDS, which means you have no access to the file system.
  • Get the DB security groups right and make sure your applications can access the RDS instances.
Categories: DBA Blogs

My MySQL database impressions

Mon, 2014-05-05 07:30
I have been in the data and database world for over a decade now, working mainly with Oracle and on data integration projects with Salesforce.

I have also spent time working with MySQL, a relational database with open source roots, now part of Oracle. In this post I want to talk about my impressions of MySQL and some tips and tricks I learned working with it.

First of all, you will have to be ready to get your hands dirty. I couldn't find a package structure for database administration in MySQL, like the DBMS libraries of packages and procedures in Oracle. That means you will have to do most things on your own. Nevertheless, the good news is that the database shows an Oracle banner when you log in from version 5.0 onwards, and features like online schema changes, more cost-based optimisation and partitioning were added in 5.6: a sign of good things to come.

Some key points

  • Data Import/Export - You can use the native mysqldump utility to dump data, with parameters, but it is slow. You can dump schemas and data, but I couldn't get it to dump data fast (in parallel). That is why I strongly recommend mydumper, an open source utility written by people from Oracle and MySQL, which dumps data using parallel threads and is very fast. Import can be done in parallel as well and can give you a real boost, provided your hardware permits it. Don't bother disabling constraints or dropping indexes before imports, as you will read in posts and suggestions on the net; mysqldump already does that for you.
  • Hot Backup - mylvmbackup seems to be the de facto script for taking hot backups while the database is online; there are tools like XtraBackup from Percona too. It takes a snapshot of the disk where your datafiles and logfiles are. At restore time it does a crash recovery using the logs and brings the database transactions forward to the point of the crash. Then, if you have the logs from after that point, you can play them forward and bring the database to a point in time after the backup.
  • Parallel processing - Nada, there is none! I couldn't get it to do anything in parallel. The only thing I managed to do in parallel was export and import data with mydumper; that works! So if you have many CPUs you will be watching them sit idle most of the time while a single thread chugs away, unless you use mydumper for your import/export operations, where you can make those CPUs sweat.
  • DBMS packages - Fancy automating? Need scripts for repetitive tasks? There is no DBMS package library to help you administer the database in MySQL. Instead, you can use the Percona Toolkit scripts, from Percona, a consultancy specialising in helping MySQL DBAs do great work with MySQL databases. They have a variety of scripts, from comparing (diff) and syncing databases and tables to extracting metadata and GRANT structures.
  • Hints, Explain Plan, Performance Tuning - I couldn't see much cost-based optimisation in MySQL. The data dictionary (INFORMATION_SCHEMA) has metadata names but doesn't hold dynamic statistics about objects; the row count estimates it holds for tables and indexes can be up to 50% wrong. The whole thing is based on heuristics, I suppose. The EXPLAIN output is just a row saying what the optimiser will do; there is no cost analysis or logical tree structure of execution plans yet. I couldn't see much on join orders either: no nested loops, hash or merge joins yet.
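To illustrate that last point, here is the kind of EXPLAIN I mean (table names made up). The output is one row per table in the join, with columns like id, select_type, table, type, key and rows, rather than a costed execution plan tree:

```sql
-- 'type' (e.g. ALL, ref, eq_ref) and 'key' show the access method,
-- 'rows' is the optimiser's (heuristic) row estimate per table.
EXPLAIN
SELECT o.id, c.name
FROM   orders o
JOIN   customers c ON c.id = o.customer_id
WHERE  o.created_at > '2014-01-01';
```

Compare that with the costed, tree-shaped plans you get from EXPLAIN PLAN in Oracle and the difference is obvious.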

MySQL is a popular relational database. The free version is probably all a small website or a start-up needs. Having said that, many sites outgrow MySQL and still stay with it.

Oracle will probably turn it into a serious database too. Adding partitioning and multi-threading in recent releases is a step towards becoming an enterprise-scale database. I don't know much about the MySQL Cluster version; MySQL Replication, I know, takes load off the reads. I want to see it doing more performance tuning science.

Top tools with MySQL that I used

MySQL Workbench - SQL IDE.
Mydumper - Fast logical backup and restore.
Mylvmbackup - Hot backup script.
Pentaho Kettle (PDI) - An all-round data integration and middleware tool.

Categories: DBA Blogs

DataDirect Access your favorite SaaS app with SQL

Fri, 2013-06-28 08:24
Database driver company +Progress DataDirect are writing the next generation of data access. They are working on a platform called DataDirect which will enable you to access loads of data sources in the cloud, from the cloud, and some of them with SQL.

Yes you have read correct, SQL!

All you have to do is visit their website, sign up for a trial account and access your favorite SaaS app, i.e. Salesforce or any other provided cloud app, with SQL.

Here is what their connector availability looks like:


To sign up just follow this link.

I very much like this noble idea. SQL is the universal language for data manipulation and access. And with things like Big Data and Hadoop on the horizon, they are being pro-active; it seems they are already working on it.

Here is a partial list of the data sources they can, or will, let you query with SQL:

  • Twitter
  • Facebook
  • Hive
  • Eloqua
  • force.com
  • Salesforce
  • SQLAzure
  • Microsoft Dynamics CRM
  • and many more...

This is on my hot list of technologies to watch. Well done, DataDirect.
Categories: DBA Blogs

Salesforce.com Real Time integration with Oracle using Informatica PowerCenter 9.5

Tue, 2013-06-04 17:03
In this post I will describe how you can integrate your Salesforce.com org with a relational database like Oracle in real time, or better, 'near' real time!

Many times I come across the requirement to quickly propagate changes from cloud platforms like Salesforce.com to on-premise data stores. You can do this with web services, but that is not middleware and it requires coding.

How about doing with a data integration tool?

+Informatica Corporation's  Informatica PowerCenter can achieve this by using the CDC (Change Data Capture) feature of the Informatica PowerCenter Salesforce connector, when Salesforce is the source in a mapping.

The configuration is simple. All you really have to set up is 2 properties in the Mapping Tab of a Session Task in Informatica Workflow Manager.

These are the properties:
  • Set the Time Limit property to -1
  • Set the Flush Interval property to 60 seconds (60 seconds is the minimum)
See a picture from one of my settings

And here is what these two settings mean from the PowerCenter PowerExchange for Salesforce.com User Guide:

CDC Time Limit

Time period (in seconds) that the Integration Service reads changed Salesforce data. When you set the CDC Time Limit to a non-zero value, the Integration Service performs a full initial read of the source data and then captures changes to the Salesforce data for the time period you specify. Set the value to -1 to capture changed data for an infinite period of time. Default is 0. 

Flush Interval

Interval (in seconds) at which the Integration Service captures changed Salesforce data. Default is 300. If you set the CDC Time Limit to a non-zero value, the Integration Service captures changed data from the source every 300 seconds. Otherwise, the Integration Service ignores this value.

That's it, you don't have to configure anything else!

Once you set these properties in the Mapping tab of a session, save and restart the task in the workflow; the task will then run continuously, non-stop. The connector will poll the Salesforce org continuously and propagate any changes you make in Salesforce downstream to the on-premise database system, including INSERT, UPDATE and DELETE operations.


More reading:

SFDC CDC implementation in Informatica PowerCenter

Categories: DBA Blogs

UTC timestamps for Salesforce from Oracle

Fri, 2013-05-03 01:00
I came across the requirement the other day to update Salesforce every 5 minutes with data from Oracle. 

The data in Oracle was a simple table with a few columns A, B, C and a timestamp column T indicating the last modified date/time of the record.

To my surprise, whenever I sent data changes from Oracle, mapping the columns and the timestamp to their corresponding Salesforce fields, the Salesforce field T would record the time 1 hour ahead of wall clock time!

I quickly realised that no matter where in the world you are when entering data into Salesforce, it always saves your standard/custom date/time fields in UTC (think of it as the new GMT). The UI takes care of displaying your local time by working out your offset from the UTC timestamp.

The 1 hour difference I was experiencing was because of time zones and DST, Daylight Saving Time. I live in London, United Kingdom, and currently the DST here is BST, British Summer Time, which is GMT + 1 hour. So if you modify data in Salesforce at 14:00 according to wall clocks in London, your custom timestamp field in Salesforce will actually record 13:00 (UTC time)!

So when sending timestamps to Salesforce you have to send them in UTC!  

In case of my Oracle table A,B,C,T the date/time column was saying:

01/05/2013 17:07:20

If you send this timestamp to Salesforce as is, it will be recorded as a UTC value, but then, because of the Salesforce UI and because of BST (GMT+1 DST), it will display:

01/05/2013 18:07:20 (1 hour ahead in the future!)

So the solution was to always calculate the date/time in Oracle as UTC, as this is the value Salesforce expects. To do exactly that, and to provide the date/time always in UTC regardless of DST, I used the Oracle SQL CAST function to obtain the given date/time value at GMT (UTC), like this:

select to_char(cast(my_date as timestamp with local time zone) at time zone 'GMT', 'DD/MM/YYYY HH24:MI:SS') utc_my_date from my_table;

The expression above will always return a GMT (UTC) date/time no matter which timezone or DST setting you are in. You can then safely pass that as a timestamp to Salesforce and be sure that the correct time is going to be saved.

Categories: DBA Blogs

SQL Access to Salesforce data

Thu, 2013-02-14 13:13
In this post I will talk about an ODBC/JDBC driver solution I discovered recently which enables you to access your Salesforce data using the standard SQL query language.

The company which provides these drivers is called Progress|DataDirect 

Their JDBC/ODBC Salesforce Connect XE drivers act as translators between SQL and SOQL (the Salesforce proprietary query language). So you can write joins and use expressions like SUBSTRING on your Salesforce data as if the data were in a relational database.

I found the concept quite interesting. If you already know SQL - and there are many people who do - you can query data in standard Salesforce objects like Account and Contact, as well as custom objects, with a SQL client tool.

For my post I used the SQuirreL SQL Client and the JDBC driver. You can easily point your own SQL client tool to these drivers too; ODBC and JDBC are data access standards many tools comply with.

To get started

1. Download the Progress|DataDirect Salesforce.com JDBC driver. The file will be called something like PROGRESS_DATADIRECT_CONNECT_JDBC_5.1.0.jar (I downloaded the 15-day trial version).

2. Install the driver as per the instructions found here; more general information, such as the User Guide, is here.

3. Configure the SQuirreL SQL Client to use the driver (to install SQuirreL go here).

Once you have downloaded and installed the driver, start the SQuirreL client tool and register the Salesforce.com JDBC driver with SQuirreL like this:

Go to the Drivers tab and add a driver by clicking the plus sign (+), then fill in the driver details like below. You can find the Class Name, the Website URL and the Extra Class Path for the Progress|DataDirect Salesforce.com JDBC drivers here. You can find out more about connection properties in the DataDirect Connect Series for JDBC User's Guide.

Once you have configured the driver, it is time to add an Alias, a SQuirreL connection to your Salesforce Org. I used my developer org below. Just follow the wizard to add the alias. It will ask you for the driver to use; choose the Salesforce driver you created above.

Once you have created the alias (connection), it is time to test it and connect to Salesforce with the new JDBC driver. Provide your Salesforce credentials like this:

Once you connect, you can issue any SQL statement against your Salesforce Org, as below. Here I am writing a SQL join between the Account and Contact standard objects.

What the driver really does is translate your SQL into SOQL.
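As a rough, hand-written illustration of that translation (the actual SOQL the driver generates is its own business; the queries below are just plausible equivalents), a SQL join between parent and child objects corresponds to a SOQL relationship query:

```python
# Illustration only: the SQL a client tool might send, and an equivalent
# SOQL query using the standard Account -> Contacts child relationship.
sql = (
    "SELECT a.Name, c.LastName "
    "FROM Account a JOIN Contact c ON c.AccountId = a.Id"
)
soql = "SELECT Name, (SELECT LastName FROM Contacts) FROM Account"
print(sql)
print(soql)
```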

To read more about the JDBC and other drivers, go to the company site Datadirect.com.

There is potential here: imagine all the data profiling, data quality operations and data integrations you can do 'in place' in SaaS and cloud systems with SQL, without having to move the data around.

More resources

More blogs and resources on SQL access to Salesforce can be found below. There is also a very interesting blog post here which shows you how to access Salesforce data directly from within Oracle.

Categories: DBA Blogs

Send tweets from Oracle with OAuth

Thu, 2013-02-07 17:03
Twitter is a great real-time social platform. Timelines and hashtags are a great way to communicate relevant information to an audience of subscribers.

There are lots of websites and applications which use Twitter to provide content to their users. Several programming languages, such as Java, PHP and Python, have Twitter API libraries for quickly sending and receiving content, but I haven't come across a fully fledged library in PL/SQL which would enable you to send a tweet from PL/SQL using Twitter's latest OAuth security protocol.

There are blog posts out there which show you how to send a tweet from Oracle, but not many using the new OAuth security protocol.

The only blog post I have seen so far which uses PL/SQL and OAuth to send a tweet from Oracle is 'Some coding hero's heroic mumblings', by an Oracle consultant.

I have spent time reviewing his code, together with the comments made on his post (written in 2010), and managed to trim it down to just one of his PL/SQL procedures. The procedure below sends the 140-character tweet for you using OAuth. To be able to do this, as Some Coding Hero says, you will have to create a Twitter Application for your Twitter handle. Actually, you don't create an application; you just sign up and obtain 4 security codes! Once you register your application with Twitter, you are given 4 OAuth security codes as follows:

With the above 4 OAuth security codes you can use Somecodinghero's last block of code on its own to send a tweet from Oracle - I used Oracle Apex - like this:

Update 21-JUN-2013: the Twitter API 1.1 endpoint and the use of Oracle Wallet were added to the script below.

-- Twitter API 1.1 update
DECLARE
  l_oauth_request_token_url CONSTANT VARCHAR2 (500) := 'https://api.twitter.com/1.1/statuses/update.json';
  l_oauth_consumer_key CONSTANT VARCHAR2 (500) := 'xxxxxxxxx';
  l_oauth_token CONSTANT VARCHAR2 (500) := 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
  l_oauth_secret CONSTANT VARCHAR2 (500) := 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
  l_oauth_nonce VARCHAR2 (500);
  l_oauth_signature_method CONSTANT VARCHAR2 (10) := urlencode ('HMAC-SHA1');
  l_oauth_timestamp VARCHAR2 (100);
  l_oauth_version CONSTANT VARCHAR2 (5) := urlencode ('1.0');
  l_oauth_consumer_secret CONSTANT VARCHAR2 (500) := 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
  l_http_method VARCHAR2 (5) := 'POST';
  l_oauth_base_string VARCHAR2 (2000);
  l_oauth_key VARCHAR2 (500) := l_oauth_consumer_secret || '&' || l_oauth_secret;
  l_sig_mac RAW (2000);
  l_base64_sig_mac VARCHAR2 (100);
  http_req UTL_HTTP.req;
  http_resp UTL_HTTP.resp;
  l_update_send VARCHAR2 (2000);
  l_oauth_header VARCHAR2 (2000);
  resp_name VARCHAR2 (256);
  resp_value VARCHAR2 (1024);
  -- put the tweet in the urlencode function below
  l_content VARCHAR2 (140) := urlencode ('@somecodinghero thank you');
  l_random VARCHAR2 (25);
BEGIN
  -- Oracle Wallet (needed for the HTTPS call)
  utl_http.set_wallet ('file:/home/app/oracle/product/11.2.0/dbhome_1/owm/wallets/oracle', 'putyourwalletpasswordhere');
  -- Get the timestamp (seconds since the Unix epoch)
  SELECT urlencode ((SYSDATE - TO_DATE ('01-01-1970', 'DD-MM-YYYY')) * 86400)
    INTO l_oauth_timestamp
    FROM DUAL;
  -- Random oauth_nonce
  SELECT dbms_random.string ('A', 25)
    INTO l_random
    FROM DUAL;
  SELECT urlencode (UTL_ENCODE.base64_encode (UTL_I18N.string_to_raw (l_random, 'AL32UTF8')))
    INTO l_oauth_nonce
    FROM DUAL;
  -- Build the OAuth signature base string: METHOD&encoded-URL&encoded-sorted-parameters
  l_oauth_base_string := l_http_method
                         || '&'
                         || urlencode (l_oauth_request_token_url)
                         || '&'
                         || urlencode ('oauth_consumer_key' || '=' || l_oauth_consumer_key
                             || '&' || 'oauth_nonce' || '=' || l_oauth_nonce
                             || '&' || 'oauth_signature_method' || '=' || l_oauth_signature_method
                             || '&' || 'oauth_timestamp' || '=' || l_oauth_timestamp
                             || '&' || 'oauth_token' || '=' || l_oauth_token
                             || '&' || 'oauth_version' || '=' || l_oauth_version
                             || '&' || 'status' || '=' || l_content);
  DBMS_OUTPUT.put_line (l_oauth_base_string);
  -- Sign the base string with HMAC-SHA1, key = consumer secret '&' token secret
  l_sig_mac := DBMS_CRYPTO.mac (UTL_I18N.string_to_raw (l_oauth_base_string, 'AL32UTF8')
                                , DBMS_CRYPTO.hmac_sh1
                                , UTL_I18N.string_to_raw (l_oauth_key, 'AL32UTF8'));
  DBMS_OUTPUT.put_line ('Combined sig: ' || l_oauth_key);
  l_base64_sig_mac := UTL_RAW.cast_to_varchar2 (UTL_ENCODE.base64_encode (l_sig_mac));
  DBMS_OUTPUT.put_line ('MAC Signature (Base64-encoded): ' || l_base64_sig_mac);
  l_update_send := l_oauth_request_token_url || '?status=' || l_content;
  http_req := UTL_HTTP.begin_request (l_update_send
                                      , l_http_method
                                      , UTL_HTTP.http_version_1_1);
  DBMS_OUTPUT.put_line ('UPDATE URL ' || l_update_send);
  UTL_HTTP.set_response_error_check (TRUE);
  UTL_HTTP.set_detailed_excp_support (TRUE);
  -- Build the OAuth Authorization header
  l_oauth_header := 'OAuth oauth_nonce="' || l_oauth_nonce || '", '
                    || 'oauth_signature_method="' || l_oauth_signature_method || '", '
                    || 'oauth_timestamp="' || l_oauth_timestamp || '", '
                    || 'oauth_consumer_key="' || l_oauth_consumer_key || '", '
                    || 'oauth_token="' || l_oauth_token || '", '
                    || 'oauth_signature="' || urlencode (l_base64_sig_mac) || '", '
                    || 'oauth_version="' || l_oauth_version || '"';
  utl_http.set_header (r => http_req, NAME => 'Authorization', VALUE => l_oauth_header);
  DBMS_OUTPUT.put_line ('HEADER: ' || l_oauth_header);
  utl_http.write_text (r => http_req, DATA => l_content);
  http_resp := utl_http.get_response (r => http_req);
  -- Print the response headers
  FOR i IN 1 .. utl_http.get_header_count (http_resp) LOOP
    utl_http.get_header (http_resp, i, resp_name, resp_value);
    dbms_output.put_line (resp_name || ': ' || resp_value);
  END LOOP;
  DBMS_OUTPUT.put_line ('Getting content:');
  -- Read the response body until end_of_body is raised
  BEGIN
    LOOP
      utl_http.read_line (http_resp, resp_value, TRUE);
      DBMS_OUTPUT.put_line (resp_value);
    END LOOP;
  EXCEPTION
    WHEN utl_http.end_of_body THEN
      DBMS_OUTPUT.put_line ('No more content.');
  END;
  utl_http.end_response (r => http_resp);
EXCEPTION
  WHEN OTHERS THEN
    DBMS_OUTPUT.put_line ('HTTP ERROR: ' || utl_http.get_detailed_sqlerrm);
END;
/
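The signing steps in the PL/SQL above - percent-encode, build the signature base string, HMAC-SHA1 with the consumer secret and token secret joined by '&', then Base64 - can be sanity-checked with a short Python sketch (all keys and values below are placeholders, not real credentials):

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

# Placeholder credentials; in the PL/SQL these are the 4 codes Twitter issues.
consumer_secret = "xxxxxxxx"
token_secret = "yyyyyyyy"

# OAuth 1.0 signature base string: METHOD&encoded-URL&encoded-sorted-params
method = "POST"
url = "https://api.twitter.com/1.1/statuses/update.json"
params = {
    "oauth_consumer_key": "key",
    "oauth_nonce": "abc123",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1371772800",
    "oauth_token": "token",
    "oauth_version": "1.0",
    "status": quote("@somecodinghero thank you", safe=""),
}
param_string = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
base_string = "&".join([method, quote(url, safe=""), quote(param_string, safe="")])

# Signing key is the consumer secret and token secret joined by '&'
signing_key = f"{consumer_secret}&{token_secret}"
digest = hmac.new(signing_key.encode(), base_string.encode(), hashlib.sha1).digest()
signature = base64.b64encode(digest).decode()
print(signature)  # 28 Base64 characters; goes into the oauth_signature header field
```

Comparing this output with the l_base64_sig_mac value printed by the PL/SQL (for identical inputs) is a handy way to debug signature mismatches.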

Thank you Somecodinghero!
Categories: DBA Blogs

Is DELETE necessary?

Sat, 2013-01-26 04:35
A DELETE is a very destructive operation for a database: an operation which requires a lot of thinking and preparation, and which most of the time is irrevocable. You can get into a lot of trouble if you delete the wrong set of rows and lose very important data!

Maybe the DELETE command should be banned from SQL. It was probably invented when disks were tiny, and it might not be suitable for our times, especially now that we talk about things like Big Data.

Why do we have to delete anyway? We should invalidate and age out data, never destroy it. I am one of those who would say 'keep it' and never lose it. Even a typo, an error made while entering somebody's name in a text field, tells us a story. The number of attempts to get it right, the number of wrong characters typed, the time the wrong entry took place, and much more: that is data. Temporal databases come to mind. Why delete it?
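The 'invalidate and age out' alternative is essentially the soft-delete pattern; here is a minimal sketch using Python's built-in sqlite3 (the table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT)")
conn.execute("INSERT INTO customer (name) VALUES ('Alice'), ('Bob')")

# Instead of DELETE, mark the row as invalidated; the history survives.
conn.execute("UPDATE customer SET deleted_at = datetime('now') WHERE name = 'Bob'")

# Live queries simply filter out the aged-out rows.
live = conn.execute("SELECT name FROM customer WHERE deleted_at IS NULL").fetchall()
print(live)  # [('Alice',)]
```

Rows marked with deleted_at drop out of live queries but remain available for history and audit.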

But one can argue that not deleting can be dangerous too! One can claim that by refusing to delete we might end up with a lot of old, duplicated, bad-quality, uncleaned, irrelevant and untrustworthy data.

Maybe the worst thing that can happen to data is for us to refuse to delete it enough. Perhaps it is because of this lack of deletes, and fear of deletes, that we end up with all the 'bad data' found in some database systems today.

Whatever you choose to do, delete or not, one thing is for sure: you should know what you are deleting or not deleting. Choosing confidently what to delete comes from understanding your data model and the implications a delete will have on it, and never from the data simply not being required for your project!

Categories: DBA Blogs