Skip navigation.

Feed aggregator

University of California’s $220 million payroll project reboot

Michael Feldstein - 17 hours 46 min ago

Chris Newfield has an excellent post at Remaking the University about the University of California’s budget situation and how it relates to the recent Moody’s negative outlook on higher education finances. The whole article is worth reading, but one section jumped off the page for me [emphasis added].

The sadder example of ongoing debt is the request for “external financing for the UCPath project.” UC Path was UCOP’s flagship solution to UC inefficiencies that were allegedly wasting taxpayers’ money–in other words, new enterprise software for the systemwide consolidation of payroll and human resources functions. This is boring, important back office stuff, hardly good material for a political campaign to show the state “UC means business,” but that’s what it became. Rather than funding each campus’s decades-old effort to upgrade its systems on its own, UCOP sought centralization, which predictably introduced new levels of cost, complexity, and inefficiency, since centralization is often not actually efficient.

I had heard nothing good about UC Path from people trying to implement it on campuses, and have tried to ignore it, but this week it has resurfaced as a problem at the Regental level. The project timeline has grown from 48 to 72 months, and its costs are said to be $220 million (it had spent $131 million by May 2014) . Worse, the repayment schedule has mushroomed from seven to twenty years. Annual payments are to be something like $25 million. Campuses are to be taxed to pay for 2015-era systems until 2035, which is like taking out a twenty year mortgage to pay for your refrigerator, except that your fridge will be working better in 2035 than next year’s PeopleSoft product. Since the concurrent budget document notes efficiency savings of $30 million per year (top of page 4), UCOP may be spending $220 million to save a net $5 million per year over a couple of decades–and going into debt to do it. In the end, an efficiency measure has turned into a literal liability.

What the hell – a $220 million project to save money? How did this project get in this much trouble?

The UCPath project concept originated in 2009 with the project announcement coming in late 2011. The goal is to replace the Payroll Personnel System (PPS) that runs separately for each of the 11 UC locations with Oracle’s PeopleSoft payroll and HR systems. PPS is over 30 years old, and there are major risk issues with such an old system as well as a host of inefficient processes. The original project plans were based on a $170 million budget1 with the first wave of go-live for the Office of the President and 3 campuses scheduled for early 2013. All campuses would be live on the new system by late 2014.2

In a presentation to the Board of Regents in January 2012:

Over the same period, cost reductions are expected to be approximately $750 million from technology efficiency gains, process standardization and consolidation of transactional activities into a UC-wide shared services center. Overall, the project has a net present value of approximately $230 million (at a nine percent discount rate) with breakeven in year 5.

Subsequent promises were made in March of 2012:

We think this project is likely to pay for itself within five years, and UC could be accruing over $100 million in annual savings by the eighth year,” said Peter Taylor, UC’s chief financial officer. “We also expect to deliver HR and payroll services with increased efficiency, accuracy and quality.”

At the Board of Regents’ meeting last week, the project team gave the first update to the regents since January 2012 (itself a troubling sign). See this Sharestream video from 2:56:10 – 3:22:40.

By Fall 2013 the project was in trouble, and UC leadership brought in new leadership for the project: Mark Cianca as Deputy CIO and Sabu Varghese as Program Director. Their first act was to do a health check on the project, and the results were not pretty (as described in last week’s Board of Regents’ meeting).

  • The project team and implementation partner (Oracle) had treated the project as a software replacement rather than a fundamental business transformation initiative.
  • The individual campuses had not been consulted on changes in business processes, and in fact they had not even been asked to sign off on future state business processes that each campus would have to run to stay in operation.
  • The new project team had to go through more than 100 future state processes with campuses and get agreement on how to proceed.

The result, as described by UC President Janet Napolitano at last week’s meeting, was the team having to “reboot the entire project”.

Based on the reboot, the current plan is $220 million with first wave complete by February 2016 and all campuses live by mid 2017. That’s $50 million over budget and 24 months over schedule.

Deployment Schedule Jul 2014

But the planning is not complete. They are working up their “final” replan of budget and timeline, which they will present in January 2015.

Topics for Jan 2015

How solid is the current estimate? The implementation schedule is listed as the highest risk, even with the delays.

Major Risks Jul 2014

The project financing has changed so much that UC is now facing the need to use external financing over a much longer term, as described in the material for last week’s board meeting.

Therefore, this item seeks approval to refinance the UCPath loan out of CapEquip and into external financing to achieve the financing customization required. As indicated above, the original repayment plan based on the $220.5 million budget was expected to have been repaid with annual debt service of $25 million. This would have resulted in a 12-year loan term once principal was to be repaid. In January 2015, UCPath project leadership plans to present a revised project timeline, a revised project budget and a revised estimated loan repayment schedule. Project leadership will work with the campus budget officers (and campus budget department staff) to develop: (1) an appropriate campus cost allocation strategy; (2) an estimated repayment schedule that will reflect commencement of principal repayments in conjunction with the final campus deployment (estimated to be early 2017); and (3) an estimated 15-20 year loan repayment period.

  • The new project team seems quite credible, and for the most part they addressed the right points during the briefing. Kudos to UC for making this change in leadership.
  • This is a major project turnaround (or reboot, in Napolitano’s words), but I’m not sure that UC had communicated the significance of the project changes to system campuses (and certainly not to the media).
  • I would view the current plan of $220 million and Q1 2017 full deployment as best case situation – the team told the regents that they were going to update the plan, and ERP project almost never come in earlier than planned.
  • The actual amount is much higher than $220 based on this footnote: “The $10 million in tenant improvements approved for the UCPath Center Riverside site as well as the $17.4 million purchase of the facility (UCPath is currently projected to use no more than 50 percent of the building) are not included in the figures above.”
  • How do you go 2.5 years between updates from what is now a quarter billion dollar project?
  • What about the current estimate of benefits – is it $30 million per year as Chris described or closer to $100 million per year? One big concern I have is that the information on project benefits was not updated, presented to the regents, or asked by the regents. While I question the $25 million financing and $30 million benefits numbers, I think Chris got it exactly right by noting how UC administration is failing to ask hard questions:

Moving forward, I’m afraid that officials are going to have to get much better at admitting mistakes like UCPath, and then actually undoing them. I couldn’t listen to the recording of the UCPath conversation, but Cloudminder made it sound like a lot of restrained finger-pointing with no solution in sight. Did anyone say, “well, this seemed like a good idea at the time, but it’s not. Let’s just cancel it, figure out where we went wrong, and come up with something better”?

It is possible that continuing with the rebooted project is the right answer, but UC is not even asking the question. Failing to ask whether 15-20 year financing of a new ERP makes sense seems like a major oversight. Won’t this lock UC into an Oracle system that is already antiquated for another two decades or more? It seems stunning to me that UC is planning to commit to $220 million of external financing without asking some basic questions.

  1. one regent last week stated the original request was actually $156 million.
  2. All public projects should fear the Wayback Machine for checking old web pages.

The post University of California’s $220 million payroll project reboot appeared first on e-Literate.

Oracle Data as a Service for Business - Launch Webcast

Running a data-driven enterprise is key to gaining competitive advantage, but many businesses still struggle with a myriad of point solutions that are siloed, varied in quality, and complex to...

We share our skills to maximize your revenue!
Categories: DBA Blogs

How To Correlate Oracle Database Transaction with GoldenGate

Pythian Group - Mon, 2014-07-21 13:19

So there I was troubleshooting GoldenGate issue and was puzzled as to why GoldenGate transactions were not seen from Oracle database.

I had the transaction XID correct; however, I was filtering by ACTIVE transaction from Oracle which was causing the issue.

Please allow me to share a test case so that you don’t get stumped like I did.

Identify current log and update table

ARROW:(SOE@san):PRIMARY> select max(sequence#)+1 from v$log_history;



1 row updated.


From GoldenGate, find opened transactions for duration of 10 minutes

$ ./ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 18343248 OGGCORE_11.
Linux, x64, 64bit (optimized), Oracle 11g on Apr  4 2014 15:18:36

Copyright (C) 1995, 2014, Oracle and/or its affiliates. All rights reserved.

GGSCI (arrow.localdomain) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

EXTRACT     RUNNING     ESAN        00:00:00      00:00:05
EXTRACT     STOPPED     PSAN_LAS    00:00:00      68:02:14
REPLICAT    STOPPED     RLAS_SAN    00:00:00      68:02:12

GGSCI (arrow.localdomain) 2> send esan, status

Sending STATUS request to EXTRACT ESAN ...

  Current status: Recovery complete: At EOF

  Current read position:
  Redo thread #: 1
  Sequence #: 196
  RBA: 5861376
  Timestamp: 2014-07-21 10:52:59.000000
  SCN: 0.1653210
  Current write position:
  Sequence #: 7
  RBA: 1130
  Timestamp: 2014-07-21 10:52:52.621948
  Extract Trail: /u01/app/ggs01/dirdat/ss

GGSCI (arrow.localdomain) 3> send esan, showtrans duration 10m

Sending showtrans request to EXTRACT ESAN ...

Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 196, RBA 4955152

XID:                  3.29.673
Items:                1
Extract:              ESAN
Redo Thread:          1
Start Time:           2014-07-21:10:41:41
SCN:                  0.1652053 (1652053)
Redo Seq:             196
Redo RBA:             4955152
Status:               Running

GGSCI (arrow.localdomain) 4>

Note the Redo Seq: 196 matches the sequence when the update was performed from Oracle database.
Also, note XID: 3.29.673

Let’s find the transaction from the database an notice the XID matches between GoldenGate and Oracle database.

ARROW:(SYS@san):PRIMARY> @trans.sql

START_TIME           XID              STATUS          SID    SERIAL# USERNAME           STATUS   SCHEMANAME         SQLID              CHILD
-------------------- ---------------- -------- ---------- ---------- ------------------ -------- ------------------ ------------- ----------
07/21/14 10:41:39    3.29.673         INACTIVE        105          9 SOE                INACTIVE SOE                6cmmk52wfnr7r          0

ARROW:(SYS@san):PRIMARY> @xplan.sql
Enter value for sqlid: 6cmmk52wfnr7r
Enter value for child: 0
SQL_ID  6cmmk52wfnr7r, child number 0

Plan hash value: 2141863993

| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | UPDATE STATEMENT   |              |       |       |     3 (100)|          |
|   1 |  UPDATE            | INVENTORIES  |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| INVENTORY_PK |     1 |    14 |     2   (0)| 00:00:01 |

Predicate Information (identified by operation id):

   2 - access("PRODUCT_ID"=171 AND "WAREHOUSE_ID"=560)

20 rows selected.


For fun, switched logfile and perform another update.

ARROW:(MDINH@san):PRIMARY> select max(sequence#)+1 from v$log_history;


ARROW:(MDINH@san):PRIMARY> alter system switch logfile;

System altered.


System altered.


System altered.


System altered.

ARROW:(MDINH@san):PRIMARY> select max(sequence#)+1 from v$log_history;



883 rows updated.


Check GoldenGate transactions to find 2 open transactions, one from Redo Seq: 196 and one from Redo Seq: 200

GGSCI (arrow.localdomain) 1> send esan, showtrans

Sending SHOWTRANS request to EXTRACT ESAN ...

Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 196, RBA 4955152

XID:                  3.29.673
Items:                1
Extract:              ESAN
Redo Thread:          1
Start Time:           2014-07-21:10:41:41
SCN:                  0.1652053 (1652053)
Redo Seq:             196
Redo RBA:             4955152
Status:               Running

XID:                  4.20.516
Items:                883
Extract:              ESAN
Redo Thread:          1
Start Time:           2014-07-21:11:03:20
SCN:                  0.1654314 (1654314)
Redo Seq:             200
Redo RBA:             5136
Status:               Running

GGSCI (arrow.localdomain) 2>

Let’s kill the transaction by SOE user.

ARROW:(SYS@san):PRIMARY> @trans.sql

START_TIME           XID              STATUS          SID    SERIAL# USERNAME           STATUS   SCHEMANAME         SQLID              CHILD
-------------------- ---------------- -------- ---------- ---------- ------------------ -------- ------------------ ------------- ----------
07/21/14 10:41:39    3.29.673         INACTIVE        105          9 SOE                INACTIVE SOE                6cmmk52wfnr7r          0
07/21/14 11:03:19    4.20.516         INACTIVE         18         53 MDINH              INACTIVE MDINH              a5qywm8993bqg          0

ARROW:(SYS@san):PRIMARY> @xplan.sql
Enter value for sqlid: a5qywm8993bqg
Enter value for child: 0
SQL_ID  a5qywm8993bqg, child number 0

Plan hash value: 1060265186

| Id  | Operation         | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | UPDATE STATEMENT  |                |       |       |    28 (100)|          |
|   1 |  UPDATE           | INVENTORIES    |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| INV_PRODUCT_IX |   900 | 12600 |     4   (0)| 00:00:01 |

Predicate Information (identified by operation id):

   2 - access("PRODUCT_ID"=170)

20 rows selected.

ARROW:(SYS@san):PRIMARY> alter system kill session '105,9' immediate;

System altered.

ARROW:(SYS@san):PRIMARY> @trans.sql

START_TIME           XID              STATUS          SID    SERIAL# USERNAME           STATUS   SCHEMANAME         SQLID              CHILD
-------------------- ---------------- -------- ---------- ---------- ------------------ -------- ------------------ ------------- ----------
07/21/14 11:03:19    4.20.516         INACTIVE         18         53 MDINH              INACTIVE MDINH              a5qywm8993bqg          0


Verify transaction from killed session is removed from GoldenGate

GGSCI (arrow.localdomain) 1> send esan, status

Sending STATUS request to EXTRACT ESAN ...

  Current status: Recovery complete: At EOF

  Current read position:
  Redo thread #: 1
  Sequence #: 200
  RBA: 464896
  Timestamp: 2014-07-21 11:06:40.000000
  SCN: 0.1654584
  Current write position:
  Sequence #: 7
  RBA: 1130
  Timestamp: 2014-07-21 11:06:37.435383
  Extract Trail: /u01/app/ggs01/dirdat/ss

GGSCI (arrow.localdomain) 2> send esan, showtrans

Sending SHOWTRANS request to EXTRACT ESAN ...

Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 200, RBA 5136

XID:                  4.20.516
Items:                883
Extract:              ESAN
Redo Thread:          1
Start Time:           2014-07-21:11:03:20
SCN:                  0.1654314 (1654314)
Redo Seq:             200
Redo RBA:             5136
Status:               Running

GGSCI (arrow.localdomain) 3>

-- trans.sql
set lines 200 pages 1000
col xid for a16
col username for a18
col schemaname for a18
col osuser for a12
select t.start_time, t.xidusn||'.'||t.xidslot||'.'||t.xidsqn xid, s.status,
decode(s.sql_id,null,s.prev_sql_id) sqlid, decode(s.sql_child_number,null,s.prev_child_number) child
from v$transaction t, v$session s
where s.saddr = t.ses_addr
order by t.start_time


Categories: DBA Blogs

Understanding and using tokens in Oracle #GoldenGate

DBASolved - Mon, 2014-07-21 10:53

Recently, I’ve been doing some work with a client where tokens need to be used.  It came to my attention that the basic usage of tokens is misunderstood.  Let’s see if I can clear this up a bit for people reading.

In Oracle GoldenGate, tokens are a way to capture and store data in the header of the trail file (more info on trail headers here).  Once a token has been defined, captured and stored in the header, it can be retrieved, on the apply side, and used in many ways to customize what information is delivered by Oracle GoldenGate.

Defining a token is pretty simple; however, keep these three points in mind:

  1. You define the token and associated data
  2. The token header in the trail file header permits up to a total of 2,000 bytes (token name, associated data, and length of data)
  3. Use the TOKEN option of the TABLE parameter in Extracts

In order to define a token in an extract, the definition should follow this basic syntax:


In the example above, the token will be populated with the timestamp of the last commit on the table it is defined against.  After restarting the extract, the token (SRC_CSN_TS) will be included in the header of the trail file.

Once the trail file is shipped to the target side and read by the replicat, the token is mapped to a column in the target table.

MAP <schema>.<table>, target <schema>.<table>,

Image 1, is a view of a table where I have mapped the token (SRC_CSN_TS) to a target table to keep track of the committed timestamps of a transaction on the source system.

Image 1:






Tokens are simple to create, use, and are a powerful feature for mapping data between environments.


twitter: @dbasolved



Filed under: Golden Gate
Categories: DBA Blogs

<b>Contributions by Angela Golla,

Oracle Infogram - Mon, 2014-07-21 10:10
Contributions by Angela Golla, Infogram Deputy Editor

Oracle Magazine July - August, 2014
Be sure to check out the July - August issue of Oracle Magazine.  It includes articles on the next generation of SPARC servers, the Cloud, Big Data Integration, the new Oracle Enterprise Manager  Database Express and much more. 

Exactly Wrong

Greg Pavlik - Mon, 2014-07-21 08:58
I normally avoid anything that smacks of a competitive discussion on what I consider to be a space for personal reflection. So while I want to disclose the fact that I am not disinterested in the points I am making from a professional standpoint, my main interest is to frame some architecture points that I think are extremely important for the maturation and success of the Hadoop ecosystem.

A few weeks back, Mike Olson of Cloudera spoke at Spark Summit on how Spark relates to the future of Hadoop. The presentation can be found here:

In particular I want to draw attention to the statement made at 1:45 in the presentation that describes Spark as the "natural successor to MapReduce" - it becomes clear very quickly that what Olson is talking about is batch processing. This is fascinating as everyone I've talked to immediately points out one obvious thing: Spark isn't a general purpose batch processing framework - that is not its design center. The whole point of Spark is to enable fast data access and interactivity.
The guys that clearly "get" Spark - unsurprisingly - are DataBricks. In talking with Ion and company, it's clear they understand the use cases where Spark shines - data scientist driven data exploration and algorithmic development, machine learning, etc. - things that take advantage of the memory mapping capabilities and speed of the framework. And they have offered an online service that allows users to rapidly extract value from cloud friendly datasets, which is smart.

Cloudera's idea of pushing SQL, Pig and other frameworks on to Spark is actually a step backwards - it is a proposal to recreate all the problems of MapReduce 1: it fails to understand the power of refactoring resource management away from the compute model. Spark would have to reinvent and mature models for multi-tenancy, resource managemnet, scheduling, security, scaleout, etc that are frankly already there today for Hadoop 2 with YARN.

The announcement of an intent to lead an implementation of Hive on Spark got some attention. This was something that I looked at carefully with my colleagues almost 2 years ago, so I'd like to make a few observations on why we didn't take this path then.

The first was maturity, in terms of the Spark implementation, of Hive itself, and Shark. Candidly, we knew Hive itself worked at scale but needed significant enhancement and refactoring for both new features on the SQL front and to work at interactive speeds. And we wanted to do all this in a way that did not compromise Hive's ability to work at scale - for real big data problems. So we focused on the mainstream of Hive and the development of a Dryad like runtime for optimal execution of operators in physical plans for SQL in a way that meshed deeply with YARN. That model took the learnings of the database community and scale out big data solutions and built on them "from the inside out", so to speak.

Anyone who has been tracking Hadoop for, oh, the last 2-3 years will understand intuitively the right architectural approach needs to be based on YARN. What I mean is that the query execution must - at the query task level - be composed of tasks that are administered directly by YARN. This is absolutely critical for multi-workload systems (this is one reason why a bolt on MPP solution is a mistake for Hadoop - it is at best a tactical model while the system evolves).  This is why we are working with the community on Tez, a low level framework for enabling YARN native domain specific execution engines. For Hive-on-Tez, Hive is the engine and Tez provides the YARN level integration for resource negotiation and coorindation for DAG execution: a DAG of native operators analogous the the execution model found in the MPP world (when people compare Tez and Spark, they are fundamentally confused - Spark could be run on Tez for example for a much deeper integration with Hadoop 2 for example). This model allows the full range of use cases from interactive to massive batch to be administered in a deeply integrated, YARN native way.

Spark will undoubtedly mature into a great tool for what it is designed for: in memory, interactive scenarios - generally script driven - and likely grow to subsume new use cases we aren't anticipating today. It is, however, exactly the wrong choice for scale out big data batch processing in anything like the near term; worse still, returning to a monolithic general purpose compute framework for all Hadoop models would be a huge regression and is a disastrously bad idea.

Changing Failgroup of ASM Disks in Exadata

Pythian Group - Mon, 2014-07-21 08:12

There was a discrepancy in the failgroups of couple of ASM disks in Exadata. In Exadata, the cell name corresponds to the failgroup name. But there were couple of disks with different failgroup names. Using the following plan to rectify the issue online without any downtime:

1) Check disks and their failgroup:

col name format a27
col path format a45

SQL> select path,failgroup,mount_status,mode_status,header_status,state from v$asm_disk order by failgroup, path;

o/    mycellNET0             CACHED  ONLINE  MEMBER      NORMAL
o/      mycell_NET0             CACHED  ONLINE  MEMBER      NORMAL

2) Drop Disks:


3) Wait for rebalanacing to finish

select * from gv$asm_operation;

4) Add the disks to the correct failgroups

ALTER DISKGROUP DATA ADD failgroup mycellNET0 DISK ‘o/′ rebalance power 32;

– Wait for rebalance to complete.

5) select * from v$asm_operation;

6) Verify the incorrect failgroup has gone

select name,path,failgroup from v$asm_disk  where failgroup in (‘mycell_NET0′) order by name;

select path,failgroup,mount_status,mode_status,header_status,state from v$asm_disk order by failgroup, path;

Categories: DBA Blogs

Small Files on MapR-FS

Pythian Group - Mon, 2014-07-21 08:11

One of the well-known best practices for HDFS is to store data in few large files, rather than a large number of small ones. There are a few problems related to using many small files but the ultimate HDFS killer is that the memory consumption on the name node is proportional to the number of files stored in the cluster and it doesn’t scale well when that number increases rapidly.

MapR has its own implementation of the Hadoop filesystem (called MapR-FS) and one of its claims to fame is to scale and work well with small files. In practice, though, there are a few things you should do to ensure that the performance of your map-reduce jobs does not degrade when they are dealing with too many small files, and I’d like to cover some of those.

The problem

I stumbled upon this when investigating the performance of a job in production that was taking several hours to run on a 40-node cluster. The cluster had spare capacity but the job was progressing very slowly and using only 3 of the 40 available nodes.

When I looked into the data that was being processed by the active mappers, I noticed that vast majority of the splits being read by the mappers were in blocks that were replicated into the same 3 cluster nodes. There was a significant data distribution skew towards those 3 nodes and since the map-reduce tasks prefer to execute on nodes where the data is local, the rest of the cluster sat idle while those 3 nodes were IO bound and processing heavily.

MapR-FS architecture

Differently from HDFS, MapR-FS doesn’t have name nodes. The file metadata is distributed across different data nodes instead. This is the key for getting rid of the name node memory limitation of HDFS, and let MapR-FS handle a lot more files, small or large, than a HDFS cluster.

Files in MapR-FS have, by default, blocks of 256MB. Blocks are organised in logical structures called “containers”. When a new block is created it is automatically assigned to one existing container within the volume that contains that file. The container determines the replication factor (3 by default) and the nodes where the replicas will be physically stored. Containers are bound to a MapR volume and cannot span multiple volumes.

There’s also a special container in MapR-FS called a “name container”, which is where the volume namespace and file chunk locations are stored. Besides the metadata, the name container always stores the first 64KB of the file’s data.

Also, there’s only a single name container per MaprFS volume. So the metadata for all the files in a volume, along with the files’ first 64KB of data, will be all stored in the same name container. The larger the number of files in a volume, the more data this container will be replicating across the same 3 cluster nodes (by default).

So, if your data set is comprised of a very large number of small files (with sizes around 64KB or less) and is all in the sae volume, most of the data will be stored in the same 3 cluster nodes, regardless of the cluster size. Even if you had a very large cluster, whenever you ran a map-reduce job to process those files, the job’s tasks would be pretty much allocated on only 3 nodes of the cluster due to data locality. Those 3 nodes would be under heavy load while the rest of the cluster would sit idle.

Real impact

To give you an idea of the dimension of this problem, the first time I noticed this in production was due to a Hive query that was causing high load only in 3 nodes of 40-node cluster. The job took 5 hours to complete. When I looked into the problem I found that the table used by the Hive query had tens of thousands of very small files, many of them smaller than 64K, due to the way the data was being ingested.

We coalesced the table to combine all those small files into a much smaller number of bigger ones. The job ran again after that, without any changes, and completed in just 15 minutes!! To be completely fair, we also changed the table’s file format from SequenceFile to RCFile at the same time we coalesced the data, which certainly brought some additional performance improvements. But, from the 3-node contention I saw during the first job run, I’m fairly convinced that the main issue in this case was the data distribution skew due to the large amount of small files.

Best practices

This kind of problem is mitigated when large files are used, since only a small fraction of the data (everything below the 64KB mark) will be stored in the name container, with the rest distributed across other containers and, therefore, other nodes. We’ll also have a smaller number of files (for a similar data volume), which reduces the problem even more.

If your data is ingested in a way that creates many small files, plan to coalesce those files into larger ones on a regular basis. One good tool for that is Edward Capriolo’s File Crusher. This is also (and especially) applicable to HDFS.

Best practice #1: Keep you data stored into large files. Pay special attention to incremental ingestion pipelines, which may create many small files, and coalesce them on a regular basis.

A quick and dirty workaround for the 3-node contention issue explained above would be to increase the replication factor for the name container. This would allow more nodes to run map-reduce tasks on that data. However, it would also use a lot more disk space just to achieve the additional data locality across a larger number of nodes. This is NOT an approach I would recommend to solve this particular problem.

Instead, the proper way to solve this in Mapr-FS is to split your data across different volumes. Especially if you’re dealing with a large number of small files that cannot be coalesced, splitting them across multiple volumes will keep the number of files per volume (and per name container) under control and it will also spread the small files’ data evenly across the cluster, since each volume will have its own name container, replicate across a different set of nodes.

The volumes may, or may not, follow your data lifecycle, with monthly, weekly or even daily volumes, depending on the amount of data being ingested and files being created.

Best practice #2: Use Mapr-FS volumes to plan your data distribution and keep the number of files per volume under control.

  1. MapR Architecture Guide
Categories: DBA Blogs

Cloudera Challenge 2014

Pythian Group - Mon, 2014-07-21 08:09

Yesterday, Cloudera released the score reports for their Data Science Challenge 2014 and I was really ecstatic when I received mine with a “PASS” score! This was a real challenge for me and I had to put a LOT of effort into it, but it paid off in the end!

Note: I won’t bother you in this blog post with the technical details of my submission. This is just an account of how I managed to accomplish it. If you want the technical details, you can look here.

Once upon a time… I was a DBA

I first learned about the challenge last year, when Cloudera ran it for the first time. I was intrigued, but after reading more about it I realised I didn’t have what it would be required to complete the task successfully.

At the time I was already delving into the Hadoop world, even though I was still happily working as an Oracle DBA at Pythian. I had studied the basics and the not-so-basics of Hadoop, and the associated fauna and had just passed my first Hadoop certifications (CCDH and CCAH). However, there was (and is) still so much to learn! I knew that to take the challenge I would have to invest a lot more time into my studies.

“Data Science” was still a fuzzy buzzword for me. It still is, but at the time, I had no idea about what was behind it. I remember reading this blog post about how to become a data scientist. A quick look at the map in that post turned me off: apart from the “Fundamentals” track in it, I had barely idea what the rest of the map was about! There was a lot of work to do to get there.

There’s no free lunch

But as I started reading more about Data Science, I started to realise how exciting it was and how interesting were the problems it could help tackle. By now I had already put my DBA career on hold and joined the Big Data team. I felt a huge gap between my expertise as a DBA and my skills as a Big Data engineer, so I put a lot of effort in studying the cool things I wanted to know more about.

The online courses at Coursera, Edx, Stanford and the like were a huge help and soon I started wading through courses and courses, sometime many at once: Scala, R, Python, more Scala, data analysis, machine learning, and more machine learning, etc… That was not easy and it was a steep learning curve for me. The more I read and studied I realised there was many times more to learn. And there still is…

The Medicare challenge

But when Cloudera announced the 2014 Challenge, early this year, I read the disclaimer and realised that this time I could understand it! Even though I had just scratched the surface of what Data Science is meant to encompass, I actually had tools to attempt tackling the challenge.

“Studies shall not stop!!!”, I soon found, as I had a lot more to learn to first pass the written exam (DS-200) and then tackle the problem proposed by the challenge: to detect fraudulent claims in the US Medicare system. It was a large undertaking but I took it one step at a time, and eventually managed to complete a coherent and comprehensive abstract to submit to Cloudera, which, as I gladly found yesterday, was good enough to give me a passing score and the “CCP: Data Scientist” certification from Cloudera!

I’m a (Big Data) Engineer

What’s next now? I have only one answer: Keep studying. There’s so much cool stuff to learn. From statistics (yes, statistics!) to machine learning, there’s still a lot I want to know about and that keeps driving me forward. I’m not turning into a Data Scientist, at least not for a while. I am an Engineer at heart; I like to fix and break things at work and Data Science is one more of those tools I want to have to make my job more interesting. But I want to know more about it and learn how to use it properly, at least to avoid my Data Scientist friends cringing away every time I tell tell I’m going to run an online logistic regression!

Categories: DBA Blogs

SQL Server 2014 Delayed Durability from an Application Perspective

Pythian Group - Mon, 2014-07-21 08:06

The idea of this blog post is to describe what the delayed durability feature is in SQL Server 2014 and to describe a use case from an application development perspective.

With every new SQL Server release we get a bunch of new features and delayed durability of transactions really caught my attention. Most of the relational database engines are used to handle transactions with the write ahead log method(, basically a transaction comes into the database, and in order to successfully commit a piece of information it will flush the pages from the memory, then write to the transaction log and finally to the datafile, always following a synchronous order, since the transaction log is pretty much a log of each transactions, recovery methods can even try to get the data from logs in case the data pages were never committed to the datafile, so as a summary this is a data protection method used to handle transactions, MSDN calls this a transaction with FULL DURABILITY.

So what is Delayed Transaction Durability?

To accomplish delayed durability in a transaction, asynchronous log writes happens from the buffers to the disk. Information is kept in memory until either the buffer is full or a flush takes place. This means instead of flushing from memory, then to log and then to datafile, the data will just wait in memory and the control of the transaction will be restored to the requestor app faster. If a transaction initially only hits memory and avoid going through the disk heads, it will for sure complete faster as well.

But when is the data really stored in disk?

SQL Server will handle this depending on how busy/full the memory is and will then execute asynchronous transactions to finally store the information in disk. You can always force this to happen with this stored procedure “sp_flush_log”.

Ok But there is a risk, right?

Correct, since the original data protection method is basically skipped, in the event of a system disruption such as SQL Server doing a failover or simply “unexpectedly shutting down”, some data can be lost in the so called LIMBO that is somewhere between the application pool and the network cable.

Why would I want to use this?

Microsoft recommends to use this feature only if you can tolerate a data loss, if you are experiencing a bottleneck or performance issue related to log writes or if your workload have a high contention rate(processes waiting for locks to be released.)

How do I Implement it?

To use delayed transactions you should enable this as a database property. You can used FORCED option which will try to handle all transactions as delayed durable, you can use ALLOWED which will let you use delayed durable transactions, which you then need to specify in your TSQL(this is called atomic block level control), see a sample taken from MSDN below:

LANGUAGE = N'English'


For more syntax and details I invite you to check the so full of wisdom MSDN Library.

Enough of the background information, and let’s take this puppy for a ride, shall we?

Consider the following scenario: You manage a huge application, probably some application between an ERP and a Finance module. The company has developed this application from scratch, each year more and more features are added in this app. The company decides that they want to standardize procedures and want to have more control over the events of the application. They realize they do not have enough audit traces, if someone deletes data, if a new deal or customer information is inserted, management needs to have a track record of almost anything that happens. They have some level of logging, but is implemented differently depending on the developer taste and mood.

So, Mr MS Architect decides they will implement enterprise library logging block, and will handle both exceptions and custom logging with this tool. After adding all this logging to the events, the system begins to misbehave and the usual slow is now officially really slow. Mr Consultant then comes in and suggest that the logging data is moved to a separate database, also this database should use Delayed durability, by doing so, transactions related to logging events will have less contention and will return the control faster to the application, some level of data loss can be tolerated which also makes the decision even better.

Let’s build a proof of concept and test it..

You can find a sample project attached: WebFinalNew

You need to have enterprise library installed in your visual studio. For this sample I am using Visual Studio 2010.

You need to create 2 databases, DelayedDB and NormalDB (Of Course we need to use SQL Server 2014)



Use the attached script LoggingDatabase (which is part of the scripts of Enterprise library), it will create all the objects needed for the application log block.


In the DelayedDB, edit the properties and set the Delayed Durability to FORCED, this will make all transactions to have delayed durability(please note some transactions will never be delayed durable such as system transactions, cross-database transactions, and operations involving FileTable, Change Tracking and Change Data Capture)


You need to create a windows web project, it should have a web.config , if not you can manually add a configuration file:



Make sure you add all the application block references(Logging Block)


Now right click over the web.config or app.config file and edit your enterprise library configuration



In the database Settings block, add 2 new connections to your database(one for NormalDB and the other for DelayedDB), make sure to specify the connection in the form of a connection string like the picture below:



In the Logging block, create a new category called DelayedLogging, this will point to the database with delayed durability enabled.


Then add 2 database Trace listeners, configure General Category to point to “Database Trace Listener” and then configure DelayedLogging Category to point to “Database Trace Listener 2”. Configure each listener to point to the corresponding database(one to each database previously configured in the Database block)



Save all changes and go back to the application, configure the design layout with something like below



Add a codebehind to the button in the upper screen and build a code that will iterate and send X amount of commands to each database, track the time it takes to send the transaction and regain control of the application into a variable, check the attached project for more details, but use logwriter.write and pass as parameter the category you configured to connect to DelayedDB(DelayedLogging) and the general category(default, no parameter) to connect to NormalDB. See a sample of how a logging transaction is fired below:


logWriter.Write("This is a delayed transaction","DelayedLogging");
logWriter.Write("This is a transaction");

This code will call the logging block and execute a transaction on each database, the “normal” database and the durable one, it will also track milliseconds it takes to return the control to the application, additionally I will have performance monitor and query statistics from the database engine to see the difference in behavior.


Quick Test

Batch insert of 1000 rows, a normal database took 1 millisecond more in average per transaction to return the control to the application:9


What information we have from sys.dm_io_virtual_file_stats?

Database io_stall_read_ms num_of_writes num_of_bytes_written io_stall_write_ms io_stall size_on_disk_bytes DelayedDB 47 5126 13843456 4960 5007 1048576 Normal 87 5394 14492160 2661 2748 1048576

We can see that the same amount of data was sent to both databases(last column size_on_disk_bytes), interesting observation are the stalls, in a delayed durable database the stall will be higher for writing, this means despite the fact that the transaction is executed “faster”, what really means is that it returns the control to the application faster, but the time it takes to actually store the data to disk can be higher since is done in async mode.


Let’s see a quick graphic of the performance impact

Delayed Durability

10 11

With a Delayed Durability the disk queue length average is higher, since it will wait to fill the buffer and then execute the write. You can appreciate the yellow peak(within the red circle) after the transaction completes, it will execute pending writes( moment where I issue a “sp_flush_log”.).


Full Durability



With a Full Durability the disk queue length average is lower, since it will sequentially execute the writes there will be less pending transactions in memory.



Delayed Durability feature is definitely a great addition to your DBA toolbelt, it needs to be used taking in consideration all the risks involved, but if properly tested and implemented it can definitely improve the performance  and architecture of certain applications. Is important to understand this is not a turbo button(like some people does with the nolock hint) and it should be used for certain types of transactions and tables. Will this change your design methods and make you plan for a separate delayed durable database? or plan to implement certain modules with delayed durable transactions? This for sure will have an interesting impact on software design and architecture.

Categories: DBA Blogs

OTN APEX Forum Link

Denes Kubicek - Mon, 2014-07-21 00:33
Oracle again changed the layout of the forum. For me, the old link didn't work any more. In case you have problems finding it, here is the new link:

If you go to the forum and search for example for "APEX" or "Application Exp", you will see no results. Typing in "Application Ex" will find "Application Express".

Each of the found links will have a funny description saying:

"An error occurred processing your request. If this problem persists, please contact the webmaster or administrator of this site."

:) So, it seems there are now even more bugs than before.

Probably, the intention to change the forum wasn't bad. However, once you manage to open it you will see a lot of information you don't need (or at least not all of the time). The real content is somewhere underneath and needs scrolling like in Facebook (oh, how I hate that site). And the worst thing is that you can see only ten threads per page - if you want to see more then click and scroll again. For those interested in helping others this is making things much more complicated.

One positive thing though. :) My name suddenly appears in the top list of the participants in the forum. The list isn't reduced to the top five but it now shows the top six. Top six is obviously the new top five. ;)

Categories: Development

Building Dynamic Branded Digital Experiences with Oracle WebCenter

WebCenter Team - Mon, 2014-07-21 00:00

This post originally appeared on the Oracle consulting blog, Future State - The Official Blog of Oracle Consulting Services
on Thursday Jul 17, 2014

Building Dynamic Branded Digital Experiences with Oracle WebCenter

By Ty Duval, Consulting Senior Practice Director, WebCenter, Oracle Consulting Services

A Cost Effective Solution to Securing Retail Data

At the Crossroads

I frequently encounter companies at the crossroads in their efforts to become digital businesses. Their journeys proceed along familiar paths and I can readily anticipate what their next steps should be. To begin with, these firms launched their initial web sites more than 15 years ago, and have steadily added multiple web-based applications (running on disparate systems) to support targeted initiatives. IT and business leaders are certainly web-aware, if not already web-savvy.

Yet a lot has changed over the past decade. Web-powered solutions are no longer nice-to-have additions to enterprise architectures and applications. Rather, these solutions are core capabilities for achieving strategic business objectives.

The Business Value for WebCenter

IT leaders must now provide both internal and external customers with the branded experiences for managing and using online content, while sharply reducing costs and accelerating time to market. It’s necessary -- but no longer sufficient -- to simply consolidate web sites by introducing standardized platforms and services that reduce technical footprints.

Instead, IT groups need to refresh, modernize, and mobilize their enterprise application infrastructures. There is also an evolution of responsibilities. Individual business units, not the IT groups, should create and manage all of the content required for engaging customers and driving the branded experiences across their organizations.

Of course, Oracle WebCenter provides the tooling for delivering effective enterprise-scale applications. Yet implementation makes a big difference. At OCS, we focus on three factors for deploying digital business solutions – consultative engagement, content inventory, and content reuse. Let me explain why these factors make a difference.

Consultative Engagement

First, the OCS engagement model is a consultative process. We work along side business stakeholders and creative teams to define the requirements for building branded experiences. With our deep technical knowledge and product expertise, we can help define how to use the right tool for the right job in the right way.

There is often a gap between what the business envisions and what the tools deliver. By being part of the conversation from the start, OCS consultants can bridge the gap, and make timely recommendations that leverage the key capabilities of the enabling tools and technologies. Then, when it comes to implementation, consultants can rapidly prototype and produce frequent enhancements on an ongoing basis. Utilizing an agile development methodology, they can work closely with business users and designers to mold the digital environment.

Content Inventory

Second, branded experiences depend on content. In any engagement, it’s essential to determine what information already exists and can be readily incorporated into the new solution, as well as what content is entirely missing and needs to be created. A content inventory maps the “to be” state about what information customers require, against the “as is” condition describing and categorizing all the content items that are currently available.

OCS consultants work with business stakeholders and creative teams to identify the kinds of content needed to support particular experiences. It is also important to identify the content owners who are responsible for producing the needed information, both currently and in the future. Often the content already exists in one repository or another. The design challenge then is to compile and organize the information from disparate sources.

The content inventory can also uncover the missing text, images, and rich media assets that customers expect as part of their experiences. OCS consultants can then work with line-of-business organizations to define new content management processes – the people, tasks, and activities required for creating and maintaining these needed information sources. Once deployed, the line organizations should be responsible for managing the content without IT support.

Content Reuse

Third, a successful digital business initiative depends on content reuse – the ability to create content items once, manage them systematically, and distribute them as needed across the enterprise. As an example, there should be a single source of content that describes the capabilities of a new product on a company’s web site, and the corresponding promotions contained in personalized email messages sent to prospective customers.

When it comes to building branded experiences, more is at stake then storing content within a shared repository or relying on a predefined set of editorial workflows for review and approvals. Reuse requires an appreciation for the power of content and an understanding about how to manage it for competitive advantage.

This is where WebCenter deployment expertise pays off. OCS consultants have the technical skill sets and business insights for defining the content models and metadata essential to ensure content reuse. They can utilize the appropriate capabilities of various WebCenter products for business results.

Knowhow and Experience

In short, there’s an art and a science to building branded experiences for digital businesses. Successful companies are going to transform – and digitize – key aspects of their ongoing operations, and create new business processes along the way. Different firms and even entire industries are going to pursue their own particular paths.

But there are common threads to weaving together the applications for next-generation, digitally empowered environments. It takes knowhow and experience. When implementing WebCenter, OCS consultants have the insights, methodologies, and tools to help companies make the journeys and become digital businesses.

Backup an SQL Server database from On-Premise to Azure

Yann Neuhaus - Sun, 2014-07-20 23:46

SQL Server database backup & restore from On-Premise to Azure is a feature introduced with SQL Server 2012 SP1 CU2. In the past, it could be used with these three tools:

  • Transact-SQL (T-SQL)
  • PowerShell
  • SQL Server Management Objects (SMO)

With SQL Server 2014, backup & restore can also be enabled via SQL Server Management Studio (SSMS).

Data integration as a business opportunity

DBMS2 - Sun, 2014-07-20 21:59

A significant fraction of IT professional services industry revenue comes from data integration. But as a software business, data integration has been more problematic. Informatica, the largest independent data integration software vendor, does $1 billion in revenue. INFA’s enterprise value (market capitalization after adjusting for cash and debt) is $3 billion, which puts it way short of other category leaders such as VMware, and even sits behind Tableau.* When I talk with data integration startups, I ask questions such as “What fraction of Informatica’s revenue are you shooting for?” and, as a follow-up, “Why would that be grounds for excitement?”

*If you believe that Splunk is a data integration company, that changes these observations only a little.

On the other hand, several successful software categories have, at particular points in their history, been focused on data integration. One of the major benefits of 1990s business intelligence was “Combines data from multiple sources on the same screen” and, in some cases, even “Joins data from multiple sources in a single view”. The last few years before application servers were commoditized, data integration was one of their chief benefits. Data warehousing and Hadoop both of course have a “collect all your data in one place” part to their stories — which I call data mustering — and Hadoop is a data transformation tool as well.

And it’s not as if successful data integration companies have no value. IBM bought a few EAI (Enterprise Application Integration) companies, plus top Informatica competitor Ascential, plus Cast Iron Systems. DataDirect (I mean the ODBC/JDBC guys, not the storage ones) has been a decent little business through various name changes and ownerships (independent under a couple of names, then Intersolv/Merant, then independent again, then Progress Software). Master data management (MDM) and data cleaning have had some passable exits. Talend raised $40 million last December, which is a nice accomplishment if you’re French.

I can explain much of this in seven words: Data integration is both important and fragmented. The “important” part is self-evident; I gave examples of “fragmented” a couple years back. Beyond that, I’d say:

  • A new class of “engine” can be a nice business — consider for example Informatica/Ascential/Ab Initio, or the MDM players (who sold out to bigger ETL companies), or Splunk. Indeed, much early Hadoop adoption was for its capabilities as a data transformation engine.
  • Data transformation is a better business to enter than data movement. Differentiated value in data movement comes in areas such as performance, reliability and maturity, where established players have major advantages. But differentiated value in data transformation can come from “intelligence”, which is easier to excel in as a start-up.
  • “Transparent connectivity” is a tough business. It is hard to offer true transparency, with minimal performance overhead, among enough different systems for anybody to much care. And without that you’re probably offering a low-value/niche capability. Migration aids are not an exception; the value in those is captured by the vendor of what’s being migrated to, not by the vendor who actually does the transparent translation. Indeed …
  • … I can’t think of a single case in which migration support was a big software business. (Services are a whole other story.) Perhaps Cast Iron Systems came closest, but I’m not sure I’d categorize it as either “migration support” or “big”.

And I’ll stop there, because I’m not as conversant with some of the new “smart data transformation” companies as I’d like to be.

Related links

Categories: Other

Dependent Rational Animals

Greg Pavlik - Sun, 2014-07-20 16:32
I wanted to briefly comment on Alisdair MacIntyre's lectures collected as "Dependent Rational Animals", but let me precede that with a couple of comments for context: first, as I alluded in my last post referencing Levinas, it is my view that the the ethics demands a certain primacy in any healthy conception of life and society; second, in the area of ethics, Macintyre's After Virtue is the book that has had perhaps the biggest impact on my own thinking.

One of the criticisms of MacIntyre is that his critique of rational ethics is, on the one hand, devastating; on the other hand, his positive case for working out a defense of his own position - a revivification of social ethics in the Aristotelian-Thomist tradition(s) was somewhat pro forma. I think this is legitimate in so far as it relates to After Virtue itself (I believe I have read the latest edition - 3 - most recently), though I am not enough of a MacIntyre expert to offer a defensible critique of his work overall.

I do, however, want to draw attention to Dependent Rational Animals specifically in this light. Here MacIntyre begins with is the position of human as animal - as a kind of naturalist starting point for developing another pass at the importance of the tradition of the virtues. What is most remarkable is that in the process of exploring the implications of our "animality" MacIntyre manages to subvert yet another trajectory of twentieth century philosophy, this time as it relates to the primacy of linguistics. The net effect is to restore philosophical discourse back toward the reality of the human condition in the context of the broader evolutionary context of life on earth without - and this I must say is the most amazing part of this book - resorting to fables-masked-as-science (evolutionary psychology).


Michael Feldstein - Sun, 2014-07-20 08:16

It would be deeply unfair of me to mock Blackboard for having a messy but substantive keynote presentation and not give equal time to D2L’s remarkable press release, pithily entitled “D2L Supercharges Its Integrated Learning Platform With Adaptive Learning, Robust Analytics, Game-Based Learning, Windows® 8 Mobile Capabilities, And The Newest Education Content All Delivered In The Cloud.” Here’s the first sentence:

D2L, the EdTech company that created the world’s first truly integrated learning platform (ILP), today announces it is supercharging its ILP by providing groundbreaking new features and partnerships designed to personalize education and eliminate the achievement gap.

I was going to follow that quote with a cutting remark, but really, I’m not sure that I have anything to say that would be equal to the occasion. The sentence speaks for itself.

For a variety of reasons, Phil and I did not attend D2L FUSION this year, so it’s hard to tell from afar whether there is more going on at the company than meets the eye. I’ll do my best to break down what we’re seeing in this post, but it won’t have the same level of confidence that we have in our Blackboard analysis.

Let me get to the heart of the matter first. Does it look to us like D2L has made important announcements this year? No, it does not. Other than, you know, supercharging its ILP by providing groundbreaking new features and partnerships designed to personalize education and eliminate the achievement gap. They changed their product name to “Brightspace” and shortened their company name to D2L. The latter strikes me as a particularly canny PR move. If they are going to continue writing press releases like their last one, it is probably wise to remove the temptation of the endless variety of potential “Desire2″ jokes. Anyway, THE Journal probably does the best job of summarizing the announcements. For an on-the-ground account of the conference and broader observations about shifts in the company’s culture, read D’Arcy Norman’s post. I’ve been following D’Arcy since I got into blogging ten years ago and have learned to trust his judgment as a level-headed on-the-ground observer.

From a distance, a couple of things jump out at me. First, it looks to me like D2L is trying to become a kind of a content player. Having acquired the adaptive platform in Knowillage, they are combining it with the standards database that they acquired with the Achievement Standards Network. They are also making a lot of noise about enhancements to and content partnerships for their Binder product, which is essentially an eBook platform. Put all of this together, and you get something that conceptually is starting to look (very) vaguely like CogBooks. It wants to be an adaptive courseware container. If D2L pulls this off it will be significant, but I don’t see signs that they have a coherent platform yet—again, acknowledging that I wasn’t able to look at the strategy up close at FUSION this year and could easily be missing critical details.

Second, their announcement that they are incorporating IBM’s Cognos into their Insights learning analytics platform does not strike me as a good sign for Insights. As far as we have been able to tell from our sources, that product has languished since Al Essa left the company for McGraw Hill. One problem has been that their technical team was unable to deliver on the promise of the product vision. There were both data integrity and performance issues. This next bit is speculation on my part, but the fact that D2L is announcing that they plan to use the Cognos engine suggests to me that the company has thus far failed to solve those problems and now is going to a third party to solve them. That’s not necessarily a bad strategy, but it reinforces our impression that they’ve lost another year on a product that they hyped to the heavens and raises questions about the quality of their technical leadership.

The post Desire2Wha? appeared first on e-Literate.

Putting my DB / Apex install through the wringer

Gary Myers - Sun, 2014-07-20 03:53
I was mucking around trying to get APEX on one of my PCs to be visible on the internet.

This was just a proof-of-concept, not something I intend to actually leave running.

EPG on Port 8080

I do other testing on the home network too, so I already had my router configured to forward port 80 to another environment. That meant the router's web admin had been shifted to port 8080, and it wouldn't let me use that. Yes, I should find a open source firmware, but OpenWRT says it is unsupported and will "brick the router" and I can't see anything for Tomato.

So I figured I'd just use any incoming router port and forward it to the PC's 8080. I chose 6000. This was not a good choice. Looks like Chrome comes with a list of ports which it thinks shouldn't be talking http. 6000 is one of them, since it is supposed to be used for X11 traffic so Chrome told me it was unsafe and refused to co-operate.

Since it is a black-list of ports to avoid, I just happened to be unlucky (or stupid) in picking a bad one. Once I selected another, I got past that issue.

My task list was:

  1. Install Oracle XE 11gR2 (Windows 64-bit)
  2. Configure the EPG for Apex. I ran apex_epg_config.sql as, I had switched straight from the pre-installed Apex 4.0 to 4.2.5 rather than upgrading a version I had actively used. 
  3. Unlocked the ANONYMOUS database account
  4. Checked DBMS_XDB.GETHTTPPORT returned 8080 
(At this point, you can test that you have connectivity to apex on the machine on which XE / Apex is installed, through and localhost).
Local Network
  1. Enabled external access by setting DBMS_XDB.SETLISTENERLOCALACCESS(false); 
(Now you can test connectivity from another machine on the same local network through whatever hostname and/or IP address is assigned to that machine, such as 10.x.x.x or 192.168.x.x)
Remote Network
  • I got a handy Dynamic DNS via NoIP because my home IP can potentially change (though it is very rare). [Yes, there was a whole mess about Microsoft temporarily hijackinging some noip domains, but I'm not using this for anything important.] This was an option in my router setup.
  • The machine that runs XE / Apex should be assigned a specific 192.168.1.nnn IP address by the router (based on it's MAC address). This configuration is specific to the router hardware, so I won't go into my details here. But it is essential for the next step.
  • Configure the port forwarding on the router to push incoming traffic on the router's port 8088 off to port 8080 for the IP address of the machine running XE / Apex. This is also router specific. 
When everything is switched on, I can get to my Apex install from outside the local network based on the hostname set up with noip, and the port configured in the router. I used my phone's 3G internet connection to test this. 

Apex Listener

My next step was to use the Apex Listener rather than the EPG. Oracle have actually retagged the Apex Listener as RDS (Restful Data Services) so that search engines can confuse it with Amazon RDS (Relational Database Service).

This one is relatively easy to set up, especially since I stuck with "standalone" mode for this test. 

A colleague had pointed me to this OBE walkthrough on Apex PDF reports via RDS, so I took a spin through that and it all worked seamlessly.

My next step would be a regular web server/container for RDS rather than standalone. I'm tempted to give Jetty a try as the web server and container for the listener rather than Tomcat etc, but the Jetty documentation seems pretty sketchy. I'm used to the thoroughness of the documentation for Apache (as well as Oracle).

It’s The End of Cal State Online As We Know It . . .

Michael Feldstein - Sat, 2014-07-19 08:48

In a letter to campus leaders, Cal State University system office last month announced that Cal State Online will no longer operate as originally conceived. Emphasis added below.

As the CSU continues to expand its online education strategies, Cal State Online will evolve as a critical component. An early Cal State Online goal will continue: to increase the quality and quantity of fully online education offerings to existing and prospective CSU students, resulting in successful completion of courses and graduation.

The re-visioning of Cal State Online was recommended by the Council of Presidents and approved by the chancellor. This will include a shift to a communication, consultation and services’ strategy for fully online campus degree programs, credentials, certificates and courses supported by opt-in shared services. Cal State Online’s shared services will be designed, delivered and managed to:

1. Make it easy for prospective and existing students to discover, decide, enroll and successfully complete their CSU online education opportunities.

2. Make it more cost-effective for CSU campuses to develop, deliver and sustain their high- quality fully online degree, credential and certificate programs and courses.

Background in a nutshell

In early 2010 a sub-set of the Cal State presidents – the Technology Steering Committee (TSC) – came up with a plan to get the system to aggressively push online education across the system. In fall 2011 the group commissioned a consultant’s set of reports to help them pick an operating model, with the reports delivered in February 2012. This study led to the creation of CSU Online, conceived as a separate 501(c)3 non-profit group1 run by the system, with the plan to use a for-profit Online Service Provider (OSP).2 Early on they realized that Colorado State University was already using the CSU Online name, and the initiative was renamed Cal State Online. The idea was to offer fully-online programs offered by individual campuses in a one-stop shop. Based on an RFP process, in August 2012 Cal State Online selected Pearson as their OSP partner.

Some media coverage of initiative:

The March IHE article quoted official Cal State documents to describe the initiative.

“The goal of Cal State Online is to create a standardized, centralized, comprehensive business, marketing and outreach support structure for all aspects of online program delivery for the Cal State University System,” says the draft RFP. In the open letter, the executive director offers assurances that “participation is optional” for each of the system’s nearly two dozen campuses, “all programs participating in Cal State Online are subject to the same approval processes as an on-campus program,” and “online courses will meet or exceed the quality standards of CSU face-to-face courses.”

What has changed?

This change is significant and recent, meaning that Cal State likely does not have full plans on what will happen in the future. For now:

  • Cal State Online will no longer be a separate operating entity, and the remnant, or “re-visioned” services will be run by the existing Academic Technology Services department within the Chancellor’s Office.

The re-visioning Cal State Online team will be led by Gerry Hanley (Assistant Vice Chancellor for Academic Technology Services) with Sheila Thomas (State University Dean, Extended and Continuing Education).

  • Pearson is no longer the OSP, and in fact, they had already changed their role many months ago3 to remove the on-site team and become more of a platform provider for the LearningStudio (aka eCollege) LMS and supporting services.
  • Cal State is no longer attempting to provide a centralized, comprehensive support structure “for all aspects of online program delivery” but instead will centrally provide select services through the individual campuses.
  • It is clear that Cal State is positioning this decision to show as much continuity as possible. They will continue to provide some of the services started under Cal State Online and will continue to support the programs that have already been offered through the group.

Some services will continue and CSU may keep the name, but it’s the end of Cal State Online as we know it.

I am working on a longer post to explain what happened, including (hopefully) some interviews for supporting information . . . stay tuned.

Update: Changed description of Pearson change and added footnote.

  1. I have not independently verified that the organization truly was set up as a 501(c)3.
  2. Pearson had a team in place at Cal State providing LMS, implementation and integration services, enrollment management & marketing, course design support, analytics and reporting, learning object repository, help desk and technical support, training and faculty support.
  3. I believe this occurred Feb 2014 but am not sure.

The post It’s The End of Cal State Online As We Know It . . . appeared first on e-Literate.

Site Maintenance Complete!

Tim Hall - Sat, 2014-07-19 04:40

It looks like the site maintenance is complete and from my perspective the DNS changes have gone through.

If you go to the homepage and see a message called “Site Maintenance” in the “Site News” section, it means you are being directed to the new server. If you don’t see that it means you are still being directed to the old server and you won’t be able to read this. :)

I guess it will take a few hours for the DNS changes to propagate. Last time I moved the site it took a couple of days to complete for everyone.



Site Maintenance Complete! was first posted on July 19, 2014 at 11:40 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.