
Feed aggregator

Server Problem : A Resolution?

Tim Hall - Fri, 2016-04-29 08:09

It’s been a pretty annoying couple of days on the website server front.

The server locking up intermittently is one thing, and for all I know, maybe my fault? The incompetence of the hosting company is quite something else.

Just so you are aware why I was doing my nut yesterday, the hosting company had disabled my ability to force a power cycle of my dedicated server while they did a hardware test. They forgot to re-enable it when they finished. I rang to ask them to re-enable it and also power cycle the server. It took them over 70 minutes to achieve the power cycle and it was the following day before the interface to allow me to force a power cycle was enabled again. Amateurs!

They offered to give me a free month of hosting, but I refused. Last night I moved the whole thing to Amazon Web Services so that’s the new home for the website. I finished the build and testing, then flipped the DNS and went to bed, figuring the DNS propagation can take up to 24 hours, so why hang around.

VirtualBox 5.0.20

Tim Hall - Fri, 2016-04-29 07:32

It’s been 10-ish days since VirtualBox 5.0.18 was released and the world is screaming out, “Where is 5.0.20 already? We want to update all our Guest Additions!” Well, it’s here…

Experiments with Elastic’s Graph Tool

Rittman Mead Consulting - Fri, 2016-04-29 07:17

Elastic announced their Graph tool at ElastiCON 2016 (see presentation here). It’s part of the forthcoming X-Pack which bundles Graph along with other helper tools such as Shield and Marvel. Graph itself is two things: an extension of Elasticsearch’s capabilities, enabling the user to explore how items indexed in Elasticsearch are related, and a plugin for Kibana that acts as an optional front-end for this new functionality.

You can find a good introduction to Graph and the purpose and theory behind it in the documentation here. The installation of the components themselves is simple and documented here.

First Graph

To use Graph, you just point it at your existing data in Elasticsearch. The first data set I’m going to explore is one of the standard ones that everyone uses: Twitter. I’m streaming it in through Logstash (via Kafka for flexibility), but if you wanted you could ship it in via JDBC from any RDBMS, or from HDFS too.
See an important note at the end of this article about the slice of data within it, because it affects how the relationships visualised here should be viewed. 

On launching Kibana’s Graph plugin (http://localhost:5601/app/graph) I choose the index (note that index patterns, e.g. when partitioning by date, are not supported yet), and the field in the data that I want to use as my vertices. A point to note here – “vertices” are usually called “nodes” in Graph terminology, but since Elasticsearch already uses “nodes” as part of its infrastructure topology terminology, they had to pick a different term.

In the search box, I can enter the search term for which I want to see the related ‘vertices’.

Sounds baffling? It is, kinda – right up until you run it (hit enter from the search box or click the magnifying glass search icon) and see what happens:

Here we’re seeing the hashtags used in tweets that mention Kibana. The “connections” (Elastic term) or “edges” (general Graph term) show which vertices (nodes) are related, and the width indicates the strength of that relationship (based on Elasticsearch’s significant terms and scoring algorithm). For more details, see the “Behind the Scenes” section towards the end of this article.

We can add in a second set of vertices by running a second search (“Elasticsearch”) – the results for these are, in effect, appended to the existing ones:

Add Links

Since we’ve pulled back an additional set of vertices, it could be that there’s overlap between these and the first set (you’d kind of expect it, Elasticsearch and Kibana being related). To visualise this, use the Add Links button.

Note how the graph redraws itself with additional connections:

Blinked and you missed it? Use the Undo button to step back, and Redo button to re-apply.

Grouping Vertices

If you look closely at the graph you’ll see that Elasticsearch, ElasticSearch, and elasticsearch are all there as separate vertices. This is because I’m using a non-analyzed index field, so the strings are treated literally, case included. In this specific example, we’d probably re-run the graph using the analysed version of the field, which following the same two searches as above gives this:

But, sticking with our non-analysed example, we can use it to demonstrate Graph’s ability to group multiple terms together into a single vertex. Switch to Advanced Mode:

and then select the three vertices and click the group option

Now all three, and their connections, are as one:

Whilst the above analysed/non-analysed difference gave me an excuse to show the group function (can you tell I’ve done many-a-failed-live-demo? ;-) ), I’m now going to switch over to a graph built on the analysed version of the hashtag field, as we saw briefly above:

Tidying up the Graph – Delete and Blacklist

There are a few stragglers on the Graph that make it harder to comprehend. We can temporarily remove them, or even blacklist them from appearing again in this session:

Expand Selection

One of the points of Graph analysis is visualising the relationships in your data in a way that standard relational methods may not lend themselves to so easily. We can now start to explore this further, by digging into the Graph that we’ve got so far. This process, along with the add links seen above, is often called “spidering“. By selecting the elasticsearch node and clicking on Expand selection we can see additional (by default, five) vertices related to this one:

So we see that kafka is related to Elasticsearch (in the view of the twitterati, at least), and let’s expand that Kafka vertex too:

By clicking the Expand selection button again for the same vertex we get further results added:

We can select one node (e.g. realtime) and, using Add Link, see additional relationships:

But there are many nodes, and we want to see all the relationships between them. So, switch to Advanced Mode and select All

…and click Add Link again:

Knob Twiddling

Let’s start with a blank canvas, in basic mode, showing hashtags related to … me (@rmoff)!

But, surely I do more than talk about OBIEE and ODI? Like, Elasticsearch? Let’s relax the Graph selection criteria, under Settings:

and run the search again (on top of the existing results):

There are more results … but I know how much I tweet and it feels like I’m only seeing part of the picture. By switching over to Advanced Mode, we can refine how many results each field returns:

I reset the workspace (undo to blank, or just reload), and run the search again, this time with a greater number of hashtag field values shown, and with the same relaxed search settings as shown above:

At this point I’m into “fiddling” territory, twiddling with the ‘Number of terms’, ‘Significant’ and ‘Certainty’ knobs to see how the results vary. You can read more about the algorithm behind the Significance setting here, and more about the Graph API here. The certainty setting is simply “The min number of documents that are required as evidence before introducing a related term”, so by lowering it we see more links, but potentially with more “noise” too, of terms that aren’t really related.

An important point to note here is the dataset that I’m using is already biased because of the terms I’m including in my twitter feed search, therefore I’d expect to see this skew in the results below. See the section at the end of this article for more details of the dataset.

  • 50 terms, significant unticked, certainty 1 (as above)
  • 50 terms, significant ticked, certainty 1
  • 50 terms, significant ticked, certainty 3
  • 20 terms, significant ticked, certainty 1
  • 20 terms, significant unticked, certainty 1

Based on the above, “Significant” seems to reduce the number of relationships discovered, but increase the weight shown in those that remain.

Adding Additional Vertex Fields

So we’ve seen a basic overview of how to generate Graphs, expand selections, and add relationships to those additional selections. Let’s look now at how multiple fields can be added to a Graph.

Starting with a blank workspace, I switched to Advanced Mode and added two fields from my twitter data:

  • user.screen_name
  • in_reply_to_screen_name

Note that you can customise the colour and icon of different fields.

Under Options I’ve left Significant Links enabled, and set Certainty to 1.

Let’s see who’s been interacting about the recent E4 summit:

Whilst it looks like Mark Rittman is the centre of everything, this is actually highlighting a skew in the source dataset – which includes everything Mark tweets but not all tweets about E4. See the section at the end of this article for more details of the dataset.

The lower cluster is Mark as the addressee of tweets (i.e. he is the in_reply_to_screen_name), whilst the upper cluster is tweets that Mark has sent addressing others (i.e. he is the user.screen_name).

If we click on Add Links a couple of times we can see that there’s other connections here – for example, Mark replies to Stewart (@stewartbryson), who Christian Berg (@Nephentur) talks to, who in turn talks to Mark.

This being twitter and the age of narcissism, I’ll click on my vertex and click Expand Selection to see the people who in turn talk to me:

And by using Add Link see how they relate to those already shown in the Graph:

Viewing Associated Records

Within Graph there’s the option to view the data associated with one or more vertices. We do this by selecting a vertex and clicking on View Example Docs (in Elasticsearch parlance, a document is akin to a ‘row’ as traditional RDBMS folk would know it). From here select the field – for twitter the text field has the contents of the tweet:

Adding Even more Vertex Fields

So, we’ve got a bit of a picture of who talks to whom, but can we see what they’re talking about? We could use the text field shown above to see the contents of tweets but that’s down in the weeds of individual tweets – we want to step back a notch and get a summarised view.

First I add in the hashtag field:

And then deselect the two username fields. This is so that when I expand existing vertices, they show only related hashtags, and not additional users.

Now I select Mark as the originator of a tweet, and Expand Selection followed by Add Links on all vertices until I get this:

The number of values selected is key in getting a representative Graph. Above I used a value of 10. Compare that to instead running the same process but with 50. Under Options I’ve left Significant Links enabled, and set Certainty to 1:

One interesting point we can see from this is that the user “itknowingness” in the cluster on the left seems to use all the hashtags, but doesn’t interact with anyone. That is easy to see from the Graph, and it’s a great example of Graph giving you the answer to a question you didn’t necessarily know you had – one that would need a very specific query to answer through a traditional RDBMS. Looking at the source data via Kibana’s Discover panel shows that it is indeed a bot auto-retweeting anything and everything:

Building a Graph from Scratch

Now that we’ve seen all the salient functions, let’s start with a blank canvas, and see where we get.
The settings I’m using are:

  • Significant Links unticked
  • Certainty = 1
  • Field entities.hashtags.text.analyzed max terms = 10
  • Field user.screen_name max terms = 10
  • Initial search term rmoff

Then I click on markrittman and Expand Selection, the same for mrainey, and also for the two hashtags e4 and hadoop:

Within the clusters, let’s see what links exist. With no vertices selected I click on Add Links (which seems to be the same as selecting all vertices and doing the same). With each click additional links are added, all related to the hadoop/bigdata area:

I’m interested now in the E4 region of the Graph, and the vertices related to Mark Rittman. Clicking on his vertex and clicking “Select Neighbours” does exactly that:

Now I’m more interested in digging into the terms (hashtags) that are related to those people, so I deselect the user.screen_name field, and then Expand Selection and Add Links again.

Note the width of the connections – a strong relationship between Mark Rittman, “Hadoop” and “SQL”, which is presumably from the tweets around the presentation he did recently on the subject of… SQL on Hadoop. Other terms, including Hive and Impala, are also related, as you’d expect.

Graphing Tweet Text Contents

By making sure that the tweet text is available as an analysed field we can produce a Graph based on the ‘tokens’ within the tweet, rather than the literal 140 characters. Whilst hashtags are there deliberately to help with the classification and grouping of tweets (so that other people can follow conversations on the same subject) there are two reasons why you’d want to look at the tweet text too:

  1. Not everyone uses hashtags
  2. Not all relationships are as binary as using a hashtag or not – maybe a general discussion in an area re-uses the same words, which overall forms a relationship between the terms.

Here I’m going back to the default settings:

  • Significant Links ticked
  • Certainty = 3

And returning two fields – hashtag and tweet text

  • Field entities.hashtags.text.analyzed max terms = 20
  • Field text.analyzed max terms = 50
  • Initial search term kafka

I then tidy it up a bit:

  • Joining the same/near-same text and hashtags, such as the “kafkasummit” hashtag and the identical text term. If you think about the contents of a tweet, hashtags are part of the text, therefore there’s going to be a lot of this duplication.
  • Blacklisting text terms that are URL snippets. Here I’m using the Example Docs function to check the context of the term in the whole text field.

    I also blacklisted common words (“the”, “of”, etc), and foreign ones (how British…).

Behind the Scenes

The Kibana Graph plugin is just a front-end for the Graph extension in Elasticsearch. It’s useful (and fun!) for exploring data, but in practice you’d be making direct REST API calls into Elasticsearch to retrieve a list of vertices, connections and relative weights for use in your application. You can see details of this from the Settings page and the Last Request option.

Looking at an example (the one used in the first example on this article), the request is pretty simple:

{
    "query": {
        "query_string": {
            "default_field": "_all",
            "query": "kibana"
        }
    },
    "controls": {
        "use_significance": true,
        "sample_size": 2000,
        "timeout": 5000
    },
    "connections": {
        "vertices": [
            {
                "field": "entities.hashtags.text.analyzed",
                "size": 5,
                "min_doc_count": 3
            }
        ]
    },
    "vertices": [
        {
            "field": "entities.hashtags.text.analyzed",
            "size": 5,
            "min_doc_count": 3
        }
    ]
}

and the response not too complex either, just long.

{
    "took": 201,
    "timed_out": false,
    "failures": [],
    "vertices": [
        {
            "field": "entities.hashtags.text.analyzed",
            "term": "logstash",
            "weight": 0.1374238061561338,
            "depth": 0
        },
        {
            "field": "entities.hashtags.text.analyzed",
            "term": "timelion",
            "weight": 0.12719678206002483,
            "depth": 0
        },
        {
            "field": "entities.hashtags.text.analyzed",
            "term": "elasticsearch",
            "weight": 0.11733085557405047,
            "depth": 0
        },
        {
            "field": "entities.hashtags.text.analyzed",
            "term": "osdc",
            "weight": 0.00759026383038536,
            "depth": 1
        },
        {
            "field": "entities.hashtags.text.analyzed",
            "term": "letsencrypt",
            "weight": 0.006869972953128271,
            "depth": 1
        },
        {
            "field": "entities.hashtags.text.analyzed",
            "term": "kibana",
            "weight": 0.6699955212823048,
            "depth": 0
        },
        {
            "field": "entities.hashtags.text.analyzed",
            "term": "filebeat",
            "weight": 0.004700657388257993,
            "depth": 1
        },
        {
            "field": "entities.hashtags.text.analyzed",
            "term": "elk",
            "weight": 0.09717015256984456,
            "depth": 0
        },
        {
            "field": "entities.hashtags.text.analyzed",
            "term": "justsayin",
            "weight": 0.005724977460940227,
            "depth": 1
        },
        {
            "field": "entities.hashtags.text.analyzed",
            "term": "elasticsearch5",
            "weight": 0.004700657388257993,
            "depth": 1
        }
    ],
    "connections": [
        {
            "source": 0,
            "target": 3,
            "weight": 0.00759026383038536,
            "doc_count": 26
        },
        {
            "source": 7,
            "target": 5,
            "weight": 0.02004197094823259,
            "doc_count": 26
        },
        {
            "source": 5,
            "target": 4,
            "weight": 0.006869972953128271,
            "doc_count": 6
        },
        {
            "source": 5,
            "target": 0,
            "weight": 0.018289612748107368,
            "doc_count": 48
        },
        {
            "source": 0,
            "target": 6,
            "weight": 0.004700657388257993,
            "doc_count": 11
        },
        {
            "source": 7,
            "target": 0,
            "weight": 0.0038135609650491726,
            "doc_count": 10
        },
        {
            "source": 0,
            "target": 5,
            "weight": 0.0052711254217388415,
            "doc_count": 48
        },
        {
            "source": 0,
            "target": 9,
            "weight": 0.004700657388257993,
            "doc_count": 11
        },
        {
            "source": 5,
            "target": 1,
            "weight": 0.033204869273453314,
            "doc_count": 29
        },
        {
            "source": 1,
            "target": 5,
            "weight": 0.04492364819068228,
            "doc_count": 29
        },
        {
            "source": 5,
            "target": 8,
            "weight": 0.005724977460940227,
            "doc_count": 5
        },
        {
            "source": 2,
            "target": 5,
            "weight": 0.00015519515214322833,
            "doc_count": 80
        },
        {
            "source": 5,
            "target": 7,
            "weight": 0.022734810798933344,
            "doc_count": 26
        },
        {
            "source": 7,
            "target": 2,
            "weight": 0.0006823241440183544,
            "doc_count": 13
        }
    ]
}

Note how the connections are described using the relative (zero-based) instance number of the vertices. You can also see that the width of a connection is based on the weight (calculated from the significant terms algorithm), rather than document count. Compare the connection width of timelion/kibana (vertices 1 and 5 respectively), with a weighting of 0.033 (kibana -> timelion) and 0.045 (timelion -> kibana) but an overlapping document count of 29:

with elasticsearch -> kibana that has an overlapping document count of 80 but only a weight of 0.0001.
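
To make those index references easier to follow, here is a minimal Python sketch (assuming the response JSON above has been saved to a file named response.json, a name chosen purely for illustration) that resolves each connection’s source and target back to the vertex terms:

import json

# Load the Graph API response shown above (the filename is illustrative)
with open("response.json") as f:
    response = json.load(f)

vertices = response["vertices"]

# Each connection refers to vertices by their zero-based position in the vertices list
for conn in response["connections"]:
    source = vertices[conn["source"]]["term"]
    target = vertices[conn["target"]]["term"]
    print(f"{source} -> {target}  weight={conn['weight']:.4f}  doc_count={conn['doc_count']}")

Running it against the response above lists each pair, e.g. kibana -> timelion with its weight of roughly 0.033 and overlapping document count of 29.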

Elasticsearch’s documentation describes the significant terms algorithm thus, using the example of suggesting “H5N1” when users search for “bird flu” in text:

In all these cases the terms being selected are not simply the most popular terms in a set. They are the terms that have undergone a significant change in popularity measured between a foreground and background set. If the term “H5N1” only exists in 5 documents in a 10 million document index and yet is found in 4 of the 100 documents that make up a user’s search results that is significant and probably very relevant to their search. 5/10,000,000 vs 4/100 is a big swing in frequency.

So from this, we can roughly say that Graph is looking at the number of documents in which timelion is mentioned as a proportion of the whole dataset, and then in the number of documents in which the hashtag Kibana exists and also timelion is mentioned. Since the former is a plugin of the latter, the close relationship would be expected. You can use Kibana to explore the significant terms concept further – for example, taking the same ‘seed’ as the original Graph query above, Kibana, gives a similar set of results as the Graph:

More information about the scoring can be found here, which includes the fact that the scoring is, in part, based on TF-IDF (Term Frequency-Inverse Document Frequency).

Licensing

Graph requires a licence – see here for details.

Conclusion

This tool is a great way to dip one’s toe into the waters of Graph analysis and visualisation. It’s another approach to consider in the data discovery phase of your analytics work, when you don’t even know the questions that you’ve got for the data in front of you. Your data can remain in Elasticsearch in the same format it’s always been, and the Graph function just runs on top of it.

I’ll not profess to be a Graph theory expert, so can’t pass much comment on the theoretical rigour of the results and techniques seen. One thing that struck me with it was that there’s no (apparent) way to manually influence the weight of connections and vertices – for example, based on the number of followers someone has on twitter, consider them more (or less) relevant when determining relationships.

For a well-informed view on Graph theory and Social Network Analysis (SNA), see Jordan Meyer’s presentation here (and associated R code), as well as Mark Rittman’s presentation from BIWA this year.

Footnote: The Twitter Dataset

The dataset I’m using is a live stream from Twitter, via Logstash and Kafka, searching for a set of terms related to me and the field I work in. Therefore, there’s going to be a bunch of relationships missing (if I’ve not included the relevant term in my tweet search), and relationships over-stated (because as a proportion of all the records the terms I’ve selected will dominate).
An interesting use of Graph (or Elasticsearch’s significant terms aggregation in general) could be to identify all the relevant terms that I should be including in my twitter search, by sampling an ‘unpolluted’ feed for relationships. For example, if I’m interested in capturing Kafka tweets, perhaps I should also be capturing those related to Samza, Spark, and so on.

The post Experiments with Elastic’s Graph Tool appeared first on Rittman Mead Consulting.

Categories: BI & Warehousing

Turkish Oracle User Group Conference in Istanbul 2016 #TROUGDays

The Oracle Instructor - Fri, 2016-04-29 01:31

Straight after the Oracle University Expert Summit in Berlin – which was a big success, by the way – the circus moved on to another amazing place: Istanbul!

The Turkish Oracle User Group (TROUG) held its annual conference in the rooms of the Istanbul Technical University, with local and international speakers and quite an attractive agenda.

Do you recognize anyone here? :-)

#TROUGDays speakers

I delivered my presentation “Best of RMAN” again like at the DOAG annual conference last year:

Uwe Hesse speaking in Istanbul

Many thanks to the organizers for making this event possible and for inviting us speakers to dinner.

Istanbul speakers dinner

The conference was well received and, in my view, it should be possible to attract even more attendees in the coming years by continuing to invite high-profile international speakers.

audience

My special thanks to Joze, Yves and Osama for giving me your good company during the conference – even if that company was sometimes very tight during the car rides.

Categories: DBA Blogs

private cloud vs public cloud

Pat Shuff - Fri, 2016-04-29 01:19
Today is our last day to talk about infrastructure as a service. We are moving up the stack into platform as a service after this. The higher up the stack we get, the more value it has to the business and end users. It is interesting to talk about storage and compute in the cloud, but does this help us treat patients in a medical practice any better, find oil and gas faster, or deliver our manufactured product any cheaper? Not really, but not having them will negatively affect all of them. We need to make sure that these services are there so that we can perform higher functions without having to worry about triple redundant mirroring of a disk or load balancing compute servers to handle higher loads.

One of the biggest complaints with cloud services is that there are perception problems around security, latency, and governance. Why would I put my data on a computer that I don't control? There is a noisy neighbor issue where I am renting part of a computer, part of a disk, part of a network. If someone wants to play heavy metal at the highest volume (remember our apartment example) while I am trying to file a monthly report or do some analytics, my resources will suffer and it will take me longer to finish my job due to something out of my control.

Many people have decided to go with private hosted cloud solutions like VCE, VBlock, Cisco UCS clusters, and other products that provide raw compute, raw storage, and "hyper-converged" infrastructure to solve the cloud problem. I can create a golden master on my VMWare server and provision a database to my configuration as many times as I want. Yes, I run into a licensing issue. Yes, it is a really big licensing issue running Oracle on VMWare but Microsoft is not far behind with SQL Server licensing on a per core basis either. Let's look at the economics of putting together one of these private cloud servers.

It is important to dive into the licensing issue. The Oracle Database and WebLogic servers are licensed either on a processor or named user basis. Database licensing is detailed in a pdf on the Oracle site. The net of the license says that the database is licensed based on the core count of the processor running on the server. There is a multiplication factor (0.5 for X86) based on the chip type that factors into the license cost. A few years ago it was easy to do this calculation. If I have a dual core, dual socket system, this is a four core computer. The license price of the computer would be 4 cores x 0.5 (Intel x86 chip) x $47,500. The total price would be $95K. Suddenly the core count of computers went to 8, 16, or 32 cores per chip. A single system could easily have 64 cores on a single board. If you aggregate multiple boards as is done in a Cisco UCS system you can have 8 boards or 256 cores that you can use. There are very few applications that can take advantage of 256 cores so a virtualization engine was placed on top of these cores so that you could sub-divide the system into smaller chunks. If you have a 4 core database problem, you can allocate 4 cores to it. If you need 8 cores, allocate 8 cores. Products like VMWare and HyperV took advantage of this and grew rapidly. These virtualization packages added features like live migration, dynamic sizing, and bursting utilization. If you allocate 4 cores and the processor goes to 90%, two more cores will be made available for a short burst. The big question comes up as to how you now license on a per core basis. If you can flex to more processors without rebooting or live migrate from a 2 core to a 24 core system, which do you license for?
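
As a rough sketch of that arithmetic (using the 0.5 x86 core factor and the $47,500 per-processor list price quoted above; these figures come from this post, not from current price lists):

def processor_license_cost(cores, core_factor=0.5, price_per_processor=47500):
    # Oracle processor licensing: cores x core factor x per-processor list price
    return cores * core_factor * price_per_processor

print(processor_license_cost(4))    # dual-core, dual-socket server: 95000.0
print(processor_license_cost(256))  # a fully licensed 256-core cluster: 6080000.0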

Unfortunately, Oracle took a different position from the rest of the industry. None of the Oracle products contain a license key. None of the products require that you go to a web site and get a token to allow you to run the software. The code is wide open and freely available to load and run on any system that you want. Unfortunately, companies don't do a good job of tracking utilization. If someone from the sales or purchasing department rolls out a golden master onto a new virtual machine, no one is really tracking that. People outside of IT can easily spin up more licenses. They can provision a database in a cloud service and assume that the company has enough licenses to cover their project. After a while, licensing gets out of control and a license review is done to see what is actually being used and how it is being used. Named user licenses are great but you have to have a ratio of users to cores to meet minimums. You can't, for example, buy a 5 user license and deploy it on a 64 core system. You have to maintain a typical ratio of 25 users to a core or 40 users to a core based on the product that you are using. You also need to make sure that you understand soft partitioning vs hard partitioning. Soft partitioning is the ability to flex or change the core count without having to reconfigure or reboot the system. A hard partition puts hard limits on the core count and does not allow you to exceed it. Products like OracleVM, Solaris, and AIX contain hard partition virtualization. Products like HyperV and VMWare contain soft partitions. With soft partitions, you need to pay for all of the cores in the cluster since in theory you can consume all of the cores. To be honest, most people don't understand this and get in trouble with license reviews.

When we talk about cloud services, licensing is also important to understand. Oracle published cloud license rules to detail limits and restrictions. The database is still licensed on a per core basis. The Linux operating system is licensed on per server instance and is limited to 8 virtual cores. If you deploy the Oracle database or WebLogic server in AWS or Azure or any other cloud vendor, you have to own a perpetual license for the database using the formulas above. The license must correlate to the high water mark for the core count that you provision. If you provision a 4 core system, you need a 2 processor license. If you run the database for six months and shut it off, you still need to own the perpetual license. The only way to work around this is to purchase database as a service in the Oracle cloud. You can pay for the database license on an hourly or monthly basis with metered services or on an annual basis with non-metered services. This provides a great cost savings because if we only need a database for 6 months we only need to pay for 6 months x the number of cores x the database edition type. If, for example, we want just the Database Enterprise Edition, it is $3K/core/month. If we want 4 cores that is $12K per month. If we want 6 months then we get it for $72K. We can walk away from the license and not have to pay the 22% annual maintenance on the $95K. We save $23K the first year and $20K annually by only using the database in the cloud for six months. If we wanted to use the database for 9 months, it is cheaper to own the license and lease processor and storage. If we go to the next higher level of database, Database High Performance Edition at $4K/core/month, it becomes cheaper to use the cloud service because it contains so many options that cost $15K/processor. Features like partitioning, compression, diagnostics, tuning, real application testing, and encryption are part of this cloud service. Suddenly the economics is in favor of cloud hosting a database rather than hosting in a data center.
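
A small sketch of the comparison above, using only the figures quoted in this post (a $95K perpetual license for a 4-core x86 server, 22% annual support, and Database Enterprise Edition at $3K/core/month in the cloud); treat it as an illustration of the break-even logic rather than a pricing tool:

PERPETUAL_LICENSE = 95000            # 2-processor (4 x86 cores) license from the example above
ANNUAL_SUPPORT = 0.22 * PERPETUAL_LICENSE
CLOUD_EE_PER_CORE_MONTH = 3000       # Database Enterprise Edition, metered cloud price

def cloud_cost(cores, months):
    # Pay only for the months the database actually runs
    return CLOUD_EE_PER_CORE_MONTH * cores * months

def perpetual_first_year():
    # Buy the license outright and pay the first year of support
    return PERPETUAL_LICENSE + ANNUAL_SUPPORT

print(cloud_cost(4, 6))          # 72000  -- six months in the cloud
print(perpetual_first_year())    # 115900.0

# Roughly where the cloud stops being cheaper than the bare license price;
# support, options, and hardware shift the exact break-even point.
for months in range(6, 13):
    print(months, cloud_cost(4, months), cloud_cost(4, months) < PERPETUAL_LICENSE)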

Let's go back to the Cisco UCS and network attached storage discussion. If we purchase a UCS 8 blade server with 32 cores per blade we are looking at $150K or higher for the server. We then want to attach a 100 TB disk array to it at about $300K (remember the $3K/TB). We will then have to pay $300K for VMWare. If we add 10% per year for hardware support and 20% per year for software support, we are at just over $1M for a base solution across three years. With a three year amortization we are looking at about $330K per year just to have a compute and storage infrastructure. We have to have a VMWare admin who doles out processors and storage, loads operating systems, creates golden masters, and acts as a traffic cop to manage and allocate resources. We still need to pay for the Oracle database license which is licensed on a per core basis. Unfortunately, with VMWare we must license all of the cores in the cluster so we either have to sub-divide the cluster into one blade and license all 32 cores or end up paying for all 256 cores. At roughly $25K/core that gets expensive quickly. Yes, you can run OracleVM or Solaris on one of the blades and subdivide the database into two cores and only pay for two cores since they both support hard partitioning, but you would be amazed at how many people fight this solution. You now have two virtualization engines that you need to support with two different file formats. No one really wants two solutions just to solve a licensing issue.

Oracle has taken a radically different approach to this. Rather than purchasing hardware, storage, and a virtualization platform, run everything in the cloud and pay for it on a monthly basis. The biggest objection is that the cloud is in another city and security, latency, ... you get the picture. The new solution is to run this hardware in your data center with the Oracle Public Cloud Machine. The cost of this solution is roughly $260K/year with a three year commit. You get 200 plus cores and 100-ish TB of storage to use as you want. You don't manage it with VSphere but manage it with the same web page that you use to manage the public cloud services. If you want to script everything then you can manage it with REST apis or perl/java/insert-your-language scripts. The key benefit to this is that you no longer need to worry about what virtualization engine you are using. You manage at the higher level and lease the database or WebLogic or SOA license on an hourly or monthly basis.

Next week we will move up the stack and look at database hosting. Today we talked about infrastructure choices and how they impact database license cost. Going with AWS or Azure still requires that you purchase the database license. Going with the Oracle public cloud or public cloud machine allows you to not own the database license but effectively lease it on an hourly or monthly basis. It might not be the right solution for 24x7x365 operation, but then again it might be. It really is the right solution for bursty needs like holiday peak periods, student registration systems, and development and testing, where you only need a large footprint for a few weeks and don't need to buy for your high-water mark and run at 20% the rest of the year.

Migrating Oracle Utilities products from On Premise to Oracle Public Cloud

Anthony Shorten - Thu, 2016-04-28 17:48

A while back Oracle Utilities announced that the latest releases of the Oracle Utilities Application Framework applications were supported on Platform As A Service (PaaS) on Oracle Public Cloud. As part of that support a new whitepaper has been released outlining the process of migrating an on-premise installation of the product to the relevant Platform As A Service offering on Oracle Public Cloud.

The whitepaper covers the following from a technical point of view:

  • The Oracle Cloud services to obtain to house the products, including the Oracle Java Cloud Service and Oracle Database As A Service with associated related services.
  • Setup instructions on how to configure the services in preparation to house the product.
  • Instructions on how to prepare the software for transfer.
  • Instructions on how to transfer the product schema to an Oracle Database As A Service instance using various techniques.
  • Instructions on how to transfer the software and make configuration changes to realign the product installation for the cloud. The configuration must follow the instructions in the Native Installation Oracle Utilities Application Framework (Doc Id: 1544969.1) available from My Oracle Support which has also been updated to reflect the new process.
  • Basic instructions on using the native cloud facilities to manage your new PaaS instances. More information is available in the cloud documentation.

The whitepaper applies to the latest releases of the Oracle Utilities Application Framework based products only. Customers and partners wanting to establish new environments (with no previous installation) can use the same process with the addition of actually running the installation on the cloud instance.

Customers and partners considering using Oracle Infrastructure As A Service can use the same process with the addition of installing the prerequisites.

The Migrating From On Premise To Oracle Platform As A Service (Doc Id: 2132081.1) whitepaper is available from My Oracle Support. This will be the first in a series of cloud based whitepapers.

Oracle Fusion Middleware : Concepts & Architecture : If I can do it, you can too

Online Apps DBA - Thu, 2016-04-28 15:43

If I can do it, you can too… and I truly believe this. In this post I am going to cover what Oracle Fusion Middleware (FMW) is, why I learnt it, and what and how you should learn it too. Before I tell you more about FMW, for those who don’t know me, 16 years ago, […]

The post Oracle Fusion Middleware : Concepts & Architecture : If I can do it, you can too appeared first on Oracle Trainings for Apps & Fusion DBA.

Categories: APPS Blogs

A Practitioner’s Assessment: Digital Transformation

Pythian Group - Thu, 2016-04-28 12:49

 

Rohinee Mohindroo is a guest blogger on Pythian Business Insights.

 

trans·for·ma·tion/ noun: a thorough or dramatic change in form or appearance

The digital transformation rage continues into 2016 with GE, AT&T, GM, Domino’s, Flex, and Starbucks, to name a few. So what’s the big deal?

Technical advances continue to progress at a rapid rate. Digital transformation simply refers to the rate at which the technological trends are embraced by an individual, organization or team.

Organizational culture and vocabulary are leading indicators of the digital transformation maturity level.


Level 1: Business vs. Tech (us vs. them). Each party is fairly ignorant of the value and challenges of the other. Each blames the other for failures and takes credit for successes. Technology is viewed as a competency with a mandate to enable the business.

Level 2: Business and Tech (us and them). Each party is aware of the capability and challenges of the other. Credit for success is shared, failure is not discussed publicly or transparently. Almost everyone is perceived to be technically literate with a desire to deliver business differentiation.

Level 3: Business is Tech (us). Notable awareness of the business model and technology capabilities and opportunities throughout the organization. Success is expected and failure is an opportunity. The organization is relentlessly focused on learning from customers and partners with a shared goal to continually re-define the business.

Which level best describes you or your organization? Please share what inhibits your organization from moving to the next level.

 

Categories: DBA Blogs

Slides and demo script from my APEX command line scripting talk at APEX Connect 2016 in Berlin

Dietmar Aust - Thu, 2016-04-28 11:30
Hi everybody,

I just came back from the DOAG APEX Connect 2016 conference in Berlin ... very nice location, great content and the wonderful APEX community to hang out with ... always a pleasure. This time we felt a little pinkish ;)

As promised, you can download the slides and the demo script (as is) from my site. They are in German, but I will give the talk in June at KScope 2016 in Chicago in English just as well.

Instructions are included.

See you at KScope in Chicago, #letswreckthistogether.

Cheers and enjoy!
~Dietmar.

Slides and demo script from my ORDS talk at APEX Connect 2016 in Berlin

Dietmar Aust - Thu, 2016-04-28 11:25
Hi everybody,

I just came back from the DOAG APEX Connect 2016 conference in Berlin ... very nice location, great content and the wonderful APEX community to hang out with ... always a pleasure. This time we felt a little bit pink ;)

As promised, you can download the slides and the demo script (as is) from my site.

Instructions are included.

See you at KScope in Chicago, #letswreckthistogether.

Cheers and enjoy!
~Dietmar.

Log Buffer #471: A Carnival of the Vanities for DBAs

Pythian Group - Thu, 2016-04-28 09:14

This Log Buffer Edition covers Oracle, SQL Server and MySQL blog posts of the week.

Oracle:

Improving PL/SQL performance in APEX

A utility to extract and present PeopleSoft Configuration and Performance Data

No, Oracle security vulnerabilities didn’t just get a whole lot worse this quarter.  Instead, Oracle updated the scoring metric used in the Critical Patch Updates (CPU) from CVSS v2 to CVSS v3.0 for the April 2016 CPU.  The Common Vulnerability Score System (CVSS) is a generally accepted method for scoring and rating security vulnerabilities.  CVSS is used by Oracle, Microsoft, Cisco, and other major software vendors.

Oracle Cloud – DBaaS instance down for no apparent reason

Using guaranteed restore points to navigate through time

SQL Server:

ANSI SQL with Analytic Functions on Snowflake DB

Exporting Azure Data Factory (ADF) into TFS Source Control

Getting started with Azure SQL Data Warehouse

Performance Surprises and Assumptions : DATEADD()

With the new security policy feature in SQL Server 2016 you can restrict write operations at the row level by defining a block predicate.

MySQL:

How to rename MySQL DB name by moving tables

MySQL 5.7 Introduces a JSON Data Type

Ubuntu 16.04 first stable distro with MySQL 5.7

MariaDB AWS Key Management Service (KMS) Encryption Plugin

MySQL Document Store versus Bug hunter

Categories: DBA Blogs

No Filters: My ASU/GSV Conference Panel on Personalized Learning

Michael Feldstein - Thu, 2016-04-28 07:57

By Michael Feldstein

ASU’s Lou Pugliese was kind enough to invite me to participate on a panel discussion on “Next-Generation Digital Platforms,” which was really about a soup of adaptive learning, CBE, and other stuff that the industry likes to lump under the heading “personalized learning” these days. One of the reasons the panel was interesting was that we had some smart people on the stage who were often talking past each other a little bit because the industry wants to talk about the things that it can do something about—features and algorithms and product design—rather than the really hard and important parts that it has little influence over—teaching practices and culture and other messy human stuff. I did see a number of signs at the conference (and on the panel) that ed tech businesses and investors are slowly getting smarter about understanding their respective roles and opportunities. But this particular topic threw the panel right into the briar patch. It’s hard to understand a problem space when you’re focusing on the wrong problems. I mean no disrespect to the panelists or to Lou; this is just a tough nut to crack.

I admit, I have few filters under the best of circumstances and none left at all by the second afternoon of an ASU/GSV conference. I was probably a little disruptive, but I prefer to think of it as disruptive innovation.

Here’s the video of the panel:

The post No Filters: My ASU/GSV Conference Panel on Personalized Learning appeared first on e-Literate.

Standard SQL ? – Oracle REGEXP_LIKE

The Anti-Kyte - Thu, 2016-04-28 05:55

Is there any such thing as ANSI Standard SQL ?
Lots of databases claim to conform to this standard. Recent experience tends to make me wonder whether it’s more just a basis for negotiation.
This view is partly the result of having to juggle SQL between three different SQL parsers in the Cloudera Hadoop infrastructure, each with their own “quirks”.
It’s worth remembering, however, that SQL differs across established Relational Databases as well, as a recent question from Simon (Teradata virtuoso and Luton Town Season Ticket Holder) demonstrates:

Is there an Oracle equivalent of the Teradata LIKE ANY operator when you want to match against a list of patterns, for example :

like any ('%a%', '%b%')

In other words, can you do a string comparison, including wildcards, within a single predicate in Oracle SQL ?

The short answer is yes, but the syntax is a bit different….

The test table

We’ve already established that we’re not comparing apples with apples, but I’m on a bit of a health kick at the moment, so…

create table fruits as
    select 'apple' as fruit from dual
    union all
    select 'banana' from dual
    union all
    select 'orange' from dual
    union all
    select 'lemon' from dual
/

The multiple predicate approach

Traditionally the search statement would look something like :

select fruit
from fruits
where fruit like '%a%'
or fruit like '%b%'
/

FRUIT 
------
apple 
banana
orange

REGEXP_LIKE

Using REGEXP_LIKE takes a bit less typing and – unusually for a regular expression – fewer non-alphanumeric characters…

select fruit
from fruits
where regexp_like(fruit, '(a)|(b)')
/

FRUIT 
------
apple 
banana
orange

We can also search for multiple substrings in the same way :

select fruit
from fruits
where regexp_like(fruit, '(an)|(on)')
/

FRUIT 
------
banana
orange
lemon 

I know, it doesn’t feel like a proper regular expression unless we’re using the top row of the keyboard.

Alright then, if we just want to get records that start with ‘a’ or ‘b’:

select fruit
from fruits
where regexp_like(fruit, '(^a)|(^b)')
/

FRUIT 
------
apple 
banana

If instead, we want to match the end of the string…

select fruit
from fruits
where regexp_like(fruit, '(ge$)|(on$)')
/

FRUIT
------
orange
lemon

…and if you want to combine searching for patterns at the start, end or anywhere in a string, in this case searching for records that

  • start with ‘o’
  • or contain the string ‘ana’
  • or end with the string ‘on’

select fruit
from fruits
where regexp_like(fruit, '(^o)|(ana)|(on$)')
/

FRUIT
------
banana
orange
lemon

Finally on this whistle-stop tour of REGEXP_LIKE, for a case insensitive search…

select fruit
from fruits
where regexp_like(fruit, '(^O)|(ANA)|(ON$)', 'i')
/

FRUIT
------
banana
orange
lemon

There’s quite a bit more to regular expressions in Oracle SQL.
For a start, here’s an example of using REGEXP_LIKE to validate a UK Post Code.
There’s also a comprehensive guide here on the PSOUG site.
Now I’ve gone through all that fruit I feel healthy enough for a quick jog… to the nearest pub.
I wonder if that piece of lime they put in the top of a bottle of beer counts as one of my five a day?


Filed under: Oracle, SQL Tagged: regexp_like

Contemplating Upgrading to OBIEE 12c?

Rittman Mead Consulting - Thu, 2016-04-28 04:00
Where You Are Now

OBIEE 12c has been out for some time, and it seems like most folks are delaying upgrading to OBIEE 12c until the very last minute. Or at least until Oracle decides to put out another major version change of OBIEE, which is understandable. You’ve already spent time and money and devoted hundreds of resource hours to system monitoring, maintenance, testing, and development. Maybe you’ve invested in staff training to try to maximize your ROI in your existing OBIEE purchase. And now, after all this time and effort, you and your team have finally gotten things just right. Your BI engine is humming along, user adoption and stickiness are up, and you don’t have a lot of dead objects clogging up the Web Catalog. Your report hacks and work-arounds have been worked and reworked to become sustainable and maintainable business solutions. Everyone is getting what they want.

Sure, this scenario is part fantasy, but it doesn’t mean that as a BI team lead or member, you’re not always working toward this end. It would be nice to think that the people designing the tools with which we do this work understood the daily challenges and processes we must undergo in order to maintain the precarious homeostasis of our BI ecosystems. That’s where Rittman Mead comes in. If you’re considering upgrading to OBIEE 12c, or are even curious, keep reading. We’re here to help.

So Why Upgrade

Let’s get right down to it. Shoot over here and here to check out what our very own Mark Rittman had to say about the good, the bad, and the ugly of 12c. Our Silvia Rauton did a piece on lots of the nuts and bolts of 12c’s new front-end features. They’re all worth a read. Upgrading to OBIEE 12c offers many exciting new features that shouldn’t be ignored.

Heat Map

How Rittman Mead Can Help

We understand what it is to be presented with so many project challenges. Do you really want to risk the potential perils and pitfalls presented by upgrading to OBIEE 12c? We work both harder and smarter to make this stuff look good. And we get the most out of strategy and delivery via a number of in-house tools designed to keep your OBIEE deployment in tip top shape.

Maybe you want to make sure all your Catalog and RPD content gets ported over without issue? Instead of spending hours on testing every dashboard, report, and other catalog content post-migration, we’ve got the Automated Regression Testing package in our tool belt. We deploy this series of proprietary scripts and dashboards to ensure that everything will work just the way it was, if not better, from one version to the next.

Maybe you’d like to make sure your system will fire on all cylinders or you’d like to proactively monitor your OBIEE implementation. For that we’ve got the Performance Analytics Dashboards, built on the open source ELK stack to give you live, active monitoring of critical BI system stats and the underlying database and OS.

OBIEE Performance Analytics

On top of these tools, we’ve got the strategies and processes in place to not only guarantee the success of your upgrade, but to ensure that you and your team remain active and involved in the process.

What to Expect

You might be wondering what kinds of issues you can expect to experience during upgrading to OBIEE 12c (which is to say, nothing’s going to break, right?). Are you going to have to go through a big training curve? Does upgrading to OBIEE 12c mean you’re going to experience considerable resource downtime as your team, or even an outside company, manages this process? To answer this question, I’m reminded of a quote from the movie Fight Club: “Choose your level of involvement.”

While we always prefer to work alongside your BI or IT team to facilitate the upgrade process, we also know that resource time is valuable and that your crew can’t stop what they’re doing until things wrap up. We often find that the more clients are engaged with the process, however, the easier the hand-off is because clients better understand best practices, and IT and BI teams are more empowered for the future.

Learning More about OBIEE 12c

But if you’re like many organizations, maybe you have to stay more hands off and get training after the upgrade is complete. Check out the link here to look over the agenda of our OBIEE 12c Bootcamp training course. Like our hugely popular 11g course, this program is five days of back-to-front instruction taught via a selection of seminars and hands-on labs, designed to impart most everything your team will need to know to continue or begin their successful BI practice.

What we often find is that, in addition to being a thorough and informative course, the Bootcamp is a great way to bring together teams or team members, often dispersed among different offices, under one roof to gain common understanding about how each person plays an important role as a member of the BI process. Whether they handle the ETL, data modeling, or report development, everyone can benefit from what often evolves from a training session into some impromptu team building.

Feel Empowered

If you’re still on the fence about whether or not to upgrade, as I said before, you’re not alone. There are lots of things you need to consider, and rightfully so. You might be thinking, “What does this mean for extra work on the plates of my resources? How can I ensure the success of my project? Is it worth it to do it now, or should I wait for the next release?” Whatever you may be mulling over, we’ve been there, know how to answer the questions, and have some neat tools in our utility belt to move the process along. In the end, I hope to have presented you with some bits to aid you in making a decision about upgrading to OBIEE 12c, or at least the impetus to start thinking about it.

If you’d like any more information or just want to talk more about the ins and outs of what an upgrade might entail, send over an email or give us a call.

The post Contemplating Upgrading to OBIEE 12c? appeared first on Rittman Mead Consulting.

Categories: BI & Warehousing

Any Questions

Jonathan Lewis - Thu, 2016-04-28 02:56

I’m going to be at the OUG Scotland conference on 22nd June, and one of my sessions is a panel session on Optimisation where I’ll be joined by Joze Senegacnik and Card Dudley.

The panel is NOT restricted to questions about how the cost based optimizer works (or not), we’re prepared to tackle any questions about making Oracle work faster (or more efficiently – which is not always the same thing). This might be configuration, indexing, other infrastructure etc.; and if we haven’t got a clue we can always ask the audience.

To set the ball rolling on the day it would be nice to have a few questions in advance, preferably from the audience but any real-world problems will be welcome and (probably) relevant to the audience. If you have a question that you think suitable please email it to me or add it as a comment below. Ideally a question will be fairly short and be relevant to many people; if you have to spend a long time setting the scene and supplying lots of specific detail then it’s probably a question that an audience (and the panel) would not be able to follow closely enough to give relevant help.

Update 29th April

I’ve already had a couple of questions in the comments and a couple by email – but keep them coming.

 

 

 


Server Problems : Update

Tim Hall - Thu, 2016-04-28 01:08

This is a follow on from my server problems post from yesterday…

Regarding the general issue, misiaq came up with a great suggestion, which was to use watchdog. It’s not going to “fix” anything, but if I get a reboot when the general issue happens, that would be much better than having the server sit idle for 5 hours until I wake up.

Archive Storage Services vs Archive Cloud Services

Pat Shuff - Thu, 2016-04-28 01:07
Yesterday we started talking about the cost comparison for storage in the cloud. We briefly touched on the cost of long term archive in the cloud. How much does it cost to back up data for long term archive and what is the best way to do this? Years ago the default way of doing this was to copy your data on disk to a tape unit and put the tape in a box. The box was then put in an environmentally controlled room to extend the lifetime of tape and a person was put on staff to pull the data off the shelf when the data was needed. The data might be a backup of data on disk or a secondary copy just in case the disk failed. Tape was typically used to provide separation of duties required by Sarbanes-Oxley to keep people who report on financial data separate from the financial data. It also allowed companies to take large volumes of data, like seismic data, and not keep it on spinning disks. The traces were reloaded when geophysicists wanted to look at the data.

The first innovation in this technology was to provide a robot to load and unload tapes as a tape unit gets full or needs to be reloaded. Magazines were created that could hold eight tapes and the robots had bar code readers so that they could seek to the right tape in the magazine, pull it out of the series of tapes and insert it into the tape unit for reading or writing. Management software got more advanced and understood the bar code values and could sequence the whopping 800 GB of data that could be written to an LTO4 tape. Again, technology gets updated and the industry moved to LTO5 and LTO6 tapes with significantly higher densities. A single LTO6 tape could hold 2.5 TB. Technology marches on and compression allows us to store 6 TB on these tapes. If we go back to our 120 TB case that we talked about yesterday this means that we will need 20 tapes (at $30-$45 for each tape) and $25K for a single tape drive unit. Most tape drive systems support 8 tapes per magazine so we are talking about something that will support three magazines. To support three magazines, we need a second shelf in our tape storage so the price goes up by about $20K. We are sitting at about $55K to backup our 120 TB and $5.5K in support annually for the hardware. We also need about $1K in tapes for the number of full and incremental backups that we want, which would be $20K for four months of retention before we recycle the tapes. These tapes are good for a dozen re-writes so every three years we will need to repurchase tapes. If we spread the cost of the tape unit, tape drives, and tapes across three years we are looking at $2K/month to backup our 120 TB. We also need to factor in $60/week for tape pickup and storage fees at a service like Iron Mountain and a couple of $250 charges to retrieve tapes in the event of a catastrophic failure and drive tapes back to our data center from cold storage. This bumps the cost to $2.2K/month which is significantly cheaper than the $10K/month for network storage in our data center or $3.6K/month for cloud storage services. Unfortunately, a tape unit requires someone to care for and feed it, and you will pay that person more than $600/month but not $7.8K/month which you would with the cloud or disk solutions.

If you had a ton of data to archive you could purchase a tape silo that supported hundreds or thousands of magazines. Unfortunately, this expandability comes at a cost. The tape backup unit grows from an eighth of a rack to twenty full racks, and there isn't much in between: you can get an eighth-of-a-rack solution, a full-rack solution, or a twenty-rack solution. The larger solution comes in at hundreds of thousands of dollars rather than tens of thousands.

Enter cloud solutions. Amazon and Oracle both offer tape solutions in the cloud. Both companies run the twenty-rack class of solution but only charge consumers for the capacity they store. Amazon Glacier charges $7/TB/month to store data; Oracle charges $1/TB/month for the same service. Both companies charge for data restoration and outbound transfer of data. The Amazon Glacier cost of writing 120 TB and reading back 10% of it comes in at $2218/month, which is about the same as having the tape unit on site. The key difference is that we can recover the data by requesting it from Amazon and get it back in less than four hours. There are no emergency recovery charges and no weekly pickup fees. We can expand the amount that we back up, and the bulk of this cost is reading back the data ($1300). Storage for our backups is relatively cheap; we just need to plan for the cost of recovery and try to limit it, since it is the bulk of the cost.

We can drop this cost even more using the Oracle Archive Cloud Services. The price from Oracle is $1/TB/month, but the recovery and transmission charges are about the same as Amazon's. The same archive workload with Oracle comes to $1560/month, with roughly $1300 of that being the charges for restoring and outbound transfer of the data. Unfortunately, Oracle does not offer an un-metered archive service, so we have to guesstimate how much we are going to restore each month.
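
To make the cloud numbers concrete, here is the shape of that calculation as a small Python sketch: storage charge plus the cost of restoring and shipping data back out. The restore and outbound figure is the roughly $1300 attributed above to reading back 10% of the archive; the small gap against the quoted totals comes from additional charges this simple model ignores.

    # Monthly cloud archive bill = storage charge + cost of restoring and shipping data back out.
    # The $1300 restore/outbound figure is the approximate number used in this post for
    # reading back 10% of a 120 TB archive; real bills depend on the vendor's price list.
    def monthly_archive_cost(tb_stored, per_tb_per_month, restore_and_outbound):
        return tb_stored * per_tb_per_month + restore_and_outbound

    glacier = monthly_archive_cost(120, 7, 1300)   # ~$2,140/month; the post quotes $2,218
    oracle = monthly_archive_cost(120, 1, 1300)    # ~$1,420/month; the post quotes $1,560
    print(glacier, oracle)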

Both services use REST APIs to write, restore, and read data. When a container (Oracle Archive) or bucket (Amazon Glacier) is created, a PUT call is made to the endpoint of the service. The first step required by both is authentication, to present credentials to the service. Below we show the Oracle authentication and container creation process through the REST API.
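
The original post showed these calls as screenshots; as a stand-in, here is a minimal Python sketch of the same flow using the requests library. The identity domain, user, password, and container name are placeholders, and the exact endpoint and header names should be checked against the Oracle Storage Cloud documentation.

    import requests

    # Placeholders: substitute your own identity domain and credentials.
    IDENTITY_DOMAIN = "myidentitydomain"
    AUTH_URL = "https://" + IDENTITY_DOMAIN + ".storage.oraclecloud.com/auth/v1.0"

    # Step 1: authenticate. The response headers carry an auth token and the
    # account's storage URL, which are used on every subsequent call.
    auth = requests.get(AUTH_URL, headers={
        "X-Storage-User": "Storage-" + IDENTITY_DOMAIN + ":myuser@example.com",
        "X-Storage-Pass": "mypassword",
    })
    token = auth.headers["X-Auth-Token"]
    storage_url = auth.headers["X-Storage-Url"]

    # Step 2: create the container with a PUT. The archive header is what marks
    # the container as tape-backed archive rather than spinning disk.
    requests.put(storage_url + "/myArchiveContainer", headers={
        "X-Auth-Token": token,
        "X-Storage-Class": "Archive",
    })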

The important part of this is the archive header extension on the container creation request; it is what differentiates a container on spinning disk from tape in the cloud.

Amazon recommends using a Windows-based tool like S3 Browser or CloudBerry, or using a language like Java, .NET, or Ruby and their published SDKs. CloudBerry works for the Oracle Archive as well. When you create a container, you have the option of choosing Storage or Archive as the container type from a drop-down.

Both services allow you to encrypt and compress the data as it is written, with HTTP headers changing the characteristics and parameters of the container. Both services require you to issue a PUT request to write the data to tape. Below we show the Oracle REST API.
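
Again, the original showed a screenshot; a minimal sketch of the upload call is below, reusing the token and storage URL obtained during authentication. The file and container names are placeholders.

    import requests

    # Values obtained from the authentication sketch earlier in the post.
    token = "AUTH_tk-example"   # placeholder
    storage_url = "https://myidentitydomain.storage.oraclecloud.com/v1/Storage-myidentitydomain"

    # Archive a file by PUTting the object body into the archive container.
    with open("backup_2016_04.tar.gz", "rb") as f:
        requests.put(storage_url + "/myArchiveContainer/backup_2016_04.tar.gz",
                     headers={"X-Auth-Token": token},
                     data=f)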

For CloudBerry and the other gui based tools, uploading is just a drag and drop from your local file system to the tape storage in the cloud.

Amazon details the readback procedure and the job system that shows the status of a restore request. Oracle has a similarly defined retrieval policy, as well as an archive tutorial. Both services offer a four-hour window for restoration. Below is an example of a restore request and of checking the status of the job spawned to load the tape and transfer the data for reading. The file is ready to read when the completedPercentage is 100.
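
The screenshots are not reproduced here, so the sketch below only illustrates the overall flow described in this paragraph: ask for the object to be restored, wait for the spawned job to finish, then read the object back. The request verb and the polling approach are assumptions for illustration; the actual endpoints and the completedPercentage job field are defined in the Oracle Archive Storage REST reference.

    import time
    import requests

    token = "AUTH_tk-example"   # placeholder, from the authentication step
    storage_url = "https://myidentitydomain.storage.oraclecloud.com/v1/Storage-myidentitydomain"
    obj = storage_url + "/myArchiveContainer/backup_2016_04.tar.gz"

    # Ask for the object to be restored from tape. (Assumed call shape: the post
    # describes a restore request that spawns a job behind the scenes.)
    requests.post(obj, headers={"X-Auth-Token": token})

    # Poll until the object can actually be read back. The job status reported by
    # the service shows completedPercentage reaching 100 when the restore is done,
    # which can take up to the four-hour window mentioned above.
    while True:
        resp = requests.get(obj, headers={"X-Auth-Token": token})
        if resp.status_code == 200:
            break
        time.sleep(300)

    with open("restored-backup.tar.gz", "wb") as f:
        f.write(resp.content)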

We can do the same thing with S3 Browser and Amazon Glacier. We need to request the restore, check the job status, then download the restored files. The files change color in the tool when they are ready to read.
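
If you prefer to script it rather than click through S3 Browser, the same request, check, and download cycle can be driven through the native Glacier vault API with boto3. This is a minimal sketch under the assumption that the data lives in a Glacier vault; the vault name and archive ID are placeholders.

    import time
    import boto3

    glacier = boto3.client("glacier")
    vault = "my-backup-vault"            # placeholder vault name
    archive_id = "EXAMPLE-ARCHIVE-ID"    # placeholder archive id from the original upload

    # 1. Request the restore by starting an archive-retrieval job.
    job = glacier.initiate_job(
        accountId="-",                   # "-" means the account that owns the credentials
        vaultName=vault,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    job_id = job["jobId"]

    # 2. Check the job status until the retrieval completes (typically a few hours).
    while not glacier.describe_job(accountId="-", vaultName=vault, jobId=job_id)["Completed"]:
        time.sleep(600)

    # 3. Download the restored data.
    output = glacier.get_job_output(accountId="-", vaultName=vault, jobId=job_id)
    with open("restored-archive.bin", "wb") as f:
        f.write(output["body"].read())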

In summary, we have looked at how to reduce the cost of archives and backups. We looked at using a secondary disk at our data center or another data center. We looked at using on-site tape units. We looked at disk in the cloud. Today we looked at tape in the cloud. It is important to remember that no single one of these solutions is the answer; a combination of some or all of them is needed. Daily and weekly backups should happen to a secondary disk locally, since this data is the most likely to be restored on a regular basis. Once you get a full backup or two under your belt, move the data to another site. It might be spinning disk, it might be tape, but something needs to be offsite in the event of a true catastrophic failure, like a communication link going out (think Dell PowerVault and a thunderstorm) and you lose your primary LUN and the secondary LUN that contains your backups. The whole idea of offsite backups is not restore but primarily insurance and regulatory compliance. If someone needs to see the old data, it is there. You are betting that you won't need to read it back, and the cloud vendors are counting on that. If you do read it back on a regular basis, you might want to significantly increase your budget, pass the charges on to the people who want the data read back, or look for another solution. Tape storage in the cloud is a great way of archiving data for a long time at a low cost.

Fall 2014 IPEDS Data: Top 30 largest online enrollments per institution

Michael Feldstein - Wed, 2016-04-27 17:21

By Phil Hill

The National Center for Education Statistics (NCES) and its Integrated Postsecondary Education Data System (IPEDS) provide the most official data on colleges and universities in the United States. This is the third year that the dataset has included distance education data.

Let’s look at the top 30 online programs for Fall 2014 (in terms of total number of students taking at least one online course). Some notes on the data source:

  • I have combined the categories ‘students exclusively taking distance education courses’ and ‘students taking some but not all distance education courses’ to obtain the ‘at least one online course’ category;
  • Each sector is listed by column;
  • IPEDS tracks data based on the accredited body, which can differ for systems – I manually combined most for-profit systems into one institution entity as well as Arizona State University[1];
  • See this post for Fall 2013 Top 30 data and see this post for Fall 2014 profile by sector and state.
Fall 2014 Top 30 Largest Online Enrollments Per Institution Number Of Students Taking At Least One Online Course (Graduate & Undergraduate Combined)


The post Fall 2014 IPEDS Data: Top 30 largest online enrollments per institution appeared first on e-Literate.

Fall 2014 IPEDS Data: New Profile of US Higher Ed Online Education

Michael Feldstein - Wed, 2016-04-27 16:12

By Phil Hill

The National Center for Education Statistics (NCES) and its Integrated Postsecondary Education Data System (IPEDS) provide the most official data on colleges and universities in the United States. I have been analyzing and sharing the data from the initial Fall 2012 dataset and the Fall 2013 dataset. Both WCET and the Babson Survey Research Group also provide analysis of the IPEDS data for distance education, and I highly recommend their analyses in addition to the profile below (we have all worked together behind the scenes to share data and analyses).

Below is a profile of online education in the US for degree-granting colleges and universities, broken out by sector and for each state.

Please note the following:

  • For the most part, the terms distance education and online education are interchangeable, but they are not equivalent, as DE can include courses delivered by a medium other than the Internet (e.g. correspondence courses).
  • I have provided some flat images as well as an interactive graphic at the bottom of the post. The interactive graphic has much better image resolution than the flat images.
  • There are three tabs below in the interactive graphic – the first shows totals for the US by sector and by level (grad, undergrad); the second also shows the data for each state; the third shows a map view.
  • Yes, I know I’m late this year in getting to the data.

By Sector

If you select the middle tab, you can view the same data for any selected state. As an example, here is data for Virginia in table form.

By State Table VA

There is also a map view of state data colored by number of, and percentage of, students taking at least one online class for each sector. If you hover over any state you can get the basic data. As an example, here is a view highlighting Virginia private 4-year institutions.

By State Map VA

Interactive Graphic

For those of you who have made it this far, here is the interactive graphic. Enjoy the data.

The post Fall 2014 IPEDS Data: New Profile of US Higher Ed Online Education appeared first on e-Literate.

Oracle Security And Delphix Paper and Video Available

Pete Finnigan - Wed, 2016-04-27 14:50

I did a webinar with Delphix on 30th March 2016 (USA time). This was a very good session with some great questions at the end from the attendees. I gave a talk on Oracle Security in general, securing non-production....[Read More]

Posted by Pete On 01/04/16 At 03:43 PM

Categories: Security Blogs