Other

Interana

DBMS2 - Mon, 2017-04-17 05:10

Interana has an interesting story, in technology and business model alike. For starters:

  • Interana does ad-hoc event series analytics, which they call “interactive behavioral analytics solutions”.
  • Interana has a full-stack analytic offering, including:
    • Its own columnar DBMS …
    • … which has a non-SQL DML (Data Manipulation Language) meant to handle event series a lot more fluently than SQL does, but which the user is never expected to learn because …
    • … there also are BI-like visual analytics tools that support plenty of drilldown.
  • Interana sells all this to “product” departments rather than marketing, because marketing doesn’t sufficiently value Interana’s ad-hoc query flexibility.
  • Interana boasts >40 customers, with annual subscription fees ranging from high 5 figures to low 7 figures.

And to be clear — if we leave aside any questions of marketing-name sizzle, this really is business intelligence. The closest Interana comes to helping with predictive modeling is giving its ad-hoc users inspiration as to where they should focus their modeling attention.

Interana also has an interesting twist in its business model, which I hope can be used successfully by other enterprise software startups as well.

  • For now, at no extra charge, Interana will operate its software for you as a managed service. (A majority of Interana’s clients run the software on Amazon or Azure, where that kind of offering makes sense.)
  • However, presumably in connection with greater confidence in its software’s ease of administration, Interana will move this year toward unbundling the service as an extra-charge offering on top of the software itself.

The key to understanding Interana is its DML. Notes on that include:

  • Interana’s DML is focused on path analytics …
    • … but Interana doesn’t like to use that phrase because it sounds too math-y and difficult.
    • Interana may be the first company that’s ever told me it’s focused on providing a better nPath. :)
  • Primitives in Interana’s language — notwithstanding the company’s claim that it never ever intended to sell to marketing departments — include familiar web analytics concepts such as “session”, “funnel” and so on. (However, these are being renamed to more neutral terms such as “flow” in an upcoming version of the product.)
  • As typical example questions or analytic subjects, Interana offered:
    • “Which are the most common products in shopping carts where time-to-checkout was greater than 30 minutes?”
    • “Exactly which steps in the onboarding process result in the greatest user frustration?”
  • The Interana folks and I agree that Splunk is the most recent example of a new DML kicking off a significant company.
  • The most recent example I can think of in which a vendor hung its hat on a new DML that was a “visual programming language” is StreamBase, with EventFlow. That didn’t go all that well.
  • To use Founder/CTO Bobby Johnson’s summary term, the real goal of the Interana language is to describe a state machine, specifically one that produces (sets of) sequences of events (and the elapsed time between them).
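
To make that state-machine framing concrete, here is a minimal Python sketch of funnel analysis over per-user event streams. It only illustrates the general technique; the event names, the three-step funnel, and the code itself are assumptions, not Interana’s DML or implementation.

```python
from collections import defaultdict

# Hypothetical funnel; Interana's actual primitives and syntax are not shown here.
FUNNEL = ["add_to_cart", "begin_checkout", "purchase"]

def funnel_counts(events):
    """events: iterable of (user_id, timestamp_seconds, event_name).

    Returns (step_counts, elapsed): step_counts[i] is how many users reached
    FUNNEL[i] in order; elapsed holds per-user seconds from first to last step.
    """
    by_user = defaultdict(list)
    for user, ts, name in events:
        by_user[user].append((ts, name))

    step_counts = [0] * len(FUNNEL)
    elapsed = []
    for seq in by_user.values():
        seq.sort()                                  # order each user's events by time
        state, started_at, completed_at = 0, None, None
        for ts, name in seq:                        # walk the state machine
            if state < len(FUNNEL) and name == FUNNEL[state]:
                if state == 0:
                    started_at = ts
                step_counts[state] += 1
                state += 1
                if state == len(FUNNEL):
                    completed_at = ts
        if completed_at is not None:
            elapsed.append(completed_at - started_at)
    return step_counts, elapsed
```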

Notes on Interana speeds & feeds include:

  • Interana only promises data freshness up to micro-batch latencies — i.e., a few minutes. (Obviously, this shuts them out of most network monitoring and devops use cases.)
  • Interana thinks it’s very important for query response time to max out at a low number of seconds. If necessary, the software will return approximate results rather than exact ones so as to meet this standard.
  • Interana installations and workloads to date have gotten as large as:
    • 1-200 nodes.
    • Trillions of rows, equating to 100s of TBs of data after compression (>1 PB uncompressed).
    • Billions of rows/events received per day.
    • 100s of 1000s of (very sparse) columns.
    • 1000s of named users.

Although Interana’s original design point was spinning disk, most customers store their Interana data on flash.

Interana architecture choices include:

  • They’re serious about micro-batching.
    • If the user’s data is naturally micro-batched — e.g. a new S3 bucket every few minutes — Interana works with that.
    • Even if the customer’s data is streamed — e.g. via Kafka — Interana insists on micro-batching it.
  • They’re casual about schemas.
    • Interana assumes data arrives with some kind of recognizable structure, via JSON, CSV or whatever.
      • Interana observes, correctly, that log data often is decently structured.
        • For example, if you’re receiving “phone home” pings from products you originally manufactured, you know what data structures to expect.
        • Interana calls this “logging with intent”.
      • Interana is fine with a certain amount of JSON (for example) schema change over time.
      • If your arriving data truly is a mess, then you need to calm it down via a pass through Splunk or whatever before sending it to Interana.
    • JSON hierarchies turn into multi-part column names in the usual way.
    • Interana supports one level of true nesting, and one level only; column values can be “lists”, but list values can’t be lists themselves.
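
The multi-part column names and single level of nesting described above are easy to picture with a toy sketch; this is an assumed illustration of the convention, not Interana code.

```python
def flatten(doc, prefix=""):
    """Flatten a JSON-like dict into {"multi.part.name": value} columns.

    Nested dicts become dotted column names; a list stays as a single column
    value (one level of nesting), but lists of lists are rejected, mirroring
    the restriction described above.
    """
    columns = {}
    for key, value in doc.items():
        name = prefix + key
        if isinstance(value, dict):
            columns.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            if any(isinstance(item, list) for item in value):
                raise ValueError(name + ": list values cannot themselves be lists")
            columns[name] = value
        else:
            columns[name] = value
    return columns

# flatten({"user": {"geo": {"city": "Oslo"}}, "tags": ["a", "b"]})
# -> {"user.geo.city": "Oslo", "tags": ["a", "b"]}
```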

Finally, other Interana tech notes include:

  • Compression is a central design consideration …
    • … especially but not only compression algorithms designed to deal with great sparseness, such as run-length encoding (RLE).
    • Dictionary compression, in a strategy that is rarer than I once expected it to be, uses a global rather than shard-by-shard dictionary. The data Interana expects is of low-enough cardinality for this to be the better choice.
    • Column data is sorted. A big part of the reason is of course to aid compression.
    • Compression strategies are chosen automatically for each segment. Wholly automatically, I gather; you can’t tune the choice manually. (A toy sketch of sorted-column RLE with a global dictionary appears after this list.)
  • As you would think, Interana technically includes multiple data stores.
    • Data first hits a write-optimized store. Unlike the case of Vertica, this WOS never is involved in answering queries.
    • Asynchronously, the data is broken into columns, and banged to “disk”.
    • Asynchronously again, the data is sorted.
    • Queries run against sorted data, sorting recent blocks on-the-fly if necessary.
  • Interana lets you shard different replicas of the data according to different shard keys.
  • Interana is proud of the random sampling it does when serving approximate query results.
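
Returning to the compression notes: the interplay of sorting, run-length encoding, and a global dictionary can be shown with a toy sketch. Interana’s real encodings are certainly more elaborate; this is just the general idea.

```python
def dictionary_encode(values, global_dict):
    """Map column values to small integer codes via one shared (global) dictionary."""
    codes = []
    for v in values:
        if v not in global_dict:
            global_dict[v] = len(global_dict)   # works well when cardinality is low
        codes.append(global_dict[v])
    return codes

def run_length_encode(codes):
    """Collapse a code sequence into (code, run_length) pairs."""
    runs = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1][1] += 1
        else:
            runs.append([c, 1])
    return runs

global_dict = {}                                  # shared across all shards/segments
column = sorted(["view", "view", "checkout", "view", "checkout"])
runs = run_length_encode(dictionary_encode(column, global_dict))
# sorting produces long runs, so the sorted column compresses to [[0, 2], [1, 3]]
```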
Categories: Other

Automate and expedite bulk loading into Windchill.

Data migration is the least attractive part of a PDM/PLM project.  Take a look at our latest infographic to learn how to speed up bulk loading data from Creo, Autodesk Inventor and AutoCAD, SolidWorks, Documents, WTParts and more into Windchill PDMLink and Pro/INTRALINK.

More information can also be found in our previous posts:

Approaches to Consider for Your Organization’s Windchill Consolidation Project

Consider Your Options for SolidWorks to Windchill Data Migrations

 

The post Automate and expedite bulk loading into Windchill. appeared first on Fishbowl Solutions.

Categories: Fusion Middleware, Other

Analyzing the right data

DBMS2 - Thu, 2017-04-13 07:05

0. A huge fraction of what’s important in analytics amounts to making sure that you are analyzing the right data. To a large extent, “the right data” means “the right subset of your data”.

1. In line with that theme:

  • Relational query languages, at their core, subset data. Yes, they all also do arithmetic, and many do more math or other processing than just that. But it all starts with the set theory.
  • Underscoring the power of this approach, other data architectures over which analytics is done usually wind up with SQL or “SQL-like” language access as well.

2. Business intelligence interfaces today don’t look that different from what we had in the 1980s or 1990s. The biggest visible* changes, in my opinion, have been in the realm of better drilldown, ala QlikView and then Tableau. Drilldown, of course, is the main UI for business analysts and end users to subset data themselves.

*I used the word “visible” on purpose. The advances at the back end have been enormous, and much of that redounds to the benefit of BI.

3. I wrote 2 1/2 years ago that sophisticated predictive modeling commonly fit the template:

  • Divide your data into clusters.
  • Model each cluster separately.

That continues to be tough work. Attempts to productize shortcuts have not caught fire.
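
As a concrete (if simplistic) rendering of that template, here is a short scikit-learn sketch that clusters first and then fits one model per cluster. It illustrates the pattern only; it is not anyone’s productized shortcut, and the cluster count is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

def fit_per_cluster(X, y, n_clusters=3, seed=0):
    """Divide the data into clusters, then model each cluster separately."""
    clusterer = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(X)
    models = {}
    for c in range(n_clusters):
        mask = clusterer.labels_ == c
        models[c] = LinearRegression().fit(X[mask], y[mask])
    return clusterer, models

def predict(clusterer, models, X_new):
    """Route each new row to the model of the cluster it falls into."""
    labels = clusterer.predict(X_new)
    return np.array([models[c].predict(row.reshape(1, -1))[0]
                     for c, row in zip(labels, X_new)])
```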

4. In an example of the previous point, anomaly management technology can, in theory, help shortcut any type of analytics, in that it tries to identify what parts of your data to focus on (and why). But it’s in its early days; none of the approaches to general anomaly management has gained much traction.

5. Marketers have vast amounts of information about us. It starts with every credit card transaction line item and a whole lot of web clicks. But it’s not clear how many of those (10s of) thousands of columns of data they actually use.

6. In some cases, the “right” amount of data to use may actually be tiny. Indeed, some statisticians claim that fewer than 10 data points may be enough to get a good model. I’m skeptical, at least as to the practical significance of such extreme figures. But on the more plausible side — if you’re hunting bad guys, it may not take very many separate facts before you have good evidence of collusion or fraud.

Internet fraud excepted, of course. Identifying that usually involves sifting through a lot of log entries.

7. All the needle-hunting in the world won’t help you unless what you seek is in the haystack somewhere.

  • Often, enterprises explicitly invest in getting more data.
  • Keeping everything you already generate is the obvious choice for most categories of data, but some of the lowest-value-per-bit logs may forever be thrown away.

8. Google is famously in the camp that there’s no such thing as too much data to analyze. For example, it famously uses >500 “signals” in judging the quality of potential search results. I don’t know how many separate data sources those signals are informed by, but surely there are a lot.

9. Few predictive modeling users demonstrate a need for vast data scaling. My support for that claim is a lot of anecdata. In particular:

  • Some predictive modeling techniques scale well. Some scale poorly. The level of pain around the “scale poorly” aspects of that seems to be fairly light (or “moderate” at worst). For example:
    • In the previous technology generation, analytic DBMS and data warehouse appliance vendors tried hard to make statistical packages scale across their systems. Success was limited. Nobody seemed terribly upset.
    • Cloudera’s Data Science Workbench messaging isn’t really scaling-centric.
  • Spark’s success in machine learning is rather rarely portrayed as centering on scaling. And even when it is, Spark basically runs in memory, so no single Spark node is processing all that much data.

10. Somewhere in this post — i.e. right here :) — let’s acknowledge that the right data to analyze may not be exactly what was initially stored. Data munging/wrangling/cleaning/preparation is often a big deal. Complicated forms of derived data can be important too.

11. Let’s also mention data marts. Basically, data marts subset and copy data, either because the data will be easier to analyze in its copied form, or because their operators want to separate workloads between the original and copied data stores.

  • If we assume the data is on spinning disks or even flash, then the need for that strategy declined long ago.
  • Suppose you want to keep data entirely in memory? Then you might indeed want to subset-and-copy it. But with so many memory-centric systems doing decent jobs of persistent storage too, there’s often a viable whole-dataset management alternative.

But notwithstanding the foregoing:

  • Security/access control can be a good reason for subset-and-copy.
  • So can other kinds of administrative simplification.

12. So what does this all suggest going forward? I believe:

  • Drilldown is and will remain central to BI. If your BI doesn’t support robust drilldown, you’re doing it wrong. “Real-time” use cases are not exceptions to this rule.
  • In a strong overlap with the previous point, drilldown is and will remain central to monitoring. Whatever monitoring means to you, the ability to pinpoint the specific source of interesting signals is crucial.
  • The previous point can be recast as saying that it’s crucial to identify, isolate and explain anomalies. Some version(s) of anomaly management will become a big deal.
  • SQL and “SQL-like” languages will remain integral to analytic processing for a long time.
  • Memory-centric analytic frameworks such as Spark will continue to win. The data size constraints imposed by memory-centric processing will rarely cause difficulties.

Categories: Other

Webinar Recording: Improve WebCenter Portal Performance by 30% and get out of Oracle ADF Development Hell

In this webinar Fishbowl’s Director of Solutions, Jerry Aber, shared how leveraging modern web development technologies like Oracle JET, instead of ADF taskflows, can dramatically improve the performance of a portal – including the overall time to load the home page, as well as making content or stylistic changes.

Jerry also shared how to architect a portal implementation to include a caching layer that further enhances performance. These topics were all backed by real-world customer metrics that Jerry and the Fishbowl team have seen through numerous, successful customer deployments.

If you are a WebCenter Portal administrator and are frustrated with the challenges of improving your ADF-centric portal, this webinar is for you. Watch to learn how to overhaul the ADF UI, which will lead to fewer development complexities and more happy users.

 

The post Webinar Recording: Improve WebCenter Portal Performance by 30% and get out of Oracle ADF Development Hell appeared first on Fishbowl Solutions.

Categories: Fusion Middleware, Other

Hackathon Weekend at Fishbowl Solutions: Bots, Cloud Content Migrations, and Lightweight ECM Apps

Hackathon 2017 captains – from L to R: Andy Weaver, John Sim, and Jake Ferm.

It’s hackathon weekend at Fishbowl Solutions. This means our resident hackers (coders) will be working as teams to develop new solutions for Oracle WebCenter, enterprise search, and various cloud offerings. The theme overall this year is The Cloud, and each completed solution will integrate with a cloud offering from Oracle, Google, and perhaps even a few others if time allows.

This year three teams have formed, and they all began coding today at 1:00 PM. Teams have until 9:00 AM on Monday, April 10th to complete their innovative solutions. Each team will then present and demo their solution to everyone at Fishbowl Solutions during our quarterly meeting at 4 PM. The winning team will be decided by votes from employees that did NOT participate in the hackathon.

Here are the descriptions of the three solutions that will be developed over the weekend:

Team Captain: Andy Weaver
Team Name – for now: Cloud ECM Middleware
Overview: Lightweight ECM for The Cloud. Solution will provide content management capabilities (workflow, versioning, periodic review notifications, etc.) to Google’s cloud platform. Solution will also include a simple dashboard to notify users of documents awaiting their attention, and users will be able to use the solution on any device as well.

Team Captain: John Sim
Team Name: SkyNet – Rise of the Bots
Overview: This team has high aspirations as they will be working on a number of solutions. The first is a bot that they are calling Atlas that will essentially query Fishbowl’s Google Search Appliance and return documents, which are stored in Oracle WebCenter, based on what was asked. For example, “show me the standard work document on ordering food for the hackathon”. The bot will use Facebook messenger as the input interface, and if time allows, a similar bot will be developed to support Siri, Slack, and Skype.

The next solution the team will try to code by Monday is a self-service bot that queries a human capital management/human resources system to return how many days of PTO an employee has.

The last solution will be a bot that integrates Alexa, which is the voice system that powers the Amazon Echo, with Oracle WebCenter. In this example, voice commands could be used to ask Alexa to tell the user the number of workflow items in their queue, or the last document checked in by their manager.

Team Captain: Jake Ferm
Team Name – for now: Cloud Content Migrator
Overview: Jake’s team will be working on an interface to enable users to select content to be migrated across Google Drive, Microsoft OneDrive, DropBox, and the Oracle Documents Cloud Service. The goal with this solution is to enable, with as few clicks as possible, migrations such as moving content from OneDrive to the Oracle Documents Cloud Service. They will also be working on ensuring that content with larger file sizes can be migrated in the background so that users can carry on with other computer tasks.

Please check back on Tuesday, April 11th for a recap of the event and details on the winning solution. Happy hacking!

Taco bar to fuel the hackers!

 

The post Hackathon Weekend at Fishbowl Solutions: Bots, Cloud Content Migrations, and Lightweight ECM Apps appeared first on Fishbowl Solutions.

Categories: Fusion Middleware, Other

Mindbreeze Partnership Brings GSA Migration Path for Customers

This morning Fishbowl announced a new partnership with Mindbreeze bringing additional enterprise search options to our customers. As a leading provider of enterprise search software, Mindbreeze serves thousands of customers around the globe spanning governments, banks, healthcare, insurance, and educational institutions. Last Friday, Gartner released the 2017 Insight Engines Magic Quadrant; Mindbreeze has been positioned highest for Ability to Execute.

With the sunsetting of the Google Search Appliance announced last year, Fishbowl has been undergoing an evaluation of alternatives to serve both new and existing customers looking to improve information discovery. While Fishbowl will continue to partner with Google on cloud search initiatives, we feel Mindbreeze InSpire provides a superior solution to the problems faced by organizations with large volumes of on-premise content. In addition to on-premise appliances, Mindbreeze also provides cloud search services with federation options for creating a single, hybrid search experience. We’re excited about the opportunity this partnership brings to once again help customers get more value from the millions of unstructured documents buried in siloed systems across the enterprise—particularly those stored in Oracle WebCenter and PTC Windchill.

In the coming months, we’ll be expanding our connector offerings to integrate Mindbreeze InSpire with Oracle WebCenter Content and PTC Windchill. Mindbreeze InSpire is offered as an on-premise search appliance uniting information from varied internal data sources into one semantic search index. As a full-service Mindbreeze partner, Fishbowl will provide connectors, appliance resale, implementation services, and support for our customers. To learn more about Mindbreeze, GSA migration options, or beta access to our Mindbreeze connectors, please contact us.

The post Mindbreeze Partnership Brings GSA Migration Path for Customers appeared first on Fishbowl Solutions.

Categories: Fusion Middleware, Other

Monitoring

DBMS2 - Sun, 2017-03-26 06:16

A huge fraction of analytics is about monitoring. People rarely want to frame things in those terms; evidently they think “monitoring” sounds boring or uncool. One cost of that silence is that it’s hard to get good discussions going about how monitoring should be done. But I’m going to try anyway, yet again. :)

Business intelligence is largely about monitoring, and the same was true of predecessor technologies such as green paper reports or even pre-computer techniques. Two of the top uses of reporting technology can be squarely described as monitoring, namely:

  • Watching whether trends are continuing or not.
  • Seeing if there are any events — actual or impending as the case may be — that call for response, in areas such as:
    • Machine breakages (computer or general metal alike).
    • Resource shortfalls (e.g. various senses of “inventory”).

Yes, monitoring-oriented BI needs investigative drilldown, or else it can be rather lame. Yes, purely investigative BI is very important too. But monitoring is still the heart of most BI desktop installations.

Predictive modeling is often about monitoring too. It is common to use statistics or machine learning to help you detect and diagnose problems, and many such applications have a strong monitoring element.

I.e., you’re predicting trouble before it happens, when there’s still time to head it off.

As for incident response, in areas such as security — any incident you respond to has to be noticed first. Often, it’s noticed through analytic monitoring.

Hopefully, that’s enough of a reminder to establish the great importance of analytics-based monitoring. So how can the practice be improved? At least three ways come to mind, and only one of those three is getting enough current attention.

The one that’s trendy, of course, is the bringing of analytics into “real-time”. There are many use cases that genuinely need low-latency dashboards, in areas such as remote/phone-home IoT (Internet of Things), monitoring of an enterprise’s own networks, online marketing, financial trading and so on. “One minute” is a common figure for latency, but sometimes a couple of seconds are all that can be tolerated.

I’ve posted a lot about all this before.

One particular feature that could help with high-speed monitoring is to meet latency constraints via approximate query results. This can be done entirely via your BI tool (e.g. Zoomdata’s “query sharpening”) or more by your DBMS/platform software (the Snappy Data folks pitched me on that approach this week).
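
The underlying idea is straightforward to sketch: answer from a growing random sample and stop when the latency budget is spent, reporting both the estimate and how much data it rests on. This is a generic illustration under assumed parameters, not Zoomdata’s or SnappyData’s actual mechanism.

```python
import random
import time

def approximate_mean(rows, time_budget_s=2.0, batch=10_000, seed=42):
    """Estimate the mean of a large list of numbers within a fixed time budget.

    Draws progressively more random samples (with replacement) until the budget
    runs out, then returns (estimate, number_of_samples_used).
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    total, count = 0.0, 0
    while time.monotonic() < deadline:
        for _ in range(batch):
            total += rows[rng.randrange(len(rows))]
            count += 1
        # a real system would also attach a confidence interval to the estimate
    return total / count, count
```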

Perennially neglected, on the other hand, are opportunities for flexible, personalized analytics. (Note: There’s a lot of discussion in that link.) The best-acknowledged example may be better filters for alerting. False negatives are obviously bad, but false positives are dangerous too. At best, false positives are annoyances; but too often, alert fatigue causes your employees to disregard crucial warning signals altogether. The Gulf of Mexico oil spill disaster has been blamed on that problem. So was a fire in my own house. But acknowledgment != action; improvement in alerting is way too slow. And some other opportunities described in the link above aren’t even well-acknowledged, especially in the area of metrics customization.

Finally, there’s what could be called data anomaly monitoring. The idea is to check data for surprises as soon as it streams in, using your favorite techniques in anomaly management. Perhaps an anomaly will herald a problem in the data pipeline. Perhaps it will highlight genuinely new business information. Either way, you probably want to know about it.

David Gruzman of Nestlogic suggests numerous categories of anomaly to monitor for. (Not coincidentally, he believes that Nestlogic’s technology is a great choice for finding each of them.) Some of his examples — and I’m summarizing here — are:

  • Changes in data format, schema, or availability. For example:
    • Data can completely stop coming in from a particular source, and the receiving system might not immediately realize that. (My favorite example is the ad tech firm that accidentally stopped doing business in the whole country of Australia.)
    • A data format change might make data so unreadable it might as well not arrive.
    • A decrease in the number of approval fields might highlight a questionable change in workflow.
  • Data quality: NULLs or malformed values might increase suddenly, in particular fields and data segments.
  • Data value distribution: This category covers a lot of cases. A few of them are:
    • A particular value is repeated implausibly often. A bug is the likely explanation.
    • E-commerce results suddenly decrease, but only from certain client technology configurations. Probably there is a bug affecting only those particular clients.
    • Clicks suddenly increase from certain client technologies. A botnet might be at work.
    • Sales suddenly increase from a particular city. Again this might be fraud — or more benignly, perhaps some local influencers have praised your offering.
    • A particular medical diagnosis becomes much more common in a particular city. Reasons can range from fraud, to a new facility for certain kinds of tests, to a genuine outbreak of disease.

David offered yet more examples of significant anomalies, including ones that could probably only be detected via Nestlogic’s tools. But the ones I cited above can probably be found via any number of techniques — and should be, more promptly and accurately than they currently are.
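
Several of those checks come down to comparing simple per-field statistics for the newest batch of data against a trailing baseline. Here is a hedged sketch; the thresholds are arbitrary assumptions, and this is not a description of Nestlogic’s technology.

```python
def field_stats(batch, field):
    """Null rate and value frequencies for one field across a batch of dict rows."""
    values = [row.get(field) for row in batch]
    null_rate = sum(v is None for v in values) / max(len(values), 1)
    freq = {}
    for v in values:
        if v is not None:
            freq[v] = freq.get(v, 0) + 1
    return null_rate, freq

def simple_anomalies(baseline, latest, field, null_jump=0.10, repeat_share=0.5):
    """Flag a sudden rise in NULLs, or a single value repeated implausibly often."""
    base_nulls, _ = field_stats(baseline, field)
    new_nulls, freq = field_stats(latest, field)
    flags = []
    if new_nulls - base_nulls > null_jump:
        flags.append(f"{field}: null rate jumped {base_nulls:.0%} -> {new_nulls:.0%}")
    total = sum(freq.values())
    if total and max(freq.values()) / total > repeat_share:
        top = max(freq, key=freq.get)
        flags.append(f"{field}: value {top!r} repeated implausibly often")
    return flags
```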

Categories: Other

Replacing the “V” in Oracle ADF’s MVC design pattern with Oracle JET or other front end framework

This post was written by Fishbowl’s own John Sim – our resident Oracle User Experience expert. From front-end design to user journeys and persona mapping, John has helped numerous customers over 14 years enhance their desktop and mobile experiences with Oracle WebCenter. John is also an Oracle ACE, which recognizes leaders for their technical expertise and community evangelism.

One of our goals at Fishbowl is to continuously enhance and evolve the capabilities of WebCenter for both developers and clients with new tooling capabilities and pre-built custom components that are essential and not available today as part of the OOTB Oracle solution.

We have taken all of our collective knowledge and IP over the years since WebCenter PS3 and created the “Portal Solution Accelerator”, previously known as “Intranet In A Box”, which takes WebCenter Portal and its capabilities to the next level for creating Digital Workplace Portals.

Today I’m going to cover one of the benefits of using our Portal Solution Accelerator: replacing the “V” in ADF’s MVC design pattern. This enables third-party developers, web design agencies, and marketers (with basic web design skills) to use other libraries and front end frameworks of their choosing such as Oracle JET, Angular, React, Vue, and Bootstrap – to name a few. By using a different front end library such as JET, you will be able to create more modern and dynamic responsive portals, widgets, and portlets with little to no experience of developing with ADF. You will also be able to leverage the benefits of ADF’s Model and Controller layers and WebCenter’s Personalisation, Security, Caching and Mashup integration capabilities with other solutions like Oracle E-Business Suite (EBS) and Business Intelligence (BI) on the back end.

So, let’s take a closer look at the Portal Solution Accelerator in the following diagram. You can see it is made up of two core components – our back-end PSA (Portal Solution Accelerator) component and our front-end SPA (Single Page Application) component architecture. One of the things we decided early on was to separate the back-end and front-end architectures so that SPA front-end components are platform agnostic and can work as Progressive Web Apps on platforms outside of Portal. This enables us to deploy SPA front-end components directly onto BI (providing additional charting capabilities through its narrative components), onto EBS, SharePoint, and Liferay, and onto the cloud. This provides the potential for a hybrid on-premise Portal to Oracle Cloud (Site Cloud Service) Content Experience platform, enabling reuse of our portal components and security on the Cloud.

To find out more about our Portal Solution Accelerator head over to our website – https://www.fishbowlsolutions.com/services/oracle-webcenter-portal-consulting/portal-solution-accelerator/

Let’s take a quick dive into WebCenter Portal taskflows and our Single Page Application (SPA) architecture.

WebCenter Portal – allows you to create Widgets (ADF Taskflows) that can easily be dragged and dropped onto a page by a contributor and can work independently or alongside another taskflow. The interface View is currently generated at the back end with Java processes and can be easily optimised to enable support of adaptive applications. However, you should be aware that this model is very server process intensive.

  • Pros
    • If you know ADF development it makes it extremely fast to create connected web applications using the ADF UI.
    • The ADF generated HTML/JS/CSS UI supports Mobile and desktop browsers.
    • The UI is generated by the application allowing developers to create applications without the need for designers to be involved.
  • Cons
    • If you don’t know ADF, or have a UI designed by a third party that does not align with ADF’s UI capabilities, it can be very challenging to create complex UIs using ADF tags, ADF Skins, and ADF’s JavaScript framework.
    • It is bad practice to mix and match ADF tags with open source libraries such as jQuery or Bootstrap, which Oracle does not support alongside ADF. This limits reuse of the large body of available open source code for creating dynamic, interactive components and interfaces such as a carousel.
    • It can also be very hard to brand, and is very server-process-intensive.

Single Page Applications – are essentially browser-generated JavaScript applications that use AJAX to quickly and easily update and populate the user interface, creating fluid and responsive web apps. Instead of the server processing and managing the DOM that is generated and sent to the client, the client’s browser processes, generates, and caches the UI on the fly.

  • Pros
    • All modern front end frameworks allow you to create Single Page Applications and tie into lots of open source front end solutions and interfaces.
  • Cons
    • Can be hard to create modular, isomorphic (universal) JS applications.
    • You also need to test across all of the browsers and devices your application is looking to support.
    • The front end application can get very large if not managed correctly.

The Portal Solution Accelerator.

What we have done with PSA is create a framework that provides the best of both worlds, allowing you to create modular Single Page Application taskflows that can be dragged and dropped onto a WebCenter Portal page. This allows your web design teams and agencies to manage and develop the front end quickly and effectively with any framework and standard HTML5, CSS, and JavaScript. You can also use Groovy scripts or JavaScript (with Oracle Nashorn) on the server side to create isomorphic JavaScript taskflow applications.

Please note – you cannot create a taskflow that leverages both ADF’s View layer and our framework together. You can, however, create one taskflow that is pure ADF and drop it on the same page as a taskflow that has been built with a custom front end such as Angular, using our Portal Solution Accelerator View to replace the ADF View. This enables you to use existing OOTB WebCenter Portal taskflows and have them work in conjunction with custom-built components.

How Does it work?

Within WebCenter Portal in the composer panel where you can drag and drop in your taskflows onto a page there is a custom taskflow – Fishbowl Single Page Application.

Drop this onto the page and manage its parameters. Here is a quick screenshot of a sample taskflow component for loading in Recent News items.

The Template parameter points to the custom SPA front-end JavaScript component you would like to load in and inject into the taskflow. You can define custom parameters to pass to this component, and these parameters can be dynamic ADF variables set via the template parameter panel. The SPA component then handles the magic of loading in the template, events, JS libraries, CSS, and images to be generated from within the taskflow.

Within the SPA API there are custom methods we have created that allow you to pass AJAX JSON calls to the back-end ADF Groovy or JavaScript code, enabling the app to work and communicate with other services or databases.

ADF Lifecycle… Timeouts.

One of the things that often comes up when we present our solution to others who have attempted to integrate JET applications with WebCenter Portal is how to manage the lifecycle and prevent ADF timeouts. For example, if you stay on the same WebCenter Portal page for some time working on a single page application, you will get a popup saying you will be automatically logged out. Remember, our Portal Solution Accelerator is a taskflow. We use a similar ADF message queue to pass JSON updates to the ADF lifecycle while a user is working on a complex modular single page application, so we don’t run into timeout issues.

Getting out of deployment hell (as well)!!!

One of the downsides of ADF development is having to build your ADF application, deploy it, and stop and start the server to test, only to find there is a bug that needs to be fixed. Then you go through the entire process again. Trust me – it is not quick!

Once you have our framework deployed, you can easily deploy/upload standard JavaScript templates, CSS, and Groovy scripts to Apache or OHS, where they are automatically consumed by our ADF taskflow. There is no stop-start-test cycle. Just upload your updates and refresh the browser!

I hear Oracle is working to integrate JET with ADF.

Yes, but it’s not there today. Plus, you’re not limited to just JET with our framework. You can use React or any other front end framework or library, and you still get the benefits of all the additional components, apps, and tooling that the Portal Solution Accelerator provides.

Futures

Our next key release that we are working on is to fully support Progressive Web Application Taskflow Development. To find out more on what a progressive web app is head over to google – https://developers.google.com/web/progressive-web-apps/checklist

 

The post Replacing the “V” in Oracle ADF’s MVC design pattern with Oracle JET or other front end framework appeared first on Fishbowl Solutions.

Categories: Fusion Middleware, Other

PTC Windchill Success Story: The Benefits of Moving from PDM to PLM

A Prominent Furniture Manufacturer deploys Fishbowl’s System Generated Drawing Automation to Increase Efficiencies with their Enterprise Part deployment within PTC Windchill

Our client has numerous global manufacturing facilities and is using PTC Windchill to streamline eBOM and mBOM processes. However, not all modifications to parts information propagate automatically/accurately at the drawing level. Updating plant-specific drawings with enterprise part information was a time-consuming process that was manual, error-prone, full of delays, and diverted valuable engineering resources away from their value-added work.

The client desired a go-forward approach with their Windchill PLM implementation that would automatically update this critical enterprise part information. They became aware of our System Generated Drawing solution from a presentation at PTC LiveWorx. From the time of first contact the Fishbowl Solutions team worked to deliver a solution that helped them realize their vision.

BUSINESS PROBLEMS
  • Manufacturing waste due to ordering obsolete or incorrect parts
  • Manufacturing delays due to drawing updates needed for non-geometric changes – title block, lifecycle, BOM, as well as environmental/regulatory compliance markings, variant designs, etc.
  • Manually updating product drawings with plant specific parts information took away valuable engineering time
SOLUTION HIGHLIGHTS
  • Fishbowl’s System Generated Drawing Automation systematically combines data from BOM, CAD, Drawing/Model, Part Attributes, and enterprise resource planning (ERP) systems
  • Creates complete, static views of drawings based on multiple event triggers
  • Creates a template-based PDF that is overlaid along with the CAD geometry to produce a final document that can be dynamically stamped along with applicable lifecycle and approval information
  • Real-time watermarking on published PDFs
RESULTS

  • Increased accuracy of enterprise parts information included on drawings reduced product manufacturing waste
  • Allowed design changes to move downstream quickly, enabling an increase in design-to-manufacturing operational efficiencies

 

“Fishbowl’s System Generated Drawing Automation solution is the linchpin to our enterprise processes. It provides us with an automated method to include, update and proliferate accurate parts information throughout the business. This automation has in turn led to better data integrity, less waste, and more process efficiencies.” -PTC Windchill Admin/Developer

 

For more information about Fishbowl’s solution for System Generated Drawing Automation, click here.

The post PTC Windchill Success Story: The Benefits of Moving from PDM to PLM appeared first on Fishbowl Solutions.

Categories: Fusion Middleware, Other

Webinar: Improve WebCenter Portal Performance by 30% and get out of Oracle ADF Development Hell

DATE: Thursday, March 30th
TIME: 12:00 PM CST, 1:00 PM EST

Join Fishbowl’s Enterprise Architect, Jerry Aber, as he shares recommendations on performance improvements for WebCenter-based portals. Jerry has been delivering portal projects for over 15 years, and has been instrumental in developing a technology framework and methodology that provides repeatable and reusable development patterns for portal deployments and their ongoing administration and management. In this webinar, Jerry will share how leveraging modern web development technologies like Oracle JET, instead of ADF taskflows, can dramatically improve the performance of a portal – including the overall time to load the home page, as well as making content or stylistic changes.

Jerry will also share how to architect a portal implementation to include a caching layer that further enhances performance. These topics will all be backed by real-world customer metrics that Jerry and the Fishbowl team have seen through numerous, successful customer deployments.

If you are a WebCenter Portal administrator and are frustrated with the challenges of improving your ADF-centric portal, this webinar is for you. Come learn how to overhaul the ADF UI, which will lead to fewer development complexities and more happy users.

Register today. 

New to Zoom? Go to zoom.us/test to ensure you can access the webinar.

The post Webinar: Improve WebCenter Portal Performance by 30% and get out of Oracle ADF Development Hell appeared first on Fishbowl Solutions.

Categories: Fusion Middleware, Other

Cloudera’s Data Science Workbench

DBMS2 - Sun, 2017-03-19 19:41

0. Matt Brandwein of Cloudera briefed me on the new Cloudera Data Science Workbench. The problem it purports to solve is:

  • One way to do data science is to repeatedly jump through the hoops of working with a properly-secured Hadoop cluster. This is difficult.
  • Another way is to extract data from a Hadoop cluster onto your personal machine. This is insecure (once the data arrives) and not very parallelized.
  • A third way is needed.

Cloudera’s idea for a third way is:

  • You don’t run anything on your desktop/laptop machine except a browser.
  • The browser connects you to a Docker container that holds (and isolates) a kind of virtual desktop for you.
  • The Docker container runs on your Cloudera cluster, so connectivity-to-Hadoop and security are handled rather automagically.

In theory, that’s pure goodness … assuming that the automagic works sufficiently well. I gather that Cloudera Data Science Workbench has been beta tested by 5 large organizations and many 10s of users. We’ll see what is or isn’t missing as more customers take it for a spin.
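
The per-user container idea is easy to sketch with the Docker SDK for Python. To be clear, this is a generic illustration of the isolation pattern, not Cloudera’s implementation; the image name, environment variable, and resource limits are assumptions.

```python
import docker  # Docker SDK for Python

def launch_user_session(user, image="jupyter/base-notebook", mem="4g"):
    """Start one isolated, browser-accessible container for one analyst.

    Mimics the general pattern described above (a per-user virtual desktop in a
    container on a gateway node); it is not Cloudera's actual code.
    """
    client = docker.from_env()
    return client.containers.run(
        image,
        name=f"workbench-{user}",
        detach=True,
        mem_limit=mem,                    # cap the session's memory
        ports={"8888/tcp": None},         # let Docker pick a host port for the browser
        environment={"NB_USER": user},    # assumed env var understood by the image
        labels={"workbench-user": user},  # tag for later cleanup
    )
```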

1. Recall that Cloudera installations have 4 kinds of nodes. 3 are obvious:

  • Hadoop worker nodes.
  • Hadoop master nodes.
  • Nodes that run Cloudera Manager.

The fourth kind are edge/gateway nodes. Those handle connections to the outside world, and can also run selected third-party software. They also are where Cloudera Data Science Workbench lives.

2. One point of this architecture is to let each data scientist run the languages and tools of her choice. Docker isolation is supposed to make that practical and safe.

And so we have a case of the workbench metaphor actually being accurate! While a “workbench” is commonly just an integrated set of tools, in this case it’s also a place for you to use other tools you personally like and bring in.

Surely there are some restrictions as to which tools you can use, but I didn’t ask for those to be spelled out.

3. Matt kept talking about security, to an extent I recall in almost no other analytics-oriented briefing. This had several aspects.

  • As noted above, a lot of the hassle of Hadoop-based data science relates to security.
  • As also noted above, evading the hassle by extracting data is a huge security risk. (If you lose customer data, you’re going to have a very, very bad day.)
  • According to Matt, standard uses of notebook tools such as Jupyter or Zeppelin wind up having data stored wherever code is. Cloudera’s otherwise similar notebook-style interface evidently avoids that flaw. (Presumably, if you want to see the output, you rerun the script against the data store yourself.)

4. To a first approximation, the target users of Cloudera Data Science Workbench can be characterized the same way BI-oriented business analysts are. They’re people with:

  • Sufficiently good quantitative skills to do the analysis.
  • Sufficiently good computer skills to do SQL queries and so on, but not a lot more than that.

Of course, “sufficiently good quantitative skills” can mean something quite different in data science than it does for the glorified arithmetic of ordinary business intelligence.

5. Cloudera Data Science Workbench doesn’t have any special magic in parallelization. It just helps you access the parallelism that’s already out there. Some algorithms are easy to parallelize. Some libraries have parallelized a few algorithms beyond that. Otherwise, you’re on your own.

6. When I asked whether Cloudera Data Science Workbench was open source (like most of what Cloudera provides) or closed source (like Cloudera Manager), I didn’t get the clearest of answers. On the one hand, it’s a Cloudera-specific product, as the name suggests; on the other, it’s positioned as having been stitched together almost entirely from a collection of open source projects.

Categories: Other

Welcome to the new Fishbowl Solutions Blog

Out with the old and in with the new. Welcome to the new home of the Fishbowl Solutions blog! Please enjoy upgraded functionality and integration with our website. Check back often for new and exciting posts from our talented staff. If you want automatic updates, click the subscribe link to the right and be notified whenever a new post appears.

The post Welcome to the new Fishbowl Solutions Blog appeared first on Fishbowl Solutions.

Categories: Fusion Middleware, Other

Introduction to SequoiaDB and SequoiaCM

DBMS2 - Sun, 2017-03-12 13:19

For starters, let me say:

  • SequoiaDB, the company, is my client.
  • SequoiaDB, the product, is the main product of SequoiaDB, the company.
  • SequoiaDB, the company, has another product line SequoiaCM, which subsumes SequoiaDB in content management use cases.
  • SequoiaDB, the product, is fundamentally a JSON data store. But it has a relational front end …
  • … and is usually sold for RDBMS-like use cases …
  • … except when it is sold as part of SequoiaCM, which adds in a large object/block store and a content-management-oriented library.
  • SequoiaDB’s products are open source.
  • SequoiaDB’s largest installation seems to be 2 PB across 100 nodes; that includes block storage.
  • Figures for DBMS-only database sizes aren’t as clear, but the sweet spot of the cluster-size range for such use cases seems to be 6-30 nodes.

Also:

  • SequoiaDB, the company, was founded in Toronto, by former IBM DB2 folks.
  • Even so, it’s fairly accurate to view SequoiaDB as a Chinese company. Specifically:
    • SequoiaDB’s founders were Chinese nationals.
    • Most of them went back to China.
    • Other employees to date have been entirely Chinese.
    • Sales to date have been entirely in China, but SequoiaDB has international aspirations.
  • SequoiaDB has >100 employees, a large majority of which are split fairly evenly between “engineering” and “implementation and technical support”.
  • SequoiaDB’s marketing (as opposed to sales) department is astonishingly tiny.
  • SequoiaDB cites >100 subscription customers, including 10 in the global Fortune 500, a large fraction of which are in the banking sector. (Other sectors mentioned repeatedly are government and telecom.)

Unfortunately, SequoiaDB has not captured a lot of detailed information about unpaid open source production usage.

While I usually think that the advantages of open source are overstated, in SequoiaDB’s case open source will have an additional benefit when SequoiaDB does go international — it addresses any concerns somebody might have about using Chinese technology.

SequoiaDB’s technology story starts:

  • SequoiaDB is a layered DBMS.
  • It manages JSON via update-in-place. MVCC (Multi-Version Concurrency Control) is on the roadmap.
  • Indexes are B-tree.
  • Transparent sharding and elasticity happen in what by now is the industry-standard/best-practices way (a toy sketch of the partition mapping appears after this list):
    • There are many (typically 4096) logical partitions, many of which are assigned to each physical partition.
    • If the number of physical partitions changes, logical partitions are reassigned accordingly.
  • Relational OLTP (OnLine Transaction Processing) functionality is achieved by using a kind of PostgreSQL front end.
  • Relational batch processing is done via SparkSQL.
  • There also is a block/LOB (Large OBject) storage engine meant for content management applications.
  • SequoiaCM boils down technically to:
    • SequoiaDB, which is used to store JSON metadata about the LOBs …
    • … and whose generic-DBMS coordination capabilities are also used over the block/LOB engine.
    • A Java library focused on content management.
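
To make the logical-vs-physical partitioning concrete, here is a toy sketch of that industry-standard scheme, as referenced in the sharding bullet above. The hash function and round-robin assignment are illustrative assumptions, not SequoiaDB internals.

```python
import hashlib

LOGICAL_PARTITIONS = 4096   # many fixed logical partitions, as described above

def logical_partition(shard_key: str) -> int:
    """Hash a shard key into one of the fixed logical partitions."""
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % LOGICAL_PARTITIONS

def assignment(num_physical_nodes: int) -> dict:
    """Assign every logical partition to a physical node (round-robin here).

    When the cluster grows or shrinks, only this mapping changes and the affected
    logical partitions are moved; shard keys never need to be re-hashed.
    """
    return {lp: lp % num_physical_nodes for lp in range(LOGICAL_PARTITIONS)}

# routing a record: node = assignment(num_nodes)[logical_partition(key)]
```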

SequoiaDB’s relationship with PostgreSQL is complicated, but as best I understand SequoiaDB’s relational operations:

  • SQL parsing, optimization, and so on rely mainly on PostgreSQL code. (Of course, there are some hacks, such as to the optimizer’s cost functions.)
  • Actual data storage is done via SequoiaDB’s JSON store, using PostgreSQL Foreign Data Wrappers. Each record goes in a separate JSON document. Locks, commits and so on — i.e. “write prevention” :) — are handled by the JSON store.
  • PostgreSQL’s own storage engine is actually part of the stack, but only to manage temp space and the like.

PostgreSQL stored procedures are already in the SequoiaDB product. Triggers and referential integrity are not. Neither, so far as I can tell, are PostgreSQL’s datatype extensibility capabilities.

I neglected to ask how much of that remains true when SparkSQL is invoked.

SequoiaDB’s use cases to date seem to fall mainly into three groups:

  • Content management via SequoiaCM.
  • “Operational data lakes”.
  • Pretty generic replacement of legacy RDBMS.

Internet back-ends, however — and this is somewhat counter-intuitive for an open-source JSON store — are rare, at least among paying subscription customers. But SequoiaDB did tell me of one classic IoT (Internet of Things) application, with lots of devices “phoning home” and the results immediately feeding a JSON-based dashboard.

To understand SequoiaDB’s “operational data lake” story, it helps to understand the typical state of data warehousing at SequoiaDB’s customers and prospects, which isn’t great:

  • 2-3 years of data, and not all the data even from that time period.
  • Only enough processing power to support structured business intelligence …
  • … and hence little opportunity for ad-hoc query.

SequoiaDB operational data lakes offer multiple improvements over that scenario:

  • They hold as much relational data as customers choose to dump there.
  • That data can be simply copied from operational stores, with no transformation.
  • Or if data arrives via JSON — from external organizations or micro-services as the case may be — the JSON can be stored unmodified as well.
  • Queries can be run straight against this data soup.
  • Of course, views can also be set up in advance to help with querying.

Views are particularly useful with what might be called slowly changing schemas. (I didn’t check whether what SequoiaDB is talking about matches precisely with the more common term “slowly changing dimensions”.) Each time the schema changes, a new table is created in SequoiaDB to receive copies of the data. If one wants to query against the parts of the database structure that didn’t change — well, a view can be established to allow for that.

Finally, it seems that SequoiaCM uses are concentrated in what might be called “security and checking-up” areas, such as:

  • Photographs as part of an authentication process.
  • Video of in-person banking transactions, both for fraud prevention and for general service quality assurance.
  • Storage of security videos (for example from automated teller machines).

SequoiaCM deals seem to be bigger than other SequoiaDB ones, surely in part because the amounts of data managed are larger.

Categories: Other

One bit of news in Trump’s speech

DBMS2 - Tue, 2017-02-28 23:26

Donald Trump addressed Congress tonight. As may be seen by the transcript, his speech — while uncharacteristically sober — was largely vacuous.

That said, while Steve Bannon is firmly established as Trump’s puppet master, they don’t agree on quite everything, and one of the documented disagreements had been in their view of skilled, entrepreneurial founder-type immigrants: Bannon opposes them, but Trump has disagreed with his view. And as per the speech, Trump seems to be maintaining his disagreement.

At least, that seems implied by his call for “a merit-based immigration system.”

And by the way — Trump managed to give a whole speech without saying anything overtly racist. Indeed, he specifically decried the murder of an Indian-immigrant engineer. By Trump standards, that counts as a kind of progress.

Categories: Other

Coordination, the underused “C” word

DBMS2 - Tue, 2017-02-28 22:34

I’d like to argue that a single frame can be used to view a lot of the issues that we think about. Specifically, I’m referring to coordination, which I think is a clearer way of characterizing much of what we commonly call communication or collaboration.

It’s easy to argue that computing, to an overwhelming extent, is really about communication. Most obviously:

  • Data is constantly moving around — across wide area networks, across local networks, within individual boxes, or even within particular chips.
  • Many major developments are almost purely about communication. The most important computing device today may be a telephone. The World Wide Web is essentially a publishing platform. Social media are huge. Etc.

Indeed, it’s reasonable to claim:

  • When technology creates new information, it’s either analytics or just raw measurement.
  • Everything else is just moving information around, and that’s communication.

A little less obvious is that much of this communication could alternatively be described as coordination. Some communication has pure consumer value, such as when we talk/email/Facebook/Snapchat/FaceTime with loved ones. But much of the rest is for the purpose of coordinating business or technical processes.

Among the technical categories that boil down to coordination are:

  • Operating systems.
  • Anything to do with distributed computing.
  • Anything to do with system or cluster management.
  • Anything that’s called “collaboration”.

That’s a lot of the value in “platform” IT right there. 

Meanwhile, in pre-internet apps:

  • Some of the early IT wins were in pure accounting and information management. But a lot of the rest were in various forms of coordination, such as logistics and inventory management.
  • The glory days of enterprise apps really started with SAP’s emphasis on “business processes”. (“Business process reengineering” was also a major buzzword back in the day.)

This also all fits with the “route” part of my claim that “historically, application software has existed mainly to record and route information.”

And in the internet era:

  • “Sharing economy” companies, led by Uber and Airbnb, have created a lot more shareholder value than the most successful pure IT startups of the era.
  • Amazon, in e-commerce and cloud computing alike, has run some of the biggest coordination projects of all.

This all ties into one of the key underlying subjects to modern politics and economics, namely the future of work.

  • Globalization is enabled by IT’s ability to coordinate far-flung enterprises.
  • Large enterprises need fewer full-time employees when individual or smaller-enterprise contractors are easier to coordinate. (It’s been 30 years since I drew a paycheck from a company I didn’t own.)
  • And of course, many white collar jobs are being entirely automated away, especially those that can be stereotyped as “paper shuffling”.

By now, I hope it’s clear that “coordination” covers a whole lot of IT. So why do I think using a term with such broad application adds any clarity? I’ve already given some examples above, in that:

  • “Coordination” seems clearer than “communication” when characterizing the essence of distributed computing.
  • “Coordination” seems clearer than “communication” if we’re discussing the functioning of large enterprises or of large-enterprise-substitutes.

Further — even when we focus on the analytic realm, the emphasis on “coordination” has value. A big part of analytic value comes in determining when to do something. Specifically that arises when:

  • Analytics identifies a problem that just occurred, or is about to happen, allowing a timely fix.
  • Business intelligence is used for monitoring, of impending problems or otherwise, as a guide to when action is needed.
  • Logistics of any kind get optimized.

I’d also say that most recommendation/personalization fits into the “coordination” area, but that’s a bit more of a stretch; you’re welcome to disagree.

I do not claim that analytics’ value can be wholly captured by the “coordination” theme. Decisions about whether to do something major — or about what to do — are typically made by small numbers of people; they turn into major coordination exercises only after a project gets its green light. But such cases, while important, are pretty rare. For the most part, analytic results serve as inputs to business processes. And business processes, on the whole, typically have a lot to do with coordination.

Bottom line: Most of what’s valuable in IT relates to communication or coordination. Apparent counterexamples should be viewed with caution.

Categories: Other

There’s no escape from politics now

DBMS2 - Wed, 2017-02-01 23:31

The United States and consequently much of the world are in political uproar. Much of that is about very general and vital issues such as war, peace or the treatment of women. But quite a lot of it is to some extent tech-industry-specific. The purpose of this post is to outline how and why that is.

For example:

  • There’s a worldwide backlash against “elites” — and tech industry folks are perceived as members of those elites.
  • That perception contains a lot of truth, and not just in terms of culture/education/geography. Indeed, it may even be a bit understated, because trends commonly blamed on “trade” or “globalization” often have their roots in technological advances.
  • There’s a worldwide trend towards authoritarianism. Surveillance/ privacy and censorship issues are strongly relevant to that trend.
  • Social media companies are up to their necks in political considerations.

Because they involve grave threats to liberty, I see surveillance/privacy as the biggest technology-specific policy issues in the United States. (In other countries, technology-driven censorship might loom larger yet.) My views on privacy and surveillance have long been:

  • Fixing the legal frameworks around information use is a difficult and necessary job. The tech community should be helping more than it is.
  • Until those legal frameworks are indeed cleaned up, the only responsible alternative is to foot-drag on data collection, on data retention, and on the provision of data to governmental agencies.

Given the recent election of a US president with strong authoritarian tendencies, that foot-dragging is much more important than it was before.

Other important areas of technology/policy overlap include:

  • The new head of the Federal Communications Commission is hostile to network neutrality. (Perhaps my compromise proposal for partial, market-based network neutrality should get another look some day.)
  • There’s a small silver lining in Trump’s attacks on free trade; the now-abandoned (at least by the US) Trans-Pacific Partnership had gone too far on “intellectual property” rights.
  • I’m a skeptic about software patents.
  • Government technology procurement processes have long been broken.
  • “Sharing economy” companies such as Uber and Airbnb face a ton of challenges in politics and regulation, often on a very local basis.

And just over the past few days, the technology industry has united in opposing the Trump/Bannon restrictions on valuable foreign visitors.

Tech in the wider world

Technology generally has a huge impact on the world. One political/economic way of viewing that is:

  • For a couple of centuries, technological advancement has:
    • Destroyed certain jobs.
    • Replaced them directly with a smaller number of better jobs.
    • Increased overall wealth, which hopefully leads to more, better jobs in total.
  • Over a similar period, improvements in transportation technology have moved work opportunities from richer countries to poorer areas (countries or colonies as the case may be). This started in farming and extraction, later expanded to manufacturing, and now includes “knowledge workers” as well.
  • Both of these trends are very strong in the current computer/internet era.
  • Many working- and middle-class people in richer countries now feel that these trends are leaving them worse off.
    • To some extent, they’re confusing correlation and causality. (The post-WW2 economic boom would have slowed no matter what.)
    • To some extent, they’re ignoring the benefits of technology in their day to day lives. (I groan when people get on the internet to proclaim that technology is something bad.)
    • To some extent, however, they are correct.

Further, technology is affecting how people relate to each other, in multiple ways.

  • This is obviously the case with respect to cell phones and social media.
  • Also, changes to the nature of work naturally lead to changes in the communities where the workers live.

For those of us with hermit-like tendencies or niche interests, that may all be a net positive. But others view these changes less favorably.

Summing up: Technology induces societal changes of such magnitudes as to naturally cause (negative) political reactions.

And in case you thought I was exaggerating the political threat to the tech industry …

… please consider the following quotes from Trump’s most powerful advisor, Steve Bannon:

The “progressive plutocrats in Silicon Valley,” Bannon said, want unlimited ability to go around the world and bring people back to the United States. “Engineering schools,” Bannon said, “are all full of people from South Asia, and East Asia. . . . They’ve come in here to take these jobs.” …

“Don’t we have a problem with legal immigration?” asked Bannon repeatedly.

“Twenty percent of this country is immigrants. Is that not the beating heart of this problem?”

Related links

I plan to keep updating the list of links at the bottom of my post Politics and policy in the age of Trump.

Categories: Other

Politics and policy in the age of Trump

DBMS2 - Wed, 2017-02-01 23:28

The United States presidency was recently assumed by an Orwellian lunatic.* Sadly, this is not an exaggeration. The dangers — both of authoritarianism and of general mis-governance — are massive. Everybody needs in some way to respond.

*”Orwellian lunatic” is by no means an oxymoron. Indeed, many of the most successful tyrants in modern history have been delusional; notable examples include Hitler, Stalin, Mao and, more recently, Erdogan. (By way of contrast, I view most other Soviet/Russian leaders and most jumped-up-colonel coup leaders as having been basically sane.)

There are many candidates for what to focus on, including:

  • Technology-specific issues — e.g. privacy/surveillance, network neutrality, etc.
  • Issues in which technology plays a large role — e.g. economic changes that affect many people’s employment possibilities.
  • Subjects that may not be tech-specific, but are certainly of great importance. The list of candidates here is almost endless, such as health care, denigration of women, maltreatment of immigrants, or the possible breakdown of the whole international order.

But please don’t just go on with your life and leave the politics to others. Those “others” you’d like to rely on haven’t been doing a very good job.

What I’ve chosen to do personally includes:

  • Get and stay current in my own knowledge. That’s of course a prerequisite for everything else.
  • Raise consciousness among my traditional audience. This post is an example. :)
  • Educate my traditional audience. Some of you are American, well-versed in history and traditional civics. Some of you are American, but not so well-versed. Some of you are from a broad variety of other countries. The sweet spot of my target is the smart, rational, not-so-well-versed Americans. But I hope others are interested as well.
  • Prepare for such time as nuanced policy analysis is again appropriate. In the past, I’ve tried to make thoughtful, balanced, compromise suggestions for handling thorny issues such as privacy/surveillance or network neutrality. In this time of crisis, people don’t care, and I don’t blame them at all. But hopefully this ill wind will pass, and serious policy-making will restart. When it does, we should be ready for it.
  • Support my family in whatever they choose to do. It’s a small family, but it includes some stars, more articulate and/or politically experienced than I am.

Your choices will surely differ (and later on I will offer suggestions as to what those choices might be). But if you take only one thing from this post and its hopefully many sequels, please take this: Ignoring politics is no longer a rational choice.

Related links

This is my first politics/policy-related post since the start of the Trump (or Trump/Bannon) Administration. I’ll keep a running guide to others here, and in the comments below.

  • The technology industry in particular is now up to its neck in politics. I gave quite a few examples to show why for tech folks there’s no escaping politics now.
  • Some former congressional staffers put out a great guide to influencing your legislators. It’s focused on social justice and anti-discrimination kinds of issues, but can probably be applied more broadly, e.g. to Senator Feinstein’s (D-Cal) involvement in overseeing the intelligence community.
Categories: Other

Introduction to Crate.io and CrateDB

DBMS2 - Sat, 2016-12-17 23:27

Crate.io and CrateDB basics include:

  • Crate.io makes CrateDB.
  • CrateDB is a quasi-RDBMS designed to receive sensor data and similar IoT (Internet of Things) inputs.
  • CrateDB’s creators were perhaps a little slow to realize that the “R” part was needed, but are playing catch-up in that regard.
  • Crate.io is an outfit founded by Austrian guys, headquartered in Berlin, that is turning into a San Francisco company.
  • Crate.io says it has 22 employees and 5 paying customers.
  • Crate.io cites bigger numbers than that for confirmed production users, clearly active clusters, and overall product downloads.

In essence, CrateDB is an open source and less mature alternative to MemSQL. The opportunity for MemSQL and CrateDB alike exists in part because analytic RDBMS vendors didn’t close it off.

CrateDB’s not-just-relational story starts:

  • A column can contain ordinary values (of usual-suspect datatypes) or “objects”, …
  • … where “objects” presumably are the kind of nested/hierarchical structures that are common in the NoSQL/internet-backend world, …
  • … except when they’re just BLOBs (Binary Large OBjects).
  • There’s a way to manually define “strict schemas” on the structured objects, and a syntax for navigating their structure in WHERE clauses.
  • There’s also a way to automagically infer “dynamic schemas”, but it’s simplistic enough to be more suitable for development/prototyping than for serious production.

Crate gave an example of data from >800 kinds of sensors being stored together in a single table. This leads to significant complexity in the WHERE clauses. But querying the same data in a relational schema would be at least as complicated, and probably worse.
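To make that more concrete, here is an illustrative sketch only: the table and column names are invented, and the exact syntax may vary by CrateDB version. It assumes CrateDB’s HTTP endpoint on the default port (4200), which accepts SQL statements wrapped in JSON, and uses bracket-style subscripts to stand in for the WHERE-clause object navigation described above:

curl -sS -H "Content-Type: application/json" -X POST http://localhost:4200/_sql \
  -d "{\"stmt\": \"SELECT sensor_type, payload['temperature'] FROM readings WHERE payload['temperature'] > 30 LIMIT 10\"}"

Here payload plays the role of an object column of the kind just described, with the subscript reaching into its structure from the WHERE clause.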

One key to understanding Crate’s architectural choices is to note that they’re willing to have different latency/consistency standards for:

  • Writes and single-row look-ups.
  • Aggregates and joins.

And so it makes sense that:

  • Data is banged into CrateDB in a NoSQL-ish kind of way as it arrives, with RYW consistency.
  • The indexes needed for SQL functionality are updated in microbatches as soon as possible thereafter. (Think 100 milliseconds as a base case.) Crate.io characterizes the consistency for this part as “eventual”.

CrateDB will never have real multi-statement transactions, but it has simpler levels of isolation that may be called “transactions” in some marketing contexts.

CrateDB technical highlights include:

  • CrateDB records are stored as JSON documents. (Actually, I didn’t ask whether this was true JSON or rather something “JSON-like”.)
    • In the purely relational case, the documents may be regarded as glorified text strings.
    • I got the impression that BLOB storage was somewhat separate from the rest.
  • CrateDB’s sharding story starts with consistent hashing.
    • Shards are physical-only. CrateDB lacks the elasticity-friendly feature of there being many logical shards for each physical shard.
    • However, you can change your shard count, and any future inserts will go into the new set of shards.
  • In line with its two consistency models, CrateDB also has two indexing strategies.
    • Single-row/primary-key lookups have a “forward lookup” index, whatever that is.
    • Tables also have a columnar index.
      • More complex queries and aggregations are commonly done straight against the columnar index, rather than the underlying data.
      • CrateDB’s principal columnar indexing strategy sounds a lot like inverted-list, which in turn is a lot like standard text indexing.
      • Specific datatypes — e.g. geospatial — can be indexed in different ways.
    • The columnar index is shard-specific, and located at the same node as the shard.
    • At least the hotter parts of the columnar index will commonly reside in memory. (I didn’t ask whether this was via straightforward caching or some more careful strategy.)
  • While I didn’t ask about CrateDB’s replication model in detail, I gathered that:
    • Data is written synchronously to all nodes. (That’s sort of implicit in RYW consistency anyway.)
    • Common replication factors are either 1 or 3, depending on considerations such as the value of the data. But as is usual, some tables can be replicated across all nodes.
    • Data can be read from all replicas, for obvious reasons of performance.
  • Where relevant — e.g. the wire protocol or various SQL syntax specifics — CrateDB tends to emulate Postgres. (A quick sketch of what that looks like in practice follows this list.)
  • The CrateDB stack includes Elasticsearch and Lucene, both of which make sense in connection with Crate’s text/document orientation.
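As a minimal sketch of the Postgres emulation mentioned above — assuming a locally running cluster with default settings; the port and the crate user may well differ in a given deployment — a stock Postgres client can often talk to CrateDB directly:

psql -h localhost -p 5432 -U crate -c "SELECT name FROM sys.cluster"

That compatibility with ordinary Postgres clients and drivers is much of the point of emulating the wire protocol in the first place.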

Crate.io is proud of its distributed/parallel story.

  • Any CrateDB node can plan a query. Necessary metadata for that is replicated across the cluster.
  • Execution starts on a shard-by-shard basis. Data is sorted at each shard before being sent onward.
  • Crate.io encourages you to run Spark and CrateDB on the same nodes.
    • This is supported by parallel Spark-CrateDB integration of the obvious kind.
    • Crate.io notes a happy synergy to this plan, in that Spark stresses CPU while CrateDB is commonly I/O-bound.

The CrateDB-Spark integration was the only support I could find for various marketing claims about combining analytics with data management.

Given how small and young Crate.io is, there are of course many missing features in CrateDB. In particular:

  • A query can only reshuffle data once. Hence, CrateDB isn’t currently well-designed for queries that join more than 2 tables together.
  • The only join strategy currently implemented is nested loop. Others are in the future.
  • CrateDB has most of ANSI SQL 92, but little or nothing specific to SQL 99. In particular, SQL windowing is under development.
  • Geo-distribution is still under development (even though most CrateDB data isn’t actually about people).
  • I imagine CrateDB administrative tools are still rather primitive.

In any case, creating a robust DBMS is an expensive and time-consuming process. Crate has a long road ahead of it.

Categories: Other

Command Line and Vim Tips from a Java Programmer

I’m always interested in learning more about useful development tools. In college, most programmers get an intro to the Linux command line environment, but I wanted to share some commands I use daily that I’ve learned since graduation.

Being comfortable on the command line is a great skill to have when a customer is looking over your shoulder on a Webex. They could be watching a software demo or deployment to their environment. It can also be useful when learning a new code base or working with a product with a large, unfamiliar directory structure with lots of logs.

If you’re on Windows, you can use Cygwin to get a Unix-like CLI to make these commands available.

Useful Linux commands Find

The command find helps you find files by recursively searching subdirectories. Here are some examples:

find .
  Prints all files and directories under the current directory.

find . -name '*.log'
  Prints all files and directories that end in “.log”.

find /tmp -type f -name '*.log'
  Prints only files in the directory “/tmp” that end in “.log”.

find . -type d
  Prints only directories.

find . -maxdepth 2
  Prints all files and directories under the current directory, and subdirectories (but not sub-subdirectories).

find . -type f -exec ls -la {} \;
  The -exec flag runs a command against each file instead of printing the name. In this example, it will run ls -la filename on each file under the current directory. The curly braces take the place of the filename.

Grep

The command grep lets you search text for lines that match a specific string. It can be helpful to add your initials to debug statements in your code and then grep for them to find them in the logs.
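For example (a quick illustrative sketch; “JDB” stands in for your initials and WC_Spaces.out for whichever log you are watching), you might tag a debug statement with your initials and then pull those lines back out of the log:

# in the Java code: System.out.println("JDB: request payload = " + payload);
grep "JDB:" WC_Spaces.out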

grep foo filename
  Prints each line in the file “filename” that matches the string “foo”.

grep 'foo\|bar' filename
  Grep supports regular expressions, so this prints each line in the file that matches “foo” or “bar”.

grep -i foo filename
  Add -i for case insensitive matching.

grep foo *
  Use the shell wildcard, an asterisk, to search all files in the current directory for the string “foo”.

grep -r foo *
  Recursively search all files and directories in the current directory for a string.

grep -rnH foo filename
  Add -n to print line numbers and -H to print the filename on each line.

find . -type f -name '*.log' -exec grep -nH foo {} \;
  Combining find and grep can let you easily search each file that matches a certain name for a string. This will print each line that matches “foo” along with the file name and line number in each file that ends in “.log” under the current directory.

ps -ef | grep processName
  The output of any command can be piped to grep, and the lines of STDOUT that match the expression will be printed. For example, you could use this to find the pid of a process with a known name.

cat file.txt | grep -v foo
  You can also use -v to print all lines that don’t match an expression.

Ln

The command ln lets you create links. I generally use this to create links in my home directory to quickly cd into long directory paths.

ln -s /some/really/long/path foo
  The -s is for symbolic, and the long path is the target. The output of ls -la in this case would show foo -> /some/really/long/path.

Bashrc

The Bashrc is a shell script that gets executed whenever Bash is started in an interactive terminal. It is located in your home directory,

~/.bashrc
 . It provides a place to edit your $PATH, $PS1, or add aliases and functions to simplify commonly used tasks.
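Beyond aliases, here is a small sketch of the kinds of things a .bashrc might set; the path, prompt string, and helper function below are placeholders rather than anything specific to this environment:

export PATH=$PATH:$HOME/bin          # add a personal scripts directory to the PATH
export PS1='[\u@\h \W]\$ '           # prompt showing user, host, and current directory
logerrors() {                        # small helper: copy a log's ERROR lines into a side file
    grep "ERROR" "$1" > "$1.errors"
}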

Aliases are a way you can define your own command line commands. Here are a few useful aliases I’ve added to my .bashrc that have saved a lot of keystrokes on a server where I’ve installed Oracle WebCenter:

WC_DOMAIN=/u01/oracle/fmw/user_projects/domains/wc_domain
alias assets="cd /var/www/html"
alias portalLogs="cd $WC_DOMAIN/servers/WC_Spaces/logs"
alias domain="cd $WC_DOMAIN"
alias components="cd $WC_DOMAIN/ucm/cs/custom"
alias rpl="portalLogs; vim -R WC_Spaces.out"

After making changes to your .bashrc, you can load them with source ~/.bashrc. Now I can type rpl, short for Read Portal Logs, from anywhere to quickly jump into the WebCenter portal log file.

alias grep="grep --color"

This grep alias adds the --color option to all of my grep commands. All of the above grep commands still work, but now all of the matches will be highlighted.

Vim

Knowing Vim key bindings can be convenient and efficient if you’re already working on the command line. Vim has many built-in shortcuts to make editing files quick and easy.

Run vim filename.txt to open a file in Vim. Vim starts in Normal Mode, where most characters have a special meaning, and typing a colon, :, lets you run Vim commands. For example, typing Shift-G will jump to the end of the file, and typing :q while in Normal Mode will quit Vim. Here is a list of useful commands:

:q
  Quits Vim

:w
  Write the file (save)

:wq
  Write and quit

:q!
  Quit and ignore warnings that you didn’t write the file

:wq!
  Write and quit, ignoring permission warnings

i
  Enter Insert Mode where you can edit the file like a normal text editor

a
  Enter Insert Mode and place the cursor after the current character

o
  Insert a blank line after the current line and enter Insert Mode

[escape]
  The escape button exits insert mode

:150
  Jump to line 150

shift-G
  Jump to the last line

gg
  Jump to the first line

/foo
  Search for the next occurrence of “foo”. Regex patterns work in the search.

?foo
  Search for the previous occurrence of “foo”

n
  Go to the next match

N
  Go to the previous match

*
  Search for the next occurrence of the word under the cursor

#
  Search for the previous occurrence of the word under the cursor

w
  Jump to the next word

b
  Jump to the previous word

``
  Jump back to the position you were at before the last jump

dw
  Delete the word starting at the cursor

cw
  Delete the word starting at the cursor and enter insert mode

c$
  Delete everything from the cursor to the end of the line and enter insert mode

dd
  Delete the current line

D
  Delete everything from the cursor to the end of the line

u
  Undo the last action

ctrl-r
  Redo the last action

d[up]
  Delete the current line and the line above it. “[up]” is for the up arrow.

d[down]
  Delete the current line and the line below it

d3[down]
  Delete the current line and the three lines below it

r[any character]
  Replace the character under the cursor with another character

~
  Toggle the case (upper or lower) of the character under the cursor

v
  Enter Visual Mode. Use the arrow keys to highlight text.

shift-V
  Enter Visual Mode and highlight whole lines at a time.

ctrl-v
  Enter Visual Mode but highlight blocks of characters.

=
  While in Visual Mode, = will auto format highlighted text.

c
  While in Visual Mode, c will delete the highlighted text and enter Insert Mode.

y
  While in Visual Mode, y will yank (copy) the highlighted text.

p
  In Normal Mode, p will paste the text in the buffer (that’s been yanked or cut).

yw
  Yank the text from the cursor to the end of the current word.

:sort
  Highlight lines in Visual Mode, then use this command to sort them alphabetically.

:s/foo/bar/g
  Highlight lines in Visual Mode, then use search and replace to replace all instances of “foo” with “bar”.

:s/^/#/
  Highlight lines in Visual Mode, then add # at the start of each line. This is useful to comment out blocks of code.

:s/$/;/
  Highlight lines in Visual Mode, then add a semicolon at the end of each line.

:set paste
  This will turn off auto indenting. Use it before pasting into Vim from outside the terminal (you’ll want to be in insert mode before you paste).

:set nopaste
  Make auto indenting return to normal.

:set nu
  Turn on line numbers.

:set nonu
  Turn off line numbers.

:r!pwd
  Read the output of a command into Vim. In this example, we’ll read in the current directory.

:r!sed -n 5,10p /path/to/file
  Read lines 5 through 10 from another file in Vim. This can be a good way to copy and paste between files in the terminal.

:[up|down]
  Type a colon and then use the arrow keys to browse through your command history. If you type letters after the colon, it will only cycle through commands that start with those letters (e.g., typing :se and then up would quickly find “:set paste”).

Vimrc

The Vimrc is a configuration file that Vim loads whenever it starts up, similar to the Bashrc. It is in your home directory.

Here is a basic Vimrc I’d recommend for getting started if you don’t have one already. Run vim ~/.vimrc and paste in the following:

set backspace=2         " backspace in insert mode works like normal editor
syntax on               " syntax highlighting
filetype indent on      " activates indenting for files
set autoindent          " auto indenting
set number              " line numbers
colorscheme desert      " use the desert color scheme
set listchars=tab:>-,trail:.,extends:>,precedes:<
set list                " show whitespace using the characters defined above
set ic                  " Ignore case by default in searches
set statusline+=%F      " Show the full path to the file
set laststatus=2        " Make the status line always visible

 

Perl

Perl comes installed by default on Linux, so it is worth mentioning its extensive command line capabilities. If you have ever tried to grep for a string that matches a line in a minified JavaScript file, you can probably see the benefit of being able to filter out lines longer than 500 characters.

grep -r foo * | perl -nle'print if 500 > length'

Conclusion

I love learning the tools that are available in my development environment, and it is exciting to see how they can help customers as well.

Recently, I was working with a customer and we were running into SSL issues. Java processes can be run with the option -Djavax.net.ssl.trustStore=/path/to/trustStore.jks to specify which keystore to use for SSL certificates. It was really easy to run ps -ef | grep trustStore to quickly identify which keystore we needed to import certificates into.

I’ve also been able to use various find and grep commands to search through unfamiliar directories after exporting metadata from Oracle’s MDS Repository.

Even if you aren’t on the command line, I’d encourage everyone to learn something new about their development environment. Feel free to share your favorite Vim and command line tips in the comments!

Further reading

http://www.vim.org/docs.php

https://www.gnu.org/software/bash/manual/bash.html

http://perldoc.perl.org/perlrun.html

The post Command Line and Vim Tips from a Java Programmer appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other

Webinar Recording: Ryan Companies Leverages Fishbowl’s ControlCenter for Oracle WebCenter to Enhance Document Control Leading to Improved Knowledge Management

On Thursday, December 8th, Fishbowl had the privilege of presenting a webinar with Mike Ernst, VP of Construction Operations at Ryan Companies, regarding their use case for Fishbowl’s ControlCenter product for controlled document management. Mike was joined by Fishbowl’s ControlCenter product manager, Kim Negaard, who provided an overview of how the solution was implemented and how it is being used at Ryan.

Ryan Companies had been using Oracle WebCenter for many years, but they were looking for some additional document management functionality and a more intuitive interface to help improve knowledge management at the company. Their main initiative was to make it easier for users to access and manage their corporate knowledge documents (policies and procedures), manuals (safety), and real estate documents (leases) throughout each document’s life cycle.

Mike provided some interesting stats that factored into their decision to implement ControlCenter for WebCenter:

  • $16k – the average cost of “reinventing” procedures per project (e.g., checklists and templates)
  • $25k – the average cost of estimating labor rates incorrectly
  • 3x salary – the cost to onboard a replacement when an employee leaves the company

To hear more about how Ryan found knowledge management success with ControlCenter for WebCenter, watch the webinar recording: https://youtu.be/_NNFRV1LPaY

The post Webinar Recording: Ryan Companies Leverages Fishbowl’s ControlCenter for Oracle WebCenter to Enhance Document Control Leading to Improved Knowledge Management appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other
