DBA Blogs
Opposite of PL/SQL package init method
Pipelined function in APEX with poor performance
Oracle Dev Gym Databases for Developers certificate URL
Cannot convert working PL/SQL function with INTERVAL parameters to equivalent SQL MACRO (neither TABLE nor SCALAR)
Set operators like UNION, MINUS do not sort data anymore in Oracle 21c and 23ai
GitHub Codespaces: A Fast-Track to Development with Minimal Setup
As a developer, I often spend a considerable amount of time setting up development environments and project scaffolding before I even write a single line of code. Configuring dependencies, installing tools, and making sure everything runs smoothly across different machines can be tedious. If you find this prep work time-consuming and constraining, then...
Enter GitHub Codespaces
GitHub Codespaces is a cloud-based development environment that lets you start coding instantly in your browser, without the hassle of setting up a local machine!
Whether you’re working on an open-source project, collaborating with a team, or quickly prototyping an idea, Codespaces provides a streamlined workflow with minimal scaffolding.
Why GitHub Codespaces?
Instant Development Environments
With a few clicks, you get a fully configured development environment in the cloud. No need to install dependencies manually—just launch a Codespace, and it’s ready to go.
Pre-configured for Your Project
Codespaces can use Dev Containers (.devcontainer.json) to define dependencies, extensions, and runtime settings. This means every team member gets an identical setup, reducing "works on my machine" issues.
Seamless GitHub Integration
Since Codespaces runs directly on GitHub, pushing, pulling, and collaborating on repositories is effortless. No need to clone and configure repositories locally.
Access from Anywhere
You can code from a browser, VS Code on the desktop, or even an iPad, making it an excellent option for developers who switch devices frequently.
Powerful Compute Resources
Codespaces provides scalable cloud infrastructure, so even resource-intensive projects can run smoothly without overloading your local machine.
Imagine you’re starting a new Streamlit project for Streamlit Community Cloud. Normally, you’d:
- Install Streamlit and other packages
- Set up a virtual environment
- Configure dependencies
- Ensure all team members have the same setup
With GitHub Codespaces, you can define everything in a requirements.txt and a .devcontainer.json file and launch your environment in seconds. No more worrying about mismatched Python versions or missing dependencies—just open a browser and start coding.
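As a rough illustration (this is a minimal sketch, not the exact file from my project; the Python image tag, the forwarded port and the extension list are assumptions), a .devcontainer/devcontainer.json for a Streamlit app could look like this, with the packages themselves listed in requirements.txt:

{
  "name": "streamlit-weather-app",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "postCreateCommand": "pip install -r requirements.txt",
  "forwardPorts": [8501],
  "customizations": {
    "vscode": {
      "extensions": ["ms-python.python"]
    }
  }
}

With a file like this in the repository, every Codespace launched from it installs the same dependencies and forwards Streamlit's default port automatically.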
See below how I obtained this coding environment to build a Weather Streamlit app quickly and for free using Streamlit Community Cloud.
I think GitHub Codespaces is a game-changer for modern development. It eliminates the friction of setting up environments, making collaboration effortless and speeding up development cycles. If you haven’t tried it yet, spin up a Codespace for your next project—you might never go back to traditional setups on your laptop.
Have you used GitHub Codespaces? Let me know your experience in the comments!
AI Web Scraping for Free with DeepSeek R1 Locally with Crawl4AI and Ollama
This video shows how easy it is to do AI web scraping locally, for free, with DeepSeek R1 using Ollama and Crawl4AI.
Code:
What does the data structure of a compound index in Oracle look like?
Capture user login logout in oracle database
Use UTL_SMTP signed email
Domains in 23ai Schema Development
I had heard of Data Use Case Domains in 23ai. However, the New Features and Database Concepts documentation didn't provide enough examples for me to build on.
However, this blog post by Ulrike Schwinn (which was shared by @thatjeffsmith on X) helped me explore domains.
In this demo, I am using the pre-seeded Domains. However, you can look at the example posted by Ulrike Schwinn, or read the Database Concepts documentation, to help build your own custom Domains.
A Data Use Case Domain is like defining a custom data type that permits only valid values. The Domain name can also be self-identifying (just as "DATE" or "NUMBER" identifies the type of data being stored).
Here is my demonstration (I also use the Annotations feature -- the Data Use Case Domains documentation links above also lead to this feature)
SQL> set pages600 linesize 132
SQL> col contact_person format a32
SQL> col contact_email format a24
SQL>
SQL> drop table forex_rates_contacts;

Table dropped.

SQL>
SQL>
SQL> create table forex_rates_contacts
  2  (
  3  country_iso_code varchar2(3) domain country_code_d,  -- preseeded SYS domain
  4  currency_code varchar2(3) domain currency_code_d,    -- preseeded SYS domain
  5  contact_person varchar2(128),
  6  contact_email varchar2(4000) domain email_d          -- preseed SYS domain
  7  )
  8  annotations (display 'Forex Contact Persons')
  9  /

Table created.

SQL>
SQL> desc forex_rates_contacts
 Name                                          Null?    Type
 --------------------------------------------- -------- ---------------------------------
 COUNTRY_ISO_CODE                                        VARCHAR2(3) SYS.COUNTRY_CODE_D
 CURRENCY_CODE                                           VARCHAR2(3) SYS.CURRENCY_CODE_D
 CONTACT_PERSON                                          VARCHAR2(128)
 CONTACT_EMAIL                                           VARCHAR2(4000) SYS.EMAIL_D

SQL>
SQL>
SQL> set long 1000
SQL> set longc 1000
SQL> set serveroutput on
SQL>
SQL> rem FROM clause is no longer required in 23ai
SQL> select dbms_metadata.get_ddl('TABLE','FOREX_RATES_CONTACTS','HEMANT');

DBMS_METADATA.GET_DDL('TABLE','FOREX_RATES_CONTACTS','HEMANT')
------------------------------------------------------------------------------------------------------------------------------------

  CREATE TABLE "HEMANT"."FOREX_RATES_CONTACTS"
   (    "COUNTRY_ISO_CODE" VARCHAR2(3) DOMAIN "SYS"."COUNTRY_CODE_D",
        "CURRENCY_CODE" VARCHAR2(3) DOMAIN "SYS"."CURRENCY_CODE_D",
        "CONTACT_PERSON" VARCHAR2(128),
        "CONTACT_EMAIL" VARCHAR2(4000) DOMAIN "SYS"."EMAIL_D"
   )  SEGMENT CREATION DEFERRED
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  TABLESPACE "USERS"
  ANNOTATIONS("DISPLAY" 'Forex Contact Persons')

SQL>
SQL>
SQL>
SQL> rem MULTI-ROW Insert
SQL> insert into forex_rates_contacts
  2  values
  3  ('US','USD','Mr Unknown','unknown@nowhere.gov'),
  4  ('IN','INR','Someone at RBI','someone@rbi.gov.in')
  5  /

2 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select * from forex_rates_contacts
  2  order by country_iso_code
  3  /

COU CUR CONTACT_PERSON                   CONTACT_EMAIL
--- --- -------------------------------- ------------------------
IN  INR Someone at RBI                   someone@rbi.gov.in
US  USD Mr Unknown                       unknown@nowhere.gov

SQL>
SQL> -- Note that the country_code_d and currency_code_d do not check validity against really ISO codes
SQL> -- thus, it does not disallow "ZZ" and "ZZZ"
SQL> insert into forex_rates_contacts
  2  values
  3  ('ZZ','ZZZ','Mr Unknown','unknown@nowhere.zz')
  4  /

1 row created.

SQL>
SQL> select * from forex_rates_contacts
  2  order by country_iso_code
  3  /

COU CUR CONTACT_PERSON                   CONTACT_EMAIL
--- --- -------------------------------- ------------------------
IN  INR Someone at RBI                   someone@rbi.gov.in
US  USD Mr Unknown                       unknown@nowhere.gov
ZZ  ZZZ Mr Unknown                       unknown@nowhere.zz

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> -- But the rules for email validation are encoded
SQL> insert into forex_rates_contacts
  2  values
  3  ('UK','GBP','Mr Someone','someone@x')
  4  /
insert into forex_rates_contacts
            *
ERROR at line 1:
ORA-11534: check constraint (HEMANT.SYS_C0013464) involving column CONTACT_EMAIL
due to domain constraint SYS.SYS_DOMAIN_C0030 of domain SYS.EMAIL_D violated
Help: https://docs.oracle.com/error-help/db/ora-11534/

SQL>
SQL> select * from forex_rates_contacts
  2  order by country_iso_code
  3  /

COU CUR CONTACT_PERSON                   CONTACT_EMAIL
--- --- -------------------------------- ------------------------
IN  INR Someone at RBI                   someone@rbi.gov.in
US  USD Mr Unknown                       unknown@nowhere.gov
ZZ  ZZZ Mr Unknown                       unknown@nowhere.zz

SQL>
SQL> spool off
I haven't added my own custom Domains here but used the pre-seeded domains for Country, Currency and Email; see "10.1.12 Built-In Use Case Domains" in the documentation. A sketch of what a custom Domain might look like follows below.
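If you want to go beyond the pre-seeded domains, here is a minimal sketch (not part of the demonstration above) of a custom Domain; the domain name, the regular expression and the table name are purely illustrative, and you need the CREATE DOMAIN privilege:

-- a custom single-column domain: the domain name acts as the placeholder in the CHECK condition
create domain phone_number_d as varchar2(15)
  constraint phone_number_chk check (regexp_like(phone_number_d, '^\+?[0-9]{7,14}$'));

-- a table using both a pre-seeded SYS domain and the custom domain
create table forex_contacts_v2
( country_iso_code varchar2(3)  domain country_code_d,   -- preseeded SYS domain
  contact_phone    varchar2(15) domain phone_number_d    -- custom domain
);

The column inherits the domain's check constraint, so an insert with an invalid phone number fails in the same way the EMAIL_D example above fails.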
How-To Speedup Initial Load from Oracle to SQL Server with Oracle GoldenGate 23ai
Have you ever stared at your computer screen, watching the hours tick by as you wait for a massive data transfer to complete? If you're using Oracle GoldenGate to move data from an Oracle database to SQL Server, you might know the feeling all too well—especially when you're dealing with something like 200 million rows that take way too long. Let’s dive into how Oracle GoldenGate works and some simple tricks to speed up that initial load time.
What is Oracle GoldenGate?
First things first—what is Oracle GoldenGate? In simple terms, it's like a super-efficient courier service that gets your data from one place to another. It specializes in real-time data integration and replication, making sure every bit of information moves swiftly and accurately from your Oracle database to whatever destination you've chosen, like SQL Server, in this case. It's especially handy for businesses that need their data synchronized quickly and continuously.
Making the Initial Load Faster
Now, onto the good part: how can you speed up the initial load that seems to take forever? The good news is that there are several strategies you can use to make the process more efficient.
Using the BATCHSQL parameter is one approach. It allows you to bundle multiple SQL insert statements together, reducing the overall time spent on these operations. Creating a unique index on the target table in SQL Server is another useful tip. It helps the database manage the incoming data more efficiently, cutting down on the time it takes to sort and place the records.
Splitting your data into smaller batches is also a great way to speed things up. Instead of overwhelming your system with 200 million rows all at once, use the Range function or Column Filters to divide the data into more manageable chunks. This approach is especially useful for large tables, as it allows you to tackle the data in sections rather than trying to handle it all at once.
Finally, if you're working with multiple database instances that are all on the same version, consider creating multiple extracts that can connect to these different instances. This allows you to distribute the workload across various sources, which can significantly speed up the entire process.
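As a rough illustration of the BATCHSQL and range-splitting ideas above, here is a hedged sketch of what one of several parallel Replicat parameter files might look like. The process, schema, table, and column names are placeholders, and the connection settings for your SQL Server target will differ:

REPLICAT rload1
-- target connection settings (DSN / properties file) go here
BATCHSQL
-- this Replicat handles the first of four key ranges;
-- rload2 through rload4 would use @RANGE(2,4,ID), @RANGE(3,4,ID), @RANGE(4,4,ID)
MAP src_schema.big_table, TARGET dbo.big_table, FILTER (@RANGE(1, 4, ID));

Each Replicat then applies only its own slice of the rows, so the processes can run concurrently against the same target table.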
By implementing these strategies, you can make your initial load process faster, smoother, and less stressful. Whether you're dealing with 200 million rows or even more, these tips can help you get the job done more efficiently.
How-To Fix Poco::IOException Error During Goldengate 23ai Upgrade
I recently upgraded Oracle GoldenGate 23ai. In this post, I'll share my experience, provide a step-by-step guide on how to upgrade Oracle GoldenGate 23ai using the GUI, and describe a weird error I received during this upgrade, which is as follows:
Error:
If you want the TL;DR: I had to apply patch 27788241 to get this resolved. For the details, keep reading.
To start the upgrade, I downloaded the latest Oracle GoldenGate 23ai software from the Oracle Technology Network or eDelivery. Then, I moved the software to a staging folder and unzipped it.
For Linux users, the commands are:
Next, I uploaded the Oracle GoldenGate 23ai software to a staging location on the server where the previous release of Oracle GoldenGate existed.
Upgrading the Service Manager
After installing the latest Oracle GoldenGate 23ai version, I upgraded the Service Manager. I logged into the Service Manager using the URL: https://hostname:servicemanager_port.
From the Service Manager Overview page, I selected the ServiceManager link in the Deployments section. Then, I clicked the pencil icon next to the Deployment Detail section to open the dialog box for editing the GoldenGate home path.
I updated the GoldenGate Home path with the full path to the new Oracle GoldenGate 23ai home and clicked Apply. Finally, I restarted the Service Manager using the Action dropdown.
Upgrading the Deployment
To upgrade the deployment, I stopped all Extract and Replicat processes. I checked for open transactions and Bounded Recovery.
Then, I updated the deployment with the location of the new Oracle GoldenGate 23ai Home directory. I edited the deployment details and updated the Oracle GoldenGate 23ai Home path.
Resolving the Error
During the upgrade, I got this error:
After researching, I found that applying patch 27788241 fixed the issue. I applied the patch, and the upgrade completed successfully.
Hope this helps.
Plan substitution for newly generated sql_id
DeepSeek R1 Coding Examples - Easy Tutorial for Beginners and Experts
23ai New Feature : Partition HIGH_VALUE in JSON format
A quick demonstration of the new HIGH_VALUE_JSON column in the USER_TAB_PARTITIONS view in 23ai:
[oracle@localhost Hemant]$ sqlplus hemant/hemant@freepdb1

SQL*Plus: Release 23.0.0.0.0 - for Oracle Cloud and Engineered Systems on Sun Jan 26 10:07:09 2025
Version 23.6.0.24.10

Copyright (c) 1982, 2024, Oracle.  All rights reserved.

Last Successful login time: Sun Jan 26 2025 10:05:18 +00:00

Connected to:
Oracle Database 23ai Free Release 23.0.0.0.0 - Develop, Learn, and Run for Free
Version 23.6.0.24.10

SQL> @demo_part_high_value.sql
SQL> set pages600 linesize 132
SQL> set long 10000
SQL>
SQL> spool demo_part_high_value
SQL>
SQL> -- 23ai has two new columns in the USER_TAB_PARTTIIONS view
SQL> -- HIGH_VALUE_CLOB and HIGH_VALUE_JSON
SQL> --- unlike HIGH_VALUE which is a LONG, these two can be used programmatically
SQL> -- here I show HIGH_VALUE_JSON along with the HIGH_VALUE
SQL>
SQL> set pages600 linesize 132
SQL> set long 10000
SQL> col partition_name format a8 hea 'P_Name'
SQL> col high_value format a56 trunc hea 'High_Value_LONG' trunc
SQL> col high_value_json format a48 hea 'High_Value_JSON'
SQL>
SQL>
SQL> drop table hkc_test_intvl;

Table dropped.

SQL>
SQL> create table hkc_test_intvl
  2  (date_column date,
  3   data_column varchar2(50))
  4  partition by range (date_column)
  5  interval (numtoyminterval(1,'MONTH'))
  6  (partition P_1 values less than (to_date('01-FEB-2024','DD-MON-YYYY')))
  7  /

Table created.

SQL>
SQL>
SQL> insert into hkc_Test_intvl
  2  values (to_date('15-AUG-2024','DD-MON-YYYY'), 'August Row')
  3  /

1 row created.

SQL>
SQL>
SQL> insert into hkc_test_intvl
  2  values (to_date('15-OCT-2024','DD-MON-YYYY'),'October Row')
  3  /

1 row created.

SQL>
SQL> insert into hkc_test_intvl
  2  values (to_date('15-DEC-2024','DD-MON-YYYY'),'December Row')
  3  /

1 row created.

SQL>
SQL> select partition_name, high_value, high_value_json
  2  from user_tab_partitions
  3  where table_name = 'HKC_TEST_INTVL'
  4  /

P_Name   High_Value_LONG                                          High_Value_JSON
-------- -------------------------------------------------------- ------------------------------------------------
P_1      TO_DATE(' 2024-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' {"high_value":"2024-02-01T00:00:00"}
SYS_P447 TO_DATE(' 2024-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' {"high_value":"2024-09-01T00:00:00"}
SYS_P448 TO_DATE(' 2024-11-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' {"high_value":"2024-11-01T00:00:00"}
SYS_P449 TO_DATE(' 2025-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS' {"high_value":"2025-01-01T00:00:00"}

SQL>
SQL>
SQL> spool off
Earlier, HIGH_VALUE was presented only as a LONG, which is awkward to use programmatically.
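For example, here is a minimal sketch (not part of the spool above) of how HIGH_VALUE_JSON can be consumed programmatically, converting the partition boundary into a proper DATE:

select partition_name,
       to_date(json_value(high_value_json, '$.high_value'),
               'YYYY-MM-DD"T"HH24:MI:SS') as high_value_date
from   user_tab_partitions
where  table_name = 'HKC_TEST_INTVL'
order  by 2
/

Doing the same against the LONG HIGH_VALUE column would have required PL/SQL or a conversion trick.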
Get PL/SQL Code coverage data in Function/Procedure level
How does sqlplus in 19c RAC find the spfile?
Matplotlib Oracle Example
Everyone who has heard the old saying “a picture is worth a thousand words” appreciates its simple wisdom. With Oracle databases you have situations where a graph of the output of a SQL query is easier to understand than the standard text output. It’s helpful to have a simple way to graph Oracle data, and Python has widely used libraries that make it easy.
This post describes a Python script that graphs data from an Oracle database using the Matplotlib graphics library. The script uses three widely used Python libraries: cx_Oracle, NumPy, and Matplotlib. This post provides a simple and easily understood example that can be reused whenever someone needs to graph Oracle data. It is written as a straight-line program without any functions or error handling to keep it as short and readable as possible. It demonstrates the pattern of cx_Oracle -> NumPy -> Matplotlib and the use of Matplotlib’s object-oriented approach.
Here is an example graph:

The script graphs database server percent CPU used versus a combination of the day of week and the hour of the day to see if there is any pattern of CPU usage throughout a typical week. This graph has about 6 weeks of hourly AWR snapshots graphed in a scatter plot with CPU percentage on the Y axis and (24 * day of week) + hour of day as the X axis. You could think of the X axis as the hour of the week. This graph might be helpful in performance tuning because it shows whether CPU usage follows a weekly pattern.
Here is the current version of the script: scatter.py.
The script has three main parts which correspond to the three non-internal Python libraries that I use:
- cx_Oracle – Query the CPU data from an Oracle database
- NumPy – Massage query data to get it ready to be graphed
- Matplotlib – Graph the data
These libraries all have lots of great documentation, but Matplotlib’s documentation is confusing at first. At least it was for me. Here are three useful links:
Quick start – This is a great overview. The picture of the “Parts of a Figure” is helpful. I don’t know if earlier versions of Matplotlib had this picture.
Axes – This is a nice list of all the methods of an Axes object. Most of the code in the example script involves calling these methods. I have trouble finding these methods using a Google search, so I bookmarked this link.
Figure – The example script does not call any Figure object methods, but I wanted to document where to find them here. I bookmarked this URL as well as the Axes one because a Matplotlib graph is composed of at least one Figure and Axes object. With the Quick start link and these two lists of methods you have all you need to write Matplotlib scripts.
cx_Oracle
The query for this graph pulls operating system CPU metrics from the DBA_HIST_OSSTAT view and uses them to calculate the percent of the time the CPU is busy. It is made of two subqueries in a with statement and the final main query.
with
myoscpu as
(select
busy_v.SNAP_ID,
busy_v.VALUE BUSY_TIME,
idle_v.VALUE IDLE_TIME
from
DBA_HIST_OSSTAT busy_v,
DBA_HIST_OSSTAT idle_v
where
busy_v.SNAP_ID = idle_v.SNAP_ID AND
busy_v.DBID = idle_v.DBID AND
busy_v.INSTANCE_NUMBER = idle_v.INSTANCE_NUMBER AND
busy_v.STAT_NAME = 'BUSY_TIME' AND
idle_v.STAT_NAME = 'IDLE_TIME'),
The myoscpu subquery pulls the CPU busy and idle times from the view along with the snapshot id. I think these are totals since the database last came up, so you have to take the difference between their values at two different points in time to get the CPU usage for that time.
myoscpudiff as
(select
after.SNAP_ID,
(after.BUSY_TIME - before.BUSY_TIME) BUSY_TIME,
(after.IDLE_TIME - before.IDLE_TIME) IDLE_TIME
from
myoscpu before,
myoscpu after
where before.SNAP_ID + 1 = after.SNAP_ID
order by before.SNAP_ID)
The myoscpudiff subquery gets the change in busy and idle time between two snapshots. It is built on myoscpu. My assumption is that the snapshots are an hour apart which is the case on the databases I work with.
select
to_number(to_char(sn.END_INTERVAL_TIME,'D')) day_of_week,
to_number(to_char(sn.END_INTERVAL_TIME,'HH24')) hour_of_day,
100*BUSY_TIME/(BUSY_TIME+IDLE_TIME) pct_busy
from
myoscpudiff my,
DBA_HIST_SNAPSHOT sn
where
my.SNAP_ID = sn.SNAP_ID
order by my.SNAP_ID
The final query builds on myoscpudiff to give you the day of the week (which ranges from 1 to 7, Sunday to Saturday), the hour of the day (which ranges from 0 to 23), and the CPU percent busy (which ranges from 0 to 100).
import cx_Oracle
...
# run query retrieve all rows
connect_string = username+'/'+password+'@'+database
con = cx_Oracle.connect(connect_string)
cur = con.cursor()
cur.execute(query)
# returned is a list of tuples
# with int and float columns
# day of week,hour of day, and cpu percent
returned = cur.fetchall()
...
cur.close()
con.close()
The cx_Oracle calls are simple database functions. You connect to the database, get a cursor, execute the query and then fetch all the returned rows. Lastly you close the cursor and connection.
print("Data type of returned rows and one row")
print(type(returned))
print(type(returned[0]))
print("Length of list and tuple")
print(len(returned))
print(len(returned[0]))
print("Data types of day of week, hour of day, and cpu percent")
print(type(returned[0][0]))
print(type(returned[0][1]))
print(type(returned[0][2]))
I put in these print statements to show what the data that is returned from fetchall() is like. I want to compare this later to NumPy’s version of the same data. Here is the output:
Data type of returned rows and one row
<class 'list'>
<class 'tuple'>
Length of list and tuple
1024
3
Data types of day of week, hour of day, and cpu percent
<class 'int'>
<class 'int'>
<class 'float'>
The data returned by fetchall() is a regular Python list and each element of that list is a standard Python tuple. The list is 1024 elements long because I have that many snapshots. I have 6 weeks of hourly snapshots. Should be about 6*7*24 = 1008. The tuples have three elements, and they are normal Python number types – int and float. So, cx_Oracle returns database data in standard Python data types – list, tuple, int, float.
So, we are done with cx_Oracle. We pulled in the database metric that we want to graph versus day and hour and now we need to get it ready to put into Matplotlib.
NumPy
NumPy can do efficient manipulation of arrays of data. The main NumPy type, an ndarray, is a multi-dimensional array, and there are a lot of things you can do with your data once it is in an ndarray. You could do the equivalent with Python lists and for loops, but a NumPy ndarray is much faster with large amounts of data.
import numpy as np
...
# change into numpy array and switch columns
# and rows so there are three rows and many columns
# instead of many rows and three columns
dataarray = np.array(returned).transpose()
# dataarray[0] is day of week
# dataarray[1] is hour of day
# dataarray[2] is cpu percent
The function np.array() converts the list of tuples into a ndarray. The function transpose() switches the rows and columns so we now have 3 rows of data that are 1024 columns long whereas before we had 1024 list elements with size 3 tuples.
I added print statements to show the new types and numbers.
print("Shape of numpy array after converting returned data and transposing rows and columns")
print(dataarray.shape)
print("Data type of transposed and converted database data and of the first row of that data")
print(type(dataarray))
print(type(dataarray[0]))
print("Data type of the first element of each of the three transposed rows.")
print(type(dataarray[0][0]))
print(type(dataarray[1][0]))
print(type(dataarray[2][0]))
Here is its output:
Shape of numpy array after converting returned data and transposing rows and columns
(3, 1024)
Data type of transposed and converted database data and of the first row of that data
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
Data type of the first element of each of the three transposed rows.
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
The shape of a ndarray shows the size of each of its dimensions. In this case it is 3 rows 1024 columns as I said. Note that the overall dataarray is a ndarray and any given row is also. So, list and tuple types are replaced by ndarray types. Also, NumPy has its own number types such as numpy.float64 instead of the built in int and float.
Now that our CPU data is in a NumPy array we can easily massage it into the form needed to plot points on our graph.
# do 24 * day of week + hour of day as x axis
xarray = (dataarray[0] * 24) + dataarray[1]
# pull cpu percentage into its own array
yarray = dataarray[2]
My idea for the graph is to combine the day of week and hour of day into the x axis by multiplying the day of week by 24 and adding the hour of the day, which in effect gives the hour of the week from Sunday midnight to Saturday 11 pm. The nice thing about NumPy is that you can multiply the entire row of days of the week by 24 and add the entire row of hours of the day all in one statement. xarray is calculated in one line rather than with a loop, and it is done efficiently.
Here are some print statements and their output:
print("Shape of numpy x and y arrays")
print(xarray.shape)
print(yarray.shape)
Shape of numpy x and y arrays
(1024,)
(1024,)
Now we have two length 1024 ndarrays representing the x and y values of the points that we want to plot.
So, we have used NumPy to get the data that we pulled from our Oracle database using cx_Oracle into a form that is ready to be graphed. Matplotlib works closely with NumPy and NumPy has some nice features for manipulating arrays of numbers.
Matplotlib
Now we get to the main thing I want to talk about, which is Matplotlib. Hopefully this is a clean and straightforward example of its use.
import matplotlib.pyplot as plt
...
# get figure and axes
fig, ax = plt.subplots()
First step is to create a figure and axes. A figure is essentially the entire window, and an axes object is an x and y axis that you can graph on. You can have multiple axes on a figure, but for a simple graph like this, you have one figure and one axes.
# point_size is size of points on the graph
point_size = 5.0
...
# graph the points setting them all to one size
ax.scatter(xarray, yarray, s=point_size)
This actually graphs the points. A scatter plot just puts a circle (or other shape) at the x and y coordinates in the two arrays. I set all points to a certain size and figured out what size circle would look best by trying different values. Note that scatter() is a method of the Axes type object ax.
# add title
ax.set_title(database+" database CPU by day of week and hour of day")
# label the x and y axes
ax.set_xlabel("24 * Day of week (1-7) + Hour of day (0-23)")
ax.set_ylabel("CPU percent used")
More methods on ax. Sets title on top center of graph. Puts labels that describe the x axis and the y axis.
# add vertical red lines for days
for day_of_week in range(8):
    ax.axvline(x=(day_of_week+1)*24, color='red', linestyle='--', linewidth=1.0)
The previous lines are all I really needed to make the graph. But then I thought about making it more readable. As I said before the X axis is basically the hour of the week ranging from 24 to 191. But I thought some red lines marking the beginning and end of each day would make it more readable. This puts 8 lines at locations 24, 48,…,192. I set the linewidth to 1.0 and used the dashes line style to try to keep it from covering up the points. I think axvline means vertical line on Axes object.
# Calculate the y-coordinate for day names
# It should be a fraction of the range between the minimum and maximum Y values
# positioned below the lower bound of the graph.
# The minimum and maximum CPU varies depending on the load on the queried database.
lower_bound = ax.get_ylim()[0]
upper_bound = ax.get_ylim()[1]
yrange = upper_bound - lower_bound
fraction = .025
y_coord = lower_bound - (fraction * yrange)
xloc = 36
for day in ['Sun','Mon','Tue','Wed','Thu','Fri','Sat']:
    ax.text(xloc, y_coord, day, fontsize=8, color='red', ha='center', fontweight='ultralight')
    xloc += 24
I kept messing with the script to try to make it better. I didn’t want to make it too complicated because I wanted to use it as an example in a blog post. But then again, this code shows some of the kinds of details that you can get into. The text() method of ax just puts some text on the graph. I made it red like the dashed lines and tried to make the letters light so they wouldn’t obscure the main parts of the graph. The x coordinates were just the center of the word and essentially the middle of the day. The first day starts at x=24 so 12 hours later or x=36 would be halfway through the day, approximately. I just had a list of the three-character day names and looped through them bumping the x location up by 24 hours for each day.
But the y coordinate was more complicated. I started out just choosing a fixed location for y like -5. For one database this worked fine. Then I tried another database, and it was way off. The reason is that Matplotlib scales the y coordinates based on the graphed data. If your database’s cpu is always around 30% then the range of visible y coordinates will be close to that. If your database’s cpu varies widely from 0% to 100% then Matplotlib will set the scale wide enough so the entire range 0 to 100 is visible. So, to put the text where I wanted it, just below the y axis line, I needed to make it a percentage of the visible y range below the lowest visible value. The get_ylim() method shows the calculated lower and upper bounds of the y axis which were calculated based on the y values of the graphed points. I manually messed with the value for the variable fraction until it looked right on the screen. Then I ran the script with a variety of databases to make sure it looked right on all of them.
# show graph
plt.show()
Lastly you just show the graph. Note that like the subplots() call this is not a method of an axes or figure object but just a matplotlib.pyplot call. Everything else in this example is a call to a method of the ax Axes type object.
Conclusion
This post shows how to graph Oracle database data using Python libraries cx_Oracle, NumPy, and Matplotlib. It first shows how to pull Oracle data into Python’s native data structures like lists and tuples. Then it shows how to convert the data into NumPy’s ndarrays and manipulate the data so that it can be graphed. Lastly it shows how to use Matplotlib Axes object methods to graph the data and add useful elements to the graph such as labels, vertical lines, and text.
This is a simple example, and all the software involved is free, open-source, widely used, easy to install, and well-documented. Give it a try!
How to get the width and height of an image in a BLOB