code-for-a-living June 30, 2021

Using collections to make your SQL access easier and more efficient

To overcome SQL limitations, many companies introduced programming languages for building applications that interface with SQL databases—for example PL/SQL from Oracle. PL/SQL applications can be inefficient. But you can streamline this interface using data type collections, making PL/SQL applications faster and more efficient.

Almost every application or piece of software today interacts with a database of some sort, with relational databases in particular. Although there are a number of languages for manipulating relational databases, SQL remains a standard. While SQL is a robust language, it has some limitations.

To overcome SQL limitations, many companies introduced programming languages for building applications that interface with SQL databases—for example PL/SQL from Oracle. 

PL/SQL (Procedural Language extensions to the Structured Query Language) is an easy to use programming language that simplifies the creation of powerful SQL queries. Although the term itself is specific to the Oracle programming language, the general concept of extending SQL queries with procedural logic applies to many programming languages (e.g. Microsoft’s T-SQL). For simplicity in this article, though, we will refer to PL/SQL throughout.

PL/SQL applications combine procedural logic and SQL statements to retrieve and process data from an SQL database. Because they switch between procedural code and SQL queries, PL/SQL applications can be inefficient. But you can streamline this interface using data type collections, making PL/SQL applications faster and more efficient.

Why use PL/SQL rather than another language like Java?

Although there are many benefits to using other programs such as Java, each has its own specialties. PL/SQL is purely server side, and focuses on interacting with the relational database. Java is an object-oriented language that has both client side and server side functionality. If your focus is on non-database-oriented functionality, Java is your choice. But when it comes to pure database functionality, Java code can result in reduced application speed and performance. Because PL/SQL is tightly integrated with SQL and the underlying database, properly constructed PL/SQL code can limit the number of database calls needed, thereby speeding up data access.

What are some of the potential issues with Java use for database-intensive applications? For one, the imposition of object-oriented structure on the database can lead to unnecessary database connections, which consumes resources and reduces speed. Java may also pre-load large data sets, reducing overall application performance.

Can you use PL/SQL and Java together? Absolutely. PL/SQL can call Java and vice versa. Those programmers who feel comfortable with both PL/SQL and Java can and should find ways to optimize overall application performance by distributing tasks where they are most-well suited and most efficiently performed. If you are familiar with simple relational database programming, PL/SQL is a good choice for you. If you are strong in object-oriented programming, you should have no issues using both.

Each language has its fans and its detractors, and you should carefully consider both the needs of your particular application and your own programming skills when selecting whether to use PL/SQL, Java, both or another language.

Simplifying SQL access with collections

Collections are essentially indexed groups of data that have the same type—for instance arrays or lists (arrays, for instance, are collections of index-based elements). Most programming languages in general provide support for collections.

Collections reduce the number of database calls due to caching (cached by the collections themselves) of regularly accessed static data. Reduced calls equals higher speed and efficiency. Collections can also reduce the total code needed for an application, further increasing efficiency. 

Each element in a collection has a unique identifier called a subscript. Collections come with their own set of methods for operating on the individual elements. PL/SQL includes methods for manipulating individual elements or the collection in bulk.

PL/SQL uses three types of collections: associative arrays, nested tables, and varrays. Choosing the right collection (or collections) for your application requires understanding the differences between collection types and how each works. The primary differences are:

  1. Whether they are bounded or unbounded (i.e. fixed number of elements or variable). 
  2. Whether they are dense or sparse (i.e. are there any gaps between elements.
  3. Whether they are stored in the application or in the database.

Associative arrays

Earlier versions of PL/SQL used what were known first as PL/SQL tables and later index-by tables. In a PL/SQL table, collections were indexed using an integer. Individual collection elements could then be referenced using the index value. 

Because it was not always easy to identify an element by its subscript, PL/SQL tables evolved to include indexing by alphanumeric strings. The ability to index by the data itself rather than the index value facilitates easier and faster identification of data for manipulation such as deletion or update. 

PL/SQL tables or index-by tables are now known as associative arrays. Associative arrays are single-dimensional collections, also referred to as key-value pair sets. You can think of it as a single column table. Multi-dimensional associative arrays can be created by making collections of collections.   

Associative arrays are unbounded, so there is no need to specify a size when creating one. They can be either dense or sparse. You can, for example, define the values of elements 1, 3, 10, and 200 of an associative array without setting values for any other elements. Associative arrays, however, must be stored in the PL/SQL application rather than in the SQL database, so they cannot be manipulated with SQL statements.

Best uses for associative arrays

Associative arrays are your best choice when you prioritize speed and flexibility of indexing over storage location. A real life use case for associative arrays would be in fundamental programming programming patterns (like memoization, or where function calls are returned to the original cached result to help speed up programs).

Nested tables

Like associative arrays, nested tables are unbounded and require no size specification on creation. Nested tables always begin as dense collections of elements with sequential subscripts. However, because nested tables allow for manipulation of individual elements, nested tables can become sparse through the deletion of elements.

Nested tables can be created either in the database or in a PL/SQL block at the application level. When nested tables are created in the database, they are stored in a separate database table from other columns. However, all nested table types for a database table are stored together in the same separate database table.

Best uses for nested tables

Nested tables are your best choice when you need the ability to dynamically alter individual elements, whether the number of elements or the contents of the elements. Nested tables are also useful when you want to store elements in the SQL database rather than the PL/SQL application.

Varrays

Varrays (short for variable arrays) are bounded—the creation of a varray requires setting the maximum number of elements for the varray. Varrays are therefore useful for storing fixed-size sequential collections of elements that have the same type. 

Varrays store elements in the same order as they are added to the array in contiguous memory addresses. The lowest address represents the first element of the varray and the highest address represents the last element. 

Although the initial size of a varray is fixed, the individual elements each have maximum sizes that can be altered dynamically. In addition, you can modify the length of a varray using the EXTEND and TRIM methods. Individual elements within a varray (other than the ends), however, cannot be updated or deleted, and therefore varrays are always dense.

Best uses for varrays

Varrays are your best choice when you are working with a fixed and delimited set of elements, for example when you need to preserve an ordered list. Varrays are also useful when you need or want to store data in the SQL database itself.

Examples of using collections in PL/SQL applications

As a basis for providing example coding for the various collection types, we will use the following example.

Suppose that you work as a cybersecurity professional in the IT department of a large corporation. The IT director asks you to start creating monthly logs of identified vulnerabilities created by SQL injections, identifying the leading four vulnerability types every month by location.

In addition, for each vulnerability, you should log specific information about severity from several sources, including the National Vulnerability Database (NVD) Common Vulnerabilities and Exposures (CVE) ID number, the associated Common Weakness Enumeration (CWE) ID number, the Common Vulnerability Scoring System (CVSS) score, and the type of vulnerability from the SQL injection.

To begin with, what are SQL injections? According to cybersecurity expert Mark Preston of Cloud Defense, “SQL injections are attacks that occur when cybercriminals exploit vulnerabilities within search queries executed by a host database. This allows attackers to get access to sensitive information or even change authorizations or user permissions. They can also destroy or manipulate sensitive data found in that database.”

You will also need to associate every vulnerability with an internal project for rectifying it. Consider how you might use the various collections to implement the various pieces of this job assignment. Let’s start with the assembly of the top four vulnerabilities by month and location. Since you know you need a fixed size collection, as you are only tracking four vulnerabilities per location per month, you could use a varrary.

Let’s begin by creating a four element varray, top_vulns, in an existing SQL table. While normal practice would be to use the full form ‘vulnerability,’ to make the code examples easier to read, we have shortened it to vuln or vulns. top_vulns will have the same data type as column vuln_type in table vulns.

CREATE TYPE top_vulns IS VARRAY(4) of vulns.vuln_type%TYPE;
/

Next, create a table for the log data, with the top monthly vulnerabilities set as the top_vulns type just created.

CREATE TABLE vuln_log (
	month_id	DATE,
	location_id	VARCHAR2(30)
	monthly_top_vulns	top_vulns);
/

Add an entry for April:

INSERT INTO vuln_log (month_id, location_id, monthly_top_vulns)
	VALUES (TO_DATE(‘04/21’, ‘MM/YY’), ‘San Diego’, top_vulns(‘DoS’, ‘Code execution’,
‘Overflow’, ‘SQL injection’));

Because you are using a varray, if you subsequently output the data into a report, the order of the top vulnerabilities will be shown in the order in which they were entered. 

Recall that you cannot modify the individual elements in a varray. There are, however, ways to change the length of the varray, using the EXTEND and TRIM methods. Be careful when using EXTEND and TRIM, however, because they count any deleted elements.

Let’s say the director now wants to track only the top three vulnerabilities. You can use TRIM to eliminate the extra data. If the director decided to reduce the logging even further however, you could not use another undimensioned TRIM method, because TRIM still thinks there are four elements in the varray. 

If you want to be able to manipulate the individual elements of the collection, or you think the IT director hasn’t made up her mind yet about how many vulnerabilities you should track each month, you could use a nested table instead of a varray.

Syntax is slightly different from the beginning. We will create the top vulnerabilities type as a table rather than a varray, but again with the same data type as the vuln_type column in table vulns. As nested tables are not fixed size collections, we will not include a size parameter. Comparing the syntax to the traditional PL/SQL table syntax, you can see that no indexing language is used.

CREATE TYPE top_vulns_tab IS TABLE OF vulns.vuln_type%TYPE
/

Next, we again create the log table. Recall that for nested tables, the data is stored in a separate table so add an instruction naming that table. 

CREATE TABLE vuln_log (
	month_id 	DATE,
	location_id	VARCHAR2(30),
	monthly_top_vulns_tab	top_vulns_tab)
	NESTED TABLE monthly_top_vuns_tab STORE AS top_vulns_table;
/

Now, we can start adding data to the database. SQL injections in particular are among the most common network vulnerabilities that programmers have to contend with. In fact, in April 2021, the top four types of vulnerabilities and attacks at the San Diego facility were DoS (denial of service), code execution, overflow, and SQL injections.

With this in mind, let’s say you need to input an entry for March 2021 at the Houston facility, where the top four vulnerabilities are XSS (cross-site scripting), memory corruption, DoS, and SQL injection.

INSERT INTO vuln_log (month_id, location_id, monthly_top_vulns_tab)
VALUES (TO_DATE(‘03/21’, ‘MM/YY’), ’Houston’, monthly_top_vulns_tab(‘XSS’, 
‘memory corruption’, ‘DoS’, ‘SQL injection’) );
/

Now you find out that you were given the wrong data; you should have had overflow instead of memory corruption. With the varray, there would be nothing you could do other than remove the entire entry and create a new one. With the nested table, however, you can simply update the entry.

Use the TABLE command to manipulate data within a nested table collection.

UPDATE TABLE(SELECT monthy_top_vulns_tab
		     FROM vuln_log
		     WHERE month_id = ‘03/21’
			AND location_id = ‘Houston’)
	SET column_value = ‘overflow’
	WHERE column_value = ‘memory corruption’)

How can we apply associative arrays in this example? As noted above, the data we have for each vulnerability includes the CVE, CWE, and CVSS. Associative arrays are single-dimensional (i.e a single column), so we will work with a table that lists the vulnerability CVEs in each month. We want to index by the CVE, which is an alphanumeric string, rather than by numeric identifiers, as in prior versions of PL/SQL Tables. 

First, remember that associative arrays are stored in the application rather than the database, so use the appropriate syntax.

DECLARE
	TYPE top_vulns_CVE_tab IS TABLE OF vulns.CVE%TYPE
		INDEX by vulns.CVE%TYPE;
	monthly_top_vulns_CVE top_vulns_CVE_tab;

Now, let’s say that we want to remove a record for a vulnerability identified by CVE-2021-0370. Rather than having to search through the table using a loop as we would have had to do if the table was indexed by the numeric identifier, we can now directly reference and manipulate the specific entry.

monthly_top_vulns_cve.DELETE(‘CVE-2021-370’)

Conclusion

When developing applications to work with SQL databases, the ability to minimize the number of database calls is an important consideration for efficient operation. Use the special collections features in your PL/SQL or other programming language to ensure your application is as fast and efficient as possible. By creating ordered or indexed collections of data of the same type, you can simplify searches for and eventual manipulation of individual collection elements. 

Tags: ,
Podcast logo The Stack Overflow Podcast is a weekly conversation about working in software development, learning to code, and the art and culture of computer programming.

Related

The Overflow Newsletter Banner
newsletter April 30, 2021

The Overflow #71: Copied and pasted

Welcome to ISSUE #71 of the Overflow! This newsletter is by developers, for developers, written and curated by the Stack Overflow team and Cassidy Williams at Netlify. This week: we know what you copied and pasted last summer, checking out our biggest fans at rest, and the Clapper is back but in your browser this time. Introducing Stack Overflow for…