Saturday, October 01, 2005

May be now time to test ;)

This will really test our potential...(Just kidding)
How do you create a permutation & combination in sql

A permutation is an ordered arrangement of elements of a set.
For example, if I have the set {1, 2, 3}, the permutations of those elements are
(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), and (3, 2, 1).
The rule is that for n elements, you have a factorial n! number of permutations.

Try to make the answer easy to generalize for more numbers like suppose i give {1,2,3,4,5} i shud get the permutation of the numbers [ in total 5! ].

Let us try a basic one.

           CREATE TABLE Elements (i INTEGER NOT NULL);

i'm trying for the set {1,2,3}.
so inserting three values in the table

INSERT INTO Elements VALUES (1);
INSERT INTO Elements VALUES (2);
INSERT INTO Elements VALUES (3);

don't read more. try at ur hand in machine infront of u,
take it as challenge to do this ;)

FIRST ANSWER
SELECT e1.i A,e2.i B,e3.i C
FROM Elements e1,Elements e2,Elements e3
WHERE e1.i NOT IN (e2.i,e3.i)
AND e2.i NOT IN (e1.i,e3.i)
AND e3.i NOT IN (e2.i,e1.i)

This looks obvious and horrible answer. isn't ?.
This monster predicate will guarantee that all column values
in a row are unique.An improvement on this query can be made
by adding one more predicate to the where clause

FIRST ANSWER [ with modification ]
SELECT e1.i A,e2.i B,e3.i C
FROM Elements e1,Elements e2,Elements e3
WHERE (E1.i + E2.i + E3.i) = 6
AND e1.i NOT IN (e2.i,e3.i)
AND e2.i NOT IN (e1.i,e3.i)
AND e3.i NOT IN (e2.i,e1.i)

This improves things because most optimizers will see a predicate
of the structure [EXPRESSION] = [CONSTANT] and will execute it
before the and-ed chain of in() predicates. While not all rows
that total 6 are a permutation, all permutations will total to 6
for this set of integers.
It will definetly improve the query performance , agreed ?

Do you think this will be the solution ? No ,
we can try to write in different way.
Let's carry the totals trick one step further.
First,
redefine the Elements table to have a weight for each element
in the set: The weights are powers of two of the elements.

CREATE TABLE Elements_with_wgt(
i INTEGER NOT NULL ,
wgt INTEGER NOT NULL);

INSERT INTO Elements_with_wgtVALUES (1, 1);
INSERT INTO
Elements_with_wgtVALUES (2, 2);
INSERT INTO
Elements_with_wgtVALUES (3, 4);

Now, see the where clause how it becomes:

SECOND ANSWER
SELECT e1.i A,e2.i B,e3.i C
FROM Element1 e1,Element1 e2,Element1 e3
WHERE (E1.wgt + E2.wgt + E3.wgt) = 7

This does the whole filtering job for you and the in() predicates
are all unnecessary. In the previous solution we have seen the
elements shud be numbers as we are summming the value & adding
that to predicate, so the elements cant be other than numbers.
But this second answer also has another beneficial effect:
The elements can now be of any data type and
are not limited just to numbers.

A Thought

"You can't be truly rude until you understand good manners."

I read this somewhere . good one :)


A word abt INTEGERS

Recently, I gotta chance to do something which changed my way of understanding .Believed me until that point thought an integer is a number that can only store whole numbers (or more mathematically expressed: the positive natural number 1, 2, 3..., the negative natural numbers -1, -2, -3, as well as zero). Unfortunately, that is not always the case with Oracle.

                declare
i integer;
n number(38);
begin
i := 3.14159265;
n := 2.17111891;
dbms_output.put_line('i: ' || i);
dbms_output.put_line('n: ' || n);
end;
/

try this u will find the expected answer, the variables cannot store
the fractional parts of the numbers being assigned to the variables.
In fact, Oracle rounds the numbers towards the closest natural number:

i: 3
n: 3

Now let us remove the declaration part & try the following
create procedure takes_integer(i in integer, p in pls_integer)
is
begin
dbms_output.put_line('i [integer]: ' || i || ', p [pls_integer]: ' || p);
end takes_integer;
/


Things change slightly, but significantly, when the declaration
is not in the DECLARE part of a PL/SQL block but in the argument
list of a procedure.
Exec this
exec takes_integer(3.14159265, 2.71828183);

This procedure is called, again with two non-natural numbers
did u get surprised by the result
i [integer]: 3.14159265, p [pls_integer]: 3

The argument i an integer can obviously store non integers.
However, a pls_integer can not.

Point to be kept in mind :
if you want to have an integer passed to a procedure/function,
use pls_integer instead of integer. so that u canmake sure that
u are passing only the integer not as non-integer.

how it happened in case of procedure ??
try desc ing the procedure
desc takes_integer;

SQL> desc takes_integer
PROCEDURE takes_integer
Argument Name Type In/Out Default?
------------------------------ ----------------------- ------ --------
I NUMBER(38) IN
P BINARY_INTEGER IN


An integer, being an ansi_datatype , is (rightly) implicitely converted
into number(38).But number(38) should not be able to stored 3.14159265.

R u still using INTEGER or NUMBER for ur looping ?

We always looking to squeeze that extra performance to make our PL/SQL programs, functions and procedures run just a little bit faster?
Hey, you already have them optimized, right ? have u ?

For example in a loop or record counter,
which one u will use NUMBER or INTEGER... what thinking abt answer to this question ?.
if u have answered anyone of this u r wrong.
Answer shud h've been "options not enough"
Still using number or integer for the counter field, throw up those variables , use PLS datatypes.
Well, consider using the PL/SQL datatype pls_integer instead of the number datatype where only integers are needed.
When declaring an integer variable, PLS_INTEGER is the most efficient numeric datatype because its values require less storage than INTEGER or NUMBER values, which are represented internally as 22-byte Oracle numbers. The number datatype is a memory hog compared to pls_integer, which is designed to consume very little resource at runtime, specifically for PL/SQL.
Also, PLS_INTEGER operations use machine arithmetic, so they are faster than BINARY_INTEGER, INTEGER, or NUMBER operations,which use library arithmetic.

Let us see what is this datatype

This datatype is available in PL/SQL Release 2.3 and above.

Variables declared as PLS_INTEGER store signed integers. The magnitude range for this datatype is -2147483647 through 2147483647. Oracle recommends that you use PLS_INTEGER for all integer calculations which do not fall outside of its range. PLS_INTEGER values require less storage than NUMBER values, and operations on PLS_INTEGER's use machine arithmetic, making them more efficient.

Variables declared as pls_integer and binary_integer have the same range, but they are treated differently. When a calculation involving pls_integer overflows, pl/sql raises an exception. However, similar overflow involving binary_integers will not raise an exception if the result is being assigned to a number variable.

Usage :
DECLARE
fast_variable pls_integer;

wait for more posting to get what is " Library / machine arithmetic " .

Thursday, September 29, 2005

SQL Injection ? Its not Medical Injection Huh now...what is that

SQL injection is a basic attack used to either gain unauthorized access to a database or to retrieve information directly from the database.

SQL injection attacks are simple in nature – an attacker passes string input to an application in hopes manipulating the SQL statement to his or her advantage. The complexity of the attack involves exploiting a SQL statement that may be unknown to the attacker. Open-source applications and commercial applications delivered with source code are more vulnerable since an attacker can find potentially vulnerable statements prior to an attack.

There are four main categories of SQL Injection attacks against Oracle databases –

  • 1. SQL Manipulation
  • 2. Code Injection
  • 3. Function Call Injection
  • 4. Buffer Overflows
  • We will see SQL Manipulation With Example
    SQL manipulation involves modifying the SQL statement through set operations (e.g., UNION) or altering the WHERE clause to return a different result. Many documented SQL injection attacks are of this type. The most well known attack is to modify the WHERE clause of the user authentication statement so the WHERE clause always results in TRUE.

    The classic SQL manipulation is during the login authentication. A simplistic web application may check user authentication by executing the following query and checking to see if any rows were returned

    – SELECT * FROM users WHERE username = 'bob' and PASSWORD = 'mypassword'

    The attacker attempts to manipulate the SQL statement to execute as

    – SELECT * FROM users WHERE username = 'bob' and PASSWORD = 'mypassword' or 'a' = 'a'

    Based on operator precedence, the WHERE clause is true for every row and the attacker has gained access to the application.

    The set operator UNION is frequently used in SQL injection attacks. The goal is to manipulate a SQL statement into returning rows from another table. A web form may execute the following query to return a list of available products

    –SELECT product_name FROM all_products WHERE product_name like '%Chairs%'

    The attacker attempts to manipulate the SQL statement to execute as

    SELECT product_name FROM all_products WHERE product_name like '%Chairs' UNION SELECT username FROM dba_users WHERE username like '%'

    The list returned to the web form will include all the selected products, but also all the database users in the application.

    What’s Vulnerable

    A web application is vulnerable to SQL injection for only one reason – end user string input is not properly validated and is passed to a dynamic SQL statement. The string input is usually passed directly to the SQL statement. However, the user input may be stored in the database and later passed to a dynamic SQL statement. Because of the stateless nature of many web applications, it is common to write data to the database between web pages. This indirect type of attack is much more complex and requires in-depth knowledge of the application.
    FYI --
    Oracle E-Business Suite was having Vulnerabilities which was rectified in later release.
    Although u know there are 4 Vulnerabilities ,
    oracle is the only database which is stable wen considering all the attacks,.
    one of the reason as far as i know is.
    First Reason is it(Oracle) doesnt support multiple queries,...........
    The rest are

    Oracle has generally faired well against SQL injection attacks as there is no multiple SQL statement support (SQL Server and PostgreSQL), no EXECUTE statement (SQL Server), and no INTO OUTFILE function (MySQL). Also, the use of bind variables in Oracle environments for performance reasons provides the most effective protection against SQL injection attacks.

    Oracle may provide stronger and more inherent protections against SQL injection attacks than other databases, however, Oracle-based applications without proper defenses against these types of attacks can still be vulnerable.

    Monday, August 22, 2005

    Knew abt what u create

    A word abt loop is necessary as it is the basic one which strikes every programmer when he sees anything to get repeated :)
    But mostly the code written inside the loops should re-looked carefully again, if u are to make sure that the code is fine tuned.As in loops any inefficiency inside a loop's body will be magnified by the multiple executions of that code.
    A common mistake is it put the code that is static for the loop, inside the loop.
    Identify those then extract the static code, assign the outcomes of that code to one or more attribute, and then reference those attributes inside the loop.
    For ex:

    CREATE OR REPLACE PROCEDURE summarize_reviews (
    summary_title_in IN VARCHAR2,
    isbn_in IN book.isbn%TYPE)
    IS
    CURSOR review_cur IS
    SELECT text, TO_CHAR (SYSDATE, 'MM/DD/YYYY') today
    FROM book_review
    WHERE isbn = isbn_in;
    BEGIN
    FOR review_rec IN review_cur
    LOOP
    IF LENGTH (review_rec.text) > 100
    THEN
    review_rec.text := SUBSTR (review_rec.text, 1, 100);
    END IF;

    review_pkg.summarize (
    UPPER (summary_title_in),
    today,
    UPPER (review_rec.text)
    );
    END LOOP;
    END;
    Here looking at the example above
    1 . Cursor review_cur select SYSDATE with each row of query.SYSDATE is going to be same
    2.
    the field of the review_rec record is overwritten again & again , is it needed ?
    3. Look at the summary_title_in argument never changes, it UPPER case'd in each iteration of the loop. can't we put this outside the loop :).
    4. And one more things is , watch out ...
    checking the length of the text for each row and then SUBSTR (and UPPER case), why not just SUBSTR inside SQL?
    Which one is better Pl/Sql or Sql ? to write this...

    Just re-written the code to make the changes said
    CREATE OR REPLACE PROCEDURE summarize_reviews (
    summary_title_in IN VARCHAR2,
    isbn_in IN book.isbn%TYPE)
    IS
    l_summary book_types.summary_t
    := UPPER (summary_title_in);

    l_today CONSTANT DATE := TRUNC (SYSDATE);

    CURSOR review_cur IS
    SELECT UPPER (SUBSTR (text, 1, 100)) text
    FROM book_review
    WHERE isbn = isbn_in;
    BEGIN
    FOR review_rec IN review_cur
    LOOP
    review_pkg.summarize (
    l_summary, l_today, review_rec.text
    );
    END LOOP;
    END;
    /

    In general, we can expect the performance of built-in functions such
    as SUBSTR to work more efficiently in SQL than in PL/SQL.
    so always mv the builtin function inside the Sql rather than Pl/Sql.
    processing to the SQL layer whenever possible.

    Friday, July 15, 2005

    ( Try to ) Exit once

    Here's part of a program that compares two files for equality.
    After reading the next line from each file, it checks for the following
    conditions:
    1. Did it reach the end of both files?
    2. Are the lines different?
    3. Did I reach the end of just one file?
    In each case, set the "return value" for the function and also issue an EXIT statement:
    LOOP
    read_line (file1, line1, file1_eof);
    read_line (file2, line2, file2_eof);

    IF (file1_eof AND file2_eof)
    THEN
    retval := TRUE;
    EXIT; -- first time
    ELSIF (line1 != line2)
    THEN
    retval := FALSE;
    EXIT; -- Second time repeated
    ELSIF (file1_eof OR file2_eof)
    THEN
    retval := FALSE;
    EXIT; -- Third time repeated
    END IF;
    END LOOP;


    Sometimes it can be difficult to come up with just one EXIT statement.
    This usually occurs when we need to check a condition at the beginning
    and end of a loop.
    We should also be careful to initialize your return value and
    your loop terminator variable, to avoid unwanted NULL values that
    might disrupt your logic.


    Then rewrite this loop body as follows

    LOOP
    read_line (file1, line1, file1_eof);
    read_line (file2, line2, file2_eof);

    IF (file1_eof AND file2_eof)
    THEN
    retval := TRUE;
    exit_loop := TRUE;
    ELSIF (line1 != line2)
    THEN
    retval := FALSE;
    exit_loop := TRUE;
    ELSIF (file1_eof OR file2_eof)
    THEN
    retval := FALSE;
    exit_loop := TRUE;
    END IF;
    EXIT WHEN exit_loop; -- Only one time
    END LOOP;

    A single EXIT is especially important in large, complex loop bodies;
    It allows you to more easily trace and debug your code.

    Challenges lies on how badly the loop was written initially,
    we may need to perform substantial restructuring to improve the loop
    code.


    Dont ever try to redo the thing !

    Sometimes, we will may write conditional statements that may be valid still, and are unnecessary and cumbersome. Such statements often reflect a lack of understanding about how you can and should use language structures.
    for exempli gratia :

    DECLARE
    boolean_variable BOOLEAN;
    BEGIN
    IF boolean_variable = TRUE
    THEN
    ...
    ELSIF boolean_variable = FALSE
    THEN
    ...
    END IF;
    END;


    what we are gonna acheive by redoing the thing already
    the compiler is doing for us.
    ( havent u heard of word like " never try to re-invent the wheel " ) ;;)

    here goes the simplification of the code .

    DECLARE
    boolean_variable BOOLEAN;
    BEGIN
    IF boolean_variable
    THEN
    ...
    ELSIF NOT boolean_variable
    THEN
    ...
    END IF;
    END;

    It is implicit that the what the boolean is going to do for us,
    one more classic example for the inefficient use of the boolena variable we can find at here;

    IF DOB '<' SYSDATE THEN
    celebrated := TRUE;
    ELSE
    celebrated := FALSE;
    END IF;
    here u can see the idea behind this is to find the boolean, has it been celebrated.
    may be simple thing here to notice will be.
    there are only two state for the conditional statements.
    celebrated := DOB '<' SYSDATE;
    here watch out carefully, the DOB can be NULL ;
    celebrated := NVL( DOB '<' SYSDATE);
    The above statement offers a comparable expression ;) is it nt ?
    ( doing this may make ur code look expressive )

    Best practices IF ELSE

    When we need to write conditional logic that has several mutually exclusive clauses ;)
    (in other words, if one clause is TRUE, no other clause evaluates to TRUE) or
    (only one condition to be evaluated at a given point of time)
    use the ELSIF construct: , never try to use IF ENDIF construct

    suppoose ur intention is to write something like this :

    BEGIN
    IF count = 1
    THEN
    process_A;
    END IF;
    IF count = 2
    THEN
    process_B;
    END IF;
    ...
    IF count = 3
    THEN
    process_C;
    END IF;
    END;


    in the above case
    Every IF statement is executed and each condition evaluated.
    You should rewrite such logic as follows

    BEGIN
    IF count = 1
    THEN
    process_A;
    ELSIF count = 2
    THEN
    process_B;
    ...
    ELSIF count = 3
    THEN
    process_C;
    END IF;
    END;
    what is the benefit of these type of code

    This structure clearly expresses the underlying "reality"
    of your business logic:
    if one condition is TRUE, no others can be TRUE.

    ELSIF offers the most efficient implementation for
    processing mutually exclusive clauses. When one
    clause evaluates to TRUE, all subsequent clauses are ignored.

    May be in the real world scenario we need to deal with
    some convoluted logical expressions like following:
    IF condA AND NOT ( condB OR condC ) 
    THEN
    process_A;
    ELSIF condA AND (condB OR condC)
    THEN
    process_B;
    ELSIF NOT condA AND condD
    THEN
    process_C;
    END IF;
    from the above example u can very well see that
    condition B & C are repeated in two of these conditional statement

    It's also fairly common to get a headache trying to make sense
    of the code we have written. You can often reduce the trauma
    of headache by trading off the simplicity of the( code )
    IF statement itself.
    Rather than having seperate level for each condtion
    we can club the clauses within multiple levels:

    we will see ,how to re-write it :

    IF condA 
    THEN
    IF (condB OR condC)
    THEN
    process_B;
    ELSE
    process_A;
    END IF;
    ELSIF condD
    THEN
    process_C;
    END IF;
    re-writing the code what we acheive

    "if and when the logic changes, you can change one IF
    clause without affecting the logic of others."

    Always make your code easier to read and maintain. ;)
    ( may be for that only we are paid )

    may b , now u may start to ask the question
    how to write the code which is maintainable ?
    Breaking an expression into smaller pieces can aid maintainability;