Re: Squeezing spaces out of a string

From: John Gilson <jag_at_acm.org>
Date: 12 Feb 2004 23:57:36 -0800
Message-ID: <909c94d1.0402122357.71654546_at_posting.google.com>


joe.celko_at_northface.edu (--CELKO--) wrote in message news:<a264e7ea.0402041037.9013551_at_posting.google.com>...
> Assume '<' and '>' do not appear in col_x.
>
> UPDATE Foobar
> SET col_x
> = REPLACE (
> REPLACE (
> REPLACE(col_x, SPACE(1), '<> '),
> '><', SPACE(0)),
> '<>', SPACE(1));
>
> This is due to a guy named Carnegie; Steve Kass found it in an old
> posting. The only problem is that it fails if the the first function
> call overflows the maximum string length. You might get errors, or
> truncation depending on your SQL.

Very clever. If one doesn't want to be that clever and also not worry about string overflow or whether a certain character is present or not in the string, then one would want to stay within the realm of space replacement. As an upper bound, by simply doing a 2-to-1 space replacement with each call to the REPLACE function, for a VARCHAR(N) we would need CEILING(LOG2(N))nested REPLACE calls. SQL Server's VARCHAR can have a maximum length of 8000 characters so 13 (2^13 = 8192) successive 2-to-1 space replacements will always be sufficient. But this is an upper bound. Replacing more than 2 spaces at each step will yield a better result. But how much better? And what number of spaces should be replaced at each step to guarantee a correct answer?

Here's a stored procedure, written in T-SQL, that will return all sequences of divisors that will reduce a VARCHAR of a given length. The procedure can be called to find all such sequences, regardless of length, for a VARCHAR of a given length or simply the shortest sequences.

CREATE TABLE DivisorSequences
(
divisor_sequence VARCHAR(100) NOT NULL,
number_of_divisors INT NOT NULL CHECK (number_of_divisors >= 1), first_divisor INT NOT NULL CHECK (first_divisor >= 2), max_dividend INT NOT NULL CHECK (max_dividend >= 2), PRIMARY KEY (number_of_divisors, divisor_sequence) )

CREATE VIEW Digits (d)
AS
SELECT 0
UNION ALL
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT 3
UNION ALL
SELECT 4
UNION ALL
SELECT 5
UNION ALL
SELECT 6
UNION ALL
SELECT 7
UNION ALL
SELECT 8
UNION ALL
SELECT 9 CREATE TABLE NonnegativeIntegers
(
n INT NOT NULL PRIMARY KEY CHECK (n >= 0) )

INSERT INTO NonnegativeIntegers (n)
SELECT Ones.d + 10 * Tens.d + 100 * Hundreds.d + 1000 * Thousands.d +

       10000 * TenThousands.d
FROM Digits AS Ones

     CROSS JOIN
     Digits AS Tens
     CROSS JOIN
     Digits AS Hundreds
     CROSS JOIN
     Digits AS Thousands
     CROSS JOIN
     Digits AS TenThousands

CREATE PROCEDURE InsertDivisorSequences
_at_target_dividend INT,
_at_shortest_only CHAR(1) = 'Y' -- 'N' for all sequences regardless of length
AS
IF _at_target_dividend >= 2
BEGIN -- IF
  DECLARE _at_level_number INT
  DECLARE _at_max_level_number INT
  SET _at_level_number = 1
  SET _at_max_level_number =

      CAST(CEILING(LOG(_at_target_dividend)/LOG(2)) AS INT)   INSERT INTO DivisorSequences
  (divisor_sequence, number_of_divisors, first_divisor, max_dividend)   VALUES ('2', _at_level_number, 2, 2)
  WHILE _at_level_number < @max_level_number   BEGIN -- WHILE
    IF _at_shortest_only = 'Y'
    BEGIN -- IF

      IF EXISTS (SELECT *
                 FROM DivisorSequences
                 WHERE number_of_divisors = _at_level_number AND
                       max_dividend >= _at_target_dividend)
        BREAK
      ELSE
        INSERT INTO DivisorSequences
        (divisor_sequence, number_of_divisors, first_divisor,
max_dividend)
        SELECT CAST(N.n AS VARCHAR) + '.' + divisor_sequence,
               D.number_of_divisors + 1,
               N.n,
               (D.max_dividend - N.n + 2) * N.n + N.n - 2
               -- unsimplified version of above is
               -- ((((D.max_dividend + 1) - (N.n - 1)) * N.n) + (N.n -
1)) - 1
        FROM DivisorSequences AS D
             INNER JOIN
             NonnegativeIntegers AS N
             ON D.number_of_divisors = _at_level_number AND
                N.n BETWEEN D.first_divisor AND max_dividend + 1
    END -- IF
    ELSE
    BEGIN -- ELSE
      INSERT INTO DivisorSequences
      (divisor_sequence, number_of_divisors, first_divisor,
max_dividend)
      SELECT CAST(N.n AS VARCHAR) + '.' + divisor_sequence,
             D.number_of_divisors + 1,
             N.n,
             (D.max_dividend - N.n + 2) * N.n + N.n - 2
      FROM DivisorSequences AS D
           INNER JOIN
           NonnegativeIntegers AS N
           ON D.number_of_divisors = _at_level_number AND
              D.max_dividend < _at_target_dividend AND
              N.n BETWEEN D.first_divisor AND max_dividend + 1
    END -- ELSE
    SET _at_level_number = @level_number + 1   END -- WHILE
END -- IF
  • Find all sequences of divisors to reduce a VARCHAR(10) EXEC InsertDivisorSequences 10, 'N'

SELECT divisor_sequence, number_of_divisors, max_dividend FROM DivisorSequences
WHERE max_dividend >= 10
ORDER BY number_of_divisors, divisor_sequence

divisor_sequence number_of_divisors max_dividend

3.2.2 3 10
3.3.2 3 10
4.2.2 3 10
4.3.2 3 10
2.2.2.2 4 16
3.2.2.2 4 22
4.2.2.2 4 26
5.2.2.2 4 28
5.5.2.2 4 28
5.5.3.2 4 28
6.2.2.2 4 28
6.5.2.2 4 28
6.5.3.2 4 28
7.2.2.2 4 26
7.5.2.2 4 26
7.5.3.2 4 26
8.2.2.2 4 22
8.5.2.2 4 22
8.5.3.2 4 22
9.2.2.2 4 16
9.5.2.2 4 16
9.5.3.2 4 16

We see that there are solutions of length 3 and 4 that can reduce a VARCHAR(10).
The divisor_sequence 4.3.2, for example, gives the divisors 4, 3, and 2 in that order. To see only the shortest sequences

DELETE FROM DivisorSequences

EXEC InsertDivisorSequences 10

For a maximum-length VARCHAR, size 8000, the shortest sequences are of length 6. And there are over 200,000 to choose from! By the way, finding the
shortest sequences for length 8000 took under 30 seconds to execute on a 1.7 GHz P4 with 512 MB so investigation is not prohibitive.

Regards,
jag Received on Fri Feb 13 2004 - 08:57:36 CET

Original text of this message