Back to EveryPatent.com
United States Patent | 5,542,073 |
Schiefer ,   et al. | July 30, 1996 |
A method for choosing join selectivities in a query optimizer in a relational database management system is disclosed which facilitates the estimation of join result sizes by a query optimizer in a relational database system, wherein a new relation R is to be joined with an intermediate relation I, and wherein the selectivity values for each eligible join predicate are known. The method has the steps of determining the equivalence classes for a plurality of join attributes and then computing for each relation an estimate of the cardinality and the number of distinct values in each attribute after all the local predicates have been included. These are used in further computation of join selectivities and join result sizes. The join predicates must then be processed by correctly choosing the join selectivities. The join result sizes can then be correctly calculated.
Inventors: | Schiefer; Klaus B. (Scarborough, CA); Swami; Arun N. (San Jose, CA) |
Assignee: | International Business Machines Corporation (Armonk, NY) |
Appl. No.: | 510078 |
Filed: | August 1, 1995 |
Current U.S. Class: | 707/2 |
Intern'l Class: | G06F 017/30 |
Field of Search: | 395/600 |
4774657 | Sep., 1988 | Anderson et al. | 364/200. |
5091852 | Feb., 1992 | Tsuchida et al. | 395/600. |
5287493 | Feb., 1994 | Jacopi | 395/600. |
5301317 | Apr., 1994 | Lohman et al. | 395/600. |
5325525 | Jun., 1994 | Shan et al. | 395/650. |
5335345 | Aug., 1994 | Frieder et al. | 395/600. |
5345585 | Sep., 1994 | Iyer et al. | 395/600. |
5367675 | Nov., 1994 | Cheng et al. | 395/600. |
5412804 | Apr., 1995 | Krisna | 395/600. |
5412806 | May., 1995 | Du et al. | 395/600. |
5423035 | Jun., 1995 | De Prez | 395/600. |
5469568 | Nov., 1995 | Schiefer et al. | 395/600. |
P. J. Haas and A. N. Swami, Sequential Sampling Procedures for Query Size Estimation, Proceedings of the 1992 ACM SIGMOD International Conference on Mangement of Data, SIGMOD Record, vol. 21, Issue 2, Jun. 1992. T. Y. Cheung, A Statistical Model for Estimating the Number of Records in a Relational Database, Info. Process. Lett. (Netherlands), vol. 15, No. 3, pp. 115-118, Oct. 1982. T. Mostardi, Estimating the Size of Relational SPOJ Operation Results: An Analytical Approach, Info. Sys. vol. 15, No. 5, pp. 591-601, 1990. A. S. Rosenthal, Note on the Expected Size of a Join, ACM, SIGMOD Record vol. 11, No. 4, Jul. 1981. S. Christodoulakis, Estimating Block Transfers and Join Sizes, ACM, IEEE vol. 13, No. 4, pp. 40-54, pp. 40-54, May 1983. B. Da et al., Pragmatic Estimation of Join Sizes and Attribute Correlations, Proceedings Fifth International Conference on Data Engineering, pp. 76-84, 1989. M. V. Mannino et al., Statistical Profile Estimation in Database Systems, Comput. Surv. (USA), vol. 20, No. 3, pp. 191-221, Sep. 1988. J. Y. Tien et al., Comments on "Hash-based and Index-Based Join Algorithms for Cube and Ring Connected Multicomputers", IEEE Trans. Knowl, Data Eng. (USA), vol. 3, No. 3, pp. 387-389, Sep. 1991. D. Gardy, et al., On the Effect of Join Operations on Relation Sizes, ACM Trans. Database Syste. (USA), vol. 14, No. 4, pp. 574-603, Dec. 1989. J. M. Lee et al., An Upper Bound on Buffer Size for Join Operation Using Nonclustered Indexes, Info. Process. Lett. (Netherlands), vol. 36, No. 2, pp. 85-90, 15 Oct. 1990. A. Kumar et al., The Effect of Join Selectivities on Optimal Nesting Order, ACM, SIGMOD Record, vol. 16, No. 1, pp. 28-41, Mar. 1987. R. J. Lipton et al., Practical Selectivity Estimation through Adaptive Sampling, ACM, SIGMOD International Conf. on Management of Data, Atlantic City, USA, May 23-25 May 1990. P. Valduriez et al., Join and Semijoin Algorithms for a Multiprocessor Database Machine, ACM Transactions on Database Systems, vol. 9, No. 1, pp. 133-161, Mar. 1984. T. Ibaraki et al., On the Optimal Nesting Order for Computing N-Relational Joins, ACM Transactions on Database Systems, vol. 9, No. 3, pp. 482-502, Sep. 1984. D. M. Dias et al., Methods for Improving the Efficiency of Parallel Sort Merge Joins in the Presence of Data Skew, IBM Technical Disclosure Bulletin, vol. 33 No. 10A, pp. 166-167, Mar. 1991. M. N. Wegman, Method of Estimating the Size of a Set, IBM Technical Disclosure Bulletin, vol. 16, No. 7B, pp. 3893-3894, Dec. 1983. Swami et al., Computer Database System Using a Polynomial Time Algorithm for Optimizing, Ser. #07/801,306 filed 11/12/91 (IBM Doc. #SA991076). Elmasri et al., Fundamentals of Database Systems, Benjamin/Cummings 1989, pp. 502-505. |
______________________________________ /* PROCEDURE TO HANDLE JOIN ATTRIBUTES FROM RELATION R THAT BELONG TO THE SAME EQUIVA- LENCE CLASS */ 1: more.sub.-- relations := TRUE 2: 3: while (more.sub.-- relations = TRUE) 4: R = get.sub.-- next.sub.-- relation() 5: more.sub.-- attributes := TRUE 6: while(more.sub.-- attributes = TRUE) 7: join.sub.-- attr.sub.-- A := get.sub.-- next.sub.-- join.sub.-- attribute(R) 8: if (equivalence.sub.-- class[join.sub.-- attr.sub.-- A] > 0) 9: more.sub.-- other.sub.-- join.sub.-- attr := TRUE 10: while (more.sub.-- other.sub.-- join.sub.-- attr = TRUE) 11: join.sub.-- attr.sub.-- B := get.sub.-- next.sub.-- join.sub.-- attribute(R) 12: if ((join.sub.-- attr.sub.-- A .noteq. join.sub.-- attr.sub.-- B) and 13: (equivalence.sub.-- class[join.sub.-- attr.sub.-- A] = equivalence.sub.-- class[join.sub.-- attr.sub.-- B])) 14: if (d.sub.-- A < d.sub.-- B) 15: card.sub.-- R := card.sub.-- R/d.sub.-- B 16: equivalence.sub.-- class[join.sub.-- attr.sub.-- B] := 0 17: else 18: card.sub.-- R := card.sub.-- R/d.sub.-- A 19: equivalence.sub.-- class[join.sub.-- attr.sub.-- A] := 0 20: if no more other join attributes then 21: more.sub.-- other.sub.-- join.sub.-- attr := FALSE 22: if no more join attributes then 23: more.sub.-- attributes := FALSE 24: if no more relations then 25: more.sub.-- relations := FALSE ______________________________________
______________________________________ /* PROCEDURE TO CHOOSE THE JOIN SELECTIVITIES TO COMPUTE JOIN RESULT SIZES */ 1: Let K denote the total number of equivalence classes 2: 3: for (i := 1; i .ltoreq. K; i := i+1) 4: eq.sub.-- join.sub.-- self[i] := 0 5: more.sub.-- exists := TRUE 6: while(more.sub.-- exists = TRUE) 7: join.sub.-- pred := get.sub.-- next.sub.-- join.sub.-- predicate() 8: join.sub.-- attr := join attribute of relation R in join.sub.-- pred 9: k := equivalence.sub.-- class[join.sub.-- attr] 10: join.sub.-- sel := join selectivity of join.sub.-- pred 11: if (join.sub.-- sel > eq.sub.-- join.sub.-- sel[k]) 12: eq.sub.-- join.sub.-- sel[k] := join.sub.-- sel 13: if (no more join predicates) 14: more.sub.-- exists := FALSE 15: 16: card.sub.-- IR := card.sub.-- I .times. card.sub.-- R 17: for (k := 1; k .ltoreq. K; k := k+1) 18: if (eq.sub.-- join.sub.-- sel[k]>0) 19: card.sub.-- IR := card.sub.-- IR .times. eq.sub.-- join.sub.-- sel[k] ______________________________________