Home > tips > Removing duplicate rows from a table

Removing duplicate rows from a table


Removing duplicate rows from a table has several ways. I prefer below method, because it is easy to implement. For table, schema1.table1 having columns col1, col2, col3 which will be primary key or unique key columns after removing duplicate entries will be accomplished by below SQL:
DELETE FROM schema1.table1 a
WHERE a.ROWID >
ANY (SELECT b.ROWID
FROM schema1.table1 b
WHERE a.col1 = b.col1
AND a.col2 = b.col2
AND a.col3 = b.col3);

Advertisements
Categories: tips Tags: ,
  1. May 5, 2008 at 8:06 am

    Performans bakış açısı ile bir örnek;

    SELECT /*+ parallel */
    year_month, COUNT(subscriber_id), COUNT(DISTINCT subscriber_id)
    FROM eul.dm_subscriber_history_01 t
    GROUP BY year_month;

    200711 60791328 60261922

    avrupa@oracle $ cat tmp.sql
    spool tmp.log

    conn eul/eul

    SET LINESIZE 2000
    SET AUTOTRACE TRACEONLY
    SET TIME ON
    SET TIMING ON

    ALTER SESSION ENABLE PARALLEL DDL ;
    ALTER SESSION ENABLE PARALLEL DML ;

    alter session set max_dump_file_size=unlimited ;
    alter session set timed_statistics = true ;
    alter session set STATISTICS_LEVEL = ALL ;
    alter session set “_rowsource_execution_statistics” = true ;
    alter session set tracefile_identifier = duplicates ;

    SELECT /*+ parallel */
    COUNT(*)
    FROM eul.dm_subscriber_history_01
    WHERE ROWID NOT IN (SELECT /*+ parallel */ a.rid
    FROM (SELECT /*+ parallel */
    MAX(ROWID) rid, year_month, subscriber_id, COUNT(1)
    FROM eul.dm_subscriber_history_01
    GROUP BY year_month, subscriber_id
    HAVING COUNT(1) > 1) a);

    /*
    Elapsed: 00:08:37.55

    1886131 consistent gets
    2005144 physical reads
    */

    SELECT /*+ parallel */
    COUNT(*)
    FROM eul.dm_subscriber_history_01
    WHERE ROWID IN (SELECT /*+ parallel */ rid
    FROM (SELECT /*+ parallel */
    ROWID rid,
    row_number() over(PARTITION BY year_month, subscriber_id ORDER BY ROWID) rn
    FROM eul.dm_subscriber_history_01)
    WHERE rn 1);

    /*
    Elapsed: 00:03:12.03

    1472472 consistent gets
    1067141 physical reads
    */

    exit;

    Silinecek kayıt miktarı tüm kayıtların bence ~%10 civarına ulaşınca direkt CTAS nologging paralel girmek daha doğru olabilir.

    Kaynakça : http://www.oracle.com/technology/oramag/oracle/04-jul/o44asktom.html

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Don Charisma

because anything is possible with Charisma

Carol no Mundo!

Aventuras de uma intercambista a trabalho pelo mundo!

Blog do PHP

Oracle DBA for rookies!

nimaidba

Welcome to the world of Oracle with me....

Tech

News and reviews from the world of gadgets, gear, apps and the web

WordPress.com

WordPress.com is the best place for your personal blog or business site.

Gurcan Orhan's ODI and DWH Blog

Some words about Oracle Data Integrator and Data Warehousing.

%d bloggers like this: