Pages in topic:   [1 2] >
Remove duplicate rows in Excel
Thread poster: Samuel Murray
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 22:54
Member (2006)
English to Afrikaans
+ ...
Mar 20, 2021

Hello

Do you know of a way to remove duplicate rows in Excel? I mean, I want to remove all instances of duplicates, not just all of the subsequent ones.

So, if I have these values:

One
One
One
Two
Three
Three
Four

Then I want to end up with:

Two
Four

Thanks
Samuel


 
Joakim Braun
Joakim Braun  Identity Verified
Sweden
Local time: 22:54
German to Swedish
+ ...
Advanced filtering Mar 20, 2021

Advanced data filtering or the UNIQUE() function looks like the way to go.
(Haven't tried it)


 
Kay-Viktor Stegemann
Kay-Viktor Stegemann
Germany
Local time: 22:54
English to German
In memoriam
This might be helpful Mar 20, 2021

This looks like what you are looking for. The standard "unique" functions won't help you since they keep one of the duplicates.
https://www.extendoffice.com/documents/excel/2900-excel-remove-both-duplicates.html


bris97
 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 22:54
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
I spent Mar 20, 2021

I spent an interesting hour last night writing an AutoIt script to do it, and this morning I thought: wait a second... shouldn't Excel be able to do this?

I can mark all duplicates in a certain colour (Home > Styles > Conditional Formatting > Highlight Cell Rules > Duplicate Values), and then add a column with the row number in it, and then sort by that colour, and then sort by the row number column again... but I was hoping for a few-steps-less solution.

Data > Data To
... See more
I spent an interesting hour last night writing an AutoIt script to do it, and this morning I thought: wait a second... shouldn't Excel be able to do this?

I can mark all duplicates in a certain colour (Home > Styles > Conditional Formatting > Highlight Cell Rules > Duplicate Values), and then add a column with the row number in it, and then sort by that colour, and then sort by the row number column again... but I was hoping for a few-steps-less solution.

Data > Data Tools > Remove Duplicates removes all but one instance.
Collapse


 
Joakim Braun
Joakim Braun  Identity Verified
Sweden
Local time: 22:54
German to Swedish
+ ...
Outline Mar 20, 2021

So you have a column of values that may or may not be unique, and want to remove all values that occur more than once.

1. Use the filter creating a separate column 2 of all unique values
2. Add another column 3, "occurrences"
3. Use some counting function (COUNTIF()?) to count the occurrences in column 1 of each of the unique values in column 2, and put the result in column 3. Each of the unique values then has the number of occurrences next to it.
4. Filte
... See more
So you have a column of values that may or may not be unique, and want to remove all values that occur more than once.

1. Use the filter creating a separate column 2 of all unique values
2. Add another column 3, "occurrences"
3. Use some counting function (COUNTIF()?) to count the occurrences in column 1 of each of the unique values in column 2, and put the result in column 3. Each of the unique values then has the number of occurrences next to it.
4. Filter column 2 where column 3 > 1 (or copy-paste columns 2-3 by value, then sort by number of occurrences and remove ">1" rows manually)
Collapse


 
Narcis Lozano Drago
Narcis Lozano Drago  Identity Verified
Spain
Local time: 22:54
Member (2007)
English to Spanish
+ ...
Solution Mar 20, 2021

Assuming A is the column with the text to sort:
1) Create 2 more columns B, C
2) In C, create a sequence of numbers 1, 2, 3... in case you need to go back to the original order.
3) Sort A
4) In B, starting in B2 write:
=OR(EXACT(A1;A2);EXACT(A2;A3))
Expand for all the column B
5) Filter B by FALSE
6) Reorder using C

Cheers,

Narcis


 
mikhailo
mikhailo
Local time: 00:54
English to Russian
+ ...
re Mar 20, 2021

Data tab -> Remove diuplicate

 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 22:54
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
@Mikhailo Mar 21, 2021

mikhailo wrote:
Data tab -> Remove duplicate


Thanks, but that option doesn't remove all instances of all duplicates -- only subsequent instances.

If I have this in the column:

One
One
One
Two
Three
Three
Four

then that option changes it into:

One
Two
Three
Four

but I want it to be changed into

Two
Four

because "One" and "Three" are/have duplicates.


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 22:54
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
@Narcis Mar 21, 2021

Narcis Lozano Drago wrote:
=OR(EXACT(A1;A2);EXACT(A2;A3))


Thanks, I'll investigate this further, but for now, my Excel (365) does not accept that formula.


 
Narcis Lozano Drago
Narcis Lozano Drago  Identity Verified
Spain
Local time: 22:54
Member (2007)
English to Spanish
+ ...
EXACT Mar 21, 2021

Samuel Murray wrote:

Narcis Lozano Drago wrote:
=OR(EXACT(A1;A2);EXACT(A2;A3))


Thanks, I'll investigate this further, but for now, my Excel (365) does not accept that formula.


Basically it means you have to check whether the current cell is the same as the previous one OR the same as the next one.

I was using the Spanish version of Excel, and although I checked for the English equivalent, for some reason
apparently English uses commas instead of semicolons. Therefore it may be:
=OR(EXACT(A1,A2),EXACT(A2,A3))
or if you have another language version, you may need to find the equivalent for your language.

I tested the algorithm, and it works.

Kind regards,

Narcis


 
Rolf Keller
Rolf Keller
Germany
Local time: 22:54
English to German
To be (equal) or not to be (equal) Mar 21, 2021

Most solutions will fail if the equality is fuzzy, e. g. because there is a superflous space or a typo in one of two "identical" cells.

And think of upper vs. lower case.


 
Dan Lucas
Dan Lucas  Identity Verified
United Kingdom
Local time: 21:54
Member (2014)
Japanese to English
Maybe VBA? Mar 21, 2021

Samuel Murray wrote:
Do you know of a way to remove duplicate rows in Excel?

Hacky and non-performant, but probably works. Just select cells in the range you want to check. And be careful. EDIT It assumes the data you are comparing is in column 1, as you can see. So select (say) A1 to A200 and see how it goes. If there's a lot of data you'd want to uncomment the ScreenupDating calls, and comment out any debug prints.

Code:

Sub DeDupRows()
mycol = 1
'Application.ScreenUpdating = False
For Each c In Selection.Cells
start_row = c.Row
last_row = start_row
base_val = c.Value
Dim delete_rows() As Variant
'Debug.Print "Start of For loop: " & start_row & ", base val: " & base_val
ReDim Preserve delete_rows(0)
delete_rows(0) = c.Row
isFirst = 1
Do While Cells(last_row, mycol) ""
check_val = Cells(last_row, mycol).Value
'Debug.Print "Check val: " & check_val
If check_val = base_val And isFirst 1 Then
'Debug.Print "Found match, Base value: " & base_val & ", Current value: " & check_val & " (row = " & last_row & ")"
'Debug.Print "Ubound delete_rows: " & UBound(delete_rows)
ReDim Preserve delete_rows(UBound(delete_rows) + 1)
delete_rows(UBound(delete_rows)) = last_row
End If
'Debug.Print "check"
isFirst = 0
last_row = last_row + 1

Loop

If UBound(delete_rows) > 0 Then
For i = UBound(delete_rows) To 0 Step -1
'Debug.Print "Delete row " & delete_rows(i)
Rows(delete_rows(i)).Delete
Next i
End If
'Exit Sub
Next c
'Application.ScreenUpdating = True
End Sub




[Edited at 2021-03-21 13:29 GMT]


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 22:54
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
@Rolf Mar 21, 2021

Rolf Keller wrote:
Most solutions will fail if the equality is fuzzy, e. g. because there is a superflous space or a typo in one of two "identical" cells. And think of upper vs. lower case.


True, but in my case a case-sensitive exact match is perfectly adequate.

The thing is, I have a client for whom I proofread translations, but the client's portal can only provide the translations in a simple XLIFF format, which includes the source text of all segments plus the translations of translated segments, and no segments statuses, date/time, user ID etc. And the segments do not get translated in the same sequence as they appear in the downloaded XLIFF file, and using a standard DIFF utility doesn't work due to different segment orders each day. (Some web-based CAT tool's programmer didn't think further than his nose.)

The translator keeps on translating every day, and every day I must proofread the most recently translated segments, and report on the edits that need to be made (the edits aren't made in XLIFF).

So, I need a way to extract only the segments that were translated since I downloaded the previous XLIFF file. I do this as follows: download the latest version of the XLIFF file, convert the segments to a simple one-column format, add it together with all the previous days' segments, remove source=target segments (i.e. untranslated segments), and then... then I remove any segment that occurs more than once. This leaves me with only the newly translated segments.

Like I said, I slapped something together in AutoIt that does it, but I was just wondering if there is a simple way of doing this in Excel (and by "simple" I mean something that may need a bit of work setting up, but only a few steps when actually using, so... a macro or a template with pre-filled formulas would also fit the bill).

[Edited at 2021-03-21 14:22 GMT]


 
Daniel Frisano
Daniel Frisano  Identity Verified
Italy
Local time: 22:54
Member (2008)
English to Italian
+ ...
Here's how I'd do it Mar 21, 2021

Say your values are in column A.

1. In column B put =COUNTIF(A:A;A1), =COUNTIF(A:A;A2), ...

This is what you get:
Untitled-2

2. Sort everything by column B smallest to largest
3. Scroll down to the last "1" in column B and delete all rows below
4. Delete column B

The order should be maintained.

It's similar to your solutions with colours. It takes about 1 minute and just 4 steps. Hard to beat, I'd say.



[Edited at 2021-03-21 21:22 GMT]


 
Daniel Frisano
Daniel Frisano  Identity Verified
Italy
Local time: 22:54
Member (2008)
English to Italian
+ ...
  Mar 21, 2021

Well if you don't mind extra rows/columns it can be done in TWO steps:

1. In column A put =COUNTIF(A:A;A1), =COUNTIF(A:A;A2), ...
2. In column B put =IF(B1=1;A1;""), =IF(B2=1;A2;""), ...

And that's it.

Untitled-3


 
Pages in topic:   [1 2] >


To report site rules violations or get help, contact a site moderator:


You can also contact site staff by submitting a support request »

Remove duplicate rows in Excel






Protemos translation business management system
Create your account in minutes, and start working! 3-month trial for agencies, and free for freelancers!

The system lets you keep client/vendor database, with contacts and rates, manage projects and assign jobs to vendors, issue invoices, track payments, store and manage project files, generate business reports on turnover profit per client/manager etc.

More info »
Trados Studio 2022 Freelance
The leading translation software used by over 270,000 translators.

Designed with your feedback in mind, Trados Studio 2022 delivers an unrivalled, powerful desktop and cloud solution, empowering you to work in the most efficient and cost-effective way.

More info »