Metastatic Tags Cancer
Thread poster: Adieu
Adieu
Adieu  Identity Verified
Ukrainian to English
Jan 15, 2021

So, I've got a client that doesn't clean texts properly after OCR before tossing them into memoQ.

AFAIK, the system won't let us deliver documents with ignored tags, unless there's some trick I am missing? Anyway, usually it just wastes a few minutes per translation, and I've learned from being Reviewer on other translators' work that the "accepted" procedure is just piling the tags at the tail end of the segment if they are erroneous. Annoying, but so far doable.

Anyway, today I got to experience the dubious combination of horror and relief of being reviewer on a translation where that unfortunate soul had literally OVER 1,000 tags in a 2,500-word job... all of them seemingly erroneous greyscale gradations of a uniformly black font that had been badly OCR'd.

The document spent like 5 hours between "100% translated" and actual delivery, which I presume was that person moving tags around trying to get it to deliver.

What a farce....


Any solutions other than returning the document back to the translation manager with a refusal if such joy ever lands in my lap? And how many erroneous tags do you guys figure is "tolerable", where is the breaking point???


 
gfichter
gfichter  Identity Verified
United States
Local time: 15:40
English
Hidden tags Jan 15, 2021

When Adobe Acrobat exports to Word, it is not obvious from the viewed document, but if you look under the covers you will see that nearly every character is formatted separately. I didn't believe this when someone told me, so I unzipped the .docx file and was shocked to see all that useless formatting.
For our project the only solution was to "wash" the text through a plain text editor and rebuild the documents. That turned out not to be all that hard.
Rather than throw the files back to the client, I would recommend asking permission to recreate the files. That will save loads of time.
Good luck
George
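For anyone who wants to check a file the same way without unzipping it by hand, here is a minimal Python sketch. It assumes the usual .docx layout (a zip archive with the body in word/document.xml) and simply counts runs, i.e. the separately formatted chunks of text. The function names run_stats and docx_run_stats are made up for illustration, and what counts as "too many" runs per paragraph is left to your own judgement.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def run_stats(document_xml):
    """Return (paragraphs, runs, runs_per_paragraph) for a document.xml payload."""
    root = ET.fromstring(document_xml)
    paragraphs = sum(1 for _ in root.iter(W + "p"))
    runs = sum(1 for _ in root.iter(W + "r"))
    ratio = runs / paragraphs if paragraphs else 0.0
    return paragraphs, runs, ratio

def docx_run_stats(docx):
    """Accept a path or a binary file object pointing at a .docx file."""
    with zipfile.ZipFile(docx) as archive:
        return run_stats(archive.read("word/document.xml"))
```

Calling docx_run_stats("source.docx") on a cleanly authored file should give a low runs-per-paragraph figure, while a file where nearly every character is formatted separately will show a figure close to the number of characters per paragraph.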


 
Adieu
Adieu  Identity Verified
Ukrainian to English
TOPIC STARTER
Sadly afaik impossible Jan 15, 2021

We're locked out of the source file in server-based memoQ projects where we are in the roles of Translator or Reviewer, afaik. It is very permissions-based.

If this were me and my own file that I ran through memoQ of my own volition, you bet I wouldn't settle for any of this nonsense.


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 22:40
English to Russian
I wonder Jan 15, 2021

Why do you exclude returning the document back to the manager?
If you refuse it once, they will clean it next time. It takes seconds to remove tags. In 90% of cases you can do it directly in MS Word by selecting all and setting the font Scale/Spacing/Position values to 100%/Normal/Normal. You don't even need any conversion to plain text for that. In the rarer cases (the other 10%), you can use free apps like CodeZapper or TransTools.
Obviously there is no way to remove tags unless you fix the source file.


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 22:40
English to Russian
No need to recreate Jan 15, 2021

gfichter wrote:
For our project the only solution was to "wash" the text through a plain text editor and rebuild the documents.

Try this next time:
1. Press Ctrl+A to select all.
2. Press Ctrl+D to invoke the font settings menu, then go to the 'Advanced' tab.
3. Select 100% for Scale, Normal for Spacing, and Normal again for Position.

This will work in most cases. If not, use https://www.translatortools.net/downloads#TransTools
The basic version (as opposed to TransTools+) is free. It includes the tag cleaner tool.
*The file you need to download is TransTools_Installer.exe (16367KB)
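
If someone prefers to script the three steps above rather than click through Word, the following is a rough Python sketch of the same idea. It assumes that Scale, Spacing and Position are stored in word/document.xml as the run-level elements w:w, w:spacing and w:position, each with a single w:val attribute and the usual "w:" prefix. The names strip_run_noise and clean_docx are invented for this example, and the sketch only removes those properties. Whether your CAT tool then merges the now-identical neighbouring runs is something to verify on a test file (re-saving in Word afterwards may help).

```python
import re
import zipfile

# Run-level properties behind Word's Scale / Spacing / Position settings.
# Paragraph-level <w:spacing> carries w:before / w:after / w:line instead of
# a lone w:val attribute, so this pattern leaves it untouched.
_RUN_NOISE = re.compile(r'<w:(?:w|spacing|position)\s+w:val="-?\d+%?"\s*/>')

def strip_run_noise(document_xml):
    """Remove character scale, spacing and position overrides from the XML text."""
    return _RUN_NOISE.sub("", document_xml)

def clean_docx(src, dst):
    """Copy a .docx from src to dst, cleaning only word/document.xml."""
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = strip_run_noise(data.decode("utf-8")).encode("utf-8")
            zout.writestr(item, data)
```

Always write to a new file (for example clean_docx("source.docx", "source_clean.docx")) and open the result in Word before importing it anywhere, since this works on the raw XML text and has not been tested against every document variant.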

[Edited at 2021-01-15 23:43 GMT]


 
Adieu
Adieu  Identity Verified
Ukrainian to English
TOPIC STARTER
If it was THIS monstrosity of a file Jan 15, 2021

And I was the translator on it, then yes, I think I WOULD return it.

Wanted to post a pic of the sheer swamp of tags per segment, but can't find a way to post pictures here.

Anyway...DEFINITELY more tag tetris than actual translating on this one. Some segments had 20+ tags in one segment.

But it wasn't me doing the tetris, so I just left a comment that something was very messed up there in the source.

With smaller messes though, say 20-30 stray tags, I'll whine a bit hoping somebody takes note, but still do it anyway. Question is, how forgiving is too forgiving, so I don't end up in this poor soul's shoes someday?

[Edited at 2021-01-15 23:27 GMT]


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 22:40
English to Russian
Unlike cancer, you can easily fix it. It doesn't matter whether you have 20+ or 2000+ stray tags... Jan 15, 2021

...it takes seconds to fix it. Not much to talk about. Just tell your manager it is possible. Probably they just don't know how to do that.

 
esperantisto
esperantisto  Identity Verified
Local time: 22:40
Member (2006)
English to Russian
SITE LOCALIZER
Some more Jan 16, 2021

Stepan Konev wrote:

Try this next time:
1. Press Ctrl+A to select all.
2. Press Ctrl+D to invoke the font settings menu, then go to the 'Advanced' tab.
3. Select 100% for Scale, Normal for Spacing, and Normal again for Position.


4. Set a single font color (automatic or whatever) for the entire text.
5. Set the language to none (and re-run spellcheck if it is an MS Word DOCX file).
6. Replace multiple spaces with one.
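
Steps 4 to 6 can be sketched in Python along the same lines, working on the text of word/document.xml. This is only an illustration under some assumptions: font colour and language are taken to be stored as w:color and w:lang elements inside the run properties, removing them is treated as roughly equivalent to "automatic colour" and "no per-run language", and the function name normalize_runs is made up. Spaces are collapsed only inside a single w:t text element, so a double space split across two runs is not caught.

```python
import re

# Per-run colour and language overrides (assumed to be <w:color .../> and <w:lang .../>).
_COLOR_OR_LANG = re.compile(r'<w:(?:color|lang)\s[^>]*/>')

# A text element: opening <w:t ...> tag, its text, closing tag.
_TEXT_ELEMENT = re.compile(r'(<w:t(?:\s[^>]*)?>)([^<]*)(</w:t>)')

def _collapse_spaces(match):
    # Replace runs of two or more spaces in the text with a single space.
    opening, text, closing = match.groups()
    return opening + re.sub(r' {2,}', ' ', text) + closing

def normalize_runs(document_xml):
    """Drop colour/language overrides and collapse repeated spaces in text."""
    without_overrides = _COLOR_OR_LANG.sub("", document_xml)
    return _TEXT_ELEMENT.sub(_collapse_spaces, without_overrides)
```

As with any edit to the raw XML, it is worth running this on a copy and opening the result in Word before sending it on to memoQ.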


 

