smartCAT: match analysis
Thread poster: Chiara Foppa Pedretti
Chiara Foppa Pedretti  Identity Verified
Italy
Local time: 09:57
English to Italian
Jun 13, 2017

Hello everyone,
I've just started using smartCAT (https://www.smartcat.ai/) and I'm pretty happy with its basic functions.
However, I can't find a way to run a match analysis on the texts I've uploaded.
Is it just me or is this function actually missing?

Thanks in advance!


 
Mikhail Zavidin
Local time: 11:57
English to Russian
Statistics tab Jun 13, 2017

You can get the analysis by clicking the Statistics tab in your project window.

Hope this helps.


 
Chiara Foppa Pedretti  Identity Verified
Italy
Local time: 09:57
English to Italian
TOPIC STARTER
Yes! Jun 13, 2017

Thank you so much, Mikhail, that's it!
I'm afraid it's not totally reliable, since it counts at least 300 fewer words than Word does, but that's another story...
Thanks again!


 
Pavel Doronin
Local time: 11:57
Word count differences Jun 28, 2017

Dear Chiara,
Every program has its own algorithm. Usually, the statistics are calculated in the following steps:
1. Text extraction. Different programs may or may not extract text from footers, headers, tables of contents and embedded objects. This affects the total number of words or characters. For example:
- MS Word ignores header text, whereas SDL Trados and Smartcat include it in the word count.
- MS Word does not include automatically generated page numbers in its statistics, while SDL Trados does.
- MS Word counts the words in the table of contents as separate words, while SDL Trados and Smartcat do not (we believe this makes sense, since the table of contents is generated automatically from the titles and subtitles, which will be translated anyway; once the translation is complete, you just need to update the table of contents).
2. Text segmentation (splitting the document into sentences). This does not apply to MS Word. Here, the approach may differ depending on:
- What is considered a “segment”: for example, a line that contains only spaces is not seen as a segment by either Smartcat or Trados, so the spaces are not counted as characters. In MS Word, however, they are considered characters and included in the statistics.
- Which characters (or combinations of characters, line breaks) are treated as segment delimiters. This may also affect the number of TM matches (when a Trados TM is used in a Smartcat document or a Smartcat TM is used in a Trados document).
3. Splitting segments into words. This can also work differently in different programs, and even in different versions of the same program, as each of them uses a different algorithm. The differences may include:
- Apostrophes and slashes are not treated as word delimiters in MS Word, unlike in Trados and Smartcat (where “Student’s Book” counts as 3 words).
- Trados 2011 does not consider digits-only segments to contain any words, while Trados 2007 and MS Word do.
- Dashes are treated as delimiters in Trados 2007, but not in the other programs.
- MS Word counts the numbers in numbered lists as separate words, while Trados and Smartcat ignore them.
- Various character sequences, such as ________ or *****, are treated as words in MS Word but not by Trados and Smartcat.
- PowerPoint statistics are a total mess.
- And the list goes on.
4. Matches and repetitions. If two lines are almost identical and the only difference between them is a number, a tag or a certain kind of character, they are considered repetitions. TM matches work in a similar way.
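
To make the word-splitting and repetition points above more concrete, here is a rough Python sketch. It is not the actual algorithm of MS Word, Trados or Smartcat: the function names, the three rule switches and the two "profiles" are hypothetical and only mimic the behaviours described in this post.

```python
import re


def count_words(text, split_on_apostrophe_and_slash, count_digit_only, count_symbol_runs):
    """Count words under one hypothetical set of tokenization rules."""
    # Whitespace always separates words.
    tokens = text.split()
    if split_on_apostrophe_and_slash:
        # Treat straight/curly apostrophes and slashes as extra delimiters.
        tokens = [part for tok in tokens for part in re.split(r"['’/]", tok) if part]
    words = []
    for tok in tokens:
        # Optionally skip tokens made of digits only.
        if not count_digit_only and tok.isdigit():
            continue
        # Optionally skip runs such as ____ or **** (no letter or digit inside).
        if not count_symbol_runs and not any(ch.isalnum() for ch in tok):
            continue
        words.append(tok)
    return len(words)


# Hypothetical rule sets, loosely modelled on the descriptions above.
# CAT_LIKE follows what is said about Trados 2011 for digits-only text (an assumption).
WORD_LIKE = dict(split_on_apostrophe_and_slash=False, count_digit_only=True, count_symbol_runs=True)
CAT_LIKE = dict(split_on_apostrophe_and_slash=True, count_digit_only=False, count_symbol_runs=False)


def repetition_key(segment):
    """Replace every run of digits with a placeholder, so that segments
    differing only in numbers end up with the same key."""
    return re.sub(r"\d+", "<num>", segment)


print(count_words("Student’s Book", **WORD_LIKE))  # 2
print(count_words("Student’s Book", **CAT_LIKE))   # 3
print(repetition_key("Page 1 of 10") == repetition_key("Page 2 of 10"))  # True
```

Even with only three switches the two profiles disagree on a two-word phrase, so a gap of a few hundred words on a long document is not surprising. Real tools also handle tags, dashes and other characters, which this sketch ignores.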


[Edited at 2017-06-28 18:00 GMT]


 
Chiara Foppa Pedretti  Identity Verified
Italy
Local time: 09:57
English to Italian
TOPIC STARTER
Interesting Jun 29, 2017

Wow Pavel, thank you for this thorough explanation.
I've never needed to use CAT tools before, so that's all new to me.
However, after a couple of weeks, I can really confirm I love working with smartCAT, maybe even more after reading your post.


 
Iris Schmerda  Identity Verified
France
Local time: 09:57
Member (2016)
French to German
Can't find the statistics tab Jul 6, 2017

Hello,

Is it possible for me to see this match analysis if I am not the person who uploaded the texts?
The agency did it, and I would like to check the statistics, but somehow I can't find them.

Thank you very much.


 

