CAT tools: I'm doing some experiments, advice needed.
Thread poster: phrasin
Jul 11, 2011

Hello everyone,
Lately I came across the colorful world of computer-assisted translation tools, and I couldn't help but notice that all the main players seem stuck in 1997, especially when it comes to user interface design, but also in their classic enterprise-software approach to the translation task. Even the new cloud-based solutions seem to be affected by the very same disease.

At the beginning, I was quite skeptical about the real usefulness of such tools. I looked closely at people using one well-known market-leading product, and I saw a common pattern: the time wasted checking wrong suggestions, even ones generated with a high fuzzy value, was often greater than the time saved by correct matches. It varied a lot depending on the type of source text. The main value of that software seemed to lie in its workflow and terminology management features, and in the very large number of file formats and language pairs supported.

So, last week I decided to tackle the problem and build a very simple, basic, experimental web-based prototype that helps translate Word documents. After a few failed attempts, I found a way to elegantly and intuitively edit documents on the web while keeping source and target language text on screen in its original layout. I discovered some clever ways of showing terminology suggestions, and I'm trying to solve the string matching problem using massive non-relational data structures and a pattern frequency approach, instead of length-based fuzzy algorithms derived from Levenshtein distance.
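
For reference, the conventional edit-distance scoring mentioned above can be sketched roughly as follows. This is only a minimal Python illustration of a Levenshtein-based fuzzy match percentage, not the prototype's matcher; the function names `levenshtein` and `fuzzy_score`, and the choice to normalise by the length of the longer string, are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_score(source: str, candidate: str) -> float:
    """Similarity in percent, normalised by the longer string (an assumption)."""
    longest = max(len(source), len(candidate))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(source, candidate) / longest)
```

A pair such as "Press the red button" and "Do not press the red button" differs by only a handful of character edits, so a score of this kind stays fairly high even though the meaning is reversed. That is the sort of misleading suggestion described above.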

Before diving completely into the code, though, I'd like to hear some thoughts from real-world translators, which is why I made the following list of questions that I'm inviting you to answer. It is aimed particularly at those of you who actually use CAT tools.

Some of these questions might sound silly to you; just answer the ones you feel comfortable with. Every single answer is really appreciated.


  • Do you always get to choose which CAT tool to use? Are you being asked by your clients to use a specific CAT tool? If so, are you a freelancer or do you work for a larger organization?

  • How much do you think your current CAT tool speeds up your translation process as opposed to not using any CAT tool?

  • How often do you use the internet as a reference tool to understand terms or expressions you have to translate? What's the most common type of search? (google, online dictionaries/glossaries, wikipedia, others..)

  • Do you consider previewing the final result, in its original layout, an important feature in the overall translation task?

  • Do you make use of machine translated content? If so, in what percentage does it contribute to your translations?

  • What do you think is the main advantage of using a translation tool: enhancement in accuracy or a faster translation speed?

  • How long did it take to gain acceptable proficiency at producing documents with your current CAT tool?

  • How many different translation memories do you use for any single client? Without considering Translation memories that you are sharing with other translators cooperating in the same project, do you keep separate TMs for the same client?

  • Do you make use of standardized segmentation rule definitions such as SRX?

  • It's understandable that text in target language might be very different in length from the source text. Do you make layout adjustments on your final results? how often?

  • If you could lower, or even reset to zero, the price of your favorite CAT tool, in exchange for bits of your translation memory to be included in a 'collective' memory, would you consider the option? How about if you'd get to choose which segments to share, as to avoid privacy concerns?


    Disclaimer: I'm an engineer, not a translator. If you find errors in the text above, feel free to correct me.
    Please excuse me if I sounded a bit arrogant in the introduction. I know existing professional CAT tools involve deep industry knowledge, many skilled resources and years of fine-tuning; I just want to see if I can pull together a functioning toy.


    Michael Beijer
    United Kingdom
    Member (2009)
    Dutch to English
    @phrasin Jul 11, 2011

    You wouldn't happen to be the person who made: http://phras.in/ would you? I love this site and use it all of the time.

    I am extremely busy at the moment, but will get back and answer your questions ASAP.

    Michael


     
    Antonín Otáhal
    Member (2005)
    English to Czech
    a few replies Jul 11, 2011

    Hi,

    I do find your attitude arrogant, but I do not think it is such a bad thing.

    phrasin wrote:

  • Do you always get to choose which CAT tool to use? Are you being asked by your clients to use a specific CAT tool? If so, are you a freelancer or do you work for a larger organization?


  • Mostly I do. Some clients do want specific CAT tools, but for some time I have only been working for those who choose the ones I like. I am a freelancer.

    phrasin wrote:
  • How much do you think your current CAT tool speeds up your translation process as opposed to not using any CAT tool?


  • Very much. No quantification offered since it would not make much sense anyway.



    phrasin wrote:
  • How often do you use the internet as a reference tool to understand terms or expressions you have to translate? What's the most common type of search? (google, online dictionaries/glossaries, wikipedia, others..)


  • Daily; usually Google to check on "normal usage".

    phrasin wrote:
  • Do you consider previewing the final result, in its original layout, an important feature in the overall translation task?


  • It depends on the character of the job. But usually, yes.

    phrasin wrote:
  • Do you make use of machine translated content? If so, in what percentage does it contribute to your translations?


  • Never.

    phrasin wrote:
  • What do you think is the main advantage of using a translation tool: enhancement in accuracy or a faster translation speed?


  • Neither. The main point is the unified environment for my work.

    phrasin wrote:
  • How long did it take to gain acceptable proficiency at producing documents with your current CAT tool?


  • Hard to say. A month, maybe?

    phrasin wrote:
  • How many different translation memories do you use for any single client? Without considering Translation memories that you are sharing with other translators cooperating in the same project, do you keep separate TMs for the same client?


  • In Transit, which is my preferred tool, your question does not really make sense (old translations themselves are used as reference material, more or less). But in principle, I tend to classify and utilise reference material more by topic than by customer.

    phrasin wrote:
  • Do you make use of standardized segmentation rule definitions such as SRX?


  • No.

    phrasin wrote:
  • It's understandable that text in target language might be very different in length from the source text. Do you make layout adjustments on your final results? how often?


  • Again, are we talking about translating a book of fiction or localising a software tool? The reply very much depends on that.

    phrasin wrote:
  • If you could lower, or even reset to zero, the price of your favorite CAT tool, in exchange for bits of your translation memory to be included in a 'collective' memory, would you consider the option? How about if you'd get to choose which segments to share, as to avoid privacy concerns?



  • I do not like the idea of such a trade-off at all. I do not think it would be very useful anyway.


    I wish you pleasant experimenting. My advice: you should try doing some translating (and not just a little) yourself if you really want to achieve anything in the direction you project.

    Antonin


     
    Tomás Cano Binder, BA, CT
    Spain
    Member (2005)
    English to Spanish
    My two cents Jul 11, 2011

    phrasin wrote:
  • Do you always get to choose which CAT tool to use? Are you being asked by your clients to use a specific CAT tool? If so, are you a freelancer or do you work for a larger organization?

  • I am an independent professional. I try to have the customers' tool just for the interface with them, but internally I use memoQ for everything.

    phrasin wrote:
  • How much do you think your current CAT tool speeds up your translation process as opposed to not using any CAT tool?

  • Approximately 30%, or more if I have a bigger memory.

    phrasin wrote:
  • How often do you use the internet as a reference tool to understand terms or expressions you have to translate? What's the most common type of search? (google, online dictionaries/glossaries, wikipedia, others..)

  • All the time. All kinds of resources.

    phrasin wrote:
  • Do you consider previewing the final result, in its original layout, an important feature in the overall translation task?

  • Absolutely. It helps a lot if available. However, it does not make sense in some formats (like string files, XML files, etc.).

    phrasin wrote:
  • Do you make use of machine translated content? If so, in what percentage does it contribute to your translations?

  • Never, and I think I never will. The results are appalling.

    phrasin wrote:
  • What do you think is the main advantage of using a translation tool: enhancement in accuracy or a faster translation speed?

  • Higher consistency, better end quality, better control and use of customer preferences, more speed when reusing full segments or chunks of them, accurate control of the terminology.

    phrasin wrote:
  • How long did it take to gain acceptable proficiency at producing documents with your current CAT tool?

  • Approximately 3-4 days.

    phrasin wrote:
  • How many different translation memories do you use for any single client? Without considering Translation memories that you are sharing with other translators cooperating in the same project, do you keep separate TMs for the same client?

  • One memory per customer and, if they are agencies, one memory per end customer. If the same end customer has several unconnected divisions, then also one memory per division. I think it makes sense to keep things separate in a business in which you are always working for companies and their competitors.

    phrasin wrote:
  • Do you make use of standardized segmentation rule definitions such as SRX?

  • No.

    phrasin wrote:
  • It's understandable that text in target language might be very different in length from the source text. Do you make layout adjustments on your final results? how often?

  • Yes. Usually in all jobs unless specifically requested not to.

    phrasin wrote:
  • If you could lower, or even reset to zero, the price of your favorite CAT tool, in exchange for bits of your translation memory to be included in a 'collective' memory, would you consider the option? How about if you'd get to choose which segments to share, as to avoid privacy concerns?

  • I would never share anything I translate with anyone other than the corresponding customer. Shared memories are nothing but a violation of our customers' privacy and plain bad for business if you ask me.

    Why on earth would I help my competitors (other translators) benefit from my hard work researching things and trying to make good translations? My knowledge, experience, and ability to translate to my customers' satisfaction are my main assets and I would not give them to anyone for free.


     
    Anton Konashenok
    Czech Republic
    French to English
    My answers Jul 11, 2011

    Do you always get to choose which CAT tool to use? Are you being asked by your clients to use a specific CAT tool? If so, are you a freelancer or do you work for a larger organization?

    I am self-employed and in most cases I choose what I like. However, one of my biggest clients often forces me to use another tool, and I am seriously considering raising my rates by 15-20% for its use because of a buggy user interface, leading to lost production time.
    As a onetime software developer heavily involved in user interface design, I have fairly strong feelings about interface ergonomics, and actually plan to write a paper about the beautiful and ugly features of existing and future CAT tool interfaces.
    How much do you think your current CAT tool speeds up your translation process as opposed to not using any CAT tool?

    15-25% even without a TM, due to the segmentation feature alone. With a TM, anywhere from 15 to 1000%.
    How often do you use the internet as a reference tool to understand terms or expressions you have to translate? What's the most common type of search? (google, online dictionaries/glossaries, wikipedia, others..)

    I don't translate texts I don't understand; however, I use the Internet for research all the time, unless I am working on the road: mostly search engines, and Wikipedia via search engines. As to dictionaries, I strongly prefer offline ones.
    Do you consider previewing the final result, in its original layout, an important feature in the overall translation task?

    I preview it in the native software, never in the CAT tool.
    Do you make use of machine translated content? If so, in what percentage does it contribute to your translations?

    Never, ever. With the kind of texts I translate, editing an MT takes longer than translating the text from scratch.
    What do you think is the main advantage of using a translation tool: enhancement in accuracy or a faster translation speed?

    Enhanced consistency.
    How long did it take to gain acceptable proficiency at producing documents with your current CAT tool?

    About one hour. Speaking of interface ergonomics, most tasks should be self-evident.
    How many different translation memories do you use for any single client? Without considering Translation memories that you are sharing with other translators cooperating in the same project, do you keep separate TMs for the same client?

    For each language pair I work in, I have a catch-all TM for all (or most) clients, plus separate TMs for large projects (by the way, sometimes the same project comes to me via different clients), plus sometimes a separate TM for "garbage" I don't want in my main database.
    Do you make use of standardized segmentation rule definitions such as SRX?

    Not yet.
    It's understandable that text in target language might be very different in length from the source text. Do you make layout adjustments on your final results? how often?

    Only for some jobs, maybe 10%, where the client explicitly requests it or a spontaneous reflow produces poor results.
    If you could lower, or even reset to zero, the price of your favorite CAT tool, in exchange for bits of your translation memory to be included in a 'collective' memory, would you consider the option? How about if you'd get to choose which segments to share, as to avoid privacy concerns?

    No way, and not because I am so conscious about intellectual property rights. I will gladly use terminological glossaries provided by the clients, but I hardly ever use their TMs. Unless you work on a big project managed by a very experienced chief translator vetting and editing all TM entries, using someone else's TM may do more harm than good. Sharing a TM is like sharing a toothbrush. By the way, if a client requires me to use an Internet-based TM, I will either refuse a job or impose a 50% surcharge.
    Having said that, I'll partially reverse myself by saying I may take some time to produce a special TM to be shared with everyone for educational purposes. However, I'll probably want to assert my authorship of it.


     
    Alex Lago
    Spain
    English to Spanish
    Still room for new players Jul 11, 2011

    Glad to hear people are still interested in developing new CAT tools. I also feel CAT tool user interfaces have not kept up with the times. Many of them are also far too resource-intensive: they take up a lot of memory and can be slow with large projects or large TMs. Better use of processing power is needed; certainly none of them are written for dual- or quad-core processing.

    Getting into your questions:
    Do you always get to choose which CAT tool to use? Are you being asked by your clients to use a specific CAT tool? If so, are you a freelancer or do you work for a larger organization?

    No, many agencies have a preferred CAT tool that you have to use if you work with them.

    How much do you think your current CAT tool speeds up your translation process as opposed to not using any CAT tool?

    Anywhere between 10 to 40% depending on the job

    How often do you use the internet as a reference tool to understand terms or expressions you have to translate? What's the most common type of search? (google, online dictionaries/glossaries, wikipedia, others..)

    All the time and I use all of those

    Do you consider previewing the final result, in its original layout, an important feature in the overall translation task?

    It's useful with some file formats not with others

    Do you make use of machine translated content? If so, in what percentage does it contribute to your translations?

    I use it to reduce time, I would say I save about 25% of my time using it.

    What do you think is the main advantage of using a translation tool: enhancement in accuracy or a faster translation speed?

    First of all I think the best thing is having a structured document, seeing the target and source on one screen, many times your TM is useless but having a structured workspace is a great help.

    How long did it take to gain acceptable proficiency at producing documents with your current CAT tool?

    About 1 week

    How many different translation memories do you use for any single client? Without considering Translation memories that you are sharing with other translators cooperating in the same project, do you keep separate TMs for the same client?

    I work with two memories in each job, one which has data from all my jobs and one that is client specific

    Do you make use of standardized segmentation rule definitions such as SRX?

    Yes

    It's understandable that text in target language might be very different in length from the source text. Do you make layout adjustments on your final results? how often?

    This really depends on the document and any space constraints. Sometimes I go months without doing it, and then I have 3 jobs in a row where I have to do it.

    If you could lower, or even reset to zero, the price of your favorite CAT tool, in exchange for bits of your translation memory to be included in a 'collective' memory, would you consider the option? How about if you'd get to choose which segments to share, as to avoid privacy concerns?

    That sounds like a great idea if I get to choose the segments to share. However, I don't see how you would then profit from the TM you created. Bear in mind that the quality of the segments would be difficult to guarantee with hundreds or thousands of translators of varying skills submitting segments.


     
    phrasin
    TOPIC STARTER
    I really appreciate your answers Jul 11, 2011

    @Michael J.W. Beijer: I'm the very same. Phras.in is just a little toy I put together one rainy afternoon; it's not 100% accurate and should be taken with a grain of salt. There are a couple of improvements still in the pipeline (namely, language detection and therefore localized search, and something that shows results from reputable sources at the top of the list).

    @Antonín Otáhal: Thanks for your answers. I'm already trying to do some translations, also using existing tools, but sometimes it's hard to get a view of the whole spectrum of situations a translator may encounter. Your answer on the advantages of using a CAT tool is really interesting; I understand what you mean.

    @Tomás Cano Binder: I see how strongly you feel about not sharing your translation memory, and your reply just confirms my thoughts on the subject. Obviously all the layout questions refer to a classic Word document translation, which I now believe is a much less common type of job than I previously thought.

    @Anton Konashenok: I'd be interested to know your thoughts about good and bad trends in CAT tool user experience and interface design. If you ever write a paper on the subject, I'd be curious to have a look. I think it's interesting that you're much more willing to rely on terminological glossaries provided by your clients than to use their TMs.

    @Alex Lago: Thanks for your answers. You confirm the view of CAT tools as a structured, unified environment. I'm a bit surprised by your answer about machine translation. Do you take any special precautions to avoid excessive reworking?


    Thanks to everyone who replied, I can see some common trends, but it's still early to draw conclusions, please keep them coming.


     
    lobel (X)
    English to Dutch
    My two cents Jul 12, 2011

    phrasin wrote:
  • Do you always get to choose which CAT tool to use? Are you being asked by your clients to use a specific CAT tool? If so, are you a freelancer or do you work for a larger organization?

  • I use Wordfast, in both versions (Word plugin and independent java-based tool).

    phrasin wrote:
  • How much do you think your current CAT tool speeds up your translation process as opposed to not using any CAT tool?

  • About 30% overall. As most of my clients don't ask for it, the use of my CAT tool is generally beneficial.

    phrasin wrote:
  • How often do you use the internet as a reference tool to understand terms or expressions you have to translate? What's the most common type of search? (google, online dictionaries/glossaries, wikipedia, others..)

  • Like Tomás, all the time. All kinds of resources. Most of my resources are not online (standard dictionaries on CD, which I loaded onto my PC).

    phrasin wrote:
  • Do you consider previewing the final result, in its original layout, an important feature in the overall translation task?

  • Yes, but proofreading the output in the original software is crucial.

    phrasin wrote:
  • Do you make use of machine translated content? If so, in what percentage does it contribute to your translations?

  • No, never, and don't want to.

    phrasin wrote:
  • What do you think is the main advantage of using a translation tool: enhancement in accuracy or a faster translation speed?

  • Same as Tomas.

    phrasin wrote:
  • How long did it take to gain acceptable proficiency at producing documents with your current CAT tool?

  • I delivered my first job after two hours of self-training, and it took me a week or so to become proficient.

    phrasin wrote:
  • How many different translation memories do you use for any single client? Without considering Translation memories that you are sharing with other translators cooperating in the same project, do you keep separate TMs for the same client?

  • One memory per end client. I have two language pairs; in most cases they are mixed in the same memory, but in some cases I separated them and now have two memories per end client.

    phrasin wrote:
  • Do you make use of standardized segmentation rule definitions such as SRX?

  • No. Don't know what this is.

    phrasin wrote:
  • It's understandable that text in target language might be very different in length from the source text. Do you make layout adjustments on your final results? how often?

  • Yes, always. Especially for PowerPoint files. I am asked to deliver files which are ready to be used, in the original software, without errors of any kind. The only exception is when the source text comes as a PDF; then I deliver a Word file.

    phrasin wrote:
  • If you could lower, or even reset to zero, the price of your favorite CAT tool, in exchange for bits of your translation memory to be included in a 'collective' memory, would you consider the option? How about if you'd get to choose which segments to share, as to avoid privacy concerns?

  • Don't understand this question. I can only share the memory with my client.


     
    FarkasAndras
    English to Hungarian
    Answers Jul 12, 2011

    phrasin wrote:

  • Do you always get to choose which CAT tool to use? Are you being asked by your clients to use a specific CAT tool? If so, are you a freelancer or do you work for a larger organization?


  • No, some clients ask for a specific CAT tool, some of them with good reason. I'm a freelancer.

    phrasin wrote:

  • How much do you think your current CAT tool speeds up your translation process as opposed to not using any CAT tool?


  • With some jobs, several hundred percent. With other jobs, not measurably.

    phrasin wrote:

  • How often do you use the internet as a reference tool to understand terms or expressions you have to translate? What's the most common type of search? (google, online dictionaries/glossaries, wikipedia, others..)

  • Once every 15 minutes. I just Google stuff and go from there, there is not much point in going to one specific site. Online dictionaries are very unlikely to contain much of use, and glossaries... well, good online glossaries I've found are already in my termbases.

    phrasin wrote:

  • Do you consider previewing the final result, in its original layout, an important feature in the overall translation task?


  • Of course.

    phrasin wrote:

  • Do you make use of machine translated content? If so, in what percentage does it contribute to your translations?


  • No and I don't plan to, ever.

    phrasin wrote:

  • What do you think is the main advantage of using a translation tool: enhancement in accuracy or a faster translation speed?


  • Convenient lookup (TMs and terminology), less typing to do, repetitions inserted automatically. I.e. both, plus convenience.

    phrasin wrote:

  • How long did it take to gain acceptable proficiency at producing documents with your current CAT tool?


  • One week, say.

    phrasin wrote:

  • How many different translation memories do you use for any single client? Without considering Translation memories that you are sharing with other translators cooperating in the same project, do you keep separate TMs for the same client?


  • Define "client". For an end client, usually one unless they have wildly differing projects. For an agency, as many as there are end clients.

    phrasin wrote:

  • Do you make use of standardized segmentation rule definitions such as SRX?


  • No. Maybe my CATs do internally, I have no idea. Translators don't mess around with that stuff.

    phrasin wrote:

  • It's understandable that text in target language might be very different in length from the source text. Do you make layout adjustments on your final results? how often?


  • Yes, maybe in a bit less than half the documents I translate.

    phrasin wrote:

  • If you could lower, or even reset to zero, the price of your favorite CAT tool, in exchange for bits of your translation memory to be included in a 'collective' memory, would you consider the option? How about if you'd get to choose which segments to share, as to avoid privacy concerns?


  • Not really. I have no problem becoming part of the hive mind in principle, but getting permission from the client is too much trouble to be worth it.


    If you're giving this a serious shot, remember that:
    - Your main problem will probably be input formats. Supporting .doc is a must, but it's impossible to do properly. See "Formatting is gone in TWB" and "Can't generate target text" threads, as well as "Track changes is on" etc. You'll have to do your best to clean up the mess made by MS and document authors. You'll also need to support a wide range of input formats (and bilingual formats) if you want to get anywhere with this.
    - Translators don't know jack about computers. They really don't. Make it easy for them.
    - Support TMX, TBX and XLS as TM/terminology import/export formats.
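
    To illustrate the TMX point: TMX is plain XML, with each translation unit (`<tu>`) holding one variant (`<tuv>`) per language and the text itself in `<seg>`. Below is a minimal Python sketch of an importer using only the standard library. The sample data and the name `read_tmx` are made up for the example, inline markup inside `<seg>` is simply flattened to text, and language codes are matched exactly (so "en-US" would not match "en"), which a real importer would probably handle more leniently.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Tiny hand-written sample; real TMX files carry many more attributes.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Press the red button.</seg></tuv>
      <tuv xml:lang="hu"><seg>Nyomja meg a piros gombot.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx(text, source_lang, target_lang):
    """Return (source, target) segment pairs for one language pair."""
    root = ET.fromstring(text.encode("utf-8"))
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions used a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segments[lang] = "".join(seg.itertext())  # flatten inline tags
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


pairs = read_tmx(SAMPLE_TMX, "en", "hu")
```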


     
    MikeTrans
    Germany
    Italian to German
    Translation Memory is a misunderstood feature of a CAT system Jul 12, 2011

    Dear Phrasin,

    I'm glad that a developer like you is looking into CAT systems.
    Rather than going through all your questions (I'm just too lazy with all these quotes), let me tell you this about CAT tools:

    I could have all the Translation Memories in this world, and it still would not change my translation performance very much in terms of finishing my work earlier.
    When I receive a document that contains 500 sentences to translate, there may be 10 segments that give me a 100% match and another 20 matches at 70% or more. This doesn't count for me and I won't get excited, also because my outsourcers are smart enough to strip out such repetitions before sending me a translation (yes, they also have their TMs filled to the top!). Fuzzy matching is just old hat.

    Big databases are, however, good for seeing phrases in context in both the source and the target language (we call that concordance). This is a tool that makes sense, but retrieving such information costs you time; you're not working faster. And this applies to a lot of useful features that could be included in a CAT tool: they would be useful, but they would not save you time, nor should they.

    Terminology databases, dictionaries, (both in CAT tool and online), online research:
    I draw a very thick line between all these, because their purposes are very different.
    A terminology database is just a list of words or phrases and their equivalents, together with their translations. Its only purpose is to get them into the text quickly when you translate. The quicker, the better. Some CAT tools have added some very interesting features lately to accomplish this. For other purposes, these databases are useless, especially when you translate another topic later on. You then have to switch to other databases.
    Dictionaries: These are the starting point of your research for an exact translation. Note that I said a starting point, not a tool that can be blindly followed. I never want to use them or even see them in a CAT tool, if only to avoid blindly copying what they propose.
    Online search: My most useful online search is not Wikipedia, IATE, Eurodic, etc., but a simple Google search that looks like this:
    +"prio-" +benzol +der +die -nitrate
    I've made a lot of different scripts to automate such searches.
    This allows me to see what I am searching for within a larger text, in the language I want. The author and the source of such a link are very important too, much more so than the terms I could retrieve with the search.
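A script of this kind could look roughly like the Python sketch below. It is not one of the scripts mentioned above, only an illustration: the `build_query` and `search_url` helpers are hypothetical names, and the only thing assumed about Google is the ordinary `q` parameter of its search URL.

```python
from urllib.parse import urlencode


def build_query(phrases=(), required=(), excluded=()):
    """Combine quoted phrases, required words and excluded words into one query."""
    parts = [f'+"{p}"' for p in phrases]      # exact phrases, quoted
    parts += [f"+{w}" for w in required]      # words that must appear
    parts += [f"-{w}" for w in excluded]      # words that must not appear
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL for the query, percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(
    phrases=["prio-"],
    required=["benzol", "der", "die"],   # "der" and "die" steer results towards German pages
    excluded=["nitrate"],
)
url = search_url(query)
print(query)
print(url)
```

Adding a few very common words of the target language as required terms, as in the query above, is a cheap way to bias the results towards pages written in that language.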

    Personally I work with several CAT tools, though I am trying to concentrate my work on a single one, but there are times when I don't use them at all: I'm voicing-over my translation and do the polishing afterwards. This polishing does not have to be quick, but it has to be very high in quality; and this quality has nothing to do with the way I'm working in a CAT tool, although the tool may contain useful Quality Assurance features.

    When I find the time I will go through your questions above.

    Greets,
    Mike

    [Edited at 2011-07-13 10:37 GMT]


     
    Michael Beijer
    Michael Beijer  Identity Verified
    United Kingdom
    Local time: 23:07
    Member (2009)
    Dutch to English
    + ...
    SINGLE SEARCH BOX Jul 12, 2011

    Dear Phrasin,

    Have you ever considered trying to develop some sort of SINGLE SEARCH BOX that accesses all of the sites below and presents the (bilingual) results in a handy two-column page with contextual info, etc.?

    ~ ~ ~ ~ ~ ~ ~ ~ ~

    1. (your own) http://phras.in/ site
    2. IntelliWebSearch (http://www.intelliwebsearch.com/index.asp)
    3. http://www.webitext.com/bin/webitext.cgi
    4. http://www.proz.com/search/
    5. http://eur-lex.europa.eu/RECH_mot.do
    6. http://www.mijnwoordenboek.nl/
    7. http://iate.europa.eu/iatediff/SearchByQueryEdit.do
    8. http://mymemory.translated.net/
    9. http://www.websters-online-dictionary.org/
    10. http://translatorscafe.com/tcterms/EN/
    11. http://www.tecdic.com/
    12. https://www.tausdata.org/
    13. http://translate.google.com/
    14. http://www.microsofttranslator.com/
    15. http://www.globalglossary.org/
    etc.

    ~ ~ ~ ~ ~ ~ ~ ~ ~

    As far as translating is concerned, I am more than happy with memoQ.
    However, I would very much like to have access to one single search box (in a Chrome tab), where I could instantly search/scrape/dig through ALL of the various online databases I consult on a daily basis, via all of their different interfaces.
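To make the request concrete, the simplest form of such a search box only needs one URL template per site. The Python sketch below is a rough illustration under loud assumptions: the site names and URL templates are placeholders on `.example` domains, not the real query formats of the sites listed above, and `build_search_urls` is a hypothetical helper. It only builds the links; fetching and merging the results into two columns would depend on what each site actually exposes.

```python
from urllib.parse import quote_plus

# Hypothetical templates: every real site uses its own parameters, so these
# placeholders would have to be replaced with each site's actual search URL.
SITES = {
    "glossary-a": "https://glossary-a.example/search?term={query}",
    "corpus-b": "https://corpus-b.example/find?q={query}&pair={src}-{tgt}",
}


def build_search_urls(query, src, tgt, sites=SITES):
    """Fill one search term and language pair into every site's URL template."""
    encoded = quote_plus(query)  # make the term safe for use inside a URL
    return {
        name: template.format(query=encoded, src=src, tgt=tgt)
        for name, template in sites.items()
    }


urls = build_search_urls("bill of lading", src="nl", tgt="en")
for name, link in urls.items():
    print(f"{name}: {link}")
```

Each link could then be opened in its own browser tab or frame, which gets most of the way to "one box, many sources" without scraping anything.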

    Michael


     
    Samuel Murray
    Samuel Murray  Identity Verified
    Netherlands
    Local time: 00:07
    Member (2006)
    English to Afrikaans
    + ...
    Some answers (without having read the thread) Jul 12, 2011

    phrasin wrote:
  • Do you always get to choose which CAT tool to use? Are you being asked by your clients to use a specific CAT tool? If so, are you a freelancer or do you work for a larger organization?


  • I'm a freelancer, and I work mostly for agencies. Half of my agencies don't require CAT tools, but those who do generally prefer that I work in a tool of their choosing. They either tell me which tool they want me to use, or they tell me in what format they want the translation delivered (and that is a dead give-away, e.g. ttx means Trados, txml means Wordfast, etc), or they send the files and/or TM in a format that is specific to one particular CAT tool. That said, very few (if any) of my clients insist that I use *no other tool* than the one they have in mind... as long as the file that is delivered complies with their expectations and I'm able to make use of their reference files.

  • How much do you think your current CAT tool speeds up your translation process as opposed to not using any CAT tool?


  • For texts with no repetitions and no TM matches, the CAT tool makes me about twice as fast as I would have been without it. The same applies to situations where I have to use an unfamiliar or non-preferred CAT tool -- using my preferred CAT tool is twice as fast (or faster).

  • How often do you use the internet as a reference tool to understand terms or expressions you have to translate? What's the most common type of search? (google, online dictionaries/glossaries, wikipedia, others..)


  • Usually, I first check the client's TM itself, then I check my generic subject-field TM for that field, then I check dictionaries installed on my computer, and finally I check the internet. I use Google, but I sometimes add keywords if I want a specific result (e.g. I add "wiki" if I want a Wikipedia result).

  • Do you consider previewing the final result, in its original layout, an important feature in the overall translation task?


  • Only if the client wants me to deliver it in that format.

  • Do you make use of machine translated content? ...


  • Yes, sometimes. But the main advantage of it for me is to increase typing speed -- I still change between half and 3/4 of the suggested translation anyway. I think that when speech-to-text becomes available for my target language some time in the future, I'll probably stop using MT altogether.

  • What do you think is the main advantage of using a translation tool: enhancement in accuracy or a faster translation speed?


  • [Edited, because I misread your question] Both. But even without a speed increase, the increase in consistency would still be a major advantage.

  • How long did it take to gain acceptable proficiency at producing documents with your current CAT tool?


  • Acceptable proficiency: half an hour.
    Expert-user proficiency: about two years.

  • How many different translation memories do you use for any single client? Without considering Translation memories that you are sharing with other translators cooperating in the same project, do you keep separate TMs for the same client?


  • I keep separate TMs for separate jobs, although I do add some subject-specific TMs to a large TM which I then consult for concordance searches only. I translate a lot of similar material but for clients who have very different terminological preferences, so "sharing" TMs between clients produces very little increase in productivity.

  • Do you make use of standardized segmentation rule definitions such as SRX?


  • No, that is for geeks only. I segment the text as I go along. If the source file is in a presegmented format, then obviously it has to be kept that way, although I will subsegment it temporarily (to get better matching from the TM) if the situation requires it and if I'm using a program that allows it.

  • It's understandable that text in target language might be very different in length from the source text. Do you make layout adjustments on your final results? how often?


  • If the situation seems to warrant it, I try to keep the translation the same or similar length as the source text. This applies to e.g. software localisation or PowerPoint presentations. However, for a normal document with paragraph text, I don't worry about segment length.

  • If you could lower, or even reset to zero, the price of your favorite CAT tool, in exchange for bits of your translation memory to be included in a 'collective' memory, would you consider the option? How about if you'd get to choose which segments to share, as to avoid privacy concerns?


  • No, because sharing the TM means breaching confidentiality. I consider the translation to be my property (with copyright belonging to me), but the issue of non-disclosure (whether agreed in writing or just assumed by both parties) makes it impossible to share it.



    [Edited at 2011-07-12 20:22 GMT]

    [Edited at 2011-07-12 20:29 GMT]


     
    Samuel Murray
    Samuel Murray  Identity Verified
    Netherlands
    Local time: 00:07
    Member (2006)
    English to Afrikaans
    + ...
    More comments, on your initial mail Jul 12, 2011

    phrasin wrote:
    Lately I came across the colorful world of computer assisted translation tools, and I couldn't help but notice that all the main players seems stuck at 1997, especially when it comes to user interface design, but also because of their classic enterprise software approach on the translation task. Even those new cloud-based solutions seems to be affected by the very same disease.


    What do you mean? Can you give some examples of aspects of UI design that are 1997-ish? What is meant by a "classic enterprise approach"?

    I've seen a common pattern: time wasted at checking out wrong suggestions, even generated with a high fuzzy value, was often more than the time saved by the use of correct matches.


    Time wasted checking out suggestions may also be due to design issues, e.g. if the suggestion is displayed in a location that is far away from where the translation is to be typed, or if the suggestion does not tell the user how its previous source text differs from the current source text.
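As an illustration of that last point, a tool can mark exactly which words changed between the TM's source text and the current source text. The following is a minimal Python sketch using the standard `difflib` module; the `mark_differences` function and the bracket notation are hypothetical choices for this example, not how any particular CAT tool displays differences.

```python
import difflib


def mark_differences(old_source, new_source):
    """Mark words removed from the old source as [-...-] and words added as {+...+}."""
    old_words = old_source.split()
    new_words = new_source.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(new_words[j1:j2])
            continue
        if i1 < i2:  # words present only in the TM's source text
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j1 < j2:  # words present only in the current source text
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


marked = mark_differences(
    "Press the red button to start.",
    "Press the green button to start.",
)
print(marked)
```

With the difference marked inline, the translator only has to check the one changed word in the suggested target instead of re-reading both source sentences.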

    The main value of using that software seemed to be constituted by the workflow and terminology management features, and by the very large number of file formats and language pairs supported.


    Yes. In olden days the main advantage of CAT tools was regarded as the increase in speed due to fuzzy matching of segments, but that is really only a small part of it, since many CAT tools have now matured to include other features which increase quality and productivity in ways that do not necessarily involve fuzzy matching. Examples include easy glossary management, searching of reference materials and old translations specifically for terminology, typing assistants, and automated QA utilities.

    I discovered some clever ways of showing terminology suggestions...


    Good UI design can have a great influence on productivity, yes. Or rather, bad UI design can.

    I'm trying to solve the string matching problem using massive non-relational data structures and a pattern frequency approach, instead of length-based Levenshtein distance derivative fuzzy algorithms.


    In my line of work, better fuzzy matching would only be useful if it relates to sub-50% matching. I guess it also depends on the language combination, right? I'm quite satisfied with my current tool's fuzzy matching capabilities. It only becomes a problem if there are many, many matches for a segment (e.g. if a segment is quite long and has ten or more matches).
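For readers unfamiliar with the term, the length-based Levenshtein approach mentioned in the quote can be sketched as below. This is the textbook character-level algorithm with one common way of turning the distance into a percentage; it is an assumption for illustration only, since commercial tools compute their match percentages in their own ways, and it is not the pattern frequency approach that phrasin is experimenting with.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def fuzzy_score(a, b):
    """Distance normalised by the longer string's length, as a percentage."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


old = "Press the red button."
new = "Press the green button."
print(levenshtein(old, new), round(fuzzy_score(old, new)))
```

Because the score is normalised by length, a one-word change in a long segment still scores high while the same change in a short segment scores low, which is one reason high fuzzy values do not always mean a useful suggestion.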


     
    phrasin
    phrasin
    TOPIC STARTER
    Thanks! Jul 13, 2011

    @FarkasAndras: When I say client, I mean end-client. I can see most of you use two TMs per document: one is specific to the end-client, sometimes including a TM provided by the client, and the other is basically a collection of all the TMs generated in your previous work on the subject. Supporting many formats, especially office files, is a bit of a challenge, and it's even more challenging to do so in a web-based application. Right now, I've found a way to deal with office files that allows seamless editing on the original layout, but there is still a lot of unit testing to do. TMX and other open TM formats would definitely be supported if a serious application is ever deployed; at the moment I'm just experimenting.
    You're absolutely right about 'making it easy', or rather, keeping it simple. Translating means taking a text and rewriting it in another language while keeping the very same meaning. It's pretty straightforward, and a translation tool should just *help* translators do that (and not act as a 'shortcut').

    @MikeTrans: Thanks for your comment. I see your point, and I also have the feeling that translation memory matching is an overrated feature, and shouldn't be regarded as the focal point of a CAT tool.

    @Michael J.W. Beijer: I'm not planning to make any kind of single search box thing, but I understand your request. The issue here is that I could programmatically do that only if each one of those services were exposing some type of public interface, otherwise I won't be able to access their data. I'll have a look into it but I can't promise anything. Anyway, that's a list of useful resources, some of which I didn't know, so thanks for sharing.

    @Samuel Murray: I can understand agencies' requirements in terms of deliverable formats, and how they can be a constraint. I was more interested in understanding whether you, internally, can use the workbench of your choice, and I think you explained it well, thanks.
    To answer your second post:
    What do you mean? Can you give some examples of aspects of UI design that are 1997-ish? What is meant by a "classic enterprise approach"?

    By classic enterprise approach I mean a modus operandi driven by the eagerness to sell more copies, aiming to include as many functionalities as possible. This certainly leads to powerful software, but with time it also makes developers lose focus on the core problem their application is trying to address, which in turn makes the application a confusing heap of components, each with its own configuration and settings, that scares the inexperienced user and the average one as well.
    It's not easy for me to define the single aspects that make a piece of software feel 1997ish; it's a combination of things. For instance, the excessive number of mouse clicks needed to get things done, the need to change window or fire up functionalities for simple lookups, drop-down menus on top, using remote resources through complicated remote database connections, and many others I can't list here.
    This is probably a wrong example, but here you go:
    (Money 2000) http://www.youtube.com/watch?v=QYGPbXZr0S8
    (Mint.com) http://www.youtube.com/watch?v=rK6WLHNYjwM
    These two programs were created with the very same goal: managing personal finances. Can you spot the difference? You can't really get that from these videos, but the second one isn't just 'prettier', it's better from every point of view. IMHO current translation tools still feel like the first video.


     
    MikeTrans
    MikeTrans
    Germany
    Local time: 00:07
    Italian to German
    + ...
    Here my answers as promised... Jul 14, 2011

    phrasin wrote:


  • Do you always get to choose which CAT tool to use? Are you being asked by your clients to use a specific CAT tool? If so, are you a freelancer or do you work for a larger organization?

    I'm a freelancer using several tools, and I have the freedom to choose among them. Under specific conditions I will agree to use the client's CAT tool upon request.

  • How much do you think your current CAT tool speeds up your translation process as opposed to not using any CAT tool?

    With some technical texts with short sentences quite a lot; otherwise, not at all.

  • How often do you use the internet as a reference tool to understand terms or expressions you have to translate? What's the most common type of search? (google, online dictionaries/glossaries, wikipedia, others..)

    At the very beginning of my work to clarify some topic-related keys; a simple Google search well done helps a lot.

  • Do you consider previewing the final result, in its original layout, an important feature in the overall translation task?

    This is part of my job as a translator.

  • Do you make use of machine translated content? If so, in what percentage does it contribute to your translations?

    Very sparingly, if at all. Years ago MT was more a joke than a feature. If I am in a somewhat good mood, and not too stressed, I may give it a try just to laugh afterwards.

  • What do you think is the main advantage of using a translation tool: enhancement in accuracy or a faster translation speed?

    Accuracy is my job alone. The CAT tool should help me get my translation into written form as quickly as possible.

  • How long did it take to gain acceptable proficiency at producing documents with your current CAT tool?

    30 minutes, 2 hours (with all different tools). The post-learning curve is very time-consuming in some cases.

  • How many different translation memories do you use for any single client? Without considering Translation memories that you are sharing with other translators cooperating in the same project, do you keep separate TMs for the same client?

    My personal databases are topic-related, not client-related, also for confidentiality reasons.

  • Do you make use of standardized segmentation rule definitions such as SRX?

    eeh?

  • It's understandable that text in target language might be very different in length from the source text. Do you make layout adjustments on your final results? how often?

    It's part of my job as a translator to return my work in the form (format) the client has sent to me. If my client insists on delivery in a specific format and I cannot handle it, then I must refuse the job. Working on the final layout is essential: the client doesn't care about the output of my CAT tool, and that's not what I'm paid for; he wants to see exactly the same layout, with one difference: the text must have been translated. Also, an experienced translator will adapt his translation to the layout: reading the manual of your car when you have an electrical failure is different from reading the same text on a webpage when you have the time to go in depth (I agree that this is hard to really understand for a non-translator).
    For the translator, the layout task is not always that simple! As a feature in a CAT tool I would *highly* enjoy plugins for every format used in the publishing industry, in order to be able to open the native formats and make adjustments to the layout if necessary. No CAT tool has this possibility.

  • If you could lower, or even reset to zero, the price of your favorite CAT tool, in exchange for bits of your translation memory to be included in a 'collective' memory, would you consider the option? How about if you'd get to choose which segments to share, as to avoid privacy concerns?

    Are you serious? Will your dentist or your doctor spread your stories on the street?




  • Please also read my (updated) post a little above. I hope this will help you.

    Greets,
    Mike


    [Edited at 2011-07-14 23:35 GMT]


     

