From Ms Word table to TMX file Thread poster: Hans Lenting
|
On FB, in the mQ forum, someone asked: "I have a bilingual .docx document. Is there any way to create TM from it without necessity to create a separate source file for every language?" Since he has closed the post for replies, I'm replying here, as this may be of interest to others too. So we have: And would like to get this, without any file splitting or conversion: That shouldn't be too complicated, macro-wise, should it? | | | Samuel Murray Netherlands Local time: 10:31 Member (2006) English to Afrikaans + ...
You may be able to write a single macro that does everything, but I would not be able to do that. If I were to do this, some steps would be manual.

The very first thing to do is to use find/replace to remove or convert any characters that can break the XML. For example, replace & with &amp;, or replace < with &lt;.

Then, you have to add a column to the left, and then populate the column with numbers starting from 1. I have a macro (I can't remember where I got it from) that does that: you type "1" in the first cell and then run the macro, and it adds numbers to the cells below it:

    Sub AddNumbersToTable()
        Dim RowNum As Long
        Dim ColNum As Long
        Dim iStartNum As Integer
        Dim J As Integer

        If Selection.Information(wdWithInTable) Then
            RowNum = Selection.Cells(1).RowIndex
            ColNum = Selection.Cells(1).ColumnIndex
            ' Read the starting number from the selected cell
            iStartNum = Val(Selection.Cells(1).Range.Text)
            If iStartNum <> 0 Then
                iStartNum = iStartNum + 1
                ' Number every cell below the selected one in the same column
                For J = RowNum + 1 To ActiveDocument.Tables(1).Rows.Count
                    ActiveDocument.Tables(1).Cell(J, ColNum).Range.Text = iStartNum
                    iStartNum = iStartNum + 1
                Next
            Else
                MsgBox "Cell doesn't contain a non-zero starting number."
                Exit Sub
            End If
        Else
            MsgBox "Not in table"
        End If
    End Sub

The next step is to convert the table to tabbed text. You can do that manually, but I use a macro for that:

    Sub TablesConvert_to_tab()
        For Each aTable In ActiveDocument.Tables
            aTable.ConvertToText wdSeparateByTabs, True
        Next aTable
    End Sub

(this macro processes all tables in the file, though)

Then, you have to make sure that there is one blank line at the top of the text, and one blank line underneath the text. And then you just record a find/replace macro:

    Sub atable2tmx()
    '
    ' atable2tmx Macro
    ' Macro recorded 8/20/2022
    '
        ' Turn paragraph marks into manual line breaks
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
            .Text = "^p"
            .Replacement.Text = "^l"
            .Forward = True
            .Wrap = wdFindContinue
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
            .Text = "^p"
            .Replacement.Text = "^l"
            .Forward = True
            .Wrap = wdFindContinue
        End With
        ' Match: line break, row number, tab
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
            .Text = "^l([0-9]@)^t"
            .Replacement.Text = "^l"
            .Forward = True
            .Wrap = wdFindContinue
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
        ' Match: the tab between the two text columns
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
            .Text = "^t"
            .Replacement.Text = ""
            .Forward = True
            .Wrap = wdFindContinue
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
        Selection.HomeKey Unit:=wdStory
        Selection.Delete Unit:=wdCharacter, Count:=1
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
            .Text = "^l"
            .Replacement.Text = "^l"
            .Forward = True
            .Wrap = wdFindContinue
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
        ' Type a ### marker at the top and replace it
        Selection.HomeKey Unit:=wdStory
        Selection.TypeText Text:="###"
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
            .Text = "###"
            .Replacement.Text = _
                "^p^p"
            .Forward = True
            .Wrap = wdFindContinue
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
        ' Type a ### marker at the end and replace it
        Selection.HomeKey Unit:=wdStory
        Selection.EndKey Unit:=wdStory
        Selection.TypeText Text:="###"
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
            .Text = "###"
            .Replacement.Text = ""
            .Forward = True
            .Wrap = wdFindContinue
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
        ' Turn the manual line breaks back into paragraph marks
        With Selection.Find
            .Text = "^l"
            .Replacement.Text = "^p"
            .Forward = True
            .Wrap = wdFindContinue
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
    End Sub

You can also create a macro that calls other macros, so you can automate everything in separate macros and then call them all from a single macro that you run once. | | |
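The first step in the post above (converting the characters that can break the XML) could also be done by a macro instead of a manual find/replace. The following is only a sketch: the macro name EscapeXmlCharacters is made up for illustration, and the handling of > is an addition the post does not mention, so treat it as optional. It has to run before any TMX markup is inserted, otherwise the markup itself would be escaped.

```vba
Sub EscapeXmlCharacters()
    ' Hypothetical helper, not part of the macros above.
    ' Pairs of (find, replace). The ampersand comes first so that the
    ' ampersands inside &lt; and &gt; are not escaped a second time.
    Dim pairs As Variant
    Dim i As Long

    pairs = Array("&", "&amp;", "<", "&lt;", ">", "&gt;")

    For i = LBound(pairs) To UBound(pairs) Step 2
        With ActiveDocument.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = pairs(i)
            .Replacement.Text = pairs(i + 1)
            .Forward = True
            .Wrap = wdFindStop
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            ' Wildcards must be off: < and > are operators in wildcard mode
            .MatchWildcards = False
            .Execute Replace:=wdReplaceAll
        End With
    Next i
End Sub
```

Run it on the document (or on a copy of it) while the table is still plain text without any markup.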
I've done this in the past with Heartsome TMX Editor. Bilingual table in Word to TMX is a supported functionality just requiring a few clicks. I think you need to have the language initials at the top of the columns, but to confirm the formatting just export a TMX to Word to see how it looks (or use it as a template). | | | Stepan Konev Russian Federation Local time: 11:31 English to Russian Language codes | Aug 20, 2022 |
Philippe Locquet wrote: I've done this in the past with Heartsome TMX Editor. Bilingual table in Word to TMX is a supported functionality just requiring a few clicks. I think you need to have the language initials at the top of the columns, but to confirm the formatting just export a TMX to Word to see how it looks (or use it as a template). You are absolutely right. All you need is a 2-column table with a pair of language codes at the top of it (headers) and the remaining part of the table below (segments). | |
|
|
Excel Bilingual file in Trados | Aug 21, 2022 |
My easiest way is to copy and paste the Word text into an Excel file. In Trados I set the Excel file type to bilingual: the first column is source and the second column is target text. By setting the file type to automatically mark the translation as "translated" in Trados, the bilingual Trados file can be exported via a TM to a format such as TMX. This procedure is quick and transparent. Regards, Soonthon Lupkitaro | | | Samuel Murray Netherlands Local time: 10:31 Member (2006) English to Afrikaans + ...
Hans Lenting wrote: At FB in the mQ forum someone asked: I have a bilingual .docx document. Is there any way to create TM from it without necessity to create a separate source file for every language? What does he mean by "separate source file"? | | | Samuel Murray Netherlands Local time: 10:31 Member (2006) English to Afrikaans + ...
Philippe Locquet wrote: I've done this in the past with Heartsome TMX Editor. Yes, although Heartsome doesn't include TU IDs: | | | Stepan Konev Russian Federation Local time: 11:31 English to Russian
Samuel Murray wrote: Yes, although Heartsome doesn't include TU IDs Why do you need TU IDs? When you import a tmx file into a CAT tool, all the entries get arranged accordingly and obtain their IDs and other attributes. Or probably I misunderstand something... | |
|
|
Hans Lenting Netherlands Member (2006) German to Dutch TOPIC STARTER Do not split | Aug 21, 2022 |
Samuel Murray wrote: What does he mean by "separate source file"? As far as I know, he didn't want to create 2 files: one with the left column, one with the right column. | | | Hans Lenting Netherlands Member (2006) German to Dutch TOPIC STARTER Not necessary | Aug 21, 2022 |
Samuel Murray wrote: Yes, although Heartsome doesn't include TU IDs: And I don't think these are necessary in this context. | | | Hans Lenting Netherlands Member (2006) German to Dutch TOPIC STARTER Very thorough | Aug 21, 2022 |
Your approach is very thorough. For instance: I hadn't thought of converting the ampersands. My initial thought was: it would be nice to have a macro to use when you're reading/proofreading an Ms Word document that contains a bilingual table with two columns and you want to create a simple TMX file from that table. Perhaps retaining the bold and italic formatting, nothing fancy. Language codes can either be requested via a prompt or set to e.g. en_US and de_DE, to be changed later. So, the macro should:
- Copy the whole table.
- Replace bold and italic formatting and ampersands with markup.
- Replace TAB characters and NEWLINE characters with the correct strings to create valid TUs.
- Prepend the first lines of a TMX file (up to the BODY markup) to the clipboard's content.
- Append the /TMX closing markup to the clipboard's content.
- Write the clipboard's content to a file with the extension '.tmx'.
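Most of the steps listed above can be sketched as plain string handling, without touching the clipboard. This is only a rough illustration under several assumptions: the function names XmlEscape and BuildTmx are invented here, the input is the tab-delimited text you get after converting the table to text (one source/target pair per line), and the header attribute values (creationtool, o-tmf and so on) are placeholders. Bold/italic markup and writing the file are left out.

```vba
Option Explicit

' Hypothetical helper: escape characters that would break the XML.
' The ampersand is replaced first so the other entities stay intact.
Function XmlEscape(ByVal s As String) As String
    s = Replace(s, "&", "&amp;")
    s = Replace(s, "<", "&lt;")
    s = Replace(s, ">", "&gt;")
    XmlEscape = s
End Function

' Hypothetical helper: build a TMX string from tab-delimited text.
' Lines without a tab (for example blank lines) are skipped.
Function BuildTmx(ByVal tabbedText As String, _
                  ByVal srcLang As String, _
                  ByVal tgtLang As String) As String
    Dim textLines() As String
    Dim parts() As String
    Dim tmx As String
    Dim i As Long

    ' Header attribute values below are placeholders (assumptions)
    tmx = "<?xml version=""1.0"" encoding=""utf-8""?>" & vbLf
    tmx = tmx & "<tmx version=""1.4"">" & vbLf
    tmx = tmx & "<header creationtool=""WordMacro"" creationtoolversion=""1.0""" & _
                " segtype=""sentence"" o-tmf=""unknown"" adminlang=""en-US""" & _
                " srclang=""" & srcLang & """ datatype=""plaintext""/>" & vbLf
    tmx = tmx & "<body>" & vbLf

    ' Normalise Word paragraph marks (vbCr) and CR/LF pairs to vbLf
    tabbedText = Replace(tabbedText, vbCrLf, vbLf)
    tabbedText = Replace(tabbedText, vbCr, vbLf)
    textLines = Split(tabbedText, vbLf)

    For i = LBound(textLines) To UBound(textLines)
        If InStr(textLines(i), vbTab) > 0 Then
            ' Split at the first tab only: source, then target
            parts = Split(textLines(i), vbTab, 2)
            tmx = tmx & "<tu>" & _
                  "<tuv xml:lang=""" & srcLang & """><seg>" & XmlEscape(parts(0)) & "</seg></tuv>" & _
                  "<tuv xml:lang=""" & tgtLang & """><seg>" & XmlEscape(parts(1)) & "</seg></tuv>" & _
                  "</tu>" & vbLf
        End If
    Next i

    BuildTmx = tmx & "</body>" & vbLf & "</tmx>" & vbLf
End Function

' Usage: prints the TMX text to the Immediate window
Sub DemoBuildTmx()
    Debug.Print BuildTmx("R&D" & vbTab & "O&O" & vbCr & _
                         "Hello" & vbTab & "Hallo", "en-US", "nl-NL")
End Sub
```

Passing the language codes as parameters keeps the option open to ask for them via a prompt or to hard-code defaults and change them later.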
[Edited at 2022-08-21 12:41 GMT] | | | Samuel Murray Netherlands Local time: 10:31 Member (2006) English to Afrikaans + ...
Hans Lenting wrote: Replace bold and italic formatting and ampersands with markup.

Before you deal with formatting, you have to ask yourself what kind of TMX file your target system will accept. If it's a fairly modern system, it should be able to handle standard TMX formatting tags, but it may be that your CAT tool has specific additional requirements, e.g. that the formatting tags must look a certain way. For example, the sentence "The cat sat on the mat." would have to end up like this:

<seg>The <bpt type="b">{{b}}</bpt>cat<ept>{{/b}}</ept> sat on the <bpt type="i">{{i}}</bpt><bpt type="u">{{u}}</bpt>mat<ept>{{/u}}</ept><ept>{{/i}}</ept>.</seg>

The problem is, it's easy to replace a bold character with the same character plus markup, but it's not easy to replace a run of bold characters with the same characters plus markup. And I don't think any CAT tool can automatically convert this:

<b>t</b><b>h</b><b>i</b><b>s</b>

into this:

<b>this</b>

Can you think of a Find syntax in Word that would find a piece of bold text and select the entire bold text? I can't. This is because Word regex is non-greedy, so you can't tell it to select an entire piece of bold text. That said (just thinking out loud), you could tell it to replace this: </b><b> with nothing.
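The idea at the end of the post above, deleting every </b><b> seam so that neighbouring single-character tags merge into one, could look like this as a plain string function. It is a sketch only: the name CollapseAdjacentTags is invented, and it works on a string that already contains the markup, not on Word's own formatting.

```vba
' Hypothetical helper: merge directly adjacent runs of the same tag,
' e.g. <b>t</b><b>h</b> becomes <b>th</b>.
Function CollapseAdjacentTags(ByVal s As String, _
                              ByVal tagName As String) As String
    Dim seam As String

    ' A closing tag immediately followed by the same opening tag
    seam = "</" & tagName & "><" & tagName & ">"

    ' Repeat until no seam is left, because removing one seam
    ' can bring another closing/opening pair together
    Do While InStr(s, seam) > 0
        s = Replace(s, seam, "")
    Loop

    CollapseAdjacentTags = s
End Function

' Usage: prints <b>this</b> to the Immediate window
Sub DemoCollapseAdjacentTags()
    Debug.Print CollapseAdjacentTags("<b>t</b><b>h</b><b>i</b><b>s</b>", "b")
End Sub
```

Runs that are separated by anything, even a single unformatted space, are left as separate tags, which is usually what you want.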
[Edited at 2022-08-21 13:58 GMT] | |
|
|
Hans Lenting Netherlands Member (2006) German to Dutch TOPIC STARTER
    Sub TabletoTMX()
        Dim rngTemp As Range
        Dim tableTemp As Table

        ' Stop AutoFormat from replacing straight quotes with curly quotes
        Options.AutoFormatReplaceQuotes = False

        ' Copy the table at the cursor into a new document
        Selection.Tables(1).Select
        Selection.Copy
        Documents.Add
        Selection.Paste

        ' Convert the pasted table to tab-separated text
        Set tableTemp = ActiveDocument.Tables(1)
        Set rngTemp = _
            tableTemp.ConvertToText(Separator:=wdSeparateByTabs)
        Selection.Delete

        ' Each paragraph mark closes one TU and opens the next
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
            .Text = "^p"
            .Replacement.Text = _
                "«/seg»«/tuv»«/tu»^p«tu»«tuv xml:lang=""en-US""»«seg»"
            .Forward = False
            .Wrap = wdFindAsk
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Selection.Find.Execute Replace:=wdReplaceAll

        ' Each tab closes the source TUV and opens the target TUV
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
            .Text = "^t"
            .Replacement.Text = "«/seg»«/tuv»«tuv xml:lang=""nl-NL""»«seg»"
            .Forward = False
            .Wrap = wdFindAsk
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Selection.Find.Execute Replace:=wdReplaceAll

        ' Type the closing markup at the cursor, then jump to the
        ' start of the document and type the opening markup
        Selection.TypeText Text:="«/seg»«/tuv»«/tu»«/body»«/tmx»"
        Selection.HomeKey Unit:=wdStory
        Selection.TypeText Text:="«?xml version=""1.0"" encoding=""utf-8""?»«tmx version=""1.4""»«header»«/header»«body»«tu»«tuv xml:lang=""en-US""»«seg»"

        ' Save as plain text, UTF-8 (code page 65001), LF line endings
        ActiveDocument.SaveAs2 FileName:="memory.tmx", FileFormat:= _
            wdFormatText, LockComments:=False, Password:="", AddToRecentFiles:=True, _
            WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
            SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
            False, Encoding:=65001, InsertLineBreaks:=False, AllowSubstitutions:= _
            False, LineEnding:=wdLFOnly
    End Sub

[Edited at 2022-08-22 10:52 GMT] | | | Hans Lenting Netherlands Member (2006) German to Dutch TOPIC STARTER Different approach | Aug 22, 2022 |
Since you cannot Find and Replace in the clipboard (perhaps I'm missing something?), I chose another approach, via a temporary document. See the posting above. This approach doesn't handle bold and italics, nor the ampersand. Perhaps I'll add that some day.
[Edited at 2022-08-22 11:02 GMT] | | | Hans Lenting Netherlands Member (2006) German to Dutch TOPIC STARTER
Demo: | | |