HalvdanN of Confex Dev
2/25/2020 - 9:27 AM

Translation of dbo.titleElementsOfficial_N_S_E_tmp; official titles N-S-E split into a single element column

This translates from the source language into the missing languages. You can choose between googletrans and home-made functions for the Google API. For later bulk translation of elements in titles from Inputmaster, Google's downloaded file must be used. That requires an advanced setup on Google Cloud, which is left to Jesper. When that file is ready, the translations can be verified manually. The result ends up in the verified title table. The classified elements should also be moved into ElementMaster with language as an additional field. Titles we entered ourselves in the title administration in Vicky are no longer included here as of 250220, only the official tables from SSB, SCB and ILO. S = Same, O = Original, G = Google.
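
The choice between the two translation methods can also be expressed as an ordered fallback, so that a crash in one backend hands the text to the next instead of stopping the run. The following is a minimal sketch under assumptions: `translate_with_fallback`, `flaky_backend` and `stub_backend` are hypothetical names that do not exist in the script, and the stand-in backends do not call googletrans or the Google API.

```python
from typing import Callable, Sequence

# A translator here is any callable (text, src, dest) -> translated text.
Translator = Callable[[str, str, str], str]


def translate_with_fallback(text: str, src: str, dest: str,
                            translators: Sequence[Translator]) -> str:
    """Try each translator in order and return the first successful result."""
    errors = []
    for translate in translators:
        try:
            return translate(text, src, dest)
        except Exception as exc:  # broad on purpose: backends fail in different ways
            errors.append(exc)
    raise RuntimeError(f"all {len(translators)} translators failed: {errors}")


# Stand-in backends for illustration only; they do not call any real service.
def flaky_backend(text, src, dest):
    raise ConnectionError("simulated crash")


def stub_backend(text, src, dest):
    return f"[{src}->{dest}] {text}"


result = translate_with_fallback("baker", "no", "en", [flaky_backend, stub_backend])
print(result)
```

In the real script the two list entries would be thin wrappers around the googletrans call and the custom Google API function, in whichever order is preferred.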

Error message from Google when the request limit is reached:

File "C:\Users\mw10\AppData\Local\Continuum\anaconda3\lib\site-packages\google\cloud_http.py", line 423, in api_request raise exceptions.from_http_response(response)

ServiceUnavailable: 503 POST https://translation.googleapis.com/language/translate/v2: The service is unavailable at this time.
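
A 503 like the one above is often temporary, so one option is to wait and retry with increasing delays before giving up. This is a minimal sketch under assumptions: `call_with_backoff` and `unstable_translate` are hypothetical names, the delays are arbitrary, and the simulated service stands in for the real API. With the Google client, the `retry_on` tuple would presumably hold the `ServiceUnavailable` class seen in the traceback (it appears to come from `google.api_core.exceptions`).

```python
import time


def call_with_backoff(func, *args, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call func(*args); on a retryable error wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return func(*args)
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)


# Simulated service that fails twice before answering (illustration only).
calls = {"n": 0}


def unstable_translate(text):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("503 simulated")
    return text.upper()


waits = []  # record the delays instead of really sleeping
answer = call_with_backoff(unstable_translate, "baker",
                           retry_on=(ConnectionError,), sleep=waits.append)
print(answer, waits)
```

Passing `sleep` as a parameter keeps the example fast to run; in production the default `time.sleep` would be used.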

Oversett offisielle titler splittet i enkeltord.py

Google_Translate_ v2_functions.py


# -*- coding: utf-8 -*-
"""
Created on Mon Feb 24 09:57:46 2020

@author: mw10
Translate official titles split into single words

This translates within the table titleElementsOfficial_N_S_E_tmp

Choose between googletrans and calling the Google Translate API directly.
googletrans crashes often, and Google Translate limits the number of requests. Switch method whenever one of them crashes


"""
import pandas as pd

import numpy as np
import pyodbc

"""
INPUT HERE:
1 = googletrans, 2 = Google API
Set the number of records to run
"""

groupsToTranslate = 1000            # number of records in language groups to translate
translationMethod = 2               # 1 = googletrans, 2 = Google API  


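if translationMethod == 1:
    from googletrans import Translator
    translator = Translator()
else:
    import importlib
    # The module name contains a space, so a plain import statement cannot load it
    gtFunctions = importlib.import_module('Google_Translate_ v2_functions')  # custom functions as backup for googletrans

    # import_module only returns the module object; bind the functions used below
    IsGoogleAPICredentials = gtFunctions.IsGoogleAPICredentials
    SetGoogleAPIcredentials = gtFunctions.SetGoogleAPIcredentials
    gTransWithSource = gtFunctions.gTransWithSource

    # check the credentials before using the Google API. You may lose credentials at any moment
    if not IsGoogleAPICredentials():
        # set the credentials. You may have to do this every so often. Note the raw-string path format r'...'
        SetGoogleAPIcredentials(r'W:\CEMM\Python\Balthazzar-62e6f82154a7.json')   # get credentials from Google Cloud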



langList =["en","no","sv"]                      # from dbo.countriesLU
nLang = len(langList) + (len(langList)-1)       # fetch more rows than there are languages in case something has upset the group system

sourceLetter = "O"                # O = Original

groupsToTranslate = range(groupsToTranslate)


conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=MW-SXD0E-008;'
                      'Database=Balthazzar;'
                      'Trusted_Connection=yes;', autocommit=True)

cursor = conn.cursor()    # get all data to translate


sqlWIP ='SELECT TOP (?) elementID, elementGroupID, language, element, source, groupStatus, wordcnt, verified, groupTranslated FROM dbo.titleElementsOfficial_N_S_E_tmp WHERE (groupTranslated <> 1) ORDER BY elementGroupID'


df = pd.DataFrame(columns = ["elementID", "elementGroupID", "language", "element", "source", "groupStatus", "wordcnt", "verified", "groupTranslated"])


for i in groupsToTranslate:

    df = df[0:0]      # empty dataframe before next round
    original = None   # reset so a previous group's source is never reused
    element = None

    cursor.execute(sqlWIP, nLang)   # select untranslated records: number of languages plus slack in case something has failed earlier
    recordGroup = cursor.fetchall()

    if not recordGroup:                  # nothing left to translate
        break

    groupID = recordGroup[0][1]

    for record in recordGroup:           # loop through query result
        if record[1] == groupID:         # this groupID only
            df.loc[len(df)] = list(record)
            if record[4] == sourceLetter:            # source language detected
                original = record[2]
                element = record[3]

    if original is None:                 # no source row in this group: stop rather than fetch the same group again
        print("No source record found for groupID", groupID)
        break

    
    for ind in df.index:
        targetLanguage = df.loc[ind, "language"]
        if targetLanguage != original:

            if translationMethod == 1:
                translated = translator.translate(element, src=original, dest=targetLanguage).text
            elif translationMethod == 2:
                translated = gTransWithSource(element, original, targetLanguage)

            translated = translated.capitalize()
            wordCount = len(translated.split())

            # write all fields in one statement so a crash cannot leave the row flagged as translated without its text
            cursor.execute("UPDATE  Balthazzar.dbo.titleElementsOfficial_N_S_E_tmp  SET element = ?, source = 'G', wordcnt = ?, groupTranslated = 1 WHERE elementGroupID = ? AND language = ?", (translated, wordCount, groupID, targetLanguage))

            print(ind, element, original, "=", translated, targetLanguage)
           
    
    # set status for original at the end in case translation crashes
    cursor.execute("UPDATE  Balthazzar.dbo.titleElementsOfficial_N_S_E_tmp  SET groupTranslated = 1 WHERE elementGroupID = ? AND language = ?", (groupID, original))
    print("   ", i, " ----------------  Done translating:", element, original)
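
The group handling in the loop above (take the first elementGroupID in the fetched rows, find the row marked as source, translate into the remaining languages) can be isolated as a pure function and checked without a database. This is a minimal sketch under assumptions: `ElementRow` and `split_group` are hypothetical names, only five of the nine columns are modelled, and the sample rows are made up for illustration.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class ElementRow(NamedTuple):
    element_id: int
    group_id: int
    language: str
    element: str
    source: str


def split_group(rows: Sequence[ElementRow], source_letter: str = "O"
                ) -> Tuple[Optional[ElementRow], List[ElementRow]]:
    """Return (source_row, target_rows) for the first group found in rows."""
    if not rows:
        return None, []
    group_id = rows[0].group_id
    group = [r for r in rows if r.group_id == group_id]
    # The source row is the one flagged with the source letter, if any.
    source = next((r for r in group if r.source == source_letter), None)
    if source is None:
        return None, []
    targets = [r for r in group if r.language != source.language]
    return source, targets


# Made-up sample: one full group plus a slack row from the next group.
sample = [
    ElementRow(1, 10, "en", "Baker", "O"),
    ElementRow(2, 10, "no", "", ""),
    ElementRow(3, 10, "sv", "", ""),
    ElementRow(4, 11, "en", "Carpenter", "O"),
]
source_row, target_rows = split_group(sample)
print(source_row.element, [r.language for r in target_rows])
```

Returning `None` when no source row exists makes the missing-source case explicit, so the caller can decide whether to skip or stop.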