Sep-17-2024, 03:40 PM
I have a column of countries. However, mixed in with the English names in this column are Spanish country names.
I have found the list of names in Spanish and translated them to English, but now I want to either create another column showing the translated country next to the original one, or just replace the entry.
Quote:
Original column of countries
France
Portugal
Spain
México
Suiza
Perú
Alemania
Suecia
Reino Unido
I have my translated list:
Quote:
Mexico
Swiss
Peru
Germany
Sweden
United Kingdom
Now I want to put these in another column, inserting the English translation wherever I have found one (or, if that fails, just update the original column with the translation), so that I end up with a list of countries that are all in English.
import pycountry
import pandas as pd
from pathlib import Path
from langdetect import detect
from googletrans import Translator

FILENAME = r"POS.xlsx"
COUNTRYNAME = 'Country'
df = pd.read_excel(FILENAME)

def all_names() -> set[str]:
    # all Country objects have a "name" attribute
    return {country.name for country in pycountry.countries}  # type: ignore

def all_official_names() -> set[str]:
    s: set[str] = set()
    for country in pycountry.countries:
        # not all Country objects have an "official_name" attribute
        try:
            s.add(country.official_name)  # type: ignore
        except AttributeError:
            pass
    return s

def get_df_countries(filename: Path) -> set[str]:
    # construct a set because country names may be duplicated in the spreadsheet column
    # this potentially improves runtime performance when parsing the country names later
    return set(pd.read_excel(filename)[COUNTRYNAME])

# Function to detect language of a word
def detect_language(word):
    try:
        return detect(word)
    except Exception:
        return 'unknown'

translator = Translator()

if __name__ == "__main__":
    names = all_names() | all_official_names()
    for country in get_df_countries(Path(FILENAME)):
        status = "valid" if country in names else "invalid"
        if status == "invalid":
            # print(f"{country} is {status}")
            # Translate country back into English
            translated_country = translator.translate(country).text
            print(translated_country)
Any ideas how I can show the new translations in place of the Spanish names while still showing all the other countries that are already in English?
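One possible approach, as a minimal sketch rather than a definitive answer: keep the translations in a dictionary keyed by the Spanish name, and fall back to the original value whenever a name has no entry. The `TRANSLATIONS` table and the `to_english` helper below are hypothetical names, filled in by hand from the two lists in this post; the column data is a plain list here so the idea can be tried without the spreadsheet.

```python
# Hypothetical lookup table, built by hand from the two lists in the post.
TRANSLATIONS = {
    "México": "Mexico",
    "Suiza": "Switzerland",  # the translated list in the post shows "Swiss"
    "Perú": "Peru",
    "Alemania": "Germany",
    "Suecia": "Sweden",
    "Reino Unido": "United Kingdom",
}


def to_english(name, translations=TRANSLATIONS):
    """Return the English name if a translation is known, else the name unchanged."""
    return translations.get(name, name)


# Sample values copied from the "original column" quoted in the post.
original = [
    "France", "Portugal", "Spain", "México", "Suiza",
    "Perú", "Alemania", "Suecia", "Reino Unido",
]

# New "column": translated where a translation exists, untouched otherwise.
english = [to_english(name) for name in original]

for before, after in zip(original, english):
    print(f"{before:12} -> {after}")
```

With the DataFrame from the code above, the same helper could presumably be applied as `df["Country_EN"] = df[COUNTRYNAME].map(to_english)` to get a second column (the column name `Country_EN` is made up here), or assigned back to `df[COUNTRYNAME]` to overwrite the original.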
