Personally, I wouldn't bother making words look closer to how they sound. More often than not when something like this comes up, my own rendition sounds better to me than the actual one, i.e. it's subjective as long as I haven't heard the word spoken.
Generally speaking, I think context is important, too. English is not my native language, so it might be different for others, but when I read a paragraph in English, I'm in "English reading mode", so to speak, meaning that I'll read everything the way I think it would sound in English. However, when there are multiple instances of fantasy (foreign?) words, I start reading those differently. Sometimes I even read an English word wrong before I notice its meaning...
Given this, what would work for me is putting the foreign words in italics, just so that my brain pays attention and doesn't steamroll over these words.
About the VO, I have just one concern. If a word's pronunciation differs wildly from how it looks, I think it would be a good idea for the player to first encounter that word in the VO itself. That would prevent players from already having their own version in their heads and then thinking about the mismatch every time they see or hear the word.
One thing I'd also like to see is the use of loanwords in common dialogue, especially in areas where cultures come into contact with each other.