If I set up the teams correctly, at least one person on each team can read French. The corpus we will work with for development is the French-English parallel corpus (103 MB, 04/1996-09/2003). When you unpack this collection you will find that it contains two directories, fr/ and en/. Each file fr/X.txt has a corresponding English translation en/X.txt (identical filename, different directory), and each X.txt appears to contain one or more CHAPTERs. Each "chapter" will be the unit we call a "document" for this project.
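As a rough sketch of how the paired files could be turned into aligned documents, the Python below walks fr/ and en/ together and splits each file into chapters. It assumes that a chapter starts with a marker line of the form `<CHAPTER ID=1>` and that the files are already UTF-8; check both against the actual data. The names `split_chapters` and `paired_documents` are made up for illustration.

```python
import re
from pathlib import Path

# Assumption: a chapter begins with a line like "<CHAPTER ID=1>".
CHAPTER_RE = re.compile(r"^<CHAPTER\b[^>]*>[ \t]*$", re.MULTILINE)


def split_chapters(text):
    """Split one file's text into chapters ("documents").

    Text before the first marker is dropped. A file with no marker at all
    is treated as a single document.
    """
    parts = CHAPTER_RE.split(text)
    if len(parts) == 1:
        return [text.strip()]
    return [part.strip() for part in parts[1:]]


def paired_documents(root):
    """Yield (file_stem, chapter_index, fr_text, en_text) for each chapter."""
    root = Path(root)
    for fr_path in sorted((root / "fr").glob("*.txt")):
        en_path = root / "en" / fr_path.name  # same filename, other directory
        if not en_path.exists():
            continue
        fr_docs = split_chapters(fr_path.read_text(encoding="utf-8"))
        en_docs = split_chapters(en_path.read_text(encoding="utf-8"))
        if len(fr_docs) != len(en_docs):
            # Chapter counts disagree: skip the file rather than guess.
            continue
        for index, (fr_doc, en_doc) in enumerate(zip(fr_docs, en_docs)):
            yield fr_path.stem, index, fr_doc, en_doc
```

Skipping files whose chapter counts disagree is a conservative choice for a first pass; you may prefer to log them and look at them by hand.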
If your team wants to do Spanish rather than French, that's OK. I am not expecting anything in this project to depend on the source language, as long as we use UTF-8 encoding. But let me know ASAP so I can get you the necessary resources. (I'm hoping to put together Chinese data while you're working on the project, and ideally we can just slot it in at the end, or right afterwards, to see what happens.)
If a file is in Latin-1 rather than UTF-8, you can convert it with iconv:
iconv -f latin1 -t utf8 < in.txt > out.txt
Note that this phrase table was automatically acquired by a statistical MT system and there is a lot of garbage in it! For efficiency, you might want to filter out any entries where none of the words on the French side appear in the documents you're working with. Similarly, you might choose to ignore the large number of phrase-to-phrase mappings that involve punctuation.
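Here is a minimal sketch of both filters in Python. It assumes a Moses-style phrase table with one entry per line, `source ||| target ||| scores`, and whitespace-tokenized phrases; the handout does not specify the format, so adjust the parsing to whatever the file really looks like. The function names and the simple regex tokenizer are illustrative only, and the document tokenization needs to match the one used to build the table for the vocabulary check to be reliable.

```python
import re
import unicodedata

# Illustrative tokenizer: runs of word characters, or single non-space symbols.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Lowercase and split raw document text into tokens."""
    return TOKEN_RE.findall(text.lower())


def is_punctuation(token):
    """True if every character is in a Unicode punctuation category (P*)."""
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )


def filter_phrase_table(lines, documents):
    """Yield only the phrase-table lines worth keeping.

    Assumed line format: "source ||| target ||| scores".
    An entry is dropped if either side contains a punctuation token, or if
    none of its source-side words occur in the given documents.
    """
    vocab = {tok for doc in documents for tok in tokenize(doc)}
    for line in lines:
        fields = [field.strip() for field in line.split("|||")]
        if len(fields) < 2:
            continue  # malformed entry
        src_tokens = fields[0].lower().split()
        tgt_tokens = fields[1].split()
        if any(is_punctuation(tok) for tok in src_tokens + tgt_tokens):
            continue
        if not any(tok in vocab for tok in src_tokens):
            continue
        yield line
```

For example, with the single document "La maison est bleue.", an entry for "la maison" would be kept, while an entry for "le chat" or one that maps "," to "," would be dropped.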