There are two types of files involved: the vocabulary files that Phrasier imports, and the CHS files storing the flash card practice sessions. The latter is a fixed format used by Phrasier, whereas the vocabulary files may take a number of different formats since they may come from various sources.
Phrasier is able to read a few different vocabulary file formats. The original format used by Phrasier is the CHPU format, but this has now been superseded by the simpler tabular format. Phrasier can also read Chinese vocabulary files in the CEDict format: these contain both traditional and simplified characters. Phrasier should also be able to read vocabulary files in the EDict format, which was originally made for Japanese but should also work with other languages, although I have not tested this thoroughly. All these formats are plain text files that can be created in a text editor; for use with Phrasier, they should all be saved using UTF-8 encoding. Although the CEDict and EDict formats are, technically speaking, fixed, there may be variants of these in use, so I have implemented a few different rules to recognise the variants.
If you want to make your own vocabulary files, feel free to contact me any time if you have any questions. I may also be able to convert vocabulary files between different formats.
This is the format favoured by Phrasier. The tabular format is a simple tab-separated file in which each line has the form
phrase[tab]pronunciation[tab]translation[tab]comment
In addition, lines starting with # are ignored. This format may be used with any language: Chinese, Japanese, European languages, ...
When importing the files into Phrasier, it is important to select the matching format, e.g. the Chinese tabular format if the files contain Chinese. This allows Phrasier to interpret the tone marks: Pinyin is written in the form word#, where # is the tone number, and Phrasier will convert this to Pinyin with tone marks and can also produce Bopomofo phonetics from it.
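The tabular layout described above can be read with a few lines of Python. The following is only a minimal sketch and not Phrasier's own importer: the names Term, parse_tabular_line and read_tabular are hypothetical, and filling missing trailing fields with empty strings is an assumption.

```python
from typing import List, NamedTuple, Optional


class Term(NamedTuple):
    phrase: str
    pronunciation: str
    translation: str
    comment: str


def parse_tabular_line(line: str) -> Optional[Term]:
    """Parse one line of a tab-separated vocabulary file.

    Returns None for blank lines and for lines starting with '#'.
    Missing trailing fields become empty strings (an assumption).
    """
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    # Split on at most three tabs so any further tabs stay in the comment.
    fields = line.split("\t", 3)
    fields += [""] * (4 - len(fields))
    return Term(*fields)


def read_tabular(path: str) -> List[Term]:
    """Read a whole UTF-8 encoded tabular vocabulary file."""
    with open(path, encoding="utf-8") as handle:
        parsed = (parse_tabular_line(line) for line in handle)
        return [term for term in parsed if term is not None]
```

A line such as "你好[tab]ni3 hao3[tab]hello[tab]greeting" then yields a Term whose pronunciation field still holds the numbered Pinyin; converting that to tone marks is left to Phrasier.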
The CEDict format is a text file with one line per term, using the format
traditional simplified [pinyin] /translations/
The traditional and simplified Chinese must not contain spaces (since spaces are used as separators). The Pinyin is written with a number at the end of each word to indicate the tone (e.g. pin1 yin1), and the words (corresponding to characters) are separated by spaces. There may be one or more translations, separated by "/".
There are several variations of this format, e.g. using tabs as separators and not having [] around the Pinyin. Some use capital letters in the Pinyin of names, although CEDict specifies that this should preferably not be done. I have tried to implement a few different varieties.
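As an illustration of the basic CEDict line layout, here is a minimal sketch of a parser. It is not the set of rules Phrasier uses: the names CedictEntry and parse_cedict_line are hypothetical, and the sketch only handles the variant with square brackets around the Pinyin (separated by spaces or tabs), returning None for anything else.

```python
import re
from typing import List, NamedTuple, Optional

# traditional simplified [pinyin] /translation/translation/
CEDICT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


class CedictEntry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: List[str]        # one numbered syllable per character
    translations: List[str]


def parse_cedict_line(line: str) -> Optional[CedictEntry]:
    """Parse one CEDict-style line; return None if it does not match."""
    match = CEDICT_LINE.match(line.strip())
    if match is None:
        return None
    traditional, simplified, pinyin, translations = match.groups()
    return CedictEntry(
        traditional,
        simplified,
        pinyin.split(),
        [t for t in translations.split("/") if t],
    )
```

Supporting the bracket-less variants would need additional patterns tried in turn, which is omitted here.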
The basic EDict format is similar to CEDict (since CEDict was inspired by EDict):
phrase [pronunciation] /translations/
or
phrase /translations/
although EDict also allows the translation to start with a general information field enclosed in (). For Japanese, which EDict was made for, the encoding is specified and is NOT UTF-8, so you may have to convert any Japanese vocabulary files in EDict format to UTF-8 before Phrasier can read them.
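The conversion to UTF-8 can be done outside Phrasier. The sketch below assumes that the source file is in EUC-JP, the encoding commonly used for the Japanese EDict file; this is an assumption, so check the file you actually have and pass a different source_encoding if needed. The function names are hypothetical.

```python
def reencode_to_utf8(data: bytes, source_encoding: str = "euc_jp") -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the data is not valid in source_encoding,
    which is a useful hint that the assumed encoding is wrong.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_file_to_utf8(src_path: str, dst_path: str,
                         source_encoding: str = "euc_jp") -> None:
    """Convert a whole vocabulary file, leaving line endings untouched."""
    with open(src_path, "rb") as src:
        converted = reencode_to_utf8(src.read(), source_encoding)
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Working on raw bytes rather than text mode avoids any newline translation, so the converted file differs from the original only in its character encoding.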
The CHPU format is adapted from the CHP format used by Chinese Practice. CHPU files contain one line per term, each in the format
<CH=phrase><PI=pronunciation><OR=translation><NO=comment>
The order of the tags is not important, and CHP has additional tags that Phrasier simply ignores.
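Because the tags may come in any order and unknown tags are skipped, a CHPU line is conveniently read by scanning for tag=value pairs. The following is a minimal sketch, not Phrasier's implementation: the function name and the field names in the result are hypothetical, and it assumes that values never contain the characters < or >.

```python
import re
from typing import Dict

# Matches one <TAG=value> group; values are assumed not to contain < or >.
CHPU_TAG = re.compile(r"<([A-Za-z]+)=([^<>]*)>")

# The four tags described above, mapped to illustrative field names.
KNOWN_TAGS = {
    "CH": "phrase",
    "PI": "pronunciation",
    "OR": "translation",
    "NO": "comment",
}


def parse_chpu_line(line: str) -> Dict[str, str]:
    """Parse one CHPU line into a dict; unknown tags are ignored."""
    term = {field: "" for field in KNOWN_TAGS.values()}
    for tag, value in CHPU_TAG.findall(line):
        field = KNOWN_TAGS.get(tag)
        if field is not None:
            term[field] = value
    return term
```

Fields whose tags are absent from the line are returned as empty strings, so the result always has the same four keys.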
The CHS file format is an XML format. This proved to be convenient for a number of reasons: not least, it makes it easier to make the files both forward and backward compatible. It is also editable in an ordinary text editor. Be aware, however, that if you edit the practice session file, Phrasier will warn you that the checksum does not match the data.
The basic format is as follows:
The main element is the session element. The session format version, like the Phrasier version, is used internally to identify the format used for the session data: even when a new version of Phrasier is released, the format of the session data may be the same, and this makes compatibility easier to check. The session tag also stores a checksum which is used to verify that the data has not been corrupted: this only checks within the same session format version. The encoding should be either UTF-8 or ASCII: the UTF-8 encoding is convenient for entering or editing files in an editor, since non-ASCII characters can be displayed, whereas the ASCII encoding (with non-ASCII characters encoded as &#number;) is required for MobilePhrasier.
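The &#number; form mentioned above is the standard XML decimal character reference. As a minimal sketch of how text can be moved between the two representations (the function names are hypothetical, and this says nothing about how Phrasier computes or updates the checksum):

```python
import re


def to_ascii_references(text: str) -> str:
    """Replace every non-ASCII character with a &#number; reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_ascii_references(text: str) -> str:
    """Turn decimal &#number; references back into characters.

    Named entities such as &amp; are deliberately left untouched so the
    result is still valid XML text.
    """
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)
```

ASCII characters pass through both functions unchanged, so only the non-ASCII parts of a session file would be affected.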
Within the session tag are two main tags: options and vocabulary. A list of options may be included in the first tag for storing information about which fields are displayed or hidden and which fonts are used in each field: these are optional and will be created when needed. The second tag contains the list of terms included in the practice session. The class option is there primarily to specify what type of terms the vocabulary contains. The locale specifies language specific rules: the only locale implemented as of now is chinese, but others may come.
The vocabulary element contains a list of terms each of which has the following format:
MobilePhrasier, in order to save space and read the file faster, condenses the tag names of the term to their first two characters, e.g. so and ph. Phrasier writes full tag names, while MobilePhrasier writes condensed tag names, but both can read full as well as condensed tag names.
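A reader that accepts both full and condensed tag names can be built around a simple lookup table. The sketch below is only an illustration of the idea: build_tag_lookup is a hypothetical helper, and the tag names used in the example ("phrase", "comment") are placeholders rather than the actual CHS tag names.

```python
from typing import Dict, Iterable


def build_tag_lookup(full_names: Iterable[str]) -> Dict[str, str]:
    """Map each full tag name and its two-character form to the full name.

    Raises ValueError if two different tags share the same first two
    characters, since the condensed form would then be ambiguous.
    """
    lookup: Dict[str, str] = {}
    for name in full_names:
        for key in (name, name[:2]):
            if key in lookup and lookup[key] != name:
                raise ValueError(f"ambiguous tag name: {key!r}")
            lookup[key] = name
    return lookup


# Placeholder tag names, for illustration only.
TAGS = build_tag_lookup(["phrase", "comment"])
```

With this table, TAGS["ph"] and TAGS["phrase"] both resolve to the same full name, so the rest of the reader only needs to deal with one spelling.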