Coding free-text documents, especially in medicine, has become an urgent priority as electronic medical records (EMRs) mature and the need to exchange data between EMRs becomes more acute. However, only a few automated coding systems exist, and they can code only a small portion of the free text against a limited number of codes. The precision of these systems is low, and code quality is not measured. The present invention discloses a process and system that implement semantic coding against standard lexicon(s) with high precision. The standard lexicon can come from a number of different sources, but is usually developed by a standards body. The system is semi-automated so that medical coders or others can process free-text documents rapidly and with high precision. The system performs the steps of segmenting a document, flagging the need for corrections, validating the document against a data type definition, and looking up both the semantics and the standard codes that correspond to the document's sentences. The coder has the option to intervene at any step in the process to fix mistakes made by the system. A knowledge base, consisting of propositions, represents the semantic knowledge in the domain. When sentences with unknown semantics are discovered, they can easily be added to the knowledge base. The propositions in the knowledge base are associated with codes in the standard lexicon. The quality of each match is rated by a professional who understands the knowledge domain. The system uses this information to perform high-precision coding and to measure the quality of each match.
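
The lookup step described above can be illustrated with a minimal sketch. It is not the disclosed implementation. It assumes that sentences are segmented on terminal punctuation, that a sentence is matched to a proposition by exact match after simple normalization, and that match quality is a number between 0 and 1 assigned by a domain professional. All names (Proposition, normalize, segment, code_document, add_proposition) and the codes LEX-0001 and LEX-0002 are hypothetical placeholders, not entries in any real lexicon.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Proposition:
    """A unit of semantic knowledge linked to rated lexicon codes (hypothetical)."""
    text: str
    # Each entry is (code, quality); quality is assumed to be a 0-1 rating
    # assigned by a professional who understands the knowledge domain.
    codes: list = field(default_factory=list)


def normalize(sentence):
    """Lowercase and drop punctuation so trivially different sentences match."""
    return re.sub(r"[^a-z0-9 ]", "", sentence.lower()).strip()


def segment(document):
    """Naive segmentation on terminal punctuation (an assumption of this sketch)."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def add_proposition(knowledge_base, sentence, proposition_text, codes):
    """Add a sentence with previously unknown semantics to the knowledge base."""
    knowledge_base[normalize(sentence)] = Proposition(proposition_text, list(codes))


def code_document(document, knowledge_base):
    """Return one result per sentence; unknown sentences are flagged for the coder."""
    results = []
    for sentence in segment(document):
        prop = knowledge_base.get(normalize(sentence))
        if prop is None:
            # Unknown semantics: leave uncoded and ask the coder to intervene.
            results.append({"sentence": sentence, "proposition": None,
                            "codes": [], "needs_review": True})
        else:
            # Report the best-rated code first so match quality is visible.
            ranked = sorted(prop.codes, key=lambda c: c[1], reverse=True)
            results.append({"sentence": sentence, "proposition": prop.text,
                            "codes": ranked, "needs_review": False})
    return results


# Usage with placeholder data.
kb = {}
add_proposition(kb, "The heart is normal in size.", "heart size normal",
                [("LEX-0002", 0.6), ("LEX-0001", 0.9)])
results = code_document("The heart is normal in size. No pleural effusion.", kb)
```

In this sketch the second sentence is returned with needs_review set, which corresponds to the point at which the coder would add a new proposition and its rated codes to the knowledge base.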