|
| 1 | +# Translation loss |
1 | 2 |
|
2 | | -TODO... |
| 3 | +This folder contains datasets that compare the color distributions of every pair of color terms across two languages. The LAB distance between the two distributions represents the "translation loss" of going from one term to the other. |
3 | 4 |
|
4 | | -See [Hue Color Comparisons](https://idl.uw.edu/color-naming-in-different-languages/vis/stacked-spectrum.html) |
5 | | - |
| 5 | +See [Color Translator](https://idl.uw.edu/color-naming-in-different-languages/vis/color_translator.html) |
6 | 6 |
|
7 | | -`translation_loss` is also an array having the translation losses between the top 100 English and Korean color name for full colors. 'dist' property indicate the distance (loss) between the English term (enTerm) and the Korean term (koTerm). |
| 7 | + |
| 8 | + |
| 9 | +See also [Korean-English Translation Comparisons](https://idl.uw.edu/color-naming-in-different-languages/vis/en-ko-translation-comparison.html) |
| 10 | + |
| 11 | + |
| 12 | + |
| 13 | +## The translation loss calculation |
| 14 | + |
| 15 | +The translation loss is calculated from the binned full color names by comparing the probability distributions of the two terms, P(c|t): the probability of color bin c given term t. We use the Earth Mover's Distance between these two probability distributions as the final LAB distance. |
| 16 | + |
| 17 | +The calculated "most accurate" translation for a term is the term pair with the smallest distance. |
| 18 | + |
| 19 | +You can also compare the LAB distances to the estimated "Just Noticeable Difference" value of 2.3 LAB distance (SHARMA G.: Digital Color Imaging Handbook. CRC press, 2002). |
| 20 | + |
| 21 | + |
| 22 | +## Translation Loss Files |
| 23 | +The translation loss information is stored as a separate file for each translation pair: "translation_loss_LANG1_LANG2.json", where LANG1 and LANG2 are the two-letter abbreviations for the languages. |
| 24 | + |
| 25 | +*Note: We only save files for LANG1 <= LANG2 so as not to duplicate work and information.* |
| 26 | + |
| 27 | +Each file is an array of all possible pairs of LANG1 terms to LANG2 terms. Each object in the array has the following fields: |
| 28 | +- **LANG1term:** (e.g., "enterm", "koterm", "zhterm") The simplified matching term from language 1 |
| 29 | +- **LANG2term:** (e.g., "enterm", "koterm", "zhterm") The simplified matching term from language 2 |
| 30 | +- **dist:** The LAB Earth Mover's Distance between the probability distributions (P(c|t)) of LANG1term and LANG2term |
| 31 | + |
| 32 | +*Note: When we are comparing a language with itself (LANG1 == LANG2), we use the fields LANGterm and LANGterm2 (e.g., "enterm" and "enterm2" or "koterm" and "koterm2") instead of LANG1term and LANG2term. Also, to avoid duplicate work and information, we only save a pair when the first term < the second term.* |
| 33 | + |
| 34 | +These files are created by running two scripts: |
| 35 | +- processing_scripts/03_advanced_processing/getTranslation_01.js |
| 36 | +- processing_scripts/03_advanced_processing/getTranslation_02_EMDparallel.py |
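
The lookup described in the README (smallest `dist` wins, then compare against the 2.3 Just Noticeable Difference) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `best_translation` and `is_noticeable` are hypothetical, the records are made-up placeholders in the documented shape (not real dataset values), and a real run would load a file such as `translation_loss_en_ko.json` instead, assuming that pair exists in the folder.

```python
import json

# Just Noticeable Difference in LAB distance, as cited in the README (Sharma, 2002).
JND_LAB = 2.3


def best_translation(pairs, source_field, source_term):
    """Return the record with the smallest "dist" for source_term, or None if absent."""
    candidates = [p for p in pairs if p.get(source_field) == source_term]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p["dist"])


def is_noticeable(dist):
    """True when the translation loss exceeds the JND threshold."""
    return dist > JND_LAB


# Placeholder records: terms and distances are invented for illustration only.
# With real data this would be something like:
#   with open("translation_loss_en_ko.json", encoding="utf-8") as f:
#       sample_pairs = json.load(f)
sample_pairs = json.loads("""
[
  {"enterm": "blue",  "koterm": "koterm_a", "dist": 12.4},
  {"enterm": "blue",  "koterm": "koterm_b", "dist": 1.9},
  {"enterm": "green", "koterm": "koterm_a", "dist": 3.1},
  {"enterm": "green", "koterm": "koterm_b", "dist": 15.0}
]
""")

for term in ("blue", "green"):
    best = best_translation(sample_pairs, "enterm", term)
    print(term, "->", best["koterm"], best["dist"], "noticeable:", is_noticeable(best["dist"]))
```

For same-language files the field names would be, e.g., `enterm` and `enterm2`, and because only pairs with the first term < the second term are stored, a full lookup there would need to check both fields.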