Commit e58cf29

README.md files: Finishing filling out the basic model ones with translation_loss explanation
1 parent 55e8eb0 commit e58cf29

1 file changed

+33
-4
lines changed

model/translation_loss/README.md

Lines changed: 33 additions & 4 deletions
@@ -1,7 +1,36 @@
1+
# Translation loss
12

2-
TODO...
3+
This folder contains datasets that compare the color distributions of every pair of color terms across two languages. The LAB distance between the two distributions represents the "translation loss" of going from one term to the other.
34

4-
See [Hue Color Comparisons](https://idl.uw.edu/color-naming-in-different-languages/vis/stacked-spectrum.html)
5-
![A screenshot of the hue color comparisons showing how blue is divided in Russian, Chinese, and Korean](../../vis/stacked-small.png)
5+
See [Color Translator](https://idl.uw.edu/color-naming-in-different-languages/vis/color_translator.html)
66

7-
`translation_loss` is also an array having the translation losses between the top 100 English and Korean color name for full colors. 'dist' property indicate the distance (loss) between the English term (enTerm) and the Korean term (koTerm).
7+
![A screenshot of the color translator with 2D grids of colors representing different terms](../../vis/color-translator-small.png)
8+
9+
See also [Korean-English Translation Comparisons](https://idl.uw.edu/color-naming-in-different-languages/vis/en-ko-translation-comparison.html)
10+
11+
![A screenshot of the color translation diagram with English color names on one side and Korean color names on the other side. Lines between them indicate what our calculated translation is compared to what online translation tools suggested.](../../vis/en-ko-translation-small.png)
12+
13+
## The translation loss calculation
14+
15+
The translation loss is calculated from the binned full color names by comparing the probability distributions of the two terms, P(c|t): the probability of color bin c given term t. We use Earth Mover's Distance between these two probability distributions to obtain a final LAB distance.
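
For reference, the standard Earth Mover's Distance between two discrete distributions can be written as below. This is the textbook definition, not a transcription of the processing scripts. The ground distance $d(c_i, c_j)$ is assumed here to be the Euclidean LAB distance between the centers of bins $c_i$ and $c_j$; the text above says only "LAB distance", so the exact ground metric is an assumption.

$$
\mathrm{EMD}(t_1, t_2) = \min_{f_{ij} \ge 0} \sum_{i} \sum_{j} f_{ij} \, d(c_i, c_j)
\quad \text{subject to} \quad
\sum_{j} f_{ij} = P(c_i \mid t_1), \qquad
\sum_{i} f_{ij} = P(c_j \mid t_2)
$$

Because both $P(\cdot \mid t_1)$ and $P(\cdot \mid t_2)$ sum to 1, the total flow is 1 and the result is already in LAB distance units.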
16+
17+
The calculated "most accurate" translation for a term is the one in the term pair with the smallest distance.
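
A minimal Python sketch of that lookup, assuming rows shaped like the objects described in this README. The sample terms and distances are made-up placeholders, not values from the real files, and `best_translation` is a hypothetical helper name.

```python
# Hypothetical sample rows in the shape of an English-Korean file.
# Terms and distances are placeholders for illustration only.
sample_pairs = [
    {"enterm": "blue", "koterm": "parang", "dist": 12.4},
    {"enterm": "blue", "koterm": "haneul", "dist": 31.0},
    {"enterm": "green", "koterm": "parang", "dist": 58.7},
    {"enterm": "green", "koterm": "chorok", "dist": 9.8},
]


def best_translation(pairs, source_field, term):
    """Return the row with the smallest dist for `term`, or None if absent."""
    candidates = [row for row in pairs if row[source_field] == term]
    if not candidates:
        return None
    return min(candidates, key=lambda row: row["dist"])


best = best_translation(sample_pairs, "enterm", "blue")
print(best["koterm"], best["dist"])
```

The same helper works in the other direction by passing `"koterm"` as `source_field`.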
18+
19+
You can also compare the LAB distances to the estimated "Just Noticeable Difference" value of 2.3 LAB distance (Sharma, G.: Digital Color Imaging Handbook. CRC Press, 2002).
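
As a small illustration, a distance can be expressed in multiples of that threshold. The helper names below are hypothetical; only the 2.3 value comes from the text above.

```python
JND_LAB = 2.3  # "Just Noticeable Difference" estimate cited above


def in_jnd_units(dist):
    """Express a LAB distance as a multiple of the JND estimate."""
    return dist / JND_LAB


def is_noticeable(dist):
    """True when the distance exceeds the JND estimate."""
    return dist > JND_LAB


print(round(in_jnd_units(11.5), 2), is_noticeable(11.5))
```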
20+
21+
22+
## Translation Loss Files
23+
The translation loss information is stored as a separate file for each translation pair: "translation_loss_LANG1_LANG2.json", where LANG1 and LANG2 are the two-letter abbreviations of the languages.
24+
25+
*Note: We only save files for LANG1 <= LANG2 so as not to duplicate work and information.*
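
A sketch of how a file name could be derived for any two languages. It assumes that "LANG1 <= LANG2" means alphabetical order of lowercase language codes; that reading is an assumption, and `translation_loss_filename` is a hypothetical helper.

```python
def translation_loss_filename(lang_a, lang_b):
    """Build the file name for a language pair, ordering the codes.

    Assumes LANG1 <= LANG2 refers to alphabetical order of the codes.
    """
    lang1, lang2 = sorted([lang_a.lower(), lang_b.lower()])
    return f"translation_loss_{lang1}_{lang2}.json"


print(translation_loss_filename("ko", "en"))
```

Under this assumption, asking for Korean to English and English to Korean both resolve to the same file.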
26+
27+
Each file is an array of all possible pairs of LANG1 terms to LANG2 terms. Each object in the array has the following fields:
28+
- **LANG1term:** (e.g., "enterm", "koterm", "zhterm") The simplified matching term from language 1
29+
- **LANG2term:** (e.g., "enterm", "koterm", "zhterm") The simplified matching term from language 2
30+
- **dist:** The LAB Earth Mover's Distance between the probability distributions (P(c|t)) of LANG1term and LANG2term
31+
32+
*Note: When comparing a language with itself (LANG1 == LANG2), we use the fields LANGterm and LANGterm2 instead of LANG1term and LANG2term (e.g., "enterm" and "enterm2", or "koterm" and "koterm2"). Also, to avoid duplicating work and information, we only save pairs where the first term < the second term.*
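
The field naming above can be sketched as a small lookup. `term_fields` and `lookup_dist` are hypothetical helpers, the sample rows are made up, and "first term < second term" is assumed to mean plain string ordering.

```python
def term_fields(lang1, lang2):
    """Return the two term field names for a language pair."""
    if lang1 == lang2:
        return f"{lang1}term", f"{lang1}term2"
    return f"{lang1}term", f"{lang2}term"


def lookup_dist(pairs, lang1, lang2, term1, term2):
    """Find the stored dist for a term pair, or None if it is not stored."""
    field1, field2 = term_fields(lang1, lang2)
    if lang1 == lang2:
        if term1 == term2:
            return 0.0  # a distribution has zero distance to itself
        if term1 > term2:
            # Same-language files keep only first term < second term
            # (assumed here to be string ordering).
            term1, term2 = term2, term1
    for row in pairs:
        if row[field1] == term1 and row[field2] == term2:
            return row["dist"]
    return None


# Hypothetical same-language rows; values are placeholders.
en_en_pairs = [{"enterm": "blue", "enterm2": "teal", "dist": 20.1}]
print(lookup_dist(en_en_pairs, "en", "en", "teal", "blue"))
```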
33+
34+
These files are created by running two scripts:
35+
- processing_scripts/03_advanced_processing/getTranslation_01.js
36+
- processing_scripts/03_advanced_processing/getTranslation_02_EMDparallel.py