Tag normalization: normalize abbreviation special cases #440

@Tiendil

The general abbreviation processing will be implemented in gh-422

However, some special cases may remain, mostly from native news tags, that we should handle separately or differently.

Examples:

  • u.s.a. -> usa
  • u. s. a. -> usa
  • u s a -> usa
  • ph.d. -> phd (there are a lot of abbreviations with multi-letter parts, like ph)
  • but .net -> dot-net
  • but r programming -> r-programming
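
A minimal sketch of how the examples above could be handled, assuming tags arrive as plain strings. The names `SPECIAL_CASES` and `normalize_abbreviation` and both regular expressions are hypothetical and are not taken from the existing `tags.converters` or `tags.normalizers` code; the limit of two letters per part is also an assumption that should be checked against real data.

```python
import re

# Hypothetical table of exceptions that must bypass the generic rule.
SPECIAL_CASES = {
    ".net": "dot-net",
    "r programming": "r-programming",
}

# Two or more parts of 1-2 letters, each followed by a dot and optional
# whitespace: "u.s.a.", "u. s. a.", "ph.d."
_DOTTED = re.compile(r"^(?:[a-z]{1,2}\.\s*){2,}$")

# Three or more single letters separated by single spaces: "u s a"
_SPACED = re.compile(r"^[a-z](?: [a-z]){2,}$")


def normalize_abbreviation(raw: str) -> str:
    """Collapse dotted/spaced abbreviations; leave other tags unchanged."""
    tag = raw.strip().lower()

    # Explicit exceptions win over the generic patterns.
    if tag in SPECIAL_CASES:
        return SPECIAL_CASES[tag]

    if _DOTTED.match(tag) or _SPACED.match(tag):
        # Drop dots and whitespace: "u. s. a." -> "usa"
        return re.sub(r"[.\s]", "", tag)

    return tag
```

Checking the exception table first keeps `.net` and `r programming` out of the generic rule, and requiring at least three letters in the spaced form is a deliberate (assumed) guard against collapsing ordinary two-word tags.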

We may need to add checks in multiple places:

  • in tags.converters to detect some patterns in raw tags;
  • in tags.normalizers to detect some patterns after initial normalization.

Attention: check the statistics of such tags in the DB before implementing anything, to avoid overengineering.
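
One possible way to gather those statistics is to export the raw tags and count how many match each candidate pattern. The sketch below assumes the tags are already loaded into a Python iterable; the pattern names, the regular expressions, and `count_special_cases` are hypothetical, and the query that reads tags from the DB is intentionally left out.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical candidate patterns; adjust after looking at real data.
PATTERNS = {
    "dotted": re.compile(r"^(?:[a-z]{1,2}\.\s*){2,}$"),     # u.s.a., ph.d.
    "spaced-letters": re.compile(r"^[a-z](?: [a-z]){2,}$"),  # u s a
    "leading-dot": re.compile(r"^\.[a-z]"),                  # .net
}


def count_special_cases(tags: Iterable[str]) -> Counter:
    """Count how many tags match each candidate pattern."""
    counts: Counter = Counter()
    for raw in tags:
        tag = raw.strip().lower()
        for name, pattern in PATTERNS.items():
            if pattern.search(tag):
                counts[name] += 1
    return counts
```

If a pattern matches only a handful of tags, a dedicated rule for it is probably not worth adding.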

Internal task id: ff-524
