injalid_dejice.rb

Converts strings between UTF-8 and the legacy KOI-7 encoding.

KOI-7 (Russian: КОИ-7) was a 7-bit character encoding developed in the USSR and based on the US-ASCII encoding. US-ASCII has only 128 code points, which are enough to encode the Latin alphabet (with numbers and most common punctuation marks), but not enough to encode both Latin and Cyrillic alphabets with the full set of characters (uppercase and lowercase).

To add Cyrillic support while remaining US-ASCII compatible, the KOI-7 designers came up with a “hack”: Cyrillic and Latin characters share the same code points, and the ASCII control characters 0x0E (SO) and 0x0F (SI) switch to the Cyrillic and the Latin code page, respectively.

This led to very annoying, yet sometimes funny, bugs when English text was printed (on a screen, a printer, or a teletype) while the code page was still set to Cyrillic. The «иНЖАЛИД ДЕЖИЦЕ» message (pronounced “injalid dejitze”) is the most notable example of such a case. It is simply the error message “Invalid device” rendered with the wrong code page. The message became something of a meme, and also an eponym for improperly coded text that looks like gibberish. Hence the name of the gem.

KOI-7 was widely used on the Soviet PDP-11 clones, notably the DVK systems and the Elektronika MK90 portable computer.
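
The «иНЖАЛИД ДЕЖИЦЕ» effect described above can be reproduced in a few lines. This is a minimal sketch that does not use the gem: the `CodePageDemo` module is hypothetical, and the table assumes the usual KOI-7 N1 layout (lowercase Cyrillic at 0x40..0x5F, uppercase at 0x60..0x7E).

```ruby
# Hypothetical helper, not part of the gem.
module CodePageDemo
  # Assumed KOI-7 N1 layout: index 0 corresponds to code point 0x40.
  CYRILLIC_PAGE = ("юабцдефгхийклмнопярстужвьызшэщчъ" \
                   "ЮАБЦДЕФГХИЙКЛМНОПЯРСТУЖВЬЫЗШЭЩЧ").chars.freeze

  # Render ASCII text the way a device stuck in Cyrillic mode would show it.
  def self.render_as_cyrillic(ascii_text)
    ascii_text.each_char.map do |ch|
      index = ch.ord - 0x40
      # Code points outside 0x40..0x7E look the same in both code pages.
      index.between?(0, CYRILLIC_PAGE.size - 1) ? CYRILLIC_PAGE[index] : ch
    end.join
  end
end

puts CodePageDemo.render_as_cyrillic("Invalid device")
# prints: иНЖАЛИД ДЕЖИЦЕ
```

Note how the letter case flips: uppercase Latin code points land on lowercase Cyrillic letters and vice versa.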

Installation

Add this line to your application's Gemfile:

gem "injalid_dejice"

And then execute:

$ bundle install

Or install it yourself as:

$ gem install injalid_dejice

Usage

Ruby doesn't support KOI-7 natively (see Encoding.name_list), but it does support 7-bit US-ASCII, which is KOI-7 compatible (KOI-7 was based on US-ASCII). So the encoder returns its result as a US-ASCII string, and the decoder expects the input string to be encoded in US-ASCII or in any compatible encoding (e.g. UTF-8 with all characters in the range 0x00 - 0x7F).
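
The claim above can be checked from an interactive prompt. The snippet below is a small sketch using only core Ruby; the constant names are made up for illustration, and the exact list of encodings may vary between Ruby builds.

```ruby
# Encodings whose names start with "KOI" in this Ruby build.
# KOI8-R and KOI8-U are expected here, but no KOI-7 entry.
KOI_ENCODINGS = Encoding.name_list.grep(/\AKOI/i)
puts KOI_ENCODINGS.inspect

# A KOI-7 byte stream only uses code points 0x00..0x7F,
# so a US-ASCII string can carry it, control characters included.
KOI_BYTES = "\x0Ebhc\x0F".dup.force_encoding(Encoding::US_ASCII)
puts KOI_BYTES.valid_encoding? # prints: true
puts KOI_BYTES.ascii_only?     # prints: true

# ASCII-only data mixes freely with UTF-8 text.
MIXED = "Привет: " + KOI_BYTES
puts MIXED.encoding            # prints: UTF-8
```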

Encoding a string from UTF-8 to KOI-7

InjalidDejice.utf_to_koi(string [, keyword arguments])

Arguments:

  • string

Keyword arguments:

  • :forced_latin ([])

    Characters 0x00 - 0x32 are shared by both the Latin and the Cyrillic code pages (KOI-7 N0 and KOI-7 N1). The locale of such a character is defined by the preceding character(s). Sometimes, though, such a character should be forcibly switched to the Latin locale, regardless of what precedes it.

    The :forced_latin option does exactly that. It lists the characters that will always be treated as Latin: if the Cyrillic locale was set with an SO (0x0E), an SI (0x0F) code is emitted before any character listed in the forced_latin array.

  • :unknown_char_rep ("?")

    A replacement character for the unsupported characters.

Returns:

  • String
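
To make the role of :forced_latin more concrete, here is a rough sketch of an encoder with the same keyword arguments. It is not the gem's implementation: the `Koi7EncodeSketch` module is hypothetical, the table assumes the usual KOI-7 N1 layout, and treating every code point below 0x40 as shared is an assumption that may not match the range the gem uses.

```ruby
# Hypothetical sketch, not the gem's actual code.
module Koi7EncodeSketch
  SO = "\x0E" # switch to the Cyrillic code page
  SI = "\x0F" # switch back to the Latin code page

  LOWER = "юабцдефгхийклмнопярстужвьызшэщчъ" # assumed to sit at 0x40..0x5F
  UPPER = "ЮАБЦДЕФГХИЙКЛМНОПЯРСТУЖВЬЫЗШЭЩЧ"  # assumed to sit at 0x60..0x7E

  CYR_TO_KOI = (
    LOWER.chars.each_with_index.map { |c, i| [c, (0x40 + i).chr] } +
    UPPER.chars.each_with_index.map { |c, i| [c, (0x60 + i).chr] }
  ).to_h.freeze

  def self.encode(text, forced_latin: [], unknown_char_rep: "?")
    cyrillic = false
    out = +""
    text.each_char do |ch|
      if CYR_TO_KOI.key?(ch)
        out << SO unless cyrillic
        cyrillic = true
        out << CYR_TO_KOI[ch]
      elsif ch.ord < 0x40 && !forced_latin.include?(ch)
        out << ch # shared character: keep the current code page
      elsif ch.ord < 0x80
        out << SI if cyrillic # Latin letter or forced-Latin character
        cyrillic = false
        out << ch
      else
        out << unknown_char_rep
      end
    end
    out << SI if cyrillic # leave the stream in Latin mode
    out
  end

  # Show a string as space-separated hex bytes.
  def self.hex(str)
    str.bytes.map { |b| format("%02X", b) }.join(" ")
  end
end

puts Koi7EncodeSketch.hex(Koi7EncodeSketch.encode("да, нет"))
# prints: 0E 44 41 2C 20 4E 45 54 0F
puts Koi7EncodeSketch.hex(Koi7EncodeSketch.encode("да, нет", forced_latin: [","]))
# prints: 0E 44 41 0F 2C 20 0E 4E 45 54 0F
```

In the second call the comma is preceded by SI, so the sketch has to emit SO again before the next Cyrillic letter.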

Decoding a string from KOI-7 to UTF-8

InjalidDejice.koi_to_utf(string [, keyword arguments])

Arguments:

  • string

Keyword arguments:

  • :strict mode (false)

    When true, a string that switches to the Cyrillic locale with an SO (0x0E) character must switch back to the Latin locale with an SI (0x0F) character. If it doesn't, an ArgumentError is raised.

  • :unknown_char_rep ("?")

    A replacement character for the unsupported characters.

Returns:

  • String
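
The strict check described above can be illustrated with a small stand-alone decoder. Again, this is only a sketch under assumptions: `Koi7DecodeSketch` and its `strict:` keyword are hypothetical names, and the table assumes the usual KOI-7 N1 layout. The gem's own decoder may behave differently in edge cases.

```ruby
# Hypothetical sketch, not the gem's actual code.
module Koi7DecodeSketch
  SO = "\x0E" # switch to the Cyrillic code page
  SI = "\x0F" # switch back to the Latin code page

  # Assumed KOI-7 N1 layout: index 0 corresponds to code point 0x40.
  CYRILLIC_PAGE = ("юабцдефгхийклмнопярстужвьызшэщчъ" \
                   "ЮАБЦДЕФГХИЙКЛМНОПЯРСТУЖВЬЫЗШЭЩЧ").chars.freeze

  def self.decode(koi, strict: false)
    cyrillic = false
    out = +""
    koi.each_char do |ch|
      case ch
      when SO then cyrillic = true
      when SI then cyrillic = false
      else
        index = ch.ord - 0x40
        in_table = index.between?(0, CYRILLIC_PAGE.size - 1)
        out << (cyrillic && in_table ? CYRILLIC_PAGE[index] : ch)
      end
    end
    # Strict mode: the stream must end in the Latin code page.
    raise ArgumentError, "SO (0x0E) without a closing SI (0x0F)" if strict && cyrillic

    out
  end
end

puts Koi7DecodeSketch.decode("'\x0Ebhc\x0F'")
# prints: 'БХЦ'

begin
  Koi7DecodeSketch.decode("\x0Ebhc", strict: true)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```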

Examples

require "injalid_dejice"

# Encode:
str_to_encode = "Пользователи ДВК помнят 'инжалид дежице'!"
InjalidDejice.utf_to_koi(str_to_encode)
# => "\x0EpOLXZOWATELI dwk POMNQT 'INVALID DEVICE'!\x0F"

# Decode:
str_to_decode = "In KOI-7, two control characters can turn 'bhc' into '\x0Ebhc\x0F'!"
InjalidDejice.koi_to_utf(str_to_decode)
# => "In KOI-7, two control characters can turn 'bhc' into 'БХЦ'!"

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/8bit-mate/injalid_dejice.rb.

License

The gem is available as open source under the terms of the MIT License.