How to Use OCR to Convert Japanese Images to Text

By Charles Hoshino

July 4, 2020


Hey! Want to see more of Japan? Check out Walks from a Hat, our weekly photography dispatch.

When I was learning Japanese, OCR (optical character recognition) technology wasn't good enough for practical use. Today, though, OCR is a great tool for any language learner. At Box of Manga, we believe that a huge part of Japanese learning is natural immersion through games, manga, visual novels, and TV shows. OCR lets you grab images containing Japanese and convert them to text that you can copy and paste.

As we saw in our post on Japanese visual novels, having Japanese in text form makes learning much easier. Having text lets you look up words with a pop-up dictionary like rikaichan and easily paste text into a flashcard app like Anki. First, let’s look at the best OCR software out there for Japanese learners.

The best Japanese OCR software

For Japanese learners, the best Japanese OCR software I've found is actually Google Keep, a free note-taking app that outclasses many paid OCR options. To make your notes searchable, Keep includes excellent OCR capabilities. As Japanese learners, we will be borrowing this functionality to use as a language learning tool.

How to use Google Keep OCR to get Japanese text from images

Google Keep OCR is very easy to use. All you need to access it is a Google account. Then, navigate to the Google Keep page. There are also apps for both Android and iOS, so you can do this on both your desktop computer and your phone.

Here are the steps:

  • Take or upload a photo to Keep
  • Click the menu button (three vertical dots)
  • Select “Grab Image Text”
  • Let Google do the rest!

The steps may vary slightly depending on whether you are using desktop or mobile, but the process is very intuitive on all platforms. All you need to do is take or upload a photo and click/tap “Grab Image Text” to get the text in a form you can copy and paste:

[Screenshot: text grabbed from an image in Google Keep]

If you are on a desktop computer, you can then use a pop-up dictionary like rikaikun to get mouse-over definitions for Japanese words:

[Screenshot: rikaikun definitions over text in Google Keep]

Very intuitive, right? Now let’s look at how the OCR works with images taken from manga, video games, and everyday life.

OCR with Japanese manga

First, let's look at manga. We are huge fans of using manga to learn Japanese, and using OCR while you read manga saves you the hassle of looking up kanji manually (by radical, stroke count, handwriting, etc.).

We share images from manga books on our Instagram page, so I took a few photos from there and pasted them into Google Keep to see how they performed.

Here are some examples:

[Screenshot: Google Keep OCR results for manga panels]

Though they're not 100% perfect, these results are very, very good! Here are some things you might have to clean up manually:

  • Furigana over kanji can get mixed into the text. You may have to delete or clean up some of this.
  • The less "standard" a font is, the less likely the algorithm is to read it correctly.
  • Speech bubbles with awkward formatting might not be captured correctly.
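When you paste Keep's output somewhere else, the line breaks from tall, narrow speech bubbles often come along with it. Since Japanese doesn't put spaces between words, those breaks can usually just be deleted. Here's a minimal Python sketch of that cleanup, assuming you've already copied the OCR text into a string. The `join_ocr_lines` name and the Unicode ranges are my own choices for illustration, not anything Google Keep provides, and it won't fix furigana mix-ups, which still need a manual pass.

```python
import re

# Hypothetical character set (an assumption, not from Google Keep):
# CJK punctuation (excluding the ideographic space U+3000), hiragana,
# katakana, common kanji, and full-width forms.
_JP = r"\u3001-\u303f\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff\uff00-\uffef"

# Whitespace (newlines, spaces, ideographic spaces) sitting between two
# Japanese characters.
_BREAK_BETWEEN_JP = re.compile(rf"(?<=[{_JP}])\s+(?=[{_JP}])")


def join_ocr_lines(text: str) -> str:
    """Remove line breaks and spaces that OCR inserts inside Japanese text.

    Whitespace next to non-Japanese characters (e.g. English words) is kept.
    """
    return _BREAK_BETWEEN_JP.sub("", text).strip()
```

For example, `join_ocr_lines("ポケモン\nゲットだぜ！")` returns `"ポケモンゲットだぜ！"`, while English text like `"Hello world"` is left alone.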

OCR with Japanese games and visual novels

Next, let’s try to use Google Keep’s OCR to convert images from Japanese games to text. I took screenshots of several Japanese games and threw them into the OCR. Here’s what I got:

[Screenshot: Google Keep OCR results for Japanese game screenshots]

This is arguably even better than what we got with manga! The OCR read all of the Japanese text in these game screenshots perfectly. That's a 100% hit rate, and it even worked with the text from retro Pokémon. There is one minor problem, though: if a screenshot has words in the background, the OCR will pick these up as well.
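If the background noise is non-Japanese (HUD numbers, English logos, and so on), a quick filter can strip it out after you paste the text. The sketch below is a heuristic of my own, not something Keep does: it keeps only lines where at least half of the non-space characters are kana or kanji. The `japanese_lines` name and the `min_ratio=0.5` threshold are arbitrary, hypothetical choices. Japanese words in the background will still get through, so give the result a quick read.

```python
import re

# Hiragana, katakana, and common kanji (an assumed definition of "Japanese").
_JAPANESE_CHAR = re.compile(r"[\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff]")


def japanese_lines(text: str, min_ratio: float = 0.5) -> list[str]:
    """Keep OCR lines whose non-space characters are mostly kana or kanji.

    min_ratio is a hypothetical threshold chosen only for illustration.
    """
    kept = []
    for line in text.splitlines():
        chars = [c for c in line if not c.isspace()]
        if not chars:
            continue  # skip blank lines
        jp_count = sum(1 for c in chars if _JAPANESE_CHAR.match(c))
        if jp_count / len(chars) >= min_ratio:
            kept.append(line.strip())
    return kept
```

For example, `japanese_lines("HP 45/45\nなにを しますか？\nSTART")` returns `["なにを しますか？"]`.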

If you are playing a visual novel, an even better tool than OCR may be a text grabber like Textractor. This free software grabs all of the text in text boxes as you play, saving you tons of time if you plan to grab text frequently. For more information, please see our post on how to use Textractor to grab text from visual novels.

OCR with Japanese photos

Now let’s see how the OCR works with everyday photos of various things in Japan. I took these photos from the /r/JapaneseInTheWild subreddit.

Here is what I got:

[Screenshot: Google Keep OCR results for photos from /r/JapaneseInTheWild]

To be honest, I wasn't expecting good performance, but Google Keep OCR exceeded my expectations again! The only text it had problems with was the (very) noisy sign on the right.

If you ever find yourself in Japan and need help reading something, now you can just pull out your phone and snap a photo with Google Keep. How convenient!

Happy OCR-ing

As I point out in my book on learning languages, I’m a big believer in reducing friction. When learning Japanese, this is all about making life easy for you so you can spend more time having fun and less time feeling frustrated.

An OCR tool is one way to reduce friction. It makes it easier to look up words and make flashcards, which also means more time playing games, reading manga, and getting better at Japanese. Cheers!

Like manga? Check out our monthly manga subscription box. We hand-pick titles that match your Japanese reading level :)