Japanese text parser

I’ve created a simple Japanese text parsing page to make reading easier. Try it out by pasting some UTF-8 encoded Japanese text into the box below, or just submit the example text. You can also go directly to an uncluttered form here.


You’re probably wondering why I bothered to make another Japanese text parsing page when there are so many out there. It’s pretty simple: none of them did exactly what I wanted.

I liked ReadJapaneseNews.com but

  • I didn’t like that I had to pay for it
  • Their news selection is limited to stories scraped from news.amaebi.jp
  • I didn’t like their interface
  • I didn’t want to use their SRS system since I’m already heavily invested in Anki

I liked Jim Breen’s text glosser but

  • It doesn’t have inline furigana
  • The definitions stretch way down the page and I hate scrolling

I liked JapaneseClass.jp’s reading area but

  • Although they pull from a wider range of sites, they truncate all their stories because they scrape their data from the Google News RSS feed. For example, one of their stories shows only about 15% of the original article.
  • Even with a wider range of stories, I still want to be able to process any web page I choose.
  • Their quiz system is highly opaque and inflexible, probably to deter cheating, since it is structured as a quasi-competition.

In addition to merging the features of those sites, I’ve added these features:

  • I can toggle the display of Heisig headwords over the kanji in the dictionary view below. This helps me a lot with memorizing things like proper nouns, or sometimes just with placing a kanji that isn’t immediately familiar to me.
  • The view defaults to hiding furigana but lets me toggle it on, so I can try reading without any assistance, which is how reading works in the real world.
  • The highly detailed annotations in EDICT are usually just clutter so they default to hidden.
  • When reading through the text at the top, clicking on a phrase will hide all the other dictionary entries. This way you can stay at the top of the page and avoid scrolling down through all the other entries.
  • I can quickly generate a CSV file containing the vocabulary that I want to study. When I’m logged in, it knows which vocabulary is already in my Anki deck and can highlight new vocabulary, but since you’re not me it’ll just give you a CSV containing all non-kana words.
  • A link to view WWWJDIC’s gloss of the same text.
  • Quick links for each dictionary entry to look up the corresponding entries at kanji.koohii.com and jisho.org.
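The "CSV containing all non-kana words" export can be sketched roughly as below. This is not the site’s actual code: the `(word, reading, gloss)` tuple format, the column names, the de-duplication step, and the helper names `is_kana_only` and `vocab_csv` are all hypothetical. "Kana" is assumed to mean any character in the Unicode Hiragana (U+3040–U+309F) or Katakana (U+30A0–U+30FF) blocks.

```python
import csv
import io


def is_kana(ch: str) -> bool:
    # Assumption: "kana" = Unicode Hiragana or Katakana block (the latter includes ー).
    # Halfwidth katakana (U+FF65–U+FF9F) is deliberately not handled in this sketch.
    return "\u3040" <= ch <= "\u309f" or "\u30a0" <= ch <= "\u30ff"


def is_kana_only(word: str) -> bool:
    # True when every character is kana, i.e. the word contains no kanji or Latin text.
    return all(is_kana(ch) for ch in word)


def vocab_csv(entries) -> str:
    """Build CSV text from hypothetical (word, reading, gloss) tuples,
    keeping only words that contain at least one non-kana character."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["word", "reading", "gloss"])
    seen = set()  # assumption: each word is exported only once
    for word, reading, gloss in entries:
        if is_kana_only(word) or word in seen:
            continue
        seen.add(word)
        writer.writerow([word, reading, gloss])
    return buf.getvalue()
```

For example, given entries for 日本語, これ, ニュース and 読む, only 日本語 and 読む end up in the CSV, because これ and ニュース are pure kana. When saving the result to disk, open the file with `newline=""` and `encoding="utf-8"` so the `csv` module’s line endings and the Japanese text come through intact.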
