Ancient Chinese

This project contains ancient Chinese books in plain text. We provide converters, written in the Go programming language, that convert the plain text to other formats such as PDF.

Manual

  1. Install golang, git:
$ sudo apt-get install golang-go git
  2. Install texlive. After installation (takes hours!), add the bin folder to path, like
$ export PATH=/usr/local/texlive/2014/bin/x86_64-linux:$PATH

NOTE: Your installation path might be different. Use "ls /usr/local/texlive" to find yours. You can put the above export command in ~/.bashrc to run it automatically.

  3. Download source code of project ancient-chinese:
$ git clone https://code.google.com/p/ancient-chinese
$ cd ancient-chinese
  4. Install fonts:
$ sudo mkdir /usr/share/fonts/truetype/chinese/
$ sudo cp fonts/* /usr/share/fonts/truetype/chinese/
$ fc-cache
  5. Compile ancient-chinese:
$ cd go
$ go install tex
  6. Convert txt to tex format:
$ cd ../txt
$ ../go/bin/tex shiji-simplified.txt
  7. Convert tex to pdf format:
$ for i in 1 2 3 ; do xelatex shiji-simplified.tex ; done

NOTE: We need to run xelatex three times to generate the TOC (table of contents) correctly:

  • 1st run: generate all pages w/o TOC.
  • 2nd run: generate TOC and all pages w/o correct page numbers.
  • 3rd run: generate TOC and all pages w/ correct page numbers.

You don't need to worry about these details; just run xelatex three times.

Text File Rules

  1. All text files go under txt sub folder and Go code under go sub folder.
  2. Only a subset of ASCII characters is allowed in file names: lowercase letters, numbers, - (dash) and . (dot).
  3. Use pinyin to replace Chinese characters in file names. For example, "shiji" for "史记". Use a suffix like "-simplified" or "-traditional" to indicate that the text is in simplified or traditional Chinese. Prefer the ".txt" extension. e.g.
shiji-simplified.txt
shiji-traditional.txt
  4. All files are encoded in UTF-8, without a BOM.
  5. Rare characters are represented by multiple characters enclosed in half-width parentheses. e.g.
(土慮)   --- left & right composition.
(/窮)  --- `/` means up & down composition.
(𠂆\*圭)  --- `\*` means outside / inside composition.

NOTE: Rare characters are those that are not included in the HanaMin (花園明朝) font; see http://www.zdic.net/appendix/f18.htm

  6. Comments are put inside ().
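
Rules 2 and 4 above are easy to check mechanically. The following is a minimal sketch, not part of the project's converter: the names checkName and checkEncoding are hypothetical, and the sketch assumes the whole file fits in memory.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"regexp"
	"unicode/utf8"
)

// validName matches names built only from lowercase letters, digits,
// '-' and '.', as rule 2 requires. (Hypothetical helper, not project code.)
var validName = regexp.MustCompile(`^[a-z0-9.-]+$`)

// utf8BOM is the three-byte UTF-8 byte order mark that rule 4 forbids.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// checkName reports whether name uses only the allowed characters.
func checkName(name string) bool {
	return validName.MatchString(name)
}

// checkEncoding returns an error if data starts with a BOM or is not
// valid UTF-8.
func checkEncoding(data []byte) error {
	if bytes.HasPrefix(data, utf8BOM) {
		return errors.New("file starts with a UTF-8 BOM")
	}
	if !utf8.Valid(data) {
		return errors.New("file is not valid UTF-8")
	}
	return nil
}

func main() {
	fmt.Println(checkName("shiji-simplified.txt")) // true
	fmt.Println(checkName("史记.txt"))               // false
	fmt.Println(checkEncoding([]byte("史记")))       // <nil>
	fmt.Println(checkEncoding(append([]byte{0xEF, 0xBB, 0xBF}, "史记"...)))
}
```

To check a real file, read it with os.ReadFile and pass the bytes to checkEncoding.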

Golang Code Rules

  1. All Go code needs to be formatted with gofmt.
  2. Every Go source file should have a minimal comment explaining how to use the code.
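
The simplest way to check rule 1 is "gofmt -l", which lists the files whose formatting differs from gofmt's. As a hedged sketch, the same check can be done from Go with the standard library's go/format package; the helper name isGofmted is hypothetical and not part of this project.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
)

// isGofmted reports whether src is already in canonical gofmt style.
// It returns an error if src cannot be parsed as Go code.
// (Hypothetical helper for illustration.)
func isGofmted(src []byte) (bool, error) {
	out, err := format.Source(src)
	if err != nil {
		return false, err
	}
	return bytes.Equal(src, out), nil
}

func main() {
	messy := []byte("package main\nfunc main(){}\n")
	tidy := []byte("package main\n\nfunc main() {}\n")

	ok, err := isGofmted(messy)
	fmt.Println("messy formatted:", ok, err)

	ok, err = isGofmted(tidy)
	fmt.Println("tidy formatted:", ok, err)
}
```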

Text File Format

TITLE 
AUTHOR 
++CHAPTER1    // The number of '+' sets the structure level: the more '+', the smaller the unit. Max 7.
CONTENT       // One line for each paragraph. 
.... 
++CHAPTER2 
....
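
As an illustration of the heading rule, here is a minimal sketch of how a line could be classified by counting its leading '+' characters. It is not the project's converter: the name parseHeading is hypothetical, and treating lines with more than 7 '+' as non-headings is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// maxLevel is the largest number of '+' the format allows.
const maxLevel = 7

// parseHeading splits a line such as "++CHAPTER1" into its level (the
// number of leading '+') and its title. ok is false for content lines
// and, by assumption, for lines with more than maxLevel '+'.
func parseHeading(line string) (level int, title string, ok bool) {
	title = strings.TrimLeft(line, "+")
	level = len(line) - len(title)
	if level == 0 || level > maxLevel {
		return 0, "", false
	}
	return level, strings.TrimSpace(title), true
}

func main() {
	for _, line := range []string{"++CHAPTER1", "CONTENT", "+++Section"} {
		if level, title, ok := parseHeading(line); ok {
			fmt.Printf("heading level %d: %s\n", level, title)
		} else {
			fmt.Printf("paragraph: %s\n", line)
		}
	}
}
```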

Table format

--- 
Column1|Column2|Column3 
Column1|Column2|Column3 
.... 
--- 

Note:

  • Tables start and end with "---".
  • Every row must have the same number of "|".
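
The second rule can be checked before conversion. This is a small sketch, assuming the rows have already been extracted from between the two "---" markers; the name checkTable is hypothetical and not taken from the project.

```go
package main

import (
	"fmt"
	"strings"
)

// checkTable verifies that every row contains the same number of '|'
// separators as the first row. rows holds the lines found between the
// opening and closing "---" markers. (Hypothetical helper.)
func checkTable(rows []string) error {
	if len(rows) == 0 {
		return nil
	}
	want := strings.Count(rows[0], "|")
	for i, row := range rows[1:] {
		if got := strings.Count(row, "|"); got != want {
			// i+2 is the 1-based row number within the table.
			return fmt.Errorf("row %d has %d '|', want %d", i+2, got, want)
		}
	}
	return nil
}

func main() {
	good := []string{"Column1|Column2|Column3", "Column1|Column2|Column3"}
	bad := []string{"Column1|Column2|Column3", "Column1|Column2"}
	fmt.Println(checkTable(good))
	fmt.Println(checkTable(bad))
}
```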

FAQs

  • Q: Why use text files?
    A: Text files save disk space. Most importantly, text files are easy to edit, and we can use a source control system to record their change history. We require minimal formatting in the text files and use TeX to format the books for different devices.

  • Q: Why use golang?
    A: No special reason. The author would like to try this relatively new language, which is cool and very concise.

Contributors

yijinliu
