NLPA UNIX Tools

command line

documentation

man <command>

info <command>

V7 manual pages

basic

bash, interactive shell and scripting, programming constructs, pipes, for, while, read, redirection

cat, concatenate files, cat file1 file2 ... > output

tac, reverse the lines in a file

wc, count characters, lines, words in a file

file, determine file type
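
The basic commands above combine naturally through redirection and pipes. A minimal sketch, assuming a POSIX-style shell; the file names (part1.txt, part2.txt, corpus.txt) are made up for illustration, and tac is only called if it is installed, since it comes from GNU coreutils rather than POSIX:

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Redirection: > writes a command's output into a file.
printf 'the cat sat\non the mat\n' > part1.txt
printf 'the end\n' > part2.txt

# cat concatenates its arguments in order.
cat part1.txt part2.txt > corpus.txt

# wc counts lines (-l) and words (-w); reading from stdin keeps the
# file name out of the output, and tr strips any padding spaces.
line_count=$(wc -l < corpus.txt | tr -d ' ')
word_count=$(wc -w < corpus.txt | tr -d ' ')

# A pipe feeding a while/read loop: number each line of the file.
numbered=$(cat corpus.txt | while read -r line; do
    n=$((n + 1))
    printf '%d: %s\n' "$n" "$line"
done)

# tac prints the lines in reverse order (skipped where unavailable).
if command -v tac >/dev/null 2>&1; then
    tac corpus.txt
fi

echo "$line_count lines, $word_count words"
echo "$numbered"
```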

file level commands

rm, remove files

ln, create links

cp, copy files

mv, move / rename files

chmod, change access mode

chown, change ownership

mkdir, create directories
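
A short sketch of the file level commands in sequence. The directory layout (corpus/raw) and file names are hypothetical; chown is left out because changing ownership normally requires elevated privileges:

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p corpus/raw                     # -p also creates missing parents
printf 'hello world\n' > corpus/raw/a.txt

cp corpus/raw/a.txt corpus/raw/b.txt    # copy a.txt to b.txt
mv corpus/raw/b.txt corpus/raw/c.txt    # rename b.txt to c.txt
ln -s raw/a.txt corpus/latest.txt       # symbolic link; the target path is
                                        # resolved relative to corpus/
chmod 444 corpus/raw/a.txt              # read-only for owner, group, others
rm corpus/raw/c.txt                     # remove the renamed copy

# Reading through the link returns the content of a.txt.
via_link=$(cat corpus/latest.txt)
remaining=$(ls corpus/raw)

echo "$via_link"
echo "$remaining"
```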

searching

fgrep, Aho-Corasick string matching

grep, regular expressions translated into nondeterministic finite automata (NFAs)

egrep, regular expressions translated into NFAs

agrep, variants of Levenshtein distance algorithm
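
The difference between fixed-string, basic, and extended matching shows up on a small word list. A minimal sketch: the POSIX spellings grep -F and grep -E are used in place of fgrep and egrep, and words.txt is an invented example file. agrep is not shown, since it is often not installed by default:

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'color\ncolour\ncol.r\ncollar\n' > words.txt

# Fixed strings (fgrep): the dot is an ordinary character.
fixed=$(grep -F 'col.r' words.txt)

# Basic regular expression (grep): the dot matches any one character,
# so both "color" and "col.r" match; -c prints the number of matching lines.
basic_count=$(grep -c 'col.r' words.txt)

# Extended regular expression (egrep): ? makes the preceding "u" optional,
# and ^ / $ anchor the pattern to the whole line.
extended=$(grep -E '^colou?r$' words.txt)

echo "fixed:    $fixed"
echo "basic:    $basic_count matching lines"
echo "extended: $extended"
```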

file comparison

diff, Hunt-McIlroy algorithm
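
A minimal sketch of comparing two versions of a file, with invented file names old.txt and new.txt. The unified format (-u) is chosen here because it is easy to read; note that diff exits with status 1 when the files differ, which matters in scripts that stop on errors:

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\nbeta\ngamma\n' > old.txt
printf 'alpha\nBETA\ngamma\n' > new.txt

# Exit status 1 means "files differ", not failure, so it is tolerated here.
changes=$(diff -u old.txt new.txt || true)

# In unified output, removed lines start with "-" and added lines with "+".
removed=$(printf '%s\n' "$changes" | grep -c '^-beta$')
added=$(printf '%s\n' "$changes" | grep -c '^+BETA$')

echo "$changes"
```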

database-like

sort, merge sort, with special attention to external storage

cut, cut out fields or lines

paste, combine corresponding lines from files (line by line)

join, join input files based on fields, input files must be sorted on fields

uniq, report or omit repeated lines
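
These commands behave like relational operators over tab-separated text. The sketch below counts word frequencies and joins two tables; tagged.tsv and gloss.tsv are hypothetical files, and LC_ALL=C is set as an assumption so that sort and join agree on the ordering:

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C                 # byte-order collation for sort and join
tab=$(printf '\t')

# Two tab-separated tables: word/tag and word/gloss (gloss.tsv is sorted).
printf 'dog\tNOUN\nruns\tVERB\ndog\tNOUN\ncat\tNOUN\n' > tagged.tsv
printf 'cat\tfeline\ndog\tcanine\n' > gloss.tsv

# Frequency list: cut takes field 1, sort groups equal lines together,
# uniq -c counts each group, sort -rn puts the largest count first.
counts=$(cut -f1 tagged.tsv | sort | uniq -c | sort -rn)
top=$(printf '%s\n' "$counts" | head -n 1)

# join matches lines on the first field; both inputs must be sorted on it.
# "-" tells join to read the first table from standard input.
joined=$(sort -u tagged.tsv | join -t "$tab" - gloss.tsv)

# paste glues corresponding lines together, here swapping the two columns.
cut -f1 tagged.tsv > words.txt
cut -f2 tagged.tsv > tags.txt
swapped=$(paste tags.txt words.txt)

echo "$counts"
echo "$joined"
echo "$swapped"
```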

complex modifications

sed, stream editor, performs edits on potentially very large files

awk, simple scripting language, loosely a precursor of Perl; still useful for one-liners

gzip, bzip2, ..., various compression utilities, note that compression doesn't just save storage, it usually speeds things up as well

(Python scripting)
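
A few one-liners illustrating sed, awk, and working directly on compressed data. This is a sketch with an invented input file text.txt, not a recommended preprocessing recipe:

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Dr. Smith met Dr. Jones\nthe 3 dogs\nno digits here\n' > text.txt

# sed: replace every "Dr." with "Doctor"; the g flag covers all matches
# on a line, and the backslash makes the dot literal.
expanded=$(sed 's/Dr\./Doctor/g' text.txt)

# awk: NF is the number of whitespace-separated fields on the current line;
# summing it over all lines gives a crude token count.
tokens=$(awk '{ total += NF } END { print total }' text.txt)

# gzip -c compresses to stdout and gzip -dc decompresses to stdout, so a
# compressed corpus can be streamed through a pipeline without unpacking it.
gzip -c text.txt > text.txt.gz
digit_lines=$(gzip -dc text.txt.gz | grep -c '[0-9]')

echo "$expanded"
echo "$tokens tokens, $digit_lines line(s) containing digits"
```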

collections of files

xargs

find

cpio

tar
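
find and xargs are typically used together to run a command over many files, and tar bundles a directory tree into one archive. A minimal sketch with a made-up corpus/ directory; -print0 and -0 are widely supported options (GNU, BSD, BusyBox) that keep file names with spaces intact, and cpio is not shown:

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p corpus/en corpus/de
printf 'one two\n' > corpus/en/a.txt
printf 'eins zwei drei\n' > corpus/de/b.txt
printf 'ignored\n' > corpus/de/notes.log

# find walks the tree and selects *.txt files; xargs passes the names to
# cat in batches; wc -w counts the words across all selected files.
total_words=$(find corpus -name '*.txt' -print0 | xargs -0 cat | wc -w | tr -d ' ')

# tar -cf creates an archive; tar -tf lists its contents without extracting.
tar -cf corpus.tar corpus
archived_txt=$(tar -tf corpus.tar | grep -c '\.txt$')

echo "$total_words words in .txt files, $archived_txt .txt files archived"
```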

scripting

Python

Perl

Ruby

other tools

lex / yacc, lexer and parser generators; similar tools also exist for Java, Python, etc.

libdb, fast key-value store

sqlite, fast, server-less relational database management

mercurial, distributed version control

data formats

text files

ASCII

Unicode

other text formats

multiple fields per line, tab or comma separated, used by relational operators etc.

multimedia and scientific

standard image, audio, video formats

array data in HDF5 format

Python object dumps

databases

Berkeley DB databases (key value stores)

sqlite databases

formats indicated by...

extension

magic number
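
An extension is only a naming convention, while a magic number is read from the file's first bytes. The sketch below writes gzip data under a misleading name (data.bin, invented for this example) and reads the two-byte gzip signature 1f 8b with od; the file command listed earlier performs this kind of lookup against a table of known signatures:

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Compressed data stored under an extension that says nothing about gzip.
printf 'plain text\n' | gzip -c > data.bin

# od -An drops the offset column, -tx1 prints bytes as hex, -N2 reads only
# the first two bytes; tr removes the spaces and the trailing newline.
magic=$(od -An -tx1 -N2 data.bin | tr -d ' \n')

if [ "$magic" = "1f8b" ]; then
    echo "data.bin starts with the gzip magic number"
fi
```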

background

for a lot of work in NLP, it's useful to know the standard UNIX commands

many of these have highly optimized implementations and can process data larger than memory

modern versions handle Unicode correctly

http://cm.bell-labs.com/7thEdMan/bswv7.html