Monday, December 16, 2013

Copy-pastable ASCII characters with pdftex/pdflatex



To get a pastable URL that contains a tilde, use \verb|~| in place of the tilde; the URL then works when you copy it from the PDF file and paste it into a web browser.
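As a minimal sketch of this tip, the file below typesets a URL whose tilde is produced with \verb|~|. The address is a made-up placeholder, and T1-encoded fonts such as cm-super are assumed to be installed; the result may differ with other fonts.

```latex
\documentclass{article}
\usepackage[T1]{fontenc} % assumption: T1-encoded fonts (e.g. cm-super) are installed
\begin{document}
% Placeholder URL, for illustration only.
% \verb cannot appear inside the argument of another command,
% so it sits between the two \texttt groups rather than inside one.
\texttt{http://www.example.org/}\verb|~|\texttt{user/index.html}
\end{document}
```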

Learned it from here:

This document presents how to obtain fully copyable PDF documents from LaTeX. By fully, I mean that every ASCII character is copied as the equivalent character with Ctrl-C in a PDF viewer (evince, xpdf, Acrobat Reader, etc.) or with pdftotext on the command line. Indeed, by default certain characters are transformed, e.g. ' becomes ’.

Let's list the printable ASCII characters:
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[]^_`{|}~

Alphanumeric characters are always copyable. Many punctuation characters work out of the box if powerful fonts are correctly set up (e.g. with cm-super): . / : ; < = > ? @ [ ].

Many other characters just need to be escaped in order to be copy-pastable:
\#, \$, \%, \&, \{, \}
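For instance, a short file using these escapes could look as follows. This is only a sketch with made-up sample text, and it assumes that T1-encoded fonts are installed.

```latex
\documentclass{article}
\usepackage[T1]{fontenc} % assumption: T1-encoded fonts (e.g. cm-super) are installed
\begin{document}
% Each of these characters has a special meaning in LaTeX,
% so it is written with a leading backslash to print it literally.
Item \#1 costs \$5, a 10\% discount, at B\&B: \{sale\}
\end{document}
```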

The problems start with " (double quotes), \ (backslash), ' (single quote, apostrophe), - (hyphen or minus), ^ (caret), | (pipe), ` (backtick, backquotes, accent grave), ~ (equivalency sign - tilde), _ (underscore).

It turns out that obtaining those characters is not straightforward: it depends on your default fonts and on whether \usepackage[T1]{fontenc} is present.
Here are commands that might work:

  • " (double quotes): \verb|"|, \char34
  • \ (backslash): \textbackslash (with \usepackage{textcomp}), \verb|\|, \char92
  • ' (single quote, apostrophe): \textquotesingle (with \usepackage{textcomp}), \char"0D, \char39
  • - (hyphen or minus): \verb|-|, \texttt{-}, \char45
  • ^ (caret): \verb|^|, \char94
  • | (pipe): out of the box (sometimes), \texttt{|}, \char124
  • ` (backquote): \`{}, \char"0D, \char96
  • ~ (equivalency sign - tilde): \verb|~|, \char126
  • _ (underscore): \verb|_|, \char95
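The \char commands in the list select a glyph by its slot number in the current font. A bare number is read as decimal and a leading " marks hexadecimal, so \char126 and \char"7E name the same slot; TeX also accepts a character constant such as \char`\~. The sketch below illustrates the three notations in a typewriter font, assuming T1-encoded fonts are installed. Whether each glyph copies back as the intended ASCII character still depends on the font and its encoding.

```latex
\documentclass{article}
\usepackage[T1]{fontenc} % assumption: T1-encoded fonts (e.g. cm-super) are installed
\begin{document}
\ttfamily
% Three ways to ask for slot 126 (the ASCII code of the tilde):
\char126      % decimal
\char"7E      % hexadecimal
\char`\~      % character constant

% Slots 34, 92, 94, 95 and 124 are the ASCII codes of " \ ^ _ and |
\char34 \char92 \char94 \char95 \char124
\end{document}
```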
On my Debian Linux / TeX Live, with cm-super installed, and with \usepackage[T1]{fontenc}, the following produces a completely copy-pastable PDF:
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\begin{document}
\tiny{
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ! 
\char34\#\$\%\& \verb|\|\textquotesingle()*+,
\verb|-|./:;<=>?@[]\verb|^|\verb|_|\`{}\{\texttt{|}\}\verb|~|
}
\end{document}
Without \usepackage[T1]{fontenc}, the following works:
\documentclass{article}
\begin{document}
\tiny{
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!
\verb|"|\#\$\%\&\verb|\|\char"0D()*+,
\verb|-|./:;<=>?@[]\verb|^|\verb|_|\`{}\{\texttt{|}\}\verb|~|
}
\end{document}