Monday, December 16, 2013

Copy-pastable ASCII characters with pdftex/pdflatex



I was trying to get a copy-pastable URL that has a tilde in it. Using \verb|~| to replace the tilde makes the URL work when you paste it from the PDF file into a web browser.

Learned it from here:

This document presents how to obtain fully copyable PDF documents from LaTeX. By fully, I mean that all ASCII characters are copied into the equivalent binary character with Ctrl-C in a PDF viewer (evince, xpdf, acrobat reader, etc.) or with pdftotext on the command line. Indeed, by default certain characters are transformed, e.g. ' becomes ’.

Let's list the printable ASCII characters:
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[]^_`{|}~

Alphanumeric characters are always copyable. Many punctuation characters work out of the box if powerful fonts are correctly set up (e.g. with cm-super): . / : ; < = > ? @ [ ].

Many other characters just need to be escaped in order to be copy-pastable:
\#, \$, \%, \&, \{, \}

The problems start with " (double quotes), \ (backslash), ' (single quote, apostrophe), - (hyphen or minus), ^ (caret), | (pipe), ` (backtick, backquotes, accent grave), ~ (equivalency sign - tilde), _ (underscore).

It turns out that obtaining those characters is not straightforward: it depends on your default fonts and on whether \usepackage[T1]{fontenc} is present.
Here are commands that might work:

  • " (double quotes): \verb|"|, \char34
  • \ (backslash): \textbackslash (with \usepackage{textcomp}), \verb|\|, \char92
  • ' (single quote, apostrophe): \textquotesingle (with \usepackage{textcomp}), \char"0D, \char39
  • - (hyphen or minus): \verb|-|, \texttt{-}, \char45
  • ^ (caret): \verb|^|, \char94
  • | (pipe): out of the box (sometimes), \texttt{|}, \char124
  • ` (backquote): \`{}, \char"0D, \char96
  • ~ (equivalency sign - tilde): \verb|~|, \char126
  • _ (underscore): \verb|_|, \char95
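
The decimal arguments given to \char in the list above are plain ASCII codes. As a quick cross-check, here is a small C++ sketch that builds the \char command for a given character. The helper names charCode and latexChar are made up for this illustration, and the sketch assumes the font encoding keeps each glyph at its ASCII position, which (as said above) depends on your fonts and on \usepackage[T1]{fontenc}.

```cpp
#include <cassert>
#include <string>

// Decimal ASCII code of a character, i.e. the number that follows \char.
// (Hypothetical helper, only for this illustration.)
constexpr int charCode(char c) {
    return static_cast<unsigned char>(c);
}

// Builds the LaTeX fragment for a printable ASCII character,
// for example "\char126" for '~'. (Hypothetical helper.)
std::string latexChar(char c) {
    const int code = charCode(c);
    assert(code >= 32 && code <= 126);  // printable ASCII only
    return "\\char" + std::to_string(code);
}
```

Calling latexChar on each troublesome character gives the number to type after \char; whether the resulting glyph is copy-pastable still has to be checked in the PDF itself.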
On my Debian Linux / TeX Live, with cm-super installed, and with \usepackage[T1]{fontenc} , the following produces a completely copy-pastable PDF:
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\begin{document}
\tiny{
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ! 
\char34\#\$\%\& \verb|\|\textquotesingle()*+,
\verb|-|./:;<=>?@[]\verb|^|\verb|_|\`{}\{\texttt{|}\}\verb|~|
}
\end{document}
Without \usepackage[T1]{fontenc}, the following works:
\documentclass{article}
\begin{document}
\tiny{
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!
\verb|"|\#\$\%\&\verb|\|\char"0D()*+,
\verb|-|./:;<=>?@[]\verb|^|\verb|_|\`{}\{\texttt{|}\}\verb|~|
}
\end{document}

Wednesday, February 6, 2013

Compilation Error on inet-2.1.0 32/64 bit compat of MACAddress.h MACAddress.cc

  After I updated my OMNeT++ from 4.0 to 4.2.2, I also tried to update inet to the newest version, 2.1.0. Following the INSTALL file, I ran the make command and got the following errors:

linklayer/contract/MACAddress.h line 60: error: integer constant is too large for ‘long’ type
linklayer/contract/MACAddress.h line 120: error: integer constant is too large for ‘long’ type
linklayer/contract/MACAddress.h line 125: error: integer constant is too large for ‘long’ type

I found the following post by EZ.
https://groups.google.com/forum/#!msg/omnetpp/JF_BbiqQIWg/9wgJWoezgoEJ

Compilation Warning on inet-1.99.4 32/64 bit compat of MACAddress.h
I have just moved to inet-1.99.4-development-03d5d15-src.tgz
I am running on Ubuntu i686:
2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:35:26 UTC 2012 i686 GNU/Linux

I get the warnings:
linklayer/contract/MACAddress.h:59: warning: integer constant is too large for ‘unsigned long’ type
linklayer/contract/MACAddress.h:119: warning: integer constant is too large for ‘unsigned long’ type
linklayer/contract/MACAddress.h:124: warning: integer constant is too large for ‘unsigned long’ type

It is clearly caused by the fact long int on my machine is 32bit...

AFAIK there is no "standard" way to write portable code that will know if the long int is 64bit or 32bit.
So I am not sure what to propose here. 
I think a best fix should go into OMNET configure script itself (could be coded as a simple test 
program that checks the sizeof(long int)). That would probably add -DIS64BIT or similar.

Meanwhile (for linux) we can simply add the following code to linklayer/contract/MACAddress.h:
#if defined(__WORDSIZE) && __WORDSIZE == 64 
#   define MAC_ADDRESS_MASK 0x0000ffffffffffffUL
#else
#   define MAC_ADDRESS_MASK 0x0000ffffffffffffULL
#endif
So I opened the file /src/linklayer/contract/MACAddress.h.
Near the top there is this line:
#define MAC_ADDRESS_MASK 0xffffffffffffL
I changed it to:
#if defined(__WORDSIZE) && __WORDSIZE == 64
#   define MAC_ADDRESS_MASK 0x0000ffffffffffffUL
#else
#   define MAC_ADDRESS_MASK 0x0000ffffffffffffULL
#endif
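
Since unsigned long long is at least 64 bits wide in standard C++11, the ULL form of the constant is valid on both 32-bit and 64-bit machines. The following is only a minimal sketch under that assumption (a C++11 compiler; older compilers may accept long long as an extension), not the INET code itself. The function name maskMac is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the macro in MACAddress.h, written with the ULL suffix only.
// This is an illustration, not the actual INET header.
#define MAC_ADDRESS_MASK 0x0000ffffffffffffULL

// C++11 check that the constant really fits in the literal's type.
static_assert(sizeof(unsigned long long) >= 8,
              "unsigned long long must hold a 48-bit MAC address");

// Keeps only the low 48 bits of a 64-bit value. (Hypothetical helper.)
inline std::uint64_t maskMac(std::uint64_t raw) {
    const std::uint64_t mac = raw & MAC_ADDRESS_MASK;
    assert((mac >> 48) == 0);  // nothing may remain above bit 47
    return mac;
}
```

With this form the word-size test is not needed to make the constant compile, because the literal has the same type on every platform.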

Running make again, I got a similar error:
linklayer/contract/MACAddress.cc line 137: error: integer constant is too large for ‘long’ type
Line 137 of /src/linklayer/contract/MACAddress.cc reads:
uint64 intAddr = 0x0AAA00000000L + (autoAddressCtr & 0xffffffffL);
I changed it to:
uint64 intAddr = 0x0AAA00000000ULL + (autoAddressCtr & 0xffffffffULL);

I ran make again, and inet-2.1.0 compiled successfully.
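
To see what the expression on line 137 of MACAddress.cc computes, here is a small stand-alone sketch. It assumes a C++11 compiler; the function name makeAutoAddress is hypothetical and only mirrors the arithmetic, it is not the INET function.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the arithmetic of the intAddr expression: a fixed 0x0AAA prefix
// in the upper 16 bits of the 48-bit address, plus the low 32 bits of a
// counter. (Hypothetical helper, not part of INET.)
inline std::uint64_t makeAutoAddress(std::uint64_t autoAddressCtr) {
    const std::uint64_t prefix = 0x0AAA00000000ULL;
    const std::uint64_t addr = prefix + (autoAddressCtr & 0xffffffffULL);
    assert((addr >> 48) == 0);  // the result always fits in 48 bits
    return addr;
}
```

Because both constants carry the ULL suffix, the sum is computed in a 64-bit type even where long is 32 bits, which is what the original L suffix did not guarantee.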