iconv - how to ignore BOM

Discussion:


h***@gmail.com

2006-01-24 08:36:30 UTC

Hi guys,

How do I convert a UTF-16LE file to UTF-8 without the BOM?

For example: iconv -f UTF-16LE -t UTF-8 -o output.txt input.txt

The command above produces a UTF-8 file that starts with a BOM.

This is a problem: if I take that UTF-8 file, BOM included, and convert it further into another encoding such as BIG-5, iconv fails with the error "illegal input sequence at position 0".

Thanks.

Laurenz Albe

2006-01-24 11:13:04 UTC


Post by h***@gmail.com
How to convert a UTF-16LE file to UTF-8, without the BOM
e.g. iconv -f UTF-16LE -t UTF-8 -o output.txt input.txt
this above command will output UTF-8 file with BOM
since if I use this UTF-8 with BOM file and further convert into another
encoding, such as BIG-5
error will be returned as : illegal input sequence at position 0

The resulting UTF-8 file will only contain a BOM if the input file
contains one. The BOM is just the character U+FEFF, and iconv translates
it like any other character: the 2 bytes FF FE in UTF-16LE become the
3 bytes EF BB BF in UTF-8.

You will need either to cut the first 2 bytes from the input file before
converting, or to cut the first 3 bytes from the result file (in both
cases, those bytes are the BOM).

I can't think of a standard UNIX utility that does this well; maybe
someone else can help.
You could also write a very simple C program that copies its standard
input to standard output, minus the first 2 or 3 bytes.

Yours,
Laurenz Albe

howa

2006-01-24 16:23:17 UTC


Yes, uconv can strip off the BOM automatically with a command-line
option:

http://www.jeffhung.idv.tw/cgi-bin/man2web?program=uconv&section=1

but I don't know whether iconv has any option I can set to do the
same.

It seems a little funny that iconv cannot convert UTF-8 to Big-5 or
another encoding just because of the BOM...

Laurenz Albe wrote:

Post by Laurenz Albe

The resulting UTF-8 file will only contain a BOM if the input file
contains a BOM. This character is just translated from UTF-16 to UTF-8.
You will either need to cut the first 2 bytes from the input file before
converting or cut the first 3 bytes from the result file (this is the
BOM in both cases).
I can't think of a UNIX utility that will do such a task well, maybe
someone else can help.
You could also write a very simple C program that just outputs its
standard input except the first 2 or 3 bytes.
Yours,
Laurenz Albe

Laurenz Albe

2006-01-25 09:00:47 UTC


Post by howa
yes, uconv can strip off the BOM automatically by using command line
option
http://www.jeffhung.idv.tw/cgi-bin/man2web?program=uconv&section=1
but i don't know if there are some options which i can set by using
iconv
seems if iconv cannot convert UTF-8 to Big-5 or other encoding due to
the BOM, a little bit funny...
Laurenz Albe ???

Please do not top-post.

Why 'Laurenz Albe ???'? Am I that questionable?

I do not understand your post at all.

What is your problem? Are you just out to badmouth iconv?
If there is a utility called uconv that does what you want, why don't you
use it instead of complaining?

Yours,
Laurenz Albe