windows:aboutexcelcsvformat
Differences
This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
| windows:aboutexcelcsvformat [2014/10/29 08:59] – created rlunaro | windows:aboutexcelcsvformat [2022/12/02 21:02] (current) – external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 7: | Line 7: | ||
| ===== The CSV format ===== | ===== The CSV format ===== | ||
| - | The [[http:// | + | The [[http:// |
| + | What I've find out researching into the way Excel stores this information and making some tries is this: | ||
| + | |||
| + | The general format is this: | ||
| + | |||
| + | < | ||
| + | value11< | ||
| + | value21< | ||
| + | </ | ||
| + | |||
| + | As ''< | ||
| + | |||
| + | If the character ''< | ||
| + | | ||
| + | < | ||
| + | " | ||
| + | value21; | ||
| + | </ | ||
| + | | ||
| + | **How the carriage returns in the values are treated??? | ||
| + | | ||
| + | |||
| + | < | ||
| + | " | ||
| + | value21; | ||
| + | </ | ||
| + | |||
| + | **What happens with the lenght of the values???** The maximum length per record is 32,768 **bytes** (sounds familiar, isn't it?). Pay attention to the " | ||
| + | | ||
| + | | ||
| + | ===== The best way to encode a CSV format ===== | ||
| + | |||
| + | After many tries, I've discovered that: | ||
| + | |||
| + | * You can use Unicode as an enconding format for your CSV files | ||
| + | * If you use Unicode, the best way to encode is to use <tab> as a field separator: the rest of separators (semicolon, comma) are not recognized for Excel as a proper CSV file, and then will force the user to use the " | ||
| + | |||
| + | ===== A reference implementation in visual basic script ===== | ||
| + | |||
| + | Here is my reference implementation in visual basic script (provided the input comes from a recordset), which can be easily converted into other programming language like java. | ||
| + | |||
| + | As a safety precaution, I've put behind double quotes any value that contains a " | ||
| + | |||
| + | <code vbs> | ||
| + | ' | ||
| + | ' | ||
| + | ' csv.vbs - helper functions for writing csv files | ||
| + | ' | ||
| + | ' | ||
| + | |||
| + | function openCsv( byval filename ) | ||
| + | |||
| + | dim fso | ||
| + | dim textFile | ||
| + | dim overwrite | ||
| + | dim useUnicode | ||
| + | |||
| + | overwrite = true | ||
| + | useUnicode = true | ||
| + | set fso = createobject( " | ||
| + | set openCsv = fso.createTextFile( filename, overWrite, useUnicode ) | ||
| + | |||
| + | end function ' | ||
| + | |||
| + | |||
| + | function closeCsv( byval fileObject ) | ||
| + | |||
| + | fileObject.close | ||
| + | |||
| + | set fileObject = nothing | ||
| + | |||
| + | end function ' fileObject | ||
| + | |||
| + | function writeHeaderCsv( byval fileObject, byval recordset ) | ||
| + | |||
| + | dim nCont | ||
| + | |||
| + | nCont = 0 | ||
| + | do while nCont < recordset.fields.count | ||
| + | fileObject.write item(recordset.fields(nCont).name) & recordSep() | ||
| + | nCont = nCont + 1 | ||
| + | loop ' nCont | ||
| + | fileObject.write lineSep() | ||
| + | |||
| + | end function ' writeHeaderCsv | ||
| + | |||
| + | ' prints the current line of the | ||
| + | ' recorset as a cvs line | ||
| + | function writeLineCsv( byval fileObject, byval recordset ) | ||
| + | |||
| + | dim nCont | ||
| + | |||
| + | nCont = 0 | ||
| + | do while nCont < recordset.fields.count | ||
| + | fileObject.write item(recordset.fields(nCont).value) & recordSep() | ||
| + | nCont = nCont + 1 | ||
| + | loop ' nCont | ||
| + | fileObject.writeline | ||
| + | |||
| + | end function ' writeLine | ||
| + | |||
| + | function lineSep() | ||
| + | lineSep = vbLf | ||
| + | end function ' lineSetp | ||
| + | |||
| + | ' after trying several alternatives, | ||
| + | ' separator that makes the excel correctly interpret | ||
| + | ' the differents cells is vbTab, so I left as | ||
| + | ' the default separator | ||
| + | function recordSep() | ||
| + | recordSep = vbTab ' ";" | ||
| + | end function ' recordSep | ||
| + | |||
| + | |||
| + | ' escape of items: | ||
| + | ' value es;to must be set as " | ||
| + | ' value es" | ||
| + | function item( byval value ) | ||
| + | |||
| + | if isempty( value ) then | ||
| + | value = "" | ||
| + | end if ' isempty | ||
| + | if isnull( value ) then | ||
| + | value = "" | ||
| + | end if ' isnull | ||
| + | |||
| + | item = value | ||
| + | ' cut items longer than 32768 bytes long | ||
| + | ' as you can imagine " | ||
| + | ' to " | ||
| + | ' to a prudent 30767 char (2000 bytes less) | ||
| + | if len( item ) > 30767 then | ||
| + | item = mid( item, 1, 30767 ) & " | ||
| + | end if ' length( item ) | ||
| + | item = replace( item, """", | ||
| + | if instr( item, vbCr ) > 0 or instr( item, vbLf ) > 0 then | ||
| + | item = """" | ||
| + | end if ' instr( item | ||
| + | if instr( item, recordSep() ) > 0 then | ||
| + | item = """" | ||
| + | end if ' instr( item | ||
| + | end function ' item | ||
| + | |||
| + | |||
| + | |||
| + | </ | ||
| + | |||
| + | |||
| + | |||
| + | ---- | ||
| + | |||
| + | ~~~DISQUS~~~ | ||
windows/aboutexcelcsvformat.1414573140.txt.gz · Last modified: 2022/12/02 21:02 (external edit)
