
Problem loading the BMF from raw ASCII files


There are several stray characters in the fourth text file of the Business Master File (BMF), within the field 'ARED' of the organization with EIN 940519020. The character is hexadecimal 1A (decimal 26, also known as Ctrl+Z). SAS treats this character as an end-of-file (EOF) marker: it stops reading the flat file as soon as it reaches one, and no error will appear in your SAS program.

There are several ways to locate stray EOFs in a flat ASCII file (such as the BMF or RTF files):

If each record in the file is a fixed length (vs. variable length) and the character is in the middle of a row, the minimum record length will not match the maximum record length.

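If each record in the file is a fixed length, you can compute the number of records you expect to read by dividing the total file size in bytes by the record length plus 1. The extra byte accounts for the end-of-line character; add 2 instead if each line ends with both a carriage return and a line feed. If SAS reads fewer records than this, it probably stopped at a stray EOF.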
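The record-count arithmetic above can also be done outside SAS. The following is only a minimal Python sketch, not part of the original procedure: the function name `expected_record_count` and its parameters are hypothetical, and it assumes every line in the file uses the same line ending.

```python
import os

def expected_record_count(path, record_length, eol_bytes=1):
    """Return (expected_records, leftover_bytes) for a fixed-length file.

    Hypothetical helper. record_length is the data width from the file
    layout; eol_bytes is 1 for a bare line feed or 2 for CR+LF.
    """
    if record_length <= 0 or eol_bytes < 0:
        raise ValueError("record_length must be positive, eol_bytes non-negative")
    size = os.path.getsize(path)  # total file size in bytes
    # Each record occupies its data width plus its line ending.
    return divmod(size, record_length + eol_bytes)
```

Compare the first value with the record count that SAS writes to the log. A non-zero second value suggests that the record length or the line-ending size is wrong, or that the last line has no terminator.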

If the file has variable length records, then you can use the following code to read only the first character of each row into a temp file. Assuming the EOF isn't in the first position of a record, the number of records in the temp file should equal the total when you input all fields.

DATA _NULL_;
  INFILE T1 MISSOVER LS=1 LRECL=317;
  INPUT @1 RECTEST $1.;
RUN;

  • Set LRECL to the actual record length in the file layout.
  • You can also use UltraEdit to open the file and conduct search/replace, or R:COREPROGRAMSstripper.prg in Foxpro to search/replace.
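If neither UltraEdit nor FoxPro is at hand, the same search/replace can be approximated with a short script. The sketch below is in Python and is an illustration only: the names `find_stray_eofs` and `replace_stray_eofs` are hypothetical, and it assumes records are separated by line feeds. The file is opened in binary mode so that byte 1A is read as ordinary data.

```python
EOF_BYTE = b"\x1a"  # Ctrl+Z, decimal 26

def find_stray_eofs(path):
    """Return 1-based (line_number, column) pairs for every 0x1A byte."""
    hits = []
    with open(path, "rb") as fh:  # binary mode: 0x1A does not end the read
        for line_number, line in enumerate(fh, start=1):
            column = line.find(EOF_BYTE)
            while column != -1:
                hits.append((line_number, column + 1))
                column = line.find(EOF_BYTE, column + 1)
    return hits

def replace_stray_eofs(src_path, dst_path, replacement=b" "):
    """Copy src_path to dst_path, replacing each 0x1A; return how many were replaced.

    A one-byte replacement keeps fixed-length records the same length.
    """
    count = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            count += line.count(EOF_BYTE)
            dst.write(line.replace(EOF_BYTE, replacement))
    return count
```

Writing to a separate output file leaves the original untouched, so record counts can be compared before and after the cleanup.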

    Sometimes the stray EOF does not trigger SAS to stop loading a flat file. If this happens you may read all records in the flat file, but one or more fields in the SAS dataset will contain an EOF character. This may or may not cause problems or result in an error message, so it is always a good practice to check record counts in SAS logs and use PROC MEANS to generate totals at various points for comparison.

    If you suspect a specific field within a SAS dataset, you can use the following SAS code to flag the damaged record(s):

    find1a = index(put(ARED,$HEX.),'1A');

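  • where ARED is the field suspected of containing the EOF. The flag 'find1a' holds the position of the first '1A' within the hexadecimal representation of ARED, not within ARED itself. find1a = 0 means no match. An odd value means a stray EOF at character position (find1a + 1) / 2 of ARED. An even value means the match straddles two adjacent bytes rather than marking an EOF, which cannot happen when the field holds only 7-bit ASCII text.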
  • You can use the following SAS code to 'fix' the stray EOFs by replacing each with a single space character (' '):

    ARED=input(TRANWRD (put(ARED,$HEX.),'1A','20'),$hex12.);

  • where ARED is the field containing stray EOFs, and the width in $hex12. is the field length × 2 (each ASCII character is written as 2 hex characters), so $hex12. suits a 6-character field.
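To make the relationship between hex positions and character positions concrete, here is a small Python sketch that mimics the hex-string search on a single field value. It is an illustration only and is not SAS code: the function names `hex_index` and `eof_positions` are hypothetical, and the field value is assumed to be available as raw bytes.

```python
def hex_index(value, pattern="1A"):
    """Mimic a search for '1A' in the hex form of a field.

    Returns the 1-based position in the hex string, or 0 if absent.
    """
    return value.hex().upper().find(pattern) + 1

def eof_positions(value):
    """Return the 1-based character positions of every 0x1A byte."""
    return [i + 1 for i, byte in enumerate(value) if byte == 0x1A]

def fix_field(value):
    """Replace each 0x1A byte with a space, keeping the field length."""
    return value.replace(b"\x1a", b" ")
```

For the value `b"AB\x1aCD"` the hex form is `41421A4344`, so `hex_index` returns 5 and the character position is (5 + 1) / 2 = 3, which matches `eof_positions`. For `b"\x31\xa0"` the hex form is `31A0`, so `hex_index` returns 2 even though the value contains no EOF byte; checking bytes directly avoids that false match.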
  • FINAL NOTE PER SAS TECH SUPPORT WEBSITE...

    See SAS NOTES for further information on this topic:

    SN-V8+-003632

    When reading a binary file as text, the SAS System stops reading the input file after encountering Ctrl+Z character

    If the SAS System encounters a Ctrl+Z or Hex 1a character when reading a binary file as text, input stops because the character is treated as an end-of-file character. There is a new option for Version 8.2, IgnoreDOSEOF, which will allow these characters to be read.

    The option is used on an infile or filename statement, as in the following:

    data one;
      infile 'c:\myfiles\dosfile.txt' ignoredoseof;
      input foo $100.;
    run;

    or

    filename angus 'c:\myfiles\dosfile.txt' ignoredoseof;

    data one;
      infile angus;
      input foo $100.;
    run;

    A Technical Support hot fix for Release 8.1 TSLEVEL TS1M0 for this issue is available at:

    http://www.sas.com/techsup/download/hotfix/81_sbcs_prod_list.html#003632

    This problem is fixed in Release 8.2.


    Added 03/05/2002 by tpollak, Modified 09/27/2011 by kuttkeUI
