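0. Background
Our project processes large volumes of socket transaction data. Incoming data is first saved to a file; the file is then read back and its contents are parsed for asynchronous processing. The problem appeared in the file-reading step.
The symptom: the socket received a complete message, but the content read back from the file was not a complete message. Its tail was a truncated message detail record.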
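1. Investigation
At first I assumed the file content itself had been truncated. During later troubleshooting I happened to notice that the length of the content read was always an integer multiple of the buffer array length. Reviewing the code with that in mind revealed the error.
The original code: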
try {
    RandomAccessFile raf = new RandomAccessFile(file, "rw");
    FileChannel fc = raf.getChannel();
    FileLock lock = fc.tryLock();
    if (null == lock) {
        raf.close();
        fc.close();
        return;
    }
    InputStream is = new BufferedInputStream(new FileInputStream(raf.getFD()));
    byte[] readBytes = new byte[1024];
    StringBuffer result = new StringBuffer();
    try {
        // The return value of read() (the number of bytes read) is discarded here...
        while (raf.read(readBytes) != -1) {
            // ...so the whole 1024-byte array is appended on every pass.
            result.append(new String(readBytes));
        }
        System.out.println(result);
        System.out.println(result.length());
    } finally {
        lock.release();
        raf.close();
        fc.close();
        is.close();
    }
} catch (FileNotFoundException e) {
    System.out.println("File does not exist");
} catch (NonWritableChannelException e) {
    System.out.println("Failed to read file");
} catch (IOException e) {
    e.printStackTrace();
}
The error is in the read loop, specifically in the raf.read() and result.append() calls.
2. Analysis
First, look at the Javadoc for raf.read():
/**
* Reads up to {@code b.length} bytes of data from this file
* into an array of bytes. This method blocks until at least one byte
* of input is available.
* Although {@code RandomAccessFile} is not a subclass of
* {@code InputStream}, this method behaves in exactly the
* same way as the {@link InputStream#read(byte[])} method of
* {@code InputStream}.
*
* @param b the buffer into which the data is read.
* @return the total number of bytes read into the buffer, or
* {@code -1} if there is no more data because the end of
* this file has been reached.
* @exception IOException If the first byte cannot be read for any reason
* other than end of file, or if the random access file has been closed, or if
* some other I/O error occurs.
* @exception NullPointerException If {@code b} is {@code null}.
*/
According to the Javadoc, RandomAccessFile.read(byte[]) behaves exactly like InputStream.read(byte[]), so let's look at the documentation for InputStream.read(byte[]) next:
public int read(byte[] b)
         throws IOException
Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown.
If the length of b is zero, then no bytes are read and 0 is returned; otherwise, there is an attempt to read at least one byte. If no byte is available because the stream is at the end of the file, the value -1 is returned; otherwise, at least one byte is read and stored into b.
The first byte read is stored into element b[0], the next one into b[1], and so on. The number of bytes read is, at most, equal to the length of b. Let k be the number of bytes actually read; these bytes will be stored in elements b[0] through b[k-1], leaving elements b[k] through b[b.length-1] unaffected.
The read(b) method for class InputStream has the same effect as:
read(b, 0, b.length)
Parameters:
b - the buffer into which the data is read.
Returns:
the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
Throws:
IOException - If the first byte cannot be read for any reason other than the end of the file, if the input stream has been closed, or if some other I/O error occurs.
NullPointerException - if b is null.
See Also:
read(byte[], int, int)
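The key sentence in this documentation is: "these bytes will be stored in elements b[0] through b[k-1], leaving elements b[k] through b[b.length-1] unaffected." In other words, if the number of bytes read, k, is smaller than the array length, elements b[k] through b[b.length-1] keep whatever they held before the call.
So the mystery is solved. The previous read filled all 1024 elements of the array. On the last read, fewer than 1024 bytes are read, say 512, so the remaining 512 elements are not updated and still hold data from the previous read. Because the code appends the whole array on every pass, the result ends with a stale fragment of message detail, which looks like a truncated message, and the total length read is always an exact multiple of the array length.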
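The effect is easy to reproduce in memory. The following is a minimal sketch, not the project code: the class name StaleBufferDemo, both helper methods, the 10-byte sample input, and the 4-byte buffer are all made up for illustration, and a ByteArrayInputStream stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StaleBufferDemo {

    // Same pattern as the original code: the return value of read() is ignored
    // and the whole buffer is appended on every pass.
    static String readIgnoringLength(InputStream in, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        StringBuilder sb = new StringBuilder();
        while (in.read(buf) != -1) {
            sb.append(new String(buf, StandardCharsets.US_ASCII));
        }
        return sb.toString();
    }

    // Only the first len bytes of the buffer are appended.
    static String readUsingLength(InputStream in, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        StringBuilder sb = new StringBuilder();
        int len;
        while ((len = in.read(buf)) != -1) {
            sb.append(new String(buf, 0, len, StandardCharsets.US_ASCII));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "ABCDEFGHIJ".getBytes(StandardCharsets.US_ASCII); // 10 bytes

        String buggy = readIgnoringLength(new ByteArrayInputStream(data), 4);
        String fixed = readUsingLength(new ByteArrayInputStream(data), 4);

        System.out.println(buggy + " (" + buggy.length() + ")"); // ABCDEFGHIJGH (12)
        System.out.println(fixed + " (" + fixed.length() + ")"); // ABCDEFGHIJ (10)
    }
}
```

With a 4-byte buffer the reads return 4, 4, and 2 bytes. After the last read the buffer holds "IJGH": the first two elements are new and the last two are left over from the second read. That is why the first result is 12 characters long, a multiple of 4, and ends with stale data.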
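3. Solution
With the cause known, the fix follows. The problem comes from consuming more bytes than were actually read, so the fix is to use only the number of bytes that read() reports. The revised code: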
try {
    RandomAccessFile raf = new RandomAccessFile(file, "rw");
    FileChannel fc = raf.getChannel();
    FileLock lock = fc.tryLock();
    if (null == lock) {
        raf.close();
        fc.close();
        return;
    }
    InputStream is = new BufferedInputStream(new FileInputStream(raf.getFD()));
    byte[] readBytes = new byte[1024];
    StringBuffer result = new StringBuffer();
    int len = 0;
    try {
        // Keep the number of bytes read on each pass...
        while ((len = raf.read(readBytes)) != -1) {
            // ...and append only those bytes, not the whole array.
            result.append(new String(readBytes, 0, len));
        }
        System.out.println(result);
        System.out.println(result.length());
    } finally {
        lock.release();
        raf.close();
        fc.close();
        is.close();
    }
} catch (FileNotFoundException e) {
    System.out.println("File does not exist");
} catch (NonWritableChannelException e) {
    System.out.println("Failed to read file");
} catch (IOException e) {
    e.printStackTrace();
}
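One further caveat may apply, depending on the data. If the messages contain multi-byte characters, converting each 1024-byte chunk to a String on its own can split a character across two chunks and produce replacement characters. A possible variant, sketched below, collects the raw bytes first and decodes once at the end. The class name DecodeOnce, the method readAll, and the choice of UTF-8 are assumptions for illustration; the post does not say which encoding the messages use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeOnce {

    // Copies exactly the bytes reported by each read() into a growing byte buffer,
    // then decodes the complete byte sequence in a single step.
    static String readAll(InputStream in, int bufferSize, Charset charset) throws IOException {
        byte[] buf = new byte[bufferSize];
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        int len;
        while ((len = in.read(buf)) != -1) {
            collected.write(buf, 0, len);
        }
        return new String(collected.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample: \u6587 takes three bytes in UTF-8, so a 2-byte buffer
        // is guaranteed to split it across reads.
        String original = "a\u6587b\u6587c";
        byte[] data = original.getBytes(StandardCharsets.UTF_8);

        String decoded = readAll(new ByteArrayInputStream(data), 2, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(original));
    }
}
```

Passing the charset explicitly also avoids depending on the platform default encoding, which new String(byte[], int, int) uses when no charset is given.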
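4. Summary
I used to wonder why other programmers cared about the length returned when reading from an input stream. After falling into this trap, I understand. For any input stream, take only as many array elements as were actually read, so that the content you end up with is accurate.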
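The same rule applies when a fixed number of bytes is expected, for example a message body of known length: read() may return fewer bytes than requested, so the count has to be tracked until the buffer is full. The sketch below shows one way to do that. The names ReadExactly, readExactly, and oneByteAtATime are hypothetical, and the one-byte-per-call stream exists only to simulate short reads. The standard library offers DataInputStream.readFully for the same purpose.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ReadExactly {

    // Keeps calling read() until exactly `length` bytes have arrived,
    // advancing the offset by the count each call reports.
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] out = new byte[length];
        int filled = 0;
        while (filled < length) {
            int n = in.read(out, filled, length - filled);
            if (n == -1) {
                throw new EOFException("expected " + length + " bytes, got " + filled);
            }
            filled += n;
        }
        return out;
    }

    // Test helper: a stream that hands out at most one byte per read() call.
    static InputStream oneByteAtATime(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "HELLO".getBytes(StandardCharsets.US_ASCII);
        byte[] got = readExactly(oneByteAtATime(data), 5);
        System.out.println(new String(got, StandardCharsets.US_ASCII)); // HELLO
    }
}
```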