The best way to get started with this topic is to look at an example.
Let's assume that you must read a large amount of data from a binary
file and store it in an array for further processing.
Java I/O is based on streams that represent a sequence of bytes.
First, you must choose a stream type. We are working with binary
data, so the FileInputStream class
is the correct choice.
You should consider using the FileReader class when working with character
data.
We can open a connection to an actual file like this:
InputStream in = new FileInputStream(fileName);
What follows is an effective approach to reading large amounts of data when
time and memory allocation have to be considered to improve overall system performance.
Keeping performance issues in mind
At this point, it is possible to read data from the file, but let's
take a closer look at other classes from the java.io package.
The BufferedInputStream class is a wrapper for other input streams. It
buffers their input, which reduces the number of reads made against the
underlying file and speeds up the reading process. You can connect to a file like this:
InputStream is =
new BufferedInputStream(new FileInputStream(fileName));
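As a minimal sketch of how this connection might be opened and closed safely, the snippet below wraps the file stream in a try-with-resources block. The class name OpenBufferedStream and the method firstByte are hypothetical names chosen for illustration only.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, not part of java.io.
class OpenBufferedStream {

    // Returns the first byte of the file as a value in 0..255,
    // or -1 if the file is empty.
    static int firstByte(String fileName) throws IOException {
        // try-with-resources closes the wrapper, which in turn closes
        // the underlying FileInputStream, even if read() throws.
        try (InputStream is = new BufferedInputStream(new FileInputStream(fileName))) {
            return is.read();
        }
    }
}
```

Closing the outer BufferedInputStream is enough here, because its close method also closes the stream it wraps.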
Reading the file
When you've connected to the file, you can start reading from it.
The InputStream class has two main methods for reading data:
int read()
int read(byte[] b, int off, int len)
The first method reads only one byte of data at a time, whereas the
second one reads up to len bytes of data
from the stream into an array of bytes.
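To make the difference between the two methods concrete, here is a small sketch that counts the bytes in a stream both ways. The class name ReadMethodsDemo, the method names, and the bufferSize parameter are assumptions made for this illustration. Both read methods return -1 once the end of the stream is reached.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class contrasting the two read methods.
class ReadMethodsDemo {

    // Uses int read(): one call per byte.
    static long countWithSingleByteRead(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    // Uses int read(byte[] b, int off, int len): one call per chunk.
    // bufferSize must be greater than zero, otherwise read returns 0 forever.
    static long countWithBulkRead(InputStream in, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long count = 0;
        int n;
        // read may return fewer than buf.length bytes, so always use n.
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            count += n;
        }
        return count;
    }
}
```

Both methods return the same count for the same data; the bulk version simply makes far fewer calls.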
Example 1
The second method is clearly faster for large amounts of data, because it
needs far fewer calls, so we'll use it as presented
in Listing A.
This listing has several interesting aspects. First, because the file
is big, we allocate a rather big buffer (20 MB) when calling
the read method. A bigger buffer means fewer read calls, so the data is
generally read faster. It is sometimes possible to
know in advance the number of bytes that can be read from an input
stream without blocking and to allocate a buffer of the same size.
This is accomplished by calling the available method.
Unfortunately, this method returns only an estimate, so it does not
always return correct results, and it can throw an
exception. This is the case when reading database data such as a
long or BLOB via a stream.
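One cautious way to use the available method is to treat its result purely as a hint for the buffer size and to fall back to a fixed size when the hint is missing or the call fails. The sketch below assumes that policy; the class name AvailableHint, the method chooseBufferSize, and the default size of 8192 bytes are all hypothetical, and the 20 MB cap simply mirrors the buffer size mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: picks a buffer size from available(), with a fallback.
class AvailableHint {

    // Assumed fallback size, used when available() gives no usable hint.
    static final int DEFAULT_BUFFER_SIZE = 8192;
    // Assumed upper bound, matching the 20 MB buffer discussed in the text.
    static final int MAX_BUFFER_SIZE = 20 * 1024 * 1024;

    static int chooseBufferSize(InputStream in) {
        try {
            // available() is only an estimate of the bytes readable without blocking.
            int estimate = in.available();
            if (estimate > 0) {
                return Math.min(estimate, MAX_BUFFER_SIZE);
            }
            return DEFAULT_BUFFER_SIZE;
        } catch (IOException e) {
            // Some streams cannot answer; fall back instead of failing.
            return DEFAULT_BUFFER_SIZE;
        }
    }
}
```

Whatever size is chosen, the read loop should still continue until read returns -1, since the estimate may be smaller than the real amount of data.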
Second, all arrays
are initialized outside of the while loop,
meaning the out, buf, and tmp arrays are reused, so fewer
objects have to be garbage-collected.
Third, when the buffer is filled with part of the data, it is
copied into
a growing array by calling the System.arraycopy
method.
Although this algorithm is quite efficient, every pass through the read loop
creates a temporary array and performs two array copies.
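Listing A itself is not reproduced here, so the following is only a minimal sketch of the approach as it is described in this section, not the original listing. The array names out, buf, and tmp follow the text; the class name GrowingArrayReader, the method readAll, and the bufferSize parameter are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the "growing array" approach described in the text.
class GrowingArrayReader {

    // Reads the whole stream into one array. bufferSize must be > 0.
    static byte[] readAll(InputStream in, int bufferSize) throws IOException {
        byte[] out = new byte[0];          // all data read so far
        byte[] buf = new byte[bufferSize]; // reused for every read call
        byte[] tmp;                        // temporary array for growing out
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            // Allocate a larger array, then copy the old data and the new chunk.
            tmp = new byte[out.length + n];
            System.arraycopy(out, 0, tmp, 0, out.length);
            System.arraycopy(buf, 0, tmp, out.length, n);
            out = tmp;
        }
        return out;
    }
}
```

A call might look like readAll(new BufferedInputStream(new FileInputStream(fileName)), 20 * 1024 * 1024). Note that each pass copies everything read so far, which is the cost the next example tries to avoid.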
Example 2
You can reduce
data copying and array allocation by modifying
the while loop as shown in Listing B.
Here, instead of storing intermediate data in a big array and
extending it every time data is retrieved,
it is maintained in a list, where each element
contains only a piece of data.
When the end of the stream is reached, the data can be taken
from the list and merged into a single array. This allows you to save one array
allocation and one copy operation. If you don't immediately need
all the data as a single array, you can return the list itself and thus save
some more time and resources. Reading data using this algorithm can be
significantly faster than using the first one (Listing A). The
difference in speed depends on the size of the buffer array that is passed
to the read method.
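Listing B is not reproduced here either, so the sketch below is only one possible reading of the list-based approach described above, not the original listing. The class name ChunkListReader, the methods readChunks and merge, and the bufferSize parameter are hypothetical. In this version each chunk is copied once into a right-sized array, and data that has already been read is never copied again until the final merge.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the list-of-chunks approach described in the text.
class ChunkListReader {

    // Reads the stream into a list of chunks. bufferSize must be > 0.
    static List<byte[]> readChunks(InputStream in, int bufferSize) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        byte[] buf = new byte[bufferSize]; // reused for every read call
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            if (n > 0) {
                // Keep only the n bytes actually read in this pass.
                chunks.add(Arrays.copyOf(buf, n));
            }
        }
        return chunks;
    }

    // Merges the chunks into a single array; assumes the total fits in an int.
    static byte[] merge(List<byte[]> chunks) {
        int total = 0;
        for (byte[] chunk : chunks) {
            total += chunk.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] chunk : chunks) {
            System.arraycopy(chunk, 0, out, pos, chunk.length);
            pos += chunk.length;
        }
        return out;
    }
}
```

If the caller can process the data piece by piece, readChunks can be used on its own and the merge step skipped entirely.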