HDF
The HDF module contains classes that aid in parallel I/O of HDF files.
It includes HDF5 support and extends it to write VTKHDF files that can be opened in ParaView.
HDF5
Module that defines the hdf5 object to be used in pysemtools.
- class pysemtools.io.hdf.hdf5.HDF5File(comm: Comm, fname: str, mode: str, parallel: bool)
Class to write and read hdf5 files in parallel using h5py.
Open an hdf5 file based on inputs.
- Parameters:
- comm : MPI.Comm
MPI communicator.
- fname : str
Name of the hdf5 file to read or write.
- mode : str
Mode to open the file. Should be “r” for reading or “w” for writing.
- parallel : bool
Whether to use parallel I/O or not.
Methods
close([clean]) : Close the hdf5 file object
open(fname, mode, parallel) : Open an hdf5 file based on inputs.
read_dataset(dataset_name[, dtype, ...]) : Read a dataset from the hdf5 file object
read_slices(dataset_name[, dtype]) : Read the hyperslabs defined by the previously set slices from the file
set_active_group(group_name) : Set the active group to read or write data from.
set_read_slices_external(global_shape, slices) : Set the slices that should be read from the file based on external input.
set_read_slices_linear_lb(global_shape, ...) : Set the slices that should be read from the file.
set_write_slices(local_shape, distributed_axis) : Set the slices that should be written to the file.
write_dataset(dataset_name, data[, ...]) : Write a dataset to the hdf5 file object
write_slices(dataset_name, data[, shape_in_file]) : Write the hyperslab to the file.
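The parallel pattern behind these methods is that each rank reads or writes one hyperslab of a global dataset, split in a linear load-balanced way along one axis. The following is a hedged, standalone sketch of how such a split could be computed; the helper names are illustrative and not part of the pysemtools API:

```python
def linear_lb_slice(global_extent: int, rank: int, size: int) -> slice:
    """Linear load-balanced split of one axis across `size` ranks.

    The first (global_extent % size) ranks get one extra entry, so the
    per-rank extents differ by at most one.
    """
    base, rem = divmod(global_extent, size)
    local = base + (1 if rank < rem else 0)
    start = rank * base + min(rank, rem)
    return slice(start, start + local)

def hyperslab(global_shape, distributed_axis, rank, size):
    """Full hyperslab for one rank: every axis is taken whole except the
    distributed one, which gets the rank's load-balanced slice."""
    return tuple(
        linear_lb_slice(n, rank, size) if ax == distributed_axis else slice(0, n)
        for ax, n in enumerate(global_shape)
    )

# Example: a (10, 4) dataset distributed along axis 0 over 3 ranks.
slabs = [hyperslab((10, 4), 0, r, 3) for r in range(3)]
```

Note how the three row slices cover 0..10 without overlap, which is what makes a collective write of one dataset from many ranks well defined.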
- close(clean: bool = True)
Close the hdf5 file object
- Parameters:
- clean : bool
Whether to clean the attributes that are assigned when opening a file. This is useful if the file object will be reused to open another file after closing the current one. Default is True.
- open(fname: str, mode: str, parallel: bool)
Open an hdf5 file based on inputs.
This can be used to open a new file after closing the previous one.
- Parameters:
- fname : str
Name of the hdf5 file to read or write.
- mode : str
Mode to open the file. Should be “r” for reading or “w” for writing.
- parallel : bool
Whether to use parallel I/O or not. If True, the file will be opened using the MPI-IO driver. If False, the file will be opened using the default driver.
- read_dataset(dataset_name: str, dtype: ~numpy.dtype = <class 'numpy.float64'>, distributed_axis: int | None = None, slices: list | None = None, as_array_list_in_file: bool = False, ignore_metadata: bool = False)
Read a dataset from the hdf5 file object
- Parameters:
- dataset_name : str
Name of the dataset to read. Can include the group path, e.g. “/group1/group2/dataset”.
- dtype : np.dtype
Data type to read the dataset in. Default is np.double.
- distributed_axis : int
Axis along which the data is distributed in parallel. This is required for parallel reading. Default is None.
- slices : list
Optional. List of slices to read from the dataset, in case they are already known.
- as_array_list_in_file : bool
Optional. Default is False. Whether the data is stored as an array list in the file. This is useful if the data originally had a different shape but was flattened to 1d before writing. The shape attribute stored in the file is used for the partitioning, while keeping in mind that the data is stored as a 1d array so it can be read properly.
- ignore_metadata : bool
Optional. Default is False. Force the data to be read ignoring any shape metadata. This will just read the arrays as stored and will not try to assume an original shape.
- Returns:
- local_data : np.ndarray
Data read from the file. This will be a local array with the shape determined by the global shape of the dataset and the parallel distribution. If slices are provided, the shape will be determined by the slices.
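To make the `as_array_list_in_file` case concrete: when an array was flattened to 1d (C order) before writing, a contiguous block of axis-0 rows of the original shape maps to one contiguous range of the flat dataset. A hedged sketch of that index mapping, with an illustrative helper name that is not part of the pysemtools API:

```python
from math import prod

def flat_range_for_axis0_block(original_shape, row_slice):
    """If an array of `original_shape` was flattened (C order) to 1d before
    writing, a contiguous block of rows along axis 0 maps to one contiguous
    flat range; compute that range."""
    inner = prod(original_shape[1:])  # entries per axis-0 row
    return slice(row_slice.start * inner, row_slice.stop * inner)

# A (6, 3, 2) array stored as a flat 1d dataset of length 36:
# rows 2..4 correspond to flat entries 12..24.
r = flat_range_for_axis0_block((6, 3, 2), slice(2, 4))
```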
- read_slices(dataset_name: str, dtype: ~numpy.dtype = <class 'numpy.float64'>)
Read the hyperslabs defined by the previously set slices from the file
- Parameters:
- dataset_name : str
Name of the dataset to read. Can include the group path, e.g. “/group1/group2/dataset”.
- dtype : np.dtype
Data type to read the dataset in. Default is np.double.
- Returns:
- local_data : np.ndarray
Data read from the file. This will be a local array with the shape determined by the global shape of the dataset and the parallel distribution. If slices are provided, the shape will be determined by the slices.
- set_active_group(group_name: str)
Set the active group to read or write data from.
This is useful to avoid having to specify the group every time a dataset is read or written.
- Parameters:
- group_name : str
Name of the group to set as active. Can include the group path, e.g. “/group1/group2”. If the group does not exist, it will be created if the file is opened in write mode, otherwise an error will be raised.
- set_read_slices_external(global_shape: tuple, slices: list)
Set the slices that should be read from the file based on external input.
The slices need to be precomputed in this case.
- Parameters:
- global_shape : tuple
Shape of the global array to be read.
- slices : list
List of slices to read from the data set.
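As a hedged illustration of precomputing such a slices list: a rank might want interleaved blocks of rows instead of one contiguous load-balanced chunk. The sketch below builds a list of full hyperslabs for that layout; the helper name and the exact structure the library expects are assumptions to be checked against the pysemtools source:

```python
def strided_block_slices(global_shape, block, rank, size):
    """Give each rank every `size`-th block of `block` rows along axis 0,
    as a list of full hyperslabs (one tuple of slices per block)."""
    n0 = global_shape[0]
    trailing = tuple(slice(0, n) for n in global_shape[1:])
    return [
        (slice(s, min(s + block, n0)),) + trailing
        for s in range(rank * block, n0, size * block)
    ]

# (12, 5) dataset, blocks of 2 rows, rank 1 of 3 ranks:
slabs = strided_block_slices((12, 5), 2, 1, 3)
```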
- set_read_slices_linear_lb(global_shape: tuple, distributed_axis: int, explicit_strides: bool = False, shape_in_file: list | None = None)
Set the slices that should be read from the file.
Data is distributed in a linear load balanced way.
- Parameters:
- global_shape : tuple
Shape of the global array to be read. This is required to determine the local shape and the slices to read from the file.
- distributed_axis : int
Axis along which the data is distributed in parallel. This is required to determine the local shape and the slices to read from the file.
- explicit_strides : bool
Whether to use explicit strides to read the data. This is useful if the data is stored as 1D in the file but originally had a different shape.
- set_write_slices(local_shape: tuple, distributed_axis: int, extra_global_entries: list[int] | None = None)
Set the slices that should be written to the file.
The global shape is obtained from the local one.
- Parameters:
- local_shape : tuple
Shape of the local array to be written. This is required to determine the global shape and the slices to write to the file.
- distributed_axis : int
Axis along which the data is distributed in parallel.
- extra_global_entries : list[int], optional
List of extra entries to add to the global shape of the dataset. This is useful if the ranks are writing a certain amount of data but the global array should be bigger than what they collectively write. Default is None.
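Conceptually, obtaining the global shape from the local ones means summing the per-rank extents along the distributed axis (an MPI allreduce or allgather in real parallel code) while the other axes must agree across ranks. A hedged, MPI-free sketch of that reduction, simulated with a plain list of per-rank shapes:

```python
def global_shape_from_locals(local_shapes, distributed_axis):
    """Derive a global shape from per-rank local shapes: the extent along
    the distributed axis is the sum over ranks; other axes are taken from
    the first rank and assumed identical everywhere."""
    shape = list(local_shapes[0])
    shape[distributed_axis] = sum(s[distributed_axis] for s in local_shapes)
    return tuple(shape)

# Three ranks holding 4, 3 and 3 rows of a 5-column array:
gs = global_shape_from_locals([(4, 5), (3, 5), (3, 5)], 0)
```

How `extra_global_entries` enlarges the resulting global shape beyond what the ranks collectively write is left out here, since its exact placement is not specified above.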
- write_dataset(dataset_name: str, data: ndarray, distributed_axis: int | None = None, extra_global_entries: list[int] | None = None, shape_in_ram: tuple | None = None)
Write a dataset to the hdf5 file object
- Parameters:
- dataset_name : str
Name of the dataset to write. Can include the group path, e.g. “/group1/group2/dataset”.
- data : np.ndarray
Data to write to the file.
- distributed_axis : int
Axis along which the data is distributed in parallel. This is required for parallel writing. Default is None.
- extra_global_entries : list[int]
Optional. List of extra entries to add to the global shape of the dataset. This is useful if the ranks are writing a certain amount of data but the global array should be bigger than what they collectively write.
- shape_in_ram : tuple
Optional. Shape of the data in RAM. This is useful if the data is stored in a different shape than it originally had, for example, if it is stored in a 1d array but originally had a different shape. This shape is stored in the file in the attribute “shape” and can be used to reshape the data when reading it.
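The flatten-then-record-shape pattern that `shape_in_ram` supports can be sketched as follows. This is a hedged illustration using plain lists in place of numpy arrays and a dict in place of HDF5 attributes; only the idea (write a 1d buffer, keep the original shape as metadata, rebuild on read) is taken from the text above:

```python
from math import prod

original_shape = (2, 3)
data = [[1, 2, 3], [4, 5, 6]]

# What would be written to the file: the flattened buffer plus the shape
# recorded as an attribute.
flat = [v for row in data for v in row]
stored_attrs = {"shape": original_shape}

# On read, the attribute allows reshaping the flat buffer back:
rows, cols = stored_attrs["shape"]
restored = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
assert len(flat) == prod(original_shape)
```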
- write_slices(dataset_name: str, data: ndarray, shape_in_file: tuple | None = None)
Write the hyperslab to the file.
This performs the actual write operations.
- Parameters:
- dataset_name : str
Name of the dataset to write. Can include the group path, e.g. “/group1/group2/dataset”.
- data : np.ndarray
Data to write to the file. This should have the same shape as the local shape determined by the set_write_slices method.
- shape_in_file : tuple, optional
Shape of the data to be stored in the file. This is useful if the data is stored in a different shape in the file than it is in RAM.
- pysemtools.io.hdf.hdf5.find_merged_axes(global_shape, shape_in_file)
Helper function to determine which axes were merged between two shapes.
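A hedged re-implementation sketch of what such a helper could do (not the pysemtools function itself): walk both shapes and report which groups of axes in the global shape were merged into single axes of the file shape, by matching products of extents.

```python
def find_merged_axes_sketch(global_shape, shape_in_file):
    """For each axis of `shape_in_file`, collect the consecutive axes of
    `global_shape` whose extents multiply to that axis's extent."""
    merged, i = [], 0
    for target in shape_in_file:
        group, acc = [], 1
        while acc < target and i < len(global_shape):
            acc *= global_shape[i]
            group.append(i)
            i += 1
        if acc != target:
            raise ValueError("shapes are not compatible")
        merged.append(group)
    return merged

# A (4, 3, 2) array stored as (12, 2): axes 0 and 1 were merged.
groups = find_merged_axes_sketch((4, 3, 2), (12, 2))
```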
VTKHDF
Module that defines the vtkhdf object to be used in pysemtools.
- class pysemtools.io.hdf.vtkhdf.VTKHDFFile(comm: Comm, fname: str, mode: str, parallel: bool)
Class to write and read vtkhdf files in parallel using h5py.
Open an hdf5 file based on inputs.
- Parameters:
- comm : MPI.Comm
MPI communicator.
- fname : str
Name of the hdf5 file to read or write.
- mode : str
Mode to open the file. Should be “r” for reading or “w” for writing.
- parallel : bool
Whether to use parallel I/O or not.
Methods
close([clean]) : Close the hdf5 file object
link_to_existing_mesh(mesh_name) : Link to an existing mesh
open(fname, mode, parallel) : Open an hdf5 file based on inputs.
read_dataset(dataset_name[, dtype, ...]) : Read a dataset from the hdf5 file object
read_mesh_data([dtype, distributed_axis]) : Read the mesh data from the hdf5 file
read_point_data(dataset_name[, dtype, ...]) : Read point data from the hdf5 file
read_slices(dataset_name[, dtype]) : Read the hyperslabs defined by the previously set slices from the file
set_active_group(group_name) : Set the active group to read or write data from.
set_read_slices_external(global_shape, slices) : Set the slices that should be read from the file based on external input.
set_read_slices_linear_lb(global_shape, ...) : Set the slices that should be read from the file.
set_write_slices(local_shape, distributed_axis) : Set the slices that should be written to the file.
write_dataset(dataset_name, data[, ...]) : Write a dataset to the hdf5 file object
write_mesh_data(x, y, z[, distributed_axis]) : Write the mesh data to the hdf5 file
write_point_data(dataset_name, data[, ...]) : Write point data to the hdf5 file
write_slices(dataset_name, data[, shape_in_file]) : Write the hyperslab to the file.
- close(clean: bool = True)
Close the hdf5 file object
- Parameters:
- clean : bool
Whether to clean up the file after closing. This will delete the file from disk. Should only be used for testing.
- link_to_existing_mesh(mesh_name: str)
Link to an existing mesh
Avoid rewriting mesh data if not necessary. It can be quite costly in storage.
- Parameters:
- mesh_name : str
Name of the hdf5 file to link to.
- read_mesh_data(dtype: ~numpy.dtype = <class 'numpy.float64'>, distributed_axis: int = 0)
Read the mesh data from the hdf5 file
- Parameters:
- dtype : np.dtype
Data type to read the mesh data as. Should be a floating point type.
- distributed_axis : int
Axis along which the data is distributed in parallel. Should be 0 for now.
- Returns:
- x : np.ndarray
The x coordinates of the mesh points.
- y : np.ndarray
The y coordinates of the mesh points.
- z : np.ndarray
The z coordinates of the mesh points.
- read_point_data(dataset_name: str, dtype: ~numpy.dtype = <class 'numpy.float64'>, distributed_axis: int = 0)
Read point data from the hdf5 file
- Parameters:
- dataset_name : str
Name of the dataset to read. This should be the name of the dataset in the hdf5 file.
- dtype : np.dtype
Data type to read the dataset as. Should be a floating point type.
- distributed_axis : int
Axis along which the data is distributed in parallel. Should be 0 for now.
- Returns:
- np.ndarray
The point data read from the hdf5 file. Will have the same shape as the mesh points.
- write_mesh_data(x: ndarray, y: ndarray, z: ndarray, distributed_axis: int = 0)
Write the mesh data to the hdf5 file
- Parameters:
- x : np.ndarray
The x coordinates of the mesh points.
- y : np.ndarray
The y coordinates of the mesh points.
- z : np.ndarray
The z coordinates of the mesh points.
- distributed_axis : int
Axis along which the data is distributed in parallel. Should be 0 for now.
- write_point_data(dataset_name: str, data: ndarray, distributed_axis: int = 0)
Write point data to the hdf5 file
- Parameters:
- dataset_name : str
Name of the dataset to write. This will be used as the name of the dataset in the hdf5 file.
- data : np.ndarray
Data to write. Should have the same number of points as the mesh.
- distributed_axis : int
Axis along which the data is distributed in parallel. Should be 0 for now.