HDF
The HDF module contains classes that aid in parallel I/O of HDF files.
It includes HDF5 support and extends it to write VTKHDF files that can be opened in ParaView.
HDF5
Module that defines the hdf5 object to be used in pysemtools.
- class pysemtools.io.hdf.hdf5.HDF5File(comm: Comm, fname: str, mode: str, parallel: bool)
Class to write and read hdf5 files in parallel using h5py.
Open an hdf5 file based on inputs.
- Parameters:
- comm : MPI.Comm
MPI communicator.
- fname : str
Name of the hdf5 file to read or write.
- mode : str
Mode to open the file. Should be “r” for reading or “w” for writing.
- parallel : bool
Whether to use parallel I/O or not.
Methods
close([clean]) : Close the hdf5 file object
open(fname, mode, parallel) : Open an hdf5 file based on inputs.
read_dataset(dataset_name[, dtype, ...]) : Read a dataset from the hdf5 file object
read_slices(dataset_name[, dtype]) : Read the hyperslabs defined by the previously set slices from the file
set_active_group(group_name) : Set the active group to read or write data from.
set_read_slices_external(global_shape, slices) : Set the slices that should be read from the file based on external input.
set_read_slices_linear_lb(global_shape, ...) : Set the slices that should be read from the file.
set_write_slices(local_shape, distributed_axis) : Set the slices that should be written to the file.
write_dataset(dataset_name, data[, ...]) : Write a dataset to the hdf5 file object
write_slices(dataset_name, data[, shape_in_file]) : Write the hyperslab to the file.
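The parallel pattern behind these methods is that each rank reads or writes one hyperslab of a global dataset, split in a linear load-balanced way along one axis. The following is a hedged, standalone sketch of how such a split could be computed; the helper names are illustrative and not part of the pysemtools API:

```python
def linear_lb_slice(global_extent: int, rank: int, size: int) -> slice:
    """Linear load-balanced split of one axis across `size` ranks.

    The first (global_extent % size) ranks get one extra entry, so the
    per-rank extents differ by at most one.
    """
    base, rem = divmod(global_extent, size)
    local = base + (1 if rank < rem else 0)
    start = rank * base + min(rank, rem)
    return slice(start, start + local)

def hyperslab(global_shape, distributed_axis, rank, size):
    """Full hyperslab for one rank: every axis is taken whole except the
    distributed one, which gets the rank's load-balanced slice."""
    return tuple(
        linear_lb_slice(n, rank, size) if ax == distributed_axis else slice(0, n)
        for ax, n in enumerate(global_shape)
    )

# Example: a (10, 4) dataset distributed along axis 0 over 3 ranks.
slabs = [hyperslab((10, 4), 0, r, 3) for r in range(3)]
```

Note how the three row slices cover 0..10 without overlap, which is what makes a collective write of one dataset from many ranks well defined.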
- close(clean: bool = True)
Close the hdf5 file object
- Parameters:
- clean : bool
Whether to clean the attributes that are assigned when opening a file. This is useful if the file object will be reused to open another file after closing the current one. Default is True.
- open(fname: str, mode: str, parallel: bool)
Open an hdf5 file based on inputs.
This can be used to open a new file after closing the previous one.
- Parameters:
- fname : str
Name of the hdf5 file to read or write.
- mode : str
Mode to open the file. Should be “r” for reading or “w” for writing.
- parallel : bool
Whether to use parallel I/O or not. If True, the file will be opened using the MPI-IO driver. If False, the file will be opened using the default driver.
- read_dataset(dataset_name: str, dtype: ~numpy.dtype = <class 'numpy.float64'>, distributed_axis: int | None = None, slices: list | None = None, as_array_list_in_file: bool = False, ignore_metadata: bool = False)
Read a dataset from the hdf5 file object
- Parameters:
- dataset_name : str
Name of the dataset to read. Can include the group path, e.g. “/group1/group2/dataset”.
- dtype : np.dtype
Data type to read the dataset in. Default is np.double.
- distributed_axis : int
Axis along which the data is distributed in parallel. This is required for parallel reading. Default is None.
- slices : list
Optional. List of slices to read from the dataset, in case they are already known.
- as_array_list_in_file : bool
Optional. Default is False. Whether the data is stored as an array list in the file. This is useful if the data originally had a different shape but was flattened to 1d before writing. The shape attribute stored in the file is used for the partitioning, while keeping in mind that the data is stored as a 1d array so it can be read properly.
- ignore_metadata : bool
Optional. Default is False. Force the data to be read ignoring any shape metadata. This will just read the arrays as stored and will not try to assume an original shape.
- Returns:
- local_data : np.ndarray
Data read from the file. This will be a local array with the shape determined by the global shape of the dataset and the parallel distribution. If slices are provided, the shape will be determined by the slices.
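To make the `as_array_list_in_file` case concrete: when an array was flattened to 1d (C order) before writing, a contiguous block of axis-0 rows of the original shape maps to one contiguous range of the flat dataset. A hedged sketch of that index mapping, with an illustrative helper name that is not part of the pysemtools API:

```python
from math import prod

def flat_range_for_axis0_block(original_shape, row_slice):
    """If an array of `original_shape` was flattened (C order) to 1d before
    writing, a contiguous block of rows along axis 0 maps to one contiguous
    flat range; compute that range."""
    inner = prod(original_shape[1:])  # entries per axis-0 row
    return slice(row_slice.start * inner, row_slice.stop * inner)

# A (6, 3, 2) array stored as a flat 1d dataset of length 36:
# rows 2..4 correspond to flat entries 12..24.
r = flat_range_for_axis0_block((6, 3, 2), slice(2, 4))
```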
- read_slices(dataset_name: str, dtype: ~numpy.dtype = <class 'numpy.float64'>)
Read the hyperslabs defined by the previously set slices from the file
- Parameters:
- dataset_name : str
Name of the dataset to read. Can include the group path, e.g. “/group1/group2/dataset”.
- dtype : np.dtype
Data type to read the dataset in. Default is np.double.
- Returns:
- local_data : np.ndarray
Data read from the file. This will be a local array with the shape determined by the global shape of the dataset and the parallel distribution. If slices are provided, the shape will be determined by the slices.
- set_active_group(group_name: str)
Set the active group to read or write data from.
This is useful to avoid having to specify the group every time a dataset is read or written.
- Parameters:
- group_name : str
Name of the group to set as active. Can include the group path, e.g. “/group1/group2”. If the group does not exist, it will be created if the file is opened in write mode, otherwise an error will be raised.
- set_read_slices_external(global_shape: tuple, slices: list)
Set the slices that should be read from the file based on external input.
The slices need to be precomputed in this case.
- Parameters:
- global_shape : tuple
Shape of the global array to be read.
- slices : list
List of slices to read from the data set.
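As a hedged illustration of precomputing such a slices list: a rank might want interleaved blocks of rows instead of one contiguous load-balanced chunk. The sketch below builds a list of full hyperslabs for that layout; the helper name and the exact structure the library expects are assumptions to be checked against the pysemtools source:

```python
def strided_block_slices(global_shape, block, rank, size):
    """Give each rank every `size`-th block of `block` rows along axis 0,
    as a list of full hyperslabs (one tuple of slices per block)."""
    n0 = global_shape[0]
    trailing = tuple(slice(0, n) for n in global_shape[1:])
    return [
        (slice(s, min(s + block, n0)),) + trailing
        for s in range(rank * block, n0, size * block)
    ]

# (12, 5) dataset, blocks of 2 rows, rank 1 of 3 ranks:
slabs = strided_block_slices((12, 5), 2, 1, 3)
```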
- set_read_slices_linear_lb(global_shape: tuple, distributed_axis: int, explicit_strides: bool = False, shape_in_file: list | None = None)
Set the slices that should be read from the file.
Data is distributed in a linear load balanced way.
- Parameters:
- global_shape : tuple
Shape of the global array to be read. This is required to determine the local shape and the slices to read from the file.
- distributed_axis : int
Axis along which the data is distributed in parallel. This is required to determine the local shape and the slices to read from the file.
- explicit_strides : bool
Whether to use explicit strides to read the data. This is useful if the data is stored as 1D in the file but originally had a different shape.
- set_write_slices(local_shape: tuple, distributed_axis: int, extra_global_entries: list[int] | None = None)
Set the slices that should be written to the file.
The global shape is obtained from the local one.
- Parameters:
- local_shape : tuple
Shape of the local array to be written. This is required to determine the global shape and the slices to write to the file.
- distributed_axis : int
Axis along which the data is distributed in parallel.
- extra_global_entries : list[int], optional
List of extra entries to add to the global shape of the dataset. This is useful if the ranks are writing a certain amount of data but the global array should be bigger than what they collectively write. Default is None.
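Conceptually, obtaining the global shape from the local ones means summing the per-rank extents along the distributed axis (an MPI allreduce or allgather in real parallel code) while the other axes must agree across ranks. A hedged, MPI-free sketch of that reduction, simulated with a plain list of per-rank shapes:

```python
def global_shape_from_locals(local_shapes, distributed_axis):
    """Derive a global shape from per-rank local shapes: the extent along
    the distributed axis is the sum over ranks; other axes are taken from
    the first rank and assumed identical everywhere."""
    shape = list(local_shapes[0])
    shape[distributed_axis] = sum(s[distributed_axis] for s in local_shapes)
    return tuple(shape)

# Three ranks holding 4, 3 and 3 rows of a 5-column array:
gs = global_shape_from_locals([(4, 5), (3, 5), (3, 5)], 0)
```

How `extra_global_entries` enlarges the resulting global shape beyond what the ranks collectively write is left out here, since its exact placement is not specified above.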
- write_dataset(dataset_name: str, data: ndarray, distributed_axis: int | None = None, extra_global_entries: list[int] | None = None, shape_in_ram: tuple | None = None)
Write a dataset to the hdf5 file object
- Parameters:
- dataset_name : str
Name of the dataset to write. Can include the group path, e.g. “/group1/group2/dataset”.
- data : np.ndarray
Data to write to the file.
- distributed_axis : int
Axis along which the data is distributed in parallel. This is required for parallel writing. Default is None.
- extra_global_entries : list[int]
Optional. List of extra entries to add to the global shape of the dataset. This is useful if the ranks are writing a certain amount of data but the global array should be bigger than what they collectively write.
- shape_in_ram : tuple
Optional. Shape of the data in RAM. This is useful if the data is stored in a different shape than it originally had, for example, if it is stored in a 1d array but originally had a different shape. This shape is stored in the file in the attribute “shape” and can be used to reshape the data when reading it.
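The flatten-then-record-shape pattern that `shape_in_ram` supports can be sketched as follows. This is a hedged illustration using plain lists in place of numpy arrays and a dict in place of HDF5 attributes; only the idea (write a 1d buffer, keep the original shape as metadata, rebuild on read) is taken from the text above:

```python
from math import prod

original_shape = (2, 3)
data = [[1, 2, 3], [4, 5, 6]]

# What would be written to the file: the flattened buffer plus the shape
# recorded as an attribute.
flat = [v for row in data for v in row]
stored_attrs = {"shape": original_shape}

# On read, the attribute allows reshaping the flat buffer back:
rows, cols = stored_attrs["shape"]
restored = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
assert len(flat) == prod(original_shape)
```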
- write_slices(dataset_name: str, data: ndarray, shape_in_file: tuple | None = None)
Write the hyperslab to the file.
This performs the actual write operations.
- Parameters:
- dataset_name : str
Name of the dataset to write. Can include the group path, e.g. “/group1/group2/dataset”.
- data : np.ndarray
Data to write to the file. This should have the same shape as the local shape determined by the set_write_slices method.
- shape_in_file : tuple, optional
Shape of the data to be stored in the file. This is useful if the data is stored in a different shape in the file than it is in RAM.
- pysemtools.io.hdf.hdf5.find_merged_axes(global_shape, shape_in_file)
Helper function to determine which axes were merged between two shapes.
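A hedged re-implementation sketch of what such a helper could do (not the pysemtools function itself): walk both shapes and report which groups of axes in the global shape were merged into single axes of the file shape, by matching products of extents.

```python
def find_merged_axes_sketch(global_shape, shape_in_file):
    """For each axis of `shape_in_file`, collect the consecutive axes of
    `global_shape` whose extents multiply to that axis's extent."""
    merged, i = [], 0
    for target in shape_in_file:
        group, acc = [], 1
        while acc < target and i < len(global_shape):
            acc *= global_shape[i]
            group.append(i)
            i += 1
        if acc != target:
            raise ValueError("shapes are not compatible")
        merged.append(group)
    return merged

# A (4, 3, 2) array stored as (12, 2): axes 0 and 1 were merged.
groups = find_merged_axes_sketch((4, 3, 2), (12, 2))
```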
VTKHDF
Module that defines the vtkhdf object to be used in pysemtools.
- class pysemtools.io.hdf.vtkhdf.VTKHDFFile(comm: Comm, fname: str, mode: str, parallel: bool)
Class to write and read vtkhdf files in parallel using h5py.
Open an hdf5 file based on inputs.
- Parameters:
- comm : MPI.Comm
MPI communicator.
- fname : str
Name of the hdf5 file to read or write.
- mode : str
Mode to open the file. Should be “r” for reading or “w” for writing.
- parallel : bool
Whether to use parallel I/O or not.
Methods
close([clean]) : Close the hdf5 file object
link_to_existing_mesh(mesh_name) : Link to an existing mesh
open(fname, mode, parallel) : Open an hdf5 file based on inputs.
read_dataset(dataset_name[, dtype, ...]) : Read a dataset from the hdf5 file object
read_mesh_data([dtype, distributed_axis]) : Read the mesh data from the hdf5 file
read_point_data(dataset_name[, dtype, ...]) : Read point data from the hdf5 file
read_slices(dataset_name[, dtype]) : Read the hyperslabs defined by the previously set slices from the file
set_active_group(group_name) : Set the active group to read or write data from.
set_read_slices_external(global_shape, slices) : Set the slices that should be read from the file based on external input.
set_read_slices_linear_lb(global_shape, ...) : Set the slices that should be read from the file.
set_write_slices(local_shape, distributed_axis) : Set the slices that should be written to the file.
write_dataset(dataset_name, data[, ...]) : Write a dataset to the hdf5 file object
write_mesh_data(x, y, z[, distributed_axis]) : Write the mesh data to the hdf5 file
write_point_data(dataset_name, data[, ...]) : Write point data to the hdf5 file
write_slices(dataset_name, data[, shape_in_file]) : Write the hyperslab to the file.
- close(clean: bool = True)
Close the hdf5 file object
- Parameters:
- clean : bool
Whether to clean up the file after closing. This will delete the file from disk. Should only be used for testing.
- link_to_existing_mesh(mesh_name: str)
Link to an existing mesh
Avoid rewriting mesh data if not necessary. It can be quite costly in storage.
- Parameters:
- mesh_name : str
Name of the hdf5 file to link to.
- read_mesh_data(dtype: ~numpy.dtype = <class 'numpy.float64'>, distributed_axis: int = 0)
Read the mesh data from the hdf5 file
- Parameters:
- dtype : np.dtype
Data type to read the mesh data as. Should be a floating point type.
- distributed_axis : int
Axis along which the data is distributed in parallel. Should be 0 for now.
- Returns:
- x : np.ndarray
The x coordinates of the mesh points.
- y : np.ndarray
The y coordinates of the mesh points.
- z : np.ndarray
The z coordinates of the mesh points.
- read_point_data(dataset_name: str, dtype: ~numpy.dtype = <class 'numpy.float64'>, distributed_axis: int = 0)
Read point data from the hdf5 file
- Parameters:
- dataset_name : str
Name of the dataset to read. This should be the name of the dataset in the hdf5 file.
- dtype : np.dtype
Data type to read the dataset as. Should be a floating point type.
- distributed_axis : int
Axis along which the data is distributed in parallel. Should be 0 for now.
- Returns:
- np.ndarray
The point data read from the hdf5 file. Will have the same shape as the mesh points.
- write_mesh_data(x: ndarray, y: ndarray, z: ndarray, distributed_axis: int = 0)
Write the mesh data to the hdf5 file
- Parameters:
- x : np.ndarray
The x coordinates of the mesh points.
- y : np.ndarray
The y coordinates of the mesh points.
- z : np.ndarray
The z coordinates of the mesh points.
- distributed_axis : int
Axis along which the data is distributed in parallel. Should be 0 for now.
- write_point_data(dataset_name: str, data: ndarray, distributed_axis: int = 0)
Write point data to the hdf5 file
- Parameters:
- dataset_name : str
Name of the dataset to write. This will be used as the name of the dataset in the hdf5 file.
- data : np.ndarray
Data to write. Should have the same number of points as the mesh.
- distributed_axis : int
Axis along which the data is distributed in parallel. Should be 0 for now.