HDF

The HDF module contains classes to aid in Parallel IO of HDF files.

It includes HDF5 support and extends it to write VTKHDF files that can be viewed in ParaView.

HDF5

Module that defines the hdf5 object to be used in pysemtools

class pysemtools.io.hdf.hdf5.HDF5File(comm: Comm, fname: str, mode: str, parallel: bool)

Class to write and read hdf5 files in parallel using h5py.

Open an hdf5 file based on inputs.

Parameters:
comm : MPI.Comm

MPI communicator.

fname : str

Name of the hdf5 file to read or write.

mode : str

Mode to open the file. Should be “r” for reading or “w” for writing.

parallel : bool

Whether to use parallel I/O or not.

Methods

close([clean])

Close the hdf5 file object

open(fname, mode, parallel)

Open an hdf5 file based on inputs.

read_dataset(dataset_name[, dtype, ...])

Read a dataset from the hdf5 file object

read_slices(dataset_name[, dtype])

Read the hyperslab slices from the file

set_active_group(group_name)

Set the active group to read or write data from.

set_read_slices_external(global_shape, slices)

Set the slices that should be read from the file based on external input.

set_read_slices_linear_lb(global_shape, ...)

Set the slices that should be read from the file.

set_write_slices(local_shape, distributed_axis)

Set the slices that should be written to the file.

write_dataset(dataset_name, data[, ...])

Write a dataset to the hdf5 file object

write_slices(dataset_name, data[, shape_in_file])

Write the hyperslab to the file.

close(clean: bool = True)

Close the hdf5 file object

Parameters:
clean : bool

Whether to clean the attributes that are assigned when opening a file. This is useful if the file object will be reused to open another file after closing the current one. Default is True.

open(fname: str, mode: str, parallel: bool)

Open an hdf5 file based on inputs.

This can be used to open a new file after closing the previous one.

Parameters:
fname : str

Name of the hdf5 file to read or write.

mode : str

Mode to open the file. Should be “r” for reading or “w” for writing.

parallel : bool

Whether to use parallel I/O or not. If True, the file will be opened using the MPI-IO driver. If False, the file will be opened using the default driver.

read_dataset(dataset_name: str, dtype: numpy.dtype = numpy.float64, distributed_axis: int | None = None, slices: list | None = None, as_array_list_in_file: bool = False, ignore_metadata: bool = False)

Read a dataset from the hdf5 file object

Parameters:
dataset_name : str

Name of the dataset to read. Can include the group path, e.g. “/group1/group2/dataset”.

dtype : np.dtype

Data type to read the dataset in. Default is np.double.

distributed_axis : int

Axis along which the data is distributed in parallel. This is required for parallel reading. Default is None.

slices : list

Optional. List of slices to read from the dataset, in case they are already known.

as_array_list_in_file : bool

Optional. Default is False. Whether the data is stored as an array list in the file. This is useful if the data originally had a different shape but was flattened to 1D before writing. The shape attribute stored in the file is used for the partitioning, while keeping in mind that the data is stored as a 1D array so it is read properly.

ignore_metadata : bool

Optional. Default is False. Force reading the data while ignoring any shape metadata. This will just read the arrays as stored and will not try to assume an original shape.

Returns:
local_data : np.ndarray

Data read from the file. This will be a local array with the shape determined by the global shape of the dataset and the parallel distribution. If slices are provided, the shape will be determined by the slices.

read_slices(dataset_name: str, dtype: numpy.dtype = numpy.float64)

Read the hyperslab slices from the file

Parameters:
dataset_name : str

Name of the dataset to read. Can include the group path, e.g. “/group1/group2/dataset”.

dtype : np.dtype

Data type to read the dataset in. Default is np.double.

Returns:
local_data : np.ndarray

Data read from the file. This will be a local array with the shape determined by the global shape of the dataset and the parallel distribution. If slices are provided, the shape will be determined by the slices.

set_active_group(group_name: str)

Set the active group to read or write data from.

This is useful to avoid having to specify the group every time a dataset is read or written.

Parameters:
group_name : str

Name of the group to set as active. Can include the group path, e.g. “/group1/group2”. If the group does not exist, it will be created if the file is opened in write mode, otherwise an error will be raised.

set_read_slices_external(global_shape: tuple, slices: list)

Set the slices that should be read from the file based on external input.

The slices need to be precomputed in this case.

Parameters:
global_shape : tuple

Shape of the global array to be read.

slices : list

List of slices to read from the data set.

set_read_slices_linear_lb(global_shape: tuple, distributed_axis: int, explicit_strides: bool = False, shape_in_file: list | None = None)

Set the slices that should be read from the file.

Data is distributed in a linear load balanced way.

Parameters:
global_shape : tuple

Shape of the global array to be read. This is required to determine the local shape and the slices to read from the file.

distributed_axis : int

Axis along which the data is distributed in parallel. This is required to determine the local shape and the slices to read from the file.

explicit_strides : bool

Whether to use explicit strides to read the data. This is useful if the data is stored as 1D in the file but originally had a different shape.
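The exact partitioning is not spelled out here, but a linear load-balanced split typically means near-equal chunks along the distributed axis, with the first global_size % comm_size ranks taking one extra entry. A standalone sketch of that scheme, assumed (not confirmed) to match what set_read_slices_linear_lb computes internally (the function name linear_lb_slices is illustrative, not part of pysemtools):

```python
# Illustrative linear load-balanced partitioning: split the
# distributed axis into near-equal chunks, giving the first
# (n % size) ranks one extra entry each.
def linear_lb_slices(global_shape, distributed_axis, rank, size):
    """Return the slices of the global array owned by `rank`."""
    n = global_shape[distributed_axis]
    base, rem = divmod(n, size)
    local_n = base + (1 if rank < rem else 0)
    start = rank * base + min(rank, rem)
    slices = [slice(None)] * len(global_shape)
    slices[distributed_axis] = slice(start, start + local_n)
    return tuple(slices)

# A (10, 4) global array split along axis 0 over 3 ranks:
chunks = [linear_lb_slices((10, 4), 0, r, 3) for r in range(3)]
print(chunks)  # axis-0 slices: (0, 4), (4, 7), (7, 10)
```

The chunks are contiguous and cover the whole axis, so every rank can read its hyperslab without overlap.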

set_write_slices(local_shape: tuple, distributed_axis: int, extra_global_entries: list[int] | None = None)

Set the slices that should be written to the file.

Obtain the global shape from the local one.

Parameters:
local_shape : tuple

Shape of the local array to be written. This is required to determine the global shape and the slices to write to the file.

distributed_axis : int

Axis along which the data is distributed in parallel.

extra_global_entries : list[int], optional

List of extra entries to add to the global shape of the dataset. This is useful if the ranks are writing a certain amount of data but the global array should be bigger than what they collectively write. Default is None.

write_dataset(dataset_name: str, data: ndarray, distributed_axis: int | None = None, extra_global_entries: list[int] | None = None, shape_in_ram: tuple | None = None)

Write a dataset to the hdf5 file object

Parameters:
dataset_name : str

Name of the dataset to write. Can include the group path, e.g. “/group1/group2/dataset”.

data : np.ndarray

Data to write to the file.

distributed_axis : int

Axis along which the data is distributed in parallel. This is required for parallel writing. Default is None.

extra_global_entries : list[int]

Optional. List of extra entries to add to the global shape of the dataset. This is useful if the ranks are writing a certain amount of data but the global array should be bigger than what they collectively write.

shape_in_ram : tuple

Optional. Shape of the data in RAM. This is useful if the data is stored in a different shape than it originally had, for example, if it is stored in a 1D array but originally had a different shape. This will be the shape that is stored in the file in the attribute “shape” and can be used to reshape the data when reading it.

write_slices(dataset_name: str, data: ndarray, shape_in_file: tuple | None = None)

Write the hyperslab to the file.

Perform the write operations

Parameters:
dataset_name : str

Name of the dataset to write. Can include the group path, e.g. “/group1/group2/dataset”.

data : np.ndarray

Data to write to the file. This should have the same shape as the local shape determined by the set_write_slices method.

shape_in_file : tuple, optional

Shape of the data to be stored in the file. This is useful if the data is stored in a different shape in the file than it is in RAM.

pysemtools.io.hdf.hdf5.find_merged_axes(global_shape, shape_in_file)

Helper function to determine which axes were merged between two shapes

VTKHDF

Module that defines the vtkhdf object to be used in pysemtools

class pysemtools.io.hdf.vtkhdf.VTKHDFFile(comm: Comm, fname: str, mode: str, parallel: bool)

Class to write and read vtkhdf files in parallel using h5py.

Open an hdf5 file based on inputs.

Parameters:
comm : MPI.Comm

MPI communicator.

fname : str

Name of the hdf5 file to read or write.

mode : str

Mode to open the file. Should be “r” for reading or “w” for writing.

parallel : bool

Whether to use parallel I/O or not.

Methods

close([clean])

Close the hdf5 file object

link_to_existing_mesh(mesh_name)

Link to an existing mesh

open(fname, mode, parallel)

Open an hdf5 file based on inputs.

read_dataset(dataset_name[, dtype, ...])

Read a dataset from the hdf5 file object

read_mesh_data([dtype, distributed_axis])

Read the mesh data from the hdf5 file

read_point_data(dataset_name[, dtype, ...])

Read point data from the hdf5 file

read_slices(dataset_name[, dtype])

Read the hyperslab slices from the file

set_active_group(group_name)

Set the active group to read or write data from.

set_read_slices_external(global_shape, slices)

Set the slices that should be read from the file based on external input.

set_read_slices_linear_lb(global_shape, ...)

Set the slices that should be read from the file.

set_write_slices(local_shape, distributed_axis)

Set the slices that should be written to the file.

write_dataset(dataset_name, data[, ...])

Write a dataset to the hdf5 file object

write_mesh_data(x, y, z[, distributed_axis])

Write the mesh data to the hdf5 file

write_point_data(dataset_name, data[, ...])

Write point data to the hdf5 file

write_slices(dataset_name, data[, shape_in_file])

Write the hyperslab to the file.

close(clean: bool = True)

Close the hdf5 file object

Parameters:
clean : bool

Whether to clean up the file after closing. This will delete the file from disk. Should only be used for testing.

link_to_existing_mesh(mesh_name: str)

Link to an existing mesh

Avoid rewriting mesh data if not necessary. It can be quite costly in storage.

Parameters:
mesh_name : str

Name of the hdf5 file to link to.

read_mesh_data(dtype: numpy.dtype = numpy.float64, distributed_axis: int = 0)

Read the mesh data from the hdf5 file

Parameters:
dtype : np.dtype

Data type to read the mesh data as. Should be a floating point type.

distributed_axis : int

Axis along which the data is distributed in parallel. Should be 0 for now.

Returns:
x : np.ndarray

The x coordinates of the mesh points.

y : np.ndarray

The y coordinates of the mesh points.

z : np.ndarray

The z coordinates of the mesh points.

read_point_data(dataset_name: str, dtype: numpy.dtype = numpy.float64, distributed_axis: int = 0)

Read point data from the hdf5 file

Parameters:
dataset_name : str

Name of the dataset to read. This should be the name of the dataset in the hdf5 file.

dtype : np.dtype

Data type to read the dataset as. Should be a floating point type.

distributed_axis : int

Axis along which the data is distributed in parallel. Should be 0 for now.

Returns:
np.ndarray

The point data read from the hdf5 file. Will have the same shape as the mesh points.

write_mesh_data(x: ndarray, y: ndarray, z: ndarray, distributed_axis: int = 0)

Write the mesh data to the hdf5 file

Parameters:
x : np.ndarray

The x coordinates of the mesh points.

y : np.ndarray

The y coordinates of the mesh points.

z : np.ndarray

The z coordinates of the mesh points.

distributed_axis : int

Axis along which the data is distributed in parallel. Should be 0 for now.

write_point_data(dataset_name: str, data: ndarray, distributed_axis: int = 0)

Write point data to the hdf5 file

Parameters:
dataset_name : str

Name of the dataset to write. This will be used as the name of the dataset in the hdf5 file.

data : np.ndarray

Data to write. Should have the same number of points as the mesh.

distributed_axis : int

Axis along which the data is distributed in parallel. Should be 0 for now.