dataPreparation package
Submodules
dataPreparation.FileGenerator module
- dataPreparation.FileGenerator.BatchGenerator(fileList, batch_size, separator, validation)
Generator for batch files. Iterates over the list of files and yields batches and labels in a generator
- Parameters:
fileList – List of files to process
batch_size – Size of each batch in bytes (default 10)
separator – Separator between files (default ''), e.g.
validation – True if validation is enabled, False if not (default
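To make the generator pattern described for BatchGenerator concrete, here is a minimal sketch. It is not the package's implementation: the name `batch_generator` is hypothetical, `batch_size` is treated as a number of rows rather than bytes, the label is assumed to be the last field of each row, and the separator defaults to a comma because Python cannot split on an empty string.

```python
from itertools import islice


def batch_generator(file_list, batch_size=10, separator=","):
    """Yield (samples, labels) batches from each file in turn.

    Assumptions (not confirmed by the documentation): one sample per line,
    fields split on `separator`, and the label stored in the last field.
    """
    for path in file_list:
        with open(path, "r") as handle:
            while True:
                # Read at most batch_size lines without loading the whole file.
                raw = list(islice(handle, batch_size))
                if not raw:
                    break  # end of this file, move on to the next one
                rows = [ln.strip().split(separator) for ln in raw if ln.strip()]
                if rows:
                    yield [r[:-1] for r in rows], [r[-1] for r in rows]
```

Because the function is a generator, only one batch is held in memory at a time, which is the usual reason for feeding training loops this way.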
- dataPreparation.FileGenerator.FileNumLines(filename)
Count the number of newlines in a file. This is useful for checking the size of a file in order to avoid reading the whole file multiple times.
- Parameters:
filename – name of file to check. Must be a string
- Returns:
number of newlines in the file
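One common way to count newlines without loading the whole file into memory is to read it in fixed-size binary chunks. The sketch below illustrates that idea; the name `count_newlines` and the `chunk_size` parameter are hypothetical and are not part of the documented FileNumLines signature.

```python
def count_newlines(filename, chunk_size=1 << 20):
    """Count b"\\n" bytes in a file by reading it in chunks (1 MiB by default)."""
    count = 0
    with open(filename, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break  # reached end of file
            count += chunk.count(b"\n")
    return count
```

Note that a file whose last line has no trailing newline reports one fewer newline than it has lines.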
- dataPreparation.FileGenerator.FilesNumLines(fileList)
Get the number of lines in a list of files. This is useful for debugging and to determine how many lines are in each file
- Parameters:
fileList – list of files to count
- Returns:
Number of lines (int) in each file in the list, in the same order as the files appear
- dataPreparation.FileGenerator.ReadBatchFromFile(openFileDescriptor, batch_size, separator)
Reads a batch of data from a file. This is a generator function that will be called by the Read() function of the data source.
- Parameters:
openFileDescriptor – An open file descriptor to the file to read
batch_size – The number of samples to read
separator – The separator between samples and labels, e.g.
- Returns:
NumPy arrays containing xBatch and yBatch
- Return type:
xBatch and yBatch (NumPy arrays)
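The behaviour described for ReadBatchFromFile can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's code: the name `read_batch` is hypothetical, features are assumed to be numeric with the label in the last field, and plain Python lists stand in for the NumPy arrays the real function is documented to return.

```python
import io
from itertools import islice


def read_batch(open_file, batch_size, separator=","):
    """Read up to batch_size samples from an already open text file.

    Assumes one sample per line, with the label in the last field.
    """
    x_batch, y_batch = [], []
    for line in islice(open_file, batch_size):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        *features, label = line.split(separator)
        x_batch.append([float(value) for value in features])
        y_batch.append(label)
    return x_batch, y_batch


# Usage: an in-memory file stands in for a real open file here.
source = io.StringIO("1,2,a\n3,4,b\n5,6,c\n")
first_x, first_y = read_batch(source, 2)
```

Since the file object keeps its position between calls, calling `read_batch(source, 2)` again continues with the next unread line.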
- dataPreparation.FileGenerator.SpectrumPreprocessor(spectrum)
Preprocesses a spectrum to make it easier to visualize. Savitzky-Golay filters are applied to the spectrum before scaling.
- Parameters:
spectrum – The spectrum to preprocess. Must be a list of length 1.
- Returns:
A list of length 1 containing the preprocessed spectrum. It is assumed that the spectrum has been filtered
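The documentation does not state the window length, polynomial order, or scaling method used by SpectrumPreprocessor. As a rough illustration of "filter, then scale", the sketch below uses the standard 5-point quadratic Savitzky-Golay smoothing kernel, (-3, 12, 17, 12, -3) / 35, followed by min-max scaling to [0, 1]. All function names are hypothetical, and the input is assumed to be a flat list of intensities rather than the length-1 list the real function expects.

```python
def savgol_5pt(values):
    """Smooth with the 5-point quadratic Savitzky-Golay kernel.

    The two points at each edge are left unchanged (a simplification).
    """
    coeffs = (-3, 12, 17, 12, -3)
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(coeffs, window)) / 35.0
    return smoothed


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # flat input: avoid division by zero
    return [(v - low) / (high - low) for v in values]


def preprocess_spectrum(intensities):
    """Assumed pipeline: smooth first, then scale."""
    return min_max_scale(savgol_5pt(intensities))
```

A quadratic Savitzky-Golay kernel reproduces straight lines exactly, so a linear ramp passes through the smoothing step unchanged and is only rescaled.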
- dataPreparation.FileGenerator.SpectrumsCounter(fileList)
Counts the number of spectra in each file and returns the total number of lines. This is useful for debugging
- Parameters:
fileList – List of files to be analysed
- Returns:
Total number of spectra in each file (1 or 0 if there are no files in the
- dataPreparation.FileGenerator.ValidationGenerator(fileList, batch_size, separator)
Generator for validation. This is a generator that yields batches of data and labels from a list of files
- Parameters:
fileList – List of files to process
batch_size – Batch size in rows and columns (int)
separator – Separator between file names (str), e.g.