Sort Plug-In, Faster Transforms, and Safe Test Data
Challenges:
Large data volumes (i.e., more than one million rows) can be slow to transform, even after consulting and tuning. The usual bottlenecks are large sorts, joins, aggregations, loads, and sometimes unloads. Parallelizing or optimizing in other layers or tools can be unwieldy, if not expensive, and can degrade performance for other users.
Solutions:
1) CoSort Sort Stage Plug-In for DataStage
Speed up sorting directly within DataStage Server Edition with CoSort's unique
Sort Stage Plug-In for DataStage. This can improve sort performance by up
to 10X with no interface changes. Subsequent join, aggregation, and load
runtimes should also benefit.
2) Fast Transformations alongside DataStage
By running the CoSort product's Sort Control Language (SortCL) program alongside IBM
WebSphere DataStage Server or Enterprise Edition in the file system, you
can perform fast sorts, joins, and aggregations -- all in the same job
script and I/O pass. While running large data transformation tasks in
parallel, you can also specify file-format and data-type conversions,
field-level encryption and other data privacy functions, custom reports,
and pre-sorted load files.
If you still wish to use the aggregation stage in DataStage, CoSort can
help you improve its performance. Add a sequential file stage prior to
the aggregation stage, and run a SortCL script to externally pre-sort
the file on break keys. Then, define the sorted fields in the aggregation
stage.
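To illustrate why pre-sorting on break keys helps a downstream aggregation, here is a minimal Python sketch. It is not SortCL or DataStage code; the sample data, the field names (region, amount), and the helper functions are hypothetical, and the in-memory sorted() call merely stands in for the external pre-sort step. Once rows arrive ordered on the break key, each group can be totaled in a single pass without holding every group in memory.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical sample input: rows arrive in arbitrary order.
RAW = """region,amount
west,10
east,5
west,7
east,1
north,3
"""

def presort(rows, key):
    """Stand-in for the external pre-sort on the break key."""
    return sorted(rows, key=itemgetter(key))

def aggregate_sorted(rows, key, value):
    """Sum `value` per `key` in one pass.

    Correct only when `rows` is already sorted on `key`, because
    groupby() starts a new group whenever the key value changes.
    """
    for break_value, group in groupby(rows, key=itemgetter(key)):
        yield break_value, sum(int(row[value]) for row in group)

rows = list(csv.DictReader(io.StringIO(RAW)))
totals = list(aggregate_sorted(presort(rows, "region"), "region", "amount"))
print(totals)  # [('east', 6), ('north', 3), ('west', 17)]
```

If the presort() call were skipped, the same break value could surface in several separate groups, which is why the aggregation stage needs to be told which fields are already sorted.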
3) Safe Test Data for DataStage
IRI's RowGen package can generate safe, realistic test data against CoSort
metadata, .dsx-defined files, and your RDB data models. RowGen users can
create intelligent test data from random computation and/or set-file selection,
and they can further format that data with the same data manipulation
and formatting capabilities within CoSort.
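The two generation approaches named above, random computation and set-file selection, can be sketched in a few lines of Python. This is only an illustration of the idea, not RowGen syntax or behavior; the field names, the value range, and the in-memory list standing in for a set file are all assumptions.

```python
import random

# Hypothetical stand-in for a set file: one allowed value per line.
SET_FILE_LINES = ["Alice", "Bob", "Carol", "Dave"]

def make_rows(count, seed=0):
    """Build `count` synthetic rows containing no real data."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    rows = []
    for row_id in range(1, count + 1):
        rows.append({
            "id": row_id,                                # sequential key
            "name": rng.choice(SET_FILE_LINES),          # set-file selection
            "balance": round(rng.uniform(0, 10000), 2),  # random computation
        })
    return rows

test_rows = make_rows(5)
for row in test_rows:
    print(row)
```

Seeding the generator keeps the test data repeatable between runs, which makes job results easier to compare.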
To facilitate CoSort operations alongside DataStage, as well as the creation
of realistic test data for DataStage, Meta Integration Technology's Model
Bridge (MIMB) software can create SortCL and RowGen data definition files
from the flat-file layouts you have already defined in .dsx format. This
saves you from having to manually re-write all your input and output file
field layouts, making it easier to run these tools with DataStage!
See also:
FAQ > DataStage
Solutions > Data Transformation
Solutions > Field Protection
Solutions > Business Intelligence
Products > CoSort > SortCL
Products > CoSort > SortCL Metadata
Products > RowGen (Test Data)
Meta Integration Model Bridge