Unstructured and complex documents user guide
You can consume the predictions of a published Unstructured and Complex Documents model version by building a workflow in UiPath Studio. The workflow covers the following stages:
- Package installation
- Taxonomy definition
- Document digitization
- Document classification
- Document extraction
- Document validation
Prerequisites
You must have a model published in an Unstructured and Complex Documents project.
When you start building your Studio workflow, you must decide what type of project you want to run: Windows or Cross-platform. Each project type requires different packages.
Regardless of the project type you choose, you can install the packages in one of two ways:
- Automatically - Use the Document Understanding Process template. For more details on how to search and install templates in Studio, check Project templates.
- Manually - For more details, check Installing packages. If you choose to manually install the packages, make sure you install the following versions or newer, based on the project type:
Windows
- UiPath.IntelligentOCR.Activities 6.22.0
- UiPath.System.Activities 24.10.6
Cross-platform
- UiPath.DocumentUnderstanding.Activities 2.12.0
- UiPath.System.Activities 24.10.6
- The IntelligentOCR package is compatible with Windows projects, not with cross-platform ones.
- You can build cross-platform workflows and use other templates in Studio Web.
The sections that follow contain the steps to apply if you choose not to use one of the Studio templates and start from scratch.
To build an IXP workflow for Windows projects, proceed as follows:
1. Installing the packages
Make sure you install the packages mentioned in the Prerequisites section.
2. Defining the taxonomy
- In Studio Desktop, create a basic process.
- When configuring your process, in the Compatibility field, select what type of workflow you want to build: Windows or Cross-platform. For more details, check About automation projects.
- Open Taxonomy Manager from the Design tab and set up your table fields as follows:
- Create a table field for every field group in your IXP project taxonomy.
- Add a column in the respective table field for each field defined in the field group.
Note: Taxonomy Manager:
- Supports creating tables and fields. When you create IXP Unstructured and Complex Documents workflows, it is recommended to create table fields instead of plain fields.
- Is available only when the IntelligentOCR package is installed. This means it is available only in Windows projects, not Cross-platform ones.
- Next, you need a location to read documents from. For example, in the project folder, create a new folder named documents, and add a few files.
- In the Sequence, add an Assign activity to specify where you want to read documents from. Configure the following fields:
- Save to - Create and add a variable of type System.String[]. In this example, the variable is called docs.
- Value to save - Add Directory.GetFiles("./documents").
- Add a Load Taxonomy activity to store the configured taxonomy in a variable, so you can reference it in the rest of the automation. Create and add a variable of type DocumentTaxonomy. In this example, the variable is called taxo.
Note: You need to map the variable to the output of the activity.
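The Assign expression above is a .NET call; the same enumeration logic can be sketched outside Studio. This is an illustrative Python sketch, not workflow code, and the folder name simply follows the example above:

```python
from pathlib import Path

def list_documents(folder: str = "./documents") -> list[str]:
    """Return the file paths the workflow iterates over,
    mirroring Directory.GetFiles("./documents")."""
    # Only top-level files, as Directory.GetFiles returns by default.
    return sorted(str(p) for p in Path(folder).iterdir() if p.is_file())
```

Each returned path plays the role of the doc item in the For Each activity described in the next step.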
3. Digitizing a document
- Add a For Each activity to go through each document. For the input, add the docs variable you previously created.
- Drag and drop the following activity within For Each:
- Digitize document - Allows you to read the documents you provided and obtain the Document Object Model (DOM) output. Configure the following fields:
- Document Path - Add the doc variable. You can find the variable in the Item name you configured in the For Each activity. In this example, the item name is doc and represents the file path of the document you want to digitize.
- Document Text - Create and add the text variable.
- Document Object Model (DOM) - Create and add the dom variable.
4. Classifying a document
- Drag and drop the following activity within For Each:
- Classify Document Scope - Allows you to classify the document being processed into one of the document types defined in your taxonomy.
For the inputs, add the following:
- Document Path - Add the doc variable.
- Document Text - Add the text variable.
- Document Object Model (DOM) - Add the dom variable.
- Taxonomy - Add the taxo variable.
For the outputs, add the following:
- Classification Results - Create and add a new variable ClassificationResults.
In the Classify Document Scope, add the Generative Classifier activity to classify documents using generative models. Configure the activity as follows:
- Select Manage Field Details.
- In the Document Type column, select a document type.
- In the Field Details column, add an optional value to define additional details about the document type. This can be a short description of the document type. The maximum number of characters allowed is 1000.
- Select Save.
5. Extracting details from a document
- Drag and drop the following activity within For Each:
- Data Extraction Scope - Allows you to configure extractor activities.
For the inputs, add the following:
- Document Path – Add the doc variable.
- Document Text – Add the text variable.
- Document Object Model (DOM) – Add the dom variable.
- Taxonomy – Add the taxo variable.
- Classification Result – Add the ClassificationResults variable.
For the output, add the following:
- Extraction Results – Create and add a new variable ExtractionResults.
- Within the Data Extraction Scope, add the Document Understanding Project Extractor activity to extract the document data.
When you add the project extractor activity within the scope, the Get Capabilities configuration window should open automatically.
- If the published project is hosted in a different organization or tenant, or is used in a hybrid setup, add the required details in Get Capabilities as follows:
- Create an external application in the Automation Cloud Admin page. For more details, check Adding an external application.
- Copy the App ID and App Secret; the App Secret serves as the password.
- Create an Orchestrator asset credential.
- In the Get Capabilities window, add the credentials, the application ID and secret.
- Configure the rest of the fields as described in Document Understanding Project Extractor.
Note: When you copy the tenant URL, make sure it includes the organization and tenant names. For example, https://staging.uipath.com/communicationsminingteam/IXPTesting, where communicationsminingteam is the organization and IXPTesting is the tenant.
- In the Document Understanding Project Extractor activity, add the asset path as input in the Runtime Credentials Asset property. The path should be in the form <OrchestratorFolderName>/<AssetName>.
- If the workflow runs in the same organization and tenant where the project was published, select the published project in the Document Understanding Project Extractor activity.
Note: The published model appears in the dropdown options if Studio is connected to the same organization or tenant that the model was published in. If the model does not appear, it may be published in a different organization or tenant. In this case, apply the instructions for cross-organization, cross-tenant, or hybrid projects described earlier in this step.
- Select Configure Extractors and use the wizard to map your taxonomy fields to the fields defined in the Unstructured and Complex Documents project.
Figure 1. The Studio Configure Extractors wizard
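The tenant URL format described in the note above (host, then organization, then tenant) can be checked programmatically. This is a minimal illustrative sketch in Python; the helper name is hypothetical and not part of any UiPath API:

```python
from urllib.parse import urlparse

def split_tenant_url(url: str) -> tuple[str, str]:
    """Extract the organization and tenant names from a tenant URL
    of the form https://<host>/<organization>/<tenant>."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("URL must include both organization and tenant names")
    return parts[0], parts[1]

# Example from the note above:
org, tenant = split_tenant_url(
    "https://staging.uipath.com/communicationsminingteam/IXPTesting")
# org == "communicationsminingteam", tenant == "IXPTesting"
```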
6. Validating a document
Optionally, you can configure decision criteria to determine whether human validation is required for the classification output.
This can be done using custom business rules or post-processing logic. You can also use custom decision criteria in a workflow to trigger validation, or you can set up field-level confidence thresholds. These decision criteria depend on the business process requirements and on how many false positives your use case can tolerate, that is, results that skip human validation but were extracted incorrectly.
Based on these rules, you can control whether a document is automatically validated or is routed to human validation. For more details, check the section Validation settings in Establishing the structure.
- Add the Present Validation Station activity to validate in Validation Station.
The output ExtractionResults of the Data Extraction Scope activity becomes the input of the Present Validation Station activity.
- For the inputs, add the following:
- Document Path – Add the doc variable.
- Document Text – Add the text variable.
- Document Object Model (DOM) – Add the dom variable.
- Taxonomy – Add the taxo variable.
- Automatic Extraction Results – Add the ExtractionResults variable.
- For the output, add the following:
- Validated Extraction Results – Create and add a new variable ValidatedExtractionResults.
- Alternatively, use Create Validation Artifacts or Create App Task when you have Apps that use the Content Validation component.
- Use Create Document Validation Action or Wait For Document Validation Action And Resume when you want to send the Validation Station task to Action Center.
Triggering human validation
Human validation of the classification output is triggered by applying decision logic after the classification step, before the workflow proceeds to extraction. The decision is not automatic by default; it is explicitly controlled through confidence thresholds and business rules defined in the workflow.
The following list shows how human validation can be triggered:
- Classification confidence evaluation
Each classification result includes confidence scores that indicate how certain the model is about the predicted document type. These scores are evaluated in the workflow to determine whether the classification is reliable.
- Confidence thresholds
You can define a minimum confidence threshold for classification. If the confidence score for the predicted document type falls below this threshold, the classification is considered uncertain, and the document is flagged for human validation.
- Business rules and conditional logic
In addition to confidence thresholds, you can apply custom business rules, such as:
- Specific document types that always require manual review.
- Mismatches between expected and predicted document types.
- Rules based on how the document will be processed later. For example, documents that must be verified before extraction or approval.
- Triggering the validation step
When the defined criteria are met, the workflow routes the document to a human validation step by invoking one of the validation mechanisms:
- Present Validation Station for in-robot validation.
- Create Validation Task for Action Center-based validation.
- Create Document Validation Artifacts for validation in Apps.
- Human confirmation or correction
During validation, the human reviewer confirms or corrects the document type. The validated classification result is then used by subsequent steps, such as data extraction, ensuring that downstream processing is based on an approved document type.
To conclude, human validation for classification is triggered by workflow-controlled rules, typically based on confidence scores and business logic, which determine when a classification result requires manual review before the process continues.
Interpreting Validation Station results from IXP models
When using workflows that leverage IXP Unstructured and Complex Documents models, the Validation Station serves as a crucial interface for reviewing, confirming, and refining the extracted data. Validation Station shows how the model interpreted the document, allowing you to assess extraction accuracy, identify uncertain areas, and make corrections where needed.
In Validation Station, the document type and its corresponding fields are displayed alongside the extracted values and confidence indicators.
Comparing Windows and Cross-platform project workflows
The following table shows a comparison between the IXP workflows for Windows and Cross-platform projects:

| | Windows | Cross-platform |
|---|---|---|
| Packages required | IntelligentOCR | Document Understanding |
| Defining the taxonomy | The Taxonomy Manager option allows you to define the list of fields that appear in the Validation Station or are included in the extraction results object. Note: Taxonomy Manager is available only when the IntelligentOCR package is installed. | The Document Understanding package automatically reads and displays the fields defined in the IXP model schema. These fields are not configured through the workflow. |