Windows AI – Click to Do with an Uno Platform Application as an IActionProvider

In my previous post on Click to Do we created an application that is activated via a URI triggered by Click to Do. In this post we’ll register with Click to Do by implementing the IActionProvider interface and registering the application as a COM server. As with the previous post, this only works with the Windows App SDK (WinAppSdk) target of the Uno Platform application.

Similar to how we implemented Click to Do using URI protocol activation, there are three steps to set up the Uno Platform application (in fact, any Windows App SDK based application) to integrate with Click to Do using the IActionProvider interface:

  1. Include an action registration JSON file as Content in your application.
  2. Register for the com.microsoft.windows.ai.actions extension in the Package.appxmanifest, and define a windows.comServer extension.
  3. Implement the IActionProvider interface, including starting and registering the COM server.

We’ll start with another newly created Uno Platform application, BusinessSearchActionProviderApp, using the Blank preset and making sure we add Windows App SDK in the Platforms section of the wizard.

Actions Registration

As we did previously, add a new file to the Assets folder. We’ll call it Actions.json, but you can call it whatever you want; just make sure you use the same name later in the Package.appxmanifest. In this file, include the registration for your actions, for example:

{
  "version": 1,
  "actions": [
    {
      "id": "BusinessSearchActionProvider.Actions.NameSearch",
      "description": "Name search with action provider",
      "inputs": [
        {
          "name": "Message",
          "kind": "Text"
        }
      ],
      "outputs": [],
      "instantiationDescription": "Business Name Search Action Provider: '${Message.ShortText}'",
      "invocation": {
        "type": "COM",
        "clsid": "4531D13F-5953-432E-8841-53A58EA26DFE"
      }
    }
  ]
}

The details of the JSON schema can be found in the documentation but, in brief, this JSON snippet registers a single action, BusinessSearchActionProvider.Actions.NameSearch, that accepts a single input, Message, of kind Text. The action is invoked via COM, using the class identified by clsid. Make sure you generate a new GUID for your action provider (in Visual Studio, you can use the Create GUID item from the Tools menu).
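If you’d rather generate the GUID from code than from the Visual Studio menu, a minimal sketch looks like the following. The ClsidGenerator class and NewClsid method are hypothetical names used only for illustration; the only real API involved is Guid.NewGuid.

```csharp
using System;

// Hypothetical helper, not part of the sample app.
public static class ClsidGenerator
{
    // Returns a fresh GUID in the upper-case, hyphenated form used for "clsid" above.
    public static string NewClsid() => Guid.NewGuid().ToString("D").ToUpperInvariant();
}
```

Run it once (for example, Console.WriteLine(ClsidGenerator.NewClsid());) and paste the resulting value wherever the GUID is needed; it should be generated once and then treated as a fixed constant for the application.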

Package Manifest

As we did in the previous post, we need to register the com.microsoft.windows.ai.actions extension, specifying the name of the JSON file that includes the actions to register.

The following Package.appxmanifest has been modified to include the additional namespaces uap3, com, com2 and com3, and an Extensions block that registers the actions extension, specifying the Actions.json file. It also registers a windows.comServer extension, which will act as the entry point for the application when invoked from Click to Do. It’s important that the GUID specified in this extension matches the GUID specified in the Actions.json file.

<?xml version="1.0" encoding="utf-8"?>
<Package
  xmlns="http://schemas.microsoft.com/appx/manifest/foundation/windows10"
  xmlns:uap="http://schemas.microsoft.com/appx/manifest/uap/windows10"
  xmlns:uap3="http://schemas.microsoft.com/appx/manifest/uap/windows10/3"
  xmlns:com="http://schemas.microsoft.com/appx/manifest/com/windows10"
  xmlns:com2="http://schemas.microsoft.com/appx/manifest/com/windows10/2"
  xmlns:com3="http://schemas.microsoft.com/appx/manifest/com/windows10/3"
  xmlns:rescap="http://schemas.microsoft.com/appx/manifest/foundation/windows10/restrictedcapabilities"
  IgnorableNamespaces="uap rescap">

    <Identity />
    <Properties />

    <Dependencies>
        <TargetDeviceFamily Name="Windows.Universal" MinVersion="10.0.17763.0" MaxVersionTested="10.0.19041.0" />
        <TargetDeviceFamily Name="Windows.Desktop" MinVersion="10.0.17763.0" MaxVersionTested="10.0.19041.0" />
    </Dependencies>

    <Resources>
        <Resource Language="x-generate"/>
    </Resources>

    <Applications>
        <Application Id="App"
          Executable="$targetnametoken$.exe"
          EntryPoint="$targetentrypoint$">
            <uap:VisualElements />
            <Extensions>
                <uap3:Extension Category="windows.appExtension">
                    <uap3:AppExtension
                    Name="com.microsoft.windows.ai.actions"
                    Id="BusinessSearchActionsCOMServer"
                    DisplayName="Actions for Business Search COM Server"
                    PublicFolder="Assets">
                        <uap3:Properties>
                            <Registration>Actions.json</Registration>
                        </uap3:Properties>
                    </uap3:AppExtension>
                </uap3:Extension>
                <com2:Extension Category="windows.comServer">
                    <com2:ComServer>
                        <com3:ExeServer Executable="BusinessSearchActionProviderApp.exe" DisplayName="Business Search App COM Server">
                            <com:Class Id="4531D13F-5953-432E-8841-53A58EA26DFE" DisplayName="Business Search App COM Server" />
                        </com3:ExeServer>
                    </com2:ComServer>
                </com2:Extension>
            </Extensions>
        </Application>
    </Applications>

    <Capabilities>
        <rescap:Capability Name="runFullTrust" />
    </Capabilities>
</Package>
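Because a mismatched GUID is an easy mistake to make, here is a minimal sketch of a check that compares the clsid values in the actions JSON with the com:Class Id values in the manifest. The ClsidCheck class and its method names are hypothetical; it only assumes the JSON and XML shapes shown above and uses System.Text.Json and System.Xml.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Xml.Linq;

// Hypothetical helper, not part of the sample app.
public static class ClsidCheck
{
    // Namespace bound to the "com" prefix in the manifest above.
    private static readonly XNamespace Com =
        "http://schemas.microsoft.com/appx/manifest/com/windows10";

    // Collects the clsid of every action whose invocation type is "COM".
    public static List<Guid> ActionClsids(string actionsJson)
    {
        var clsids = new List<Guid>();
        using JsonDocument doc = JsonDocument.Parse(actionsJson);
        foreach (JsonElement action in doc.RootElement.GetProperty("actions").EnumerateArray())
        {
            JsonElement invocation = action.GetProperty("invocation");
            if (invocation.GetProperty("type").GetString() == "COM")
            {
                clsids.Add(Guid.Parse(invocation.GetProperty("clsid").GetString()!));
            }
        }
        return clsids;
    }

    // Collects the Id of every com:Class element in the manifest.
    public static HashSet<Guid> ManifestClsids(string manifestXml) =>
        XDocument.Parse(manifestXml)
            .Descendants(Com + "Class")
            .Select(c => Guid.Parse((string)c.Attribute("Id")!))
            .ToHashSet();

    // True when every COM action refers to a class registered in the manifest.
    public static bool AllRegistered(string actionsJson, string manifestXml)
    {
        HashSet<Guid> registered = ManifestClsids(manifestXml);
        return ActionClsids(actionsJson).All(id => registered.Contains(id));
    }
}
```

You could call ClsidCheck.AllRegistered with the contents of Assets\Actions.json and Package.appxmanifest from a unit test or a build step. Comparing parsed Guid values rather than raw strings means differences in letter case don’t cause false failures.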

Application and Startup Logic

In order to register the COM server, we first need to disable the startup logic that the Windows App SDK tooling generates. Add the following property to the csproj, appending to any existing constants rather than overwriting them:

<DefineConstants>$(DefineConstants);DISABLE_XAML_GENERATED_MAIN</DefineConstants>

With the property set, you’re now responsible for the startup logic for the application. Let’s create Program.cs in the Platforms\Windows folder and add the following code.

using System;
using System.Threading.Tasks;
using Shmuelie.WinRTServer;
using Shmuelie.WinRTServer.CsWinRT;

namespace BusinessSearchActionProviderApp;

public static class Program
{
    [STAThread]
    static async Task Main(string[] args)
    {
        WinRT.ComWrappersSupport.InitializeComWrappers();

        await using (ComServer server = new())
        {
            var sampleActionProvider = new BusinessSearchActionProvider();
            server.RegisterClass<BusinessSearchActionProvider, Windows.AI.Actions.Provider.IActionProvider>(() => sampleActionProvider);
            server.Start();

            Application.Start((p) =>
            {
                var context = new Microsoft.UI.Dispatching.DispatcherQueueSynchronizationContext(Microsoft.UI.Dispatching.DispatcherQueue.GetForCurrentThread());
                SynchronizationContext.SetSynchronizationContext(context);
                new App();
            });

            // Regular disposing of the ComServer is broken (https://github.com/shmuelie/Shmuelie.WinRTServer/issues/28).
            // We instead call the UnsafeDispose method to dispose it.
            server.UnsafeDispose();
        }
    }
}

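In order for this to compile you’ll need to add some package references (including Shmuelie.WinRTServer, whose ComServer class simplifies registering a COM server) and the BusinessSearchActionProvider class (which implements the IActionProvider interface). Here are the package references you’ll need. No Version attributes are specified, which assumes package versions are managed centrally (for example, in Directory.Packages.props):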

    <ItemGroup Condition="'$(TargetFramework)'=='net9.0-windows10.0.26100'">
        <PackageReference Include="Microsoft.Windows.CsWin32">
            <PrivateAssets>all</PrivateAssets>
            <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
        </PackageReference>
        <PackageReference Include="Microsoft.Windows.CsWinRT" />
        <PackageReference Include="Shmuelie.WinRTServer" />
    </ItemGroup>

And then the IActionProvider implementation. Make sure the Guid attribute value matches the GUID used in Actions.json and the Package.appxmanifest.

using System.Runtime.InteropServices;
using Windows.AI.Actions;
using Windows.Foundation;
using WinRT;

namespace BusinessSearchActionProviderApp;

// Class is declared partial to allow CsWinRT to insert generated code that enables marshalling without using reflection
[Guid("4531D13F-5953-432E-8841-53A58EA26DFE")]
public partial class BusinessSearchActionProvider : Windows.AI.Actions.Provider.IActionProvider
{
    public IAsyncAction InvokeAsync(ActionInvocationContext context)
    {
        return InvokeAsyncHelper(context).AsAsyncAction();
    }

    private static async Task InvokeAsyncHelper(ActionInvocationContext context)
    {
        string result = "UnknownResult";

        if (context.ActionId.StartsWith("BusinessSearchActionProvider.Actions.NameSearch", StringComparison.Ordinal))
        {
            bool found = false;
            NamedActionEntity[] inputs = context.GetInputEntities();
            foreach (NamedActionEntity namedEntity in inputs)
            {
                if ((namedEntity.Name.Equals("Message") || namedEntity.Name.Equals("Contact")) && namedEntity.Entity.Kind == ActionEntityKind.Text)
                {
                    found = true;

                    TextActionEntity textEntity = CastToType<ActionEntity, TextActionEntity>(namedEntity.Entity);
                    string message = textEntity.Text;

                    await EnsureAppIsInitialized();

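                    var completion = new TaskCompletionSource<string>();
                    App.MainWindow.DispatcherQueue.TryEnqueue(async () =>
                    {
                        try
                        {
                            var messageResult = await ((App.MainWindow!.Content as Frame)!.Content as MainPage)!.AddMessageAsync(message);
                            completion.SetResult(messageResult);
                        }
                        catch (Exception ex)
                        {
                            // Surface UI-thread failures to the caller instead of leaving the task pending forever.
                            completion.SetException(ex);
                        }
                    });
                    result = await completion.Task;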
                }
            }

            if (!found)
            {
                context.ExtendedError = new KeyNotFoundException();
                context.Result = ActionInvocationResult.Unsupported;
            }
        }
        else
        {
            context.ExtendedError = new NotImplementedException();
            context.Result = ActionInvocationResult.Unsupported;
        }

        ActionEntity responseEntity = context.EntityFactory.CreateTextEntity(result);
        context.SetOutputEntity("MessageCount", responseEntity);
    }

    private static async Task EnsureAppIsInitialized()
    {
        if (App.loaded.Task.IsCompleted)
        {
            return;
        }

        await App.loaded.Task;
    }

    public static TTo CastToType<TFrom, TTo>(TFrom obj)
    {
        IntPtr abiPtr = default;
        try
        {
            abiPtr = MarshalInspectable<TFrom>.FromManaged(obj);
            return MarshalInspectable<TTo>.FromAbi(abiPtr);
        }
        finally
        {
            MarshalInspectable<object>.DisposeAbi(abiPtr);
        }
    }
}

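The provider relies on being able to call AddMessageAsync on the MainPage (the current Window content). It also expects App to expose a static MainWindow and a static loaded member whose Task completes once the application is ready; neither is shown in this post. Here’s the code-behind for MainPage: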

using System.Collections.ObjectModel;

namespace BusinessSearchActionProviderApp;

public sealed partial class MainPage : Page
{
    public MainPage()
    {
        this.InitializeComponent();
    }

    private ObservableCollection<string> Messages { get; } = new ObservableCollection<string>();


    public async Task<string> AddMessageAsync(string message)
    {
        TaskCompletionSource<string> tcs = new();
        DispatcherQueue.TryEnqueue(() =>
        {
            Messages.Add(message);
            tcs.SetResult("Message received");
        });

        return await tcs.Task;
    }
}
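The provider awaits App.loaded.Task before it touches the window, but App itself isn’t shown in this post. As a minimal sketch of that kind of readiness gate, assuming it is built on a TaskCompletionSource, something like the following would work. The AppReadiness class and its Loaded and MarkLoaded members are hypothetical names for illustration, not the actual members of App.

```csharp
using System.Threading.Tasks;

// Hypothetical stand-in for the readiness gate that the provider awaits.
public static class AppReadiness
{
    // RunContinuationsAsynchronously keeps awaiting code from running inline
    // on the thread that signals readiness (typically the UI thread).
    private static readonly TaskCompletionSource s_loaded =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    // Awaited by callers that need the app to be ready.
    public static Task Loaded => s_loaded.Task;

    // Safe to call more than once; only the first call has an effect.
    public static void MarkLoaded() => s_loaded.TrySetResult();
}
```

In the real application the equivalent of MarkLoaded would presumably be called once the main window has been created and activated, so that a COM invocation that arrives while the app is still starting waits until there is a MainPage to talk to.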

And that’s it for the major components required to create and register an IActionProvider implementation to be invoked by Click to Do.

Note: At the time of writing, Click to Do seems to be broken in the latest Dev Channel Insider Preview build. With each iteration of Windows there appear to be improvements across the board, so one can only hope that Click to Do will continue to evolve and improve.
