Fixing drag and drop in Electron

By Erik Jälevik on 22 April 2019

Drag and drop from Electron apps to other applications is broken. One possible way to work around it is by writing native Node modules replacing Electron’s implementation. That way we can support dragging multiple files out of our app with full support for modifier keys. This article describes how to do this for Windows and MacOS.

But be warned, you have to want drag and drop really, really badly to go down this route. It is a lot of work, as it involves use of native Node modules, the Win32 and Cocoa APIs, and the C, C++ and Objective-C languages. And in the end it’s still not quite perfect. If that’s not enough to deter you, do read on to embark on a journey to the heart of darkness.

Setting the scene

For a file browser like Fileside, drag and drop is a crucial core feature. It needs to support dragging of any selection of files and folders to other panes within its own window, as well as out to other applications running on the computer. Likewise, it needs to accept files dragged into it from external programs like Finder or Explorer.

Under the hood, Fileside uses GitHub’s Electron framework, which allows you to build cross-platform desktop applications using web technologies. Electron is essentially a Chromium browser instance which has been retro-fitted with Node.js and a few Electron-specific APIs for interacting with the desktop OS on which it runs.

Drag and drop in the browser

Dragging files around between different panes within the app can be adequately implemented using the constructs made available by the HTML5 drag and drop API.

To initiate a drag, we listen for the dragstart event on a DOM element which has been marked as draggable, in response to which we fill in its dataTransfer property with information about the data to be dragged and what to display as a drag image.

To receive a drop, we register for the dragover, dragenter, dragleave and drop events on another DOM element. There is some ballroom dancing involved and a few gotchas to grok (the most annoying one being having to count the number of entries and exits in order to highlight a drop target under a dragged item), but since we only have one browser to worry about, this is manageable.
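
The entry and exit counting can be sketched roughly as below. This is a minimal illustration rather than Fileside’s actual code: the DragDepthCounter class and the drop-target class name in the comments are made up for the example. The idea is that dragenter and dragleave also fire for every child element the pointer crosses, so the highlight should only change when the counter moves between 0 and 1.

```typescript
// Hypothetical helper: tracks nested dragenter/dragleave pairs for one drop target.
class DragDepthCounter {
  private depth = 0;

  // Call from the dragenter handler. Returns true when the drag has just
  // entered the target from outside (the moment to add the highlight).
  enter(): boolean {
    this.depth += 1;
    return this.depth === 1;
  }

  // Call from the dragleave handler. Returns true when the drag has left the
  // target and all of its children (the moment to remove the highlight).
  leave(): boolean {
    this.depth = Math.max(0, this.depth - 1);
    return this.depth === 0;
  }

  // Call from the drop handler so that the next drag starts from zero.
  reset(): void {
    this.depth = 0;
  }
}

// Assumed wiring in the browser, for an element `el`:
//   const counter = new DragDepthCounter();
//   el.addEventListener("dragenter", () => {
//     if (counter.enter()) el.classList.add("drop-target");
//   });
//   el.addEventListener("dragleave", () => {
//     if (counter.leave()) el.classList.remove("drop-target");
//   });
//   el.addEventListener("drop", () => {
//     counter.reset();
//     el.classList.remove("drop-target");
//   });
```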

By design, a web browser does not allow drags to leave a web page and enter the world of the operating system underneath. Electron does extend Chromium with this functionality, in the form of the webContents.startDrag function, but unfortunately it comes with some limitations.

What’s the problem?

Using Electron’s startDrag, the following caveats apply:

It’s only possible to drag one file at a time on Windows.
It’s not possible to hold down modifier keys (Alt, Cmd etc) to control the type of the drag (copy, move, link etc.), and the mouse cursor always shows a plus icon.

This isn’t really going to cut it for an application dedicated to managing files.

To work around these limitations, we will take over the initiating side of a drag with our custom implementation of startDrag. The receiving side, i.e. the drop handling, can remain as a standard HTML5 implementation.

At this point, I should mention that another option would be to fork the Electron codebase itself, and modify it to fit our needs. However, since this adds the extra burden of having to maintain our own custom version of Electron, it should really be seen as a last resort.

Going native

What does going native mean in the context of an Electron app? There are two different options depending on how native you want to go.

You can either use a foreign function interface (FFI) wrapper, obviating the need to get your hands dirty with C++ code, or write a C++ native Node module talking to the OS directly.

Diet native - Foreign function interface

A foreign function interface acts as a bridge from one language or runtime environment to another. The FFI approach ought to be the quicker route to native, hence it makes sense to try it first.

There are a few NPM packages that make native APIs available to Node applications through this technique. The two I’ve spent some time investigating are NodeRT for Windows and objc for Mac.

Out of the two, I’ve so far got a good impression of objc, despite it being very young and more or less experimental. I previously used it to access the MacOS Trash API successfully.

NodeRT, I found poorly documented, and quite a pain to get up and running. There are many hidden assumptions, strict version requirements, config files with hard-coded paths to particular Visual Studio installs etc. Extensive fiddling was required just to get it to build. In addition, the relatively new WinRT API whose functionality it exposes does not seem to be widely used, and its version of the drag and drop API is quite thin on documentation if you’re trying to use it outside of UWP.

Skipping this extra layer of somebody else’s code for the flexibility of the fully native approach seemed to be the most sensible way forward at this point.

Full fat native - C++ Node module

Time to roll up our sleeves and learn how to write native Node modules, or C++ addons as they are called in the Node documentation.

This article will not go into detail on the mechanics of native Node modules themselves, as there are already plenty of good tutorials available to get up and running. Here are a few that I found useful:

However, the native module world also offers a few different ways of doing things. Node handles the plumbing, in that it allows you to just require a compiled module (essentially a DLL on Windows and a dylib on Mac, only with the file extension .node instead) into JavaScript code. But on the C++ side, we have a selection of different libraries to choose from for converting values between JavaScript and C++.

You can either do it by using the JavaScript engine V8’s APIs directly, by using a wrapper called NAN (Native Abstractions for Node.js), or by using the newer N-API wrapper. Most native module projects have been using NAN up until now, but N-API is currently the officially recommended way for new projects. So I went with N-API.

It comes in two forms. Either as a pure C API or as a C++ wrapper called node-addon-api, letting you work at a slightly higher level. We’ll be using node-addon-api.

A good first step into the native module waters would be getting a Hello World module to compile and run. The GitHub repo node-addon-examples has a very helpful one.

The tool used for compiling native Node modules is called node-gyp. Compared to trying to get NodeRT to build, working with node-gyp is a breeze. It has a lot of intelligent defaults, and has been designed to figure things out on its own depending on what’s already present on your system. It’s also clever enough to download and install the Node header files that it needs, so you don’t have to worry much about dependency management.

Electron specifics

Because each Electron version is tied to a specific version of Node, it’s important to use the same version of Node when building the native module. If that’s not possible, see Electron’s documentation on the topic for the available options.

Webpack woes

If you’re using Webpack, you need to take some extra steps to integrate your new module into your project. If you’re not, feel free to skip this section.

After some trial and error, I found the Webpack loader electron-native-loader, which is specifically designed for integrating native modules into Electron projects. The following modifications to your Webpack configuration will be needed.

1. Add electron-native-loader as a loader for .node files.

module: {
  rules: [
    {
      test: /\.node$/,
      use: "electron-native-loader"
    }
  ]
}

2. Use the CopyWebpackPlugin to copy the compiled .node files from their project build folders to the Webpack output folder.

plugins: [
  new CopyWebpackPlugin(
    os.platform() === "darwin" ?
      [{
        from: "src/native/mac/build/Release/mac.node",
        to: "native/mac.node"
      }] :
      [{
        from: "src/native/win/build/Release/win.node",
        to: "native/win.node"
      }]
  )
]

3. Specify the exact require statements used in the code as externals.

externals: {
  "native/win": 'require("./native/win")',
  "native/mac": 'require("./native/mac")',
}

The author of electron-native-loader also provides the electron-native-plugin and the electron-native-patch-loader, which he recommends using together with electron-native-loader, but for my particular setup, it was easier to just use the above approach.

The real work can begin

Finally, we can move on to focusing on our actual goal, drag and drop. As already mentioned, we only need to worry about drags that are initiated from within our app for the native implementations.

High-level plan

Here’s the outline of what we need to do.

In Electron renderer process

Specify an element as draggable and register for dragstart events.
Handle dragstart and call preventDefault().
Prepare an array with the file paths to drag.
Create a drag image representing the dragged content.
Pass the paths and the drag image to the main process via IPC.

In Electron main process

Require the native module.
Get a reference to the browser window from Electron's getNativeWindowHandle().
Call the native module with the paths array, drag image and window reference.

In native module

Parse arguments coming in from JavaScript.
Prepare the drag payload as required by the OS.
Set up any listeners or delegates required.
Call the OS's native API for initiating a drag operation.
Communicate result of the drag back to Electron app.

The module API

Our native module only needs to expose one function startDrag to JavaScript.

export interface NativeModule {
  startDrag: (
    winHandle: Uint8Array,
    files: string[],
    dragImage?: Uint8Array,
    width?: number,
    height?: number
  ) => NativeDragResult;
}

As we shall see, dragImage, width and height will only be used on Windows, so they are optional. The Uint8Arrays are plain byte arrays which will need some interpretation on the C++ side. The NativeDragResult is just a string used to communicate if a drop happened and what kind of drop it was.

The internals of this function vary depending on the operating system.
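
To tie this API to the main-process steps in the high-level plan, here is a rough sketch of the glue code. It is written against small stand-in interfaces so that it runs without Electron: WindowLike mimics the one BrowserWindow method we need, and DragRequest is a hypothetical shape for the IPC message sent by the renderer. The NativeModule interface is repeated (with NativeDragResult simplified to a plain string) only to keep the block self-contained.

```typescript
type NativeDragResult = string;

interface NativeModule {
  startDrag(
    winHandle: Uint8Array,
    files: string[],
    dragImage?: Uint8Array,
    width?: number,
    height?: number
  ): NativeDragResult;
}

// Stand-in for Electron's BrowserWindow. The real getNativeWindowHandle()
// returns a Buffer, which is a subclass of Uint8Array.
interface WindowLike {
  getNativeWindowHandle(): Uint8Array;
}

// Hypothetical payload sent from the renderer over IPC.
interface DragRequest {
  paths: string[];
  image?: { pixels: Uint8Array; width: number; height: number };
}

function handleDragRequest(
  native: NativeModule,
  win: WindowLike,
  request: DragRequest
): NativeDragResult {
  if (request.paths.length === 0) {
    throw new Error("Nothing to drag");
  }
  const handle = win.getNativeWindowHandle();
  const image = request.image;
  // The image arguments are optional because only Windows uses them.
  return image
    ? native.startDrag(handle, request.paths, image.pixels, image.width, image.height)
    : native.startDrag(handle, request.paths);
}
```

In a real app, this function would be called from an IPC listener in the main process, with the required native module and the BrowserWindow that sent the message passed in.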

Marshalling and interpretation of arrays

But before we get into the specifics of Windows and MacOS respectively, let’s look at some particulars around how we convert (or marshal as the computer scientists like to call it) values across the JavaScript-C++ boundary.

It’s not immediately obvious how to use N-API to turn the arrays into C++ equivalents so I will share the code here. (I ended up mostly using N-API’s plain C API for this as I ran out of patience trying to figure out how to do it with the C++ wrapper.)

To convert a Uint8Array into an unsigned char*:

void ParseUint8Array(Napi::Env env, Napi::Value array)
{
  napi_typedarray_type type;
  size_t length;
  void* data;
  napi_value arrayBuffer;
  size_t byteOffset;

  napi_status s = napi_get_typedarray_info(
    env, array, &type, &length, &data, &arrayBuffer, &byteOffset);

  if (s == napi_ok)
  {
    unsigned char* bytes = (unsigned char*)data;
    // Do something with the bytes...
  }
}

To read a string array into an STL vector of wide strings:

void ParseArray(Napi::Env env, Napi::Value array)
{
  std::vector<std::wstring> wideStrings;

  uint32_t numStrings;
  napi_get_array_length(env, array, &numStrings);

  for (unsigned int i = 0; i < numStrings; ++i)
  {
    napi_value napiValue;
    napi_get_element(env, array, i, &napiValue);

    Napi::String napiString(env, napiValue);
    std::string utf8String = napiString.Utf8Value();

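    // codecvt_utf8_utf16 produces UTF-16, which is what Windows expects in
    // a wchar_t string (requires <codecvt> and <locale>)
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;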
    std::wstring wideString = convert.from_bytes(utf8String);

    wideStrings.push_back(wideString);
  }
}

The wide strings are needed to be able to call Windows APIs that take WCHAR strings. On Mac, we don’t need the final conversion step; we can stop once we have the UTF-8 strings.

Worth noting as well is how to convert the winHandle reference from the bytes array of unsigned char produced above to an HWND and NSView* respectively.

Windows

// Read a pointer-sized value: on 64-bit Windows an unsigned long is only 32 bits wide
HWND hwnd = *reinterpret_cast<HWND*>(bytes);

Mac

NSView* view = *reinterpret_cast<NSView**>(bytes);
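
When debugging this, it can help to look at the handle from the JavaScript side before it crosses into C++. The following sketch assumes a little-endian machine (true of the x86 and ARM platforms Electron runs on) and uses a made-up helper name, handleToHex. It prints the same pointer-sized value that the casts above read out of the byte array.

```typescript
// Formats the raw handle bytes as a hexadecimal pointer value.
// Assumes the bytes are stored little-endian, i.e. least significant first.
function handleToHex(bytes: Uint8Array): string {
  let hex = "";
  // Walk backwards so that the most significant byte is printed first.
  for (let i = bytes.length - 1; i >= 0; i--) {
    hex += ("0" + bytes[i].toString(16)).slice(-2);
  }
  return "0x" + hex;
}

// Assumed usage in the main process:
//   console.log(handleToHex(win.getNativeWindowHandle()));
```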

The Windows module

The time has come to make a deep dive into the ancient caves of desktop operating system APIs. We’ll start with the big bad Windows dragon and its infamous Win32 API.

The Win32 API was launched with a lot of fanfare as far back as 1992, and still forms the backbone of Windows. Sure, Microsoft has added various technologies on top of it over the years, like MFC, .NET, and more recently UWP and WinRT, but Win32 is still the canonical way of interfacing with the Windows operating system from C++.

It has its origins in an era when C was considered a high-level language, and seems to be the result of a disparate team of developers who never talked to each other and who each had their own ideas about how things ought to be done. It is truly an instrument of torture. But since the ancient sages teach us that the path to enlightenment leads through suffering, we shall embrace our fate and soldier on.

A good starting point is Microsoft’s own extensive documentation on drag and drop. While not bad, it always seems to stop just short of providing that final crucial detail that’s needed to get things to work.

The shoulders of giants

So like any self-respecting developer, I started off by googling for an existing solution, and soon came across a 2002 Code Project article with attached sample code, seemingly offering the quickest route to a proof of concept.

The demo project still compiles, but it is based on MFC and ends up making calls to functions such as AfxGetMainWnd(), which just merrily crash when run inside a Node module. However, this project still offered a vital clue to our final solution, namely the preparation of the DROPFILES structure.

Prepare DROPFILES

DROPFILES is the name of a C struct that contains the list of file paths to include in the drag. As a visitor from 2019, you’d be forgiven for thinking that specifying a list of files would involve maybe creating an array of strings or something equally rational and sane, but no, we’re now in the dark ages, and the way this needs to be done involves the following black magic:

Add up the number of characters of all the file paths.
Add 1 for a separator between each path and 2 for a terminator at the end.
Allocate memory for the DROPFILES struct itself.
Allocate memory for the number of characters counted above, just past the end of the DROPFILES struct. (But not just any old RAM, it has to be allocated using the GlobalAlloc call rather than the standard malloc.)
Copy the concatenated file paths separated by null characters and ending with two null characters into the dangling memory allocated.
Assign to DROPFILES.pFiles the offset from the starting address of DROPFILES to the beginning of the memory chunk containing the file paths.

With the aid of the STL vector and wstring classes for at least a semblance of modern convenience, this ends up looking a little something like this:

DROPFILES* CreateDropFiles(std::vector<std::wstring>& files)
{
  size_t numChars = 0;
  for (auto path : files)
  {
    numChars += path.length() + 1; // +1 for terminating \0
  }

  // Add 1 extra for the final extra \0
  numChars += 1;

  size_t bufferSize = sizeof(DROPFILES) + (sizeof(wchar_t) * numChars);

  // Allocate memory from the heap
  HGLOBAL hGlobal = GlobalAlloc(GPTR, bufferSize);

  // Point pDrop to this memory
  DROPFILES* pDrop = (DROPFILES*)hGlobal;

  // pFiles is the offset from the beginning of the struct where the
  // file list starts. Yes, it's just a bit of RAM tacked onto the end
  // of the DROPFILES struct.
  pDrop->pFiles = sizeof(DROPFILES);

  // Set the Unicode flag
  pDrop->fWide = TRUE;

  // Copy all the filenames into memory after the end of the DROPFILES struct
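  wchar_t* pBuf = (wchar_t*)(LPBYTE(pDrop) + sizeof(DROPFILES));
  size_t charsLeft = numChars; // StringCchCopyW takes a count of characters, not bytes
  for (const auto& path : files)
  {
    StringCchCopyW(pBuf, charsLeft, path.c_str());
    pBuf += path.length() + 1; // step past the path and its terminating \0
    charsLeft -= path.length() + 1;
  }
  // The final extra \0 is already in place, since GPTR zero-initialises the memory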

  return pDrop;
}
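
To make the memory layout concrete, here is the same arithmetic done in TypeScript on a plain byte array. It is purely an illustration and not something to hand to Windows: buildDropFilesBytes is a made-up name, and the 20-byte header size assumes the standard DROPFILES definition (a DWORD, a POINT and two BOOLs).

```typescript
// sizeof(DROPFILES): DWORD pFiles (4) + POINT pt (8) + BOOL fNC (4) + BOOL fWide (4)
const DROPFILES_SIZE = 20;

function buildDropFilesBytes(paths: string[]): Uint8Array {
  // One UTF-16 code unit per string index, plus a \0 after each path,
  // plus one final \0 to terminate the whole list.
  let numChars = 1;
  for (const path of paths) {
    numChars += path.length + 1;
  }

  // Typed arrays start out zero-filled, much like memory from GlobalAlloc(GPTR, ...).
  const bytes = new Uint8Array(DROPFILES_SIZE + 2 * numChars);
  const view = new DataView(bytes.buffer);
  view.setUint32(0, DROPFILES_SIZE, true); // pFiles: offset of the path list
  view.setUint32(16, 1, true); // fWide: TRUE

  let offset = DROPFILES_SIZE;
  for (const path of paths) {
    for (let i = 0; i < path.length; i++) {
      view.setUint16(offset, path.charCodeAt(i), true);
      offset += 2;
    }
    offset += 2; // leave the zeroed terminator in place
  }
  return bytes;
}
```

For the two paths C:\a and D:\bc, this comes to 12 characters and thus 20 + 24 = 44 bytes, with the first path starting at byte 20 and the last four bytes all zero.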

Diving deeper

Now, since MFC won’t work, we need to hunt for some code showing how to initiate the drag and drop using only Win32 functions. Time to call the pros.

Raymond Chen is a near-legendary Microsoft developer who’s been part of the Windows team since the early 90s, and apparently still is. He’s been publishing a few blog posts per week for more than 20 years at his blog The Old New Thing, a treasure trove for all things Win32. The pain of having to use Win32 is somewhat lessened by Raymond’s irreverent and light-hearted style, and he’s got some amazing stories to boot.

Regarding drag and drop, he has two article series on the topic:

Once we’ve managed to track down the earlier blog post currently referenced via a broken link from “Dragging a shell object”, featuring the GetUIObjectOfFile function, we realise that there is more than one way to skin this particular cat too. For Raymond doesn’t bother with a DROPFILES struct at all, instead opting for an approach consisting of this inscrutable bit of code:

HRESULT GetUIObjectOfFile(HWND hwnd, LPCWSTR pszPath, REFIID riid, void **ppv)
{
  *ppv = NULL;
  HRESULT hr;
  LPITEMIDLIST pidl;
  SFGAOF sfgao;
  if (SUCCEEDED(hr = SHParseDisplayName(pszPath, NULL, &pidl, 0, &sfgao))) {
    IShellFolder *psf;
    LPCITEMIDLIST pidlChild;
    if (SUCCEEDED(hr = SHBindToParent(pidl, IID_IShellFolder,
                                      (void**)&psf, &pidlChild))) {
      hr = psf->GetUIObjectOf(hwnd, 1, &pidlChild, riid, NULL, ppv);
      psf->Release();
    }
    CoTaskMemFree(pidl);
  }
  return hr;
}

To summarise, what’s happening here is that the out parameter called ppv gets assigned an object holding a type of reference to a file known as an item ID list, derived from the supplied pszPath parameter. This object can then be used to start the drag in place of one containing DROPFILES. However, since this approach seems likely to have to hit the disk for each path we want to include, we will stick to the DROPFILES approach.

The waters are clearing

Reading on, it becomes clear that the key components of a Win32 drag are:

The DoDragDrop function
The IDataObject interface
The IDropSource interface

To call DoDragDrop, we need an instance of both an IDataObject and an IDropSource. The drop source is a kind of delegate object controlling certain aspects of the drag, and the data object is what contains the payload to drag. This is the object that will hold our lovingly crafted DROPFILES struct.

The IDataObject

The object masquerading as void **ppv (because why wait for the annual obfuscation contest to get an outlet for your sadistic tendencies) in the parameter list of GetUIObjectOfFile above is in fact an instance of an IDataObject, saving us the hassle of writing one ourselves. But since we want to avoid the overhead of querying the disk for each file path conversion, we need a different way to create the data object.

IDataObject is a specification for a general-purpose container for clipboard and drag data of any kind, and can thus hold many different types of data in different formats. To create an object adhering to the specification, we need to implement methods like SetData and GetData, along with others for querying and enumerating the data types it holds.

Fortunately, it turns out there’s a shortcut here as well, meaning we don’t have to write this entirely from scratch. Some further googling turned up the DragDropVisuals sample project in which we find a DataObject.cpp that recruits a function called SHCreateDataObject for the heavy lifting. This function is intended for creating a data object from a list of item IDs, but apparently you can trick it into returning a general-purpose data object by passing null for most of its parameters.

The data object from the sample project is not yet quite fit for our purposes however. For some reason, its EnumFormatEtc method states that it supports only Unicode text payloads, which is a brazen lie, since the data object created by SHCreateDataObject can store and return any data type. So we need to change EnumFormatEtc to just hand over to the internal data object:

IFACEMETHODIMP CDataObject::EnumFormatEtc(DWORD dwDirection, IEnumFORMATETC **ppEnumFormatEtc)
{
  return _pdtobjShell->EnumFormatEtc(dwDirection, ppEnumFormatEtc);
}

With this modification, we can put DataObject.cpp to use in our own solution.

The IDropSource

To create our initial drop source, we can just copy the one Raymond provides here. We will need to make some tweaks later but this is good enough to get going.

Plan of action

Armed with this knowledge, we can now put together a plan of what startDrag needs to look like inside of the Windows module.

Parse input arguments.
Create an instance of our data object.
Add the list of files and the drag image to the data object.
Create an instance of our drop source.
Call DoDragDrop with the data object and the drop source.

Adding the files to the data object

To add our DROPFILES struct to the data object, we need to cast the following spell:

DROPFILES* pDrop = CreateDropFiles(files);

// Prepare FORMATETC and STGMEDIUM to set up a file drag and drop
FORMATETC format = { CF_HDROP, NULL, DVASPECT_CONTENT, -1, TYMED_HGLOBAL };
STGMEDIUM medium;
medium.tymed = TYMED_HGLOBAL;
medium.hGlobal = pDrop;
medium.pUnkForRelease = NULL;

// Create the IDataObject and give it the data
IDataObject* pDataObj = new CDataObject();
BOOL releaseMem = TRUE;
HRESULT hr = pDataObj->SetData(&format, &medium, releaseMem);

if (!SUCCEEDED(hr))
{
  GlobalFree(pDrop);
}

Here, we need to bring in further cryptic structs in the form of FORMATETC and STGMEDIUM. These are used to tell the data object what kind of data it is we are giving it. The third parameter to SetData specifies whether the data object should release the memory for our added data payload. We set it to true, which means we only have to call GlobalFree for the pDrop in case the SetData call fails.

Adding the drag image to the data object

Unfortunately, few things are straightforward when working with Win32, and adding the drag image is no different. The pixel data passed through from Electron in the form of a Uint8Array needs some careful massaging to shape it into the particular form the IDataObject requires.

To set the image, we need to convert it into an HBITMAP and add it to a structure called SHDRAGIMAGE which we can then include in our data object through the use of the IDragSourceHelper helper object.

Our earlier N-API-assisted conversion of the dragImage parameter left us with an unsigned char* of bytes representing the image. Each 4-byte sequence in this array is one pixel made up of its R, G, B and A components respectively. Our job is now to turn this sequence of bytes into an HBITMAP.

After barking up various more or less misinformed trees, some further research led me to this example code from the LodePNG project. It decodes a PNG into a byte array and then converts the raw bytes into a BMP, which is pretty much the same format used for an HBITMAP.

The encodeBMP function here is just the ticket. But since it prepares output ready for writing to disk, it contains some extraneous header data, which in our case is specified separately as part of the SHDRAGIMAGE structure. We also need to modify it to expect RGBA rather than just RGB. Thankfully, this was easy due to the foresight of the original developer. Here’s what we end up with to rearrange our pixels to fit the HBITMAP format:

void EncodeBmp(std::vector<unsigned char>& bmp, const unsigned char* image, int w, int h)
{
  // Bytes per pixel used
  int inputChannels = 4;
  int outputChannels = 4;

  int imageRowBytes = outputChannels * w;
  imageRowBytes = imageRowBytes % 4 == 0 ?
    imageRowBytes :
    imageRowBytes + (4 - imageRowBytes % 4); // must be multiple of 4

  for (int y = 0; y < h; y++)
  {
    int c = 0;
    for (int x = 0; x < imageRowBytes; x++)
    {
      if (x < w * outputChannels)
      {
        int inc = c;
        // Convert RGB(A) into BGR(A)
        if (c == 0) inc = 2;
        else if (c == 2) inc = 0;
        bmp.push_back(image[inputChannels * (w * y + x / outputChannels) + inc]);
      }
      else bmp.push_back(0);
      c++;
      if (c >= outputChannels) c = 0;
    }
  }
}
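
Since both input and output use 4 bytes per pixel here, a row is always a multiple of 4 bytes and the padding branch never runs, so the function boils down to swapping the red and blue channels. The sketch below shows that reduced form in TypeScript, with rgbaToBgra being a hypothetical name. It could serve as a quick cross-check of what the C++ version produces.

```typescript
// Returns a copy of the pixel data with the R and B channels swapped.
// Expects tightly packed RGBA pixels, 4 bytes each.
function rgbaToBgra(rgba: Uint8Array): Uint8Array {
  if (rgba.length % 4 !== 0) {
    throw new RangeError("Pixel data length must be a multiple of 4");
  }
  const bgra = new Uint8Array(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    bgra[i] = rgba[i + 2]; // B
    bgra[i + 1] = rgba[i + 1]; // G
    bgra[i + 2] = rgba[i]; // R
    bgra[i + 3] = rgba[i + 3]; // A
  }
  return bgra;
}
```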

Then we can use the resulting vector to create our SHDRAGIMAGE and add it to the data object:

std::vector<unsigned char> bmp;
EncodeBmp(bmp, pixelData, width, height);

HBITMAP hBmp = CreateBitmap(width, height, 1, 32, &bmp[0]);

// Create drag image
SHDRAGIMAGE dragImage;
dragImage.hbmpDragImage = hBmp;

dragImage.sizeDragImage.cx = (LONG)width;
dragImage.sizeDragImage.cy = (LONG)height;

// Mouse cursor offset
dragImage.ptOffset.x = (LONG)width / 2;
dragImage.ptOffset.y = 10;

dragImage.crColorKey = CLR_NONE;

// Add image to data object with the aid of drag source helper
IDragSourceHelper *pDragSourceHelper;
HRESULT hr = CoCreateInstance(
  CLSID_DragDropHelper,
  NULL,
  CLSCTX_ALL,
  IID_IDragSourceHelper,
  (void**)&pDragSourceHelper);

if (SUCCEEDED(hr))
{
  pDragSourceHelper->InitializeFromBitmap(&dragImage, pDataObj);
  DeleteObject(dragImage.hbmpDragImage);
  pDragSourceHelper->Release();
}

I know…

But pDataObj now includes the drag image.

Calling DoDragDrop

Now all that’s left is to construct the drop source and call DoDragDrop:

CDropSource* pDropSource = new CDropSource(hwnd);
DWORD dwEffect;
DoDragDrop(pDataObj, pDropSource, DROPEFFECT_COPY | DROPEFFECT_MOVE, &dwEffect);

The third parameter indicates which types of drop are allowed, and the fourth is an out value telling us which type of drop was actually performed at the other end. “But wait…”, you say, “does that mean that DoDragDrop is synchronous?” Indeed it is, and that will be our next source of headache.

But with these pieces in place, we are able to call into the native module from our Electron app, and rejoice at the fact that it’s now possible to drag multiple files out and drop them onto any other application!

Frozen vistas

We are calling into DoDragDrop at the moment we detect a drag start. Unfortunately, that leads to our Electron app freezing up completely for the duration of the drag, presumably related to the synchronous nature of said function. No events are delivered whatsoever! Instead they get buffered up and arrive all at once, once the drop has happened. This makes it impossible to highlight drop targets, or make any other updates to the UI in response to drag events. This is clearly not good enough for dragging things within the app.

The Microsoft documentation states that DoDragDrop initiates a drag loop, which calls particular functions on the drop target to notify it of drags entering, leaving etc. For whatever reason, this isn’t working when dragging over an Electron app that itself initiates the drag.

What to do? In order of least effort, these were the three potential workarounds I could think of:

Use IAsyncOperation to initiate an asynchronous drag
Implement our own IDropTarget and override the one installed by Electron
Don’t call into the native module until the drag leaves the Fileside window

IAsyncOperation

The docs mention an IAsyncOperation interface, which sounds like it could be a way forward. On closer inspection, it turns out that the asynchronicity only refers to the process of extracting data from the data object post-drop, and not to the drag itself. Dead end.

Our own drop target

In the Win32 model, an application must implement and register an object conforming to the IDropTarget interface in order to accept drops. Since our native module is technically running inside the Electron process, if we could only switch out whatever drop target has been registered by Electron, we could maybe make sure that the app reacts appropriately to our drag movements?

It turns out that this is actually possible, by calling the Win32 functions RevokeDragDrop followed by RegisterDragDrop passing in our own drop target instance. And by implementing its methods DragEnter, DragOver and DragLeave, we are able to break the impasse! Our app comes back to life and responds to drag events.

There’s only one problem. And it’s a serious one. How can we pass the data received by the drop target through to Electron for delivery as events over in JavaScript land? The ideal solution would be if we could keep hold of Electron’s original drop target, and then call through to it for delivery of our intercepted events, while retaining control of the return values we give to Windows. Alas, no Windows API exists for retrieving an already registered drop target from a process.

Only start native drag at window boundary

That leaves us with only the iffiest of the three workarounds left to try. This involves starting the drag within the application using JavaScript, and then switching over to the native OS version when the mouse crosses the boundary of the application window. If the drag is dropped outside, we cancel the internal drag. If the drag comes back in without a drop, we cancel the external drag and resume the JavaScript drag.

This sounds all good and well in theory, but once I had it all wired up using the HTML5 drag and drop constructs, I could find no reliable way of cancelling the drag on an outside drop, hence the internal drag continued as soon as the mouse cursor moved back into the window. Which made for a pretty broken experience.

Emulate internal drags

To achieve our aim, we need more control over the initiated drag than the drag and drop API gives us. By only registering for mouse events (instead of drag events), we can emulate what the drag and drop API does and thus also decide ourselves when to cancel a drag. The drag and drop API is after all just a convenience abstraction on top of the mouse events.

In our case, the following steps are necessary:

Use ondragstart for the draggable element but call event.preventDefault() to prevent the browser’s drag and drop implementation from kicking in.
Register a window listener for mouseout.
Register document listeners for the events mouseenter, mouseleave, mousemove, mouseup, keydown and keyup.
Register a handler onDragResult for delivery of the native module’s drag result.
Manually create a DOM node to use as a drag image and set it to position: absolute.
On each mousemove, set the drag image’s left and top properties to correspond to the mouse pointer position.
On each mouseenter and mouseleave, synthesize dragenter and dragleave and dispatch them to the events’ respective target elements.
Handle keydown and keyup events to detect modifier keys being held and update the mouse pointer accordingly.
On mouseout, check if event.relatedTarget is the “HTML” element (i.e. its nodeName is “HTML”), and if so, call the native module’s startDrag. Set a flag that we’re now in an external drag.
Trigger the drop when mouseup is received, if we’re not in an external drag.
On a drag result indicating an external drop, reset all state related to our internal drag.

See this tutorial for more detail about emulating drag and drop.
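
The bookkeeping behind these steps can be pictured as a small state machine. The sketch below is one possible way to model it, not the actual Fileside code: the state names, the input names and the "reentered" result string are all assumptions made for the example.

```typescript
type DragState = "idle" | "internal" | "external";

type DragInput =
  | { kind: "dragStart" } // dragstart received and default prevented
  | { kind: "leftWindow" } // mouseout indicates the pointer left the window
  | { kind: "mouseUp" } // mouse button released
  | { kind: "nativeResult"; result: string }; // native startDrag returned

function nextDragState(state: DragState, input: DragInput): DragState {
  if (input.kind === "dragStart" && state === "idle") return "internal";
  if (input.kind === "leftWindow" && state === "internal") return "external";
  // A mouseup only triggers a drop while we're not in an external drag.
  if (input.kind === "mouseUp" && state === "internal") return "idle";
  if (input.kind === "nativeResult" && state === "external") {
    // Re-entry resumes the internal drag; a drop or cancel ends everything.
    return input.result === "reentered" ? "internal" : "idle";
  }
  return state; // anything else is ignored
}
```

Keeping the transitions in one pure function like this makes it easy to check that, for instance, a stray mouseup during an external drag cannot trigger an internal drop.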

Synthesizing the drag events might sound complicated but it essentially just involves copying over the properties of the mouse event into an event of a different type:

// type is an event name like "dragenter", "dragleave" etc.
// e is a MouseEvent received through a mouse event handler.
function synthesizeDragEvent(type: string, e: MouseEvent) {
  return new DragEvent(
    type,
    {
      bubbles: true,
      cancelable: true,
      view: window,
      detail: e.detail,
      screenX: e.screenX,
      screenY: e.screenY,
      clientX: e.clientX,
      clientY: e.clientY,
      ctrlKey: e.ctrlKey,
      altKey: e.altKey,
      shiftKey: e.shiftKey,
      metaKey: e.metaKey,
      button: e.button,
      relatedTarget: e.relatedTarget
    }
  );
}

One further complication when emulating drags is that any hover states will get triggered on elements over which the drag passes. One way to work around this is to put transparent overlays on top of the areas of the application that contain hoverable elements. These overlays are only in place for as long as a drag is in progress. They swallow the mouse events and prevent the hover states from being activated.

Switching over

The most reliable way to detect the mouse exiting the browser window is, according to the wise people of the world wide web, the mouseout event being fired on the window object. If we receive such an event and its relatedTarget property equals “HTML”, then we have left the window. This is our signal to trigger the sequence of events that needs to happen to transition into the native drag:

Hide the element used as a drag image for the internal drag by setting its opacity to 0.
Create a bitmap with a screenshot of the drag image.
Call the native module’s startDrag with the array of dragged paths, the bitmap, and its width and height.

For creating the bitmap, we can use the dom-to-image library and its toPixelData() function. This gives us the required Uint8Array of RGBA pixels that can be passed to the native module.
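As a rough sketch of the checks involved in switching over, the snippet below shows one way to detect the window exit and to sanity-check the bitmap before handing it to the native module. The helper names (hasLeftWindow, buildStartDragArgs, NodeLike, StartDragArgs) are hypothetical. It also makes two assumptions that go beyond the text: that "equals HTML" means the relatedTarget's nodeName is "HTML", and that a null relatedTarget should count as having left the window too.

```typescript
// Structural stand-in for the DOM node found in MouseEvent.relatedTarget.
interface NodeLike {
  nodeName: string;
}

// Assumption: the exit is detected via nodeName === "HTML"; a null
// relatedTarget is treated as an exit as well.
function hasLeftWindow(relatedTarget: NodeLike | null): boolean {
  return relatedTarget === null || relatedTarget.nodeName === "HTML";
}

// Hypothetical bundle of the arguments passed to the native startDrag.
interface StartDragArgs {
  paths: string[];
  pixels: Uint8Array;
  width: number;
  height: number;
}

function buildStartDragArgs(
  paths: string[],
  pixels: Uint8Array,
  width: number,
  height: number
): StartDragArgs {
  if (paths.length === 0) {
    throw new Error("nothing to drag");
  }
  // RGBA means four bytes per pixel.
  const expected = width * height * 4;
  if (pixels.length !== expected) {
    throw new Error(`expected ${expected} bytes of RGBA data, got ${pixels.length}`);
  }
  return { paths, pixels, width, height };
}
```

Validating the buffer size on the JavaScript side is cheap, and it is a lot more pleasant than debugging a native module reading past the end of an undersized array.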

The drop

Now, if a drop happens outside of the app, we need to communicate that back from our native module, so that we can cancel our internal drag on the JavaScript side. The same applies if the external drag is cancelled by pressing Escape.

On Windows, the drag result can be communicated back to Electron by just returning it from startDrag. Exactly how to calculate it in the native module requires some thought, however.

DoDragDrop tells us directly whether the drop was a copy or a move, but we also want to detect whether the external drag moved back in over the application window, in which case we need to cancel it (to prevent another deep freeze) and return a drag result indicating re-entry. The internal drag code can then just pick up where it left off and continue the internal drag.

The code for detecting this is worth having a look at in some detail. It lives in our drop source’s implementation of QueryContinueDrag:

HRESULT CDropSource::QueryContinueDrag(BOOL fEscapePressed, DWORD grfKeyState)
{
  POINT pointerPos;
  if (GetCursorPos(&pointerPos))
  {
    HWND hwndUnderPointer = WindowFromPoint(pointerPos);
    HWND rootHwndUnderPointer = GetAncestor(hwndUnderPointer, GA_ROOT);
    if (rootHwndUnderPointer == mHwnd)
    {
      if (mHasBeenOutside)
      {
        mDidReEnter = true;
        return DRAGDROP_S_CANCEL;
      }
      else
      {
        return S_OK;
      }
    }
    else
    {
      mHasBeenOutside = true;
    }
  }

  if (fEscapePressed) {
    mCancelled = true;
    return DRAGDROP_S_CANCEL;
  }

  if (!(grfKeyState & (MK_LBUTTON | MK_RBUTTON)))
    return DRAGDROP_S_DROP;

  return S_OK;
}

This function is called repeatedly by the OS to give our drop source a say in whether the drag should continue given certain conditions. The mHasBeenOutside boolean is initialised to false and is needed at the beginning of the external drag, since Windows and the browser don’t quite agree on what the boundaries of the window are. The area just outside of it still seems to be considered part of the window by Windows, so we use this boolean to avoid triggering re-entry immediately upon leaving the application window.

Once we’ve been outside and we again detect that the root window under the mouse is our mother HWND, we set another internal boolean mDidReEnter to true and return DRAGDROP_S_CANCEL to inform Windows that we’ve lost our will to live as this particular drag incarnation.

The main startDrag function can then query the drop source for its state after DoDragDrop returns and return the appropriate drag result.
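The decision itself lives in the C++ module, but its shape can be modelled in a few lines of TypeScript. This is only a sketch under assumptions: the result names ("reentered", "cancelled" and so on) and the helper functions are invented for illustration, and the order of the checks is my reading of the description above rather than the module's actual code. The two flags mirror mDidReEnter and mCancelled from the drop source.

```typescript
// Hypothetical model of the drag result; the real values are not shown in this article.
type DropEffect = "none" | "copy" | "move";
type DragResult = "reentered" | "cancelled" | DropEffect;

// Mirrors the flags the drop source sets in QueryContinueDrag.
interface DropSourceState {
  didReEnter: boolean;
  cancelled: boolean;
}

// What startDrag might return once DoDragDrop has come back.
function toDragResult(state: DropSourceState, effect: DropEffect): DragResult {
  // Re-entry is checked first, since it also ends the native drag with a cancel.
  if (state.didReEnter) return "reentered";
  if (state.cancelled) return "cancelled";
  return effect;
}

// How the JavaScript side could react to the result.
function phaseAfterResult(result: DragResult): "internal" | "idle" {
  // On re-entry the internal drag picks up where it left off;
  // anything else ends the drag and resets the internal state.
  return result === "reentered" ? "internal" : "idle";
}
```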

Yay, it works! Kinda

Phew. After all that, we now have working drag and drop on Windows, with a relevant drag image and support for modifier keys.

All is however still not quite perfect. If dragging out of the app, and then back in again without a drop, the internal drag resumes. But if we then try to move back out once more during the same drag, the required mouseout event just doesn’t get delivered, and the drag is trapped. I haven’t been able to figure out why this happens yet, but since this particular interaction is probably quite unlikely during normal use, we can live with it for the time being.

Another lingering glitch shows up if we first make a drag out of our app into Explorer and drop it. Then grab something else from the same Explorer window and drag it back into Fileside. Once we cross the boundary into the app window, the Explorer-provided drag image is replaced by a stretched version of whatever the previous drag image generated by the app was. Just dragging across the boundary once more fixes it, but it’s an unfortunate cosmetic blight, which is yet to find a proper fix.

The Mac module

Accomplishing the same thing on MacOS is an altogether easier feat. Instead of dealing with hairy old monsters from the 1980s, we’re dealing with the Cocoa API, which, despite also having its roots in that glorious decade, is a much more friendly beast, in large part due to Apple’s very different approach to backwards-compatibility. In the Apple universe, APIs get maintained, updated and deprecated in due course, which puts some burden on developers to follow along and update their code on the one hand, but makes for a much smoother development experience on the other.

The only slight oddity here is having to deal with Objective-C, but it’s very much in the same family as C and C++ (we can even mix the three freely in an Objective-C source file with the extension .mm). All we really need to know for this exercise is that methods on objects are called like this: [object methodWithArgument: argument], and that there’s a very helpful website out there for dealing with Objective-C’s version of anonymous functions (or closures), known as blocks.

Cocoa’s drag and drop APIs went through an update a few years ago, and the official documentation for the new API does unfortunately leave something to be desired. This tutorial from raywenderlich.com was the most comprehensive I found for the modern API, and despite being written in Swift, is still a great help in explaining the concepts involved.

Plan of action

On the Mac, we don’t have the problem of the app freezing up when initiating a drag, so we can just call the native startDrag immediately on detecting a drag start; no need for all that iffy switching around when going in and out.

Here’s an outline of the steps involved:

Parse input arguments.
Prepare an NSDraggingItem for each file containing its path and an icon.
Create an NSDraggingSource object to control the drag.
Synthesize an NSLeftMouseDragged event containing the current window and mouse position.
Call [NSView beginDraggingSessionWithItems] giving it an array of NSDraggingItems, the dragging source and the synthesized drag event.

Once it’s been given a list of NSDraggingItems, the OS will arrange them in a neat list (which we can also customise via the property draggingFormation of the NSDraggingSession object returned by beginDraggingSessionWithItems) that follows the mouse pointer around. It would of course also be possible to provide a custom drag image here, but the OS is doing an elegant enough job of this with the native file icons that we don’t really need it.

The NSDraggingItem

The NSDraggingItem takes a file path in the form of an NSURL and an icon representing it as an NSImage.

It’s important that we use the dragging item’s imageComponentsProvider with its slightly tricky syntax, and not the simpler setDraggingFrame:contents: when setting the image. This allows the OS to optimise the retrieval of the images, so as to not get bogged down when initialising a drag with a large number of files.

Assuming files is a std::vector of std::strings in UTF-8 format, this is how an array of NSDraggingItems is created:

NSMutableArray* dragItems = [[NSMutableArray alloc] init];
for (auto& file : files) {
  NSString* nsFile = [[NSString alloc] initWithUTF8String:file.c_str()];
  NSURL* fileURL = [NSURL fileURLWithPath: nsFile];

  NSImage* icon = [[NSWorkspace sharedWorkspace] iconForFile:nsFile];
  NSSize iconSize = NSMakeSize(32, 32); // according to documentation

  NSArray* (^providerBlock)() = ^NSArray*() {
    NSDraggingImageComponent* comp = [[[NSDraggingImageComponent alloc]
      initWithKey: NSDraggingImageComponentIconKey] retain];

    // The x, y here seem to control the offset from the mouse pointer
    comp.frame = NSMakeRect(0, 0, iconSize.width, iconSize.height);
    comp.contents = icon;
    return @[comp];
  };

  NSDraggingItem* dragItem = [[NSDraggingItem alloc] initWithPasteboardWriter: fileURL];

  // The x, y here determine from what point the images fly in at the beginning
  // of the drag. The size determines the space each DraggingImage has, so can
  // be used to create overlapping icons or spacing between them.
  dragItem.draggingFrame = NSMakeRect(
    mousePos.x, mousePos.y, iconSize.width, iconSize.height);
  dragItem.imageComponentsProvider = providerBlock;
  [dragItems addObject: dragItem];
}

The NSDraggingSource

This is the MacOS equivalent of the Win32 IDropSource object and here we need to implement two methods:

- (NSDragOperation) draggingSession:(NSDraggingSession *)session
  sourceOperationMaskForDraggingContext:(NSDraggingContext)context;
- (BOOL)ignoreModifierKeysForDraggingSession:(NSDraggingSession *)session;

The second one will just return NO (Objective-C speak for false), but the first one is of some importance. It allows us to specify which drag operations should be permitted. Here I only arrived at the required combination of flags through a process of trial and error:

- (NSDragOperation) draggingSession:(NSDraggingSession *)session
  sourceOperationMaskForDraggingContext:(NSDraggingContext)context
{
  // This combination of flags gives the behaviour we want, somehow:
  //   - it uses move pointer by default (no plus)
  //   - plus appears when pressing Alt and drop is allowed
  //   - pointer stays unchanged when pressing Cmd and drop is allowed
  //   - pointer stays unchanged when holding Ctrl and drop is not allowed
  //
  // If using NSDragOperationEvery, this is not the case as we then get
  // the plus pointer by default.
  return NSDragOperationCopy |
         NSDragOperationGeneric |
         NSDragOperationMove |
         NSDragOperationDelete;
}

Synthesizing the drag event

This is slightly more involved than synthesizing an event in the browser. Here’s what it looks like:

NSEvent* SynthesizeEvent(NSView* view) {
  NSWindow* window = [view window];
  NSPoint position = [window mouseLocationOutsideOfEventStream];
  NSTimeInterval eventTime = [[NSApp currentEvent] timestamp];
  NSEvent* dragEvent = [NSEvent mouseEventWithType: NSLeftMouseDragged
                                          location: position
                                     modifierFlags: NSLeftMouseDraggedMask
                                         timestamp: eventTime
                                      windowNumber: [window windowNumber]
                                           context: nil
                                       eventNumber: 0
                                        clickCount: 1
                                          pressure: 1.0];
  return dragEvent;
}

The view parameter is the application window reference passed across from Electron.

Putting it all together

Then we use the pieces we’ve created and initiate the drag:

  DraggingSource* customSource = [DraggingSource new];
  NSEvent* dragEvent = SynthesizeEvent(view);
  NSDraggingSession* session = [view
    beginDraggingSessionWithItems: dragItems
                            event: dragEvent
                           source: customSource];
  session.draggingFormation = NSDraggingFormationList;

And there we are. Drag and drop on Mac is done.

The pesky plus

An extra hack was needed back in JavaScript land to prevent the mouse pointer from showing a plus by default when hovering over non-droppable areas. Adding an ondragover handler to the root-level element in our app and having it call event.preventDefault() fixes this. It seems to prod it into taking the allowed NSDragOperations from our NSDraggingSource into account.
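A minimal sketch of that hack could look like the following. The function name allowSourceOperations and the two small interfaces are hypothetical; they are only there so the snippet does not depend on the DOM typings. In the renderer, the root passed in would be something like document.documentElement.

```typescript
// Minimal structural types standing in for the DOM's Event and EventTarget.
interface CancelableEvent {
  preventDefault(): void;
}

interface DragOverTarget {
  addEventListener(type: "dragover", listener: (e: CancelableEvent) => void): void;
}

// Registers a dragover handler that does nothing but call preventDefault().
function allowSourceOperations(root: DragOverTarget): void {
  root.addEventListener("dragover", (e) => e.preventDefault());
}

// In the renderer process (assumed usage):
//   allowSourceOperations(document.documentElement);
```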

Not even MacOS is perfect

Even with the hack above, there are occasional flashes of plus icon when dragging something around within the app.

And when dragging something out of the app, the pointer icon sometimes doesn’t immediately update on pressing a modifier key, but requires a slight wiggle of the mouse to update. This feels buggy, but the exact same thing happens when dragging a file from Xcode’s file tree out to Finder, so the bug is probably not ours.

The end

So there we are. Now you know what to do if you really, really, really want working drag and drop from an Electron app on Windows and Mac. In hindsight, forking Electron and modifying its startDrag to support drag and drop properly might actually have been the less time-consuming route, but maintaining a fork of a huge framework like Electron is not something to be taken lightly. At least we don’t have a maintenance burden with our stand-alone solution.
