DsWord is a collection of cross-platform .NET class libraries, written in C#, that provide an API for creating DOCX/DOCM Microsoft Word files from scratch. The libraries also let you load, analyze, and modify existing Word documents.
DsWord is compatible with .NET Core 2.x/3.x, .NET Standard 2.x, .NET Framework 4.6.1 or higher, and .NET 6 or higher.
DsWord and supporting packages are available on nuget.org:
Package | Description |
---|---|
DS.Documents.Word | Main package, which automatically pulls in the other required infrastructure packages. |
DS.Documents.Word.Layout | Enables saving Word documents as PDF. |
DS.Documents.Imaging | Provides image handling. |
DS.Documents.DX.Windows | Provides access to the native graphics APIs when running on a Windows system. |
A Word document in DsWord is represented by an instance of the GrapeCity.Documents.Word.GcWordDocument class.
The object model of the GcWordDocument class corresponds to the structure of a Word document, with the following properties corresponding to major parts of the document:
Property | Description |
---|---|
Body | The main document story |
Styles | A collection of document styles to format document content |
ListTemplates | A collection of list templates to format list content in the document |
Settings | Provides options to control view, compatibility and other settings |
Theme | Provides the different formatting options available to a document through a theme |
CustomXMLParts | Provides the collection of CustomXMLPart objects. |
GlossaryDocument | Provides the supplementary document storage which stores the content for future insertion. |
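As a minimal sketch of how the document object is typically used, the following creates a document, adds one paragraph to its main body, and saves the result. The output file name is an arbitrary placeholder, and the snippet assumes a project that references the DS.Documents.Word package; check member names against the DsWord API Reference before relying on them.

```csharp
using GrapeCity.Documents.Word;

// Create a new, empty Word document.
var doc = new GcWordDocument();

// Add a paragraph to the main document story (GcWordDocument.Body).
doc.Body.Paragraphs.Add("Hello, World!");

// "HelloWorld.docx" is a placeholder output path chosen for this sketch.
doc.Save("HelloWorld.docx");
```

An existing file can be opened in a similar way by calling Load() on a GcWordDocument instance before accessing Body.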
Body is where the content elements (representing the actual content of a document) are stored. GcWordDocument.Body represents the main content of the document, but other parts of the document (such as headers/footers, comments, and footnotes/endnotes) also have bodies to store their content. The specific body type is indicated by the GrapeCity.Documents.Word.BodyType enumeration, which has the following members:
Member | Description |
---|---|
Main | Body of main document part |
Header | Body of section header |
Footer | Body of section footer |
Comment | Body of comment |
BuildingBlock | Body of building block |
Footnote | Body of footnote |
FootnoteSeparator | Body of footnote separator |
FootnoteContinuationSeparator | Body of footnote continuation separator |
FootnoteContinuationNotice | Body of footnote continuation notice |
Endnote | Body of endnote |
EndnoteSeparator | Body of endnote separator |
EndnoteContinuationSeparator | Body of endnote continuation separator |
EndnoteContinuationNotice | Body of endnote continuation notice |
Unlike other body types, the main body has sections as its top-level content elements. It also contains the comments, footnotes, and endnotes collections. There are three types of content elements that can be stored in a body:
Content Element Type | Description |
---|---|
Block elements | Top-level elements |
Inline elements | Elements that must be placed inside other elements |
Reference elements | Elements that do not have their own content in the body (except for complex fields, see Complex Fields) but are represented by start/end markers. |
The following sections explain how to access and work with various content elements of a body.
A range is a sequence of content elements in a body. The body itself is a kind of range that holds all the content elements. In DsWord, the Range class is the primary means of accessing the various content elements in a document.
All content elements have the GetRange() method, which returns a Range covering the element. The Range object exposes properties that return collections of the specific element types contained in the range, so you can use it to access and modify those elements. These collections also let you add or insert elements using the Add() and Insert() methods.
A range provides the following two overloads to get new ranges based on it:
Method | Description |
---|---|
GetRange(ContentObject first, ContentObject last) | Gets a range that extends from the 'first' content object to the 'last' |
GetRange(Marker start, Marker end) | Gets a range providing a fine-grained control over the range's bounds, e.g. GetRange(first.End, last.Start). For more information, see DsWord API Reference. |
To clear all content in a range, use the Range.Clear() method. Because Range is a collection of ContentObject, you can also enumerate the content elements it includes.
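The range operations described above might be combined as in the following sketch. It adds content through the typed collections of a range, builds a sub-range from two content objects, enumerates it, and then clears it. The paragraph texts and the output file name are placeholders, and the snippet assumes that the typed collections (Paragraphs, Runs) accept a plain string in Add().

```csharp
using System;
using GrapeCity.Documents.Word;

var doc = new GcWordDocument();

// The body is itself a range, so its typed collections can be used directly.
var first = doc.Body.Paragraphs.Add("First paragraph.");
var second = doc.Body.Paragraphs.Add("Second paragraph.");
doc.Body.Paragraphs.Add("Third paragraph.");

// Every content element exposes its own range; add a run inside the first paragraph.
first.GetRange().Runs.Add(" Appended run.");

// Build a range that extends from the first paragraph to the second.
var range = doc.Body.GetRange(first, second);

// Enumerate the content elements included in the range.
foreach (var element in range)
    Console.WriteLine(element.GetType().Name);

// Remove all content covered by the range.
range.Clear();

// Placeholder output path.
doc.Save("Ranges.docx");
```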
Block and inline elements are derived from the ContentObject class, which provides access to the start and end positions of an element in a document. It also lets you get the parent content element and enumerate the element's children.
In addition, all content objects have the Next and Previous properties, which let you enumerate objects of the same content type through the whole body.
The Delete() method of the ContentObject class removes the element itself and all its inner content from the body.
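A rough sketch of navigating and deleting content objects follows. It assumes that the Paragraphs collection exposes a First property and that Next yields null after the last paragraph; the cast with `as` is there only because the exact return type of Next is not stated here. Paragraph texts are placeholders.

```csharp
using System;
using GrapeCity.Documents.Word;

var doc = new GcWordDocument();
doc.Body.Paragraphs.Add("Keep this paragraph.");
var obsolete = doc.Body.Paragraphs.Add("Remove this paragraph.");
doc.Body.Paragraphs.Add("Keep this one too.");

// Delete() removes the element and all of its inner content from the body.
obsolete.Delete();

// Walk the remaining paragraphs through the Next property.
// Assumption: Next returns null once the end of the body is reached.
int count = 0;
for (var p = doc.Body.Paragraphs.First; p != null; p = p.Next as Paragraph)
    count++;

Console.WriteLine($"Paragraphs left in the body: {count}");
```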
Reference elements (bookmarks, comments, and complex fields) differ slightly from simple ContentObjects. An element of this kind does not have a parent content element, since it can start and end anywhere; for example, it can start in one section and end in another. Instead, a reference element provides a pair of ContentObjects, called ContentMarks, that define its start and end. Each ContentMark has an Owner property that points to the ContentRange element. Removing a ContentMark from the body also removes its owner element. The Delete() method on a ContentRange usually removes only its ContentMarks. Complex fields are an exception: their internal content is deleted as well.
Although the complex field inherits from ContentRange, it is actually a combination of ContentRange and ContentObject. The bounds of a complex field are defined by special field characters (see the FieldChar class and the associated enum, which defines the type of the field character as Begin, Separator, or End). A complex field can contain two ranges, a code range and a result range, separated by a Separator field character.
The code range usually contains one or several codes (see the FieldCode class) that in turn contain instructions on how to calculate the field's result. The result range contains the cached result of those instructions. In the current version, DsWord does not yet calculate instructions, so it does not update the result.
As mentioned above, unlike other ContentRange elements, the Delete() method on a complex field removes not only the field characters from the body but the field codes and the result too.
Sections can only be present in the main body, and any document must have at least one section.
Sections let you change page formatting for different parts of the document through the PageSetup property and the headers and footers collections of a section. Each section can have its own headers, footers, and page formatting.
Headers and footers display on each page of the section and they have their own bodies to store their content. There are several types of headers or footers in a section (see HeaderFooterType enum) and each header or footer can be linked to the same type from a previous section, so you do not have to create identical headers or footers for each section.
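The following sketch illustrates per-section formatting on the first section of a new document. It switches the page orientation and fills the primary header and footer through their own bodies. The PageSetup.Size.Orientation member and the HeaderFooterType.Primary value are used here as assumptions about the API, and all texts and the file name are placeholders.

```csharp
using GrapeCity.Documents.Word;

var doc = new GcWordDocument();

// Sections live only in the main body; use the first one.
var section = doc.Body.Sections.First;

// Page formatting is set per section through PageSetup
// (assumed members: Size.Orientation and the PageOrientation enum).
section.PageSetup.Size.Orientation = PageOrientation.Landscape;

// Headers and footers have their own bodies that store their content.
section.Headers[HeaderFooterType.Primary].Body.Paragraphs.Add("Sample header");
section.Footers[HeaderFooterType.Primary].Body.Paragraphs.Add("Sample footer");

// Content of the main body.
doc.Body.Paragraphs.Add("Main body content.");

// Placeholder output path.
doc.Save("Sections.docx");
```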
A run is a contiguous fragment of body content with uniform formatting, so a run is the primary means of changing character formatting. It is also a container for all other inline elements (excluding simple fields and hyperlinks).
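To illustrate runs as the unit of character formatting, this sketch builds one paragraph from three runs and formats only the middle one. It assumes that a run exposes its character formatting through a Font property with Bold and Italic members; the text and file name are placeholders.

```csharp
using GrapeCity.Documents.Word;

var doc = new GcWordDocument();

// The text passed to Add() becomes the first run of the paragraph.
var paragraph = doc.Body.Paragraphs.Add("Plain text, followed by ");

// Further runs are added through the paragraph's range.
var emphasized = paragraph.GetRange().Runs.Add("bold italic text");
paragraph.GetRange().Runs.Add(", followed by plain text again.");

// Character formatting applies to the whole run, and only to that run.
emphasized.Font.Bold = true;
emphasized.Font.Italic = true;

// Placeholder output path.
doc.Save("Runs.docx");
```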
The top elements in the main body are the sections. For other body types, the top elements can be paragraphs, tables and content marks (see ContentRange).
Usually elements with the same type cannot be nested (for example, a Run cannot be nested within another Run). Only SimpleField and Hyperlink can be nested. Also, a cell in a table can contain another table within its own cells.
Styles are the main means of applying formatting to a document's content. DsWord provides 375 built-in styles. There are different style types (see the StyleType enumeration), and each type of style can be applied only to the corresponding content type. You can get any built-in style using the BuiltInStyleId enumeration.
The StyleCollection class has default styles, which can be fetched or set using its GetDefaultStyle(StyleType) and SetDefaultStyle(StyleType, Style) methods. These styles are applied to content that does not have an explicitly specified style. StyleCollection also provides the DefaultFont and DefaultParagraphFormat properties, which supply the default formatting for these default styles.
Some styles are linked. A linked style is a grouping of a paragraph style and a character style, used in a user interface so that the same set of formatting properties can be applied to both paragraphs and runs. For example, to apply the Heading 1 paragraph style to a run, use the linked character style: Document.Styles[BuiltInStyleId.Heading1].LinkStyle.
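A short sketch of applying built-in and linked styles follows. It applies Heading 1 to a whole paragraph, then applies the linked character style to a single run in an ordinary paragraph. It assumes that Paragraphs.Add() accepts a style as its second argument and that a run has a settable Style property; texts and the file name are placeholders.

```csharp
using GrapeCity.Documents.Word;

var doc = new GcWordDocument();

// Look up a built-in paragraph style by its BuiltInStyleId.
var heading1 = doc.Styles[BuiltInStyleId.Heading1];

// Apply the paragraph style to a whole paragraph.
doc.Body.Paragraphs.Add("Chapter title", heading1);

// A paragraph style cannot be applied to a run directly;
// use the linked character style instead.
var paragraph = doc.Body.Paragraphs.Add("Ordinary text with a ");
var run = paragraph.GetRange().Runs.Add("heading-styled fragment");
run.Style = heading1.LinkStyle;

// Placeholder output path.
doc.Save("Styles.docx");
```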
DsWord lets you get the actual formatting values of elements, taking into account formatting inherited from the default document formatting, the base style, the applied style, and the parent content, as well as the element's direct formatting.
DsWord provides 21 built-in list templates to create lists in the document. The formatting of these templates is the same as in Microsoft Word built-in list templates. There is no "list" class in DsWord. To create a list, set the ListFormat.Template and ListFormat.LevelNumber (for multilevel lists) properties on each paragraph that should be in the list.
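Because a list is just a set of paragraphs sharing a list template, building one could look like the sketch below. The template name "sampleList", the item texts, and the file name are placeholders; the BuiltInListTemplateId.NumberArabicDot value and the ListTemplates.Add(id, name) overload are assumptions to verify against the API reference.

```csharp
using GrapeCity.Documents.Word;

var doc = new GcWordDocument();

// Create a list template from one of the built-in templates.
// "sampleList" is a hypothetical name chosen for this sketch.
var template = doc.ListTemplates.Add(BuiltInListTemplateId.NumberArabicDot, "sampleList");

string[] items = { "First item", "Second item", "Third item" };
foreach (var text in items)
{
    var paragraph = doc.Body.Paragraphs.Add(text);

    // Setting the template on a paragraph makes it a list item.
    paragraph.ListFormat.Template = template;

    // Level 0 is the top level; higher values nest the item in a multilevel list.
    paragraph.ListFormat.LevelNumber = 0;
}

// Placeholder output path.
doc.Save("Lists.docx");
```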
The Settings class lets you set properties that apply to the whole document, add custom document properties, control document variables, detect and remove document macros, and change view options.