Design of ggplot2 for Go
========================
Data Input
----------
Data input is either Slice-Of-Measurements (SOM) of the form
    type Measurement struct {
        Age    int
        Weight float64
        Height float64
        Born   time.Time
        Gender string
    }
    func (m Measurement) BMI() float64 {
        return m.Weight / (m.Height * m.Height)
    }
    var data []Measurement
or of the Collection-Of-Slices (COS) form
    type Measurement struct {
        Age    []int
        Weight []float64
        Height []float64
        Born   []time.Time
        Gender []string
    }
    func (m Measurement) BMI(i int) float64 {
        return m.Weight[i] / (m.Height[i] * m.Height[i])
    }
    var data Measurement
Both forms are converted to a Data Frame which is basically a generic
representation in the COS form. It resembles an R data frame.
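The conversion from the SOM form to a COS-like data frame could look roughly
like the sketch below. DataFrame and FromSOM are hypothetical names, not the
package's API, and only int and float64 fields are handled here; time.Time
and string fields as well as methods such as BMI are left out.

```go
package main

import (
	"fmt"
	"reflect"
)

// Measurement is the SOM element type from the text, trimmed to three fields.
type Measurement struct {
	Age    int
	Weight float64
	Height float64
}

// DataFrame is a hypothetical columnar (COS-like) representation:
// one float64 column per field name.
type DataFrame struct {
	N       int
	Columns map[string][]float64
}

// FromSOM converts a slice of structs into a DataFrame.
// Fields that are neither int nor float64 are skipped in this sketch.
func FromSOM(data interface{}) (DataFrame, error) {
	v := reflect.ValueOf(data)
	if v.Kind() != reflect.Slice || v.Type().Elem().Kind() != reflect.Struct {
		return DataFrame{}, fmt.Errorf("want slice of structs, got %T", data)
	}
	t := v.Type().Elem()
	df := DataFrame{N: v.Len(), Columns: map[string][]float64{}}
	for f := 0; f < t.NumField(); f++ {
		var conv func(reflect.Value) float64
		switch t.Field(f).Type.Kind() {
		case reflect.Int:
			conv = func(x reflect.Value) float64 { return float64(x.Int()) }
		case reflect.Float64:
			conv = func(x reflect.Value) float64 { return x.Float() }
		default:
			continue
		}
		col := make([]float64, v.Len())
		for i := range col {
			col[i] = conv(v.Index(i).Field(f))
		}
		df.Columns[t.Field(f).Name] = col
	}
	return df, nil
}

func main() {
	data := []Measurement{
		{Age: 30, Weight: 70, Height: 1.75},
		{Age: 41, Weight: 82, Height: 1.80},
	}
	df, err := FromSOM(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(df.N, df.Columns["Age"], df.Columns["Weight"])
}
```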
Plot Creation
-------------
Plot Creation is the following process:
1. Split data according to faceting specification.
Happens only on faceted plots; each facet is basically
treated as its own plot and represented as a panel.
Result: n x m data frames in Domain Units, original field names
2. Prepare Data
▔▔▔▔▔▔▔▔▔▔▔
2a. Some or all fields are mapped to aesthetics.
Mapped fields are renamed to the aesthetic.
Unmapped fields are removed from the data frame.
Result: Data frame in Domain Units, unmapped fields have
been removed, others have the mapped aes name and are
scale transformed
2b. Scales are added for the "known" aesthetics if not
yet present.
The mapped fields are transformed according to the
scale transformation (identity, log, sqrt, 1/x, ...)
Pre-train scales. Useful if an upcoming stat wants to know what
the full x-range will be.
3. Statistical transformation
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
3a. It is checked that the statistical transform can be
applied to the given data frame
3b. The data frame is partitioned on any additional discrete
fields.
The stat transform is applied to each partition.
The partitions are joined to produce a result.
In general, the statistical transform is some kind of summary
statistic like binning, boxplot or smoothing.
Result: A completely new data frame with new field names.
(Or the original input if no stat is requested.)
4. Wiring stat to geom
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
4a. Some (typically new) fields might be mapped to (new) aesthetics.
Scale transformations are performed during this mapping.
4b. Rename fields to match expected input from Geom in next step.
Result: A data frame with field names suitable to be rendered
as a specific Geom.
5. Geom construction   TODO: no longer accurate
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
5a. Apply geom specific position adjustments.
5b. Train scales.
5c. Reparametrise Geoms to a set of fundamental geoms.
Result: One or several fundamental geoms with their data frames
in Domain Units with field names suitable for primitive geoms.
6. Prepare scales
▔▔▔▔▔▔▔▔▔▔▔▔▔
Set up the remaining fields in each scale so that the scales
can be used.
7. Render fundamental geoms
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Uses these scales to render the fundamental geoms produced in
step 5c into Grobs (Graphical Objects).
Result: Structure of Grobs, coordinates/values in Range Units.
8. Render rest of plot
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
8a. Render Guides (Axes and Legends)
8b. Render title, facet boxes.
8c. Determine space needed for title, labels, guides, etc.
From this calculate panel size and positions.
9. Draw layer grobs
▔▔▔▔▔▔▔▔▔▔▔▔▔▔
9a. Apply Coordinate Transformations: Interpret <x,y> pairs in
a Grob e.g. as <y,x> (flip coordinates), as <r,ϑ> (polar coordinates)
or <x,-y> (reversed y) and so on.
Munching (interpolating lines by lots of small segments) for
non-Cartesian coordinates.
9b. Render into panel viewport.
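Step 9a can be illustrated with a small sketch. The names Point, munch, flip
and polar are made up for this example. The polar transform assumes that x in
[0,1] is the radius fraction and y in [0,1] is the fraction of a full turn,
both relative to the panel center (0.5, 0.5); the real mapping may differ.

```go
package main

import (
	"fmt"
	"math"
)

// Point is a coordinate pair in [0,1] x [0,1].
type Point struct{ X, Y float64 }

// munch replaces the straight segment a-b by n small segments (n+1 points)
// so that a non-Cartesian transform can bend the line.
func munch(a, b Point, n int) []Point {
	if n < 1 {
		n = 1
	}
	pts := make([]Point, n+1)
	for i := range pts {
		t := float64(i) / float64(n)
		pts[i] = Point{a.X + t*(b.X-a.X), a.Y + t*(b.Y-a.Y)}
	}
	return pts
}

// flip interprets <x,y> as <y,x>.
func flip(p Point) Point { return Point{p.Y, p.X} }

// polar interprets <x,y> as <r,theta> around the panel center (assumption:
// x=1 is the largest radius that fits, y=1 is one full turn).
func polar(p Point) Point {
	r := p.X / 2
	theta := 2 * math.Pi * p.Y
	return Point{0.5 + r*math.Cos(theta), 0.5 + r*math.Sin(theta)}
}

func main() {
	// A vertical line at x=1 becomes a circle in polar coordinates;
	// without munching it would collapse to a single point.
	for _, p := range munch(Point{1, 0}, Point{1, 1}, 8) {
		q := polar(p)
		fmt.Printf("%.3f %.3f\n", q.X, q.Y)
	}
}
```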
Statistical Transforms
----------------------
Basic idea is dead simple. Take one data frame and construct another data
frame which summarizes/describes the original one.
Only complication: What to do if the input data frame contains more
slots/fields than the stat can operate on? Example: StatBin needs
"x" and can use optional "weight". What to do if extra field "Gender"
is present in input data frame?
The following may happen:
- Completely ignore the additional fields, pretend they are just
not there.
- Fail. Do not process such data.
- Group (facet) input data on the additional field(s) if the
additional fields are discrete.
Ignore and fail are simple. Grouping works as follows in the StatBin example
with the extra Gender field: Split the input data frame into two frames
(Gender=='f' and Gender=='m'), apply the simple StatBin to both and combine
both results afterwards to something like
     x  count  density  Gender
     5      0      0.0  m
    10      6      0.2  m
    15     11      0.4  m
    20      9      0.3  m
     5      2      0.1  f
    10      5      0.2  f
    15     15      0.5  f
    20      0      0.0  f
The fields x and Gender were present in the input to StatBin; the fields
count and density (also ncount and ndensity) are new fields.
Gender is passed through pretty much unchanged while x is aggregated:
the resulting x values are the centers of the bins, so none of them
needs to occur in the input data frame.
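The grouping variant can be sketched as below. BinRow, statBin and
groupedStatBin are hypothetical names. Density is simplified to the share of
a group's observations per bin, which is roughly what the illustrative table
shows; ggplot2 for R additionally divides by the bin width.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// BinRow is one row of the resulting data frame.
type BinRow struct {
	X       float64 // bin center, not necessarily a value from the input
	Count   int
	Density float64 // simplified: Count / number of binned observations
	Group   string  // passed-through discrete field, e.g. Gender
}

// statBin is the simple stat: it only knows about x.
func statBin(xs []float64, origin, width float64, nbins int) []BinRow {
	rows := make([]BinRow, nbins)
	for i := range rows {
		rows[i].X = origin + (float64(i)+0.5)*width
	}
	n := 0
	for _, x := range xs {
		i := int(math.Floor((x - origin) / width))
		if i < 0 || i >= nbins {
			continue // outside the binned range
		}
		rows[i].Count++
		n++
	}
	for i := range rows {
		if n > 0 {
			rows[i].Density = float64(rows[i].Count) / float64(n)
		}
	}
	return rows
}

// groupedStatBin partitions on the extra discrete field, applies statBin
// to each partition and joins the results.
func groupedStatBin(xs []float64, groups []string, origin, width float64, nbins int) []BinRow {
	parts := map[string][]float64{}
	for i, x := range xs {
		parts[groups[i]] = append(parts[groups[i]], x)
	}
	keys := make([]string, 0, len(parts))
	for k := range parts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order of partitions
	var out []BinRow
	for _, k := range keys {
		for _, r := range statBin(parts[k], origin, width, nbins) {
			r.Group = k
			out = append(out, r)
		}
	}
	return out
}

func main() {
	xs := []float64{3, 8, 9, 12, 4, 13}
	gender := []string{"m", "m", "m", "m", "f", "f"}
	for _, r := range groupedStatBin(xs, gender, 0, 5, 3) {
		fmt.Printf("%5.1f %3d %5.2f %s\n", r.X, r.Count, r.Density, r.Group)
	}
}
```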
Statistical transforms cannot be chained. (Well, not automatically:
If data frame generation and applying a stat are exported the user could
chain several stats manually.)
Aesthetics Mapping
------------------
Mapping of Aesthetics may happen before and after statistical transform.
Example StatBin: x must be mapped before (as StatBin needs x as input)
while mapping the generated count to y can happen only after computing
the stat.
What happens if one of the known real aesthetics is mapped:
- The field in the data frame is renamed to the aesthetic
(e.g. BMI is replaced by y)
- A scale is added for this aesthetic if not present in the plot.
Depending on the type of the data frame field this might be continuous
or discrete.
- The scale is pre-trained: TODO: really here? not own phase?
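The three effects listed above can be sketched as follows. Frame, Scale, Plot
and MapAesthetic are hypothetical names; only continuous fields are handled
and the open question of whether pre-training belongs here is ignored.

```go
package main

import "fmt"

// Frame is a hypothetical data frame: field name -> column of values.
type Frame map[string][]float64

// Scale is a hypothetical continuous scale; Min and Max hold the
// domain seen so far.
type Scale struct {
	Aes      string
	Min, Max float64
	Trained  bool
}

// Train widens the domain so that it covers all vals.
func (s *Scale) Train(vals []float64) {
	for _, v := range vals {
		if !s.Trained {
			s.Min, s.Max, s.Trained = v, v, true
			continue
		}
		if v < s.Min {
			s.Min = v
		}
		if v > s.Max {
			s.Max = v
		}
	}
}

// Plot holds the scales of a plot, keyed by aesthetic.
type Plot struct {
	Scales map[string]*Scale
}

// MapAesthetic renames field to aes in df, adds a scale for aes if the
// plot has none yet and pre-trains that scale with the mapped values.
func (p *Plot) MapAesthetic(df Frame, field, aes string) error {
	col, ok := df[field]
	if !ok {
		return fmt.Errorf("no field %q in data frame", field)
	}
	delete(df, field)
	df[aes] = col
	if p.Scales == nil {
		p.Scales = map[string]*Scale{}
	}
	if _, ok := p.Scales[aes]; !ok {
		p.Scales[aes] = &Scale{Aes: aes}
	}
	p.Scales[aes].Train(col)
	return nil
}

func main() {
	df := Frame{"BMI": {21.5, 24, 19.8}}
	p := &Plot{}
	if err := p.MapAesthetic(df, "BMI", "y"); err != nil {
		panic(err)
	}
	fmt.Println(df["y"], p.Scales["y"].Min, p.Scales["y"].Max)
}
```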
Complications, differences from ggplot2 for R
---------------------------------------------
All of the following do _not_ produce points for the calculated bins:
ggplot(diamonds, aes(x=x)) + stat_bin() + geom_point()
ggplot(diamonds, aes(x=x)) + stat_bin(aes(y=..count..)) + geom_point()
ggplot(diamonds, aes(x=x)) + stat_bin(aes(y=..count..)) + geom_point(count=count)
ggplot(diamonds, aes(x=x)) + stat_bin(aes(y=..count..)) + geom_point(y=..count..)
ggplot(diamonds, aes(x=x)) + stat_bin(aes(y=..count..)) + geom_point(y=count)
... and so on.
The ggplot2 way of doing this is by use of
ggplot(diamonds, aes(x=x)) + stat_bin(aes(y=..count..), geom="point")
which is pretty straightforward for an interactive tool but does not
fit well into a prepared plot with fixed layers, stats and geoms.
It seems impossible in ggplot2 for R to plot geom_crossbar for a calculated
stat_boxplot as the calculated middle cannot be wired to the required
y value. The standard way in ggplot2 for R is to add either a stat or
a geom but not both (maybe change the geom in the stat). This won't
work properly here.
[[
Maybe stuff like plot.AddHistogram or plot.AddBoxplot which
adds suitable stat and geom in one step might be nice
convenience functions, maybe in a subpackage, once...
]]
Position Adjustments
--------------------
All four adjustments (dodge, jitter, stack and fill) work for some geoms
only in ggplot2 for R. Boxplots can use identity and dodge. Stacking (and
filling) works properly only for geom_bar. The ggplot2 for R code is a bit
shaky here: It works hardwired on ymin and ymax so that stacking crossbars
loses the crossbar at the y value as this one is not moved...
It kinda works for ggplot2 for R as the normal way is to add either a
stat _or_ a geom, but not both.
Conclusion: Original ggplot2 for R solution is not practical here.
Solution: Make position adjustment a property of the individual
geoms. Some geoms might not allow any position adjustments at all
or just a subset.
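One way to express this in Go is to let each geom report the adjustments it
supports. The interface and the geom types below are a hypothetical sketch;
the sets of allowed positions per geom are assumptions for illustration.

```go
package main

import "fmt"

// Position names a position adjustment.
type Position string

const (
	Identity Position = "identity"
	Dodge    Position = "dodge"
	Jitter   Position = "jitter"
	Stack    Position = "stack"
	Fill     Position = "fill"
)

// Geom is a hypothetical interface: every geom declares which position
// adjustments (besides identity) it can handle.
type Geom interface {
	Name() string
	Positions() []Position
}

type GeomBar struct{}

func (GeomBar) Name() string          { return "bar" }
func (GeomBar) Positions() []Position { return []Position{Dodge, Stack, Fill} }

type GeomBoxplot struct{}

func (GeomBoxplot) Name() string          { return "boxplot" }
func (GeomBoxplot) Positions() []Position { return []Position{Dodge} }

// CheckPosition reports an error if geom g cannot be combined with p.
// Identity (no adjustment) is always allowed.
func CheckPosition(g Geom, p Position) error {
	if p == Identity {
		return nil
	}
	for _, allowed := range g.Positions() {
		if allowed == p {
			return nil
		}
	}
	return fmt.Errorf("geom %s does not support position %s", g.Name(), p)
}

func main() {
	fmt.Println(CheckPosition(GeomBar{}, Stack))
	fmt.Println(CheckPosition(GeomBoxplot{}, Stack))
}
```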
Discrete Position Scales
------------------------
Discrete position scales pose a major obstacle in geom construction:
Assume a discrete string variable stored like this
"foo" 3.0
"bar" 4.0
"waz" 7.0
and try to construct a boxplot (or bars) with this string variable
mapped to the x-scale. How to set up, compute, transmit and draw the
width of the boxes, especially with a dodging position adjustment?
Idea:
- Nothing is really changed during geom construction or rendering.
- Geom construction produces continuous fields in the data frames,
e.g. a dodged boxplot might produce two boxes for "foo" positioned
at 2.7 and 3.3 each of width 0.2
- When these continuous fields are drawn on a discrete scale any
value x will be decomposed into x == xi + dx with xi an integer
and dx in (-0.5,+0.5). Mapping x (e.g. == 3.2) to the continuous screen
coordinates is a two step process:
a) look up xi (== 3) in the discrete scale levels: 3==foo --> 1
b) add dx, thus producing 1.2
and map to natural units.
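The decomposition can be sketched in a few lines. DiscreteScale, Slots and
decompose are hypothetical names; the final mapping to natural units is left
out. Note that math.Round puts an exact half on the neighbouring integer, so
dx is in [-0.5,+0.5] here.

```go
package main

import (
	"fmt"
	"math"
)

// DiscreteScale maps the integer code under which a level is stored in
// the data frame to the slot of that level on the axis.
type DiscreteScale struct {
	Slots map[int]float64
}

// decompose splits x into x == xi + dx with xi the nearest integer.
func decompose(x float64) (xi int, dx float64) {
	r := math.Round(x)
	return int(r), x - r
}

// Map looks up the slot of xi and re-adds the offset dx.
func (s DiscreteScale) Map(x float64) (float64, bool) {
	xi, dx := decompose(x)
	slot, ok := s.Slots[xi]
	if !ok {
		return 0, false // x does not belong to any known level
	}
	return slot + dx, true
}

func main() {
	// "foo" is stored as 3, "bar" as 4, "waz" as 7;
	// on the axis they occupy slots 1, 2 and 3.
	s := DiscreteScale{Slots: map[int]float64{3: 1, 4: 2, 7: 3}}
	for _, x := range []float64{2.7, 3.3, 3.2} {
		pos, _ := s.Map(x)
		fmt.Printf("%.1f -> %.1f\n", x, pos)
	}
}
```

With these slots the value 3.2 from the text ends up at 1.2, and the two
dodged "foo" boxes at 2.7 and 3.3 end up on either side of slot 1.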
Geoms
-----
There are a handful of fundamental geoms; these correspond directly
to grobs. Other geoms might be simple reparametrizations of a fundamental
grob: E.g. GeomTile, described by <x,y,width,height,fill,...> is just
a different parametrization of a GeomRect of <xmin,ymin,xmax,ymax,fill...>.
Other geoms can be represented by a set of other geoms, e.g. GeomBoxplot
consists of GeomRects, GeomLines and GeomPoints.
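The GeomTile example amounts to a tiny conversion. The struct layouts below
are assumptions for illustration, and <x,y> is taken to be the center of the
tile.

```go
package main

import "fmt"

// Rect is a fundamental geom described by its corners.
type Rect struct {
	Xmin, Ymin, Xmax, Ymax float64
	Fill                   string
}

// Tile is a different parametrization of the same shape:
// center plus width and height.
type Tile struct {
	X, Y, Width, Height float64
	Fill                string
}

// Rect reparametrizes the tile as a rectangle.
func (t Tile) Rect() Rect {
	return Rect{
		Xmin: t.X - t.Width/2,
		Ymin: t.Y - t.Height/2,
		Xmax: t.X + t.Width/2,
		Ymax: t.Y + t.Height/2,
		Fill: t.Fill,
	}
}

func main() {
	t := Tile{X: 1, Y: 2, Width: 1, Height: 4, Fill: "red"}
	fmt.Printf("%+v\n", t.Rect())
}
```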
Constructing geoms performs the following parts:
- Adjust positions (dodge, stack, ...).
- Compute "bounding boxes", e.g. a bar at x=1 might reach from
0.25 to 1.75.
- Train scales with these bounding boxes.
- Emit fundamental geoms.
Grobs, Graphical Objects
------------------------
Grobs are the primitive building blocks for graphics:
- Points. <x,y, style, size, color, alpha>
- Text. <x,y, text, color, size, v-/h-align, font, alpha>
- Lines. <x0,y0,x1,y1,...,xn,yn, style, color, size/width>
- Polygon <x0,y0,x1,y1,...,xn,yn, fill, alpha>
- Rectangle <xmin,ymin, xmax,ymax, style, color, width, fill, alpha>
- Container <xmin,ymin, xmax,ymax>
All grob coordinates are in the interval [0,1]. Containers group grobs
and provide a convenient way to reparametrise the different grobs of e.g. one geom.
TODO: Rectangle Needed? Achievable through Grob-Reparametrisation of
Polygon but may be convenient...
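How a container reparametrises its children can be sketched as a plain affine
map. This assumes that a container's box is given in the [0,1] coordinates of
its parent and that children use [0,1] relative to the container; Container,
ToParent and Resolve are hypothetical names.

```go
package main

import "fmt"

// Point is a coordinate pair in [0,1] x [0,1].
type Point struct{ X, Y float64 }

// Container is a box <xmin,ymin,xmax,ymax> in its parent's coordinates.
type Container struct {
	Xmin, Ymin, Xmax, Ymax float64
}

// ToParent maps a point given in the container's own [0,1] coordinates
// into the coordinates of the container's parent.
func (c Container) ToParent(p Point) Point {
	return Point{
		X: c.Xmin + p.X*(c.Xmax-c.Xmin),
		Y: c.Ymin + p.Y*(c.Ymax-c.Ymin),
	}
}

// Resolve walks a chain of nested containers, innermost first.
func Resolve(p Point, chain ...Container) Point {
	for _, c := range chain {
		p = c.ToParent(p)
	}
	return p
}

func main() {
	right := Container{0.5, 0, 1, 1} // right half of the parent
	fmt.Println(right.ToParent(Point{0.5, 0.5}))
}
```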
Boxplot
-------
Maybe the most complex stuff. If this works everything should be fine.
Prerequisite:
- One discrete field is mapped to x.
- One field is mapped to y.
- Other fields may be mapped to color, fill, linetype
Statistical transform:
- Partition input data by color, fill and linetype.
- For each partition:
* Group on the discrete x values
* For each group compute min, low, med, high, max
(and the outliers) based on the set of ys.
* Return DF with (x, min,q1,median,q3,max)
- Combine partitions to single DF which might look like
(x, min,q1,median,q3,max, color, linetype)
Geom construction:
- Wire generated stat fields to geom input fields, e.g.
rename q1-->low, median-->mid, q3-->high
- For same-x entries in the DF: Apply position adjustments.
The only sane one here is dodging, but nevertheless, why not
stack them....
- While building the geom: Train the scales, especially
the y scale.
- Produce a set of "Basic-Geoms" which biject to grobs and
do not need a reparametrisation: Produce rectangle,
vertical lines, horizontal median and outlier dots.
- Outlier dots could be read from the original DF?
Grob construction:
- All scales should be trained now, prepare projection
functions.
- Use projection functions to turn values in the different
geoms to real grobs in viewport coordinates.
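The per-group part of the statistical transform could look like the sketch
below. It groups on the discrete x values and computes the five numbers per
group; outliers and the partitioning by color, fill and linetype are left
out. The quantile uses linear interpolation between order statistics, which
is an assumption, as the text does not fix a quantile definition.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// BoxStats is one row of the resulting data frame.
type BoxStats struct {
	X                        string
	Min, Q1, Median, Q3, Max float64
}

// quantile returns the p-quantile of sorted (ascending) values,
// interpolating linearly between neighbouring order statistics.
func quantile(sorted []float64, p float64) float64 {
	if len(sorted) == 0 {
		return math.NaN()
	}
	h := p * float64(len(sorted)-1)
	lo, hi := int(math.Floor(h)), int(math.Ceil(h))
	return sorted[lo] + (h-float64(lo))*(sorted[hi]-sorted[lo])
}

// statBoxplot groups ys on the discrete xs and summarizes each group.
func statBoxplot(xs []string, ys []float64) []BoxStats {
	groups := map[string][]float64{}
	for i, x := range xs {
		groups[x] = append(groups[x], ys[i])
	}
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]BoxStats, 0, len(keys))
	for _, k := range keys {
		v := groups[k]
		sort.Float64s(v)
		out = append(out, BoxStats{
			X:      k,
			Min:    v[0],
			Q1:     quantile(v, 0.25),
			Median: quantile(v, 0.5),
			Q3:     quantile(v, 0.75),
			Max:    v[len(v)-1],
		})
	}
	return out
}

func main() {
	xs := []string{"a", "a", "a", "a", "a", "b"}
	ys := []float64{5, 1, 4, 2, 3, 7}
	for _, b := range statBoxplot(xs, ys) {
		fmt.Printf("%+v\n", b)
	}
}
```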
Example of Plot Construction
----------------------------
Step            Example A                     Example B                  Example C                  Example D
°°°°            °°°°°°°°°                     °°°°°°°°°                  °°°°°°°°°                  °°°°°°°°°
                x=Age, y=BMI, fill=Gender     x=Age, fill=Gender         x=Age, fill=Gender         x=Age, y=BMI, fill=Gender, linestyle=Smoker
                Stat Identity                 Stat Bin                   Stat Bin                   Stat Boxplot
                Geom Point                    Geom Bar y=count           Geom Bar y=density         Geom Boxplot
                Position Identity             Position Stack             Position Fill              Position Dodge
                LogScale on x

Prepare Data    rename field                  rename field               rename field               rename field
°°°°°°°°°°°°    Age->x, BMI->y, Gender->fill  Age->x, Gender->fill       Age->x, Gender->fill       Age->x, BMI->y, Gender->fill, Smoker->linestyle
                keep only the fields          keep only the fields       keep only the fields       keep only the fields
                x, y, fill                    x, fill                    x, fill                    x, y, fill, linestyle
                set x <- log(x)

Prepare Scale   (add Scale x), pretrain       add Scale x, pretrain      add Scale x, pretrain      add Scale x, pretrain
°°°°°°°°°°°°°   add Scale y, pretrain                                                               add Scale y, pretrain
                add Scale fill, pretrain      add Scale fill, pretrain   add Scale fill, pretrain   add Scale fill, pretrain
                                                                                                    add Scale linestyle, pretrain

Stat Trans.     noop                          partition on fill          partition on fill          partition on fill
°°°°°°°°°°°                                   per fill:                  per fill:                  per fill:
                                              binify (x,count,density)   binify (x,count,density)   partition on linestyle
                                              combine to:                combine to:                per linestyle:
                                              (x,count,density,fill)     (x,count,density,fill)     boxify (x,min,q1,med,q3,max)
                                                                                                    combine to:
                                                                                                    (x,min,q1,med,q3,max, linestyle)
                                                                                                    combine to:
                                                                                                    (x,min,q1,med,q3,max, linestyle, fill)

Rewire          noop                          count->y, drop density     density->y, drop count     q1->low, q3->high
°°°°°°                                        add Scale y                add Scale y                (add Scale y)

Geom Constr     foreach row in data:          foreach row:               foreach row:               foreach row:
°°°°°°°°°°°     create one point (x,y,fill)   create bars (x,0,y,fill)   create bars (x,0,y,fill)   create box (x,width,min,...,max,ls,fill)
                                              if same x exists:          if same x exists:          if same x exists:
                                              raise to (x,a,a+y,fill)    raise to (x,a,a+y,fill)    move to (x+a, width/b,min,...,max,ls,fill)
                train x,y,fill                train x, y, fill           foreach unique x:          train x,y,fill,ls
                                                                         rescale y-vals
                                                                         reset y-scale and retrain
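The "raise to (x,a,a+y,fill)" and "rescale y-vals" entries of Examples B and
C can be sketched like this. Bar, stack and fill are hypothetical names;
bars are stacked in input order and the y values are assumed to be
non-negative.

```go
package main

import "fmt"

// Bar is a bar reaching from Ymin to Ymax at position X.
type Bar struct {
	X, Ymin, Ymax float64
	Fill          string
}

// stack raises each bar onto the bars already placed at the same x.
func stack(bars []Bar) []Bar {
	top := map[float64]float64{} // current stack height per x
	out := make([]Bar, len(bars))
	for i, b := range bars {
		a := top[b.X]
		h := b.Ymax - b.Ymin
		out[i] = Bar{b.X, a, a + h, b.Fill}
		top[b.X] = a + h
	}
	return out
}

// fill stacks the bars and then rescales every stack to total height 1.
func fill(bars []Bar) []Bar {
	out := stack(bars)
	top := map[float64]float64{}
	for _, b := range out {
		if b.Ymax > top[b.X] {
			top[b.X] = b.Ymax
		}
	}
	for i, b := range out {
		if t := top[b.X]; t > 0 {
			out[i].Ymin = b.Ymin / t
			out[i].Ymax = b.Ymax / t
		}
	}
	return out
}

func main() {
	bars := []Bar{{5, 0, 2, "m"}, {5, 0, 6, "f"}, {10, 0, 4, "m"}}
	fmt.Println(stack(bars))
	fmt.Println(fill(bars))
}
```

After fill the y values no longer match the originally trained y scale,
which is why Example C resets and retrains the y-scale.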
The oxboys example on grouping
------------------------------
Original DF with fields:
- Subject
- Height
- Age
- Occasion
Discrete Scales mapped to shape and linetype
--------------------------------------------
ggplot2 limits the number of shapes and linetypes to a handful (otherwise the plot
becomes too complicated and unreadable). This behaviour is okay for an
interactive tool but might be frustrating in code.
But even if we do not clip the number of shapes or linestyles there is just
a limited number: What to do if we run out of shapes? Wrap around? Most
probably not as this is ambiguous. Add color if available?
Themes
------
Styling of plots is done by specifying various properties of individual
plot elements. The plot elements are grouped hierarchically and settings
are inherited from above (a bit like CSS).
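The inheritance could be implemented as a lookup that falls back to the
parent element. Element, Props and Get are hypothetical names, and
properties are kept as plain strings for brevity.

```go
package main

import "fmt"

// Element is one plot element in the theme hierarchy.
type Element struct {
	Parent *Element
	Props  map[string]string
}

// Get returns the property set on the element itself or, if it is not
// set there, the value inherited from the nearest ancestor.
func (e *Element) Get(key string) (string, bool) {
	for cur := e; cur != nil; cur = cur.Parent {
		if v, ok := cur.Props[key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	root := &Element{Props: map[string]string{"color": "black", "size": "10"}}
	text := &Element{Parent: root, Props: map[string]string{"size": "12"}}
	title := &Element{Parent: text} // sets nothing itself

	size, _ := title.Get("size")
	color, _ := title.Get("color")
	fmt.Println(size, color)
}
```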